
Commit b66faf2

jerryzh168 authored and facebook-github-bot committed
Update pytorch 2.0 export quantization doc
Summary: att
Reviewed By: digantdesai
Differential Revision: D47855806
fbshipit-source-id: aebd755878a394f3dfd81a2105b41b6a55727e41
1 parent 418dbbf commit b66faf2

File tree

8 files changed: +130, -42 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ Compared to the legacy Lite Interpreter, there are some major benefits:
 - [Exporting to Executorch](/docs/website/docs/tutorials/exporting_to_executorch.md)
 - [EXIR Spec](/docs/website/docs/ir_spec/00_exir.md)
 - [Exporting manual](/docs/website/docs/export/00_export_manual.md)
+- [Quantization](/docs/website/docs/tutorials/quantization_flow.md)
 - [Delegate to a backend](/docs/website/docs/tutorials/backend_delegate.md)
 - [Profiling](/docs/website/docs/tutorials/profiling.md)
 - [Executorch Google Colab](https://colab.research.google.com/drive/1m8iU4y7CRVelnnolK3ThS2l2gBo7QnAP#scrollTo=1o2t3LlYJQY5)

docs/website/docs/tutorials/exporting_to_executorch.md

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ At this point, users can choose to run additional passes through the
 `exported_program.transform(passes)` function. A tutorial on how to write
 transformations can be found [here](./passes.md).

-Additionally, users can run quantization at this step. A tutorial for doing so can be found [here](./short_term_quantization_flow.md).
+Additionally, users can run quantization at this step. A tutorial for doing so can be found [here](./quantization_flow.md).

 ### 1.2 Lower to EXIR Edge Dialect

docs/website/docs/tutorials/quantization_flow.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
# Quantization Flow in Executorch

## 1. Capture the model with `export.capture_pre_autograd_graph`
### Process
The flow uses `PyTorch 2.0 Export Quantization` to quantize a model captured by `export.capture_pre_autograd_graph`. If the model is not traceable, please see [here](https://pytorch.org/docs/main/generated/exportdb/index.html) for the constructs supported by `export.capture_pre_autograd_graph` and for how to make the model exportable.

```
# program capture
import torch._export as export

m = export.capture_pre_autograd_graph(m, copy.deepcopy(example_inputs))
```
### Result
The result of this step is an `fx.GraphModule`.
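Since the captured result is a plain `torch.fx.GraphModule`, the usual FX introspection tools apply. For example, continuing from the snippet above:

```
# inspect the captured graph (standard torch.fx utilities, nothing Executorch-specific)
print(m)        # prints the module with its generated forward() code
print(m.graph)  # prints the underlying fx.Graph IR
```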

## 2. Quantization
### Process
Note: before quantizing a model, each backend needs to implement its own `Quantizer` by following [this tutorial](https://pytorch.org/tutorials/prototype/pt2e_quantizer.html).

Please take a look at the [PyTorch 2.0 export post training static quantization tutorial](https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_static.html) to learn about all the steps of quantization. The main APIs used to quantize the model are the following (a condensed sketch of the full flow follows this list):
* `prepare_pt2e`: inserts observers into the model. It takes a backend-specific `Quantizer` as an argument, which annotates the nodes with the information needed to quantize the model properly for that backend.
* (not an API) calibration: run the model on some sample data.
* `convert_pt2e`: converts an observed model into a quantized model. We have a special representation for selected ops (e.g. quantized linear); other ops are represented as (dq -> float32_op -> q), and the q/dq ops are decomposed into more primitive operators.
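The `examples/quantization/example.py` added in this commit exercises these APIs with the `XNNPACKQuantizer`. The sketch below condenses the same flow; the toy module `M` and its `example_inputs` are hypothetical stand-ins for a real model:

```
import copy

import torch
import torch._export as export
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    get_symmetric_quantization_config,
    XNNPACKQuantizer,
)


class M(torch.nn.Module):
    """Toy model used only for illustration."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.linear(x)


example_inputs = (torch.randn(1, 8),)
m = M().eval()

# step 1: capture the pre-autograd graph
m = export.capture_pre_autograd_graph(m, copy.deepcopy(example_inputs))

# step 2a: prepare -- annotate the graph and insert observers via a backend Quantizer
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config(is_per_channel=False))
m = prepare_pt2e(m, quantizer)

# step 2b: calibration -- run the observed model on sample data
m(*example_inputs)

# step 2c: convert -- produce the reference quantized model
m = convert_pt2e(m)
```

As in the example script, `is_per_channel` is left `False` here because out variants of quantize_per_channel/dequantize_per_channel are not available yet.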
### Result
The result after these steps is a reference quantized model, with the quantize/dequantize operators further decomposed. Example:

```
# Reference Quantized Pattern for quantized linear
def quantized_linear(x_int8, x_scale, x_zero_point, weight_int8, weight_scale, weight_zero_point, bias_int32, bias_scale, bias_zero_point, output_scale, output_zero_point):
    x_int16 = x_int8.to(torch.int16)
    weight_int16 = weight_int8.to(torch.int16)
    acc_int32 = torch.ops.out_dtype(torch.mm, torch.int32, (x_int16 - x_zero_point), (weight_int16 - weight_zero_point))
    acc_rescaled_int32 = torch.ops.out_dtype(torch.ops.aten.mul.Scalar, torch.int32, acc_int32, x_scale * weight_scale / output_scale)
    bias_int32 = torch.ops.out_dtype(torch.ops.aten.mul.Scalar, torch.int32, bias_int32 - bias_zero_point, bias_scale / output_scale)
    out_int8 = torch.ops.aten.clamp(acc_rescaled_int32 + bias_int32 + output_zero_point, qmin, qmax).to(torch.int8)
    return out_int8
```

See [here](https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8) for some operators that have integer operator representations.

## 3. Lowering to Executorch
You can lower the quantized model to Executorch by following [this tutorial](https://github.com/pytorch/executorch/blob/main/docs/website/docs/tutorials/exporting_to_executorch.md#12-lower-to-exir-edge-dialect).
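For orientation only, a hypothetical sketch of that step, assuming the `exir.capture(...).to_edge()` flow described in the exporting tutorial (the tutorial is the source of truth; note that the commit's own `examples/quantization/example.py` keeps this step commented out because out-variant quantize/dequantize ops are not available yet):

```
import copy

import executorch.exir as exir

# assumption: `m` is the quantized fx.GraphModule returned by convert_pt2e
# and `example_inputs` are the inputs used during capture/calibration
edge_program = exir.capture(m, copy.deepcopy(example_inputs)).to_edge()

# further lowering (memory planning, serialization to flatbuffer, etc.)
# follows the exporting_to_executorch.md tutorial
executorch_program = edge_program.to_executorch()
```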

docs/website/docs/tutorials/short_term_quantization_flow.md

Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,6 @@
-# Short Term Quantization Flow in Executorch
+# [Deprecated, Please Don't Use] Short Term Quantization Flow in Executorch
+
+Note: this is deprecated, please use [this](./quantization_flow.md) instead.

 High level flow for short term quantization flow in executorch looks like the following: https://fburl.com/8pspa022
docs/website/quantization_flow.md

Lines changed: 0 additions & 40 deletions
This file was deleted.

examples/export/TARGETS

Lines changed: 12 additions & 0 deletions
@@ -9,3 +9,15 @@ python_library(
         "//executorch/exir:lib",
     ],
 )
+
+python_library(
+    name = "export_example",
+    srcs = [
+        "export_example.py",
+    ],
+    deps = [
+        ":utils",
+        "//executorch/examples/models:models",
+        "//executorch/exir:lib",
+    ],
+)

examples/quantization/TARGETS

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
load("@fbcode_macros//build_defs:python_binary.bzl", "python_binary")

python_binary(
    name = "example",
    main_src = "example.py",
    deps = [
        "//caffe2:torch",
        "//executorch/examples/models:models",
    ],
)

examples/quantization/example.py

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

import argparse
import copy

import torch._export as export
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer import XNNPACKQuantizer
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    get_symmetric_quantization_config,
)

# TODO: maybe move this to examples/export/utils.py?
# from ..export.export_example import export_to_ff

from ..models import MODEL_NAME_TO_MODEL


def quantize(model_name, model, example_inputs):
    m = model.eval()
    m = export.capture_pre_autograd_graph(m, copy.deepcopy(example_inputs))
    print("original model:", m)
    quantizer = XNNPACKQuantizer()
    # if we set is_per_channel to True, we also need to add out_variant of quantize_per_channel/dequantize_per_channel
    operator_config = get_symmetric_quantization_config(is_per_channel=False)
    quantizer.set_global(operator_config)
    m = prepare_pt2e(m, quantizer)
    # calibration
    m(*example_inputs)
    m = convert_pt2e(m)
    print("quantized model:", m)
    # make sure we can export to flat buffer
    # Note: this is not working yet due to missing out variant ops for quantize_per_tensor/dequantize_per_tensor ops
    # aten = export_to_ff(model_name, m, copy.deepcopy(example_inputs))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-m",
        "--model_name",
        required=True,
        help=f"Provide model name. Valid ones: {list(MODEL_NAME_TO_MODEL.keys())}",
    )

    args = parser.parse_args()

    if args.model_name not in MODEL_NAME_TO_MODEL:
        raise RuntimeError(
            f"Model {args.model_name} is not a valid name. "
            f"Available models are {list(MODEL_NAME_TO_MODEL.keys())}."
        )

    model, example_inputs = MODEL_NAME_TO_MODEL[args.model_name]()

    quantize(args.model_name, model, example_inputs)
