Commit c013af4 (1 parent: 6169ba3)

jerryzh168 authored and facebook-github-bot committed

Create quantization_flow.md using inpage editor

Summary: This diff has been automatically generated by the inpage editor. NOTE: If you want to update this diff, go via the preview link inside the static docs section below. Ensure you are editing the same page that was used to create this diff. Reviewed By: mergennachin Differential Revision: D47457075 fbshipit-source-id: 003ae829afdb198e0312b2fa3350fcb00bf5dbb4

1 file changed: docs/website/quantization_flow.md (+53 −0 lines)
# Quantization Flow in Executorch

The high level flow for the short term quantization flow in executorch looks like the following: https://docs.google.com/document/d/1UuktDffiMH0rXRuiL0e8bQaS3X0XfkIHEF1VtpFmA5A/edit#heading=h.mywdosyratgh
## 1. Capture the model with `exir.capture`

### Process

The flow uses `PyTorch 2.0 Export Quantization` to quantize the model, which works on a model captured by `exir.capture`. If the model is not traceable, please follow the [User Guide](TBD) to make the necessary changes to the model, and see [here](https://pytorch.org/docs/main/generated/exportdb/index.html) for the constructs supported by `exir.capture`.
```
# program capture
# m is the nn.Module to be captured; example_inputs is a tuple of example inputs
from executorch import exir
from executorch.exir import CaptureConfig
from executorch.exir.tracer import ExirDynamoConfig

enable_dynamic_shape = True  # set to False if the model only uses static shapes

dynamo_config = ExirDynamoConfig(
    capture_scalar_outputs=True,
    guard_nn_modules=True,
    dynamic_shapes=enable_dynamic_shape,
    specialize_int=True,
    verbose=True,
)
capture_config = CaptureConfig(
    pt2_mode=True,
    _dynamo_config=dynamo_config,
    enable_dynamic_shape=enable_dynamic_shape,
)
exported_program = exir.capture(m, example_inputs, capture_config)
m = exported_program.graph_module
```
### Result

The result of this step is the captured model (an `fx.GraphModule`).
## 2. Quantization

### Process

Please take a look at the [pytorch 2.0 export post training static quantization tutorial](https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html) to learn about all the steps of quantization. The main APIs used to quantize the model are:

* `prepare_pt2e`: used to insert observers into the model. It takes a backend specific `Quantizer` as an argument, which annotates the nodes with the information needed to quantize the model properly for the backend.
* (not an API) calibration: run the model on some sample data.
* `convert_pt2e`: converts an observed model to a quantized model. We have a special representation for selected ops (e.g. quantized linear); other ops are represented as (dq -> float32_op -> q), and q/dq are decomposed into more primitive operators.
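The prepare/calibrate/convert split can be illustrated with a small self-contained sketch. This is plain Python with hypothetical names, not the real `prepare_pt2e`/`convert_pt2e` API: it only models what an inserted observer records during calibration and how scale/zero-point are derived from the observed range (assuming asymmetric affine uint8 quantization).

```python
# Illustrative sketch only; the real flow inserts observers into an
# fx.GraphModule via prepare_pt2e and computes qparams in convert_pt2e.

class MinMaxObserver:
    """Records the min/max values seen during calibration."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, values):
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def calculate_qparams(self, qmin=0, qmax=255):
        # Asymmetric affine quantization: map the observed float range
        # (extended to include 0.0) onto the integer range [qmin, qmax].
        lo = min(self.min_val, 0.0)
        hi = max(self.max_val, 0.0)
        scale = (hi - lo) / (qmax - qmin)
        zero_point = round(qmin - lo / scale)
        return scale, zero_point

# "Calibration": run sample data through the observed point in the model.
obs = MinMaxObserver()
for batch in [[0.0, 1.0, 2.5], [-0.5, 3.5]]:
    obs.observe(batch)

# "Convert": turn the observed statistics into quantization parameters.
scale, zero_point = obs.calculate_qparams()
```

Here the observed range is [-0.5, 3.5], so the sketch yields `scale = 4.0 / 255` and `zero_point = 32`; the real flow computes analogous parameters per observer and then rewrites the graph.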

### Result

The result after these steps will be a reference quantized model, with the quantize/dequantize operators further decomposed. Example:

(TODO): update
```
# Reference Quantized Pattern for quantized add
x = torch.ops.quantized_decomposed.dequantize_per_tensor(x, x_scale, x_zero_point, x_qmin, x_qmax, torch.uint8)
y = torch.ops.quantized_decomposed.dequantize_per_tensor(y, y_scale, y_zero_point, y_qmin, y_qmax, torch.uint8)
out = x + y
out = torch.ops.quantized_decomposed.quantize_per_tensor(out, out_scale, out_zero_point, out_qmin, out_qmax, torch.uint8)
```
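Numerically, the pattern above computes the following. This is a plain Python model of the per-tensor affine quantize/dequantize semantics (all scales, zero points, and input values below are made-up example values, not values produced by the flow):

```python
# Plain-Python model of quantize_per_tensor / dequantize_per_tensor
# semantics for the dq -> float add -> q reference pattern.

def quantize_per_tensor(x, scale, zero_point, qmin, qmax):
    # float -> int: rescale, shift by zero_point, clamp to [qmin, qmax]
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]

def dequantize_per_tensor(q, scale, zero_point):
    # int -> float: undo the shift, then rescale
    return [(v - zero_point) * scale for v in q]

# Example qparams (illustrative): uint8 range, per-tensor scales/zero points.
qmin, qmax = 0, 255
x_scale, x_zero_point = 0.1, 128
y_scale, y_zero_point = 0.1, 128
out_scale, out_zero_point = 0.2, 128

xq = [128, 138]  # quantized inputs representing 0.0 and 1.0
yq = [148, 108]  # quantized inputs representing 2.0 and -2.0

# The reference pattern: dq -> float add -> q
x = dequantize_per_tensor(xq, x_scale, x_zero_point)
y = dequantize_per_tensor(yq, y_scale, y_zero_point)
out = [a + b for a, b in zip(x, y)]
outq = quantize_per_tensor(out, out_scale, out_zero_point, qmin, qmax)
```

With these example values the float sums are 2.0 and -1.0, which re-quantize to 138 and 123; a backend can later fuse this dq/add/q pattern into a single integer add kernel.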

See https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8 for some operators that have integer operator representations.

## 3. Lowering to Executorch

TODO: link
