
Commit c5327f0

jerryzh168 authored and facebook-github-bot committed
Quantization doc cleanups (#767)
Reviewed By: kimishpatel

Differential Revision: D50107717
1 parent 61a7967 commit c5327f0


6 files changed (+18, -76 lines)


docs/source/compiler-backend-dialect.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ To lower edge ops to backend ops, a pass will perform pattern matching to identi
 * `transform()`. An API on ExportProgram that allows users to provide custom passes. Note that this is not guarded by any validator so the soundness of the program is not guaranteed.
 * [ExecutorchBackendConfig.passes](https://github.com/pytorch/executorch/blob/main/exir/capture/_config.py#L40). If added here, the pass will be part of the lowering process from backend dialect to ExecutorchProgram.

-Example: one such pass is QuantFusion. This pass takes a "canonical quantization pattern", ie. "dequant - some_op - quant" and fuses this pattern into a single operator that is backend specific, i.e. `quantized_decomposed::some_op`. You can find more details [here](./quantization-custom-quantization.md). Another simpler example is [here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20) where we replace `sym_size` operators to the ones that are understood by ExecuTorch
+Example: one such pass is QuantFusion. This pass takes a "canonical quantization pattern", i.e. "dequant - some_op - quant", and fuses it into a single backend-specific operator, i.e. `quantized_decomposed::some_op`. Another, simpler example is [here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20), where we replace `sym_size` operators with ones that are understood by ExecuTorch.


### Pattern Binding Decorator
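
To make the QuantFusion idea above concrete, here is a minimal, hypothetical sketch of such a fusion pass. It is not ExecuTorch's actual implementation: the pattern targets and the fused target are passed in as arguments, and the fused operator stands in for whatever op a backend registers.

```python
# Illustrative sketch only: match a "dequant -> some_op -> quant" chain in an FX graph
# and rewrite it into a single fused call. Targets are parameters, so nothing here is
# tied to a particular backend or to ExecuTorch's real QuantFusion pass.
from torch.fx import GraphModule, Node


def fuse_dq_op_q(
    gm: GraphModule, op_target, dq_target, q_target, fused_target
) -> GraphModule:
    """Replace every dq_target -> op_target -> q_target chain with one fused_target call."""
    for quant_node in list(gm.graph.nodes):
        # Anchor the match on the final quantize node of the pattern.
        if quant_node.op != "call_function" or quant_node.target is not q_target:
            continue
        op_node = quant_node.args[0]
        if not isinstance(op_node, Node) or op_node.target is not op_target:
            continue
        dq_node = op_node.args[0]
        if not isinstance(dq_node, Node) or dq_node.target is not dq_target:
            continue
        # Insert the fused op and reroute users of the quantize node to it. A real pass
        # would also forward op_node's extra inputs and the output quant params; this
        # sketch only forwards the dequantize node's arguments.
        with gm.graph.inserting_after(quant_node):
            fused = gm.graph.call_function(fused_target, dq_node.args, dq_node.kwargs)
        quant_node.replace_all_uses_with(fused)
        gm.graph.erase_node(quant_node)
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```

Anchoring the match on the final quantize node keeps this a single pass over the graph; dead-code elimination then removes the now-unused dequantize and op nodes.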

docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -135,7 +135,7 @@ Topics in this section will help you get started with ExecuTorch.
    :caption: Quantization
    :hidden:

-   quantization-custom-quantization
+   quantization

 .. toctree::
    :glob:

docs/source/quantization-custom-quantization.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/source/quantization.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# Quantization
+Quantization is a process that reduces the precision of computations and lowers the memory footprint of a model; more details can be found on the [concepts page](./concepts). It is typically required for models running on edge devices, which have limited resources.
+
+In terms of flow, quantization happens early in the ExecuTorch stack; see the workflow graph in the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
+
+Quantization is backend specific, and each backend implements its own ``Quantizer``, built with the PyTorch 2 Export Quantization API.
+
+Modeling users use the ``Quantizer`` for their target backend, e.g. ``XNNPACKQuantizer``, to quantize their model.
+
+Backend developers need to implement their own ``Quantizer`` to express how different operators or operator patterns are quantized in their backend, and to provide APIs for modeling users to configure how they want the model to be quantized. Each backend should provide its own API documentation for its ``Quantizer``.
+
+For an example quantization flow with ``XNNPACKQuantizer`` and tutorials on how backend developers can implement their own ``Quantizer``, please take a look at the ``Performing Quantization`` section in [the ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
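
As a rough sketch of the modeling-user flow this new page describes, targeting XNNPACK (the capture API and module paths below reflect the PyTorch/ExecuTorch versions around this change and may differ in later releases):

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)


class SimpleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.relu(self.linear(x))


example_args = (torch.randn(1, 8),)
model = SimpleModel().eval()

# 1. Capture the model into the pre-autograd ATen dialect.
pre_autograd_aten_dialect = capture_pre_autograd_graph(model, example_args)

# 2. Configure the backend-specific quantizer and insert observers.
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared_graph = prepare_pt2e(pre_autograd_aten_dialect, quantizer)

# 3. Calibrate with representative inputs, then convert to a quantized graph.
prepared_graph(*example_args)
converted_graph = convert_pt2e(prepared_graph)
print(converted_graph)
```

The converted graph can then be exported and lowered to the Edge dialect exactly as in the tutorial.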

docs/source/tutorials_source/export-to-executorch-tutorial.py

Lines changed: 4 additions & 3 deletions
@@ -229,7 +229,7 @@ def f(x, y):
 #
 # Compared to
 # `FX Graph Mode Quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html>`__,
-# we will need to call two new APIs: ``prepare_pt2e`` and ``compare_pt2e``
+# we will need to call two new APIs: ``prepare_pt2e`` and ``convert_pt2e``
 # instead of ``prepare_fx`` and ``convert_fx``. It differs in that
 # ``prepare_pt2e`` takes a backend-specific ``Quantizer`` as an argument, which
 # will annotate the nodes in the graph with information needed to quantize the
@@ -250,6 +250,7 @@ def f(x, y):
 prepared_graph = prepare_pt2e(pre_autograd_aten_dialect, quantizer)
 converted_graph = convert_pt2e(prepared_graph)
 print("Quantized Graph")
+print(converted_graph)

 aten_dialect: ExportedProgram = export(converted_graph, example_args)
 print("ATen Dialect Graph")
@@ -258,7 +259,7 @@ def f(x, y):
 ######################################################################
 # More information on how to quantize a model, and how a backend can implement a
 # ``Quantizer`` can be found
-# `here <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_static.html>`__ .
+# `here <https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization>`__.

 ######################################################################
 # Lowering to Edge Dialect
@@ -648,7 +649,7 @@ def forward(self, x):
 # ^^^^^^^^^^^^^^^
 #
 # - `torch.export Documentation <https://pytorch.org/docs/2.1/export.html>`__
-# - `Quantization Tutorial <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_static.html>`__
+# - `Quantization Documentation <https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization>`__
 # - `IR Spec <../ir-exir.html>`__
 # - `Writing Compiler Passes + Partitioner Documentation <../compiler-custom-compiler-passes.html>`__
 # - `Backend Delegation Documentation <../compiler-delegate-and-partitioner.html>`__
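
On the backend-developer side that the updated links point to, a ``Quantizer`` boils down to annotating graph nodes with quantization specs before ``prepare_pt2e`` runs. Below is a heavily simplified, hypothetical sketch; the observer choice, spec values, and node matching are illustrative assumptions, not how any real backend (e.g. ``XNNPACKQuantizer``) is implemented.

```python
# Toy example only: annotate every aten.linear node for 8-bit per-tensor quantization.
import torch
from torch.ao.quantization.observer import MinMaxObserver
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation,
    QuantizationSpec,
    Quantizer,
)


class ToyLinearQuantizer(Quantizer):
    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        act_qspec = QuantizationSpec(
            dtype=torch.int8,
            quant_min=-128,
            quant_max=127,
            qscheme=torch.per_tensor_affine,
            observer_or_fake_quant_ctr=MinMaxObserver,
        )
        for node in model.graph.nodes:
            if node.op == "call_function" and node.target == torch.ops.aten.linear.default:
                # Tell prepare_pt2e how to observe this node's activation input and output.
                node.meta["quantization_annotation"] = QuantizationAnnotation(
                    input_qspec_map={node.args[0]: act_qspec},
                    output_qspec=act_qspec,
                    _annotated=True,
                )
        return model

    def validate(self, model: torch.fx.GraphModule) -> None:
        # A real quantizer would check that the annotations are self-consistent here.
        pass
```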

docs/website/docs/tutorials/quantization_flow.md

Lines changed: 0 additions & 68 deletions
This file was deleted.
