
Commit 249cbab

jerryzh168 authored and facebook-github-bot committed
Quantization doc cleanups (#767)
Summary: Pull Request resolved: #767

Reviewed By: kimishpatel, kirklandsign

Differential Revision: D50107717

fbshipit-source-id: 9bb07b55ef03f0e8ecdaeb3c67de19ee02fe6ee4
1 parent fd2b256 commit 249cbab

File tree

7 files changed, +22 -76 lines changed


docs/source/compiler-backend-dialect.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ To lower edge ops to backend ops, a pass will perform pattern matching to identi
 * `transform()`. An API on ExportProgram that allows users to provide custom passes. Note that this is not guarded by any validator so the soundness of the program is not guaranteed.
 * [ExecutorchBackendConfig.passes](https://github.com/pytorch/executorch/blob/main/exir/capture/_config.py#L40). If added here, the pass will be part of the lowering process from backend dialect to ExecutorchProgram.

-Example: one such pass is QuantFusion. This pass takes a "canonical quantization pattern", ie. "dequant - some_op - quant" and fuses this pattern into a single operator that is backend specific, i.e. `quantized_decomposed::some_op`. You can find more details [here](./quantization-custom-quantization.md). Another simpler example is [here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20) where we replace `sym_size` operators to the ones that are understood by ExecuTorch
+Example: one such pass is QuantFusion. This pass takes a "canonical quantization pattern", ie. "dequant - some_op - quant" and fuses this pattern into a single operator that is backend specific, i.e. `quantized_decomposed::some_op`. Another simpler example is [here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20) where we replace `sym_size` operators to the ones that are understood by ExecuTorch


 ### Pattern Binding Decorator
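
For intuition, the "dequant - some_op - quant" fusion that QuantFusion performs can be sketched with `torch.fx`'s subgraph rewriter. This is a minimal illustration under stated assumptions, not the actual QuantFusion pass: `dq_stub`, `q_stub`, and `fused_relu` are hypothetical stand-ins for the real quantize/dequantize and backend-fused operators.

```python
# Toy "dequant - op - quant" fusion sketch; NOT the real QuantFusion pass.
import torch
from torch import fx
from torch.fx import subgraph_rewriter


@fx.wrap  # keep the stub as a single opaque node during symbolic tracing
def dq_stub(x, scale, zp):  # hypothetical stand-in for a dequantize op
    return (x.float() - zp) * scale


@fx.wrap
def q_stub(x, scale, zp):  # hypothetical stand-in for a quantize op
    return torch.round(x / scale + zp).to(torch.int8)


@fx.wrap
def fused_relu(x, scale, zp):  # hypothetical backend-specific fused op
    return q_stub(torch.relu(dq_stub(x, scale, zp)), scale, zp)


class M(torch.nn.Module):
    def forward(self, x, scale, zp):
        # the canonical pattern: dequant -> some_op -> quant
        return q_stub(torch.relu(dq_stub(x, scale, zp)), scale, zp)


def pattern(x, scale, zp):
    return q_stub(torch.relu(dq_stub(x, scale, zp)), scale, zp)


def replacement(x, scale, zp):
    return fused_relu(x, scale, zp)


gm = fx.symbolic_trace(M())
subgraph_rewriter.replace_pattern(gm, pattern, replacement)
print(gm.graph)  # the dq_stub -> relu -> q_stub chain is now one fused_relu call
```

The `fx.wrap` calls matter here: without them, tracing would inline the stubs into primitive ops and the rewriter would have no op-level pattern to match.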

docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -153,7 +153,7 @@ Topics in this section will help you get started with ExecuTorch.
    :caption: Quantization
    :hidden:

-   quantization-custom-quantization
+   quantization

 .. toctree::
    :glob:

docs/source/quantization-custom-quantization.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/source/quantization.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# Quantization
+Quantization is a process that reduces the precision of computations and lowers the memory footprint of the model. To learn more, please visit the [ExecuTorch concepts page](./concepts.md#quantization). This is particularly useful for edge devices, which typically have limited resources such as processing power, memory, and battery life. By using quantization, we can make our models more efficient and enable them to run effectively on these devices.
+
+In terms of flow, quantization happens early in the ExecuTorch stack:
+
+![ExecuTorch Entry Points](/_static/img/executorch-entry-points.png)
+
+A more detailed workflow can be found in the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
+
+Quantization is usually tied to execution backends that have quantized operators implemented. Thus each backend is opinionated about how the model should be quantized, expressed in a backend-specific ``Quantizer`` class. A ``Quantizer`` provides an API for modeling users to express how they want their model to be quantized, and passes the user's intent on to the quantization workflow.
+
+Backend developers will need to implement their own ``Quantizer`` to express how different operators or operator patterns are quantized in their backend. This is accomplished via the [Annotation API](https://pytorch.org/tutorials/prototype/pt2e_quantizer.html) provided by the quantization workflow. Since a ``Quantizer`` is also user-facing, it exposes specific APIs for modeling users to configure how they want the model to be quantized. Each backend should provide its own API documentation for its ``Quantizer``.
+
+Modeling users will use the ``Quantizer`` specific to their target backend to quantize their model, e.g. ``XNNPACKQuantizer``.
+
+For an example quantization flow with ``XNNPACKQuantizer``, and for more documentation and tutorials, please see the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial.md#performing-quantization).
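
To make the flow described in the new page concrete, here is a hedged end-to-end sketch using ``XNNPACKQuantizer`` with ``prepare_pt2e``/``convert_pt2e``. The import paths (e.g. ``capture_pre_autograd_graph`` from ``torch._export``) follow the PyTorch 2.1-era APIs and may differ in other releases.

```python
# Sketch of the PT2E quantization flow, assuming PyTorch 2.1-era import paths.
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)


class SimpleNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.relu(self.linear(x))


example_args = (torch.randn(1, 8),)
model = capture_pre_autograd_graph(SimpleNet().eval(), example_args)

# The backend-specific Quantizer carries the modeling user's quantization intent.
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

model = prepare_pt2e(model, quantizer)  # insert observers per the annotations
model(*example_args)                    # calibrate with representative inputs
model = convert_pt2e(model)             # rewrite into quantize/dequantize ops
print(model)
```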

docs/source/tutorials_source/export-to-executorch-tutorial.py

Lines changed: 4 additions & 3 deletions
@@ -229,7 +229,7 @@ def f(x, y):
 #
 # Compared to
 # `FX Graph Mode Quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html>`__,
-# we will need to call two new APIs: ``prepare_pt2e`` and ``compare_pt2e``
+# we will need to call two new APIs: ``prepare_pt2e`` and ``convert_pt2e``
 # instead of ``prepare_fx`` and ``convert_fx``. It differs in that
 # ``prepare_pt2e`` takes a backend-specific ``Quantizer`` as an argument, which
 # will annotate the nodes in the graph with information needed to quantize the
@@ -250,6 +250,7 @@ def f(x, y):
 prepared_graph = prepare_pt2e(pre_autograd_aten_dialect, quantizer)
 converted_graph = convert_pt2e(prepared_graph)
 print("Quantized Graph")
+print(converted_graph)

 aten_dialect: ExportedProgram = export(converted_graph, example_args)
 print("ATen Dialect Graph")
@@ -258,7 +259,7 @@ def f(x, y):
 ######################################################################
 # More information on how to quantize a model, and how a backend can implement a
 # ``Quantizer`` can be found
-# `here <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_static.html>`__ .
+# `here <https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization>`__.

 ######################################################################
 # Lowering to Edge Dialect
@@ -648,7 +649,7 @@ def forward(self, x):
 # ^^^^^^^^^^^^^^^
 #
 # - `torch.export Documentation <https://pytorch.org/docs/2.1/export.html>`__
-# - `Quantization Tutorial <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_static.html>`__
+# - `Quantization Documentation <https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization>`__
 # - `IR Spec <../ir-exir.html>`__
 # - `Writing Compiler Passes + Partitioner Documentation <../compiler-custom-compiler-passes.html>`__
 # - `Backend Delegation Documentation <../compiler-delegate-and-partitioner.html>`__
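
The tutorial's pointer on how a backend can implement a ``Quantizer`` can be illustrated with a toy annotation pass. ``AddOnlyQuantizer`` below is hypothetical; the ``Quantizer`` base class and the ``QuantizationSpec``/``QuantizationAnnotation`` fields follow ``torch.ao.quantization.quantizer`` as of PyTorch 2.1 and may differ in other versions.

```python
# Toy backend Quantizer: annotate only aten.add.Tensor nodes for int8 quantization.
# AddOnlyQuantizer is hypothetical; real backends (e.g. XNNPACKQuantizer) annotate
# many operator patterns and expose richer configuration.
import torch
from torch.ao.quantization.observer import HistogramObserver
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation,
    QuantizationSpec,
    Quantizer,
)


class AddOnlyQuantizer(Quantizer):
    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        qspec = QuantizationSpec(
            dtype=torch.int8,
            quant_min=-128,
            quant_max=127,
            qscheme=torch.per_tensor_affine,
            observer_or_fake_quant_ctr=HistogramObserver,
        )
        for node in model.graph.nodes:
            if node.op == "call_function" and node.target is torch.ops.aten.add.Tensor:
                inputs = [a for a in node.args if isinstance(a, torch.fx.Node)]
                node.meta["quantization_annotation"] = QuantizationAnnotation(
                    input_qspec_map={n: qspec for n in inputs},
                    output_qspec=qspec,
                    _annotated=True,
                )
        return model

    def validate(self, model: torch.fx.GraphModule) -> None:
        pass  # optionally verify that the annotations are well-formed
```

``prepare_pt2e`` then consults ``node.meta["quantization_annotation"]`` to decide where to insert observers.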

docs/website/docs/tutorials/quantization_flow.md

Lines changed: 0 additions & 68 deletions
This file was deleted.
