Quantization doc cleanups #767

Closed
2 changes: 1 addition & 1 deletion docs/source/compiler-backend-dialect.md
@@ -42,7 +42,7 @@ To lower edge ops to backend ops, a pass will perform pattern matching to identify
* `transform()`. An API on ExportProgram that allows users to provide custom passes. Note that this is not guarded by any validator, so the soundness of the program is not guaranteed.
* [ExecutorchBackendConfig.passes](https://github.com/pytorch/executorch/blob/main/exir/capture/_config.py#L40). If added here, the pass will be part of the lowering process from backend dialect to ExecutorchProgram. A sketch of both entry points follows this list.
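
For illustration only, hooking a pass in through each entry point might look roughly like the sketch below. `MyPass` and `edge_program` are hypothetical stand-ins, and exact signatures can vary across ExecuTorch releases.

```python
from executorch.exir import ExecutorchBackendConfig
from executorch.exir.pass_base import ExportPass

class MyPass(ExportPass):
    """A do-nothing pass; real passes override hooks such as call_operator()."""

# Entry point 1: apply the pass directly with transform(); this is not
# validated, so soundness is on the pass author.
edge_program = edge_program.transform(MyPass())

# Entry point 2: register the pass so it runs while lowering from
# backend dialect to an ExecuTorch program.
executorch_program = edge_program.to_executorch(
    ExecutorchBackendConfig(passes=[MyPass()])
)
```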

Example: one such pass is QuantFusion. This pass takes a "canonical quantization pattern", i.e. "dequant - some_op - quant", and fuses it into a single backend-specific operator, e.g. `quantized_decomposed::some_op`. You can find more details [here](./quantization-custom-quantization.md). Another, simpler example is [here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20), where we replace `sym_size` operators with ones that ExecuTorch understands.
Example: one such pass is QuantFusion. This pass takes a "canonical quantization pattern", i.e. "dequant - some_op - quant", and fuses it into a single backend-specific operator, e.g. `quantized_decomposed::some_op`. Another, simpler example is [here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20), where we replace `sym_size` operators with ones that ExecuTorch understands.
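
As a rough sketch of what such a replacement pass can look like (loosely modeled on the linked file; the class and mapping below are illustrative, not the actual implementation):

```python
from executorch.exir.pass_base import ExportPass

class ReplaceOpsPass(ExportPass):
    """Illustrative pass that swaps selected operators for backend-specific
    equivalents and leaves everything else untouched."""

    # Hypothetical table; a real pass maps concrete edge ops to the
    # backend ops that implement them.
    OP_REPLACEMENTS = {}

    def call_operator(self, op, args, kwargs, meta):
        # If we know a backend equivalent for this op, emit that instead.
        op = self.OP_REPLACEMENTS.get(op, op)
        return super().call_operator(op, args, kwargs, meta)
```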


### Pattern Binding Decorator
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -153,7 +153,7 @@ Topics in this section will help you get started with ExecuTorch.
:caption: Quantization
:hidden:

quantization-custom-quantization
quantization-overview

.. toctree::
:glob:
8 changes: 4 additions & 4 deletions docs/source/native-delegates-executorch-xnnpack-delegate.md
@@ -109,10 +109,10 @@ Here we initialize the `XNNPACKQuantizer` and set the quantization config to be

We can then configure the `XNNPACKQuantizer` as we wish. We set the following configs below as an example:
```python
quantizer.set_global(qconfig_opt) # qconfig_opt is an optional quantization config
.set_object_type(torch.nn.Conv2d, qconfig_opt) # can be a module type
.set_object_type(torch.nn.functional.linear, qconfig_opt) # or torch functional op
.set_module_name("foo.bar", qconfig_opt)
quantizer.set_global(quantization_config)
.set_object_type(torch.nn.Conv2d, quantization_config) # can configure by module type
.set_object_type(torch.nn.functional.linear, quantization_config) # or torch functional op type
.set_module_name("foo.bar", quantization_config) # or by module fully qualified name
```
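
For reference, a `quantization_config` like the one used above can be built with the symmetric-config helper from the PT2E workflow. A minimal sketch, assuming the import paths from the PT2E tutorial:

```python
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

# Symmetric quantization with per-channel weights is a common starting
# point for the XNNPACK backend.
quantization_config = get_symmetric_quantization_config(is_per_channel=True)
quantizer = XNNPACKQuantizer()
quantizer.set_global(quantization_config)
```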

### Quantizing your model with the XNNPACKQuantizer
3 changes: 0 additions & 3 deletions docs/source/quantization-custom-quantization.md

This file was deleted.

16 changes: 16 additions & 0 deletions docs/source/quantization-overview.md
@@ -0,0 +1,16 @@
# Quantization Overview
Quantization is a process that reduces the precision of computations and lowers a model's memory footprint. To learn more, please visit the [ExecuTorch concepts page](./concepts.md#quantization). It is particularly useful for edge devices, which typically have limited resources such as processing power, memory, and battery life. By quantizing, we can make our models more efficient and enable them to run effectively on these devices.

In terms of flow, quantization happens early in the ExecuTorch stack:

![ExecuTorch Entry Points](/_static/img/executorch-entry-points.png)

A more detailed workflow can be found in the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).

Quantization is usually tied to execution backends that have quantized operators implemented, so each backend is opinionated about how the model should be quantized. That opinion is expressed in a backend-specific ``Quantizer`` class. A ``Quantizer`` provides an API for modeling users to specify how they want their model to be quantized, and it passes that intention on to the quantization workflow.

Backend developers will need to implement their own ``Quantizer`` to express how different operators or operator patterns are quantized in their backend. This is accomplished via the [Annotation API](https://pytorch.org/tutorials/prototype/pt2e_quantizer.html) provided by the quantization workflow. Since a ``Quantizer`` is also user facing, it exposes specific APIs for modeling users to configure how they want the model to be quantized. Each backend should provide its own API documentation for its ``Quantizer``.
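
To make this concrete, a backend ``Quantizer`` skeleton might look like the sketch below. The method names come from the PT2E ``Quantizer`` base class; the class itself and its annotation logic are hypothetical.

```python
import torch
from torch.ao.quantization.quantizer import Quantizer

class MyBackendQuantizer(Quantizer):
    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        # Walk the graph and attach quantization annotations to the nodes
        # (or operator patterns) this backend can run quantized.
        return model

    def validate(self, model: torch.fx.GraphModule) -> None:
        # Optionally verify that the annotated model is something the
        # backend can actually lower.
        pass
```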

Modeling users quantize their model with the ``Quantizer`` specific to their target backend, e.g. ``XNNPACKQuantizer``.

For an example quantization flow with ``XNNPACKQuantizer``, along with more documentation and tutorials, please see the ``Performing Quantization`` section in the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
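
For orientation, the end-to-end PT2E flow described above looks roughly like the sketch below, which assumes the capture and quantization APIs from the PT2E tutorial; exact entry points may differ across releases.

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()
example_args = (torch.randn(1, 8),)

# 1. Capture the model ahead of quantization.
captured = capture_pre_autograd_graph(model, example_args)

# 2. Annotate with the backend's Quantizer, calibrate, then convert.
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(captured, quantizer)
prepared(*example_args)  # calibrate with representative inputs
quantized = convert_pt2e(prepared)
```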
@@ -215,7 +215,7 @@ def f(x, y):
#
# Compared to
# `FX Graph Mode Quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html>`__,
# we will need to call two new APIs: ``prepare_pt2e`` and ``compare_pt2e``
# we will need to call two new APIs: ``prepare_pt2e`` and ``convert_pt2e``
# instead of ``prepare_fx`` and ``convert_fx``. It differs in that
# ``prepare_pt2e`` takes a backend-specific ``Quantizer`` as an argument, which
# will annotate the nodes in the graph with information needed to quantize the
@@ -234,8 +234,10 @@ def f(x, y):

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared_graph = prepare_pt2e(pre_autograd_aten_dialect, quantizer)
# calibrate with a sample dataset
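# For example (hypothetical dataset; any representative inputs work):
#   for sample_args in calibration_dataset:
#       prepared_graph(*sample_args)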
converted_graph = convert_pt2e(prepared_graph)
print("Quantized Graph")
print(converted_graph)

aten_dialect: ExportedProgram = export(converted_graph, example_args)
print("ATen Dialect Graph")
@@ -244,7 +246,7 @@ def f(x, y):
######################################################################
# More information on how to quantize a model, and how a backend can implement a
# ``Quantizer`` can be found
# `here <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_static.html>`__ .
# `here <https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization>`__.

######################################################################
# Lowering to Edge Dialect
@@ -634,7 +636,7 @@ def forward(self, x):
# ^^^^^^^^^^^^^^^
#
# - `torch.export Documentation <https://pytorch.org/docs/2.1/export.html>`__
# - `Quantization Tutorial <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_static.html>`__
# - `Quantization Documentation <https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization>`__
# - `IR Spec <../ir-exir.html>`__
# - `Writing Compiler Passes + Partitioner Documentation <../compiler-custom-compiler-passes.html>`__
# - `Backend Delegation Documentation <../compiler-delegate-and-partitioner.html>`__