
Commit c5327f0

jerryzh168 authored and facebook-github-bot committed
Quantization doc cleanups (#767)
Reviewed By: kimishpatel

Differential Revision: D50107717
1 parent 61a7967 commit c5327f0


6 files changed (+18, -76 lines)


docs/source/compiler-backend-dialect.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ To lower edge ops to backend ops, a pass will perform pattern matching to identi
 * `transform()`. An API on ExportProgram that allows users to provide custom passes. Note that this is not guarded by any validator so the soundness of the program is not guaranteed.
 * [ExecutorchBackendConfig.passes](https://github.com/pytorch/executorch/blob/main/exir/capture/_config.py#L40). If added here, the pass will be part of the lowering process from backend dialect to ExecutorchProgram.

-Example: one such pass is QuantFusion. This pass takes a "canonical quantization pattern", ie. "dequant - some_op - quant" and fuses this pattern into a single operator that is backend specific, i.e. `quantized_decomposed::some_op`. You can find more details [here](./quantization-custom-quantization.md). Another simpler example is [here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20) where we replace `sym_size` operators to the ones that are understood by ExecuTorch
+Example: one such pass is QuantFusion. This pass takes a "canonical quantization pattern", i.e. "dequant - some_op - quant", and fuses it into a single backend-specific operator, i.e. `quantized_decomposed::some_op`. Another, simpler example is [here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20), where we replace `sym_size` operators with ones that are understood by ExecuTorch.


### Pattern Binding Decorator
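
To make the QuantFusion idea above concrete, here is a minimal, hypothetical sketch of such a fusion pass. It is not ExecuTorch's actual implementation: the pattern targets and the fused target are passed in as arguments, and the fused operator stands in for whatever op a backend registers.

```python
# Illustrative sketch only: match a "dequant -> some_op -> quant" chain in an FX graph
# and rewrite it into a single fused call. Targets are parameters, so nothing here is
# tied to a particular backend or to ExecuTorch's real QuantFusion pass.
from torch.fx import GraphModule, Node


def fuse_dq_op_q(
    gm: GraphModule, op_target, dq_target, q_target, fused_target
) -> GraphModule:
    """Replace every dq_target -> op_target -> q_target chain with one fused_target call."""
    for quant_node in list(gm.graph.nodes):
        # Anchor the match on the final quantize node of the pattern.
        if quant_node.op != "call_function" or quant_node.target is not q_target:
            continue
        op_node = quant_node.args[0]
        if not isinstance(op_node, Node) or op_node.target is not op_target:
            continue
        dq_node = op_node.args[0]
        if not isinstance(dq_node, Node) or dq_node.target is not dq_target:
            continue
        # Insert the fused op and reroute users of the quantize node to it. A real pass
        # would also forward op_node's extra inputs and the output quant params; this
        # sketch only forwards the dequantize node's arguments.
        with gm.graph.inserting_after(quant_node):
            fused = gm.graph.call_function(fused_target, dq_node.args, dq_node.kwargs)
        quant_node.replace_all_uses_with(fused)
        gm.graph.erase_node(quant_node)
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```

Anchoring the match on the final quantize node keeps this a single pass over the graph; dead-code elimination then removes the now-unused dequantize and op nodes.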

docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -135,7 +135,7 @@ Topics in this section will help you get started with ExecuTorch.
    :caption: Quantization
    :hidden:

-   quantization-custom-quantization
+   quantization

 .. toctree::
    :glob:

docs/source/quantization-custom-quantization.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/source/quantization.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# Quantization
+Quantization is a process that reduces the precision of computations and lowers the memory footprint of a model; more details can be found on the [concepts page](./concepts). It is typically required for models running on edge devices, which have limited resources.
+
+In terms of flow, quantization happens early in the ExecuTorch stack; see the workflow graph in the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
+
+Quantization is backend specific, and each backend implements its own ``Quantizer``, built with the PyTorch 2 Export Quantization API.
+
+Modeling users use the ``Quantizer`` for their target backend, e.g. ``XNNPACKQuantizer``, to quantize their model.
+
+Backend developers need to implement their own ``Quantizer`` to express how different operators or operator patterns are quantized in their backend, and to provide APIs for modeling users to configure how they want the model to be quantized. Each backend should provide its own API documentation for its ``Quantizer``.
+
+For an example quantization flow with ``XNNPACKQuantizer`` and tutorials on how backend developers can implement their own ``Quantizer``, please take a look at the ``Performing Quantization`` section in [the ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
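
As a rough sketch of the modeling-user flow this new page describes, targeting XNNPACK (the capture API and module paths below reflect the PyTorch/ExecuTorch versions around this change and may differ in later releases):

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)


class SimpleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.relu(self.linear(x))


example_args = (torch.randn(1, 8),)
model = SimpleModel().eval()

# 1. Capture the model into the pre-autograd ATen dialect.
pre_autograd_aten_dialect = capture_pre_autograd_graph(model, example_args)

# 2. Configure the backend-specific quantizer and insert observers.
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared_graph = prepare_pt2e(pre_autograd_aten_dialect, quantizer)

# 3. Calibrate with representative inputs, then convert to a quantized graph.
prepared_graph(*example_args)
converted_graph = convert_pt2e(prepared_graph)
print(converted_graph)
```

The converted graph can then be exported and lowered to the Edge dialect exactly as in the tutorial.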

docs/source/tutorials_source/export-to-executorch-tutorial.py

Lines changed: 4 additions & 3 deletions
@@ -229,7 +229,7 @@ def f(x, y):
 #
 # Compared to
 # `FX Graph Mode Quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html>`__,
-# we will need to call two new APIs: ``prepare_pt2e`` and ``compare_pt2e``
+# we will need to call two new APIs: ``prepare_pt2e`` and ``convert_pt2e``
 # instead of ``prepare_fx`` and ``convert_fx``. It differs in that
 # ``prepare_pt2e`` takes a backend-specific ``Quantizer`` as an argument, which
 # will annotate the nodes in the graph with information needed to quantize the
@@ -250,6 +250,7 @@ def f(x, y):
 prepared_graph = prepare_pt2e(pre_autograd_aten_dialect, quantizer)
 converted_graph = convert_pt2e(prepared_graph)
 print("Quantized Graph")
+print(converted_graph)

 aten_dialect: ExportedProgram = export(converted_graph, example_args)
 print("ATen Dialect Graph")
@@ -258,7 +259,7 @@ def f(x, y):
 ######################################################################
 # More information on how to quantize a model, and how a backend can implement a
 # ``Quantizer`` can be found
-# `here <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_static.html>`__ .
+# `here <https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization>`__.

 ######################################################################
 # Lowering to Edge Dialect
@@ -648,7 +649,7 @@ def forward(self, x):
 # ^^^^^^^^^^^^^^^
 #
 # - `torch.export Documentation <https://pytorch.org/docs/2.1/export.html>`__
-# - `Quantization Tutorial <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_static.html>`__
+# - `Quantization Documentation <https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization>`__
 # - `IR Spec <../ir-exir.html>`__
 # - `Writing Compiler Passes + Partitioner Documentation <../compiler-custom-compiler-passes.html>`__
 # - `Backend Delegation Documentation <../compiler-delegate-and-partitioner.html>`__
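
On the backend-developer side that the updated links point to, a ``Quantizer`` boils down to annotating graph nodes with quantization specs before ``prepare_pt2e`` runs. Below is a heavily simplified, hypothetical sketch; the observer choice, spec values, and node matching are illustrative assumptions, not how any real backend (e.g. ``XNNPACKQuantizer``) is implemented.

```python
# Toy example only: annotate every aten.linear node for 8-bit per-tensor quantization.
import torch
from torch.ao.quantization.observer import MinMaxObserver
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation,
    QuantizationSpec,
    Quantizer,
)


class ToyLinearQuantizer(Quantizer):
    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        act_qspec = QuantizationSpec(
            dtype=torch.int8,
            quant_min=-128,
            quant_max=127,
            qscheme=torch.per_tensor_affine,
            observer_or_fake_quant_ctr=MinMaxObserver,
        )
        for node in model.graph.nodes:
            if node.op == "call_function" and node.target == torch.ops.aten.linear.default:
                # Tell prepare_pt2e how to observe this node's activation input and output.
                node.meta["quantization_annotation"] = QuantizationAnnotation(
                    input_qspec_map={node.args[0]: act_qspec},
                    output_qspec=act_qspec,
                    _annotated=True,
                )
        return model

    def validate(self, model: torch.fx.GraphModule) -> None:
        # A real quantizer would check that the annotations are self-consistent here.
        pass
```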

docs/website/docs/tutorials/quantization_flow.md

Lines changed: 0 additions & 68 deletions
This file was deleted.
