
Commit f30c795

digantdesai authored and facebook-github-bot committed
Update readme to clarify quantization bits
Summary: Created from CodeHub with https://fburl.com/edit-in-codehub
Reviewed By: mcr229
Differential Revision: D50292593
fbshipit-source-id: 2426e28607d00e475042bc47cd4a2433407d8935
1 parent 6093168 commit f30c795


examples/xnnpack/README.md

Lines changed: 25 additions & 20 deletions
````diff
@@ -1,21 +1,22 @@
 # XNNPACK Backend
 
-[XNNPACK](https://github.com/google/XNNPACK) is a library of optimized of neural network inference operators for ARM and x86 platforms. Our delegate lowers models to run using these highly optimized CPU operators. You can try out lowering and running some example models in the demo. Please refer to the following docs for information on the XNNPACK Delegate
+[XNNPACK](https://github.com/google/XNNPACK) is a library of optimized neural network operators for ARM and x86 CPU platforms. Our delegate lowers models to run using these highly optimized CPU operators. You can try out lowering and running some example models in the demo. Please refer to the following docs for information on the XNNPACK Delegate
 - [XNNPACK Backend Delegate Overview](https://github.com/pytorch/executorch/blob/main/docs/website/docs/source/native-delegates-executorch-xnnpack-delegate.md)
 - [XNNPACK Delegate Export Tutorial](https://github.com/pytorch/executorch/blob/main/docs/website/docs/source/tutorial-xnnpack-delegate-lowering.md)
 
 
 ## Directory structure
+
 ```bash
 examples/xnnpack
-├── quantization # Scripts to illustrate PyTorch 2.0 quantization workflow with XNNPACK quantizer
+├── quantization # Scripts to illustrate PyTorch 2.0 quantization workflow with XNNPACKQuantizer
 │   └── example.py
-├── aot_compiler.py # The main script to illustrate the full AOT (export, quantization, delegation) workflow with XNNPACK
-├── xnn_executor_runner # ExecuTorch runtime with XNNPACK
+├── aot_compiler.py # The main script to illustrate the full AOT (export, quantization, delegation) workflow with XNNPACK delegate
+├── xnn_executor_runner # ExecuTorch runtime application for XNNPACK delegate examples
 └── README.md # This file
 ```
 
-## XNNPACK delegation-only
+## Delegating a Floating-point Model
 
 The following command will produce a floating-point XNNPACK delegated model `mv2_xnnpack_fp32.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing what parts of the models have been lowered to XNNPACK via `executorch_call_delegate`.
 
````
````diff
@@ -30,23 +31,12 @@ Once we have the model binary (pte) file, then let's run it with ExecuTorch runt
 buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ./mv2_xnnpack_fp32.pte
 ```
 
-## XNNPACK quantization + delegation
-
-The following command will produce a XNNPACK quantized and delegated model `mv2_xnnpack_q8.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing what parts of the models have been lowered to XNNPACK via `executorch_call_delegate`.
-
-```bash
-python3 -m examples.xnnpack.aot_compiler --model_name "mv2" --quantize --delegate
-```
-
-Once we have the model binary (pte) file, then let's run it with ExecuTorch runtime using the `xnn_executor_runner`.
+## Quantization
+First, learn more about the generic PyTorch 2.0 quantization workflow in the [Quantization Flow Docs](/docs/website/docs/tutorials/quantization_flow.md), if you are not familiar already.
 
-```bash
-buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ./mv2_xnnpack_q8.pte
-```
-
-## XNNPACK quantization
-Learn the generic PyTorch 2.0 quantization workflow in the [Quantization Flow Docs](/docs/website/docs/tutorials/quantization_flow.md).
+Here we will discuss quantizing a model suitable for XNNPACK delegation using XNNPACKQuantizer.
 
+Though it is typical to run this quantized mode via XNNPACK delegate, we want to highlight that this is just another quantization flavor, and we can run this quantized model without necessarily using XNNPACK delegate, but only using standard quantization operators.
 
 A shared library to register the out variants of the quantized operators (e.g., `quantized_decomposed::add.out`) into EXIR is required. To generate this library, run the following command if using `buck2`:
 ```bash
````
````diff
@@ -70,3 +60,18 @@ A quantized model can be run via `executor_runner`:
 buck2 run examples/portable/executor_runner:executor_runner -- --model_path ./mv2_quantized.pte
 ```
 Please note that running a quantized model will require the presence of various quantized/dequantize operators in the [quantized kernel lib](../../kernels/quantized).
+
+
+## Delegating a Quantized Model
+
+The following command will produce a XNNPACK quantized and delegated model `mv2_xnnpack_q8.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing what parts of the models have been lowered to XNNPACK via `executorch_call_delegate`.
+
+```bash
+python3 -m examples.xnnpack.aot_compiler --model_name "mv2" --quantize --delegate
+```
+
+Once we have the model binary (pte) file, then let's run it with ExecuTorch runtime using the `xnn_executor_runner`.
+
+```bash
+buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ./mv2_xnnpack_q8.pte
+```
````
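
For context on the new "Quantization" section, a minimal sketch of the XNNPACKQuantizer-based PT2E flow it points to is shown below. This sketch is not part of the commit: it assumes torchvision provides the `mv2` model and uses `capture_pre_autograd_graph`, the graph-capture entry point current around the time of this change (the exact capture API differs across PyTorch releases).

```python
import torch
import torchvision.models as models
from torch._export import capture_pre_autograd_graph  # capture API varies by PyTorch version
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = models.mobilenet_v2(weights="DEFAULT").eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# Capture an ATen-level graph and annotate it with the XNNPACKQuantizer.
captured = capture_pre_autograd_graph(model, example_inputs)
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

prepared = prepare_pt2e(captured, quantizer)  # insert observers
prepared(*example_inputs)                     # calibrate on representative data
quantized = convert_pt2e(prepared)            # fold observers into q/dq ops
```

The resulting quantized module can run on the standard quantized operators by itself, which is the point the new section makes; delegating it to XNNPACK is a separate, optional step.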
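
Similarly, the "Delegating a Floating-point Model" and "Delegating a Quantized Model" sections both drive `examples.xnnpack.aot_compiler`. The following is a rough sketch of the lowering that script performs; the module paths (`executorch.exir.to_edge`, `executorch.backends.xnnpack.partition.xnnpack_partitioner.XnnpackPartitioner`) are assumptions based on the ExecuTorch tree of this period, not taken from the diff.

```python
import torch
import torchvision.models as models
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# Floating-point flow shown here; a PT2E-quantized GraphModule can be
# exported the same way for the quantized flow.
model = models.mobilenet_v2(weights="DEFAULT").eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

exported = torch.export.export(model, example_inputs)

# Partition XNNPACK-supported subgraphs; they appear in the printed graph
# as executorch_call_delegate calls.
edge = to_edge(exported).to_backend(XnnpackPartitioner())

# Serialize to a .pte file that xnn_executor_runner can load.
with open("mv2_xnnpack_fp32.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```

The `buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ...` commands in the README then execute the resulting `.pte` file on the ExecuTorch runtime.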
