
Commit f30c795

digantdesai authored and facebook-github-bot committed
Update readme to clarify quantization bits
Summary: Created from CodeHub with https://fburl.com/edit-in-codehub
Reviewed By: mcr229
Differential Revision: D50292593
fbshipit-source-id: 2426e28607d00e475042bc47cd4a2433407d8935
1 parent 6093168 commit f30c795


examples/xnnpack/README.md

Lines changed: 25 additions & 20 deletions
````diff
@@ -1,21 +1,22 @@
 # XNNPACK Backend
 
-[XNNPACK](https://github.com/google/XNNPACK) is a library of optimized of neural network inference operators for ARM and x86 platforms. Our delegate lowers models to run using these highly optimized CPU operators. You can try out lowering and running some example models in the demo. Please refer to the following docs for information on the XNNPACK Delegate
+[XNNPACK](https://github.com/google/XNNPACK) is a library of optimized neural network operators for ARM and x86 CPU platforms. Our delegate lowers models to run using these highly optimized CPU operators. You can try out lowering and running some example models in the demo. Please refer to the following docs for information on the XNNPACK Delegate
 - [XNNPACK Backend Delegate Overview](https://github.com/pytorch/executorch/blob/main/docs/website/docs/source/native-delegates-executorch-xnnpack-delegate.md)
 - [XNNPACK Delegate Export Tutorial](https://github.com/pytorch/executorch/blob/main/docs/website/docs/source/tutorial-xnnpack-delegate-lowering.md)
 
 
 ## Directory structure
+
 ```bash
 examples/xnnpack
-├── quantization # Scripts to illustrate PyTorch 2.0 quantization workflow with XNNPACK quantizer
+├── quantization # Scripts to illustrate PyTorch 2.0 quantization workflow with XNNPACKQuantizer
 │   └── example.py
-├── aot_compiler.py # The main script to illustrate the full AOT (export, quantization, delegation) workflow with XNNPACK
-├── xnn_executor_runner # ExecuTorch runtime with XNNPACK
+├── aot_compiler.py # The main script to illustrate the full AOT (export, quantization, delegation) workflow with XNNPACK delegate
+├── xnn_executor_runner # ExecuTorch runtime application for XNNPACK delegate examples
 └── README.md # This file
 ```
 
-## XNNPACK delegation-only
+## Delegating a Floating-point Model
 
 The following command will produce a floating-point XNNPACK delegated model `mv2_xnnpack_fp32.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing what parts of the models have been lowered to XNNPACK via `executorch_call_delegate`.
 
````
````diff
@@ -30,23 +31,12 @@ Once we have the model binary (pte) file, then let's run it with ExecuTorch runt
 buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ./mv2_xnnpack_fp32.pte
 ```
 
-## XNNPACK quantization + delegation
-
-The following command will produce a XNNPACK quantized and delegated model `mv2_xnnpack_q8.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing what parts of the models have been lowered to XNNPACK via `executorch_call_delegate`.
-
-```bash
-python3 -m examples.xnnpack.aot_compiler --model_name "mv2" --quantize --delegate
-```
-
-Once we have the model binary (pte) file, then let's run it with ExecuTorch runtime using the `xnn_executor_runner`.
+## Quantization
+First, learn more about the generic PyTorch 2.0 quantization workflow in the [Quantization Flow Docs](/docs/website/docs/tutorials/quantization_flow.md), if you are not familiar already.
 
-```bash
-buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ./mv2_xnnpack_q8.pte
-```
-
-## XNNPACK quantization
-Learn the generic PyTorch 2.0 quantization workflow in the [Quantization Flow Docs](/docs/website/docs/tutorials/quantization_flow.md).
+Here we will discuss quantizing a model suitable for XNNPACK delegation using XNNPACKQuantizer.
 
+Though it is typical to run this quantized mode via XNNPACK delegate, we want to highlight that this is just another quantization flavor, and we can run this quantized model without necessarily using XNNPACK delegate, but only using standard quantization operators.
 
 A shared library to register the out variants of the quantized operators (e.g., `quantized_decomposed::add.out`) into EXIR is required. To generate this library, run the following command if using `buck2`:
 ```bash
````
````diff
@@ -70,3 +60,18 @@ A quantized model can be run via `executor_runner`:
 buck2 run examples/portable/executor_runner:executor_runner -- --model_path ./mv2_quantized.pte
 ```
 Please note that running a quantized model will require the presence of various quantized/dequantize operators in the [quantized kernel lib](../../kernels/quantized).
+
+
+## Delegating a Quantized Model
+
+The following command will produce a XNNPACK quantized and delegated model `mv2_xnnpack_q8.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing what parts of the models have been lowered to XNNPACK via `executorch_call_delegate`.
+
+```bash
+python3 -m examples.xnnpack.aot_compiler --model_name "mv2" --quantize --delegate
+```
+
+Once we have the model binary (pte) file, then let's run it with ExecuTorch runtime using the `xnn_executor_runner`.
+
+```bash
+buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ./mv2_xnnpack_q8.pte
+```
````
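
For context on the new "Quantization" section, a minimal sketch of the XNNPACKQuantizer-based PT2E flow it points to is shown below. This sketch is not part of the commit: it assumes torchvision provides the `mv2` model and uses `capture_pre_autograd_graph`, the graph-capture entry point current around the time of this change (the exact capture API differs across PyTorch releases).

```python
import torch
import torchvision.models as models
from torch._export import capture_pre_autograd_graph  # capture API varies by PyTorch version
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = models.mobilenet_v2(weights="DEFAULT").eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# Capture an ATen-level graph and annotate it with the XNNPACKQuantizer.
captured = capture_pre_autograd_graph(model, example_inputs)
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

prepared = prepare_pt2e(captured, quantizer)  # insert observers
prepared(*example_inputs)                     # calibrate on representative data
quantized = convert_pt2e(prepared)            # fold observers into q/dq ops
```

The resulting quantized module can run on the standard quantized operators by itself, which is the point the new section makes; delegating it to XNNPACK is a separate, optional step.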
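
Similarly, the "Delegating a Floating-point Model" and "Delegating a Quantized Model" sections both drive `examples.xnnpack.aot_compiler`. The following is a rough sketch of the lowering that script performs; the module paths (`executorch.exir.to_edge`, `executorch.backends.xnnpack.partition.xnnpack_partitioner.XnnpackPartitioner`) are assumptions based on the ExecuTorch tree of this period, not taken from the diff.

```python
import torch
import torchvision.models as models
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# Floating-point flow shown here; a PT2E-quantized GraphModule can be
# exported the same way for the quantized flow.
model = models.mobilenet_v2(weights="DEFAULT").eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

exported = torch.export.export(model, example_inputs)

# Partition XNNPACK-supported subgraphs; they appear in the printed graph
# as executorch_call_delegate calls.
edge = to_edge(exported).to_backend(XnnpackPartitioner())

# Serialize to a .pte file that xnn_executor_runner can load.
with open("mv2_xnnpack_fp32.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```

The `buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ...` commands in the README then execute the resulting `.pte` file on the ExecuTorch runtime.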
