# XNNPACK Backend
[XNNPACK](https://github.com/google/XNNPACK) is a library of optimized neural network operators for ARM and x86 CPU platforms. Our delegate lowers models to run using these highly optimized CPU operators. You can try out lowering and running some example models in this demo. Please refer to the ExecuTorch docs for more information on the XNNPACK delegate.

The directory layout of this example is as follows:

```
examples/xnnpack
├── quantization              # Scripts to illustrate PyTorch 2.0 quantization workflow with XNNPACKQuantizer
│   └── example.py
├── aot_compiler.py           # The main script to illustrate the full AOT (export, quantization, delegation) workflow with XNNPACK delegate
├── xnn_executor_runner       # ExecuTorch runtime application for XNNPACK delegate examples
└── README.md                 # This file
```
## Delegating a Floating-point Model
The following command will produce a floating-point XNNPACK delegated model `mv2_xnnpack_fp32.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing which parts of the model have been lowered to XNNPACK via `executorch_call_delegate`.
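
A sketch of that command is below; the module path and flags are assumptions based on the role of `aot_compiler.py` in the directory tree above, so check the script itself for the authoritative options:

```bash
# Export MobileNet V2 and lower it to the XNNPACK delegate (flags assumed)
python3 -m examples.xnnpack.aot_compiler --model_name="mv2" --delegate
```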

Once we have the model binary (`.pte`) file, let's run it with the ExecuTorch runtime using the `xnn_executor_runner`:

```bash
buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ./mv2_xnnpack_fp32.pte
```
## Quantization
First, learn more about the generic PyTorch 2.0 quantization workflow in the [Quantization Flow Docs](/docs/website/docs/tutorials/quantization_flow.md) if you are not already familiar with it.
Here we will discuss quantizing a model suitable for XNNPACK delegation using `XNNPACKQuantizer`.
Though it is typical to run this quantized model via the XNNPACK delegate, we want to highlight that this is just another quantization flavor: the quantized model can also be run without the XNNPACK delegate, using only the standard quantized operators.
A shared library to register the out variants of the quantized operators (e.g., `quantized_decomposed::add.out`) into EXIR is required. To generate this library, run the following command if using `buck2`:
```bash
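# NOTE: this build target is an assumption; check the repository for the
# exact rule that produces the quantized-ops AOT library.
buck2 build //kernels/quantized:aot_lib --show-output
```

With the shared library available, the model can be quantized. The command below is a sketch; the module path and flags are assumptions based on the `quantization/example.py` script listed in the directory tree above:

```bash
# Quantize MobileNet V2 with XNNPACKQuantizer (module path and flags assumed)
python3 -m examples.xnnpack.quantization.example --model_name="mv2"
```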

A quantized model can be run via `executor_runner`:

```bash
buck2 run examples/portable/executor_runner:executor_runner -- --model_path ./mv2_quantized.pte
```
Please note that running a quantized model requires the presence of the various quantize/dequantize operators in the [quantized kernel lib](../../kernels/quantized).
## Delegating a Quantized Model
The following command will produce an XNNPACK quantized and delegated model `mv2_xnnpack_q8.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing which parts of the model have been lowered to XNNPACK via `executorch_call_delegate`.
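
As a sketch, this combines the quantization and delegation options of `aot_compiler.py`; both flags are assumptions, so see the script for the authoritative options:

```bash
# Quantize and lower MobileNet V2 to the XNNPACK delegate (flags assumed)
python3 -m examples.xnnpack.aot_compiler --model_name="mv2" --quantize --delegate
```

As before, the resulting model can be run with the `xnn_executor_runner`:

```bash
buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ./mv2_xnnpack_q8.pte
```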