
Commit 8cfaa54

mcr229 authored and dbort committed
update xnnpack static docs for alpha (pytorch#2755)
Summary: Updating static docs for xnnpack

Pull Request resolved: pytorch#2755
Reviewed By: kirklandsign
Differential Revision: D55507499
Pulled By: mcr229
fbshipit-source-id: aaada214b113f224da4ef028f3121585de0c17bc
1 parent 798ea54 commit 8cfaa54

File tree: 2 files changed, +51 -15 lines


docs/source/native-delegates-executorch-xnnpack-delegate.md

Lines changed: 5 additions & 5 deletions
@@ -61,19 +61,19 @@ The XNNPACK delegate uses flatbuffer for serialization. In order to improve runt
 The XNNPACK backend’s runtime interfaces with the ExecuTorch runtime through the custom `init` and `execute` functions. Each delegated subgraph is contained in an individually serialized XNNPACK blob. When the model is initialized, ExecuTorch calls `init` on all XNNPACK Blobs to load the subgraph from the serialized flatbuffer. Afterwards, when the model is executed, each subgraph is executed via the backend through the custom `execute` function. To read more about how delegate runtimes interface with ExecuTorch, refer to this [resource](compiler-delegate-and-partitioner.md).
 
 
-#### XNNPACK Library
-The XNNPACK Library currently used by the delegate is on the following [version](https://github.com/google/XNNPACK/tree/51a987591a6fc9f0fc0707077f53d763ac132cbf). XNNPACK delegate supports CPUs on multiple platforms; more information on the supported hardware architectures can be found on the XNNPACK Library’s [README](https://github.com/google/XNNPACK).
+#### **XNNPACK Library**
+XNNPACK delegate supports CPUs on multiple platforms; more information on the supported hardware architectures can be found on the XNNPACK Library’s [README](https://github.com/google/XNNPACK).
 
-#### Init
+#### **Init**
 When calling XNNPACK delegate’s `init`, we deserialize the preprocessed blobs via flatbuffer. We define the nodes (operators) and edges (intermediate tensors) to build the XNNPACK execution graph using the information we serialized ahead-of-time. As we mentioned earlier, the majority of processing has been done ahead-of-time, so that at runtime we can just call the XNNPACK APIs with the serialized arguments in succession. As we define static data into the execution graph, XNNPACK performs weight packing at runtime to prepare static data like weights and biases for efficient execution. After creating the execution graph, we create the runtime object and pass it on to `execute`.
 
 Since weight packing creates an extra copy of the weights inside XNNPACK, we free the original copy of the weights inside the preprocessed XNNPACK Blob; this allows us to remove some of the memory overhead.
 
 
-#### Execute
+#### **Execute**
 When executing the XNNPACK subgraphs, we prepare the tensor inputs and outputs and feed them to the XNNPACK runtime graph. After executing the runtime graph, the output pointers are filled with the computed tensors.
 
-#### Profiling
+#### **Profiling**
 We have enabled basic profiling for the XNNPACK delegate; it can be turned on with the compiler flag `-DENABLE_XNNPACK_PROFILING`. After running the model, it will produce basic per-op and total timings. We provide an example of the profiling below. The timings listed are the average across runs, and the units are in microseconds.
 
 ```
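
The `init`/`execute` contract described in the hunk above can be pictured with a bare-bones sketch. The types and names below are illustrative placeholders, not ExecuTorch's actual backend interface; they only mirror the lifecycle the docs describe: `init` turns one preprocessed blob into a reusable handle at model load time, and `execute` runs that handle on each inference.

```cpp
// Illustrative placeholders only; NOT ExecuTorch's real backend interface.
// The point is the lifecycle: init() is called once per delegated blob when the
// model loads, execute() is called for every inference with the handle init() returned.
#include <cstddef>
#include <vector>

struct PreprocessedBlob {      // stands in for one serialized XNNPACK flatbuffer payload
  const void* data;
  std::size_t size;
};

struct TensorRef {             // stands in for a runtime input/output tensor
  float* data;
  std::vector<std::size_t> sizes;
};

class DelegateBackendSketch {
 public:
  virtual ~DelegateBackendSketch() = default;

  // Deserialize the blob, build the backend's execution graph (weight packing
  // happens here), and hand back an opaque handle for later calls.
  virtual void* init(const PreprocessedBlob& blob) = 0;

  // Bind the caller's inputs/outputs to the prepared graph and run it.
  virtual void execute(void* handle, std::vector<TensorRef>& args) = 0;
};
```

In the real delegate, the handle produced at `init` time is the XNNPACK runtime object built from the deserialized flatbuffer; the next sketch expands on that step.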

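For the Init and Execute sections above, the sequence of XNNPACK Subgraph API calls looks roughly like the following standalone sketch. It builds a single made-up fully-connected layer directly in C++ rather than from a serialized blob, so the shapes, values, and choice of operator are assumptions for illustration; the delegate issues the analogous `xnn_define_*`, `xnn_create_runtime*`, `xnn_setup_runtime`, and `xnn_invoke_runtime` calls using the arguments it deserialized ahead-of-time. Consult the XNNPACK headers for the authoritative signatures.

```cpp
#include <xnnpack.h>

#include <array>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Minimal status check so the sketch stays short; real code would report errors properly.
#define XNN_CHECK(expr)                                \
  do {                                                 \
    if ((expr) != xnn_status_success) {                \
      std::fprintf(stderr, "XNNPACK call failed\n");   \
      return 1;                                        \
    }                                                  \
  } while (0)

int main() {
  XNN_CHECK(xnn_initialize(/*allocator=*/nullptr));

  // "Init": define the edges (tensors) and nodes (operators) of the graph.
  xnn_subgraph_t subgraph = nullptr;
  XNN_CHECK(xnn_create_subgraph(/*external_value_ids=*/2, /*flags=*/0, &subgraph));

  const std::array<size_t, 2> input_dims = {1, 4};   // [batch, in_channels]
  const std::array<size_t, 2> weight_dims = {3, 4};  // [out_channels, in_channels]
  const std::array<size_t, 2> output_dims = {1, 3};  // [batch, out_channels]
  std::vector<float> weights(3 * 4, 0.5f);           // static data, packed later by XNNPACK

  uint32_t input_id = XNN_INVALID_VALUE_ID;
  uint32_t weight_id = XNN_INVALID_VALUE_ID;
  uint32_t output_id = XNN_INVALID_VALUE_ID;
  XNN_CHECK(xnn_define_tensor_value(subgraph, xnn_datatype_fp32, input_dims.size(),
                                    input_dims.data(), /*data=*/nullptr,
                                    /*external_id=*/0, XNN_VALUE_FLAG_EXTERNAL_INPUT,
                                    &input_id));
  XNN_CHECK(xnn_define_tensor_value(subgraph, xnn_datatype_fp32, weight_dims.size(),
                                    weight_dims.data(), weights.data(),
                                    XNN_INVALID_VALUE_ID, /*flags=*/0, &weight_id));
  XNN_CHECK(xnn_define_tensor_value(subgraph, xnn_datatype_fp32, output_dims.size(),
                                    output_dims.data(), /*data=*/nullptr,
                                    /*external_id=*/1, XNN_VALUE_FLAG_EXTERNAL_OUTPUT,
                                    &output_id));
  XNN_CHECK(xnn_define_fully_connected(subgraph, /*output_min=*/-INFINITY,
                                       /*output_max=*/INFINITY, input_id, weight_id,
                                       /*bias_id=*/XNN_INVALID_VALUE_ID, output_id,
                                       /*flags=*/0));

  // Creating the runtime is where XNNPACK packs the static weights; after this
  // point the original copy of the weights is no longer needed.
  xnn_runtime_t runtime = nullptr;
  XNN_CHECK(xnn_create_runtime_v2(subgraph, /*threadpool=*/nullptr, /*flags=*/0, &runtime));

  // "Execute": bind the external input/output pointers, then invoke the graph.
  std::vector<float> input(1 * 4, 1.0f);
  std::vector<float> output(1 * 3, 0.0f);
  const std::array<xnn_external_value, 2> io = {{{0, input.data()}, {1, output.data()}}};
  XNN_CHECK(xnn_setup_runtime(runtime, io.size(), io.data()));
  XNN_CHECK(xnn_invoke_runtime(runtime));  // output now holds the computed tensor

  xnn_delete_runtime(runtime);
  xnn_delete_subgraph(subgraph);
  xnn_deinitialize();
  return 0;
}
```

Note that weight packing happens when the runtime is created, which is why the delegate can free its original copy of the weights afterwards, and that the setup/invoke pair can be repeated for each inference while the packed weights are reused.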
docs/source/tutorial-xnnpack-delegate-lowering.md

Lines changed: 46 additions & 10 deletions
@@ -12,7 +12,6 @@ In this tutorial, you will learn how to export an XNNPACK lowered Model and run
 :class-card: card-prerequisites
 * [Setting up ExecuTorch](./getting-started-setup.md)
 * [Model Lowering Tutorial](./tutorials/export-to-executorch-tutorial)
-* [Custom Quantization](./quantization-custom-quantization.md)
 * [ExecuTorch XNNPACK Delegate](./native-delegates-executorch-xnnpack-delegate.md)
 :::
 ::::
@@ -23,16 +22,18 @@ In this tutorial, you will learn how to export an XNNPACK lowered Model and run
 import torch
 import torchvision.models as models
 
-from torch.export import export
+from torch.export import export, ExportedProgram
 from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
 from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
-from executorch.exir import to_edge
+from executorch.exir import EdgeProgramManager, ExecutorchProgramManager, to_edge
+from executorch.exir.backend.backend_api import to_backend
 
 
 mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
 sample_inputs = (torch.randn(1, 3, 224, 224), )
 
-edge = to_edge(export(mobilenet_v2, sample_inputs))
+exported_program: ExportedProgram = export(mobilenet_v2, sample_inputs)
+edge: EdgeProgramManager = to_edge(exported_program)
 
 edge = edge.to_backend(XnnpackPartitioner())
 ```
@@ -64,7 +65,7 @@ We print the graph after lowering above to show the new nodes that were inserted
 exec_prog = edge.to_executorch()
 
 with open("xnnpack_mobilenetv2.pte", "wb") as file:
-    file.write(exec_prog.buffer)
+    exec_prog.write_to_file(file)
 ```
 After lowering to the XNNPACK Program, we can then prepare it for executorch and save the model as a `.pte` file. `.pte` is a binary format that stores the serialized ExecuTorch graph.
 
@@ -117,14 +118,14 @@ edge = edge.to_backend(XnnpackPartitioner())
 exec_prog = edge.to_executorch()
 
 with open("qs8_xnnpack_mobilenetv2.pte", "wb") as file:
-    file.write(exec_prog.buffer)
+    exec_prog.write_to_file(file)
 ```
 
 ## Lowering with `aot_compiler.py` script
 We have also provided a script to quickly lower and export a few example models. You can run the script to generate lowered fp32 and quantized models. This script is used simply for convenience and performs all the same steps as those listed in the previous two sections.
 
 ```
-python3 -m examples.xnnpack.aot_compiler --model_name="mv2" --quantize --delegate
+python -m examples.xnnpack.aot_compiler --model_name="mv2" --quantize --delegate
 ```
 
 Note in the example above,
@@ -134,13 +135,48 @@ Note in the example above,
 
 The generated model file will be named `[model_name]_xnnpack_[qs8/fp32].pte` depending on the arguments supplied.
 
-## Running the XNNPACK Model
-We will use `buck2` to run the `.pte` file with XNNPACK delegate instructions in it on your host platform. You can follow the instructions here to install [buck2](getting-started-setup.md#building-a-runtime). You can now run it with the prebuilt `xnn_executor_runner` provided in the examples. This will run the model on some sample inputs.
+## Running the XNNPACK Model with CMake
+After exporting the XNNPACK Delegated model, we can now try running it with example inputs using CMake. We can build and use the xnn_executor_runner, which is a sample wrapper for the ExecuTorch Runtime and XNNPACK Backend. We first begin by configuring the CMake build as follows:
+```bash
+# cd to the root of executorch repo
+cd executorch
+
+# Get a clean cmake-out directory
+rm -rf cmake-out
+mkdir cmake-out
+
+# Configure cmake
+cmake \
+    -DCMAKE_INSTALL_PREFIX=cmake-out \
+    -DCMAKE_BUILD_TYPE=Release \
+    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
+    -DEXECUTORCH_BUILD_XNNPACK=ON \
+    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
+    -DEXECUTORCH_ENABLE_LOGGING=1 \
+    -DPYTHON_EXECUTABLE=python \
+    -Bcmake-out .
+```
+Then you can build the runtime components with
+
+```bash
+cmake --build cmake-out -j9 --target install --config Release
+```
+
+Now you should be able to find the executable built at `./cmake-out/backends/xnnpack/xnn_executor_runner`. You can run the executable with the model you generated as follows:
+```bash
+./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=./mv2_xnnpack_fp32.pte
+# or to run the quantized variant
+./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=./mv2_xnnpack_q8.pte
+```
+
+
+## Running the XNNPACK Model with Buck
+Alternatively, you can use `buck2` to run the `.pte` file with XNNPACK delegate instructions in it on your host platform. You can follow the instructions here to install [buck2](getting-started-setup.md#building-a-runtime). You can now run it with the prebuilt `xnn_executor_runner` provided in the examples. This will run the model on some sample inputs.
 
 ```bash
 buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ./mv2_xnnpack_fp32.pte
 # or to run the quantized variant
-buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ./mv2_xnnpack_qs8.pte
+buck2 run examples/xnnpack:xnn_executor_runner -- --model_path ./mv2_xnnpack_q8.pte
 ```
 
 ## Building and Linking with the XNNPACK Backend
