mechanism for leveraging the XNNPACK library to accelerate operators running on
CPU.

## Layout
- `cmake/` : CMake-related files
- `operators` : The directory that stores all of the op visitors
  - `node_visitor.py` : Implementation of serializing each lowerable operator node
  - ...
- `partition/` : The partitioner is used to identify operators in a model's graph that are suitable for lowering to the XNNPACK delegate
  - `xnnpack_partitioner.py` : Contains the partitioner that tags graph patterns for XNNPACK lowering
  - `configs.py` : Contains lists of ops/modules for XNNPACK lowering
- `passes/` : Contains passes which are used before preprocessing to prepare the graph for XNNPACK lowering
- `runtime/` : Runtime logic used at inference. This contains all the C++ files used to build the runtime graph and execute the XNNPACK model
- `serialization/` : Contains files related to serializing the XNNPACK graph representation of the PyTorch model
  - `schema.fbs` : Flatbuffer schema of the serialization format
  - `xnnpack_graph_serialize` : Implementation for serializing dataclasses from the graph schema to flatbuffer
- `test/` : Tests for the XNNPACK Delegate
- `third-party/` : Third-party libraries used by the XNNPACK Delegate
- `xnnpack_preprocess.py` : Contains the preprocess implementation, which is called by `to_backend` on a model's graph or subgraph and returns a preprocessed blob responsible for executing the graph or subgraph at runtime

## End to End Example

To further understand the features of the XNNPACK Delegate and how to use it, consider the following end-to-end example with MobileNetV2.

### Lowering a model to XNNPACK
```python
import torch
import torchvision.models as models

from torch.export import export, ExportedProgram
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import EdgeProgramManager, ExecutorchProgramManager, to_edge
from executorch.exir.backend.backend_api import to_backend


mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

exported_program: ExportedProgram = export(mobilenet_v2, sample_inputs)
edge: EdgeProgramManager = to_edge(exported_program)

edge = edge.to_backend(XnnpackPartitioner())
```

We will go through this example with the [MobileNetV2](https://pytorch.org/hub/pytorch_vision_mobilenet_v2/) pretrained model downloaded from the TorchVision library. The flow of lowering a model starts after exporting the model `to_edge`. We call the `to_backend` API with the `XnnpackPartitioner`. The partitioner identifies the subgraphs suitable for the XNNPACK backend delegate to consume. Afterwards, the identified subgraphs are serialized with the XNNPACK Delegate flatbuffer schema, and each subgraph is replaced with a call to the XNNPACK Delegate.

```python
>>> print(edge.exported_program().graph_module)
GraphModule(
  (lowered_module_0): LoweredBackendModule()
  (lowered_module_1): LoweredBackendModule()
)

def forward(self, arg314_1):
    lowered_module_0 = self.lowered_module_0
    executorch_call_delegate = torch.ops.higher_order.executorch_call_delegate(lowered_module_0, arg314_1);  lowered_module_0 = arg314_1 = None
    getitem = executorch_call_delegate[0];  executorch_call_delegate = None
    aten_view_copy_default = executorch_exir_dialects_edge__ops_aten_view_copy_default(getitem, [1, 1280]);  getitem = None
    aten_clone_default = executorch_exir_dialects_edge__ops_aten_clone_default(aten_view_copy_default);  aten_view_copy_default = None
    lowered_module_1 = self.lowered_module_1
    executorch_call_delegate_1 = torch.ops.higher_order.executorch_call_delegate(lowered_module_1, aten_clone_default);  lowered_module_1 = aten_clone_default = None
    getitem_1 = executorch_call_delegate_1[0];  executorch_call_delegate_1 = None
    return (getitem_1,)
```

We print the graph after lowering above to show the new nodes that were inserted to call the XNNPACK Delegate. The subgraph being delegated to XNNPACK is the first argument at each call site. We can observe that the majority of `convolution-relu-add` blocks and `linear` blocks were delegated to XNNPACK. We can also see the operators which could not be lowered to the XNNPACK Delegate, such as `clone` and `view_copy`.

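For a quantitative view of how much of the graph was delegated, a small debugging helper can be used instead of reading the printed graph. The following is a minimal sketch under one assumption: that `get_delegation_info` is importable from `executorch.devtools.backend_debug`, which holds for recent ExecuTorch releases but may differ or be absent in older versions.

```python
# A minimal sketch, assuming `get_delegation_info` lives under
# `executorch.devtools.backend_debug` (module path varies across versions).
from executorch.devtools.backend_debug import get_delegation_info

delegation_info = get_delegation_info(edge.exported_program().graph_module)
print(delegation_info.get_summary())  # counts of delegated vs. non-delegated nodes
```
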
```python
exec_prog = edge.to_executorch()

with open("xnnpack_mobilenetv2.pte", "wb") as file:
    exec_prog.write_to_file(file)
```
After lowering to the XNNPACK Program, we can then prepare it for ExecuTorch and save the model as a `.pte` file. `.pte` is a binary format that stores the serialized ExecuTorch graph.

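Before moving on to the native runner, you can optionally smoke-test the saved `.pte` from Python. This is a hedged sketch, not the canonical flow: it assumes the optional ExecuTorch Python bindings are installed and were built with the XNNPACK backend, and the `_load_for_executorch` entry point and its module path may vary across versions.

```python
# A minimal sketch, assuming the ExecuTorch Python bindings are installed and
# include the XNNPACK backend (module path and API may vary across versions).
from executorch.extension.pybindings.portable_lib import _load_for_executorch

program = _load_for_executorch("xnnpack_mobilenetv2.pte")
outputs = program.forward(sample_inputs)  # reuses `sample_inputs` from above
print(outputs[0].shape)  # MobileNetV2 logits, expected torch.Size([1, 1000])
```
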
### Running the XNNPACK Model with CMake
After exporting the XNNPACK-delegated model, we can try running it with example inputs using CMake. We can build and use the `xnn_executor_runner`, a sample wrapper for the ExecuTorch Runtime and XNNPACK Backend. We first configure the CMake build as follows:
```bash
# cd to the root of executorch repo
cd executorch

# Get a clean cmake-out directory
rm -rf cmake-out
mkdir cmake-out

# Configure cmake
cmake \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-out .
```
Then you can build the runtime components with:

```bash
cmake --build cmake-out -j9 --target install --config Release
```

Now you should be able to find the executable built at `./cmake-out/backends/xnnpack/xnn_executor_runner`. You can run the executable with the model you generated as follows:
```bash
./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=./xnnpack_mobilenetv2.pte
```

## Help & Improvements
If you have problems or questions, or have suggestions for ways to make
implementation and testing better, please reach out to the PyTorch Edge team or
create an issue on [github](https://github.com/pytorch/executorch/issues).

## See Also
For more information about the XNNPACK Delegate, please check out the following resources:
- [ExecuTorch XNNPACK Delegate](https://pytorch.org/executorch/0.2/native-delegates-executorch-xnnpack-delegate.html)
- [Building and Running ExecuTorch with XNNPACK Backend](https://pytorch.org/executorch/0.2/tutorial-xnnpack-delegate-lowering.html)