# Backend and Delegate

Audience: Vendors and backend delegate developers who are interested in integrating their own compilers and hardware as part of ExecuTorch.

Backend delegation is an entry point for backends to process and execute PyTorch
programs to leverage the performance and efficiency benefits of specialized
backends and hardware, while still providing PyTorch users with an experience
close to that of the PyTorch runtime.

Once the backend is ready, it can then be registered:

To register the backend for AOT lowering, simply import the backend:

```python
from executorch.exir.backend.test.backend_with_compiler_demo import BackendWithCompilerDemo
```

To register the backend for the runtime, register it via the `register_backend` API:
```cpp
__ET_NODISCARD Error register_backend(const Backend& backend);
```

One way to invoke `register_backend` is via static registration:

```cpp
namespace {
auto cls = BackendWithCompiler();
Backend backend{"BackendWithCompilerDemo", &cls};
static auto success_with_compiler = register_backend(backend);
} // namespace
```
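
Note that a static registrant such as `success_with_compiler` above runs when the library is loaded, before `main()`, so the backend is already registered by the time any program referencing `"BackendWithCompilerDemo"` is initialized.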

## Frontend interfaces

There are three flows for delegating a program to a backend:

1. Lower the whole module to a backend. This is good for testing backends and
   the preprocessing stage.
1. Lower the whole module to a backend and compose it with another module. This
   is good for reusing lowered modules exported from other flows.
1. Lower parts of a module according to a partitioner. This is good for
   lowering models that include both lowerable and non-lowerable nodes, and is
   the most streamlined process.

### Flow 1: Lowering the whole module

This flow starts from a traced graph module in the Edge Dialect representation. To
lower it, we call the following function, which returns a `LoweredBackendModule`
(more documentation on this function can be found in the Python API reference):

```python
# defined in backend_api.py
def to_backend(
    backend_id: str,
    edge_program: ExportedProgram,
    compile_spec: List[CompileSpec],
) -> LoweredBackendModule:
```

Within this function, the backend's `preprocess()` function is called, which
produces a compiled blob that will be emitted into the flatbuffer binary. The
lowered module can be captured directly, or put back into a parent module to be
captured. Eventually, the captured module is serialized into the flatbuffer model
that can be loaded by the runtime.
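
For orientation, a toy `preprocess()` might simply record each operator it sees into a byte blob. The sketch below is hypothetical, not the demo backend's real implementation (see `backend_with_compiler_demo.py` for that), and the exact base-class signature and return type may differ across versions:

```python
from typing import List

from executorch.exir.backend.backend_details import BackendDetails
from executorch.exir.backend.compile_spec_schema import CompileSpec


class TinyBackend(BackendDetails):
    @staticmethod
    def preprocess(edge_program, compile_specs: List[CompileSpec]) -> bytes:
        # Emit one "instruction" per call_function node; a real backend would
        # run its compiler here and return its own binary format instead.
        instructions = [
            str(node.target)
            for node in edge_program.graph_module.graph.nodes
            if node.op == "call_function"
        ]
        # The returned bytes become the compiled blob stored in the flatbuffer.
        return ";".join(instructions).encode("utf-8")
```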

The following is an example of this flow:

```python
from executorch.exir.backend.backend_api import to_backend
import executorch.exir as exir
import torch

# The submodule runs in a specific backend. In this example, we use the
# `BackendWithCompilerDemo` backend.
class LowerableSubModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return torch.sin(x)

# Convert the lowerable module to the Edge IR representation
to_be_lowered = LowerableSubModel()
example_input = (torch.ones(1), )
to_be_lowered_exir_submodule = exir.capture(to_be_lowered, example_input).to_edge()

# Import the backend implementation to register it for AOT lowering
from executorch.exir.backend.test.backend_with_compiler_demo import (
    BackendWithCompilerDemo,
)
lowered_module = to_backend('BackendWithCompilerDemo', to_be_lowered_exir_submodule, [])
```
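
As a quick sanity check, we can peek at what lowering produced. This assumes the `LoweredBackendModule` accessors named below, `backend_id` and `processed_bytes`, which may vary by version:

```python
# Inspect the lowered module (accessor names assumed)
print(lowered_module.backend_id)            # e.g. "BackendWithCompilerDemo"
print(len(lowered_module.processed_bytes))  # size of the compiled blob in bytes
```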

We can serialize the program to a flatbuffer format by directly running:

```python
# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(lowered_module.buffer())
```

### Flow 2: Lowering the whole module and composing

Alternatively, after flow 1, we can compose this lowered module with another
module:

```python
# This submodule runs in the ExecuTorch runtime
class NonLowerableSubModel(torch.nn.Module):
    def __init__(self, bias):
        super().__init__()
        self.bias = bias

    def forward(self, a, b):
        return torch.add(torch.add(a, b), self.bias)


# The composite module, including the lowerable and non-lowerable parts
class CompositeModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.non_lowerable = NonLowerableSubModel(torch.ones(1) * 0.3)
        self.lowerable = lowered_module

    def forward(self, x):
        a = self.lowerable(x)
        b = self.lowerable(a)
        ret = self.non_lowerable(a, b)
        return a, b, ret

composite_model = CompositeModel()
model_inputs = (torch.ones(1), )
exec_prog = exir.capture(composite_model, model_inputs).to_edge().to_executorch()

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(exec_prog.buffer)
```

### Flow 3: Partitioning

The third flow also starts from a traced graph module in the Edge Dialect
representation. To lower certain nodes in this graph module, we can use the
overloaded [`to_backend`
function](https://github.com/pytorch/executorch/blob/d9eef24bb720804aa7b400b05241487510ae0dc2/exir/backend/backend_api.py#L39).

```python
def to_backend(
    edge_program: ExportedProgram,
    partitioner: Type[TPartitioner],
) -> ExportedProgram:
```

This function takes in a `Partitioner`, which adds a tag to all the nodes that
are meant to be lowered, and provides a `partition_tags` mapping from tags to
backend names and module compile specs. The tagged nodes will then be
partitioned and lowered to their mapped backends using Flow 1's process.
The available helper partitioners are documented
[here](./compiler-custom-compiler-passes.md). These lowered modules
will be inserted into the top-level module and serialized.
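
To make the tagging concrete, here is a minimal hypothetical partitioner in the spirit of the demo partitioners. Treat it as a sketch rather than the real `AddMulPartitionerDemo`; the exact `Partitioner` base-class API may differ between versions:

```python
from executorch.exir.backend.partitioner import DelegationSpec, Partitioner


class AddOnlyPartitionerSketch(Partitioner):
    """Hypothetical partitioner that tags every add node for one backend."""

    def __init__(self):
        # Map every tag we emit to the same backend id and (empty) compile specs.
        self.delegation_spec = DelegationSpec("BackendWithCompilerDemo", [])
        self.partition_tags = {}

    def partition(self, graph_module):
        for i, node in enumerate(graph_module.graph.nodes):
            if node.op == "call_function" and "add" in str(node.target):
                tag = f"tag{i}"
                # Tagged nodes are later extracted and lowered via Flow 1.
                node.meta["delegation_tag"] = tag
                self.partition_tags[tag] = self.delegation_spec
        return graph_module
```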

The following is an example of the flow:
```python
from executorch.exir.backend.backend_api import to_backend
from executorch.exir.passes import SpecPropPass
import executorch.exir as exir
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, y):
        x = x + y
        x = x * y
        x = x - y
        x = x / y
        x = x * y
        x = x + y
        return x

model = Model()
model_inputs = (torch.randn(1, 3), torch.randn(1, 3))
gm = exir.capture(model, model_inputs).to_edge()

from executorch.exir.backend.test.op_partitioner_demo import AddMulPartitionerDemo
exec_prog = to_backend(gm, AddMulPartitionerDemo).to_executorch(
    exir.ExecutorchBackendConfig(passes=[SpecPropPass()])
)

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(exec_prog.buffer)
```

## Runtime

The serialized flatbuffer model is loaded by the ExecuTorch runtime. The
preprocessed blob is stored directly in the flatbuffer and is passed to the
backend's `init()` function during the model initialization stage. At the
model execution stage, the initialized handle can be executed through the
backend's `execute()` function.

To run the real model with the executor:

> :warning: **pybind is not ready for partner preview**: please use the size_test_all_ops or executor_runner cpp binary for now. pybind to run the executor will be ready before MVP.

```python
# Load the program with the executor runtime
# (_load_for_executorch_from_buffer is provided by the ExecuTorch pybind extension)
executorch_module = _load_for_executorch_from_buffer(flatbuffer)
print("model_inputs: ", model_inputs)
# Execute the program
model_outputs = executorch_module.forward([*model_inputs])
```

## Error Messages

If there is an error in the backend, for example, if there is any operator that