Separate delegate doc into two parts: vendors and users #494

200 changes: 9 additions & 191 deletions docs/source/compiler-delegate-and-partitioner.md
@@ -1,5 +1,7 @@
# Backend and Delegate

Audience: vendors and backend delegate developers who are interested in integrating their own compilers and hardware into ExecuTorch

Backend delegation is an entry point for backends to process and execute PyTorch
programs to leverage performance and efficiency benefits of specialized
backends and hardware, while still providing PyTorch users with an experience
@@ -73,205 +75,21 @@ virtual void destroy(__ET_UNUSED DelegateHandle* handle);

Once the backend is ready, it can be registered:

To register the backend for AOT lowering, simply import the backend:

```python
from executorch.exir.backend.test.backend_with_compiler_demo import BackendWithCompilerDemo
```

To register the backend for the runtime, use the `register_backend` API:
```cpp
__ET_NODISCARD Error register_backend(const Backend& backend);
```


## Frontend interfaces

There are three flows for delegating a program to a backend:

1. Lower the whole module to a backend. This is good for testing backends and
the preprocessing stage.
1. Lower the whole module to a backend and compose it with another module. This
is good for reusing lowered modules exported from other flows.
1. Lower parts of a module according to a partitioner. This is good for
lowering models that include both lowerable and non-lowerable nodes, and is
the most streamlined process.

### Flow 1: Lowering the whole module

This flow starts from a traced graph module with Edge Dialect representation. To
lower it, we call the following function which returns a `LoweredBackendModule`
(more documentation on this function can be found in the Python API reference):

```python
# defined in backend_api.py
def to_backend(
    backend_id: str,
    edge_program: ExportedProgram,
    compile_spec: List[CompileSpec],
) -> LoweredBackendModule:
```

Within this function, the backend's `preprocess()` function is called, producing
a compiled blob that will be emitted into the flatbuffer binary. The lowered
module can be captured directly, or put back into a parent module to be
captured. Eventually, the captured module is serialized into the flatbuffer
model that can be loaded by the runtime.

The following is an example of this flow:

```python
from executorch.exir.backend.backend_api import to_backend, MethodCompileSpec
import executorch.exir as exir
import torch

# The submodule that runs on a specific backend. In this example, the `BackendWithCompilerDemo` backend
class LowerableSubModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return torch.sin(x)

# Convert the lowerable module to Edge IR Representation
to_be_lowered = LowerableSubModel()
example_input = (torch.ones(1), )
to_be_lowered_exir_submodule = exir.capture(to_be_lowered, example_input).to_edge()

# Import the backend implementation
from executorch.exir.backend.test.backend_with_compiler_demo import (
    BackendWithCompilerDemo,
)
lowered_module = to_backend('BackendWithCompilerDemo', to_be_lowered_exir_submodule, [])
```

We can serialize the program to a flatbuffer format by directly running:

```python
# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(lowered_module.buffer())
```

### Flow 2: Lowering the whole module and composing it

Alternatively, after Flow 1, we can compose this lowered module with another
module:

```python
# This submodule runs in the executor runtime
class NonLowerableSubModel(torch.nn.Module):
    def __init__(self, bias):
        super().__init__()
        self.bias = bias

    def forward(self, a, b):
        return torch.add(torch.add(a, b), self.bias)


# The composite module, including the lowered part and the non-lowered part
class CompositeModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.non_lowerable = NonLowerableSubModel(torch.ones(1) * 0.3)
        self.lowerable = lowered_module

    def forward(self, x):
        a = self.lowerable(x)
        b = self.lowerable(a)
        ret = self.non_lowerable(a, b)
        return a, b, ret

composite_model = CompositeModel()
model_inputs = (torch.ones(1), )
exec_prog = exir.capture(composite_model, model_inputs).to_edge().to_executorch()

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(exec_prog.buffer)
```

### Flow 3: Partitioning

The third flow also starts from a traced graph module with Edge Dialect
representation. To lower certain nodes in this graph module, we can use the
overloaded [`to_backend`
function](https://github.com/pytorch/executorch/blob/d9eef24bb720804aa7b400b05241487510ae0dc2/exir/backend/backend_api.py#L39).

```python
def to_backend(
    edge_program: ExportedProgram,
    partitioner: Type[TPartitioner],
) -> ExportedProgram:
```

This function takes in a `Partitioner` which adds a tag to all the nodes that
are meant to be lowered. It will return a `partition_tags` mapping tags to
backend names and module compile specs. The tagged nodes will then be
partitioned and lowered to their mapped backends using Flow 1's process.
Available helper partitioners are documented
[here](./compiler-custom-compiler-passes.md). These lowered modules
will be inserted into the top-level module and serialized.
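
For illustration, a minimal custom partitioner might look like the sketch below. This is not the demo partitioner used in the example that follows: it assumes the same interface shape as `AddMulPartitionerDemo` (a `partition()` method that tags nodes via `node.meta["delegation_tag"]` and records each tag in `self.partition_tags`), and the import paths and op names may differ between ExecuTorch versions.

```python
# Illustrative sketch only: assumes the Partitioner/DelegationSpec interface
# shape used by AddMulPartitionerDemo; import paths and edge-dialect op names
# may differ between ExecuTorch versions.
from typing import Dict

import torch
from executorch.exir.backend.partitioner import DelegationSpec, Partitioner
from executorch.exir.dialects._ops import ops as exir_ops


class AddOnlyPartitionerDemo(Partitioner):
    def __init__(self) -> None:
        # Route every tagged node to the demo backend with no compile specs.
        self.delegation_spec = DelegationSpec("BackendWithCompilerDemo", [])
        self.partition_tags: Dict[str, DelegationSpec] = {}

    def partition(
        self, edge_graph_module: torch.fx.GraphModule
    ) -> torch.fx.GraphModule:
        # Tag each edge-dialect add node; each tag maps to a backend id and
        # its compile specs via `partition_tags`.
        for i, node in enumerate(edge_graph_module.graph.nodes):
            if node.op == "call_function" and node.target == exir_ops.edge.aten.add.Tensor:
                tag = f"tag{i}"
                node.meta["delegation_tag"] = tag
                self.partition_tags[tag] = self.delegation_spec
        return edge_graph_module
```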

The following is an example of the flow:
```python
from executorch.exir.backend.backend_api import to_backend
from executorch.exir.passes import SpecPropPass  # used in the ExecutorchBackendConfig below
import executorch.exir as exir
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, y):
        x = x + y
        x = x * y
        x = x - y
        x = x / y
        x = x * y
        x = x + y
        return x

model = Model()
model_inputs = (torch.randn(1, 3), torch.randn(1, 3))
gm = exir.capture(model, model_inputs).to_edge()

from executorch.exir.backend.test.op_partitioner_demo import AddMulPartitionerDemo
exec_prog = to_backend(gm, AddMulPartitionerDemo).to_executorch(
    exir.ExecutorchBackendConfig(passes=[SpecPropPass()])
)

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(exec_prog.buffer)
```

## Runtime

The serialized flatbuffer model is loaded by the ExecuTorch runtime. The
preprocessed blob is stored directly in the flatbuffer and is passed to the
backend's `init()` function during the model initialization stage. At the model
execution stage, the initialized handle can be executed through the backend's
`execute()` function.

`register_backend` can, for example, be invoked through static registration:

```cpp
namespace {
auto cls = BackendWithCompiler();
Backend backend{"BackendWithCompilerDemo", &cls};
static auto success_with_compiler = register_backend(backend);
} // namespace
```

To run the real model with the executor:

> :warning: **pybind is not ready for partner preview**: please use the size_test_all_ops or executor_runner cpp binary for now. pybind to run the executor will be ready before MVP.

```python
# Load the program with executor runtime
executorch_module = _load_for_executorch_from_buffer(flatbuffer)
print("model_inputs: ", model_inputs)
# Execute the program
model_outputs = executorch_module.forward([*model_inputs])
```
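
As the warning above notes, the same serialized `.pte` program can instead be run with one of the C++ binaries; for example, the `executor_runner` binary takes the model path via its `--model_path` flag (the exact flags depend on how the binary was built).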

## Error Messages

If there is an error in the backend, for example, if there is any operator that