Separate delegate doc into two parts: vendors and users #494

200 changes: 9 additions & 191 deletions docs/source/compiler-delegate-and-partitioner.md
@@ -1,5 +1,7 @@
# Backend and Delegate

Audience: vendors and backend delegate developers who are interested in integrating their own compilers and hardware into ExecuTorch

Backend delegation is an entry point for backends to process and execute PyTorch
programs to leverage performance and efficiency benefits of specialized
backends and hardware, while still providing PyTorch users with an experience
@@ -73,205 +75,21 @@ virtual void destroy(__ET_UNUSED DelegateHandle* handle);

Once the backend is ready, it can be registered:

To register the backend for AOT lowering, simply import the backend:

```python
from executorch.exir.backend.test.backend_with_compiler_demo import BackendWithCompilerDemo
```

To register the backend for the runtime, use the `register_backend` API:
```cpp
__ET_NODISCARD Error register_backend(const Backend& backend);
```


## Frontend interfaces

There are three flows for delegating a program to a backend:

1. Lower the whole module to a backend. This is good for testing backends and
the preprocessing stage.
1. Lower the whole module to a backend and compose it with another module. This
is good for reusing lowered modules exported from other flows.
1. Lower parts of a module according to a partitioner. This is good for
lowering models that include both lowerable and non-lowerable nodes, and is
the most streamlined process.

### Flow 1: Lowering the whole module

This flow starts from a traced graph module with Edge Dialect representation. To
lower it, we call the following function which returns a `LoweredBackendModule`
(more documentation on this function can be found in the Python API reference):

```python
# defined in backend_api.py
def to_backend(
    backend_id: str,
    edge_program: ExportedProgram,
    compile_spec: List[CompileSpec],
) -> LoweredBackendModule:
```

Within this function, the backend's `preprocess()` function is called, producing
a compiled blob that will be emitted into the flatbuffer binary. The lowered
module can be captured directly, or put back into a parent module to be
captured. Eventually, the captured module is serialized into the flatbuffer
model that can be loaded by the runtime.

The following is an example of this flow:

```python
from executorch.exir.backend.backend_api import to_backend, MethodCompileSpec
import executorch.exir as exir
import torch

# The submodule that runs on a specific backend. In this example, the `BackendWithCompilerDemo` backend
class LowerableSubModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return torch.sin(x)

# Convert the lowerable module to Edge IR Representation
to_be_lowered = LowerableSubModel()
example_input = (torch.ones(1), )
to_be_lowered_exir_submodule = exir.capture(to_be_lowered, example_input).to_edge()

# Import the backend implementation
from executorch.exir.backend.test.backend_with_compiler_demo import (
    BackendWithCompilerDemo,
)
lowered_module = to_backend('BackendWithCompilerDemo', to_be_lowered_exir_submodule, [])
```

We can serialize the program to a flatbuffer format by directly running:

```python
# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(lowered_module.buffer())
```

### Flow 2: Lowering the whole module and composing it

Alternatively, after Flow 1, we can compose this lowered module with another
module:

```python
# This submodule runs in the executor runtime
class NonLowerableSubModel(torch.nn.Module):
    def __init__(self, bias):
        super().__init__()
        self.bias = bias

    def forward(self, a, b):
        return torch.add(torch.add(a, b), self.bias)


# The composite module, including the lowered part and the non-lowered part
class CompositeModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.non_lowerable = NonLowerableSubModel(torch.ones(1) * 0.3)
        self.lowerable = lowered_module

    def forward(self, x):
        a = self.lowerable(x)
        b = self.lowerable(a)
        ret = self.non_lowerable(a, b)
        return a, b, ret

composite_model = CompositeModel()
model_inputs = (torch.ones(1), )
exec_prog = exir.capture(composite_model, model_inputs).to_edge().to_executorch()

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(exec_prog.buffer)
```

### Flow 3: Partitioning

The third flow also starts from a traced graph module with Edge Dialect
representation. To lower certain nodes in this graph module, we can use the
overloaded [`to_backend`
function](https://github.com/pytorch/executorch/blob/d9eef24bb720804aa7b400b05241487510ae0dc2/exir/backend/backend_api.py#L39).

```python
def to_backend(
    edge_program: ExportedProgram,
    partitioner: Type[TPartitioner],
) -> ExportedProgram:
```

This function takes in a `Partitioner` which adds a tag to all the nodes that
are meant to be lowered. It will return a `partition_tags` mapping tags to
backend names and module compile specs. The tagged nodes will then be
partitioned and lowered to their mapped backends using Flow 1's process.
Available helper partitioners are documented
[here](./compiler-custom-compiler-passes.md). These lowered modules
will be inserted into the top-level module and serialized.
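
For illustration, a minimal custom partitioner might look like the sketch below. This is not the demo partitioner used in the example that follows: it assumes the same interface shape as `AddMulPartitionerDemo` (a `partition()` method that tags nodes via `node.meta["delegation_tag"]` and records each tag in `self.partition_tags`), and the import paths and op names may differ between ExecuTorch versions.

```python
# Illustrative sketch only: assumes the Partitioner/DelegationSpec interface
# shape used by AddMulPartitionerDemo; import paths and edge-dialect op names
# may differ between ExecuTorch versions.
from typing import Dict

import torch
from executorch.exir.backend.partitioner import DelegationSpec, Partitioner
from executorch.exir.dialects._ops import ops as exir_ops


class AddOnlyPartitionerDemo(Partitioner):
    def __init__(self) -> None:
        # Route every tagged node to the demo backend with no compile specs.
        self.delegation_spec = DelegationSpec("BackendWithCompilerDemo", [])
        self.partition_tags: Dict[str, DelegationSpec] = {}

    def partition(
        self, edge_graph_module: torch.fx.GraphModule
    ) -> torch.fx.GraphModule:
        # Tag each edge-dialect add node; each tag maps to a backend id and
        # its compile specs via `partition_tags`.
        for i, node in enumerate(edge_graph_module.graph.nodes):
            if node.op == "call_function" and node.target == exir_ops.edge.aten.add.Tensor:
                tag = f"tag{i}"
                node.meta["delegation_tag"] = tag
                self.partition_tags[tag] = self.delegation_spec
        return edge_graph_module
```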

The following is an example of the flow:
```python
from executorch.exir.backend.backend_api import to_backend
from executorch.exir.passes import SpecPropPass  # used in the ExecutorchBackendConfig below
import executorch.exir as exir
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, y):
        x = x + y
        x = x * y
        x = x - y
        x = x / y
        x = x * y
        x = x + y
        return x

model = Model()
model_inputs = (torch.randn(1, 3), torch.randn(1, 3))
gm = exir.capture(model, model_inputs).to_edge()

from executorch.exir.backend.test.op_partitioner_demo import AddMulPartitionerDemo
exec_prog = to_backend(gm, AddMulPartitionerDemo).to_executorch(
    exir.ExecutorchBackendConfig(passes=[SpecPropPass()])
)

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(exec_prog.buffer)
```

## Runtime

The serialized flatbuffer model is loaded by the ExecuTorch runtime. The
preprocessed blob is stored directly in the flatbuffer and is passed to the
backend's `init()` function during the model initialization stage. At the model
execution stage, the initialized handle can be executed through the backend's
`execute()` function.

`register_backend` can, for example, be invoked through static registration:

```cpp
namespace {
auto cls = BackendWithCompiler();
Backend backend{"BackendWithCompilerDemo", &cls};
static auto success_with_compiler = register_backend(backend);
} // namespace
```

To run the real model with the executor:

> :warning: **pybind is not ready for partner preview**: please use the size_test_all_ops or executor_runner cpp binary for now. pybind to run the executor will be ready before MVP.

```python
# Load the program with executor runtime
executorch_module = _load_for_executorch_from_buffer(flatbuffer)
print("model_inputs: ", model_inputs)
# Execute the program
model_outputs = executorch_module.forward([*model_inputs])
```
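
As the warning above notes, the same serialized `.pte` program can instead be run with one of the C++ binaries; for example, the `executor_runner` binary takes the model path via its `--model_path` flag (the exact flags depend on how the binary was built).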

## Error Messages

If there is an error in the backend, for example, if there is any operator that