# Backend Dialect


## Properties
Backend dialect is the name we give to the `ExportedProgram` in edge dialect after optional **target-specific** passes have run. Backend dialect differs from edge dialect in that it is target-aware: it may contain operators or submodules that are only meaningful to the target backend. Backend-specific operators, a set of operators defined for the target backend, are the new components a backend dialect may contain compared with edge dialect.

Another property to note is that tensors may use any memory format (this is subject to change in the near future, when we introduce dim order to backend dialect).


## Intent

This dialect allows the introduction of operators that do not conform to the schema defined in the canonical ATen operator set and that do not appear in any of the dialects above (ATen dialect and edge dialect). Consider using backend operators if your use case satisfies one or more of the following criteria (see the sketch after this list):

1. Your backend provides a library that optimizes a certain operator that is equivalent to a subgraph, e.g., a `linear_relu` op (equivalent to linear + relu) that executes faster on that backend.
2. The graph module needs to be retraced after it has already been lowered to a backend. On retrace, a backend operator can transform back to the original subgraph (in ATen dialect), whereas a normal custom op cannot.
3. Your backend-specific operator has no generic CPU kernel, only a kernel for a certain backend. A backend operator works around this by using the original subgraph as the default kernel, keeping the graph module runnable.
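
As a concrete illustration of criteria 1 and 2, the sketch below shows the kind of two-op subgraph a fused backend operator would stand for; all names here are illustrative, not ops shipped by any real backend:

```python
import torch

def linear_relu_pattern(x, weight, bias):
    # The subgraph a hypothetical fused `linear_relu` backend op would
    # replace; on retrace, the backend op decomposes back into exactly
    # these ATen ops.
    return torch.relu(torch.nn.functional.linear(x, weight, bias))

# Eager check that the pattern runs as plain ATen ops.
out = linear_relu_pattern(torch.randn(2, 4), torch.randn(3, 4), torch.randn(3))
```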


## How to use

To lower edge ops to backend ops, a pass performs pattern matching to identify the edge ops of interest in the graph and then replaces them with equivalent backend operators. There are two APIs for registering such passes (a usage sketch follows this list):

* `transform()`. An API on `ExportProgram` that allows users to provide custom passes. Note that this is not guarded by any validator, so the soundness of the program is not guaranteed.
* [`ExecutorchBackendConfig.passes`](https://github.com/pytorch/executorch/blob/main/exir/capture/_config.py#L40). If added here, the pass runs as part of the lowering process from backend dialect to `ExecutorchProgram`.
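
Below is a minimal sketch of both registration points, assuming the `to_edge`/`EdgeProgramManager` flow; `NoOpPass` is a placeholder where a real pass would rewrite edge ops:

```python
import torch
from executorch.exir import ExecutorchBackendConfig, to_edge
from executorch.exir.pass_base import ExportPass

class NoOpPass(ExportPass):
    """Placeholder; a real pass would match and replace edge ops here."""

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

edge = to_edge(torch.export.export(M(), (torch.randn(2),)))

# Option 1: transform() applies the pass immediately (no validator runs).
edge = edge.transform([NoOpPass()])

# Option 2: the pass runs while lowering to the ExecuTorch program.
et_prog = edge.to_executorch(ExecutorchBackendConfig(passes=[NoOpPass()]))
```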

Example: one such pass is `QuantFusion`. This pass takes a "canonical quantization pattern", i.e., "dequant - some_op - quant", and fuses it into a single backend-specific operator, e.g., `quantized_decomposed::some_op`. You can find more details [here](./quantization-custom-quantization.md). Another, simpler example is [here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20), where we replace `sym_size` operators with ones that ExecuTorch understands.
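
The schematic below shows the general shape of such a fusion pass using `torch.fx`'s subgraph rewriter. The `my_backend::add_relu` op is an assumption for illustration, not the actual `QuantFusion` implementation:

```python
import torch
from torch.fx import subgraph_rewriter, symbolic_trace

# Register a hypothetical backend op so the replacement graph can refer to it.
lib = torch.library.Library("my_backend", "DEF")
lib.define("add_relu(Tensor x, Tensor y) -> Tensor")

class M(torch.nn.Module):
    def forward(self, x, y):
        return torch.relu(torch.add(x, y))

def pattern(x, y):
    return torch.relu(torch.add(x, y))

def replacement(x, y):
    return torch.ops.my_backend.add_relu(x, y)

gm = symbolic_trace(M())
# Every add + relu occurrence is rewritten into the single fused backend op.
subgraph_rewriter.replace_pattern(gm, pattern, replacement)
print(gm.graph)
```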

## API

We provide a decorator `bind_pattern_to_op` to help users easily register their backend operators into EXIR. This decorator takes:
* a `torch.Library` object, indicating which library or namespace this backend operator belongs to.
* a name or schema. If the schema of the backend operator is already defined in the `torch.Library` object, only a name is needed. Otherwise, the schema is registered from the schema string that is passed in.

This decorator should be added to the pattern we are trying to match (and then lower to this backend op) in edge dialect. This registers the pattern as a `CompositeImplicitAutograd` kernel for the backend operator.

The operator can then be accessed/used from passes. The `CompositeImplicitAutograd` kernel guarantees two things:
1. The user doesn't need to write a (CPU) runnable kernel.
2. The `ExportProgram` remains retraceable. Once retraced, the backend operator decomposes into the ATen ops used in the pattern.
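
Putting this together, here is a sketch of registering a hypothetical `linear_relu` backend op; the import path of `bind_pattern_to_op` follows the executorch repository layout but should be treated as an assumption:

```python
import torch
from torch.library import Library
from executorch.exir.dialects._ops import bind_pattern_to_op  # assumed path

my_lib = Library("my_backend", "DEF")

# The decorated pattern both defines the schema of `my_backend::linear_relu`
# and serves as its CompositeImplicitAutograd kernel, so the op stays
# runnable and decomposes back to these ATen ops on retrace.
@bind_pattern_to_op(my_lib, "linear_relu(Tensor x, Tensor w, Tensor b) -> Tensor")
def linear_relu(x, w, b):
    return torch.relu(torch.nn.functional.linear(x, w, b))
```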

## Op Set
Unlike edge dialect, which has a well-defined op set, backend dialect is target-aware by design, so we allow users to register target-aware ops through the API above; the ops are grouped by namespace. Here are some examples: `executorch_prims` ops are used by the ExecuTorch runtime to perform operations on `SymInt`s; `quantized_decomposed` ops fuse edge operators for quantization purposes and are meaningful to targets that support quantization.

* `executorch_prims::add.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.add
  * backend: executor
* `executorch_prims::mul.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.mul
  * backend: executor
* `executorch_prims::sub.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.sub
  * backend: executor
* `executorch_prims::floordiv.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.floordiv
  * backend: executor
* `executorch_prims::gt.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.gt
  * backend: executor
* `executorch_prims::lt.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.lt
  * backend: executor
* `executorch_prims::ge.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.ge
  * backend: executor
* `executorch_prims::le.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.le
  * backend: executor
* `executorch_prims::eq.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.eq
  * backend: executor
* `quantized_decomposed::embedding_byte(Tensor weight, Tensor weight_scales, Tensor weight_zero_points, int weight_quant_min, int weight_quant_max, Tensor indices) -> Tensor`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization
* `quantized_decomposed::add(Tensor a, float a_scale, int a_zero_point, int a_quant_min, int a_quant_max, Tensor b, float b_scale, int b_zero_point, int b_quant_min, int b_quant_max, float out_scale, int out_zero_point, int out_quant_min, int out_quant_max) -> Tensor qc`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization
* `quantized_decomposed::add.scalar(Tensor qa, float a_scale, int a_zero_point, int a_quant_min, int a_quant_max, ScalarType a_dtype, Scalar b, float out_scale, int out_zero_point, int out_quant_min, int out_quant_max, ScalarType out_dtype) -> Tensor`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization
* `quantized_decomposed::add_relu(Tensor a, float a_scale, int a_zero_point, int a_quant_min, int a_quant_max, Tensor b, float b_scale, int b_zero_point, int b_quant_min, int b_quant_max, float out_scale, int out_zero_point, int out_quant_min, int out_quant_max) -> Tensor qc`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization