
Commit 01930ac

angelayi authored and facebook-github-bot committed
EXIR spec (#482)
Summary:
Pull Request resolved: #482

Export IR Spec here: D49829187

Reviewed By: mergennachin

Differential Revision: D49602668

fbshipit-source-id: 658108e49a089adcd82ff018a7510aa6c5264296
1 parent a5b6f5e commit 01930ac

File tree

1 file changed: +291, -2 lines


docs/source/ir-exir.md

Lines changed: 291 additions & 2 deletions
# Export IR Specification

Export IR is an intermediate representation (IR) for the result of
`torch.export`. To read more on the details of Export IR, please read this
[document](https://pytorch.org/docs/main/export.ir_spec.html).

The Exported IR is a specification that consists of the following parts:

1. A definition of the computation graph model.
2. A set of operators allowed in the graph.

A **dialect** is an Exported IR graph composed of the operations defined
below, but with additional properties (such as restrictions on the operator set
or metadata) that are meant for a specific purpose.

The EXIR dialects that currently exist are:

* [ATen Dialect](./ir-exir-aten-dialect.md)
* [Edge Dialect](./ir-exir-edge-dialect.md)
* [Backend Dialect](./ir-exir-backend-dialect.md)

These dialects represent the stages that a captured program goes through, from
program capture to conversion into an executable format. For example, the
ExecuTorch compilation process starts by capturing a Python program into ATen
Dialect; ATen Dialect is then converted to Edge Dialect, Edge to Backend, and
finally to a binary format for execution.
28+

## ATen Dialect

ATen dialect is the entry point of the ExecuTorch compilation pipeline; it is
the first time an eager-mode PyTorch program becomes an Exported IR graph. At
this stage, functionalization is performed, removing any tensor aliases and
mutations and allowing for more flexible graph transformations. Additionally,
all tensors are converted to contiguous format.

The goal of this dialect is to capture users' programs as faithfully as possible
(while remaining valid Exported IR). Registered custom operators that the user
has called in eager mode are preserved as-is in ATen dialect. However, we should
refrain from adding custom ops to the graph via passes.

For now, the function of ATen dialect is to be further lowered to Edge dialect.
However, in the future it could serve as a common integration point for other
export use cases.
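
To make this concrete, here is a minimal sketch of capturing an eager-mode
program into ATen dialect (the `SimpleModule` below is a hypothetical example,
not part of the spec):

```python
import torch

# A small eager-mode module, used only for illustration.
class SimpleModule(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x + 1)

example_inputs = (torch.rand(2, 2),)

# Capture the eager program into an Exported IR graph in ATen dialect.
aten_dialect_program = torch.export.export(SimpleModule(), example_inputs)

# The call_function nodes of the resulting graph target torch.ops.aten.* operators.
print(aten_dialect_program.graph_module.graph)
```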

### ATen Dialect Properties

An ATen dialect graph is a valid Export IR graph with the following additional
properties:

1. All operators in `call_function` nodes are either ATen operators (in the
`torch.ops.aten` namespace), higher-order operators (like control flow
operators), or registered custom operators. A registered custom operator is an
operator registered into the current PyTorch eager-mode runtime, usually with a
`TORCH_LIBRARY` call (which implies a schema). Details on how to register a
custom operator can be found
[here](https://docs.google.com/document/d/1_W62p8WJOQQUzPsJYa7s701JXt0qf2OfLub2sbkHOaU/edit#heading=h.3rgxk3v387wl).
2. Every operator must also have a meta kernel. A meta kernel is a function
that, given the shapes of the input tensors, returns the shape of the output
tensor (see the sketch after this list). Details on how to write a meta kernel
can be found
[here](https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit#heading=h.64r4npvq0w0).
3. Input value types must be "Pytree-able". As a consequence, the output types
are also Pytree-able, because all operator outputs are Pytree-able.
4. Ops in ATen dialect may use dynamic dtypes, implicit type promotion, and
implicit broadcasting of tensors.
5. All tensor memory formats are `torch.contiguous_format`.
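
As an illustration of properties 1 and 2, here is a minimal sketch of
registering a custom operator together with a meta kernel, using the Python
`torch.library` equivalent of a `TORCH_LIBRARY` registration (the
`my_ops::double_it` operator is hypothetical):

```python
import torch
from torch.library import Library, impl

# Hypothetical custom op namespace and schema, used only for illustration.
my_lib = Library("my_ops", "DEF")
my_lib.define("double_it(Tensor x) -> Tensor")

# A real, runnable kernel.
@impl(my_lib, "double_it", "CompositeExplicitAutograd")
def double_it(x: torch.Tensor) -> torch.Tensor:
    return x * 2

# A meta kernel: computes the output shape/dtype from the inputs without real data.
@impl(my_lib, "double_it", "Meta")
def double_it_meta(x: torch.Tensor) -> torch.Tensor:
    return torch.empty_like(x)
```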

### ATen Operator Definition

The operator set definition can be found [here](./ir-ops-set-definition.md).

## Edge Dialect

This dialect is meant to introduce specializations that are useful for Edge
devices but not necessarily for general (server) export. However, it still
stops short of specializing to any particular hardware. In other words, we
don't want to introduce any new hardware-dependent concepts or data beyond
those already present in the user's original Python program.

### Edge Dialect Properties

An Edge dialect graph is a valid Export IR graph with the following additional
properties:

1. All operators in OpCall nodes are either from a predefined operator set,
called **"Edge Operators"**, or are registered custom operators. An Edge
operator is an ATen operator with dtype specialization. This allows users to
register kernels that only work for certain dtypes, to reduce binary size.
2. The inputs and outputs of the graph, as well as of every node, cannot be
Scalars; all scalar types (such as `float` and `int`) are converted to Tensors.

### Using the Edge Dialect

The Edge dialect is represented in memory by the `exir.EdgeProgramManager`
Python class. This contains one or more `torch.export.ExportedProgram`s, each
of which contains the graph representation of a method.

```python
import torch
from executorch import exir

class MyModule(torch.nn.Module):
    ...

a = MyModule()
tracing_inputs = (torch.rand(2, 2),)

# Capture to ATen dialect, then convert to Edge dialect.
aten_dialect_program = torch.export.export(a, tracing_inputs)
edge_dialect_program: exir.EdgeProgramManager = exir.to_edge(aten_dialect_program)
print(edge_dialect_program.exported_program)
```

At this point, user-defined graph transformations can be run through
`edge_dialect_program.transform(pass)`. Order matters. Note: if a custom pass
touches `node.target`, be aware that all `node.target`s at this stage are
"Edge ops" (more details below), not torch ops as in the ATen dialect. A
tutorial on pass writing can be found
[here](./compiler-custom-compiler-passes.md). After all these passes are
executed, `to_edge()` will make sure the graph is still valid.
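
For illustration, here is a minimal sketch of such a pass, assuming the
`ExportPass` base class from `executorch.exir.pass_base`; the pass itself and
the op substitution it performs are hypothetical:

```python
from executorch.exir.dialects._ops import ops as exir_ops
from executorch.exir.pass_base import ExportPass

class ReplaceAddWithSub(ExportPass):
    """Hypothetical pass: rewrite every edge aten.add.Tensor node into aten.sub.Tensor."""

    def call_operator(self, op, args, kwargs, meta):
        # node.target values at this stage are Edge ops, so compare against exir_ops.edge.*
        if op == exir_ops.edge.aten.add.Tensor:
            return super().call_operator(exir_ops.edge.aten.sub.Tensor, args, kwargs, meta)
        return super().call_operator(op, args, kwargs, meta)

# Assuming transform() accepts a sequence of pass instances:
# edge_dialect_program = edge_dialect_program.transform((ReplaceAddWithSub(),))
```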

### Edge Operators

As mentioned before, an Edge operator is an ATen core operator with type
specialization. This means an instance of an Edge operator contains a set of
dtype constraints that describe all the tensor dtypes supported by both the
ExecuTorch runtime and their ATen kernels. These dtype constraints are expressed
in a DSL defined in
[edge.yaml](https://github.com/pytorch/executorch/blob/main/exir/dialects/edge/edge.yaml).
Here's an example of the dtype constraints:

```
- func: sigmoid
  namespace: edge
  inherits: aten::sigmoid
  type_alias:
    T0: [Bool, Byte, Char, Int, Long, Short]
    T1: [Double, Float]
    T2: [Float]
  type_constraint:
  - self: T0
    __ret_0: T2
  - self: T1
    __ret_0: T1
```

This says that if the `self` tensor has one of the types `Bool, Byte, Char, Int, Long, Short`, then the return tensor will be `Float`. If `self` is one of `Double, Float`, the return tensor will have the same dtype as `self`.

After these dtype constraints are collected and documented in edge.yaml, EXIR
consumes the file and loads the constraints into EXIR Edge operators. This
makes it convenient for developers to look up the supported dtypes of any
argument in an Edge op schema. For example, we can do:

```python
from executorch.exir.dialects._ops import ops as exir_ops  # import dialects ops

sigmoid = exir_ops.edge.aten.sigmoid.default
print(sigmoid._schema)
# aten::sigmoid(Tensor self) -> Tensor

self_arg = sigmoid._schema.arguments[0]
_return = sigmoid._schema.returns[0]

print(self_arg.allowed_types)
# {torch.float32, torch.int8, torch.float64, torch.int16, torch.int32, torch.int64, torch.uint8, torch.bool}

print(_return.allowed_types)
# {torch.float32, torch.float64}
```

These constraints are helpful for someone who wants to write a custom kernel for this operator. Also, inside EXIR we offer a validator to check whether the graph still complies with these dtype constraints after custom transformations.

### Op Set (WIP)

Check out
[edge.yaml](https://github.com/pytorch/executorch/blob/main/exir/dialects/edge/edge.yaml)
for the complete list of operators that have dtype constraints specified. We are
gradually expanding this operator set, with the goal of providing dtype
constraints for all core ATen ops.

## Backend Dialect

Backend dialect is the name we give to an `ExportedProgram` in Edge dialect
after optional **target-specific** passes have been applied. The difference
between backend dialect and Edge dialect is that backend dialect is
target-aware and may contain operators or submodules that are only meaningful
to the target backend. Backend-specific operators are the new components we may
see in a backend dialect compared with Edge dialect; they are a set of
operators for the target backend.

Another property to notice is that tensor memory formats can be any format
(this is subject to change in the near future when we introduce dim order to
backend dialect).

This dialect allows the introduction of operators that do not conform to the
schema defined in the canonical ATen operator set and do not show up in any of
the dialects above (ATen dialect and Edge dialect). Consider using backend
operators if your use case satisfies one or more of the following criteria:

1. Your backend provides a library that optimizes a certain operator that is
equivalent to a subgraph. E.g., `linear_relu` (equivalent to linear + relu),
which can be executed faster on a certain backend.
2. There's a need to retrace the graph module after it is already lowered to a
backend. When we retrace, backend operators can transform back to the original
subgraph (in ATen dialect), which a normal custom op does not take care of.
3. Your backend-specific operator doesn't have a generic CPU kernel, only a
kernel for a certain backend. Using a backend operator can work around this
issue by using the original subgraph as the default kernel, keeping the graph
module runnable.

### Running Backend Passes

To lower edge ops to backend ops, a pass will perform pattern matching to
identify the edge ops of interest in the graph, and then replace them with
equivalent backend operators. There are two APIs to register such passes:

* `transform()`. An API on `ExportProgram` that allows users to provide custom
passes. Note that this is not guarded by any validator, so the soundness of the
program is not guaranteed.
* [`ExecutorchBackendConfig.passes`](https://github.com/pytorch/executorch/blob/main/exir/capture/_config.py#L40).
If added here, the pass will be part of the lowering process from backend
dialect to `ExecutorchProgram` (see the sketch at the end of this section).

Example: One such pass is `QuantFusion`. This pass takes a "canonical
quantization pattern", that is, "dequant - some_op - quant", and fuses this
pattern into a single backend-specific operator, that is,
`quantized_decomposed::some_op`. You can find more details
[here](./quantization-custom-quantization.md). Another, simpler example is
[here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20),
where we replace sym_size operators with ones that are understood by ExecuTorch.
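
As an illustration of the second registration point, here is a minimal sketch,
assuming `ExecutorchBackendConfig` is importable from `executorch.exir` and
reusing the hypothetical `ReplaceAddWithSub` pass from the Edge dialect section:

```python
from executorch.exir import ExecutorchBackendConfig

# Passes listed here run as part of lowering to an ExecuTorch program.
backend_config = ExecutorchBackendConfig(passes=[ReplaceAddWithSub()])

executorch_program = edge_dialect_program.to_executorch(backend_config)
```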

### Backend Dialect Operators

We provide a decorator `bind_pattern_to_op` to help users easily register their
backend operators into Export IR. This decorator takes:

* a `torch.Library` object, which indicates which library or namespace this
backend operator belongs to.
* a name or schema. If we have already defined the schema of the backend
operator in the `torch.Library` object, only a name is needed. Otherwise, we
can register the schema by passing in a schema string.

This decorator should be added to the pattern we are trying to match (and then
lower to this backend op) in the edge dialect. This way we register this
pattern as a `CompositeImplicitAutograd` kernel for this backend operator.

The operator can then be accessed/used from the passes. The
`CompositeImplicitAutograd` kernel makes sure:
1. There is no need for the user to write a (CPU) runnable kernel.
2. The `ExportProgram` remains retraceable. Once retraced, the backend operator
will be decomposed into the ATen ops used in the pattern.
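
For illustration, here is a minimal sketch of registering a hypothetical
`foo_namespace::linear_relu` backend operator with this decorator. The import
path of `bind_pattern_to_op` and the namespace, schema, and pattern below are
assumptions for the sketch, not part of the spec:

```python
import torch
from torch.library import Library
# Assumed import path for the decorator.
from executorch.exir.dialects.backend._ops import bind_pattern_to_op

# Hypothetical backend library/namespace.
lib = Library("foo_namespace", "DEF")

# The pattern (linear followed by relu) is registered as the
# CompositeImplicitAutograd kernel of the backend op, so retracing
# decomposes it back into the ATen ops of the pattern.
@bind_pattern_to_op(lib, "linear_relu(Tensor x, Tensor weight, Tensor bias) -> Tensor")
def linear_relu(x: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    return torch.nn.functional.relu(torch.nn.functional.linear(x, weight, bias))
```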

Unlike Edge dialect, which has a well-defined op set, backend dialect is
target-aware, so we allow users to use our API to register target-aware ops,
which are grouped by namespace. Here are some examples: `executorch_prims` are
ops used by the ExecuTorch runtime to perform operations on `SymInt`s.
`quantized_decomposed` are ops that fuse edge operators for quantization
purposes and are meaningful to targets that support quantization.

* `executorch_prims::add.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.add
  * backend: executor
* `executorch_prims::mul.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.mul
  * backend: executor
* `executorch_prims::sub.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.sub
  * backend: executor
* `executorch_prims::floordiv.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.floordiv
  * backend: executor
* `executorch_prims::gt.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.gt
  * backend: executor
* `executorch_prims::lt.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.lt
  * backend: executor
* `executorch_prims::ge.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.ge
  * backend: executor
* `executorch_prims::le.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.le
  * backend: executor
* `executorch_prims::eq.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.eq
  * backend: executor
* `quantized_decomposed::embedding_byte(Tensor weight, Tensor weight_scales, Tensor weight_zero_points, int weight_quant_min, int weight_quant_max, Tensor indices) -> Tensor`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization
* `quantized_decomposed::add(Tensor a, float a_scale, int a_zero_point, int a_quant_min, int a_quant_max, Tensor b, float b_scale, int b_zero_point, int b_quant_min, int b_quant_max, float out_scale, int out_zero_point, int out_quant_min, int out_quant_max) -> Tensor qc`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization
* `quantized_decomposed::add.scalar(Tensor qa, float a_scale, int a_zero_point, int a_quant_min, int a_quant_max, ScalarType a_dtype, Scalar b, float out_scale, int out_zero_point, int out_quant_min, int out_quant_max, ScalarType out_dtype) -> Tensor`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization
* `quantized_decomposed::add_relu(Tensor a, float a_scale, int a_zero_point, int a_quant_min, int a_quant_max, Tensor b, float b_scale, int b_zero_point, int b_quant_min, int b_quant_max, float out_scale, int out_zero_point, int out_quant_min, int out_quant_max) -> Tensor qc`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization
