
Commit 1cc8162

angelayi authored and facebook-github-bot committed

EXIR spec (#482)
Summary:
Pull Request resolved: #482
Export IR Spec here: D49829187
Differential Revision: D49602668
fbshipit-source-id: 6a90fdf4a095d4292e2f17a7992c58d6d89df95c
1 parent 578fdf3 commit 1cc8162

File tree

2 files changed: +301 −2 lines

docs/source/conf.py

Lines changed: 1 addition & 0 deletions

@@ -67,6 +67,7 @@
 myst_enable_extensions = [
     "colon_fence",
 ]
+myst_all_links_external = True

 myst_heading_anchors = 3

docs/source/ir-exir.md

Lines changed: 300 additions & 2 deletions

The previous placeholder content ("# EXIR" / "Title TBA") is replaced with the following document.

# Export IR Specification

Export IR is an intermediate representation (IR) for the result of
`torch.export`. To read more about the details of Export IR, please see this
[document](https://pytorch.org/docs/main/export.ir_spec.html).

The Exported IR is a specification that consists of the following parts:

1. A definition of the computation graph model.
2. A set of operators allowed in the graph.

A **dialect** is an Exported IR graph composed of the operations defined
below, but with additional properties (such as restrictions on the operator set
or metadata) that are meant for a specific purpose.

The EXIR dialects that currently exist are:

* [ATen Dialect](./ir-exir-aten-dialect.md)
* [Edge Dialect](./ir-exir-edge-dialect.md)
* [Backend Dialect](./ir-exir-backend-dialect.md)

These dialects represent the stages that a captured program goes through from
program capture to conversion into an executable format. For example, the
ExecuTorch compilation process starts by capturing a Python program into ATen
Dialect, then converts ATen Dialect to Edge Dialect, Edge to Backend, and
finally to a binary format for execution.
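
Below is a minimal sketch of that end-to-end flow. The toy `AddModule`, the example inputs, the output file name, and the decision to skip backend delegation are assumptions made for illustration.

```python
import torch
from torch.export import export
from executorch import exir

class AddModule(torch.nn.Module):  # toy module, for illustration only
    def forward(self, x, y):
        return x + y

example_inputs = (torch.randn(2, 2), torch.randn(2, 2))

# Program capture: eager PyTorch -> ATen Dialect.
aten_dialect_program = export(AddModule(), example_inputs)

# ATen Dialect -> Edge Dialect.
edge_program = exir.to_edge(aten_dialect_program)

# Edge Dialect (optionally lowered to a backend) -> ExecuTorch binary format.
executorch_program = edge_program.to_executorch()
with open("add.pte", "wb") as f:
    f.write(executorch_program.buffer)
```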

## ATen Dialect

### Properties

An ATen dialect graph is a valid Export IR graph with the following additional
properties:

1. All operators in `call_function` nodes are either ATen operators (in the
`torch.ops.aten` namespace), higher-order operators (like control flow
operators), or registered custom operators. A registered custom operator is
an operator registered into the current PyTorch eager mode runtime, usually
with a `TORCH_LIBRARY` call (which implies a schema). Details on how to register a
custom operator can be found
[here](https://docs.google.com/document/d/1_W62p8WJOQQUzPsJYa7s701JXt0qf2OfLub2sbkHOaU/edit#heading=h.3rgxk3v387wl).
2. Every operator must also have a meta kernel. A meta kernel is a
function that, given the shapes of the input tensors, can return the shape of the
output tensor (see the sketch after this list). Details on how to write a meta kernel can be found
[here](https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit#heading=h.64r4npvq0w0).
3. Input value types must be "Pytree-able". As a consequence, the output
types are also Pytree-able because all operator outputs are Pytree-able.
4. Ops in ATen dialect may use dynamic dtypes, implicit type
promotion, and implicit broadcasting of tensors.
5. All tensor memory formats are `torch.contiguous_format`.
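
The following is a rough sketch of properties 1 and 2: registering a custom operator together with a meta kernel. It uses `torch.library`, the Python counterpart of the C++ `TORCH_LIBRARY` macro; the `my_ops` namespace, the operator name, and the implementations are made up for this example.

```python
import torch
from torch.library import Library, impl

# Hypothetical namespace and operator, for illustration only.
my_lib = Library("my_ops", "DEF")
my_lib.define("scaled_add(Tensor a, Tensor b, float scale) -> Tensor")

@impl(my_lib, "scaled_add", "CompositeExplicitAutograd")
def scaled_add(a, b, scale):
    # Eager implementation used at runtime.
    return a + scale * b

@impl(my_lib, "scaled_add", "Meta")
def scaled_add_meta(a, b, scale):
    # Meta kernel: produces only the output shape/dtype, no real data.
    return torch.empty_like(a)

# The registered op can now appear in an ATen dialect graph as
# torch.ops.my_ops.scaled_add(x, y, 0.5).
```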

### Intent

This section describes what we envision ATen dialect is used for.

ATen dialect will be used as the entry point of the ExecuTorch compilation
pipeline; it is the first time an eager mode PyTorch program becomes an Exported
IR graph. At this stage, functionalization is performed, so all tensor
aliases are replaced with copies. Therefore, all tensors are converted to contiguous
format.

The goal of this dialect is to capture users' programs as faithfully as possible
(while remaining valid Exported IR). Registered custom operators that the user has called
in eager mode will be preserved as-is in ATen dialect. However, we should refrain
from adding custom ops to the graph via passes.

For now, the function of ATen dialect is to further lower to Edge dialect.
However, in the future we can see it becoming the common integration point for
other export use cases.

### ATen Operator Definition

The operator set definition can be found [here](./ir-ops-set-definition.md).

## Edge Dialect

### Properties

An Edge dialect graph is a valid Export IR graph with the following additional
properties:

1. All operators in OpCall nodes are either from a predefined operator set,
called **"Edge Operators"**, or are registered custom operators. An Edge operator is an
ATen operator with dtype specialization.
2. Input and output of the graph, as well as of every node, cannot be Scalar, i.e.
all scalar types (such as float, int) are converted to Tensor.

### Intent

This dialect is meant to introduce specializations that are useful for Edge
devices but not necessarily for general (server) export. However, we still
refrain from specializing further to each particular hardware target. In other words, we
don't want to introduce any new hardware-dependent concepts or data beyond
those already present in users' original Python programs.

## How to use

A GraphModule in Edge dialect is represented by the `torch.fx.GraphModule` Python class
in memory. To obtain such a class, one starts with a `torch.nn.Module`:

```python
import torch
from executorch import exir

class MyModule(torch.nn.Module):
    ...

a = MyModule()
tracing_inputs = (torch.rand(2, 2),)
aten_dialect_program = torch.export.export(a, tracing_inputs)
edge_dialect_program = exir.to_edge(aten_dialect_program)
```

At this point, user-defined graph transformations can be run through
`edge_dialect_program.transform(pass)`. Order matters. Note: if the custom pass
touches `node.target`, be aware that all of the `node.target`s at this stage
are "Edge ops" (more details below) and not torch ops as in the ATen dialect.
A tutorial on pass writing can be found
[here](./compiler-custom-compiler-passes.md). After all these passes are
executed, `to_edge()` will make sure the graph is still valid.
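
As a concrete illustration, here is a minimal sketch of such a pass, assuming the `ExportPass` base class from `executorch.exir.pass_base` and that `transform()` accepts a list of pass instances; the add-to-mul rewrite itself is a toy example.

```python
from executorch.exir.dialects._ops import ops as exir_ops
from executorch.exir.pass_base import ExportPass

class ReplaceAddWithMul(ExportPass):
    """Toy pass: rewrite every edge aten.add.Tensor call into aten.mul.Tensor."""

    def call_operator(self, op, args, kwargs, meta):
        # Node targets at this stage are Edge ops, not torch.ops.aten ops.
        if op != exir_ops.edge.aten.add.Tensor:
            return super().call_operator(op, args, kwargs, meta)
        return super().call_operator(exir_ops.edge.aten.mul.Tensor, args, kwargs, meta)

# transform() returns a new program with the passes applied, in order.
edge_dialect_program = edge_dialect_program.transform([ReplaceAddWithMul()])
```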

### Edge Operators

As mentioned before, an edge operator is an ATen core operator with type
specialization. This means an instance of the edge operator contains a set of
dtype constraints that describe all the tensor dtypes supported by both the
ExecuTorch runtime and their ATen kernels. These dtype constraints are expressed
in a DSL defined in
[edge.yaml](https://github.com/pytorch/executorch/blob/main/exir/dialects/edge/edge.yaml).
Here's an example of the dtype constraints:

```
- func: sigmoid
  namespace: edge
  inherits: aten::sigmoid
  type_alias:
    T0: [Bool, Byte, Char, Int, Long, Short]
    T1: [Double, Float]
    T2: [Float]
  type_constraint:
  - self: T0
    __ret_0: T2
  - self: T1
    __ret_0: T1
```

This says that if the `self` tensor has one of the types `Bool, Byte, Char, Int, Long, Short`, then the return tensor will be `Float`; if `self` is one of `Double, Float`, the return tensor will have the same dtype as `self`.

After these dtype constraints are collected and documented in edge.yaml, EXIR
consumes the file and loads the constraints into EXIR Edge operators. This
makes it convenient for developers to learn the supported dtypes of any argument
in an Edge op schema. For example, we can do:

```python
from executorch.exir.dialects._ops import ops as exir_ops  # import dialect ops
sigmoid = exir_ops.edge.aten.sigmoid.default
print(sigmoid._schema)
# aten::sigmoid(Tensor self) -> Tensor
self_arg = sigmoid._schema.arguments[0]
_return = sigmoid._schema.returns[0]

print(self_arg.allowed_types)
# {torch.float32, torch.int8, torch.float64, torch.int16, torch.int32, torch.int64, torch.uint8, torch.bool}

print(_return.allowed_types)
# {torch.float32, torch.float64}
```

These constraints are helpful for someone who wants to write a custom kernel for this operator. Also, inside EXIR we offer a validator that checks whether the graph still complies with these dtype constraints after custom transformations.

### Op Set (WIP)

Check out
[edge.yaml](https://github.com/pytorch/executorch/blob/main/exir/dialects/edge/edge.yaml)
for the complete list of operators having dtype constraints specified. We are
gradually expanding this operator set and aim to provide dtype constraints
for all core ATen ops.

## Backend Dialect

### Properties

Backend dialect is the name we give to the `ExportedProgram` in Edge dialect
after optional **target-specific** passes have run. The difference between backend
dialect and edge dialect is that backend dialect is target-aware and may contain
operators or submodules that are only meaningful to the target backend. Backend-specific
operators are the new components we may see in a backend dialect compared
with Edge dialect; they are a set of operators for the target backend.

Another property to notice is that tensor memory formats can be any
format (this is subject to change in the near future when we introduce dim order
to backend dialect).

### Intent

This dialect allows the introduction of operators that do not conform to the schema
defined in the canonical ATen operator set and do not show up in any of the
dialects above (ATen dialect and Edge dialect). Consider using backend
operators if your use case satisfies one or more of the following criteria:

1. Your backend provides a library that optimizes a certain operator that is
equivalent to a subgraph. E.g., linear_relu (equivalent to linear + relu),
which can be executed faster on a certain backend.
2. There's a need to retrace the graph module after it is already lowered to a
backend. When we retrace, backend operators can transform back to the original
subgraph (in ATen dialect), whereas a normal custom op cannot.
3. Your backend-specific operator doesn't have a generic CPU kernel, only a
kernel for a certain backend. Using a backend operator can work around this issue
by using the original subgraph as the default kernel, keeping the graph module
runnable.

### How to use

To lower edge ops to backend ops, a pass will perform pattern matching to
identify the edge ops of interest in the graph, and then replace them with
equivalent backend operators. There are two APIs to register such passes
(a sketch follows this list):

* `transform()`. An API on `ExportProgram` that allows users to provide custom
passes. Note that this is not guarded by any validator, so the soundness of the
program is not guaranteed.
* [`ExecutorchBackendConfig.passes`](https://github.com/pytorch/executorch/blob/main/exir/capture/_config.py#L40).
If added here, the pass will be part of the lowering process from backend
dialect to `ExecutorchProgram`.
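
As a minimal sketch of the second option, assuming a hypothetical pattern-matching pass class `FuseLinearReluPass` and that `to_executorch()` accepts an `ExecutorchBackendConfig`:

```python
from executorch.exir import ExecutorchBackendConfig

# FuseLinearReluPass is a hypothetical pass that rewrites "linear + relu"
# subgraphs into a backend-specific fused operator.
config = ExecutorchBackendConfig(passes=[FuseLinearReluPass()])

# The pass runs as part of the lowering from backend dialect to the
# ExecuTorch program format.
executorch_program = edge_dialect_program.to_executorch(config)
```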

Example: One such pass is `QuantFusion`. This pass takes a "canonical
quantization pattern", that is, "dequant - some_op - quant", and fuses this
pattern into a single backend-specific operator, that is,
`quantized_decomposed::some_op`. You can find more details
[here](./quantization-custom-quantization.md). Another, simpler example is
[here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20),
where we replace sym_size operators with ones that are understood by ExecuTorch.

### API

We provide a decorator `bind_pattern_to_op` to help users easily register their
backend operators into EXIR. This decorator takes:

* a `torch.Library` object, indicating which library or namespace this backend
operator belongs to.
* a name or schema. If we have already defined the schema of the backend operator in
the `torch.Library` object, only a name is needed. Otherwise, we can register
the schema by passing in a schema string.

This decorator should be added to the pattern we are trying to match (and then
lower to this backend op) on the edge dialect. This way we are registering this
pattern as a `CompositeImplicitAutograd` kernel for this backend operator (a sketch
is shown at the end of this section).

The operator can then be accessed/used from the passes. The `CompositeImplicitAutograd` kernel ensures:

1. No need for the user to write a (CPU) runnable kernel.
2. Retraceability of `ExportProgram`: once retraced, the backend
operator will be decomposed into the ATen ops used in the pattern.
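
A rough sketch of such a registration is shown below. The import path for `bind_pattern_to_op`, the `my_backend` namespace, and the `linear_relu` pattern are assumptions made for illustration.

```python
import torch
from torch.library import Library

# Assumed import path for the decorator described above.
from executorch.exir.dialects._ops import bind_pattern_to_op

# Hypothetical backend namespace.
my_backend_lib = Library("my_backend", "DEF")

# Passing a schema string registers both the schema and the pattern.
@bind_pattern_to_op(my_backend_lib, "linear_relu(Tensor x, Tensor weight, Tensor bias) -> Tensor")
def linear_relu(x, weight, bias):
    # The edge-dialect pattern this backend op stands for. Registered as a
    # CompositeImplicitAutograd kernel, so retracing decomposes
    # my_backend::linear_relu back into this ATen subgraph.
    return torch.nn.functional.relu(torch.nn.functional.linear(x, weight, bias))
```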

### Op Set

Unlike edge dialect, where we have a well-defined op set, backend dialect is
target-aware, so we allow users to use our API to register
target-aware ops, which are grouped by namespace. Here are some examples:
`executorch_prims` are ops used by the ExecuTorch runtime to perform
operations on `SymInt`s; `quantized_decomposed` are ops that fuse edge operators
for quantization purposes and are meaningful to targets that support
quantization.

* `executorch_prims::add.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.add
  * backend: executor
* `executorch_prims::mul.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.mul
  * backend: executor
* `executorch_prims::sub.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.sub
  * backend: executor
* `executorch_prims::floordiv.int(SymInt a, SymInt b) -> SymInt`
  * pattern: builtin.floordiv
  * backend: executor
* `executorch_prims::gt.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.gt
  * backend: executor
* `executorch_prims::lt.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.lt
  * backend: executor
* `executorch_prims::ge.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.ge
  * backend: executor
* `executorch_prims::le.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.le
  * backend: executor
* `executorch_prims::eq.int(SymInt a, SymInt b) -> bool`
  * pattern: builtin.eq
  * backend: executor
* `quantized_decomposed::embedding_byte(Tensor weight, Tensor weight_scales, Tensor weight_zero_points, int weight_quant_min, int weight_quant_max, Tensor indices) -> Tensor`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization
* `quantized_decomposed::add(Tensor a, float a_scale, int a_zero_point, int a_quant_min, int a_quant_max, Tensor b, float b_scale, int b_zero_point, int b_quant_min, int b_quant_max, float out_scale, int out_zero_point, int out_quant_min, int out_quant_max) -> Tensor qc`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization
* `quantized_decomposed::add.scalar(Tensor qa, float a_scale, int a_zero_point, int a_quant_min, int a_quant_max, ScalarType a_dtype, Scalar b, float out_scale, int out_zero_point, int out_quant_min, int out_quant_max, ScalarType out_dtype) -> Tensor`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization
* `quantized_decomposed::add_relu(Tensor a, float a_scale, int a_zero_point, int a_quant_min, int a_quant_max, Tensor b, float b_scale, int b_zero_point, int b_quant_min, int b_quant_max, float out_scale, int out_zero_point, int out_quant_min, int out_quant_max) -> Tensor qc`
  * pattern: [source](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py)
  * backend: quantization
