
Commit 0ae93a1

larryliu0820 authored and facebook-github-bot committed

Add custom kernel registration doc (#619)

Summary: Pull Request resolved: #619. As titled. https://docs.google.com/document/d/16dOVVa2gy8-pwmrfDRWUSZDynBuluTIlvTE7TvCJ8Ss/edit#heading=h.xlxamhdvb7w5

Reviewed By: cccclai

Differential Revision: D49925812

fbshipit-source-id: 9d9871186a8d392bb5e93996eb30475d90d2d365

1 parent 7af058d commit 0ae93a1

2 files changed: +186 −2 lines changed

# Overview

At the last stage of [ExecuTorch model exporting](./export-overview.md), we lower the operators in the dialect to the _out variants_ of the [core ATen operators](./ir-ops-set-definition.md). Then we serialize these operator names into the model artifact. During runtime execution, for each operator name we need to find the actual _kernels_, i.e., the C++ functions that do the heavy-lifting calculations and return the results.

The portable kernel library is the in-house default kernel library. It is easy to use and portable to most target backends, but it is not optimized for performance because it is not specialized for any particular target. Therefore we provide kernel registration APIs so that ExecuTorch users can easily register their own optimized kernels.

# Design Principles

**What do we support?** On the operator coverage side, the kernel registration APIs allow users to register kernels for all core ATen ops as well as custom ops, as long as the custom ops' schemas are specified.

Note that we also support _partial kernels_: for example, a kernel that only supports a subset of tensor dtypes and/or dim orders.

**Kernel contract**: kernels need to comply with the following requirements:

* Match the calling convention derived from the operator schema. The kernel registration API will generate headers for the custom kernels as references.
* Satisfy the dtype constraints defined in the edge dialect. For tensors with certain dtypes as arguments, the result of a custom kernel needs to match the expected dtypes. The constraints are available in edge dialect ops.
* Give correct results. We will provide a testing framework to automatically test the custom kernels.
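
To make the contract concrete, here is a minimal sketch of a partial `add.out` kernel in C++: it matches the out-variant calling convention, handles only float tensors (a partial kernel), and modifies and returns the `out` it was given. The header path and the runtime context argument are assumptions based on common ExecuTorch conventions, and the real `add.out` schema also carries a `Scalar alpha` argument that is omitted here; the headers generated by the registration API are the authoritative reference.

```cpp
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

using exec_aten::Tensor;

// Partial out-variant kernel: float dtype only, contiguous inputs assumed.
// Writes the element-wise sum of `a` and `b` into `out` and returns the
// same `out`, as required by the kernel contract.
Tensor& add_out(RuntimeContext& ctx, const Tensor& a, const Tensor& b, Tensor& out) {
  (void)ctx; // unused in this sketch
  const float* a_data = a.const_data_ptr<float>();
  const float* b_data = b.const_data_ptr<float>();
  float* out_data = out.mutable_data_ptr<float>();
  for (size_t i = 0; i < static_cast<size_t>(out.numel()); ++i) {
    out_data[i] = a_data[i] + b_data[i];
  }
  return out;
}

} // namespace native
} // namespace executor
} // namespace torch
```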

# High Level Architecture

![](./_static/img/kernel-library-custom-aten-kernel.png)

ExecuTorch users are asked to provide:

1. the custom kernel library with the C++ implementations
2. a yaml file associated with the library that describes the operators being implemented by this library. For partial kernels, the yaml file also contains information on the dtypes and dim orders supported by the kernel. More details in the API section.

## Workflow

At build time, the yaml files associated with kernel libraries will be passed to the _kernel resolver_ along with the model op info (see the selective build doc). The outcome is a mapping from combinations of operator names and tensor metadata to kernel symbols. Then the codegen tools will use this mapping to generate C++ bindings that connect the kernels to the ExecuTorch runtime. ExecuTorch users need to link this generated library into their application to use these kernels.

At static object initialization time, kernels will be registered into the ExecuTorch kernel registry.

At the runtime initialization stage, ExecuTorch will use the operator name and argument metadata as a key to look up the kernels. For example, with "aten::add.out" and inputs being float tensors with dim order (0, 1, 2, 3), ExecuTorch will go into the kernel registry and look up a kernel that matches the name and the input metadata.
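
Conceptually, that registry behaves like a map from (operator name, argument metadata) pairs to kernel function pointers. The sketch below is purely illustrative of this lookup idea; every name in it is hypothetical, and it is not the actual ExecuTorch registry implementation.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical stand-ins for illustration only; not the ExecuTorch API.
using KernelFn = void (*)(); // placeholder for the real kernel signature

struct KernelKey {
  std::string op_name;  // e.g. "aten::add.out"
  std::string arg_meta; // serialized dtypes/dim orders, e.g. "float;0,1,2,3"
  bool operator<(const KernelKey& rhs) const {
    return std::tie(op_name, arg_meta) < std::tie(rhs.op_name, rhs.arg_meta);
  }
};

static std::map<KernelKey, KernelFn> registry;

// At runtime init, the operator name and input metadata form the lookup key.
KernelFn find_kernel(const std::string& name, const std::string& meta) {
  auto it = registry.find(KernelKey{name, meta});
  return it == registry.end() ? nullptr : it->second;
}
```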

# APIs

There are two sets of APIs: yaml files that describe the kernel-to-operator mappings, and codegen tools that consume these mappings.

## Yaml Entry for Core ATen Op Out Variant

Top level attributes:

* `op` (if the operator appears in `native_functions.yaml`) or `func` for a custom operator. The value for this key needs to be the full operator name (including the overload name) for the `op` key, or a full operator schema (namespace, operator name, operator overload name and schema string) if we are describing a custom operator. For schema syntax please refer to this [instruction](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/README.md).
* `kernels`: defines the kernel information. It consists of `arg_meta` and `kernel_name`, which are bound together to describe "for input tensors with these metadata, use this kernel".
* `type_alias` (optional): gives aliases to possible dtype options. `T0: [Double, Float]` means `T0` can be either `Double` or `Float`.
* `dim_order_alias` (optional): similar to `type_alias`, gives names to possible dim order options.

Attributes under `kernels`:

* `arg_meta`: a list of "tensor arg name" entries. The values for these keys are the dtype and dim order aliases that are implemented by the corresponding `kernel_name`. A `null` value means the kernel will be used for all types of input.
* `kernel_name`: the expected name of the C++ function that will implement this operator. You can put whatever you want here, but you should follow the convention of replacing the `.` in the overload name with an underscore and lowercasing all characters. In this example, `add.out` uses the C++ function named `add_out`, and `add.Scalar_out` would become `add_scalar_out`, with a lowercase `s`. We support namespaces for kernels, but note that we insert a `native::` into the last level of the namespace. So `custom::add_out` in `kernel_name` will point to `custom::native::add_out`.
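
For example, a yaml entry with `kernel_name: custom::add_out` is expected to resolve to a C++ function declared as sketched below. The signature is shown schematically, under the same assumptions as the earlier kernel sketch; the generated headers define the exact one.

```cpp
#include <executorch/runtime/kernel/kernel_includes.h>

namespace custom {
namespace native {

using exec_aten::Tensor;
using torch::executor::RuntimeContext;

// kernel_name: custom::add_out resolves to custom::native::add_out,
// because codegen inserts `native::` into the last level of the namespace.
Tensor& add_out(RuntimeContext& ctx, const Tensor& a, const Tensor& b, Tensor& out);

} // namespace native
} // namespace custom
```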

Some examples of operator entries:

An out variant of a core ATen operator with a default kernel:

```yaml
- op: add.out
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::add_out
```

An ATen operator with a dtype/dim order specialized kernel (works for the `Double` dtype, and the dim order needs to be (0, 1, 2, 3)):

```yaml
- op: add.out
  type_alias:
    T0: [Double]
  dim_order_alias:
    D0: [[0, 1, 2, 3]]
  kernels:
    - arg_meta:
        self: [T0, D0]
        other: [T0, D0]
        out: [T0, D0]
      kernel_name: torch::executor::add_out
```

## Custom Ops Yaml Entry

For custom ops (the ones that are not part of the out variants of the core ATen opset) we need to specify the operator schema as well as a `kernels` section. So instead of `op` we use `func` with the operator schema. As an example, here's a yaml entry for a custom op:

```yaml
- func: allclose.out(Tensor self, Tensor other, float rtol=1e-05, float atol=1e-08, bool equal_nan=False, bool dummy_param=False, *, Tensor(a!) out) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::allclose_out
```

The `kernels` section is the same as the one defined for core ATen ops. For the operator schema, we are reusing the DSL defined in [aten/src/ATen/native/README.md](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/README.md), with a few differences:

### Out variants only

ExecuTorch only supports out-style operators, where:

* The caller provides the output Tensor or Tensor list in the final position, with the name `out`.
* The C++ function modifies and returns the same `out` argument.
* If the return type in the YAML file is `()` (which maps to `void`), the C++ function should still modify `out` but does not need to return anything.
* The `out` argument must be keyword-only, which means it needs to follow an argument named `*`, as in the `allclose.out` schema above.
* Conventionally, these out operators are named using the pattern `<name>.out` or `<name>.<overload>_out`.

Since all output values are returned via an `out` parameter, ExecuTorch ignores the actual C++ function return value. But, to be consistent, functions should always return `out` when the return type is non-`void`.

### Can only return `Tensor` or `()`

ExecuTorch only supports operators that return a single `Tensor`, or the unit type `()` (which maps to `void`). It does not support returning any other types, including lists, optionals, tuples, or scalars like `bool`.

### Supported argument types

ExecuTorch does not support all of the argument types that core PyTorch supports. See [this spreadsheet](https://docs.google.com/spreadsheets/d/1uArc0r1Yq1QSeyRJZKzZ8Wkz0eS9TsM39ghmMAZCXDA/edit#gid=0) for the list of supported and unsupported types.
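
Putting these schema rules together, below is a hedged sketch of a C++ kernel for the `allclose.out` entry shown earlier. The argument-type mapping (schema `float` to C++ `double`, `bool` to `bool`) follows the DSL conventions, but the exact signature and header path are assumptions, with the generated header being the source of truth; the element-wise logic is a simplification that assumes float inputs and a single-element bool `out` tensor.

```cpp
#include <cmath>
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

using exec_aten::Tensor;

// allclose.out(Tensor self, Tensor other, float rtol=1e-05, float atol=1e-08,
//              bool equal_nan=False, bool dummy_param=False, *,
//              Tensor(a!) out) -> Tensor(a!)
Tensor& allclose_out(
    const Tensor& self,
    const Tensor& other,
    double rtol,
    double atol,
    bool equal_nan,
    bool dummy_param,
    Tensor& out) {
  (void)dummy_param; // present only to exercise the schema
  const float* a = self.const_data_ptr<float>();
  const float* b = other.const_data_ptr<float>();
  bool result = true;
  for (size_t i = 0; i < static_cast<size_t>(self.numel()); ++i) {
    if (std::isnan(a[i]) || std::isnan(b[i])) {
      // NaNs compare equal only when both are NaN and equal_nan is set.
      result = result && equal_nan && std::isnan(a[i]) && std::isnan(b[i]);
    } else if (std::abs(a[i] - b[i]) > atol + rtol * std::abs(b[i])) {
      result = false;
    }
  }
  // Single-element bool tensor; the same `out` is returned per the contract.
  out.mutable_data_ptr<bool>()[0] = result;
  return out;
}

} // namespace native
} // namespace executor
} // namespace torch
```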

## Build Tool Macros

We provide build-time macros to help users build their kernel registration library. Each macro takes the yaml file describing the kernel library as well as the model operator metadata, and packages the generated C++ bindings into a C++ library. The macros are available in both CMake and Buck2.

### CMake

`generate_bindings_for_kernels(functions_yaml, custom_ops_yaml)` takes a yaml file for core ATen op out variants as well as a yaml file for custom ops and generates C++ bindings for kernel registration. It also depends on the selective build artifact generated by `gen_selected_ops()`; see the selective build doc for more information. Then `gen_operators_lib` will package those bindings into a C++ library. As an example:

```cmake
# SELECT_OPS_LIST: aten::add.out,aten::mm.out
gen_selected_ops("" "${SELECT_OPS_LIST}" "")

# Look for functions.yaml associated with the portable library and generate
# C++ bindings.
generate_bindings_for_kernels(${EXECUTORCH_ROOT}/kernels/portable/functions.yaml "")

# Prepare a C++ library called "generated_lib", with _kernel_lib being the
# portable library; executorch is a dependency of it.
gen_operators_lib("generated_lib" ${_kernel_lib} executorch)

# Link "generated_lib" into the application:
target_link_libraries(executorch_binary generated_lib)
```

### Buck2

`executorch_generated_lib` is the macro that takes the yaml files and depends on the selective build macro `et_operator_library`. For example:

```python
# Yaml file for the kernel library
export_file(
    name = "functions.yaml",
)

# Kernel library
cxx_library(
    name = "add_kernel",
    srcs = ["add.cpp"],
)

# Selective build artifact; it allows all operators to be registered
et_operator_library(
    name = "all_ops",
    include_all_ops = True,  # select all ops in functions.yaml
)

# Prepare a generated_lib
executorch_generated_lib(
    name = "generated_lib",
    functions_yaml_target = ":functions.yaml",
    deps = [
        ":all_ops",
        ":add_kernel",
    ],
)

# Link generated_lib into the ExecuTorch binary
cxx_binary(
    name = "executorch_bin",
    deps = [
        ":generated_lib",
    ],
)
```
