Commit 73ad1fb

larryliu0820 authored and facebook-github-bot committed

Update custom kernel registration API

Summary: As titled

Reviewed By: lucylq

Differential Revision: D56532035

1 parent b669056


docs/source/kernel-library-custom-aten-kernel.md

Lines changed: 103 additions & 1 deletion
@@ -86,10 +86,88 @@ ATen operator with a dtype/dim order specialized kernel (works for `Double` dtyp
      kernel_name: torch::executor::add_out
```
### Custom Ops C++ API

For a custom kernel that implements a custom operator, we provide two ways to register it into the ExecuTorch runtime:

1. Using the `EXECUTORCH_LIBRARY` and `WRAP_TO_ATEN` C++ macros.
2. Using `functions.yaml` and codegen'd C++ libraries.
The first option requires C++17 and doesn't have selective build support yet, but it's faster to adopt than the second option, which requires yaml authoring and build-system tweaking.

The first option is particularly suitable for fast prototyping but can also be used in production.

Similar to `TORCH_LIBRARY`, `EXECUTORCH_LIBRARY` takes the operator name and the C++ function name and registers them into the ExecuTorch runtime.
#### Prepare custom kernel implementation

Define your custom operator schema for both the functional variant (used in AOT compilation) and the out variant (used in the ExecuTorch runtime). The schema needs to follow the PyTorch ATen convention (see `native_functions.yaml`). For example:
```yaml
custom_linear(Tensor weight, Tensor input, Tensor? bias) -> Tensor
custom_linear.out(Tensor weight, Tensor input, Tensor? bias, *, Tensor(a!) out) -> Tensor(a!)
```
Then write your custom kernel according to the schema using ExecuTorch types, along with the APIs to register it into the ExecuTorch runtime:

```c++
// custom_linear.h / custom_linear.cpp
#include <executorch/runtime/kernel/kernel_includes.h>

Tensor& custom_linear_out(const Tensor& weight, const Tensor& input, optional<Tensor> bias, Tensor& out) {
  // do the actual computation here and write the result into `out`
  return out;
}
```
#### Use a C++ macro to register it into PyTorch & ExecuTorch

Append the following line to the example above:

```c++
// custom_linear.h / custom_linear.cpp
// opset namespace: myop
EXECUTORCH_LIBRARY(myop, "custom_linear.out", custom_linear_out);
```
Now we need to write a wrapper for this op so that it shows up in PyTorch; don't worry, we don't need to rewrite the kernel. Create a separate .cpp file for this purpose:

```c++
// custom_linear_pytorch.cpp
#include "custom_linear.h"
#include <torch/library.h>

at::Tensor custom_linear(const at::Tensor& weight, const at::Tensor& input, std::optional<at::Tensor> bias) {
  // initialize out
  at::Tensor out = at::empty({weight.size(1), input.size(1)});
  // wrap the out-variant kernel from custom_linear.cpp into an ATen kernel
  WRAP_TO_ATEN(custom_linear_out, 3)(weight, input, bias, out);
  return out;
}

// standard API to register ops into PyTorch
TORCH_LIBRARY(myop, m) {
  m.def("custom_linear(Tensor weight, Tensor input, Tensor? bias) -> Tensor", custom_linear);
  m.def("custom_linear.out(Tensor weight, Tensor input, Tensor? bias, *, Tensor(a!) out) -> Tensor(a!)", WRAP_TO_ATEN(custom_linear_out, 3));
}
```
#### Compile and link the custom kernel

Link it into the ExecuTorch runtime: in the `CMakeLists.txt` that builds the binary/application, we just need to add custom_linear.h/cpp to the binary target. We can also build a dynamically loaded library (.so or .dylib) and link against that.
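A minimal sketch of what that could look like, assuming an application target named `executorch_binary` and the `executorch` library target from the ExecuTorch build (both names are illustrative and depend on your setup):

```cmake
# Compile the custom kernel directly into the application binary.
add_executable(executorch_binary main.cpp custom_linear.cpp)
target_link_libraries(executorch_binary PRIVATE executorch)

# Or build it as a shared library that can be linked or dynamically loaded.
add_library(custom_linear_kernel SHARED custom_linear.cpp)
target_link_libraries(custom_linear_kernel PRIVATE executorch)
```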
Link it into the PyTorch runtime: we need to package custom_linear.h, custom_linear.cpp and custom_linear_pytorch.cpp into a dynamically loaded library (.so or .dylib) and load it into our Python environment. One way of doing this is:
```python
import torch

# Load the shared library that registers the custom op (.so on Linux, .dylib on macOS).
torch.ops.load_library("libcustom_linear.so")

# Now we have access to the custom op, backed by the kernel implemented in custom_linear.cpp.
op = torch.ops.myop.custom_linear.default
```
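One possible way to produce that library in the first place is again CMake, this time also linking against the CMake config shipped with the `torch` Python package (library and target names here are illustrative, not prescribed by the docs):

```cmake
# Build the kernel plus its PyTorch wrapper into a loadable shared library.
find_package(Torch REQUIRED)

add_library(custom_linear SHARED custom_linear.cpp custom_linear_pytorch.cpp)
target_link_libraries(custom_linear PRIVATE executorch "${TORCH_LIBRARIES}")
target_compile_features(custom_linear PRIVATE cxx_std_17)
```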
### Custom Ops Yaml Entry

As mentioned above, this option provides more support in terms of selective build and features such as merging operator libraries.

First we need to specify the operator schema as well as a `kernel` section. For custom ops (the ones that are not part of the out variants of the core ATen opset), instead of `op` we use `func` with the operator schema. As an example, here's a yaml entry for a custom op:
```yaml
- func: allclose.out(Tensor self, Tensor other, float rtol=1e-05, float atol=1e-08, bool equal_nan=False, bool dummy_param=False, *, Tensor(a!) out) -> Tensor(a!)
  kernels:
@@ -159,6 +237,30 @@ target_link_libraries(executorch_binary generated_lib)
```

We also provide the ability to merge two yaml files, given a precedence. `merge_yaml(FUNCTIONS_YAML functions_yaml FALLBACK_YAML fallback_yaml OUTPUT_DIR out_dir)` merges functions_yaml and fallback_yaml into a single yaml. If there are duplicate entries in functions_yaml and fallback_yaml, this macro will always take the one in functions_yaml.
Example:

```yaml
# functions.yaml
- op: add.out
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::opt_add_out
```
And our fallback:

```yaml
# fallback.yaml
- op: add.out
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::add_out
```
The merged yaml will have the entry from functions.yaml (`kernel_name: torch::executor::opt_add_out`).
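Invoking the macro from CMake could look like this sketch (the macro signature is as given above; the paths are illustrative):

```cmake
# Merge the two yaml files; on duplicate entries, functions.yaml takes precedence.
merge_yaml(
  FUNCTIONS_YAML ${CMAKE_CURRENT_SOURCE_DIR}/functions.yaml
  FALLBACK_YAML ${CMAKE_CURRENT_SOURCE_DIR}/fallback.yaml
  OUTPUT_DIR ${CMAKE_BINARY_DIR}/merged_yaml
)
```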
#### Buck2

`executorch_generated_lib` is the macro that takes the yaml files and depends on the selective build macro `et_operator_library`. For an example:
