
Commit 9870e11

larryliu0820 authored and facebook-github-bot committed
Add custom ops registration to examples
Summary:

## Context

I plan to add 2 (or 3) examples for different custom ops registration mechanisms. Users should be able to use any of these options to use their custom ops. A proper README.md will be added.

## Solution

For the first option, we support the traditional PyTorch op registration Python API. This requires users to write Python implementations of both the functional op and the out variant op, as demonstrated in this diff. Note that these ops are only registered into the PyTorch JIT runtime for EXIR to consume. We also use the buck2 target macro `executorch_generated_lib` to register custom ops into the Executorch runtime.

For the second option, we want to leverage the C++ kernel the user wrote for the Executorch runtime, treat it as a valid PyTorch op kernel, and register it into the PyTorch JIT runtime. This way we don't have to write any Python kernel. This can be done through the CMake build, by pulling in the PyTorch C++ dependency and then enabling ATen mode. This will be done once CMake diff D47927863 lands.

The third option will be the same as the first, but on the CMake build system. Note that CMake and Buck2 will then have different capabilities, because pulling the PyTorch C++ lib into Buck2 can't reuse the existing BUCK files.

Reviewed By: cccclai

Differential Revision: D48054313

fbshipit-source-id: 15fe77a4a69f3260b8fe09d7ce51b2f1e92cce68
1 parent 789f4ce commit 9870e11

File tree

8 files changed

+226
-0
lines changed


examples/custom_ops/README.md

Lines changed: 31 additions & 0 deletions
# Custom Operator Registration Examples (WIP)

This folder contains examples that register custom operators into PyTorch and register their kernels into the Executorch runtime.

## How to run

Prerequisite: finish the [setting up wiki](https://github.com/pytorch/executorch/blob/main/docs/website/docs/tutorials/00_setting_up_executorch.md).

Run:

```bash
bash test_custom_ops.sh
```

## AOT registration

To use custom ops in the Executorch AOT flow (EXIR), the first option is to register the custom ops into the PyTorch JIT runtime using the `torch.library` APIs.

See the example in `custom_ops_1.py`, where we register `my_ops::mul3` and `my_ops::mul3_out`. `my_ops` is the namespace, and it shows up when we use the operator, e.g. `torch.ops.my_ops.mul3.default`. For more information about PyTorch operators, check out [`pytorch/torch/_ops.py`](https://github.com/pytorch/pytorch/blob/main/torch/_ops.py).

Notice that we need both the functional variant and the out variant of a custom op, because EXIR performs memory planning on the out variant `my_ops::mul3_out`.

## C++ kernel registration

After the model is exported by EXIR, we need C++ implementations of these custom ops in order to run it. `custom_ops_1.cpp` is an example C++ kernel. Besides the kernel itself, we also need a way to bind the PyTorch op to it. This binding is specified in `custom_ops.yaml`:

```yaml
- func: my_ops::mul3.out(Tensor input, *, Tensor(a!) output) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: custom::mul3_out_impl # sub-namespace native:: is auto-added
```

For how to write these YAML entries, please refer to [`kernels/portable/README.md`](https://github.com/pytorch/executorch/blob/main/kernels/portable/README.md).
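The `torch.library` registration flow the README describes can be sketched standalone. This is a minimal sketch mirroring the names in `custom_ops_1.py`; the eager call at the end is only for illustration and is not part of the diff:

```python
import torch
from torch.library import Library, impl

# Define a new op schema under the "my_ops" namespace, as in custom_ops_1.py.
my_op_lib = Library("my_ops", "DEF")
my_op_lib.define("mul3(Tensor input) -> Tensor")


@impl(my_op_lib, "mul3", dispatch_key="CompositeExplicitAutograd")
def mul3_impl(a: torch.Tensor) -> torch.Tensor:
    # Kernel body: multiply the input by 3.
    return a * 3


# The registered op is reachable as torch.ops.<namespace>.<name>.
x = torch.tensor([1.0, 2.0])
y = torch.ops.my_ops.mul3(x)
print(y)  # tensor([3., 6.])
```

Once registered this way, the op behaves like any built-in PyTorch operator, which is what lets EXIR trace and consume it during export.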

examples/custom_ops/TARGETS

Lines changed: 8 additions & 0 deletions
```bzl
# Any targets that should be shared between fbcode and xplat must be defined in
# targets.bzl. This file can contain fbcode-only targets.

load(":targets.bzl", "define_common_targets")

oncall("executorch")

define_common_targets()
```

examples/custom_ops/custom_ops.yaml

Lines changed: 10 additions & 0 deletions
```yaml
# Copyright (c) Meta Platforms, Inc. and affiliates.
#
# See the kernels/portable/README.md for a description of the syntax used
# by this file.

# important to keep the namespace
- func: my_ops::mul3.out(Tensor input, *, Tensor(a!) output) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: custom::mul3_out_impl # sub-namespace native:: is auto-added
```
examples/custom_ops/custom_ops_1.cpp

Lines changed: 51 additions & 0 deletions
```cpp
/*
 * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under the BSD-style license found in the
 * LICENSE file in the root directory of this source tree.
 */

#include <executorch/runtime/kernel/kernel_includes.h>

namespace custom {
namespace native {

using exec_aten::ScalarType;
using exec_aten::Tensor;
using torch::executor::RuntimeContext;

namespace {
void check_preconditions(const Tensor& in, Tensor& out) {
  ET_CHECK_MSG(
      out.scalar_type() == ScalarType::Float,
      "Expected out tensor to have dtype Float, but got %hhd instead",
      out.scalar_type());
  ET_CHECK_MSG(
      in.scalar_type() == ScalarType::Float,
      "Expected in tensor to have dtype Float, but got %hhd instead",
      in.scalar_type());
  ET_CHECK_MSG(
      out.dim() == in.dim(),
      "Number of dims of out tensor is not compatible with inputs");
  ET_CHECK_MSG(
      out.numel() == in.numel(),
      "Number of elements of out tensor %zd is not compatible with inputs %zd",
      ssize_t(out.numel()),
      ssize_t(in.numel()));
}
} // namespace

// mul3.out(Tensor input, *, Tensor(a!) output) -> Tensor(a!)
Tensor& mul3_out_impl(RuntimeContext& ctx, const Tensor& in, Tensor& out) {
  (void)ctx;

  check_preconditions(in, out);
  float* out_data = out.mutable_data_ptr<float>();
  const float* in_data = in.const_data_ptr<float>();
  for (size_t out_idx = 0; out_idx < out.numel(); ++out_idx) {
    out_data[out_idx] = in_data[out_idx] * 3;
  }
  return out;
}
} // namespace native
} // namespace custom
```

examples/custom_ops/custom_ops_1.py

Lines changed: 51 additions & 0 deletions
```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Example showcasing how to register a custom operator through the torch.library API."""
import torch

from examples.export.export_example import export_to_ff
from torch.library import impl, Library

my_op_lib = Library("my_ops", "DEF")

# Register an operator that multiplies the input tensor by 3 and returns it.
my_op_lib.define("mul3(Tensor input) -> Tensor")  # should print 'mul3'


@impl(my_op_lib, "mul3", dispatch_key="CompositeExplicitAutograd")
def mul3_impl(a: torch.Tensor) -> torch.Tensor:
    return a * 3


# Register the out variant.
my_op_lib.define(
    "mul3.out(Tensor input, *, Tensor(a!) output) -> Tensor(a!)"
)  # should print 'mul3.out'


@impl(my_op_lib, "mul3.out", dispatch_key="CompositeExplicitAutograd")
def mul3_out_impl(a: torch.Tensor, *, out: torch.Tensor) -> torch.Tensor:
    out.copy_(a * 3)  # write into out without mutating the input tensor
    return out


# Example model.
class Model(torch.nn.Module):
    def forward(self, a):
        return torch.ops.my_ops.mul3.default(a)


def main():
    m = Model()
    input = torch.randn(2, 3)
    # Capture and lower.
    export_to_ff("custom_ops_1", m, (input,))


if __name__ == "__main__":
    main()
```
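In eager mode, the out variant registered above can be invoked through its overload name. This is a self-contained sketch (it re-registers the op so it runs on its own; note the keyword argument name `output` comes from the schema):

```python
import torch
from torch.library import Library, impl

lib = Library("my_ops", "DEF")
lib.define("mul3.out(Tensor input, *, Tensor(a!) output) -> Tensor(a!)")


@impl(lib, "mul3.out", dispatch_key="CompositeExplicitAutograd")
def mul3_out_impl(a: torch.Tensor, *, output: torch.Tensor) -> torch.Tensor:
    output.copy_(a * 3)  # fill the caller-provided output buffer
    return output


a = torch.tensor([1.0, 2.0, 3.0])
out = torch.empty_like(a)
# "out" here is the overload name from the schema "mul3.out(...)".
torch.ops.my_ops.mul3.out(a, output=out)
print(out)  # tensor([3., 6., 9.])
```

EXIR rewrites calls to the functional `mul3` into this out variant during memory planning, which is why both registrations are required.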

examples/custom_ops/targets.bzl

Lines changed: 51 additions & 0 deletions
```bzl
load("@fbsource//xplat/executorch/build:runtime_wrapper.bzl", "runtime")
load("@fbsource//xplat/executorch/codegen:codegen.bzl", "et_operator_library", "executorch_generated_lib")

def define_common_targets():
    """Defines targets that should be shared between fbcode and xplat.

    The directory containing this targets.bzl file should also contain both
    TARGETS and BUCK files that call this function.
    """
    runtime.export_file(
        name = "custom_ops.yaml",
        visibility = [
            "//executorch/...",
            "@EXECUTORCH_CLIENTS",
        ],
    )

    et_operator_library(
        name = "executorch_all_ops",
        include_all_operators = True,
        define_static_targets = True,
        visibility = [
            "//executorch/codegen/...",
            "@EXECUTORCH_CLIENTS",
        ],
    )

    runtime.cxx_library(
        name = "custom_kernel_lib",
        srcs = ["custom_ops_1.cpp"],
        deps = [
            "//executorch/runtime/kernel:kernel_includes",
        ],
        visibility = [
            "//executorch/...",
            "@EXECUTORCH_CLIENTS",
        ],
    )

    executorch_generated_lib(
        name = "generated_lib",
        deps = [
            ":executorch_all_ops",
            ":custom_kernel_lib",
        ],
        custom_ops_yaml_target = ":custom_ops.yaml",
        visibility = [
            "//executorch/...",
            "@EXECUTORCH_CLIENTS",
        ],
    )
```
examples/custom_ops/test_custom_ops.sh

Lines changed: 23 additions & 0 deletions

```bash
#!/bin/bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

# Test the end-to-end flow: use a custom operator in a PyTorch model, capture
# and export a model file with EXIR, then run the model with the
# `executor_runner` demo C++ binary.

test_custom_op_1() {
  echo 'Exporting custom_ops_1.pte'
  python3 -m examples.custom_ops.custom_ops_1
  # should save file custom_ops_1.pte

  echo 'Running executor_runner'
  buck2 run //fbcode/executorch/examples/executor_runner:executor_runner -- --model_path=./custom_ops_1.pte
  # should give correct result

  echo 'Removing custom_ops_1.pte'
  rm ./custom_ops_1.pte
}

test_custom_op_1
```
examples/executor_runner/targets.bzl

Lines changed: 1 addition & 0 deletions
```diff
@@ -17,6 +17,7 @@ def define_common_targets():
         "//executorch/extension/data_loader:file_data_loader",
         "//executorch/util:util",
         "//executorch/kernels/portable:generated_lib_all_ops",
+        "//executorch/examples/custom_ops:generated_lib",
     ],
     external_deps = [
         "gflags",
```
