
Commit 2fc4e7d

Initial commit
fbshipit-source-id: 73359312fe09bec9834c12f3564d3821cd93050f

875 files changed: +159,035 -0 lines changed


.clang-tidy

Lines changed: 10 additions & 0 deletions
```
---
# NOTE there must be no spaces before the '-', so put the comma last.
InheritParentConfig: true
Checks: '
-facebook-hte-BadMemberName,
-facebook-hte-NullableReturn,
'
AnalyzeTemporaryDtors: false
CheckOptions:
...
```

README.md

Lines changed: 115 additions & 0 deletions
# executorch
A unified ML software stack within the PyTorch platform for edge devices. It defines new compiler entry points as well as a state-of-the-art runtime.

https://fburl.com/executorch

## Why Executorch?
Compared to the legacy Lite Interpreter, there are some major benefits:
* Performance wins compared to Lite Interpreter
  * Faster (orders of magnitude lower framework tax in both [DSP](https://fb.workplace.com/notes/156263446923296) and [CPU](https://fb.workplace.com/notes/821255839309664))
  * Much smaller binary size, [~1.5 MB vs. ~30 KB without operators](https://docs.google.com/document/d/11_QzIO1TEaRtLIcX4ubVzx-sUl2RPUm9Iwbn2Kt1ce4/edit#heading=h.7xrtrf77n4w5)
  * Smaller memory footprint, because we do ahead-of-time memory planning in ExecuTorch and also have clear, granular control over where the runtime allocations are done.
* Long-term alignment with the direction of PyTorch infrastructure
  * Lite Interpreter relies on TorchScript, which is being phased out; ExecuTorch is the planned replacement for Lite Interpreter.
* Model Authoring & Productivity gains
  * More and better-defined entry points to perform model-, device-, and/or use-case-specific optimizations (e.g. better backend delegation, user-defined compiler transformations, default or user-defined memory planning, etc.)
  * Ability to lower constructs like dynamic control flow to run on device.

## Meta Internal Users
See the [Using PyTorch > Executorch](https://www.internalfb.com/intern/wiki/PyTorch/Using_PyTorch/Executorch/)
wiki for pointers to internal workplace groups, how-tos, and other resources.
## Docs
* [Executorch stack diagram](https://docs.google.com/drawings/d/1bBIbG6YDIjdx8emS_6K23YM6WyRkKVpPt-26nznxroU/edit)
* [High-level design doc](https://docs.google.com/document/d/1Z12w6-KtwoFDh781LQAbfwUdEZw9cwTCmz5BmKuS6U8/edit#)
* Planning docs
  * H2 2022 [roadmap](https://fburl.com/executorch-plan)
  * H1 2022 [roadmap](https://fburl.com/executorch-plan-h12022), [summary runtime](https://fb.workplace.com/notes/1023022781732209), [summary EXIR](https://fb.workplace.com/notes/1094704288071438)

## Better Engineering
* [Coding guidelines](https://docs.google.com/document/d/1RERjvvUSNNQ_gysD-kkHhvbWyfCAaXk7pes9ZdZ1kqM/edit)
* [BE Tasks](https://www.internalfb.com/intern/taskgraph/?q=5567456559966061) -- Please add "[executorch][BE]" in the task title

## Model Migration
* [Model inventory onboarding guide](https://docs.google.com/document/d/1ofoKUvufDFZdZdEYQ1jgTCsuNbSLspDsahgiHhWncCY/edit)
* [End-to-end model testing](https://docs.google.com/document/d/1AeLlSgwhe9Gnj-44kIYv9iyLWdpb6ASF_epr0q4ey5s/edit)

## Design goals
* Minimal binary size (< 50 KB not including kernels)
* Minimal framework tax: loading the program, initializing the executor, kernel and
  backend-delegate dispatch, runtime memory utilization
* Portable (cross-compile across many toolchains)
* Executes ATen kernels (or ATen custom kernels)
* Executes custom op kernels
* Supports inter-op asynchronous execution
* Supports static memory allocation (heapless)
* Supports custom allocation across memory hierarchies
* Supports the control flow needed by models
* Allows selective build of kernels
* Allows backend delegation with a lightweight interface

## Terminology
### ATen mode
ATen mode uses the ATen (PyTorch core) implementation of Tensor (`at::Tensor`)
along with related types (ScalarType, etc.)
* `at::Tensor` is big and complex, and often allocates memory with new/malloc
* The ATen kernels, which rely on the full `at::Tensor` API, are usable in this
  configuration
* Those kernels also tend to do dynamic memory allocation, and often have extra
  flexibility (and thus overhead) to handle things not needed by mobile/embedded
  clients: e.g., CUDA support, sparse tensor support, dtype promotion

### Lean mode
Lean mode uses Executorch's smaller `torch::executor::Tensor` (aka ETensor)
implementation, along with related types (`torch::executor::ScalarType`, etc.)
* ETensor's API is a source-compatible subset of `at::Tensor`. Code that is
  written against ETensor can also build against `at::Tensor`.
* "Lean mode kernels" are any operator implementations that are written to be
  compatible with ETensor. That means they can also build against
  `at::Tensor` if desired, and be used in the same model as ATen kernels.
* ETensor does not own or allocate memory on its own
* (TODO(T133200526): NOTE: Dynamic shapes are not yet supported. Remove this
  warning when they are.) To support dynamic shapes, kernels can allocate
  Tensor data using the MemoryAllocator provided by the client.
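
To illustrate the source-compatible-subset idea, here is a minimal, hypothetical sketch. The `EXECUTORCH_ATEN_MODE` flag, the alias name, the include paths, and the assumption that both tensor types expose `numel()` and `data_ptr<T>()` are illustrative only; they are not the actual build switch used by this commit.

```
// Hypothetical sketch: one kernel body that builds in either mode.
// Macro name and include paths are assumptions, for illustration only.
#ifdef EXECUTORCH_ATEN_MODE
#include <ATen/ATen.h>
using TensorType = at::Tensor; // ATen mode: the full PyTorch core tensor
#else
#include <executorch/core/values/Evalue.h> // assumed lean-mode tensor header
using TensorType = torch::executor::Tensor; // lean mode: ETensor
#endif

// Written only against the shared subset of the Tensor API, so the same
// source compiles against either type (assuming numel()/data_ptr<T>() are
// part of that subset).
void scale_in_place(TensorType& t, float factor) {
  float* data = t.data_ptr<float>();
  const auto n = t.numel();
  for (decltype(n) i = 0; i < n; ++i) {
    data[i] *= factor;
  }
}
```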
### Portable kernels
See [//executorch/kernels/portable/README.md](portable/README.md) for technical details.

Portable kernels, which live under `//executorch/kernels/portable`, are:
* Lean mode kernels
* Compatible with ATen operator signatures
* Written in portable C++ so that they can build for any target
* Written as reference implementations, prioritizing clarity and simplicity
  over optimization
* Generally much smaller in code size than ATen kernels
* Written to avoid dynamically allocating memory using new/malloc
* (TODO(T133200526): NOTE: Dynamic shapes are not yet supported. Remove this
  warning when they are.) To support dynamic shapes, some kernels may allocate
  Tensor data using the MemoryAllocator provided by the client.
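
As a hedged illustration of those properties (the exact signature convention and tensor accessors are assumptions, not taken from the kernel library), an out-variant, allocation-free reference kernel might look roughly like this:

```
// Hypothetical portable-style kernel sketch: out-variant, ATen-compatible
// signature shape, plain C++, no dynamic allocation. Accessor names and the
// simplified parameter list are assumptions for illustration only.
namespace torch {
namespace executor {

Tensor& add_out(const Tensor& a, const Tensor& b, Tensor& out) {
  // Reference-style implementation: assumes the memory planner has already
  // sized `out`, and that all three tensors are float and the same shape.
  const float* a_data = a.data_ptr<float>();
  const float* b_data = b.data_ptr<float>();
  float* out_data = out.data_ptr<float>();
  const auto n = out.numel();
  for (decltype(n) i = 0; i < n; ++i) {
    out_data[i] = a_data[i] + b_data[i];
  }
  return out;
}

} // namespace executor
} // namespace torch
```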
## Local tests
### General tests
```
buck2 test fbcode//executorch/...
```
### Run a model in lean mode
* Uses the lean Executorch `Tensor` class and related types
* Uses the kernels under `//executorch/kernels/portable` instead of the ATen kernels
```
buck2 run fbcode//executorch/test:executor_runner -- --model_path=fbcode/executorch/test/models/linear_out.ff
```
### Run a model in ATen mode
* Uses the ATen tensor instead of the lean Executorch `Tensor`, so that all ATen kernels can be leveraged
* Note that there can be a significant size regression in ATen mode
```
buck2 run fbcode//executorch/test:executor_runner_aten -- --model_path=fbcode/executorch/test/models/linear_out.ff
```

## Special build modes
### Android/mobile builds
In xplat:
```
buck2 build @fbandroid/mode/opt @fbandroid/mode/ndk_libcxx -c user.ndk_cxxflags="-frtti -fexceptions" fbsource//xplat/executorch/test:executor_runner
```
### ARVR builds
In xplat:
```
buck2 build @arvr/mode/android/linux/opt-stripped -c ndk.custom_libcxx=false fbsource//xplat/executorch/test:executor_runner
```
backends/backend.cpp

Lines changed: 52 additions & 0 deletions
```
#include <executorch/backends/backend.h>
#include <executorch/core/Assert.h>

namespace torch {
namespace executor {

PyTorchBackendInterface::~PyTorchBackendInterface(){};

// Task t128866626: Remove global static variables.
// We want to be able to run multiple Executor instances
// and having a global registration isn't a viable solution
// in the long term.
BackendRegistry& getBackendRegistry();
BackendRegistry& getBackendRegistry() {
  static BackendRegistry backend_reg;
  return backend_reg;
}

PyTorchBackendInterface* get_backend_class(const char* name) {
  return getBackendRegistry().get_backend_class(name);
}

PyTorchBackendInterface* BackendRegistry::get_backend_class(const char* name) {
  for (size_t idx = 0; idx < registrationTableSize_; idx++) {
    Backend backend = backend_table_[idx];
    if (strcmp(backend.name_, name) == 0) {
      return backend.interface_ptr_;
    }
  }
  return nullptr;
}

Error register_backend(const Backend& backend) {
  return getBackendRegistry().register_backend(backend);
}

Error BackendRegistry::register_backend(const Backend& backend) {
  if (registrationTableSize_ >= kRegistrationTableMaxSize) {
    return Error::Internal;
  }

  // Check if the name already exists in the table
  if (this->get_backend_class(backend.name_) != nullptr) {
    return Error::InvalidArgument;
  }

  backend_table_[registrationTableSize_++] = backend;
  return Error::Ok;
}

} // namespace executor
} // namespace torch
```

backends/backend.h

Lines changed: 159 additions & 0 deletions
```
#pragma once

#include <cstring>

#include <executorch/compiler/Compiler.h>
#include <executorch/core/ArrayRef.h>
#include <executorch/core/Error.h>
#include <executorch/core/FreeableBuffer.h>
#include <executorch/core/Result.h>
#include <executorch/core/values/Evalue.h>
#include <executorch/executor/MemoryAllocator.h>

namespace torch {
namespace executor {

struct SizedBuffer {
  void* buffer;
  size_t nbytes; // number of bytes of buffer
};

struct CompileSpec {
  const char* key; // spec key
  SizedBuffer value; // spec value
};

/**
 * An opaque handle managed by a backend. Typically points to a backend-private
 * class/struct.
 */
using DelegateHandle = void;

class PyTorchBackendInterface {
 public:
  virtual ~PyTorchBackendInterface() = 0;

  /**
   * Returns true if the backend is available to process delegation calls.
   */
  virtual bool is_available() const = 0;

  /**
   * Responsible for further processing (compiling/transforming/optimizing) the
   * compiled unit that was produced ahead-of-time, as well as performing any
   * backend initialization to ready it for execution. This method is called
   * every time the PyTorch program is initialized. Consequently, this is the
   * place to perform any backend initialization as well as transformations,
   * optimizations, and even compilation that depend on the target device. As
   * such, it is strongly encouraged to push as much processing as possible to
   * the ahead-of-time step.
   *
   * @param[in] processed An opaque (to PyTorch) compiled unit from the
   *     preprocessor. Can contain anything the backend needs to execute the
   *     equivalent semantics of the passed-in Module and its method. Often
   *     passed unmodified to `execute()` as a `DelegateHandle`, unless it
   *     needs further processing at init time to be fully executable. If the
   *     data is not needed after init(), calling processed->Free() can
   *     reclaim its memory.
   * @param[in] compile_specs The exact same compiler specification that
   *     was used ahead-of-time to produce `processed`.
   * @param[in] memory_allocator The allocator to allocate from if necessary.
   *
   * @returns On success, an opaque handle representing the method
   *     implemented by the delegate. This handle is passed to `execute()` and
   *     `destroy()`, and the memory it points to is owned by the backend.
   *     Typically points to a backend-private class/struct.
   * @returns On error, a value other than Error::Ok.
   */
  __ET_NODISCARD virtual Result<DelegateHandle*> init(
      FreeableBuffer* processed,
      ArrayRef<CompileSpec> compile_specs,
      MemoryAllocator* memory_allocator) const = 0;

  /**
   * Responsible for executing the given method's handle, as it was produced
   * by compile.
   *
   * @param[in] handle An opaque handle returned by `init()`. Usually a backend
   *     executable unit. This executable unit should be ready to execute the
   *     delegate blobs.
   * @param[in] args The method's inputs and outputs.
   * @retval Error::Ok if successful.
   */
  __ET_NODISCARD virtual Error execute(DelegateHandle* handle, EValue** args)
      const = 0;

  /**
   * Responsible for destroying a handle, if required by the backend. For
   * example, resources associated with the handle may need to be released.
   * This method is called when the execution plan is destroyed (i.e., the
   * program has reached the end of its lifespan).
   *
   * @param[in] handle The handle to be destroyed. An opaque handle returned by
   *     `init()`.
   */
  virtual void destroy(__ET_UNUSED DelegateHandle* handle) const {}
};

struct Backend {
  const char* name_;
  PyTorchBackendInterface* interface_ptr_;
};

// The max number of backends that can be registered in an app. It's
// hard-coded to 16 because we don't expect a system to have more than 16
// backends. Each table element holds two pointers, represented by the Backend
// struct, so the memory overhead of this table is minimal (only a few bytes).
constexpr size_t kRegistrationTableMaxSize = 16;

class BackendRegistry {
 public:
  BackendRegistry() : registrationTableSize_(0) {}

  /**
   * Registers the Backend object (i.e. string name and PyTorchBackendInterface
   * pair) so that it can be called by name at runtime.
   *
   * @param[in] backend The Backend object (name and interface) to register.
   * @retval Error code representing whether registration was successful.
   */
  __ET_NODISCARD Error register_backend(const Backend& backend);

  /**
   * Returns the corresponding object pointer for a given string name.
   * The mapping is populated using the register_backend method.
   *
   * @param[in] name Name of the user-defined backend delegate.
   * @retval Pointer to the appropriate object that implements
   *     PyTorchBackendInterface. Nullptr if it can't find anything
   *     with the given name.
   */
  PyTorchBackendInterface* get_backend_class(const char* name);

 private:
  Backend backend_table_[kRegistrationTableMaxSize];
  size_t registrationTableSize_;
};

/**
 * Returns the corresponding object pointer for a given string name.
 * The mapping is populated using the register_backend method.
 *
 * @param[in] name Name of the user-defined backend delegate.
 * @retval Pointer to the appropriate object that implements
 *     PyTorchBackendInterface. Nullptr if it can't find anything
 *     with the given name.
 */
PyTorchBackendInterface* get_backend_class(const char* name);

/**
 * Registers the Backend object (i.e. string name and PyTorchBackendInterface
 * pair) so that it can be called by name at runtime.
 *
 * @param[in] backend Backend object
 * @retval Error code representing whether registration was successful.
 */
__ET_NODISCARD Error register_backend(const Backend& backend);

} // namespace executor
} // namespace torch
```
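
To make the interface above concrete, here is a minimal sketch of a backend that implements `PyTorchBackendInterface` and registers itself with `register_backend()`. The class name, the choice to reuse the preprocessed buffer as the delegate handle, and the assumption that `Result<DelegateHandle*>` is constructible directly from a pointer are illustrative assumptions, not part of this commit.

```
// Hypothetical example backend; names and behavior are illustrative only.
#include <executorch/backends/backend.h>

namespace torch {
namespace executor {

class EchoBackend final : public PyTorchBackendInterface {
 public:
  bool is_available() const override {
    return true;
  }

  // Reuse the preprocessed blob as the handle (assumes Result<T> can be
  // constructed from a T). A real backend would parse/compile `processed`
  // here, and could call processed->Free() once the data is no longer needed.
  __ET_NODISCARD Result<DelegateHandle*> init(
      FreeableBuffer* processed,
      ArrayRef<CompileSpec> compile_specs,
      MemoryAllocator* memory_allocator) const override {
    (void)compile_specs;
    (void)memory_allocator;
    return static_cast<DelegateHandle*>(processed);
  }

  // A real backend would run its compiled unit over `args` (the method's
  // inputs and outputs) here.
  __ET_NODISCARD Error execute(DelegateHandle* handle, EValue** args)
      const override {
    (void)handle;
    (void)args;
    return Error::Ok;
  }
};

namespace {
// Register at static-initialization time; the name must match the one used
// when the model was lowered to this backend ahead of time.
EchoBackend backend_impl;
Backend backend_entry{"EchoBackend", &backend_impl};
Error registration_result = register_backend(backend_entry);
} // namespace

} // namespace executor
} // namespace torch
```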
