
Commit 2fc4e7d

Initial commit
fbshipit-source-id: 73359312fe09bec9834c12f3564d3821cd93050f

875 files changed: +159,035 -0 lines changed


.clang-tidy

Lines changed: 10 additions & 0 deletions
```
---
# NOTE there must be no spaces before the '-', so put the comma last.
InheritParentConfig: true
Checks: '
-facebook-hte-BadMemberName,
-facebook-hte-NullableReturn,
'
AnalyzeTemporaryDtors: false
CheckOptions:
...
```

README.md

Lines changed: 115 additions & 0 deletions
# executorch
A unified ML software stack within the PyTorch platform for edge devices. It defines new compiler entry points as well as a state-of-the-art runtime.

https://fburl.com/executorch

## Why Executorch?
Compared to the legacy Lite Interpreter, there are some major benefits:
* Performance wins compared to Lite Interpreter
  * Faster (orders of magnitude lower framework tax in both [DSP](https://fb.workplace.com/notes/156263446923296) and [CPU](https://fb.workplace.com/notes/821255839309664))
  * Much smaller binary size, [~1.5 MB vs. ~30 KB without operators](https://docs.google.com/document/d/11_QzIO1TEaRtLIcX4ubVzx-sUl2RPUm9Iwbn2Kt1ce4/edit#heading=h.7xrtrf77n4w5)
  * Smaller memory footprint, because we do ahead-of-time memory planning in ExecuTorch and also have clear, granular control over where the runtime allocations are done.
* Long-term alignment with the direction of PyTorch infrastructure
  * Lite Interpreter relies on TorchScript, which is being phased out; ExecuTorch is the planned replacement for Lite Interpreter.
* Model Authoring & Productivity gains
  * More and better-defined entry points to perform model-, device-, and/or use-case-specific optimizations (e.g. better backend delegation, user-defined compiler transformations, default or user-defined memory planning, etc.)
  * Ability to lower constructs like dynamic control flow to run on device.

## Meta Internal Users
See the [Using PyTorch > Executorch](https://www.internalfb.com/intern/wiki/PyTorch/Using_PyTorch/Executorch/)
wiki for pointers to internal workplace groups, how-tos, and other resources.
## Docs
* [Executorch stack diagram](https://docs.google.com/drawings/d/1bBIbG6YDIjdx8emS_6K23YM6WyRkKVpPt-26nznxroU/edit)
* [High-level design doc](https://docs.google.com/document/d/1Z12w6-KtwoFDh781LQAbfwUdEZw9cwTCmz5BmKuS6U8/edit#)
* Planning docs
  * H2 2022 [roadmap](https://fburl.com/executorch-plan)
  * H1 2022 [roadmap](https://fburl.com/executorch-plan-h12022), [summary runtime](https://fb.workplace.com/notes/1023022781732209), [summary EXIR](https://fb.workplace.com/notes/1094704288071438)

## Better Engineering
* [Coding guidelines](https://docs.google.com/document/d/1RERjvvUSNNQ_gysD-kkHhvbWyfCAaXk7pes9ZdZ1kqM/edit)
* [BE Tasks](https://www.internalfb.com/intern/taskgraph/?q=5567456559966061) -- Please add "[executorch][BE]" in the task title

## Model Migration
* [Model inventory onboarding guide](https://docs.google.com/document/d/1ofoKUvufDFZdZdEYQ1jgTCsuNbSLspDsahgiHhWncCY/edit)
* [End-to-end model testing](https://docs.google.com/document/d/1AeLlSgwhe9Gnj-44kIYv9iyLWdpb6ASF_epr0q4ey5s/edit)

## Design goals
* Minimal binary size (< 50 KB not including kernels)
* Minimal framework tax: loading the program, initializing the executor, kernel and
  backend-delegate dispatch, runtime memory utilization
* Portable (cross-compile across many toolchains)
* Executes ATen kernels (or ATen custom kernels)
* Executes custom op kernels
* Supports inter-op asynchronous execution
* Supports static memory allocation (heapless)
* Supports custom allocation across memory hierarchies
* Supports the control flow needed by models
* Allows selective build of kernels
* Allows backend delegation with a lightweight interface

## Terminology
### ATen mode
ATen mode uses the ATen (PyTorch core) implementation of Tensor (`at::Tensor`)
along with related types (ScalarType, etc.)
* `at::Tensor` is big and complex, and often allocates memory with new/malloc
* The ATen kernels, which rely on the full `at::Tensor` API, are usable in this
  configuration
* Those kernels also tend to do dynamic memory allocation, and often have extra
  flexibility (and thus overhead) to handle things not needed by mobile/embedded
  clients: e.g., CUDA support, sparse tensor support, dtype promotion

### Lean mode
Lean mode uses Executorch's smaller `torch::executor::Tensor` (aka ETensor)
implementation, along with related types (`torch::executor::ScalarType`, etc.)
* ETensor's API is a source-compatible subset of `at::Tensor`. Code that is
  written against ETensor can also build against `at::Tensor`.
* "Lean mode kernels" are any operator implementations that are written to be
  compatible with ETensor. That means they can also build against
  `at::Tensor` if desired, and be used in the same model as ATen kernels.
* ETensor does not own or allocate memory on its own
* (TODO(T133200526): NOTE: Dynamic shapes are not yet supported. Remove this
  warning when they are.) To support dynamic shapes, kernels can allocate
  Tensor data using the MemoryAllocator provided by the client.
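
To illustrate the source-compatible-subset idea, here is a minimal, hypothetical sketch. The `EXECUTORCH_ATEN_MODE` flag, the alias name, the include paths, and the assumption that both tensor types expose `numel()` and `data_ptr<T>()` are illustrative only; they are not the actual build switch used by this commit.

```
// Hypothetical sketch: one kernel body that builds in either mode.
// Macro name and include paths are assumptions, for illustration only.
#ifdef EXECUTORCH_ATEN_MODE
#include <ATen/ATen.h>
using TensorType = at::Tensor; // ATen mode: the full PyTorch core tensor
#else
#include <executorch/core/values/Evalue.h> // assumed lean-mode tensor header
using TensorType = torch::executor::Tensor; // lean mode: ETensor
#endif

// Written only against the shared subset of the Tensor API, so the same
// source compiles against either type (assuming numel()/data_ptr<T>() are
// part of that subset).
void scale_in_place(TensorType& t, float factor) {
  float* data = t.data_ptr<float>();
  const auto n = t.numel();
  for (decltype(n) i = 0; i < n; ++i) {
    data[i] *= factor;
  }
}
```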
### Portable kernels
See [//executorch/kernels/portable/README.md](portable/README.md) for technical details.

Portable kernels, which live under `//executorch/kernels/portable`, are:
* Lean mode kernels
* Compatible with ATen operator signatures
* Written in portable C++ so that they can build for any target
* Written as reference implementations, prioritizing clarity and simplicity
  over optimization
* Generally much smaller in code size than ATen kernels
* Written to avoid dynamically allocating memory using new/malloc
* (TODO(T133200526): NOTE: Dynamic shapes are not yet supported. Remove this
  warning when they are.) To support dynamic shapes, some kernels may allocate
  Tensor data using the MemoryAllocator provided by the client.
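
As a hedged illustration of those properties (the exact signature convention and tensor accessors are assumptions, not taken from the kernel library), an out-variant, allocation-free reference kernel might look roughly like this:

```
// Hypothetical portable-style kernel sketch: out-variant, ATen-compatible
// signature shape, plain C++, no dynamic allocation. Accessor names and the
// simplified parameter list are assumptions for illustration only.
namespace torch {
namespace executor {

Tensor& add_out(const Tensor& a, const Tensor& b, Tensor& out) {
  // Reference-style implementation: assumes the memory planner has already
  // sized `out`, and that all three tensors are float and the same shape.
  const float* a_data = a.data_ptr<float>();
  const float* b_data = b.data_ptr<float>();
  float* out_data = out.data_ptr<float>();
  const auto n = out.numel();
  for (decltype(n) i = 0; i < n; ++i) {
    out_data[i] = a_data[i] + b_data[i];
  }
  return out;
}

} // namespace executor
} // namespace torch
```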
## Local tests
### General tests
```
buck2 test fbcode//executorch/...
```
### Run a model in lean mode
* Uses the lean Executorch `Tensor` class and related types
* Uses the kernels under `//executorch/kernels/portable` instead of the ATen kernels
```
buck2 run fbcode//executorch/test:executor_runner -- --model_path=fbcode/executorch/test/models/linear_out.ff
```
### Run a model in ATen mode
* Uses the ATen tensor instead of the lean Executorch `Tensor`, so that all ATen kernels can be leveraged
* Note that there can be a significant size regression in ATen mode
```
buck2 run fbcode//executorch/test:executor_runner_aten -- --model_path=fbcode/executorch/test/models/linear_out.ff
```

## Special build modes
### Android/mobile builds
In xplat:
```
buck2 build @fbandroid/mode/opt @fbandroid/mode/ndk_libcxx -c user.ndk_cxxflags="-frtti -fexceptions" fbsource//xplat/executorch/test:executor_runner
```
### ARVR builds
In xplat:
```
buck2 build @arvr/mode/android/linux/opt-stripped -c ndk.custom_libcxx=false fbsource//xplat/executorch/test:executor_runner
```
backends/backend.cpp

Lines changed: 52 additions & 0 deletions
```
#include <executorch/backends/backend.h>
#include <executorch/core/Assert.h>

namespace torch {
namespace executor {

PyTorchBackendInterface::~PyTorchBackendInterface(){};

// Task t128866626: Remove global static variables.
// We want to be able to run multiple Executor instances
// and having a global registration isn't a viable solution
// in the long term.
BackendRegistry& getBackendRegistry();
BackendRegistry& getBackendRegistry() {
  static BackendRegistry backend_reg;
  return backend_reg;
}

PyTorchBackendInterface* get_backend_class(const char* name) {
  return getBackendRegistry().get_backend_class(name);
}

PyTorchBackendInterface* BackendRegistry::get_backend_class(const char* name) {
  for (size_t idx = 0; idx < registrationTableSize_; idx++) {
    Backend backend = backend_table_[idx];
    if (strcmp(backend.name_, name) == 0) {
      return backend.interface_ptr_;
    }
  }
  return nullptr;
}

Error register_backend(const Backend& backend) {
  return getBackendRegistry().register_backend(backend);
}

Error BackendRegistry::register_backend(const Backend& backend) {
  if (registrationTableSize_ >= kRegistrationTableMaxSize) {
    return Error::Internal;
  }

  // Check if the name already exists in the table
  if (this->get_backend_class(backend.name_) != nullptr) {
    return Error::InvalidArgument;
  }

  backend_table_[registrationTableSize_++] = backend;
  return Error::Ok;
}

} // namespace executor
} // namespace torch
```

backends/backend.h

Lines changed: 159 additions & 0 deletions
```
#pragma once

#include <cstring>

#include <executorch/compiler/Compiler.h>
#include <executorch/core/ArrayRef.h>
#include <executorch/core/Error.h>
#include <executorch/core/FreeableBuffer.h>
#include <executorch/core/Result.h>
#include <executorch/core/values/Evalue.h>
#include <executorch/executor/MemoryAllocator.h>

namespace torch {
namespace executor {

struct SizedBuffer {
  void* buffer;
  size_t nbytes; // number of bytes of buffer
};

struct CompileSpec {
  const char* key; // spec key
  SizedBuffer value; // spec value
};

/**
 * An opaque handle managed by a backend. Typically points to a backend-private
 * class/struct.
 */
using DelegateHandle = void;

class PyTorchBackendInterface {
 public:
  virtual ~PyTorchBackendInterface() = 0;

  /**
   * Returns true if the backend is available to process delegation calls.
   */
  virtual bool is_available() const = 0;

  /**
   * Responsible for further processing (compiling/transforming/optimizing) the
   * compiled unit that was produced ahead-of-time, as well as performing any
   * backend initialization to ready it for execution. This method is called
   * every time the PyTorch program is initialized. Consequently, this is the
   * place to perform any backend initialization as well as transformations,
   * optimizations, and even compilation that depend on the target device. As
   * such, it is strongly encouraged to push as much processing as possible to
   * the ahead-of-time step.
   *
   * @param[in] processed An opaque (to PyTorch) compiled unit from the
   *     preprocessor. Can contain anything the backend needs to execute the
   *     equivalent semantics of the passed-in Module and its method. Often
   *     passed unmodified to `execute()` as a `DelegateHandle`, unless it
   *     needs further processing at init time to be fully executable. If the
   *     data is not needed after init(), calling processed->Free() can
   *     reclaim its memory.
   * @param[in] compile_specs The exact same compiler specification that
   *     was used ahead-of-time to produce `processed`.
   * @param[in] memory_allocator The allocator to allocate from if necessary.
   *
   * @returns On success, an opaque handle representing the method
   *     implemented by the delegate. This handle is passed to `execute()` and
   *     `destroy()`, and the memory it points to is owned by the backend.
   *     Typically points to a backend-private class/struct.
   * @returns On error, a value other than Error::Ok.
   */
  __ET_NODISCARD virtual Result<DelegateHandle*> init(
      FreeableBuffer* processed,
      ArrayRef<CompileSpec> compile_specs,
      MemoryAllocator* memory_allocator) const = 0;

  /**
   * Responsible for executing the given method's handle, as it was produced
   * by compile.
   *
   * @param[in] handle An opaque handle returned by `init()`. Usually a backend
   *     executable unit. This executable unit should be ready to execute the
   *     delegate blobs.
   * @param[in] args The method's inputs and outputs.
   * @retval Error::Ok if successful.
   */
  __ET_NODISCARD virtual Error execute(DelegateHandle* handle, EValue** args)
      const = 0;

  /**
   * Responsible for destroying a handle, if required by the backend. For
   * example, resources associated with the handle may need to be released.
   * This method is called when the execution plan is destroyed (i.e., the
   * program has reached the end of its lifespan).
   *
   * @param[in] handle The handle to be destroyed. An opaque handle returned by
   *     `init()`.
   */
  virtual void destroy(__ET_UNUSED DelegateHandle* handle) const {}
};

struct Backend {
  const char* name_;
  PyTorchBackendInterface* interface_ptr_;
};

// The max number of backends that can be registered in an app. It's
// hard-coded to 16 because we don't expect a system to have more than 16
// backends. Each table element holds two pointers, represented by the Backend
// struct, so the memory overhead of this table is minimal (only a few bytes).
constexpr size_t kRegistrationTableMaxSize = 16;

class BackendRegistry {
 public:
  BackendRegistry() : registrationTableSize_(0) {}

  /**
   * Registers the Backend object (i.e. string name and PyTorchBackendInterface
   * pair) so that it can be called by name at runtime.
   *
   * @param[in] backend The Backend object (name and interface) to register.
   * @retval Error code representing whether registration was successful.
   */
  __ET_NODISCARD Error register_backend(const Backend& backend);

  /**
   * Returns the corresponding object pointer for a given string name.
   * The mapping is populated using the register_backend method.
   *
   * @param[in] name Name of the user-defined backend delegate.
   * @retval Pointer to the appropriate object that implements
   *     PyTorchBackendInterface. Nullptr if it can't find anything
   *     with the given name.
   */
  PyTorchBackendInterface* get_backend_class(const char* name);

 private:
  Backend backend_table_[kRegistrationTableMaxSize];
  size_t registrationTableSize_;
};

/**
 * Returns the corresponding object pointer for a given string name.
 * The mapping is populated using the register_backend method.
 *
 * @param[in] name Name of the user-defined backend delegate.
 * @retval Pointer to the appropriate object that implements
 *     PyTorchBackendInterface. Nullptr if it can't find anything
 *     with the given name.
 */
PyTorchBackendInterface* get_backend_class(const char* name);

/**
 * Registers the Backend object (i.e. string name and PyTorchBackendInterface
 * pair) so that it can be called by name at runtime.
 *
 * @param[in] backend Backend object
 * @retval Error code representing whether registration was successful.
 */
__ET_NODISCARD Error register_backend(const Backend& backend);

} // namespace executor
} // namespace torch
```
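
To make the interface above concrete, here is a minimal sketch of a backend that implements `PyTorchBackendInterface` and registers itself with `register_backend()`. The class name, the choice to reuse the preprocessed buffer as the delegate handle, and the assumption that `Result<DelegateHandle*>` is constructible directly from a pointer are illustrative assumptions, not part of this commit.

```
// Hypothetical example backend; names and behavior are illustrative only.
#include <executorch/backends/backend.h>

namespace torch {
namespace executor {

class EchoBackend final : public PyTorchBackendInterface {
 public:
  bool is_available() const override {
    return true;
  }

  // Reuse the preprocessed blob as the handle (assumes Result<T> can be
  // constructed from a T). A real backend would parse/compile `processed`
  // here, and could call processed->Free() once the data is no longer needed.
  __ET_NODISCARD Result<DelegateHandle*> init(
      FreeableBuffer* processed,
      ArrayRef<CompileSpec> compile_specs,
      MemoryAllocator* memory_allocator) const override {
    (void)compile_specs;
    (void)memory_allocator;
    return static_cast<DelegateHandle*>(processed);
  }

  // A real backend would run its compiled unit over `args` (the method's
  // inputs and outputs) here.
  __ET_NODISCARD Error execute(DelegateHandle* handle, EValue** args)
      const override {
    (void)handle;
    (void)args;
    return Error::Ok;
  }
};

namespace {
// Register at static-initialization time; the name must match the one used
// when the model was lowered to this backend ahead of time.
EchoBackend backend_impl;
Backend backend_entry{"EchoBackend", &backend_impl};
Error registration_result = register_backend(backend_entry);
} // namespace

} // namespace executor
} // namespace torch
```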
