
Commit 796148b

[executorch][docs] runtime-overview.md

Pull Request resolved: #571

First draft of a high-level document describing the runtime.

ghstack-source-id: 202664022
@exported-using-ghexport
Differential Revision: [D49852127](https://our.internmc.facebook.com/intern/diff/D49852127/)

1 parent 22354c8


docs/source/runtime-overview.md

Lines changed: 165 additions & 1 deletion

# Runtime Overview

This document discusses the design of the ExecuTorch runtime, which executes
ExecuTorch program files on edge devices like smartphones, wearables, and
embedded devices. The code for the main execution API is under
[`executorch/runtime/executor/`](https://github.com/pytorch/executorch/tree/main/runtime/executor).

Before reading this document we recommend that you read [How Does ExecuTorch
Work](intro-how-it-works.md).

At the highest level, the ExecuTorch runtime is responsible for:

* Loading binary `.pte` program files that were generated by the
  `to_executorch()` step of the model-lowering process.
* Executing the series of instructions that implement a lowered model.

This diagram shows the high-level flow and the components involved in exporting
and executing an ExecuTorch program:

![High-level diagram of the ExecuTorch
Runtime](/_static/img/runtime-overview-high-level.png)

The runtime is also responsible for:

* Managing the memory used during load and execution, potentially across
  multiple memory banks like SRAM and DRAM.
* Mapping symbolic operator names like `"aten::add.out"` to concrete C++
  functions or [_kernels_](kernel-library-overview.md) that implement the
  semantics of those operators.
* Dispatching predetermined sections of the model to [backend
  delegates](compiler-delegate-and-partitioner.md) for acceleration.
* Optionally gathering [profiling data](sdk-profiling.md) during load and
  execution.
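
In application code, this load-and-execute flow boils down to a handful of
calls. The following is a minimal sketch, assuming the C++ APIs at the time of
writing (`FileDataLoader`, `Program::load()`, `Program::load_method()`); see
the `executor_runner` example under Further Reading for a complete, current
version.

```cpp
#include <executorch/extension/data_loader/file_data_loader.h>
#include <executorch/runtime/executor/program.h>
#include <executorch/runtime/platform/runtime.h>

using namespace torch::executor;

int main() {
  runtime_init();  // Initialize the runtime, including the PAL.

  // The Program reads the .pte file through the injected DataLoader interface.
  Result<util::FileDataLoader> loader = util::FileDataLoader::from("model.pte");
  Result<Program> program = Program::load(&loader.get());
  if (!program.ok()) {
    return 1;  // Not a valid ExecuTorch program.
  }

  // Loading a method maps its operator names to kernels and prepares it for
  // execution; memory_manager setup is described under Design Goals below.
  // Result<Method> method = program->load_method("forward", &memory_manager);
  // ... set method inputs ...
  // Error status = method->execute();
  return 0;
}
```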

## Design Goals

The ExecuTorch runtime was designed to run on a wide variety of edge devices,
from modern smartphone CPUs to resource-constrained microcontrollers and DSPs.
It has first-class support for
[delegating](compiler-delegate-and-partitioner.md) execution to one or more
backends to take advantage of architecture-specific optimizations and modern
heterogeneous architectures. It is small and portable enough to run directly in
bare-metal embedded environments with no operating system, dynamic memory, or
threads.

### Low Execution Overhead

#### Memory

* The core runtime library is less than 50kB when built without kernels or
  backends.
* Constant tensors point directly into the `.pte` file data, avoiding copies of
  that data. The alignment of these data chunks can be adjusted at `.pte`
  creation time.
* Backend delegates can choose to unload their precompiled data after model
  initialization, reducing peak memory usage.
* Mutable tensor memory layout is planned ahead of time and packed into a small
  set of user-allocated buffers, providing fine-grained control over memory
  location. This is especially useful on systems with heterogeneous memory
  hierarchies, allowing placement onto (e.g.) SRAM or DRAM close to the core
  that will operate on the data, as the sketch after this list shows.
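
For example, here is a minimal sketch of providing those user-allocated
buffers, assuming the `Span`-based `HierarchicalAllocator` constructor at the
time of writing. The buffer sizes below are placeholders; the real number and
size of the buffers are dictated by the `.pte` file's memory plan.

```cpp
#include <cstdint>

#include <executorch/runtime/core/hierarchical_allocator.h>
#include <executorch/runtime/core/span.h>

using namespace torch::executor;

// On an embedded target, linker sections could place bank0 in fast SRAM and
// bank1 in larger, slower DRAM.
static uint8_t bank0[16 * 1024];
static uint8_t bank1[64 * 1024];

static Span<uint8_t> banks[] = {
    {bank0, sizeof(bank0)},
    {bank1, sizeof(bank1)},
};

// Hands out the pre-planned regions of those banks to mutable tensors.
static HierarchicalAllocator planned_memory({banks, 2});
```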

#### CPU

* Model execution is a simple loop over an array of instructions, most of which
  are function pointers to kernels and backend delegates (sketched below). This
  keeps the execution overhead small, on the order of microseconds to
  nanoseconds per operation.
* The implementation of an operation (like "add" or "conv3d") can be fully
  customized for a particular target system without needing to modify the
  original model or generated `.pte` file.
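
Conceptually, that loop looks something like the sketch below. This is an
illustration only, not the actual implementation (which lives under
`runtime/executor/`), and all of the type names here are invented for the
example.

```cpp
#include <cstddef>

enum class Error { Ok, Failed };

// Resolved from "aten::add.out"-style operator names at load time.
using KernelFn = Error (*)(void* args);

struct Instruction {
  KernelFn fn;  // kernel or backend-delegate entry point
  void* args;   // pre-planned argument values
};

// Executing a method is essentially one pass over the instruction array.
Error execute(const Instruction* instructions, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    Error err = instructions[i].fn(instructions[i].args);
    if (err != Error::Ok) {
      return err;
    }
  }
  return Error::Ok;
}
```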

### Familiar PyTorch Semantics

ExecuTorch is a first-class component of the PyTorch stack, and reuses APIs and
semantics whenever possible.

* The C++ types used by ExecuTorch are source-compatible with the corresponding
  types from core PyTorch's `c10::` and `at::` libraries, and ExecuTorch
  provides
  [`aten_bridge`](https://github.com/pytorch/executorch/blob/main/extension/aten_util/aten_bridge.h)
  to convert between the two. This can be helpful for projects that already use
  PyTorch C++ types.
* The behavior of operators like "aten::add" and "aten::sigmoid" is identical
  between ExecuTorch and core PyTorch. ExecuTorch provides a testing framework
  to ensure this, and to help test future implementations of these operators.

### Portable Code and Architecture

The ExecuTorch runtime is implemented with portability in mind, so that users
can build it for a wide variety of target systems.

#### C++ Language Considerations

* The code is C++11-compatible to work with older toolchains.
* The runtime does not use exceptions or RTTI, although it is not antagonistic
  to them.
* The code is compatible with gcc and clang, and has also been built with
  several proprietary embedded toolchains.
* The repo provides both CMake and buck2 build systems to make integration
  easier.

#### Operating System Considerations

The runtime makes no direct system calls. All access to memory, files, logging,
and clocks is abstracted through the [_Runtime Platform Abstraction Layer
(PAL)_](runtime-platform-abstraction-layer.md) and injected interfaces like
`DataLoader` and `MemoryAllocator`. [TODO: link these types to their generated
docs]
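
As a sketch of what PAL customization looks like: the default PAL hooks are
defined as weak symbols (on toolchains that support them), so an application
can override one simply by defining it. This assumes the hook name and
signature in `runtime/platform/platform.h` at the time of writing; check that
header for the exact declarations on your version.

```cpp
#include <cstdio>

#include <executorch/runtime/platform/platform.h>

// Route ExecuTorch log messages to the host's own logging facility.
void et_pal_emit_log_message(
    et_timestamp_t timestamp,
    et_pal_log_level_t level,
    const char* filename,
    const char* function,
    size_t line,
    const char* message,
    size_t length) {
  (void)timestamp;
  (void)length;
  fprintf(stderr, "[%c] %s:%zu %s: %s\n",
          static_cast<char>(level), filename, line, function, message);
}
```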

Applications can control all memory allocation through the `MemoryManager`,
`MemoryAllocator`, `HierarchicalAllocator`, and `DataLoader` classes. The core
runtime makes no direct calls to `malloc()` or `new`, or to types like
`std::vector` that allocate under the hood. This makes it possible to:

* run in environments without a heap, but still use the heap if desired (see
  the sketch after this list).
* avoid synchronization on the heap during model load and execution.
* control which memory region to use for different types of data. For example,
  one set of mutable tensors could live in SRAM while another set lives in
  DRAM.
* easily monitor how much memory the runtime uses.
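
Here is a minimal heap-free sketch along those lines, assuming the constructor
shapes at the time of writing (`MemoryAllocator(size, base)` and
`MemoryManager(method_allocator, planned_memory)`), with placeholder buffer
sizes that would really come from the program's memory plan.

```cpp
#include <cstdint>

#include <executorch/runtime/core/hierarchical_allocator.h>
#include <executorch/runtime/core/memory_allocator.h>
#include <executorch/runtime/core/span.h>
#include <executorch/runtime/executor/memory_manager.h>

using namespace torch::executor;

// Static buffers only; the runtime never touches the heap on this path.
static uint8_t method_pool[8 * 1024];    // runtime structures created at load
static uint8_t planned_pool[64 * 1024];  // memory-planned mutable tensors

static MemoryAllocator method_allocator(sizeof(method_pool), method_pool);

static Span<uint8_t> planned_spans[] = {{planned_pool, sizeof(planned_pool)}};
static HierarchicalAllocator planned_memory({planned_spans, 1});

static MemoryManager memory_manager(&method_allocator, &planned_memory);

// program->load_method("forward", &memory_manager) will now allocate only
// from these buffers, and the application can inspect how much was used.
```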

However, please note that specific kernel or backend implementations may use
arbitrary runtime or operating system features. Users should double-check the
docs for the kernel and backend libraries that they use.

#### Threading Considerations

The core runtime does no threading or locking, and does not use thread-local
variables. But it plays well with higher-level synchronization.

* Each `Program` instance is immutable and therefore _[fully
  thread-safe](https://faithlife.codes/blog/2008/03/degrees_of_thread_safety/#thread-safe)_.
  Multiple threads may concurrently access a single `Program` instance.
* Each `Method` instance is mutable but self-contained, and therefore
  _[conditionally
  thread-safe](https://faithlife.codes/blog/2008/03/degrees_of_thread_safety/#conditionally-thread-safe)_.
  Multiple threads can concurrently access and execute independent `Method`
  instances, but access and execution of a single instance must be serialized.
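
As an illustration of these rules, here is a minimal sketch (assuming
`Result::ok()` and `Result::operator->`, and one `MemoryManager` per thread)
that executes two independent `Method` instances concurrently with no extra
locking:

```cpp
#include <functional>
#include <thread>

#include <executorch/runtime/executor/program.h>

using namespace torch::executor;

// One shared, immutable Program; one Method (and MemoryManager) per thread.
void run_in_parallel(Program& program, MemoryManager& mm1, MemoryManager& mm2) {
  auto worker = [&program](MemoryManager& mm) {
    // Safe: concurrent reads of an immutable Program.
    Result<Method> method = program.load_method("forward", &mm);
    if (method.ok()) {
      // Safe: this thread is the only user of this Method instance.
      // (Inputs would be set on the method before executing.)
      method->execute();
    }
  };
  std::thread t1(worker, std::ref(mm1));
  std::thread t2(worker, std::ref(mm2));
  t1.join();
  t2.join();
}
```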

However, please note:

* There are two global tables that may be read during `Program::load_method()`:
  the kernel registration table and the backend registration table.
  * In practice, these tables are only modified at process/system load time,
    and are effectively frozen before the first `Program` is loaded. But some
    applications may need to be aware of these tables, especially if they
    manually mutate them after process/system load time.
* Specific kernel or backend implementations may have their own threading
  restrictions. Users should double-check the docs for the kernel and backend
  libraries that they use.

## Further Reading

For more details about the ExecuTorch runtime, please see:

* The
  [`executor_runner`](https://github.com/pytorch/executorch/blob/main/examples/executor_runner/executor_runner.cpp)
  example tool
* [Runtime API](runtime-api.md)
* [Runtime Build and Cross Compilation](runtime-build-and-cross-compilation.md)
* [Runtime Platform Abstraction Layer](runtime-platform-abstraction-layer.md)
* [Custom Memory Allocation](runtime-custom-memory-allocator.md)
* [Runtime Error Handling](runtime-error-handling.md)
* [Runtime Profiling](sdk-profiling.md)
* [Backends and Delegates](compiler-delegate-and-partitioner.md)
* [Backend Delegate Implementation](runtime-backend-delegate-implementation-and-linking.md)
* [Kernel Library Overview](kernel-library-overview.md)
