
Commit 796148b

[executorch][docs] runtime-overview.md

Pull Request resolved: #571

First draft of a high-level document describing the runtime.

ghstack-source-id: 202664022
@exported-using-ghexport
Differential Revision: [D49852127](https://our.internmc.facebook.com/intern/diff/D49852127/)

1 parent 22354c8


docs/source/runtime-overview.md

Lines changed: 165 additions & 1 deletion

# Runtime Overview

This document discusses the design of the ExecuTorch runtime, which executes
ExecuTorch program files on edge devices like smartphones, wearables, and
embedded devices. The code for the main execution API is under
[`executorch/runtime/executor/`](https://github.com/pytorch/executorch/tree/main/runtime/executor).

Before reading this document we recommend that you read [How Does ExecuTorch
Work](intro-how-it-works.md).

At the highest level, the ExecuTorch runtime is responsible for:

* Loading binary `.pte` program files that were generated by the
  `to_executorch()` step of the model-lowering process.
* Executing the series of instructions that implement a lowered model.

This diagram shows the high-level flow and the components involved in exporting
and executing an ExecuTorch program:

![High-level diagram of the ExecuTorch
Runtime](/_static/img/runtime-overview-high-level.png)

The runtime is also responsible for:

* Managing the memory used during load and execution, potentially across
  multiple memory banks like SRAM and DRAM.
* Mapping symbolic operator names like `"aten::add.out"` to concrete C++
  functions or [_kernels_](kernel-library-overview.md) that implement the
  semantics of those operators.
* Dispatching predetermined sections of the model to [backend
  delegates](compiler-delegate-and-partitioner.md) for acceleration.
* Optionally gathering [profiling data](sdk-profiling.md) during load and
  execution.
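
In application code, this load-and-execute flow boils down to a handful of
calls. The following is a minimal sketch, assuming the C++ APIs at the time of
writing (`FileDataLoader`, `Program::load()`, `Program::load_method()`); see
the `executor_runner` example under Further Reading for a complete, current
version.

```cpp
#include <executorch/extension/data_loader/file_data_loader.h>
#include <executorch/runtime/executor/program.h>
#include <executorch/runtime/platform/runtime.h>

using namespace torch::executor;

int main() {
  runtime_init();  // Initialize the runtime, including the PAL.

  // The Program reads the .pte file through the injected DataLoader interface.
  Result<util::FileDataLoader> loader = util::FileDataLoader::from("model.pte");
  Result<Program> program = Program::load(&loader.get());
  if (!program.ok()) {
    return 1;  // Not a valid ExecuTorch program.
  }

  // Loading a method maps its operator names to kernels and prepares it for
  // execution; memory_manager setup is described under Design Goals below.
  // Result<Method> method = program->load_method("forward", &memory_manager);
  // ... set method inputs ...
  // Error status = method->execute();
  return 0;
}
```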

## Design Goals

The ExecuTorch runtime was designed to run on a wide variety of edge devices,
from modern smartphone CPUs to resource-constrained microcontrollers and DSPs.
It has first-class support for
[delegating](compiler-delegate-and-partitioner.md) execution to one or more
backends to take advantage of architecture-specific optimizations and modern
heterogeneous architectures. It is small and portable enough to run directly in
bare-metal embedded environments with no operating system, dynamic memory, or
threads.

### Low Execution Overhead

#### Memory

* The core runtime library is less than 50kB when built without kernels or
  backends.
* Constant tensors point directly into the `.pte` file data, avoiding copies of
  that data. The alignment of these data chunks can be adjusted at `.pte`
  creation time.
* Backend delegates can choose to unload their precompiled data after model
  initialization, reducing peak memory usage.
* Mutable tensor memory layout is planned ahead of time and packed into a small
  set of user-allocated buffers, providing fine-grained control over memory
  location. This is especially useful on systems with heterogeneous memory
  hierarchies, allowing placement onto (e.g.) SRAM or DRAM close to the core
  that will operate on the data, as the sketch after this list shows.
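
For example, here is a minimal sketch of providing those user-allocated
buffers, assuming the `Span`-based `HierarchicalAllocator` constructor at the
time of writing. The buffer sizes below are placeholders; the real number and
size of the buffers are dictated by the `.pte` file's memory plan.

```cpp
#include <cstdint>

#include <executorch/runtime/core/hierarchical_allocator.h>
#include <executorch/runtime/core/span.h>

using namespace torch::executor;

// On an embedded target, linker sections could place bank0 in fast SRAM and
// bank1 in larger, slower DRAM.
static uint8_t bank0[16 * 1024];
static uint8_t bank1[64 * 1024];

static Span<uint8_t> banks[] = {
    {bank0, sizeof(bank0)},
    {bank1, sizeof(bank1)},
};

// Hands out the pre-planned regions of those banks to mutable tensors.
static HierarchicalAllocator planned_memory({banks, 2});
```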

#### CPU

* Model execution is a simple loop over an array of instructions, most of which
  are function pointers to kernels and backend delegates (sketched below). This
  keeps the execution overhead small, on the order of microseconds to
  nanoseconds per operation.
* The implementation of an operation (like "add" or "conv3d") can be fully
  customized for a particular target system without needing to modify the
  original model or generated `.pte` file.
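
Conceptually, that loop looks something like the sketch below. This is an
illustration only, not the actual implementation (which lives under
`runtime/executor/`), and all of the type names here are invented for the
example.

```cpp
#include <cstddef>

enum class Error { Ok, Failed };

// Resolved from "aten::add.out"-style operator names at load time.
using KernelFn = Error (*)(void* args);

struct Instruction {
  KernelFn fn;  // kernel or backend-delegate entry point
  void* args;   // pre-planned argument values
};

// Executing a method is essentially one pass over the instruction array.
Error execute(const Instruction* instructions, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    Error err = instructions[i].fn(instructions[i].args);
    if (err != Error::Ok) {
      return err;
    }
  }
  return Error::Ok;
}
```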

### Familiar PyTorch Semantics

ExecuTorch is a first-class component of the PyTorch stack, and reuses APIs and
semantics whenever possible.

* The C++ types used by ExecuTorch are source-compatible with the corresponding
  types from core PyTorch's `c10::` and `at::` libraries, and ExecuTorch
  provides
  [`aten_bridge`](https://github.com/pytorch/executorch/blob/main/extension/aten_util/aten_bridge.h)
  to convert between the two. This can be helpful for projects that already use
  PyTorch C++ types.
* The behavior of operators like "aten::add" and "aten::sigmoid" is identical
  between ExecuTorch and core PyTorch. ExecuTorch provides a testing framework
  to ensure this, and to help test future implementations of these operators.

### Portable Code and Architecture

The ExecuTorch runtime is implemented with portability in mind, so that users
can build it for a wide variety of target systems.

#### C++ Language Considerations

* The code is C++11-compatible to work with older toolchains.
* The runtime does not use exceptions or RTTI, although it is not antagonistic
  to them.
* The code is compatible with gcc and clang, and has also been built with
  several proprietary embedded toolchains.
* The repo provides both CMake and buck2 build systems to make integration
  easier.

#### Operating System Considerations

The runtime makes no direct system calls. All access to memory, files, logging,
and clocks is abstracted through the [_Runtime Platform Abstraction Layer
(PAL)_](runtime-platform-abstraction-layer.md) and injected interfaces like
`DataLoader` and `MemoryAllocator`. [TODO: link these types to their generated
docs]
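
As a sketch of what PAL customization looks like: the default PAL hooks are
defined as weak symbols (on toolchains that support them), so an application
can override one simply by defining it. This assumes the hook name and
signature in `runtime/platform/platform.h` at the time of writing; check that
header for the exact declarations on your version.

```cpp
#include <cstdio>

#include <executorch/runtime/platform/platform.h>

// Route ExecuTorch log messages to the host's own logging facility.
void et_pal_emit_log_message(
    et_timestamp_t timestamp,
    et_pal_log_level_t level,
    const char* filename,
    const char* function,
    size_t line,
    const char* message,
    size_t length) {
  (void)timestamp;
  (void)length;
  fprintf(stderr, "[%c] %s:%zu %s: %s\n",
          static_cast<char>(level), filename, line, function, message);
}
```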

Applications can control all memory allocation through the `MemoryManager`,
`MemoryAllocator`, `HierarchicalAllocator`, and `DataLoader` classes. The core
runtime makes no direct calls to `malloc()` or `new`, or to types like
`std::vector` that allocate under the hood. This makes it possible to:

* run in environments without a heap, but still use the heap if desired (see
  the sketch after this list).
* avoid synchronization on the heap during model load and execution.
* control which memory region to use for different types of data. For example,
  one set of mutable tensors could live in SRAM while another set lives in
  DRAM.
* easily monitor how much memory the runtime uses.
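
Here is a minimal heap-free sketch along those lines, assuming the constructor
shapes at the time of writing (`MemoryAllocator(size, base)` and
`MemoryManager(method_allocator, planned_memory)`), with placeholder buffer
sizes that would really come from the program's memory plan.

```cpp
#include <cstdint>

#include <executorch/runtime/core/hierarchical_allocator.h>
#include <executorch/runtime/core/memory_allocator.h>
#include <executorch/runtime/core/span.h>
#include <executorch/runtime/executor/memory_manager.h>

using namespace torch::executor;

// Static buffers only; the runtime never touches the heap on this path.
static uint8_t method_pool[8 * 1024];    // runtime structures created at load
static uint8_t planned_pool[64 * 1024];  // memory-planned mutable tensors

static MemoryAllocator method_allocator(sizeof(method_pool), method_pool);

static Span<uint8_t> planned_spans[] = {{planned_pool, sizeof(planned_pool)}};
static HierarchicalAllocator planned_memory({planned_spans, 1});

static MemoryManager memory_manager(&method_allocator, &planned_memory);

// program->load_method("forward", &memory_manager) will now allocate only
// from these buffers, and the application can inspect how much was used.
```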

However, please note that specific kernel or backend implementations may use
arbitrary runtime or operating system features. Users should double-check the
docs for the kernel and backend libraries that they use.

#### Threading Considerations

The core runtime does no threading or locking, and does not use thread-local
variables. But it plays well with higher-level synchronization.

* Each `Program` instance is immutable and therefore _[fully
  thread-safe](https://faithlife.codes/blog/2008/03/degrees_of_thread_safety/#thread-safe)_.
  Multiple threads may concurrently access a single `Program` instance.
* Each `Method` instance is mutable but self-contained, and therefore
  _[conditionally
  thread-safe](https://faithlife.codes/blog/2008/03/degrees_of_thread_safety/#conditionally-thread-safe)_.
  Multiple threads can concurrently access and execute independent `Method`
  instances, but access and execution of a single instance must be serialized.
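
As an illustration of these rules, here is a minimal sketch (assuming
`Result::ok()` and `Result::operator->`, and one `MemoryManager` per thread)
that executes two independent `Method` instances concurrently with no extra
locking:

```cpp
#include <functional>
#include <thread>

#include <executorch/runtime/executor/program.h>

using namespace torch::executor;

// One shared, immutable Program; one Method (and MemoryManager) per thread.
void run_in_parallel(Program& program, MemoryManager& mm1, MemoryManager& mm2) {
  auto worker = [&program](MemoryManager& mm) {
    // Safe: concurrent reads of an immutable Program.
    Result<Method> method = program.load_method("forward", &mm);
    if (method.ok()) {
      // Safe: this thread is the only user of this Method instance.
      // (Inputs would be set on the method before executing.)
      method->execute();
    }
  };
  std::thread t1(worker, std::ref(mm1));
  std::thread t2(worker, std::ref(mm2));
  t1.join();
  t2.join();
}
```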

However, please note:

* There are two global tables that may be read during `Program::load_method()`:
  the kernel registration table and the backend registration table.
  * In practice, these tables are only modified at process/system load time,
    and are effectively frozen before the first `Program` is loaded. But some
    applications may need to be aware of these tables, especially if they
    manually mutate them after process/system load time.
* Specific kernel or backend implementations may have their own threading
  restrictions. Users should double-check the docs for the kernel and backend
  libraries that they use.

## Further Reading

For more details about the ExecuTorch runtime, please see:

* The
  [`executor_runner`](https://github.com/pytorch/executorch/blob/main/examples/executor_runner/executor_runner.cpp)
  example tool
* [Runtime API](runtime-api.md)
* [Runtime Build and Cross Compilation](runtime-build-and-cross-compilation.md)
* [Runtime Platform Abstraction Layer](runtime-platform-abstraction-layer.md)
* [Custom Memory Allocation](runtime-custom-memory-allocator.md)
* [Runtime Error Handling](runtime-error-handling.md)
* [Runtime Profiling](sdk-profiling.md)
* [Backends and Delegates](compiler-delegate-and-partitioner.md)
* [Backend Delegate Implementation](runtime-backend-delegate-implementation-and-linking.md)
* [Kernel Library Overview](kernel-library-overview.md)
