Commit 097d09b

dbort authored and facebook-github-bot committed
Point to control flow docs in core pytorch (#1083)
Summary:
Pull Request resolved: #1083

Stop referring to the deprecated `docs/website` tree. Pytorch core already provides some information about control flow concepts, so we can point to that instead.

While I'm here, remove stray blank lines.

Reviewed By: mergennachin

Differential Revision: D50612692

fbshipit-source-id: a56d2127cf7a0d29538b9943c180075c5f2b25f1
1 parent c4fb662 commit 097d09b

1 file changed: +13 -25 lines changed

docs/source/getting-started-architecture.md

Lines changed: 13 additions & 25 deletions
@@ -6,90 +6,79 @@ This page describes the technical architecture of ExecuTorch and its individual

In order to target on-device AI with diverse hardware, critical power requirements, and realtime processing needs, a single monolithic solution is not practical. Instead, a modular, layered, and extensible architecture is desired. ExecuTorch defines a streamlined workflow to prepare (export, transform, and compile) and execute a PyTorch program, with opinionated out-of-the-box default components and well-defined entry points for customization. This architecture greatly improves portability, allowing engineers to use a performant, lightweight, cross-platform runtime that integrates easily into different devices and platforms.

## Overview

There are three phases to deploying a PyTorch model on-device: program preparation, runtime preparation, and program execution, as shown in the diagram below, with a number of user entry points. We’ll discuss each phase separately in this documentation.

![](./executorch_stack.png)

**Figure 1.** The figure illustrates the three phases: program preparation, runtime preparation, and program execution.

## Program Preparation

- ExecuTorch extends the flexibility and usability of PyTorch to edge devices. It leverages PyTorch 2 compiler and export functionality ([TorchDynamo](https://pytorch.org/docs/stable/dynamo/index.html), [AOTAutograd](https://pytorch.org/functorch/stable/notebooks/aot_autograd_optimizations.html), [Quantization](https://pytorch.org/docs/main/quantization.html), dynamic shapes, control flow, etc.) to prepare a PyTorch program for execution on devices.
+ ExecuTorch extends the flexibility and usability of PyTorch to edge devices. It
+ leverages PyTorch 2 compiler and export functionality
+ ([TorchDynamo](https://pytorch.org/docs/stable/dynamo/index.html),
+ [AOTAutograd](https://pytorch.org/functorch/stable/notebooks/aot_autograd_optimizations.html),
+ [Quantization](https://pytorch.org/docs/main/quantization.html),
+ [dynamic shapes](https://pytorch.org/get-started/pytorch-2.0/#pytorch-2x-faster-more-pythonic-and-as-dynamic-as-ever),
+ [control flow](https://pytorch.org/docs/main/export.html#data-shape-dependent-control-flow),
+ etc.) to prepare a PyTorch program for execution on devices.

Program preparation is often simply called AOT (ahead-of-time) because the export, transformation, and compilation of the program are performed before it is eventually run with the ExecuTorch runtime, which is written in C++. To keep the runtime lightweight and the execution overhead small, we push as much work as possible to AOT.

Starting from the program source code, below are the steps you would go through to accomplish the program preparation.

### Program Source Code

* Like all PyTorch use cases, ExecuTorch starts from model authoring, where standard `nn.Module` eager mode PyTorch programs are created.
- * Export-specific helpers are used to represent advanced features like [control flow](https://github.com/pytorch/executorch/blob/main/docs/website/docs/ir_spec/control_flow.md) (for example, helper functions to trace both branches of if-else) and [dynamic shapes](https://pytorch.org/get-started/pytorch-2.0/#pytorch-2x-faster-more-pythonic-and-as-dynamic-as-ever) (for example, data dependent dynamic shape constraint).
+ * Export-specific helpers are used to represent advanced features like [control
+ flow](https://pytorch.org/docs/main/export.html#data-shape-dependent-control-flow)
+ (for example, helper functions to trace both branches of if-else) and [dynamic
+ shapes](https://pytorch.org/get-started/pytorch-2.0/#pytorch-2x-faster-more-pythonic-and-as-dynamic-as-ever)
+ (for example, data dependent dynamic shape constraint).

### Export

To deploy the program to the device, engineers need to have a graph representation for compiling a model to run on various backends. With [`torch.export()`](https://pytorch.org/docs/main/export.html), an [EXIR](./ir-exir.md) (export intermediate representation) is generated with ATen dialect. All AOT compilations are based on this EXIR, but can have multiple dialects along the lowering path as detailed below.

* _[ATen Dialect](./ir-exir.md#aten-dialect)_. PyTorch Edge is based on PyTorch’s Tensor library ATen, which has clear contracts for efficient execution. ATen Dialect is a graph represented by ATen nodes which are fully ATen compliant. Custom operators are allowed, but must be registered with the dispatcher. The graph is flattened, with no module hierarchy (submodules in a bigger module), but the source code and module hierarchy are preserved in the metadata. This representation is also autograd safe.
* Optionally, _quantization_, either QAT (quantization-aware training) or PTQ (post-training quantization), can be applied to the whole ATen graph before converting to Core ATen. Quantization reduces the model size, which is important for edge devices.
* _[Core ATen Dialect](./ir-ops-set-definition.md)_. ATen has thousands of operators, which is not ideal for some fundamental transforms and for kernel library implementation. The operators from the ATen Dialect graph are decomposed into fundamental operators so that the operator set (op set) is smaller and more fundamental transforms can be applied. The Core ATen dialect is also serializable and convertible to Edge Dialect as detailed below.
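
The decomposition step described in the last bullet can be illustrated with a toy sketch. This is not ExecuTorch or PyTorch code: the `Node` type, the `CORE_OPS` set, the op names, and the `DECOMPOSITIONS` table are all hypothetical, chosen only to show how larger operators can be rewritten recursively until only a small op set remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    op: str        # operator name (hypothetical, for illustration only)
    inputs: tuple  # names of the input values
    output: str    # name of the value this node produces


# Hypothetical "core" op set: everything else must be decomposed into these.
CORE_OPS = {"mm", "add", "transpose", "relu"}


def decompose_addmm(node):
    # addmm(bias, m1, m2) -> add(mm(m1, m2), bias)
    bias, m1, m2 = node.inputs
    tmp = node.output + "_mm"
    return [Node("mm", (m1, m2), tmp), Node("add", (tmp, bias), node.output)]


def decompose_linear(node):
    # linear(x, w, b) -> addmm(b, x, transpose(w))
    x, w, b = node.inputs
    wt = node.output + "_wt"
    return [Node("transpose", (w,), wt), Node("addmm", (b, x, wt), node.output)]


DECOMPOSITIONS = {"addmm": decompose_addmm, "linear": decompose_linear}


def to_core(graph):
    """Rewrite a list of nodes until every node uses an op in CORE_OPS."""
    result = []
    for node in graph:
        if node.op in CORE_OPS:
            result.append(node)
        elif node.op in DECOMPOSITIONS:
            # A decomposition may itself produce non-core ops, so recurse.
            result.extend(to_core(DECOMPOSITIONS[node.op](node)))
        else:
            raise ValueError(f"no decomposition registered for {node.op!r}")
    return result


graph = [Node("linear", ("x", "w", "b"), "y"), Node("relu", ("y",), "z")]
core_graph = to_core(graph)
core_ops = [n.op for n in core_graph]
```

Here `linear` is first rewritten to `transpose` plus `addmm`, and `addmm` is then rewritten to `mm` plus `add`, so later passes only need to understand four operators.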

### Edge Compilation

The Export process discussed above operates on a graph that is agnostic to the edge device where the code is ultimately executed. During the edge compilation step, we work on representations that are Edge specific.

* _[Edge Dialect](./ir-exir.md#edge-dialect)_. All operators are either compliant with ATen operators with dtype plus memory layout information (represented as `dim_order`) or registered custom operators. Scalars are converted to Tensors. These specifications allow the following steps to focus on a smaller Edge domain. In addition, they enable selective build, which is based on specific dtypes and memory layouts.

With the Edge dialect, there are two target-aware ways to further lower the graph to the _[Backend Dialect](./compiler-backend-dialect.md)_. At this point, delegates for specific hardware can perform many operations. For example, Core ML on iOS, QNN on Qualcomm, or TOSA on Arm can rewrite the graph. The options at this level are:

* _[Backend Delegate](./compiler-delegate-and-partitioner.md)_. The entry point to compile the graph (either full or partial) to a specific backend. During this transformation, the original graph (or subgraph) is swapped with the semantically equivalent compiled graph. The compiled graph will be offloaded to the backend (aka `delegated`) later, at runtime, for improved performance.
* _User-defined passes_. Target-specific transforms can also be performed by the user. Good examples of this are kernel fusion, async behavior, memory layout conversion, and others.
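
As a rough illustration of partial delegation, the sketch below splits a program into segments that a backend claims and segments that stay on the default kernels. It is a deliberately simplified assumption: the op names, the `BACKEND_SUPPORTED` set, and the `partition` helper are hypothetical, and the program is treated as a flat list of ops rather than a graph with data dependencies.

```python
def partition(ops, supported):
    """Group consecutive ops into (target, [ops]) segments.

    Ops in `supported` go to a "delegate" segment; all others stay in a
    "portable" segment that the default kernels would run.
    """
    segments = []
    for op in ops:
        target = "delegate" if op in supported else "portable"
        if segments and segments[-1][0] == target:
            # Extend the current segment while the target stays the same.
            segments[-1][1].append(op)
        else:
            segments.append((target, [op]))
    return segments


# Hypothetical program and backend capability list.
ops = ["conv", "relu", "custom_nms", "mm", "add"]
BACKEND_SUPPORTED = {"conv", "relu", "mm", "add"}

segments = partition(ops, BACKEND_SUPPORTED)
```

In this toy case the unsupported `custom_nms` op splits the program into two delegated segments with one portable segment between them.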

### Compile to ExecuTorch Program

The Edge program above is good for compilation, but not suitable for the runtime environment. On-device deployment engineers can lower the graph to a form that can be efficiently loaded and executed by the runtime.

On most Edge environments, dynamic memory allocation/freeing has significant performance and power overhead. It can be avoided by using AOT memory planning and a static execution graph.

* The ExecuTorch runtime is static (in the sense of graph representation, but control flow and dynamic shapes are still supported). To avoid output creation and return, all functional operator representations are converted to out variants (outputs passed as arguments).
* Optionally, users can apply their own memory planning algorithms. For example, an embedded system can have specific layers of memory hierarchy, and users can customize memory planning for that hierarchy.
* The program is emitted in a format that the ExecuTorch runtime can recognize.
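
To make the idea of AOT memory planning concrete, here is a minimal sketch of one possible strategy, a greedy first-fit planner over tensor lifetimes. It is an illustrative assumption, not the algorithm ExecuTorch ships: `TensorSpec`, `plan_memory`, and the sizes and lifetimes below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TensorSpec:
    name: str
    size: int       # bytes needed by the tensor
    first_use: int  # index of the op that writes the tensor
    last_use: int   # index of the last op that reads it


def plan_memory(specs):
    """Assign each tensor a fixed offset in one arena, ahead of time.

    Tensors whose lifetimes do not overlap may share the same bytes.
    """
    placed = []   # (offset, spec) pairs that already have an offset
    offsets = {}
    for spec in sorted(specs, key=lambda s: s.first_use):
        # Byte ranges held by tensors that are alive at the same time.
        busy = sorted(
            (off, off + other.size)
            for off, other in placed
            if not (other.last_use < spec.first_use
                    or spec.last_use < other.first_use)
        )
        offset = 0
        for start, end in busy:
            if offset + spec.size <= start:
                break  # the tensor fits in the gap before this range
            offset = max(offset, end)
        offsets[spec.name] = offset
        placed.append((offset, spec))
    arena_size = max((offsets[s.name] + s.size for s in specs), default=0)
    return offsets, arena_size


# Hypothetical lifetimes: "a" is dead by the time "c" is written.
specs = [
    TensorSpec("a", 64, 0, 1),
    TensorSpec("b", 32, 1, 2),
    TensorSpec("c", 64, 2, 3),
]
offsets, arena_size = plan_memory(specs)
```

Because `a` and `c` are never alive at the same time, they share offset 0, and the arena needs 96 bytes instead of the 160 bytes that separate allocations would take. With offsets fixed before deployment, the runtime does not need to allocate or free tensor memory while executing.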

Finally, the emitted program can be serialized to [flatbuffer](https://github.com/pytorch/executorch/blob/main/schema/program.fbs) format.

## Runtime Preparation

With the serialized program, and provided kernel libraries (for operator calls) or backend libraries (for delegate calls), model deployment engineers can now prepare the program for the runtime.

ExecuTorch provides _[selective build](./kernel-library-selective-build.md)_ APIs for building a runtime that links only the kernels used by the program, which can provide significant binary size savings in the resulting application.
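
The principle behind selective build can be sketched in a few lines. This is not the selective build API itself: the kernel names, the byte sizes in `KERNEL_LIBRARY`, and the `select_kernels` helper are made up for illustration.

```python
# Hypothetical kernel library: kernel name -> compiled size in bytes.
KERNEL_LIBRARY = {
    "aten::add.out": 1200,
    "aten::mm.out": 5400,
    "aten::conv2d.out": 9800,
    "aten::relu.out": 600,
}


def select_kernels(program_ops, library):
    """Return only the kernels that the program actually calls."""
    needed = set(program_ops)
    missing = needed - set(library)
    if missing:
        raise KeyError(f"no kernel registered for: {sorted(missing)}")
    return {name: library[name] for name in sorted(needed)}


# Operators referenced by a (hypothetical) serialized program.
program_ops = ["aten::mm.out", "aten::add.out", "aten::relu.out", "aten::add.out"]

selected = select_kernels(program_ops, KERNEL_LIBRARY)
full_size = sum(KERNEL_LIBRARY.values())
selected_size = sum(selected.values())
```

With these made-up numbers, linking only the three kernels the program uses takes 7200 bytes instead of 17000 for the whole library.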

## Program Execution

The ExecuTorch runtime is written in C++ with minimal dependencies for portability and execution efficiency. Because the program is well prepared AOT, the core runtime components are minimal and include:

* Platform abstraction layer
* Logging and optionally profiling
* Execution data types
@@ -98,7 +87,6 @@ The ExecuTorch runtime is written in C++ with minimal dependencies for portabili

_Executor_ is the entry point to load the program and execute it. The execution triggers corresponding operator kernels or backend execution from this very minimal runtime.
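
The real runtime is written in C++, so the following Python toy is only a conceptual sketch of an executor loop. The instruction format, the `KERNELS` table, and the kernel names are assumptions made for this example. It shows how a static instruction list can dispatch to out-variant kernels that write into pre-planned buffers instead of allocating results.

```python
def add_out(a, b, out):
    # Out-variant kernel: the result is written into a caller-provided buffer.
    for i in range(len(out)):
        out[i] = a[i] + b[i]


def mul_out(a, b, out):
    for i in range(len(out)):
        out[i] = a[i] * b[i]


# Hypothetical kernel registry, keyed by operator name.
KERNELS = {"add.out": add_out, "mul.out": mul_out}


def execute(instructions, values):
    """Run each instruction in order against a fixed table of values."""
    for op, arg_ids, out_id in instructions:
        args = [values[i] for i in arg_ids]
        KERNELS[op](*args, out=values[out_id])


# All buffers exist before execution starts; slots 2 and 3 hold outputs.
values = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
instructions = [
    ("add.out", (0, 1), 2),  # values[2] = values[0] + values[1]
    ("mul.out", (2, 0), 3),  # values[3] = values[2] * values[0]
]
execute(instructions, values)
```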

## SDK

It should be efficient for users to go from research to production using the flow above. Productivity is especially important for users to author, optimize, and deploy their models. We provide the [ExecuTorch SDK](./sdk-overview.md) to improve productivity. The SDK is not in the diagram. Instead, it's a tool set that covers the developer workflow in all three phases.
