Commit f2320e6

Move backend docs to new locations (#8413)
* Temporarily remove new backend pages
* Move backend docs to new locations
* Update backend titles and inline contents
1 parent 0a4ed6e commit f2320e6

17 files changed (+375 −33 lines)

docs/source/executorch-arm-delegate-tutorial.md renamed to docs/source/backends-arm-ethos-u.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 <!---- Name is a WIP - this reflects better what it can do today ----->
-# Building and Running ExecuTorch with ARM Ethos-U Backend
+# ARM Ethos-U Backend
 
 <!----This will show a grid card on the page----->
 ::::{grid} 2

docs/source/build-run-xtensa.md renamed to docs/source/backends-cadence.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Building and Running ExecuTorch on Xtensa HiFi4 DSP
+# Cadence Xtensa Backend
 
 
 In this tutorial we will walk you through the process of getting set up to build ExecuTorch for an Xtensa HiFi4 DSP and running a simple model on it.

docs/source/build-run-coreml.md renamed to docs/source/backends-coreml.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Building and Running ExecuTorch with Core ML Backend
+# Core ML Backend
 
 The Core ML delegate uses Core ML APIs to enable running neural networks via Apple's hardware acceleration. For more about Core ML you can read [here](https://developer.apple.com/documentation/coreml). In this tutorial, we will walk through the steps of lowering a PyTorch model to the Core ML delegate.
 

docs/source/build-run-mediatek-backend.md renamed to docs/source/backends-mediatek.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Building and Running ExecuTorch with MediaTek Backend
+# MediaTek Backend
 
 The MediaTek backend empowers ExecuTorch to speed up PyTorch models on edge devices equipped with a MediaTek Neuron Processing Unit (NPU). This document offers a step-by-step guide to set up the build environment for the MediaTek ExecuTorch libraries.
 

docs/source/backends-mps.md

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
# MPS Backend

In this tutorial we will walk you through the process of getting set up to build the MPS backend for ExecuTorch and running a simple model on it.

The MPS backend maps machine learning computational graphs and primitives onto the [MPS Graph](https://developer.apple.com/documentation/metalperformanceshadersgraph/mpsgraph?language=objc) framework and onto tuned kernels provided by [MPS](https://developer.apple.com/documentation/metalperformanceshaders?language=objc).

::::{grid} 2
:::{grid-item-card} What you will learn in this tutorial:
:class-card: card-prerequisites
* In this tutorial you will learn how to export the [MobileNet V3](https://pytorch.org/vision/main/models/mobilenetv3.html) model to the MPS delegate.
* You will also learn how to compile and deploy the ExecuTorch runtime with the MPS delegate on macOS and iOS.
:::
:::{grid-item-card} Tutorials we recommend you complete before this:
:class-card: card-prerequisites
* [Introduction to ExecuTorch](intro-how-it-works.md)
* [Setting up ExecuTorch](getting-started-setup.md)
* [Building ExecuTorch with CMake](runtime-build-and-cross-compilation.md)
* [ExecuTorch iOS Demo App](demo-apps-ios.md)
* [ExecuTorch iOS LLaMA Demo App](llm/llama-demo-ios.md)
:::
::::


## Prerequisites (Hardware and Software)

In order to successfully build and run a model using the MPS backend for ExecuTorch, you'll need the following hardware and software components:

### Hardware:
- A [Mac](https://www.apple.com/mac/) for tracing the model

### Software:

- **Ahead-of-time** tracing:
  - [macOS](https://www.apple.com/macos/) 12

- **Runtime**:
  - [macOS](https://www.apple.com/macos/) >= 12.4
  - [iOS](https://www.apple.com/ios) >= 15.4
  - [Xcode](https://developer.apple.com/xcode/) >= 14.1

## Setting up Developer Environment

***Step 1.*** Please finish the [Setting up ExecuTorch](https://pytorch.org/executorch/stable/getting-started-setup) tutorial.

***Step 2.*** Install the dependencies needed to lower to the MPS delegate:

```bash
./backends/apple/mps/install_requirements.sh
```

## Build

### AOT (Ahead-of-time) Components

**Compiling a model for the MPS delegate**:
- In this step, you will generate a simple ExecuTorch program that lowers the MobileNetV3 model to the MPS delegate. You'll then pass this program (the `.pte` file) to the runtime to run it using the MPS backend.

```bash
cd executorch
# Note: the `mps_example` script uses the MPSPartitioner by default for ops that are not yet supported by the MPS delegate. To turn it off, pass `--no-use_partitioner`.
python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3" --bundled --use_fp16

# To see all options, run the following command:
python3 -m examples.apple.mps.scripts.mps_example --help
```
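
If you want to see what the lowering flow looks like in plain Python rather than through the example script, the following is a minimal sketch using the generic ExecuTorch `to_backend()` API. The `MPSPartitioner` import path and its constructor argument are assumptions for illustration, and the sketch writes a plain `.pte` rather than the bundled program that the `mps_executor_runner` invocation below expects; the supported flow for this tutorial remains the `mps_example` script above.

```python
# Illustrative sketch only -- the import path and partitioner arguments below
# are assumptions, not the documented flow.
import torch
import torchvision.models as models
from torch.export import export
from executorch.exir import to_edge
from executorch.backends.apple.mps.partition.mps_partitioner import MPSPartitioner  # path assumed

model = models.mobilenet_v3_small(weights="DEFAULT").eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# 1. Export to the ATen dialect, then convert to an Edge program.
aten_dialect = export(model, sample_inputs)
edge_program = to_edge(aten_dialect)

# 2. Partition the graph and lower supported subgraphs to the MPS delegate.
edge_program = edge_program.to_backend(MPSPartitioner(compile_specs=[]))  # arguments assumed

# 3. Serialize to a .pte file for the runtime.
executorch_program = edge_program.to_executorch()
with open("mv3_mps.pte", "wb") as f:
    f.write(executorch_program.buffer)
```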

### Runtime

**Building the MPS executor runner:**
```bash
# In this step, you'll build the `mps_executor_runner`, which is able to run MPS lowered modules:
cd executorch
./examples/apple/mps/scripts/build_mps_executor_runner.sh
```

## Run the generated mv3 model using the mps_executor_runner

```bash
./cmake-out/examples/apple/mps/mps_executor_runner --model_path mv3_mps_bundled_fp16.pte --bundled_program
```

- You should see the following results. Note that no output file will be generated in this example:
```
I 00:00:00.003290 executorch:mps_executor_runner.mm:286] Model file mv3_mps_bundled_fp16.pte is loaded.
I 00:00:00.003306 executorch:mps_executor_runner.mm:292] Program methods: 1
I 00:00:00.003308 executorch:mps_executor_runner.mm:294] Running method forward
I 00:00:00.003311 executorch:mps_executor_runner.mm:349] Setting up non-const buffer 1, size 606112.
I 00:00:00.003374 executorch:mps_executor_runner.mm:376] Setting up memory manager
I 00:00:00.003376 executorch:mps_executor_runner.mm:392] Loading method name from plan
I 00:00:00.018942 executorch:mps_executor_runner.mm:399] Method loaded.
I 00:00:00.018944 executorch:mps_executor_runner.mm:404] Loading bundled program...
I 00:00:00.018980 executorch:mps_executor_runner.mm:421] Inputs prepared.
I 00:00:00.118731 executorch:mps_executor_runner.mm:438] Model executed successfully.
I 00:00:00.122615 executorch:mps_executor_runner.mm:501] Model verified successfully.
```

### [Optional] Run the generated model directly using pybind
1. Make sure the `pybind` MPS support was installed:
```bash
./install_executorch.sh --pybind mps
```
2. Run the `mps_example` script to trace the model and run it directly from Python:
```bash
cd executorch
# Check correctness between the PyTorch eager forward pass and the ExecuTorch MPS delegate forward pass
python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3" --no-use_fp16 --check_correctness
# You should see the following output: `Results between ExecuTorch forward pass with MPS backend and PyTorch forward pass for mv3_mps are matching!`

# Check performance between the PyTorch MPS forward pass and the ExecuTorch MPS forward pass
python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3" --no-use_fp16 --bench_pytorch
```
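
You can also drive a lowered `.pte` from your own Python code through the ExecuTorch pybind runtime. The sketch below assumes the MPS-enabled pybind install from step 1 and reuses the hypothetical `mv3_mps.pte` file from the earlier lowering sketch; treat the module path as an assumption to verify against your install.

```python
# Hedged sketch: run a lowered program from Python via the ExecuTorch pybind
# runtime. Requires the MPS-enabled pybind install shown above; the .pte file
# name is carried over from the earlier illustrative sketch.
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("mv3_mps.pte")
outputs = module.forward((torch.randn(1, 3, 224, 224),))
print(outputs[0].shape)
```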

### Profiling:
1. [Optional] Generate an [ETRecord](./etrecord.rst) while you're exporting your model.
```bash
cd executorch
python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3" --generate_etrecord -b
```
2. Run your program on the ExecuTorch runtime and generate an [ETDump](./etdump.md).
```
./cmake-out/examples/apple/mps/mps_executor_runner --model_path mv3_mps_bundled_fp16.pte --bundled_program --dump-outputs
```
3. Create an instance of the Inspector API by passing in the ETDump you have sourced from the runtime along with the optionally generated ETRecord from step 1.
```bash
python3 -m sdk.inspector.inspector_cli --etdump_path etdump.etdp --etrecord_path etrecord.bin
```
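
The same inspection can also be done programmatically. The sketch below is an assumption based on the `sdk.inspector` CLI module invoked above, so the exact import path and constructor arguments may differ in your ExecuTorch version.

```python
# Hedged sketch: programmatic use of the Inspector API; import path and
# argument names are assumed from the CLI invocation above.
from executorch.sdk import Inspector  # assumed import path

inspector = Inspector(etdump_path="etdump.etdp", etrecord="etrecord.bin")
inspector.print_data_tabular()  # prints per-event runtime statistics
```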

## Deploying and Running on Device

***Step 1***. Create the ExecuTorch core and MPS delegate frameworks to link on iOS:
```bash
cd executorch
./build/build_apple_frameworks.sh --mps
```

`mps_delegate.xcframework` will be in the `cmake-out` folder, along with `executorch.xcframework` and `portable_delegate.xcframework`:
```bash
cd cmake-out && ls
```

***Step 2***. Link the frameworks into your Xcode project:
Go to the project target's `Build Phases` - `Link Binaries With Libraries`, click the **+** sign, and add the frameworks located in the `Release` folder:
- `executorch.xcframework`
- `portable_delegate.xcframework`
- `mps_delegate.xcframework`

From the same page, include the libraries needed by the MPS delegate:
- `MetalPerformanceShaders.framework`
- `MetalPerformanceShadersGraph.framework`
- `Metal.framework`

In this tutorial, you have learned how to lower a model to the MPS delegate, build the `mps_executor_runner`, and run a lowered model through the MPS delegate, or directly on device using the MPS delegate static library.


## Frequently encountered errors and resolution

If you encounter any bugs or issues following this tutorial, please file a bug/issue on the [ExecuTorch repository](https://github.com/pytorch/executorch/issues) with the hashtag **#mps**.

docs/source/build-run-qualcomm-ai-engine-direct-backend.md renamed to docs/source/backends-qualcomm.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Building and Running ExecuTorch with Qualcomm AI Engine Direct Backend
+# Qualcomm AI Engine Backend
 
 In this tutorial we will walk you through the process of getting started to
 build ExecuTorch for Qualcomm AI Engine Direct and running a model on it.

docs/source/backends-vulkan.md

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
# Vulkan Backend

The ExecuTorch Vulkan delegate is a native GPU delegate for ExecuTorch that is
built on top of the cross-platform Vulkan GPU API standard. It is primarily
designed to leverage the GPU to accelerate model inference on Android devices,
but can be used on any platform that supports an implementation of Vulkan:
laptops, servers, and edge devices.

::::{note}
The Vulkan delegate is currently under active development, and its components
are subject to change.
::::

## What is Vulkan?

Vulkan is a low-level GPU API specification developed as a successor to OpenGL.
It is designed to offer developers more explicit control over GPUs compared to
previous specifications in order to reduce overhead and maximize the
capabilities of modern graphics hardware.

Vulkan has been widely adopted among GPU vendors, and most modern GPUs (both
desktop and mobile) on the market support Vulkan. Vulkan is also included in
Android from Android 7.0 onwards.

**Note that Vulkan is a GPU API, not a GPU math library**. That is to say, it
provides a way to execute compute and graphics operations on a GPU, but does not
come with a built-in library of performant compute kernels.

## The Vulkan Compute Library

The ExecuTorch Vulkan delegate is a wrapper around a standalone runtime known as
the **Vulkan Compute Library**. The aim of the Vulkan Compute Library is to
provide GPU implementations for PyTorch operators via GLSL compute shaders.

The Vulkan Compute Library is a fork/iteration of the [PyTorch Vulkan Backend](https://pytorch.org/tutorials/prototype/vulkan_workflow.html).
The core components of the PyTorch Vulkan backend were forked into ExecuTorch
and adapted for an AOT graph-mode style of model inference (as opposed to
PyTorch, which adopted an eager execution style of model inference).

The components of the Vulkan Compute Library are contained in the
`executorch/backends/vulkan/runtime/` directory. The core components are listed
and described below:

```
runtime/
├── api/ .................... Wrapper API around Vulkan to manage Vulkan objects
└── graph/ .................. ComputeGraph class which implements graph mode inference
    └── ops/ ................ Base directory for operator implementations
        ├── glsl/ ........... GLSL compute shaders
        │   ├── *.glsl
        │   └── conv2d.glsl
        └── impl/ ........... C++ code to dispatch GPU compute shaders
            ├── *.cpp
            └── Conv2d.cpp
```

## Features

The Vulkan delegate currently supports the following features:

* **Memory Planning**
  * Intermediate tensors whose lifetimes do not overlap will share memory allocations. This reduces the peak memory usage of model inference.
* **Capability Based Partitioning**:
  * A graph can be partially lowered to the Vulkan delegate via a partitioner, which will identify nodes (i.e. operators) that are supported by the Vulkan delegate and lower only supported subgraphs.
* **Support for upper-bound dynamic shapes**:
  * Tensors can change shape between inferences as long as the current shape is smaller than the bounds specified during lowering (see the sketch after this list).
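
As a concrete illustration of the upper-bound dynamic shapes feature, the following sketch exports a trivial model with a dynamic batch dimension before lowering. It is an illustrative assumption, not part of the original tutorial; the bound of 8 and the dimension name are arbitrary choices.

```python
# Illustrative sketch: export with an upper-bound dynamic batch dimension.
# The resulting program can then be lowered to the Vulkan delegate as shown
# in the end-to-end example below, and will accept batch sizes up to the bound.
import torch
from torch.export import Dim, export

class Add(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x + y

batch = Dim("batch", max=8)  # upper bound chosen for illustration
aten_dialect = export(
    Add(),
    (torch.ones(2, 4), torch.ones(2, 4)),
    dynamic_shapes={"x": {0: batch}, "y": {0: batch}},
)
```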

In addition to increasing operator coverage, the following features are
currently in development:

* **Quantization Support**
  * We are currently working on support for 8-bit dynamic quantization, with plans to extend to other quantization schemes in the future.
* **Memory Layout Management**
  * Memory layout is an important factor in optimizing performance. We plan to introduce graph passes that insert memory layout transitions throughout a graph to optimize memory-layout sensitive operators such as Convolution and Matrix Multiplication.
* **Selective Build**
  * We plan to make it possible to control build size by selecting which operators/shaders you want to build with.

## End to End Example

To further understand the features of the Vulkan delegate and how to use it,
consider the following end-to-end example with a simple single operator model.

### Compile and lower a model to the Vulkan Delegate

Once ExecuTorch has been set up and installed, the following script can be used
to generate a simple model and lower it to the Vulkan delegate, producing
`vk_add.pte`.

```python
# Note: this script is the same as the script from the "Setting up ExecuTorch"
# page, with one minor addition to lower to the Vulkan backend.
import torch
from torch.export import export
from executorch.exir import to_edge

from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner

# Start with a PyTorch model that adds two input tensors (matrices)
class Add(torch.nn.Module):
    def __init__(self):
        super(Add, self).__init__()

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x + y

# 1. torch.export: Defines the program with the ATen operator set.
aten_dialect = export(Add(), (torch.ones(1), torch.ones(1)))

# 2. to_edge: Make optimizations for Edge devices
edge_program = to_edge(aten_dialect)
# 2.1 Lower to the Vulkan backend
edge_program = edge_program.to_backend(VulkanPartitioner())

# 3. to_executorch: Convert the graph to an ExecuTorch program
executorch_program = edge_program.to_executorch()

# 4. Save the compiled .pte program
with open("vk_add.pte", "wb") as file:
    file.write(executorch_program.buffer)
```

Like other ExecuTorch delegates, a model can be lowered to the Vulkan delegate
using the `to_backend()` API. The Vulkan delegate implements the
`VulkanPartitioner` class, which identifies nodes (i.e. operators) in the graph
that are supported by the Vulkan delegate and separates compatible sections of
the model to be executed on the GPU.

This means that a model can be lowered to the Vulkan delegate even if it contains
some unsupported operators. This will just mean that only parts of the graph
will be executed on the GPU.
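
To see which parts of the graph were actually delegated, one quick option is to print the partitioned graph from the `edge_program` object in the script above; delegated subgraphs show up as calls into lowered backend modules. The exact printout format depends on your ExecuTorch version.

```python
# Hedged sketch: inspect the partitioned graph after to_backend().
print(edge_program.exported_program().graph)
```
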
::::{note}
The [supported ops list](https://github.com/pytorch/executorch/blob/main/backends/vulkan/partitioner/supported_ops.py)
in the Vulkan partitioner code can be inspected to examine which ops are currently
implemented in the Vulkan delegate.
::::

### Build Vulkan Delegate libraries

The easiest way to build and test the Vulkan delegate is to build for Android
and test on a local Android device. Android devices have built-in support for
Vulkan, and the Android NDK ships with a GLSL compiler, which is needed to
compile the Vulkan Compute Library's GLSL compute shaders.

The Vulkan delegate libraries can be built by setting `-DEXECUTORCH_BUILD_VULKAN=ON`
when building with CMake.

First, make sure that you have the Android NDK installed; any NDK version past
NDK r19c should work. Note that the examples in this doc have been validated with
NDK r27b. The Android SDK should also be installed so that you have access to `adb`.

The instructions on this page assume that the following environment variables
are set.

```shell
export ANDROID_NDK=<path_to_ndk>
# Select the appropriate Android ABI for your device
export ANDROID_ABI=arm64-v8a
# All subsequent commands should be performed from ExecuTorch repo root
cd <path_to_executorch_root>
# Make sure adb works
adb --version
```

To build and install ExecuTorch libraries (for Android) with the Vulkan
delegate:

```shell
# From executorch root directory
(rm -rf cmake-android-out && \
  cmake . -DCMAKE_INSTALL_PREFIX=cmake-android-out \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=$ANDROID_ABI \
    -DEXECUTORCH_BUILD_VULKAN=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-android-out && \
  cmake --build cmake-android-out -j16 --target install)
```

### Run the Vulkan model on device

::::{note}
Since operator support is currently limited, only binary arithmetic operators
will run on the GPU. Expect inference to be slow as the majority of operators
are being executed via Portable operators.
::::

Now, the partially delegated model can be executed (partially) on your device's
GPU!

```shell
# Build a model runner binary linked with the Vulkan delegate libs
cmake --build cmake-android-out --target vulkan_executor_runner -j32

# Push model to device
adb push vk_add.pte /data/local/tmp/vk_add.pte
# Push binary to device
adb push cmake-android-out/backends/vulkan/vulkan_executor_runner /data/local/tmp/runner_bin

# Run the model
adb shell /data/local/tmp/runner_bin --model_path /data/local/tmp/vk_add.pte
```

docs/source/build-run-mps.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/source/build-run-vulkan.md

Lines changed: 0 additions & 1 deletion
This file was deleted.
