# Vulkan Backend

The ExecuTorch Vulkan delegate is a native GPU delegate for ExecuTorch that is
built on top of the cross-platform Vulkan GPU API standard. It is primarily
designed to leverage the GPU to accelerate model inference on Android devices,
but can be used on any platform that supports an implementation of Vulkan:
laptops, servers, and edge devices.

::::{note}
The Vulkan delegate is currently under active development, and its components
are subject to change.
::::

## What is Vulkan?

Vulkan is a low-level GPU API specification developed as a successor to OpenGL.
It is designed to offer developers more explicit control over GPUs compared to
previous specifications, in order to reduce overhead and maximize the
capabilities of modern graphics hardware.

Vulkan has been widely adopted among GPU vendors, and most modern GPUs (both
desktop and mobile) on the market support Vulkan. Vulkan has also been included
in Android since Android 7.0.

**Note that Vulkan is a GPU API, not a GPU Math Library**. That is to say, it
provides a way to execute compute and graphics operations on a GPU, but does not
come with a built-in library of performant compute kernels.

## The Vulkan Compute Library

The ExecuTorch Vulkan Delegate is a wrapper around a standalone runtime known as
the **Vulkan Compute Library**. The aim of the Vulkan Compute Library is to
provide GPU implementations for PyTorch operators via GLSL compute shaders.

The Vulkan Compute Library is a fork/iteration of the [PyTorch Vulkan Backend](https://pytorch.org/tutorials/prototype/vulkan_workflow.html).
The core components of the PyTorch Vulkan backend were forked into ExecuTorch
and adapted for an AOT graph-mode style of model inference (as opposed to
PyTorch's eager execution style of model inference).

The components of the Vulkan Compute Library are contained in the
`executorch/backends/vulkan/runtime/` directory. The core components are listed
and described below:

```
runtime/
├── api/ .................... Wrapper API around Vulkan to manage Vulkan objects
└── graph/ .................. ComputeGraph class which implements graph mode inference
    └── ops/ ................ Base directory for operator implementations
        ├── glsl/ ........... GLSL compute shaders
        │   ├── *.glsl
        │   └── conv2d.glsl
        └── impl/ ........... C++ code to dispatch GPU compute shaders
            ├── *.cpp
            └── Conv2d.cpp
```

## Features

The Vulkan delegate currently supports the following features:

* **Memory Planning**
  * Intermediate tensors whose lifetimes do not overlap will share memory allocations. This reduces the peak memory usage of model inference.
* **Capability Based Partitioning**
  * A graph can be partially lowered to the Vulkan delegate via a partitioner, which will identify nodes (i.e. operators) that are supported by the Vulkan delegate and lower only the supported subgraphs.
* **Support for upper-bound dynamic shapes**
  * Tensors can change shape between inferences, as long as the current shape is smaller than the bounds specified during lowering.
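
The memory planning idea above can be sketched in plain Python. This is a hypothetical illustration of lifetime-based allocation sharing, not the actual ExecuTorch planner; the function and tensor names are invented for the example.

```python
# Hypothetical sketch: tensors whose lifetimes do not overlap can reuse
# the same allocation, lowering peak memory. Not the real planner.

def plan_memory(lifetimes):
    """lifetimes: dict of tensor name -> (first_use, last_use) node indices.
    Returns a dict mapping each tensor to a shared allocation id."""
    allocations = []  # allocation id -> last node index that still uses it
    assignment = {}
    for name, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        # Reuse an allocation whose previous tenant's lifetime has ended
        for alloc_id, busy_until in enumerate(allocations):
            if busy_until < start:
                allocations[alloc_id] = end
                assignment[name] = alloc_id
                break
        else:
            allocations.append(end)
            assignment[name] = len(allocations) - 1
    return assignment

# t0 dies before t2 is created, so they can share one allocation:
plan = plan_memory({"t0": (0, 1), "t1": (1, 3), "t2": (2, 4)})
# Three intermediates fit in two allocations instead of three.
```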

In addition to increasing operator coverage, the following features are
currently in development:

* **Quantization Support**
  * We are currently working on support for 8-bit dynamic quantization, with plans to extend to other quantization schemes in the future.
* **Memory Layout Management**
  * Memory layout is an important factor in optimizing performance. We plan to add graph passes that insert memory layout transitions throughout a graph to optimize memory-layout sensitive operators such as Convolution and Matrix Multiplication.
* **Selective Build**
  * We plan to make it possible to control build size by selecting which operators/shaders to include in the build.

## End to End Example

To further understand the features of the Vulkan Delegate and how to use it,
consider the following end to end example with a simple single operator model.

### Compile and lower a model to the Vulkan Delegate

Once ExecuTorch has been set up and installed, the following script can be used
to generate a simple model and lower it to the Vulkan delegate, producing
`vk_add.pte`.

```python
# Note: this script is the same as the script from the "Setting up ExecuTorch"
# page, with one minor addition to lower to the Vulkan backend.
import torch
from torch.export import export
from executorch.exir import to_edge

from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner

# Start with a PyTorch model that adds two input tensors (matrices)
class Add(torch.nn.Module):
    def __init__(self):
        super(Add, self).__init__()

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x + y

# 1. torch.export: Defines the program with the ATen operator set.
aten_dialect = export(Add(), (torch.ones(1), torch.ones(1)))

# 2. to_edge: Make optimizations for Edge devices
edge_program = to_edge(aten_dialect)
# 2.1 Lower to the Vulkan backend
edge_program = edge_program.to_backend(VulkanPartitioner())

# 3. to_executorch: Convert the graph to an ExecuTorch program
executorch_program = edge_program.to_executorch()

# 4. Save the compiled .pte program
with open("vk_add.pte", "wb") as file:
    file.write(executorch_program.buffer)
```

Like other ExecuTorch delegates, a model can be lowered to the Vulkan Delegate
using the `to_backend()` API. The Vulkan Delegate implements the
`VulkanPartitioner` class, which identifies nodes (i.e. operators) in the graph
that are supported by the Vulkan delegate and separates compatible sections of
the model to be executed on the GPU.

This means that a model can be lowered to the Vulkan delegate even if it
contains some unsupported operators; only the supported parts of the graph will
be executed on the GPU.
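
To build intuition for what the partitioner does, here is a hypothetical, much-simplified sketch in plain Python. The real `VulkanPartitioner` operates on an exported FX graph, but the idea of grouping contiguous runs of supported operators into delegated subgraphs is the same; the operator names and supported set below are invented for the example.

```python
# Illustrative only: a toy capability-based partitioner over a flat op list.
SUPPORTED_OPS = {"add", "sub", "mul", "div"}  # hypothetical supported set

def partition(ops):
    """Split a linear op sequence into (backend, [ops]) segments."""
    segments = []
    for op in ops:
        backend = "vulkan" if op in SUPPORTED_OPS else "portable"
        if segments and segments[-1][0] == backend:
            segments[-1][1].append(op)  # extend the current run
        else:
            segments.append((backend, [op]))  # start a new segment
    return segments

# "softmax" is unsupported here, so it stays on the portable backend while
# the surrounding arithmetic runs are delegated to Vulkan.
segments = partition(["add", "mul", "softmax", "add"])
```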

::::{note}
The [supported ops list](https://github.com/pytorch/executorch/blob/main/backends/vulkan/partitioner/supported_ops.py)
in the Vulkan partitioner code can be inspected to examine which ops are
currently implemented in the Vulkan delegate.
::::

### Build Vulkan Delegate libraries

The easiest way to build and test the Vulkan Delegate is to build for Android
and test on a local Android device. Android devices have built-in support for
Vulkan, and the Android NDK ships with a GLSL compiler, which is needed to
compile the Vulkan Compute Library's GLSL compute shaders.

The Vulkan Delegate libraries can be built by setting `-DEXECUTORCH_BUILD_VULKAN=ON`
when building with CMake.

First, make sure that you have the Android NDK installed; any NDK version past
NDK r19c should work. Note that the examples in this doc have been validated with
NDK r27b. The Android SDK should also be installed so that you have access to `adb`.

The instructions on this page assume that the following environment variables
are set.

```shell
export ANDROID_NDK=<path_to_ndk>
# Select the appropriate Android ABI for your device
export ANDROID_ABI=arm64-v8a
# All subsequent commands should be performed from ExecuTorch repo root
cd <path_to_executorch_root>
# Make sure adb works
adb --version
```

To build and install ExecuTorch libraries (for Android) with the Vulkan
Delegate:

```shell
# From executorch root directory
(rm -rf cmake-android-out && \
  cmake . -DCMAKE_INSTALL_PREFIX=cmake-android-out \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=$ANDROID_ABI \
    -DEXECUTORCH_BUILD_VULKAN=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-android-out && \
  cmake --build cmake-android-out -j16 --target install)
```

### Run the Vulkan model on device

::::{note}
Since operator support is currently limited, only binary arithmetic operators
will run on the GPU. Expect inference to be slow, as the majority of operators
are executed via Portable operators.
::::

Now, the partially delegated model can be executed on your device's GPU!

```shell
# Build a model runner binary linked with the Vulkan delegate libs
cmake --build cmake-android-out --target vulkan_executor_runner -j32

# Push model to device
adb push vk_add.pte /data/local/tmp/vk_add.pte
# Push binary to device
adb push cmake-android-out/backends/vulkan/vulkan_executor_runner /data/local/tmp/runner_bin

# Run the model
adb shell /data/local/tmp/runner_bin --model_path /data/local/tmp/vk_add.pte
```