
Commit beeb497

lhezorca-zhang authored and committed

docs: add OpenCL (ggml-org#11697)

1 parent 1afcc35

File tree: 2 files changed, +206 −0 lines

README.md

Lines changed: 1 addition & 0 deletions
@@ -235,6 +235,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 | [HIP](docs/build.md#hip) | AMD GPU |
 | [Vulkan](docs/build.md#vulkan) | GPU |
 | [CANN](docs/build.md#cann) | Ascend NPU |
+| [OpenCL](docs/backend/OPENCL.md) | Adreno GPU |

 ## Building the project

docs/backend/OPENCL.md

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
# llama.cpp for OpenCL

- [Background](#background)
- [OS](#os)
- [Hardware](#hardware)
- [DataType Supports](#datatype-supports)
- [Model Preparation](#model-preparation)
- [CMake Options](#cmake-options)
- [Android](#android)
- [Windows 11 Arm64](#windows-11-arm64)
- [Known Issues](#known-issues)
- [TODO](#todo)

## Background

OpenCL (Open Computing Language) is an open, royalty-free standard for cross-platform, parallel programming of diverse accelerators found in supercomputers, cloud servers, personal computers, mobile devices and embedded platforms. OpenCL specifies a programming language (based on C99) for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. Similar to CUDA, OpenCL has been widely used to program GPUs and is supported by most GPU vendors.

### Llama.cpp + OpenCL

The llama.cpp OpenCL backend is designed first and foremost to enable llama.cpp on **Qualcomm Adreno GPUs**. Thanks to the portability of OpenCL, the backend can also run on certain Intel GPUs, although performance there is not optimal.

## OS

| OS      | Status  | Verified                                 |
|---------|---------|------------------------------------------|
| Android | Support | Snapdragon 8 Gen 3, Snapdragon 8 Elite   |
| Windows | Support | Windows 11 Arm64 with Snapdragon X Elite |
| Linux   | Support | Ubuntu 22.04 WSL2 with Intel 12700H      |

## Hardware

### Adreno GPU

**Verified devices**

| Adreno GPU                      | Status  |
|:-------------------------------:|:-------:|
| Adreno 750 (Snapdragon 8 Gen 3) | Support |
| Adreno 830 (Snapdragon 8 Elite) | Support |
| Adreno X85 (Snapdragon X Elite) | Support |

## DataType Supports

| DataType | Status                     |
|:--------:|:--------------------------:|
| Q4_0     | Support                    |
| Q6_K     | Support, but not optimized |

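For intuition about what `Q4_0` stores, the following Python sketch illustrates the scheme: weights are grouped into blocks of 32, each block keeps one scale, and each weight becomes a 4-bit code reconstructed as `d * (q - 8)`. This is an illustration only, not the ggml implementation (ggml packs codes two per byte and uses slightly different rounding):

```python
import numpy as np

QK4_0 = 32  # Q4_0 groups weights in blocks of 32

def quantize_q4_0_block(x):
    # One scale per block; each weight becomes a 4-bit code in 0..15,
    # reconstructed as d * (q - 8). Simplified sketch, not the ggml code.
    amax_idx = int(np.argmax(np.abs(x)))
    d = x[amax_idx] / -8.0  # scale; sign convention follows Q4_0
    if d == 0.0:
        return 0.0, np.zeros(QK4_0, dtype=np.int64)
    q = np.clip(np.round(x / d) + 8, 0, 15).astype(np.int64)
    return d, q

def dequantize_q4_0_block(d, q):
    return d * (q - 8)

rng = np.random.default_rng(0)
x = rng.standard_normal(QK4_0)
d, q = quantize_q4_0_block(x)
x_hat = dequantize_q4_0_block(d, q)
max_err = float(np.max(np.abs(x - x_hat)))
print(f"scale={d:.4f}  max abs error={max_err:.4f}")
```

The reconstruction error per weight is bounded by roughly the scale `d`, which is why a single outlier weight in a block degrades precision for the whole block.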
## Model Preparation

You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation.

Currently we support `Q4_0` quantization and have optimized for it. To achieve the best performance on Adreno GPUs, add `--pure` to `llama-quantize`. For example,

```sh
./llama-quantize --pure ggml-model-qwen2.5-3b-f16.gguf ggml-model-qwen-3b-Q4_0.gguf Q4_0
```

Since `Q6_K` is also supported, `Q4_0` quantization without `--pure` will also work. However, the performance will be worse compared to pure `Q4_0` quantization.


## CMake Options

The OpenCL backend has the following CMake options that control the behavior of the backend.

| CMake options                    | Default value | Description                               |
|:--------------------------------:|:-------------:|:------------------------------------------|
| `GGML_OPENCL_EMBED_KERNELS`      | `ON`          | Embed OpenCL kernels into the executable. |
| `GGML_OPENCL_USE_ADRENO_KERNELS` | `ON`          | Use kernels optimized for Adreno.         |

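These options are passed at configure time alongside `-DGGML_OPENCL=ON`. As an illustrative sketch (paths and generator are assumptions; adjust for your setup), a configuration that keeps the Adreno kernels but loads kernel sources from disk instead of embedding them might look like:

```shell
cmake .. -G Ninja \
  -DGGML_OPENCL=ON \
  -DGGML_OPENCL_EMBED_KERNELS=OFF \
  -DGGML_OPENCL_USE_ADRENO_KERNELS=ON
```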
## Android

Ubuntu 22.04 is used for targeting Android. Make sure the following tools are accessible from the command line:

* Git
* CMake 3.29
* Ninja
* Python3

### I. Setup Environment

1. **Install NDK**

```sh
cd ~
wget https://dl.google.com/android/repository/commandlinetools-linux-8512546_latest.zip && \
unzip commandlinetools-linux-8512546_latest.zip && \
mkdir -p ~/android-sdk/cmdline-tools && \
mv cmdline-tools latest && \
mv latest ~/android-sdk/cmdline-tools/ && \
rm -rf commandlinetools-linux-8512546_latest.zip

yes | ~/android-sdk/cmdline-tools/latest/bin/sdkmanager "ndk;26.3.11579264"
```


2. **Install OpenCL Headers and Library**

```sh
mkdir -p ~/dev/llm
cd ~/dev/llm

git clone https://github.com/KhronosGroup/OpenCL-Headers && \
cd OpenCL-Headers && \
cp -r CL ~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include

cd ~/dev/llm

git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader && \
cd OpenCL-ICD-Loader && \
mkdir build_ndk26 && cd build_ndk26 && \
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_TOOLCHAIN_FILE=$HOME/android-sdk/ndk/26.3.11579264/build/cmake/android.toolchain.cmake \
  -DOPENCL_ICD_LOADER_HEADERS_DIR=$HOME/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=24 \
  -DANDROID_STL=c++_shared && \
ninja && \
cp libOpenCL.so ~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android
```


### II. Build llama.cpp

```sh
cd ~/dev/llm

git clone https://github.com/ggerganov/llama.cpp && \
cd llama.cpp && \
mkdir build-android && cd build-android

cmake .. -G Ninja \
  -DCMAKE_TOOLCHAIN_FILE=$HOME/android-sdk/ndk/26.3.11579264/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_OPENCL=ON

ninja
```

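After the build, the binaries can be pushed to a device with `adb` and run from a device shell. The following is an illustrative sketch, not part of the official instructions: the model filename, the `-ngl` value, and the assumption that the Qualcomm `libOpenCL.so` lives under `/vendor/lib64` may differ per device.

```shell
# Push the binary and a quantized model to the device (paths are assumptions)
adb push build-android/bin/llama-cli /data/local/tmp/
adb push ggml-model-qwen-3b-Q4_0.gguf /data/local/tmp/

# Run on-device; LD_LIBRARY_PATH points at the vendor OpenCL driver
adb shell 'cd /data/local/tmp && LD_LIBRARY_PATH=/vendor/lib64 ./llama-cli -m ggml-model-qwen-3b-Q4_0.gguf -ngl 99 -p "Hello"'
```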

## Windows 11 Arm64

A Snapdragon X Elite device with Windows 11 Arm64 is used. Make sure the following tools are accessible from the command line:

* Git
* CMake 3.29
* Clang 19
* Ninja
* Visual Studio 2022

PowerShell is used for the following instructions.


### I. Setup Environment

1. **Install OpenCL Headers and Library**

```powershell
mkdir -p ~/dev/llm

cd ~/dev/llm
git clone https://github.com/KhronosGroup/OpenCL-Headers && cd OpenCL-Headers
mkdir build && cd build
cmake .. -G Ninja `
  -DBUILD_TESTING=OFF `
  -DOPENCL_HEADERS_BUILD_TESTING=OFF `
  -DOPENCL_HEADERS_BUILD_CXX_TESTS=OFF `
  -DCMAKE_INSTALL_PREFIX="$HOME/dev/llm/opencl"
cmake --build . --target install

cd ~/dev/llm
git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader && cd OpenCL-ICD-Loader
mkdir build && cd build
cmake .. -G Ninja `
  -DCMAKE_BUILD_TYPE=Release `
  -DCMAKE_PREFIX_PATH="$HOME/dev/llm/opencl" `
  -DCMAKE_INSTALL_PREFIX="$HOME/dev/llm/opencl"
cmake --build . --target install
```


### II. Build llama.cpp

```powershell
mkdir -p ~/dev/llm
cd ~/dev/llm

git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
mkdir build && cd build

cmake .. -G Ninja `
  -DCMAKE_TOOLCHAIN_FILE="$HOME/dev/llm/llama.cpp/cmake/arm64-windows-llvm.cmake" `
  -DCMAKE_BUILD_TYPE=Release `
  -DCMAKE_PREFIX_PATH="$HOME/dev/llm/opencl" `
  -DBUILD_SHARED_LIBS=OFF `
  -DGGML_OPENCL=ON
ninja
```


## Known Issues

- Qwen2.5 0.5B model produces gibberish output with Adreno kernels.

## TODO

- Fix Qwen2.5 0.5B
- Optimization for Q6_K
- Support and optimization for Q4_K
