### Llama.cpp + SYCL

- The llama.cpp SYCL backend is designed to support **Intel GPU** firstly. Based on the cross-platform feature of SYCL, it could support other vendor GPUs: Nvidia GPU (*AMD GPU coming*).
+ The llama.cpp SYCL backend is designed to support **Intel GPU** first. Thanks to the cross-platform nature of SYCL, it also supports GPUs from other vendors: Nvidia and AMD (experimental).

## Recommended Release
@@ -111,10 +111,18 @@ SYCL backend supports Intel GPU Family:
**Verified devices**

- | Nvidia GPU                | Status  | Verified Model |
- |---------------------------|---------|----------------|
- | Ampere Series             | Support | A100, A4000    |
- | Ampere Series *(Mobile)*  | Support | RTX 40 Series  |
+ | Nvidia GPU                | Status    | Verified Model |
+ |---------------------------|-----------|----------------|
+ | Ampere Series             | Supported | A100, A4000    |
+ | Ampere Series *(Mobile)*  | Supported | RTX 40 Series  |
+
+ | AMD GPU                   | Status       | Verified Model |
+ |---------------------------|--------------|----------------|
+ | Radeon Pro                | Experimental | W6800          |
+ | Radeon RX                 | Experimental | 6700 XT        |
+
+ Note: AMD GPU support is highly experimental and is incompatible with FP16 (`GGML_SYCL_F16`).
+ Additionally, it only supports GPUs with a sub_group_size (warp or wavefront size) of 32.
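To check the sub-group size requirement above before building, the wavefront size reported by `rocminfo` can be inspected. The sketch below is a hypothetical helper (`wavefront_sizes` is not part of llama.cpp or ROCm), and it assumes `rocminfo` prints a `Name: gfx…` line and a `Wavefront Size: 32(0x20)`-style line for each GPU agent; the exact output format may differ between ROCm versions.

```sh
# Hypothetical helper (not part of llama.cpp or ROCm): read `rocminfo` output
# on stdin and print "<gfx name> <wavefront size>" for each GPU agent.
# Assumes agent blocks contain lines such as
#   Name:                    gfx1030
#   Wavefront Size:          32(0x20)
wavefront_sizes() {
  awk '
    /^ *Agent [0-9]+/                   { name = "" }   # a new agent block starts
    $1 == "Name:" && $2 ~ /^gfx/        { name = $2 }   # GPU ISA name, e.g. gfx1030
    $1 == "Wavefront" && $2 == "Size:" && name != "" {
      size = $3
      sub(/\(.*/, "", size)                             # "32(0x20)" -> "32"
      print name, size
    }
  '
}
```

Running `rocminfo | wavefront_sizes` should print one line per GPU, something like `gfx1030 32`; a GPU reporting 64 falls outside the supported configuration described in the note.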
## Docker
The docker build option is currently limited to *Intel GPU* targets.
@@ -186,6 +194,10 @@ Platform #0: Intel(R) OpenCL HD Graphics
In order to target Nvidia GPUs through SYCL, please make sure the CUDA/CUBLAS native requirements *-found [here](README.md#cuda)-* are installed.

+ - **AMD GPU**
+
+ To target AMD GPUs with SYCL, the ROCm stack must be installed first.
+
2. **Install Intel® oneAPI Base toolkit**

- **For Intel GPU**
@@ -212,6 +224,19 @@ cmake -B buildWithCublas -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DENAB
cmake --build buildWithCublas --config Release
```

+ - **Adding support to AMD GPUs**
+
+ **oneAPI Plugin**: In order to enable SYCL support on AMD GPUs, please install the [Codeplay oneAPI Plugin for AMD GPUs](https://developer.codeplay.com/products/oneapi/amd/download). As with Nvidia GPUs, the user should also make sure the plugin version matches the installed base toolkit.
+
+ **oneMKL for rocBLAS**: The current oneMKL releases *(shipped with the oneAPI base-toolkit)* don't contain the rocBLAS backend. A build from source of the upstream [oneMKL](https://github.com/oneapi-src/oneMKL) with the *rocBLAS* backend enabled is therefore required to run llama.cpp on AMD GPUs.
+
+ ```sh
+ git clone https://github.com/oneapi-src/oneMKL
+ cd oneMKL
+ # Find your HIPTARGET with rocminfo, under the key 'Name:'
+ cmake -B buildWithrocBLAS -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_ROCBLAS_BACKEND=ON -DHIPTARGETS=${HIPTARGET} -DTARGET_DOMAINS=blas
+ cmake --build buildWithrocBLAS --config Release
+ ```
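The comment in the oneMKL build above says to read `HIPTARGET` from `rocminfo` under the `Name:` key. A minimal sketch of automating that lookup, assuming GPU agents appear as `Name: gfx…` lines in the `rocminfo` output; the `hip_target` function is hypothetical and simply returns the first GPU it finds.

```sh
# Hypothetical helper (not part of llama.cpp, oneMKL or ROCm): print the first
# gfx target (e.g. gfx1030) listed under a "Name:" key in `rocminfo` output
# read from stdin. Returns non-zero if no GPU agent is found.
hip_target() {
  awk '
    $1 == "Name:" && $2 ~ /^gfx/ { print $2; found = 1; exit }
    END { exit !found }
  '
}
```

For example, `HIPTARGET=$(rocminfo | hip_target)` could precede the oneMKL `cmake` call, and the same value can be used for `GGML_SYCL_HIP_TARGET` when building llama.cpp. On machines with several different AMD GPUs, set the target explicitly instead, since this only returns the first match.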
3. **Verify installation and environment**
@@ -223,22 +248,32 @@ sycl-ls
- **Intel GPU**

- When targeting an intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [`ext_oneapi_level_zero:gpu:0`] in the sample output below:
+ When targeting an Intel GPU, the user should expect one or more Level Zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [`level_zero:gpu`] in the sample output below:

```
- [opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
- [opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
- [opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
- [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
+ [opencl:acc][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
+ [opencl:cpu][opencl:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
+ [opencl:gpu][opencl:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
+ [level_zero:gpu][level_zero:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
```

- **Nvidia GPU**

- Similarly, user targeting Nvidia GPUs should expect at least one SYCL-CUDA device [`ext_oneapi_cuda:gpu`] as bellow:
+ Similarly, users targeting Nvidia GPUs should expect at least one SYCL-CUDA device [`cuda:gpu`], as below:
+
+ ```
+ [opencl:acc][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
+ [opencl:cpu][opencl:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
+ [cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.5]
+ ```
+
+ - **AMD GPU**
+
+ For AMD GPUs, users should expect at least one SYCL-HIP device [`hip:gpu`]:
+
```
- [opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
- [opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
- [ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.2]
+ [opencl:cpu][opencl:0] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i9-12900K OpenCL 3.0 (Build 0) [2024.18.6.0.02_160000]
+ [hip:gpu][hip:0] AMD HIP BACKEND, AMD Radeon PRO W6800 gfx1030 [HIP 60140.9]
```
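The checks above (at least one `level_zero:gpu`, `cuda:gpu` or `hip:gpu` entry) can be scripted. A small sketch, assuming the newer `[backend:gpu][backend:N]` line prefix shown in these sample outputs; older `sycl-ls` versions print `[ext_oneapi_cuda:gpu:0]`-style tags, which this pattern does not match. The `has_sycl_gpu` name is hypothetical.

```sh
# Hypothetical helper: succeed if `sycl-ls` output on stdin lists at least one
# GPU for the given backend, e.g. level_zero, cuda or hip.
# Assumes the "[backend:gpu][backend:N] ..." prefix of the samples above.
has_sycl_gpu() {
  grep -q "^\[$1:gpu]"
}
```

For example, `sycl-ls | has_sycl_gpu hip` returns success when a HIP device is listed; pass `level_zero` or `cuda` to check for Intel or Nvidia GPUs instead.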
### II. Build llama.cpp
@@ -266,6 +301,7 @@ cmake --build build --config Release -j -v
```

#### Nvidia GPU
+
```sh
# Export relevant ENV variables
export LD_LIBRARY_PATH=/path/to/oneMKL/buildWithCublas/lib:$LD_LIBRARY_PATH
@@ -283,7 +319,25 @@ cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=NVIDIA -DCMAKE_C_COMPILER=icx -

# build all binaries
cmake --build build --config Release -j -v
+ ```
+
+ #### AMD GPU

+ ```sh
+ # Export relevant ENV variables
+ export LD_LIBRARY_PATH=/path/to/oneMKL/buildWithrocBLAS/lib:$LD_LIBRARY_PATH
+ export LIBRARY_PATH=/path/to/oneMKL/buildWithrocBLAS/lib:$LIBRARY_PATH
+ export CPLUS_INCLUDE_DIR=/path/to/oneMKL/buildWithrocBLAS/include:$CPLUS_INCLUDE_DIR
+
+ # Build llama.cpp with rocBLAS acceleration through SYCL
+
+ ## AMD
+ # Use FP32; FP16 is not supported on AMD
+ # Find your GGML_SYCL_HIP_TARGET with rocminfo, under the key 'Name:'
+ cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=AMD -DGGML_SYCL_HIP_TARGET=${GGML_SYCL_HIP_TARGET} -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
+
+ # build all binaries
+ cmake --build build --config Release -j -v
```
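The `export VAR=/path:$VAR` lines above leave a trailing `:` when the variable was previously empty, and for `LD_LIBRARY_PATH` the dynamic loader treats an empty entry as the current directory. A sketch of a hypothetical `prepend_path` helper that only adds the separator when the variable already has a value (POSIX `sh`, using `eval` for the indirect variable name):

```sh
# Hypothetical helper: prepend DIR to the colon-separated variable VAR_NAME,
# adding the ":" separator only if VAR_NAME already has a value.
# Usage: prepend_path VAR_NAME DIR
prepend_path() {
  # Reject anything that is not a valid shell variable name before using eval.
  case $1 in
    ''|[0-9]*|*[!A-Za-z0-9_]*) echo "invalid variable name: $1" >&2; return 1 ;;
  esac
  eval "_pp_current=\${$1:-}"
  if [ -n "$_pp_current" ]; then
    eval "export $1=\"\$2:\$_pp_current\""
  else
    eval "export $1=\"\$2\""
  fi
}
```

With it, the AMD exports could be written as `prepend_path LD_LIBRARY_PATH /path/to/oneMKL/buildWithrocBLAS/lib` and `prepend_path LIBRARY_PATH /path/to/oneMKL/buildWithrocBLAS/lib`, keeping the same placeholder paths.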
### III. Run the inference
@@ -586,11 +640,11 @@ use 1 SYCL GPUs: [0] with Max compute units:512

#### Build
- | Name               | Value                             | Function                                    |
- |--------------------|-----------------------------------|---------------------------------------------|
- | GGML_SYCL          | ON (mandatory)                    | Enable build with SYCL code path.<br>FP32 path - recommended for better perforemance than FP16 on quantized model|
- | GGML_SYCL_TARGET   | INTEL *(default)* \| NVIDIA       | Set the SYCL target device type.            |
- | GGML_SYCL_F16      | OFF *(default)* \| ON *(optional)* | Enable FP16 build with SYCL code path.     |
+ | Name               | Value                                   | Function                                    |
+ |--------------------|-----------------------------------------|---------------------------------------------|
+ | GGML_SYCL          | ON (mandatory)                          | Enable build with SYCL code path.<br>FP32 path: recommended for better performance than FP16 on quantized models. |
+ | GGML_SYCL_TARGET   | INTEL *(default)* \| NVIDIA \| AMD      | Set the SYCL target device type.            |
+ | GGML_SYCL_F16      | OFF *(default)* \| ON *(optional)*      | Enable FP16 build with SYCL code path (not supported on AMD). |
| CMAKE_C_COMPILER   | `icx` *(Linux)*, `icx/cl` *(Windows)*   | Set `icx` compiler for SYCL code path.      |
| CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)*     | Set `icpx/icx` compiler for SYCL code path. |
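The options in this table combine in the same way as the Nvidia and AMD build commands earlier. As a hypothetical sketch (`sycl_configure_args` is not part of llama.cpp), the helper below prints the Linux configure arguments for a given `GGML_SYCL_TARGET`, requires a HIP target for `AMD`, and rejects `GGML_SYCL_F16=ON` there, following the note that FP16 is not supported on AMD. The Windows compiler settings from the table are not covered.

```sh
# Hypothetical helper (not part of llama.cpp): print the Linux cmake configure
# arguments for the SYCL backend, following the Build options table.
# Usage: sycl_configure_args TARGET [HIP_TARGET] [F16]
#   TARGET      INTEL | NVIDIA | AMD   (GGML_SYCL_TARGET)
#   HIP_TARGET  gfx name from rocminfo, required for AMD (GGML_SYCL_HIP_TARGET)
#   F16         OFF (default) | ON     (GGML_SYCL_F16, rejected for AMD)
sycl_configure_args() {
  target=$1
  hip_target=${2:-}
  f16=${3:-OFF}
  case $target in
    INTEL|NVIDIA|AMD) ;;
    *) echo "unknown GGML_SYCL_TARGET: $target" >&2; return 1 ;;
  esac
  if [ "$target" = AMD ]; then
    [ -n "$hip_target" ] || { echo "AMD needs a HIP target, e.g. from rocminfo" >&2; return 1; }
    [ "$f16" = OFF ] || { echo "FP16 is not supported on AMD" >&2; return 1; }
  fi
  args="-DGGML_SYCL=ON -DGGML_SYCL_TARGET=$target"
  if [ "$target" = AMD ]; then args="$args -DGGML_SYCL_HIP_TARGET=$hip_target"; fi
  if [ "$f16" = ON ]; then args="$args -DGGML_SYCL_F16=ON"; fi
  printf '%s\n' "$args -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx"
}
```

A possible use is `cmake -B build $(sycl_configure_args AMD "$GGML_SYCL_HIP_TARGET")`, relying on word splitting of the unquoted substitution; the explicit `cmake` commands above remain the reference.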