
  ### Llama.cpp + SYCL

- The llama.cpp SYCL backend is designed to support **Intel GPU** firstly. Based on the cross-platform feature of SYCL, it could support other vendor GPUs: Nvidia GPU (*AMD GPU coming*).
+ The llama.cpp SYCL backend is designed primarily for **Intel GPUs**. Because SYCL is cross-platform, it also supports GPUs from other vendors: Nvidia and AMD.

  ## Recommended Release

@@ -115,10 +115,18 @@ SYCL backend supports Intel GPU Family:

  **Verified devices**

- | Nvidia GPU               | Status  | Verified Model |
- |--------------------------|---------|----------------|
- | Ampere Series            | Support | A100, A4000    |
- | Ampere Series *(Mobile)* | Support | RTX 40 Series  |
+ | Nvidia GPU               | Status    | Verified Model |
+ |--------------------------|-----------|----------------|
+ | Ampere Series            | Supported | A100, A4000    |
+ | Ampere Series *(Mobile)* | Supported | RTX 40 Series  |
+
+ | AMD GPU                  | Status       | Verified Model |
+ |--------------------------|--------------|----------------|
+ | Radeon Pro               | Experimental | W6800          |
+ | Radeon RX                | Experimental | 6700 XT        |
+
+ Note: AMD GPU support is highly experimental and is incompatible with F16.
+ Additionally, it only supports GPUs with a sub_group_size (warp size) of 32.
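
The sub_group_size restriction in the note can be checked before building. The following is a minimal sketch, assuming `rocminfo` reports wavefront sizes on lines of the form `Wavefront Size: 32(0x20)`; the helper name `wavefront_sizes` is hypothetical and is not part of llama.cpp or ROCm.

```sh
# Hypothetical helper: reads rocminfo-style text on stdin and prints each
# distinct wavefront size it finds.
# Assumption: sizes appear on lines like "  Wavefront Size:   32(0x20)".
wavefront_sizes() {
    awk -F: '/Wavefront Size/ {
        gsub(/[[:space:]]/, "", $2)   # drop padding around the value
        sub(/\(.*/, "", $2)           # drop the hex suffix in parentheses
        print $2
    }' | sort -u
}
```

With ROCm installed, `rocminfo | wavefront_sizes` lists the distinct values reported on the machine; per the note above, only a GPU reporting 32 is expected to work.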

  ## Docker
  The docker build option is currently limited to *Intel GPU* targets.
@@ -190,6 +198,10 @@ Platform #0: Intel(R) OpenCL HD Graphics

  In order to target Nvidia GPUs through SYCL, please make sure the CUDA/CUBLAS native requirements *-found [here](README.md#cuda)-* are installed.

+ - **AMD GPU**
+
+ To target AMD GPUs with SYCL, the ROCm stack must be installed first.
+
  2. **Install Intel® oneAPI Base toolkit**

  - **For Intel GPU**
@@ -216,6 +228,19 @@ cmake -B buildWithCublas -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DENAB
  cmake --build buildWithCublas --config Release
  ```

+ - **Adding support for AMD GPUs**
+
+ **oneAPI Plugin**: In order to enable SYCL support on AMD GPUs, please install the [Codeplay oneAPI Plugin for AMD GPUs](https://developer.codeplay.com/products/oneapi/amd/download). As with Nvidia GPUs, the user should also make sure the plugin version matches the installed base toolkit.
+
+ **oneMKL for rocBLAS**: The current oneMKL releases *(shipped with the oneAPI base-toolkit)* don't contain the rocBLAS backend. A build from source of the upstream [oneMKL](https://github.com/oneapi-src/oneMKL) with the *rocBLAS* backend enabled is thus required to run it on AMD GPUs.
+
+ ```sh
+ git clone https://github.com/oneapi-src/oneMKL
+ cd oneMKL
+ # Find your HIPTARGET with rocminfo, under the key 'Name:'
+ cmake -B buildWithrocBLAS -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_ROCBLAS_BACKEND=ON -DHIPTARGETS=${HIPTARGET} -DTARGET_DOMAINS=blas
+ cmake --build buildWithrocBLAS --config Release
+ ```
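
The comment in the snippet above says to look up `HIPTARGET` under the `Name:` key of `rocminfo`. One way to do that lookup is sketched below, assuming the GPU architecture appears in the output as a `gfx`-prefixed identifier such as `gfx1030`; the helper name `hip_targets` is hypothetical.

```sh
# Hypothetical helper: reads rocminfo-style text on stdin and prints each
# distinct gfx architecture name it contains.
# Assumption: the GPU agent is listed as "Name: gfx1030" or similar.
hip_targets() {
    grep -o 'gfx[0-9a-f][0-9a-f]*' | sort -u
}
```

On a single-GPU machine, `HIPTARGET=$(rocminfo | hip_targets | head -n 1)` would then set the variable used by the `cmake` command. With several different GPUs installed, the list should be inspected by hand instead.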

  3. **Verify installation and environment**

@@ -227,22 +252,32 @@ sycl-ls

  - **Intel GPU**

- When targeting an intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [`ext_oneapi_level_zero:gpu:0`] in the sample output below:
+ When targeting an Intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [`level_zero:gpu`] in the sample output below:

  ```
- [opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
- [opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
- [opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
- [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
+ [opencl:acc][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
+ [opencl:cpu][opencl:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
+ [opencl:gpu][opencl:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
+ [level_zero:gpu][level_zero:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
  ```

  - **Nvidia GPU**

- Similarly, user targeting Nvidia GPUs should expect at least one SYCL-CUDA device [`ext_oneapi_cuda:gpu`] as bellow:
+ Similarly, users targeting Nvidia GPUs should expect at least one SYCL-CUDA device [`cuda:gpu`], as below:
+
+ ```
+ [opencl:acc][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
+ [opencl:cpu][opencl:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
+ [cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.5]
+ ```
+
+ - **AMD GPU**
+
+ For AMD GPUs, users should expect at least one SYCL-HIP device [`hip:gpu`]:
+
  ```
- [opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
- [opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
- [ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.2]
+ [opencl:cpu][opencl:0] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i9-12900K OpenCL 3.0 (Build 0) [2024.18.6.0.02_160000]
+ [hip:gpu][hip:0] AMD HIP BACKEND, AMD Radeon PRO W6800 gfx1030 [HIP 60140.9]
  ```
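
The device check described in this hunk can also be scripted. The sketch below assumes the newer `sycl-ls` label format introduced by this change, where each device line starts with a `[backend:type]` tag such as `[hip:gpu]`; the helper name `has_sycl_device` is hypothetical.

```sh
# Hypothetical helper: reads sycl-ls-style text on stdin and succeeds if any
# line carries the given "[backend:type]" tag, for example "hip:gpu".
# Assumption: the newer label format is in use; the older
# "[ext_oneapi_cuda:gpu:0]" style would not match.
has_sycl_device() {
    grep -qF "[$1]"
}
```

For example, `sycl-ls | has_sycl_device hip:gpu && echo "SYCL-HIP device found"` prints the message only when an AMD GPU is visible; `cuda:gpu` and `level_zero:gpu` work the same way for Nvidia and Intel GPUs.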

  ### II. Build llama.cpp
@@ -270,6 +305,7 @@ cmake --build build --config Release -j -v
  ```

  #### Nvidia GPU
+
  ```sh
  # Export relevant ENV variables
  export LD_LIBRARY_PATH=/path/to/oneMKL/buildWithCublas/lib:$LD_LIBRARY_PATH
@@ -287,7 +323,25 @@ cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=NVIDIA -DCMAKE_C_COMPILER=icx -

  # build all binary
  cmake --build build --config Release -j -v
+ ```
+
+ #### AMD GPU

+ ```sh
+ # Export relevant ENV variables
+ export LD_LIBRARY_PATH=/path/to/oneMKL/buildWithrocBLAS/lib:$LD_LIBRARY_PATH
+ export LIBRARY_PATH=/path/to/oneMKL/buildWithrocBLAS/lib:$LIBRARY_PATH
+ export CPLUS_INCLUDE_DIR=/path/to/oneMKL/buildWithrocBLAS/include:$CPLUS_INCLUDE_DIR
+
+ # Build LLAMA with rocBLAS acceleration through SYCL
+
+ ## AMD
+ # Use FP32, FP16 is not supported
+ # Find your GGML_SYCL_HIP_TARGET with rocminfo, under the key 'Name:'
+ cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=AMD -DGGML_SYCL_HIP_TARGET=${GGML_SYCL_HIP_TARGET} -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
+
+ # build all binaries
+ cmake --build build --config Release -j -v
  ```

  ### III. Run the inference
### III. Run the inference
@@ -618,11 +672,11 @@ use 1 SYCL GPUs: [0] with Max compute units:512

  #### Build

- | Name               | Value                             | Function                                    |
- |--------------------|-----------------------------------|---------------------------------------------|
- | GGML_SYCL          | ON (mandatory)                    | Enable build with SYCL code path.<br>FP32 path - recommended for better perforemance than FP16 on quantized model|
- | GGML_SYCL_TARGET   | INTEL *(default)* \| NVIDIA       | Set the SYCL target device type.            |
- | GGML_SYCL_F16      | OFF *(default)* \| ON *(optional)* | Enable FP16 build with SYCL code path.      |
+ | Name               | Value                                 | Function                                    |
+ |--------------------|---------------------------------------|---------------------------------------------|
+ | GGML_SYCL          | ON (mandatory)                        | Enable build with SYCL code path.<br>FP32 path - recommended for better performance than FP16 on quantized model|
+ | GGML_SYCL_TARGET   | INTEL *(default)* \| NVIDIA \| AMD    | Set the SYCL target device type.            |
+ | GGML_SYCL_F16      | OFF *(default)* \| ON *(optional)*    | Enable FP16 build with SYCL code path.      |
  | CMAKE_C_COMPILER   | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx` compiler for SYCL code path.      |
  | CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)*   | Set `icpx/icx` compiler for SYCL code path. |