Commit 465bd6e

Alcpz authored and arthw committed
[SYCL] Initial cmake support of SYCL for AMD GPUs (ggml-org#9658)
sycl: initial cmake support of SYCL for AMD GPUs
1 parent b3aada4 commit 465bd6e

2 files changed: +90 -21 lines changed

docs/backend/SYCL.md

Lines changed: 73 additions & 19 deletions
@@ -26,7 +26,7 @@
 
 ### Llama.cpp + SYCL
 
-The llama.cpp SYCL backend is designed to support **Intel GPU** firstly. Based on the cross-platform feature of SYCL, it could support other vendor GPUs: Nvidia GPU (*AMD GPU coming*).
+The llama.cpp SYCL backend is designed to support **Intel GPU** firstly. Based on the cross-platform feature of SYCL, it also supports other vendor GPUs: Nvidia and AMD.
 
 ## Recommended Release

@@ -115,10 +115,18 @@ SYCL backend supports Intel GPU Family:
 
 **Verified devices**
 
-| Nvidia GPU | Status | Verified Model |
-|--------------------------|---------|----------------|
-| Ampere Series | Support | A100, A4000 |
-| Ampere Series *(Mobile)* | Support | RTX 40 Series |
+| Nvidia GPU | Status | Verified Model |
+|--------------------------|-----------|----------------|
+| Ampere Series | Supported | A100, A4000 |
+| Ampere Series *(Mobile)* | Supported | RTX 40 Series |
+
+| AMD GPU | Status | Verified Model |
+|--------------------------|--------------|----------------|
+| Radeon Pro | Experimental | W6800 |
+| Radeon RX | Experimental | 6700 XT |
+
+Note: AMD GPU support is highly experimental and is incompatible with F16.
+Additionally, it only supports GPUs with a sub_group_size (warp size) of 32.
 
 ## Docker
 The docker build option is currently limited to *intel GPU* targets.
@@ -190,6 +198,10 @@ Platform #0: Intel(R) OpenCL HD Graphics
 
 In order to target Nvidia GPUs through SYCL, please make sure the CUDA/CUBLAS native requirements *-found [here](README.md#cuda)-* are installed.
 
+- **AMD GPU**
+
+To target AMD GPUs with SYCL, the ROCm stack must be installed first.
+
 2. **Install Intel® oneAPI Base toolkit**
 
 - **For Intel GPU**
@@ -216,6 +228,19 @@ cmake -B buildWithCublas -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DENAB
 cmake --build buildWithCublas --config Release
 ```
 
+- **Adding support to AMD GPUs**
+
+**oneAPI Plugin**: In order to enable SYCL support on AMD GPUs, please install the [Codeplay oneAPI Plugin for AMD GPUs](https://developer.codeplay.com/products/oneapi/amd/download). As with Nvidia GPUs, the user should also make sure the plugin version matches the installed base toolkit.
+
+**oneMKL for rocBlas**: The current oneMKL releases *(shipped with the oneAPI base-toolkit)* doesn't contain the rocBLAS backend. A build from source of the upstream [oneMKL](https://github.com/oneapi-src/oneMKL) with the *rocBLAS* backend enabled is thus required to run it on AMD GPUs.
+
+```sh
+git clone https://github.com/oneapi-src/oneMKL
+cd oneMKL
+# Find your HIPTARGET with rocminfo, under the key 'Name:'
+cmake -B buildWithrocBLAS -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_ROCBLAS_BACKEND=ON -DHIPTARGETS=${HIPTARGET} -DTARGET_DOMAINS=blas
+cmake --build buildWithrocBLAS --config Release
+```
 
 3. **Verify installation and environment**

@@ -227,22 +252,32 @@ sycl-ls
 
 - **Intel GPU**
 
-When targeting an intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [`ext_oneapi_level_zero:gpu:0`] in the sample output below:
+When targeting an intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [`level_zero:gpu`] in the sample output below:
 
 ```
-[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
-[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
-[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
-[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
+[opencl:acc][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
+[opencl:cpu][opencl:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
+[opencl:gpu][opencl:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
+[level_zero:gpu][level_zero:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
 ```
 
 - **Nvidia GPU**
 
-Similarly, user targeting Nvidia GPUs should expect at least one SYCL-CUDA device [`ext_oneapi_cuda:gpu`] as bellow:
+Similarly, user targeting Nvidia GPUs should expect at least one SYCL-CUDA device [`cuda:gpu`] as below:
+
+```
+[opencl:acc][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
+[opencl:cpu][opencl:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
+[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.5]
+```
+
+- **AMD GPU**
+
+For AMD GPUs we should expect at least one SYCL-HIP device [`hip:gpu`]:
+
 ```
-[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
-[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
-[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.2]
+[opencl:cpu][opencl:0] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i9-12900K OpenCL 3.0 (Build 0) [2024.18.6.0.02_160000]
+[hip:gpu][hip:0] AMD HIP BACKEND, AMD Radeon PRO W6800 gfx1030 [HIP 60140.9]
 ```
 
 ### II. Build llama.cpp
@@ -270,6 +305,7 @@ cmake --build build --config Release -j -v
 ```
 
 #### Nvidia GPU
+
 ```sh
 # Export relevant ENV variables
 export LD_LIBRARY_PATH=/path/to/oneMKL/buildWithCublas/lib:$LD_LIBRARY_PATH
@@ -287,7 +323,25 @@ cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=NVIDIA -DCMAKE_C_COMPILER=icx -
 
 # build all binary
 cmake --build build --config Release -j -v
+```
+
+#### AMD GPU
 
+```sh
+# Export relevant ENV variables
+export LD_LIBRARY_PATH=/path/to/oneMKL/buildWithrocBLAS/lib:$LD_LIBRARY_PATH
+export LIBRARY_PATH=/path/to/oneMKL/buildWithrocBLAS/lib:$LIBRARY_PATH
+export CPLUS_INCLUDE_DIR=/path/to/oneMKL/buildWithrocBLAS/include:$CPLUS_INCLUDE_DIR
+
+# Build LLAMA with rocBLAS acceleration through SYCL
+
+## AMD
+# Use FP32, FP16 is not supported
+# Find your GGML_SYCL_HIP_TARGET with rocminfo, under the key 'Name:'
+cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=AMD -DGGML_SYCL_HIP_TARGET=${GGML_SYCL_HIP_TARGET} -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
+
+# build all binary
+cmake --build build --config Release -j -v
 ```
 
 ### III. Run the inference
@@ -618,11 +672,11 @@ use 1 SYCL GPUs: [0] with Max compute units:512
 
 #### Build
 
-| Name | Value | Function |
-|--------------------|-----------------------------------|---------------------------------------------|
-| GGML_SYCL | ON (mandatory) | Enable build with SYCL code path.<br>FP32 path - recommended for better perforemance than FP16 on quantized model|
-| GGML_SYCL_TARGET | INTEL *(default)* \| NVIDIA | Set the SYCL target device type. |
-| GGML_SYCL_F16 | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path. |
+| Name | Value | Function |
+|--------------------|---------------------------------------|---------------------------------------------|
+| GGML_SYCL | ON (mandatory) | Enable build with SYCL code path.<br>FP32 path - recommended for better perforemance than FP16 on quantized model|
+| GGML_SYCL_TARGET | INTEL *(default)* \| NVIDIA \| AMD | Set the SYCL target device type. |
+| GGML_SYCL_F16 | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path. |
 | CMAKE_C_COMPILER | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx` compiler for SYCL code path. |
 | CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)* | Set `icpx/icx` compiler for SYCL code path. |
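The verification step documented in the SYCL.md hunks above (a `hip:gpu` entry in the `sycl-ls` output, and a HIP target read from `rocminfo`) could be scripted roughly as follows. This is a minimal sketch under stated assumptions: `has_sycl_gpu` and `hip_targets` are hypothetical helper names that are not part of llama.cpp, and the `gfx...` pattern assumes `rocminfo` prints architecture names such as `gfx1030` under its `Name:` key.

```sh
# Sketch only: these helpers are hypothetical and not part of llama.cpp.

# Succeeds if the sycl-ls style text on stdin lists a GPU for the given
# backend name, e.g. "hip", "cuda" or "level_zero".
has_sycl_gpu() {
    grep -q -F "[$1:gpu]"
}

# Prints the unique gfx architecture names found in the text on stdin.
# Assumes rocminfo style lines such as "Name: gfx1030".
hip_targets() {
    grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Possible use on a machine with the toolchain installed:
#   sycl-ls  | has_sycl_gpu hip && echo "SYCL-HIP device found"
#   rocminfo | hip_targets
```

Both helpers read text from stdin, so they can be tried against the sample output quoted in the diff without a GPU present.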

ggml/src/CMakeLists.txt

Lines changed: 17 additions & 2 deletions
@@ -511,8 +511,8 @@ if (GGML_HIPBLAS)
 endif()
 
 if (GGML_SYCL)
-    if (NOT GGML_SYCL_TARGET MATCHES "^(INTEL|NVIDIA)$")
-        message(FATAL_ERROR "Invalid backend chosen, supported options are INTEL or NVIDIA")
+    if (NOT GGML_SYCL_TARGET MATCHES "^(INTEL|NVIDIA|AMD)$")
+        message(FATAL_ERROR "Invalid backend chosen, supported options are INTEL, NVIDIA, or AMD")
     endif()
 
     check_cxx_compiler_flag("-fsycl" SUPPORTS_SYCL)
@@ -532,6 +532,9 @@ if (GGML_SYCL)
     list(APPEND GGML_CDEF_PUBLIC GGML_USE_SYCL)
 
     if (GGML_SYCL_F16)
+        if (GGML_SYCL_TARGET STREQUAL "AMD")
+            message(WARNING "AMD target does not entirely support FP16 in the SYCL backend.")
+        endif()
         add_compile_definitions(GGML_SYCL_F16)
     endif()

@@ -543,6 +546,12 @@ if (GGML_SYCL)
 
     if (GGML_SYCL_TARGET STREQUAL "NVIDIA")
         add_compile_definitions(GGML_SYCL_WARP_SIZE=32)
+    elseif (GGML_SYCL_TARGET STREQUAL "AMD")
+        # INFO: Allowed Sub_group_sizes are not consistent through all
+        # hip targets. For example, 64 is used for certain models, but the backend
+        # does not support it.
+        # Target archs tested working: gfx1030, gfx1031, (Only tested sub_group_size = 32)
+        add_compile_definitions(GGML_SYCL_WARP_SIZE=32)
     else()
         add_compile_definitions(GGML_SYCL_WARP_SIZE=16)
     endif()
@@ -576,6 +585,12 @@ if (GGML_SYCL)
         elseif (GGML_SYCL_TARGET STREQUAL "NVIDIA")
             set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsycl-targets=nvptx64-nvidia-cuda")
             list(APPEND GGML_EXTRA_LIBS_PRIVATE sycl pthread m dl onemkl)
+        elseif (GGML_SYCL_TARGET STREQUAL "AMD")
+            if (GGML_SYCL_HIP_TARGET STREQUAL "")
+                message(ERROR "Can't enable SYCL hip backend, GGML_SYCL_HIP_TARGET has not been set.")
+            endif()
+            set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=${GGML_SYCL_HIP_TARGET}")
+            list(APPEND GGML_EXTRA_LIBS_PRIVATE sycl pthread m dl onemkl)
         endif()
     endif()
 endif()
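
The target checks in these CMake hunks can be mirrored in a small pre-flight script run before `cmake`. This may be worth doing because `ERROR` does not appear to be one of CMake's `message()` modes (the modes that stop a build from being generated are `FATAL_ERROR` and `SEND_ERROR`), so the empty `GGML_SYCL_HIP_TARGET` check above may print its text without halting configuration. A minimal sketch follows, where `check_sycl_config` is a hypothetical helper name and not part of llama.cpp:

```sh
# Sketch only: check_sycl_config is a hypothetical helper, not part of llama.cpp.
# Usage: check_sycl_config <GGML_SYCL_TARGET> [GGML_SYCL_HIP_TARGET]
# Returns 0 when the combination is accepted, 1 otherwise.
check_sycl_config() {
    case "$1" in
        INTEL|NVIDIA)
            return 0 ;;
        AMD)
            # The AMD path passes the HIP target to --offload-arch, so it must be set.
            if [ -z "${2:-}" ]; then
                echo "GGML_SYCL_HIP_TARGET must be set for the AMD target" >&2
                return 1
            fi
            return 0 ;;
        *)
            echo "Invalid backend '$1': expected INTEL, NVIDIA, or AMD" >&2
            return 1 ;;
    esac
}

# Possible use, with the cmake flags taken from the documentation hunk above:
#   check_sycl_config AMD "$GGML_SYCL_HIP_TARGET" && \
#     cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=AMD \
#       -DGGML_SYCL_HIP_TARGET="$GGML_SYCL_HIP_TARGET" \
#       -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
```

The accepted target names match the `^(INTEL|NVIDIA|AMD)$` pattern added in the first hunk of this file.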

0 commit comments