
Commit a21c6fd

update guide (ggml-org#8909)
Co-authored-by: Neo Zhang <>
1 parent 33309f6 commit a21c6fd

File tree

docs/backend/SYCL.md

1 file changed: 106 additions, 39 deletions

@@ -80,15 +80,22 @@ The following release is verified with good quality:

### Intel GPU

The SYCL backend supports the following Intel GPU families:

- Intel Data Center Max Series
- Intel Flex Series, Arc Series
- Intel Built-in Arc GPU
- Intel iGPU in Core CPUs (11th Generation Core CPUs and newer; refer to [oneAPI supported GPU](https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-base-toolkit-system-requirements.html#inpage-nav-1-1))

#### Verified devices

| Intel GPU                     | Status  | Verified Model                                   |
|-------------------------------|---------|--------------------------------------------------|
| Intel Data Center Max Series  | Support | Max 1550, 1100                                   |
| Intel Data Center Flex Series | Support | Flex 170                                         |
| Intel Arc Series              | Support | Arc 770, 730M, Arc A750                          |
| Intel built-in Arc GPU        | Support | built-in Arc GPU in Meteor Lake                  |
| Intel iGPU                    | Support | iGPU in i7-13700K, i5-1250P, i7-1260P, i7-1165G7 |

*Notes:*

@@ -237,6 +244,13 @@ Similarly, user targeting Nvidia GPUs should expect at least one SYCL-CUDA devic
### II. Build llama.cpp

#### Intel GPU

```sh
./examples/sycl/build.sh
```

or

```sh
# Export relevant ENV variables
source /opt/intel/oneapi/setvars.sh
@@ -276,23 +290,26 @@ cmake --build build --config Release -j -v

### III. Run the inference

#### Retrieve and prepare model

You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or simply download the [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as an example.
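
For instance, a minimal way to fetch that example model from the command line (an illustrative sketch; the URL assumes the standard Hugging Face `resolve/main` download path for the file linked above):

```sh
# Download the example model into the models/ directory.
# The URL below assumes the usual Hugging Face "resolve" layout for the linked file.
mkdir -p models
wget -O models/llama-2-7b.Q4_0.gguf \
  https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf
```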

##### Check device

1. Enable oneAPI running environment

```sh
source /opt/intel/oneapi/setvars.sh
```

2. List devices information

Similar to the native `sycl-ls`, the available SYCL devices can be queried as follows:

```sh
./build/bin/llama-ls-sycl-device
```

This command only displays devices from the selected backend supported by SYCL. The default backend is level_zero. For example, on a system with two *Intel GPUs* the output would look like the following:
```
found 2 SYCL devices:
@@ -304,12 +321,37 @@ found 2 SYCL devices:
| 1|[level_zero:gpu:1]| Intel(R) UHD Graphics 770| 1.3| 32| 512| 32| 53651849216|
```

#### Choose level-zero devices

|Chosen Device ID|Setting|
|-|-|
|0|`export ONEAPI_DEVICE_SELECTOR="level_zero:0"` or no action|
|1|`export ONEAPI_DEVICE_SELECTOR="level_zero:1"`|
|0 & 1|`export ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"`|
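
For example, to pin the run to a single level-zero device before launching (a minimal check; the device IDs correspond to the `llama-ls-sycl-device` listing above):

```sh
# Expose only level-zero device 0 to the SYCL runtime, then re-list devices to confirm
# that only that device is visible.
export ONEAPI_DEVICE_SELECTOR="level_zero:0"
./build/bin/llama-ls-sycl-device
```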

#### Execute

Choose one of the following methods to run.

1. Script

- Use device 0:

```sh
./examples/sycl/run_llama2.sh 0
```

- Use multiple devices:

```sh
./examples/sycl/run_llama2.sh
```

2. Command line

Launch inference.

There are two device selection modes:

- Single device: Use one device assigned by the user. The default device ID is 0.
- Multiple devices: Automatically choose all devices with the same backend.

In both device selection modes, the default SYCL backend is level_zero; you can choose another backend supported by SYCL by setting the environment variable ONEAPI_DEVICE_SELECTOR.
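
For instance, to select a backend other than the default level_zero (an illustrative sketch; which backends are actually available depends on your oneAPI installation and build):

```sh
# Expose only OpenCL GPU devices to the SYCL runtime instead of Level Zero,
# then re-list the devices to see what is selected.
export ONEAPI_DEVICE_SELECTOR="opencl:gpu"
./build/bin/llama-ls-sycl-device
```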
@@ -326,24 +368,13 @@ Examples:
```sh
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
```

- Use multiple devices:

```sh
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer
```

*Notes:*

- Upon execution, verify the selected device(s) ID(s) in the output log, which can for instance be displayed as follows:
@@ -390,7 +421,7 @@ c. Verify installation
In the oneAPI command line, run the following to print the available SYCL devices:

```
sycl-ls.exe
```

There should be one or more *level-zero* GPU devices displayed as **[ext_oneapi_level_zero:gpu]**. Below is an example of such output detecting an *Intel Iris Xe* GPU as a Level Zero SYCL device:
@@ -411,6 +442,18 @@ b. The new Visual Studio will install Ninja as default. (If not, please install

### II. Build llama.cpp

You can download the Windows release package directly, which includes the binaries and the oneAPI DLL files they depend on.

Choose one of the following methods to build from source code.

1. Script

```sh
.\examples\sycl\win-build-sycl.bat
```

2. CMake

On the oneAPI command line window, step into the llama.cpp main directory and run the following:

```
@@ -425,12 +468,8 @@ cmake -B build -G "Ninja" -DGGML_SYCL=ON -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPI
cmake --build build --config Release -j
```

Or, use CMake presets to build:

```sh
cmake --preset x64-windows-sycl-release
cmake --build build-x64-windows-sycl-release -j --target llama-cli
@@ -442,31 +481,35 @@ cmake --preset x64-windows-sycl-debug
cmake --build build-x64-windows-sycl-debug -j --target llama-cli
```

3. Visual Studio

You can use Visual Studio to open the llama.cpp folder as a CMake project. Choose the SYCL CMake presets (`x64-windows-sycl-release` or `x64-windows-sycl-debug`) before you compile the project.

*Notes:*

- In case of a minimal experimental setup, the user can build the inference executable only through `cmake --build build --config Release -j --target llama-cli`.

### III. Run the inference

#### Retrieve and prepare model

You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or simply download the [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as an example.

##### Check device

1. Enable oneAPI running environment

On the oneAPI command line window, run the following and step into the llama.cpp directory:
```
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
```

2. List devices information

Similar to the native `sycl-ls`, the available SYCL devices can be queried as follows:

```
build\bin\llama-ls-sycl-device.exe
```

This command only displays devices from the selected backend supported by SYCL. The default backend is level_zero. For example, on a system with two *Intel GPUs* the output would look like the following:
@@ -478,10 +521,28 @@ found 2 SYCL devices:
| 0|[level_zero:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 1.3| 512| 1024| 32| 16225243136|
| 1|[level_zero:gpu:1]| Intel(R) UHD Graphics 770| 1.3| 32| 512| 32| 53651849216|
```

#### Choose level-zero devices

|Chosen Device ID|Setting|
|-|-|
|0|`set ONEAPI_DEVICE_SELECTOR="level_zero:0"` or no action|
|1|`set ONEAPI_DEVICE_SELECTOR="level_zero:1"`|
|0 & 1|`set ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"`|

#### Execute

Choose one of the following methods to run.

1. Script

```
examples\sycl\win-run-llama2.bat
```

2. Command line

Launch inference.

There are two device selection modes:

@@ -508,11 +569,7 @@ build\bin\llama-cli.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website ca
```
build\bin\llama-cli.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0 -sm layer
```

Note:

@@ -526,17 +583,18 @@ Or
use 1 SYCL GPUs: [0] with Max compute units:512
```

## Environment Variable

#### Build

| Name               | Value                                 | Function                                     |
|--------------------|---------------------------------------|----------------------------------------------|
| GGML_SYCL          | ON (mandatory)                        | Enable build with SYCL code path.<br>The FP32 path is recommended for better performance than FP16 on quantized models. |
| GGML_SYCL_TARGET   | INTEL *(default)* \| NVIDIA           | Set the SYCL target device type.             |
| GGML_SYCL_F16      | OFF *(default)* \| ON *(optional)*    | Enable FP16 build with SYCL code path.       |
| CMAKE_C_COMPILER   | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx` compiler for SYCL code path.       |
| CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)*   | Set `icpx/icx` compiler for SYCL code path.  |
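
For example, a Linux build invocation combining these variables might look like the following (an illustrative sketch assembled from the flags above; add `-DGGML_SYCL_F16=ON` only if you want the optional FP16 path):

```sh
source /opt/intel/oneapi/setvars.sh
# Build the SYCL backend for Intel targets with the icx/icpx compilers.
cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=INTEL \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j
```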

#### Runtime

@@ -572,9 +630,18 @@ use 1 SYCL GPUs: [0] with Max compute units:512
```
Otherwise, please double-check the GPU driver installation steps.

- Can I report an Ollama issue on Intel GPU to the llama.cpp SYCL backend?

  No. We can't support Ollama issues directly, because we are not familiar with Ollama.

  We suggest reproducing the problem with llama.cpp and reporting a similar issue to llama.cpp; we will support it there.

  The same applies to other projects that include the llama.cpp SYCL backend.

### **GitHub contribution**:
Please add the **[SYCL]** prefix/tag in issues/PRs titles to help the SYCL-team check/address them without delay.

## TODO

- NA
