
Commit 74f8780

update guide (ggml-org#8909)
Co-authored-by: Neo Zhang <>
1 parent f7aac82 commit 74f8780

1 file changed
docs/backend/SYCL.md

Lines changed: 109 additions & 45 deletions
@@ -85,15 +85,22 @@ For CI and performance test summary, please refer to [llama.cpp CI for SYCL Back
 
 ### Intel GPU
 
-**Verified devices**
+The SYCL backend supports the following Intel GPU families:
+
+- Intel Data Center Max Series
+- Intel Data Center Flex Series, Intel Arc Series
+- Intel built-in Arc GPU
+- Intel iGPU in Intel Core CPUs (11th Generation Core and newer; refer to [oneAPI supported GPU](https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-base-toolkit-system-requirements.html#inpage-nav-1-1)).
+
+#### Verified devices
 
 | Intel GPU                     | Status  | Verified Model                        |
 |-------------------------------|---------|---------------------------------------|
 | Intel Data Center Max Series  | Support | Max 1550, 1100                        |
 | Intel Data Center Flex Series | Support | Flex 170                              |
 | Intel Arc Series              | Support | Arc 770, 730M, Arc A750               |
 | Intel built-in Arc GPU        | Support | built-in Arc GPU in Meteor Lake       |
-| Intel iGPU                    | Support | iGPU in i5-1250P, i7-1260P, i7-1165G7 |
+| Intel iGPU                    | Support | iGPU in i7-13700K, i5-1250P, i7-1260P, i7-1165G7 |
 
 *Notes:*
 
@@ -242,6 +249,13 @@ Similarly, user targeting Nvidia GPUs should expect at least one SYCL-CUDA devic
 ### II. Build llama.cpp
 
 #### Intel GPU
+
+```
+./examples/sycl/build.sh
+```
+
+or
+
 ```sh
 # Export relevant ENV variables
 source /opt/intel/oneapi/setvars.sh
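# A sketch of how the manual FP32 build typically continues from here; the exact
# flags are assumptions based on the build-variable table later in this guide.
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j -v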
@@ -281,24 +295,27 @@ cmake --build build --config Release -j -v
 
 ### III. Run the inference
 
-1. Retrieve and prepare model
+#### Retrieve and prepare model
 
 You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or simply download the [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as an example.
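For instance, one way to fetch that example model into the `models` directory used by the commands below (a sketch; it assumes `wget` is available and uses the direct-download form of the link above):

```sh
# Download the 4-bit quantized Llama 2 7B example model.
wget -P models https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf
```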
 
-2. Enable oneAPI running environment
+##### Check device
+
+1. Enable oneAPI running environment
 
 ```sh
 source /opt/intel/oneapi/setvars.sh
 ```
 
-3. List devices information
+2. List devices information
 
 Similar to the native `sycl-ls`, available SYCL devices can be queried as follows:
 
 ```sh
 ./build/bin/llama-ls-sycl-device
 ```
-A example of such log in a system with 1 *intel CPU* and 1 *intel GPU* can look like the following:
+
+This command displays only the devices of the currently selected backend supported by SYCL. The default backend is level_zero. For example, in a system with 2 *Intel GPUs* the output would look like the following:
 ```
 found 6 SYCL devices:
 Part1:
@@ -326,13 +343,40 @@ Part2:
 |------------------------|-------------------------------------------------------------|
 | compute capability 1.3 | Level-zero driver/runtime, recommended                      |
 | compute capability 3.0 | OpenCL driver/runtime, slower than level-zero in most cases |
+#### Choose level-zero devices
+
+|Chosen Device ID|Setting|
+|-|-|
+|0|`export ONEAPI_DEVICE_SELECTOR="level_zero:0"` or no action|
+|1|`export ONEAPI_DEVICE_SELECTOR="level_zero:1"`|
+|0 & 1|`export ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"`|
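A quick way to confirm the selection took effect (a sketch, assuming two Level-Zero GPUs and the `build` directory produced above):

```sh
# Expose only the second Level-Zero GPU (device ID 1) to the SYCL runtime,
# then list the devices again; only that GPU should appear.
export ONEAPI_DEVICE_SELECTOR="level_zero:1"
./build/bin/llama-ls-sycl-device
```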
+
+#### Execute
+
+Choose one of the following methods to run.
 
-4. Launch inference
+1. Script
+
+- Use device 0:
+
+```sh
+./examples/sycl/run_llama2.sh 0
+```
+- Use multiple devices:
+
+```sh
+./examples/sycl/run_llama2.sh
+```
+
+2. Command line
+Launch inference
 
 There are two device selection modes:
 
-- Single device: Use one device target specified by the user.
-- Multiple devices: Automatically select the devices with the same largest Max compute-units.
+- Single device: Use the single device assigned by the user. The default device id is 0.
+- Multiple devices: Automatically choose the devices with the same largest max compute-units.
+
+In both device selection modes, the default SYCL backend is level_zero. You can choose another backend supported by SYCL by setting the environment variable ONEAPI_DEVICE_SELECTOR.
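For instance, a minimal sketch of switching to the OpenCL backend (assuming an OpenCL-capable GPU is present):

```sh
# Run on the OpenCL backend instead of the default level_zero,
# then verify which devices become visible.
export ONEAPI_DEVICE_SELECTOR="opencl:gpu"
./build/bin/llama-ls-sycl-device
```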
 
 | Device selection | Parameter                              |
 |------------------|----------------------------------------|
@@ -346,24 +390,13 @@ Examples:
 ```sh
 ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
 ```
-or run by script:
-
-```sh
-./examples/sycl/run_llama2.sh 0
-```
 
 - Use multiple devices:
 
 ```sh
 ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer
 ```
 
-Otherwise, you can run the script:
-
-```sh
-./examples/sycl/run_llama2.sh
-```
-
 *Notes:*
 
 - Upon execution, verify the selected device(s) ID(s) in the output log, which can for instance be displayed as follows:
@@ -410,7 +443,7 @@ c. Verify installation
 In the oneAPI command line, run the following to print the available SYCL devices:
 
 ```
-sycl-ls
+sycl-ls.exe
 ```
 
 There should be one or more *level-zero* GPU devices displayed as **[ext_oneapi_level_zero:gpu]**. Below is an example of such output detecting an *Intel Iris Xe* GPU as a Level-zero SYCL device:
@@ -431,6 +464,18 @@ b. The new Visual Studio will install Ninja as default. (If not, please install
 
 ### II. Build llama.cpp
 
+You can download the Windows release package directly; it includes the binaries and the required oneAPI DLL files.
+
+Choose one of the following methods to build from source code.
+
+1. Script
+
+```sh
+.\examples\sycl\win-build-sycl.bat
+```
+
+2. CMake
+
 On the oneAPI command line window, step into the llama.cpp main directory and run the following:
 
 ```
@@ -445,12 +490,8 @@ cmake -B build -G "Ninja" -DGGML_SYCL=ON -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPI
 cmake --build build --config Release -j
 ```
 
-Otherwise, run the `win-build-sycl.bat` wrapper which encapsulates the former instructions:
-```sh
-.\examples\sycl\win-build-sycl.bat
-```
-
 Or, use CMake presets to build:
+
 ```sh
 cmake --preset x64-windows-sycl-release
 cmake --build build-x64-windows-sycl-release -j --target llama-cli
@@ -462,31 +503,35 @@ cmake --preset x64-windows-sycl-debug
 cmake --build build-x64-windows-sycl-debug -j --target llama-cli
 ```
 
-Or, you can use Visual Studio to open llama.cpp folder as a CMake project. Choose the sycl CMake presets (`x64-windows-sycl-release` or `x64-windows-sycl-debug`) before you compile the project.
+3. Visual Studio
+
+You can use Visual Studio to open the llama.cpp folder as a CMake project. Choose the sycl CMake presets (`x64-windows-sycl-release` or `x64-windows-sycl-debug`) before you compile the project.
 
 *Notes:*
 
 - In case of a minimal experimental setup, the user can build the inference executable only through `cmake --build build --config Release -j --target llama-cli`.
 
 ### III. Run the inference
 
-1. Retrieve and prepare model
+#### Retrieve and prepare model
 
-You can refer to the general [*Prepare and Quantize*](README#prepare-and-quantize) guide for model prepration, or simply download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as example.
+You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or simply download the [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as an example.
+
+##### Check device
 
-2. Enable oneAPI running environment
+1. Enable oneAPI running environment
 
 On the oneAPI command line window, run the following and step into the llama.cpp directory:
 ```
 "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
 ```
 
-3. List devices information
+2. List devices information
 
 Similar to the native `sycl-ls`, available SYCL devices can be queried as follows:
 
 ```
-build\bin\ls-sycl-device.exe
+build\bin\llama-ls-sycl-device.exe
 ```
 
 The output of this command in a system with 1 *Intel CPU* and 1 *Intel GPU* would look like the following:
@@ -512,14 +557,27 @@ Part2:
 | 5| 64| 67108864| 64|2024.17.5.0.08_160000.xmain-hotfix|
 
 ```
+#### Choose level-zero devices
+
+|Chosen Device ID|Setting|
+|-|-|
+|0|`set ONEAPI_DEVICE_SELECTOR="level_zero:0"` or no action|
+|1|`set ONEAPI_DEVICE_SELECTOR="level_zero:1"`|
+|0 & 1|`set ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"`|
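A quick check that the setting is picked up (a sketch, assuming the `build` directory from the steps above):

```
:: Expose only the second Level-Zero GPU (device ID 1), then list the devices
:: again; only that GPU should be shown.
set ONEAPI_DEVICE_SELECTOR="level_zero:1"
build\bin\llama-ls-sycl-device.exe
```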
+
+#### Execute
 
-| Attribute | Note |
-|------------------------|-----------------------------------------------------------|
-| compute capability 1.3 | Level-zero running time, recommended |
-| compute capability 3.0 | OpenCL running time, slower than level-zero in most cases |
+Choose one of the following methods to run.
+
+1. Script
+
+```
+examples\sycl\win-run-llama2.bat
+```
 
+2. Command line
 
-4. Launch inference
+Launch inference
 
 There are two device selection modes:
 
@@ -544,11 +602,7 @@ build\bin\llama-cli.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website ca
 ```
 build\bin\llama-cli.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0 -sm layer
 ```
-Otherwise, run the following wrapper script:
 
-```
-.\examples\sycl\win-run-llama2.bat
-```
 
 Note:
 
@@ -562,17 +616,18 @@ Or
 use 1 SYCL GPUs: [0] with Max compute units:512
 ```
 
+
 ## Environment Variable
 
 #### Build
 
 | Name | Value | Function |
 |--------------------|-----------------------------------|---------------------------------------------|
-| GGML_SYCL | ON (mandatory) | Enable build with SYCL code path. |
+| GGML_SYCL | ON (mandatory) | Enable build with SYCL code path.<br>The FP32 path is recommended: it gives better performance than FP16 on quantized models. |
 | GGML_SYCL_TARGET | INTEL *(default)* \| NVIDIA | Set the SYCL target device type. |
 | GGML_SYCL_F16 | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path. |
-| CMAKE_C_COMPILER | icx | Set *icx* compiler for SYCL code path. |
-| CMAKE_CXX_COMPILER | icpx *(Linux)*, icx *(Windows)* | Set `icpx/icx` compiler for SYCL code path. |
+| CMAKE_C_COMPILER | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx` compiler for SYCL code path. |
+| CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)* | Set `icpx/icx` compiler for SYCL code path. |
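Putting the build variables together, a sketch of a Linux configure line (the target and FP16 flags are optional and shown only to illustrate where each variable goes):

```sh
# SYCL enabled, Intel target, optional FP16 kernels, icx/icpx compilers.
cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=INTEL -DGGML_SYCL_F16=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j
```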
 
 #### Runtime
 
@@ -634,9 +689,18 @@ The parameters about device choose of llama.cpp works with SYCL backend rule to
 ```
 Otherwise, please double-check the GPU driver installation steps.
 
+- Can I report an Ollama issue on Intel GPU to the llama.cpp SYCL backend?
+
+  No. We can't support Ollama issues directly, because we aren't familiar with Ollama.
+
+  Please reproduce the problem on llama.cpp and report a similar issue to llama.cpp; we will support it there.
+
+  The same applies to other projects that include the llama.cpp SYCL backend.
+
+
 ### **GitHub contribution**:
 Please add the **[SYCL]** prefix/tag in issues/PRs titles to help the SYCL-team check/address them without delay.
 
 ## TODO
 
-- Support row layer split for multiple card runs.
+- NA
