Commit 6c0b287

NeoZhangJianyu, abhilash1910, AidanBeltonS, and airMeng authored
update readme sycl for new update (#6151)
* update readme sycl for new update
* Update README-sycl.md
  Co-authored-by: Abhilash Majumder <[email protected]>
* Update README-sycl.md
  Co-authored-by: Abhilash Majumder <[email protected]>
* Update README-sycl.md
  Co-authored-by: Abhilash Majumder <[email protected]>
* Update README-sycl.md
  Co-authored-by: Abhilash Majumder <[email protected]>
* Update README-sycl.md
  Co-authored-by: AidanBeltonS <[email protected]>
* Update README-sycl.md
  Co-authored-by: AidanBeltonS <[email protected]>
* update by review comments
* update w64devkit link
* update for verify device id part
* Update README-sycl.md
  Co-authored-by: Meng, Hengyu <[email protected]>

---------

Co-authored-by: Abhilash Majumder <[email protected]>
Co-authored-by: AidanBeltonS <[email protected]>
Co-authored-by: Meng, Hengyu <[email protected]>
1 parent d26e8b6 commit 6c0b287

2 files changed: +93 -38 lines changed

README-sycl.md

Lines changed: 93 additions & 36 deletions
@@ -29,6 +29,7 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 ## News

 - 2024.3
+- New base line is ready: [tag b2437](https://github.com/ggerganov/llama.cpp/tree/b2437).
 - Support multiple cards: **--split-mode**: [none|layer]; not support [row], it's on developing.
 - Support to assign main GPU by **--main-gpu**, replace $GGML_SYCL_DEVICE.
 - Support detecting all GPUs with level-zero and same top **Max compute units**.
@@ -81,7 +82,7 @@ For dGPU, please make sure the device memory is enough. For llama-2-7b.Q4_0, rec
 |-|-|-|
 |Ampere Series| Support| A100|

-### oneMKL
+### oneMKL for CUDA

 The current oneMKL release does not contain the oneMKL cuBlas backend.
 As a result for Nvidia GPU's oneMKL must be built from source.
@@ -254,29 +255,52 @@ Run without parameter:
 Check the ID in startup log, like:

 ```
-found 4 SYCL devices:
-Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
-max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
-Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
-max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
-Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
-max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
-Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
-max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
-
+found 6 SYCL devices:
+|  |                  |                                             |Compute   |Max compute|Max work|Max sub|               |
+|ID|       Device Type|                                         Name|capability|units      |group   |group  |Global mem size|
+|--|------------------|---------------------------------------------|----------|-----------|--------|-------|---------------|
+| 0|[level_zero:gpu:0]|               Intel(R) Arc(TM) A770 Graphics|       1.3|        512|    1024|     32|    16225243136|
+| 1|[level_zero:gpu:1]|                    Intel(R) UHD Graphics 770|       1.3|         32|     512|     32|    53651849216|
+| 2|    [opencl:gpu:0]|               Intel(R) Arc(TM) A770 Graphics|       3.0|        512|    1024|     32|    16225243136|
+| 3|    [opencl:gpu:1]|                    Intel(R) UHD Graphics 770|       3.0|         32|     512|     32|    53651849216|
+| 4|    [opencl:cpu:0]|         13th Gen Intel(R) Core(TM) i7-13700K|       3.0|         24|    8192|     64|    67064815616|
+| 5|    [opencl:acc:0]|               Intel(R) FPGA Emulation Device|       1.2|         24|67108864|     64|    67064815616|
 ```

 |Attribute|Note|
 |-|-|
 |compute capability 1.3|Level-zero running time, recommended |
 |compute capability 3.0|OpenCL running time, slower than level-zero in most cases|

-4. Set device ID and execute llama.cpp
+4. Device selection and execution of llama.cpp
+
+There are two device selection modes:
+
+- Single device: Use one device assigned by user.
+- Multiple devices: Automatically choose the devices with the same biggest Max compute units.
+
+|Device selection|Parameter|
+|-|-|
+|Single device|--split-mode none --main-gpu DEVICE_ID |
+|Multiple devices|--split-mode layer (default)|

-Set device ID = 0 by **GGML_SYCL_DEVICE=0**
+Examples:
+
+- Use device 0:

 ```sh
-GGML_SYCL_DEVICE=0 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
+ZES_ENABLE_SYSMAN=1 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
+```
+or run by script:
+
+```sh
+./examples/sycl/run_llama2.sh 0
+```
+
+- Use multiple devices:
+
+```sh
+ZES_ENABLE_SYSMAN=1 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer
 ```
 or run by script:

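The multi-device rule in the hunk above (choose the level-zero GPUs that share the largest Max compute units) can be sketched as a small filter over the startup table. This is a minimal sketch under assumptions: `top_level_zero_gpus` is a hypothetical helper, not part of llama.cpp, and it only mirrors the rule as the README words it; the SYCL backend's actual selection logic may differ.

```shell
#!/bin/sh
# Hypothetical helper, not part of llama.cpp: read a startup device table on
# stdin and print the IDs of the level-zero GPUs that share the highest
# "Max compute units" value.
top_level_zero_gpus() {
    awk -F'|' '
        # Columns after splitting on "|": $2 = ID, $3 = device type,
        # $6 = Max compute units.
        $3 ~ /level_zero:gpu/ {
            id = $2 + 0
            units = $6 + 0
            if (n == 0 || units > max) {
                max = units; ids = id ""; n = 1
            } else if (units == max) {
                ids = ids " " id; n++
            }
        }
        END { if (n > 0) print ids }
    '
}

# Two rows copied from the sample table in the README diff.
top_level_zero_gpus <<'EOF'
| 0|[level_zero:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 1.3| 512| 1024| 32| 16225243136|
| 1|[level_zero:gpu:1]| Intel(R) UHD Graphics 770| 1.3| 32| 512| 32| 53651849216|
EOF
```

For the two sample rows this prints `0`, which agrees with the `[0]` in the README's `detect 1 SYCL GPUs` line. A saved startup log could be fed through the same function on stdin.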
@@ -289,13 +313,19 @@ Note:
 - By default, mmap is used to read model file. In some cases, it leads to the hang issue. Recommend to use parameter **--no-mmap** to disable mmap() to skip this issue.


-5. Check the device ID in output
+5. Verify the device ID in output
+
+Verify to see if the selected GPU is shown in the output, like:

-Like:
 ```
-Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
+detect 1 SYCL GPUs: [0] with top Max compute units:512
+```
+Or
+```
+use 1 SYCL GPUs: [0] with Max compute units:512
 ```

+
 ## Windows

 ### Setup Environment
@@ -355,7 +385,7 @@ a. Download & install cmake for Windows: https://cmake.org/download/

 b. Download & install mingw-w64 make for Windows provided by w64devkit

-- Download the latest fortran version of [w64devkit](https://github.com/skeeto/w64devkit/releases).
+- Download the 1.19.0 version of [w64devkit](https://github.com/skeeto/w64devkit/releases/download/v1.19.0/w64devkit-1.19.0.zip).

 - Extract `w64devkit` on your pc.

@@ -430,15 +460,16 @@ build\bin\main.exe
 Check the ID in startup log, like:

 ```
-found 4 SYCL devices:
-Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
-max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
-Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
-max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
-Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
-max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
-Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
-max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
+found 6 SYCL devices:
+|  |                  |                                             |Compute   |Max compute|Max work|Max sub|               |
+|ID|       Device Type|                                         Name|capability|units      |group   |group  |Global mem size|
+|--|------------------|---------------------------------------------|----------|-----------|--------|-------|---------------|
+| 0|[level_zero:gpu:0]|               Intel(R) Arc(TM) A770 Graphics|       1.3|        512|    1024|     32|    16225243136|
+| 1|[level_zero:gpu:1]|                    Intel(R) UHD Graphics 770|       1.3|         32|     512|     32|    53651849216|
+| 2|    [opencl:gpu:0]|               Intel(R) Arc(TM) A770 Graphics|       3.0|        512|    1024|     32|    16225243136|
+| 3|    [opencl:gpu:1]|                    Intel(R) UHD Graphics 770|       3.0|         32|     512|     32|    53651849216|
+| 4|    [opencl:cpu:0]|         13th Gen Intel(R) Core(TM) i7-13700K|       3.0|         24|    8192|     64|    67064815616|
+| 5|    [opencl:acc:0]|               Intel(R) FPGA Emulation Device|       1.2|         24|67108864|     64|    67064815616|

 ```

@@ -447,13 +478,31 @@ found 4 SYCL devices:
 |compute capability 1.3|Level-zero running time, recommended |
 |compute capability 3.0|OpenCL running time, slower than level-zero in most cases|

-4. Set device ID and execute llama.cpp

-Set device ID = 0 by **set GGML_SYCL_DEVICE=0**
+4. Device selection and execution of llama.cpp
+
+There are two device selection modes:
+
+- Single device: Use one device assigned by user.
+- Multiple devices: Automatically choose the devices with the same biggest Max compute units.
+
+|Device selection|Parameter|
+|-|-|
+|Single device|--split-mode none --main-gpu DEVICE_ID |
+|Multiple devices|--split-mode layer (default)|
+
+Examples:
+
+- Use device 0:

 ```
-set GGML_SYCL_DEVICE=0
-build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0
+build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0 -sm none -mg 0
+```
+
+- Use multiple devices:
+
+```
+build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0 -sm layer
 ```
 or run by script:

@@ -466,11 +515,17 @@ Note:
 - By default, mmap is used to read model file. In some cases, it leads to the hang issue. Recommend to use parameter **--no-mmap** to disable mmap() to skip this issue.


-5. Check the device ID in output

-Like:
+5. Verify the device ID in output
+
+Verify to see if the selected GPU is shown in the output, like:
+
 ```
-Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
+detect 1 SYCL GPUs: [0] with top Max compute units:512
+```
+Or
+```
+use 1 SYCL GPUs: [0] with Max compute units:512
 ```

 ## Environment Variable
@@ -489,7 +544,6 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device

 |Name|Value|Function|
 |-|-|-|
-|GGML_SYCL_DEVICE|0 (default) or 1|Set the device id used. Check the device ids by default running output|
 |GGML_SYCL_DEBUG|0 (default) or 1|Enable log function by macro: GGML_SYCL_DEBUG|
 |ZES_ENABLE_SYSMAN| 0 (default) or 1|Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory.<br>Recommended to use when --split-mode = layer|

@@ -507,6 +561,9 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device

 ## Q&A

+Note: please add prefix **[SYCL]** in issue title, so that we will check it as soon as possible.
+
+
 - Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.

 Miss to enable oneAPI running environment.
@@ -538,4 +595,4 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device

 ## Todo

-- Support multiple cards.
+- Support row layer split for multiple card runs.
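
Since this commit drops `GGML_SYCL_DEVICE` in favour of `--split-mode` and `--main-gpu`, a wrapper script that used to export the variable has to pass flags instead. A minimal sketch of that mapping follows; `sycl_device_args` is a hypothetical helper name, and only the `-sm none -mg ID` and `-sm layer` spellings are taken from the README examples in the diff above.

```shell
#!/bin/sh
# Hypothetical migration helper, not part of llama.cpp: turn an optional
# device ID into the flags that replace GGML_SYCL_DEVICE.
sycl_device_args() {
    if [ -n "${1:-}" ]; then
        # Single device: the ID is the one shown in the startup device table.
        printf '%s\n' "-sm none -mg $1"
    else
        # Multiple devices: layer split is the documented default.
        printf '%s\n' "-sm layer"
    fi
}

# Example: reuse an old GGML_SYCL_DEVICE value if one is still exported.
echo "flags: $(sycl_device_args "${GGML_SYCL_DEVICE:-}")"
```

A call such as `./build/bin/main -m models/llama-2-7b.Q4_0.gguf -ngl 33 $(sycl_device_args 0)` would then select device 0; the substitution is left unquoted on purpose so the flags split into separate words.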

examples/sycl/win-run-llama2.bat

Lines changed: 0 additions & 2 deletions
@@ -6,8 +6,6 @@ set INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
 @call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force


-set GGML_SYCL_DEVICE=0
-rem set GGML_SYCL_DEBUG=1
 .\build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p %INPUT2% -n 400 -e -ngl 33 -s 0

