
Commit de91f8e

Neo Zhang authored and committed
rebase
1 parent d08c20e commit de91f8e


9 files changed (+793 -376 lines changed)


README-sycl.md

Lines changed: 63 additions & 18 deletions
@@ -296,15 +296,25 @@ Similar to the native `sycl-ls`, available SYCL devices can be queried as follow
 An example of such a log in a system with 1 *Intel CPU* and 2 *Intel GPUs* can look like the following:
 ```
 found 6 SYCL devices:
-| | | |Compute |Max compute|Max work|Max sub| |
-|ID| Device Type| Name|capability|units |group |group |Global mem size|
-|--|------------------|---------------------------------------------|----------|-----------|--------|-------|---------------|
-| 0|[level_zero:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 1.3| 512| 1024| 32| 16225243136|
-| 1|[level_zero:gpu:1]| Intel(R) UHD Graphics 770| 1.3| 32| 512| 32| 53651849216|
-| 2| [opencl:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 3.0| 512| 1024| 32| 16225243136|
-| 3| [opencl:gpu:1]| Intel(R) UHD Graphics 770| 3.0| 32| 512| 32| 53651849216|
-| 4| [opencl:cpu:0]| 13th Gen Intel(R) Core(TM) i7-13700K| 3.0| 24| 8192| 64| 67064815616|
-| 5| [opencl:acc:0]| Intel(R) FPGA Emulation Device| 1.2| 24|67108864| 64| 67064815616|
+Part1:
+|ID| Device Type| Ver| Name|Global mem size|
+|--|-------------------|----|---------------------------------------|---------------|
+| 0| [level_zero:gpu:0]| 1.3| Intel Data Center GPU Flex 170| 16225M|
+| 1| [level_zero:gpu:1]| 1.3| Intel Data Center GPU Flex 170| 16225M|
+| 2| [opencl:gpu:0]| 3.0| Intel Data Center GPU Flex 170| 16225M|
+| 3| [opencl:gpu:1]| 3.0| Intel Data Center GPU Flex 170| 16225M|
+| 4| [opencl:cpu:0]| 3.0| Intel Xeon Gold 6346 CPU @ 3.10GHz| 540700M|
+| 5| [opencl:acc:0]| 1.2| Intel FPGA Emulation Device| 540700M|
+Part2:
+|ID|Max compute units|Max work group|Max subgroup| Driver version|
+|--|-----------------|--------------|------------|----------------------------------|
+| 0| 512| 1024| 32| 1.3.27642|
+| 1| 512| 1024| 32| 1.3.27642|
+| 2| 512| 1024| 32| 23.43.27642.40|
+| 3| 512| 1024| 32| 23.43.27642.40|
+| 4| 64| 8192| 64|2024.17.5.0.08_160000.xmain-hotfix|
+| 5| 64| 67108864| 64|2024.17.5.0.08_160000.xmain-hotfix|
+
 ```

 | Attribute | Note |
@@ -477,15 +487,24 @@ build\bin\ls-sycl-device.exe
 The output of this command in a system with 1 *Intel CPU* and 2 *Intel GPUs* would look like the following:
 ```
 found 6 SYCL devices:
-| | | |Compute |Max compute|Max work|Max sub| |
-|ID| Device Type| Name|capability|units |group |group |Global mem size|
-|--|------------------|---------------------------------------------|----------|-----------|--------|-------|---------------|
-| 0|[level_zero:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 1.3| 512| 1024| 32| 16225243136|
-| 1|[level_zero:gpu:1]| Intel(R) UHD Graphics 770| 1.3| 32| 512| 32| 53651849216|
-| 2| [opencl:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 3.0| 512| 1024| 32| 16225243136|
-| 3| [opencl:gpu:1]| Intel(R) UHD Graphics 770| 3.0| 32| 512| 32| 53651849216|
-| 4| [opencl:cpu:0]| 13th Gen Intel(R) Core(TM) i7-13700K| 3.0| 24| 8192| 64| 67064815616|
-| 5| [opencl:acc:0]| Intel(R) FPGA Emulation Device| 1.2| 24|67108864| 64| 67064815616|
+Part1:
+|ID| Device Type| Ver| Name|Global mem size|
+|--|-------------------|----|---------------------------------------|---------------|
+| 0| [level_zero:gpu:0]| 1.3| Intel Data Center GPU Flex 170| 16225M|
+| 1| [level_zero:gpu:1]| 1.3| Intel Data Center GPU Flex 170| 16225M|
+| 2| [opencl:gpu:0]| 3.0| Intel Data Center GPU Flex 170| 16225M|
+| 3| [opencl:gpu:1]| 3.0| Intel Data Center GPU Flex 170| 16225M|
+| 4| [opencl:cpu:0]| 3.0| Intel Xeon Gold 6346 CPU @ 3.10GHz| 540700M|
+| 5| [opencl:acc:0]| 1.2| Intel FPGA Emulation Device| 540700M|
+Part2:
+|ID|Max compute units|Max work group|Max subgroup| Driver version|
+|--|-----------------|--------------|------------|----------------------------------|
+| 0| 512| 1024| 32| 1.3.27642|
+| 1| 512| 1024| 32| 1.3.27642|
+| 2| 512| 1024| 32| 23.43.27642.40|
+| 3| 512| 1024| 32| 23.43.27642.40|
+| 4| 64| 8192| 64|2024.17.5.0.08_160000.xmain-hotfix|
+| 5| 64| 67108864| 64|2024.17.5.0.08_160000.xmain-hotfix|

 ```

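The new listing uses two numberings. The `ID` column numbers devices across all backends. The number at the end of `Device Type` restarts per backend and device type: `[opencl:gpu:0]` is ID 2, and `[opencl:cpu:0]` (ID 4) also ends in 0. The following sketch splits these cells so the two numberings can be compared. It is a hypothetical helper, not part of llama.cpp, and `DeviceEntry` and `parse_device_type` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one row of the device listing (illustration only).
struct DeviceEntry {
    int global_id;         // "ID" column: numbered across all backends
    std::string backend;   // e.g. "level_zero" or "opencl"
    std::string type;      // e.g. "gpu", "cpu" or "acc"
    int backend_index;     // trailing number inside the brackets
};

// Splits a "Device Type" cell such as "[level_zero:gpu:0]" into its parts.
// Returns std::nullopt if the cell is not of the form "[backend:type:number]".
std::optional<DeviceEntry> parse_device_type(int global_id, const std::string& cell) {
    if (cell.size() < 2 || cell.front() != '[' || cell.back() != ']') {
        return std::nullopt;
    }
    std::stringstream body(cell.substr(1, cell.size() - 2));
    std::vector<std::string> parts;
    std::string part;
    while (std::getline(body, part, ':')) {
        parts.push_back(part);
    }
    if (parts.size() != 3 || parts[2].empty()) {
        return std::nullopt;
    }
    try {
        std::size_t used = 0;
        const int index = std::stoi(parts[2], &used);
        if (used != parts[2].size()) {
            return std::nullopt;  // reject trailing junk such as "0x"
        }
        return DeviceEntry{global_id, parts[0], parts[1], index};
    } catch (const std::exception&) {
        return std::nullopt;      // not a number, or out of int range
    }
}
```

For example, parsing the cells for IDs 2 and 4 from the Part1 table gives the same trailing index (0) but different global IDs. In this listing, only the `ID` column identifies a device uniquely.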
@@ -556,6 +575,32 @@ use 1 SYCL GPUs: [0] with Max compute units:512
 |-------------------|------------------|---------------------------------------------------------------------------------------------------------------------------|
 | GGML_SYCL_DEBUG | 0 (default) or 1 | Enable log function by macro: GGML_SYCL_DEBUG |
 | ZES_ENABLE_SYSMAN | 0 (default) or 1 | Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory.<br>Recommended to use when --split-mode = layer |
+| GGML_SYCL_VISIBLE_DEVICES | id1,id2,... | Similar to `CUDA_VISIBLE_DEVICES`: defines the list of SYCL device IDs that are visible, e.g. "0", "0,2", "2,1". |
+| ONEAPI_DEVICE_SELECTOR | Refer to [oneapi-device-selector](https://intel.github.io/llvm-docs/EnvironmentVariables.html#oneapi-device-selector) | Limits the choice of devices available when the SYCL application runs. |
+
+##### Choose SYCL Devices at Runtime
+
+In the SYCL runtime, one physical device can be exposed as two logical devices through different backends: Level-Zero and OpenCL. As a result, the SYCL device list shows more devices than are physically installed. Avoid running code on both logical devices of the same physical device at the same time.
+
+The SYCL backend supports dGPUs and iGPUs in the same machine.
+
+##### SYCL Backend Rules
+
+|Mode|Explanation|Example|Recommended Cases|Note|
+|-|-|-|-|-|
+|Normal|Use all of the most powerful devices. This is the default mode and needs no special setting.<br>The SYCL backend detects and chooses the **Level-Zero** devices with the highest `Max compute units`.||Most cases for normal users.||
+|Advanced|Allows the user to choose one or more SYCL devices, which can be Level-Zero, OpenCL, or both.<br>Set the device list with the environment variable **GGML_SYCL_VISIBLE_DEVICES**, similar to `CUDA_VISIBLE_DEVICES`.<br>The SYCL backend uses every device in that list.|`set/export GGML_SYCL_VISIBLE_DEVICES=1`<br>`set/export GGML_SYCL_VISIBLE_DEVICES=0,1`<br>`set/export GGML_SYCL_VISIBLE_DEVICES=2,1`|Use the iGPU, or both GPUs, in a dGPU + iGPU environment.<br>Use one dGPU in a multiple-dGPU environment.<br>Use one or more OpenCL devices.|There is a known issue with OpenCL devices (WIP).|
+|Developer|Allows SYCL developers to choose one or more SYCL devices with the environment variable **ONEAPI_DEVICE_SELECTOR**, which has a flexible grammar.<br>Refer to [oneapi-device-selector](https://intel.github.io/llvm-docs/EnvironmentVariables.html#oneapi-device-selector).|`set/export ONEAPI_DEVICE_SELECTOR=level_zero:1`<br>`set/export ONEAPI_DEVICE_SELECTOR=opencl:*`<br>`set ONEAPI_DEVICE_SELECTOR=opencl:gpu;level_zero:gpu` (Windows)<br>`export ONEAPI_DEVICE_SELECTOR="opencl:gpu;level_zero:gpu"` (Linux)|Covers the Advanced mode. Because it filters devices at a lower level, it also affects the **Normal** and **Advanced** modes.<br>The flexible grammar supports more complex device environments.|There is a known issue with OpenCL devices (WIP).|
+
+##### Parameters of llama.cpp
+
+The device-selection parameters of llama.cpp work together with the SYCL backend rules to decide the final result. The user can run on one device or on all of the devices chosen by the SYCL backend rules.
+
+|Device|Values|Note|
+|-|-|-|
+|Single Device|`--split-mode=none` and `--main-gpu=id`|The value of `main-gpu` must be in the chosen device list printed during llama.cpp startup, for example:<br>`detect 2 SYCL level-zero GPUs:[0,1]`.<br>In that case, `main-gpu` should be set to `0` or `1`.|
+|Multiple Devices|`--split-mode=layer`|Default.|
+

 ## Known Issues

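The Normal and Advanced rules above can be read as two small selection steps. This sketch is a hypothetical illustration, not the backend's implementation. `DeviceInfo`, `select_default_devices` and `parse_visible_devices` are invented names. Treating the IDs in `GGML_SYCL_VISIBLE_DEVICES` as the `ID` column of the device listing is also an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of one row of the device listing.
struct DeviceInfo {
    int id;                  // global device ID ("ID" column)
    bool is_level_zero;      // true for "[level_zero:...]" entries
    int max_compute_units;   // "Max compute units" column
};

// Normal mode (sketch): keep every Level-Zero device whose compute-unit
// count equals the highest one among Level-Zero devices.
std::vector<int> select_default_devices(const std::vector<DeviceInfo>& devices) {
    int best = -1;
    for (const auto& d : devices) {
        if (d.is_level_zero && d.max_compute_units > best) {
            best = d.max_compute_units;
        }
    }
    std::vector<int> chosen;
    for (const auto& d : devices) {
        if (d.is_level_zero && d.max_compute_units == best) {
            chosen.push_back(d.id);
        }
    }
    return chosen;
}

// Advanced mode (sketch): parse a GGML_SYCL_VISIBLE_DEVICES-style value such
// as "0", "0,2" or "2,1" into IDs, keeping the given order. Empty items,
// spaces and other non-digits are rejected (an assumed, strict policy).
std::optional<std::vector<int>> parse_visible_devices(const std::string& value) {
    std::vector<int> ids;
    std::string item;
    auto flush = [&]() -> bool {
        if (item.empty() || item.size() > 9) {
            return false;  // empty item, or too long to fit safely in int
        }
        for (char c : item) {
            if (!std::isdigit(static_cast<unsigned char>(c))) {
                return false;
            }
        }
        ids.push_back(std::stoi(item));
        item.clear();
        return true;
    };
    for (char c : value) {
        if (c == ',') {
            if (!flush()) {
                return std::nullopt;
            }
        } else {
            item += c;
        }
    }
    if (!flush()) {
        return std::nullopt;
    }
    return ids;
}
```

Applied to the Part2 numbers from the listing, `select_default_devices` returns `{0, 1}`. IDs 0 and 1 are the Level-Zero devices with 512 compute units; OpenCL IDs 2 and 3 also have 512 but are skipped. This agrees with the startup line `detect 2 SYCL level-zero GPUs:[0,1]` quoted in the README.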
examples/sycl/CMakeLists.txt

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,10 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: MIT

+add_compile_options(-I${PROJECT_SOURCE_DIR}/ggml)
+add_compile_options(-I${PROJECT_SOURCE_DIR}/ggml/src)
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsycl")
+
 set(TARGET llama-ls-sycl-device)
 add_executable(${TARGET} ls-sycl-device.cpp)
 install(TARGETS ${TARGET} RUNTIME)

examples/sycl/win-run-llama2.bat

Lines changed: 1 addition & 1 deletion
@@ -6,6 +6,6 @@ set INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
 @call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force


-.\build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p %INPUT2% -n 400 -e -ngl 33 -s 0
+.\build\bin\llama-cli.exe -m models\llama-2-7b.Q4_0.gguf -p %INPUT2% -n 400 -e -ngl 33 -s 0


ggml/include/ggml-sycl.h

Lines changed: 4 additions & 0 deletions
@@ -34,6 +34,10 @@ GGML_API GGML_CALL void ggml_sycl_get_device_description(int device, char *des
 GGML_API GGML_CALL int ggml_backend_sycl_get_device_count();
 GGML_API GGML_CALL void ggml_backend_sycl_get_device_memory(int device, size_t *free, size_t *total);

+GGML_API GGML_CALL int ggml_backend_sycl_get_device_index(int device_id);
+GGML_API GGML_CALL int ggml_backend_sycl_get_device_id(int index);
+GGML_API GGML_CALL void ggml_sycl_set_single_device(int main_gpu_id);
+
 // SYCL doesn't support registering host memory, keep here for reference
 // GGML_API GGML_CALL bool ggml_backend_sycl_register_host_buffer(void * buffer, size_t size);
 // GGML_API GGML_CALL void ggml_backend_sycl_unregister_host_buffer(void * buffer);

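The three functions added to `ggml-sycl.h` suggest two things: the backend translates between global SYCL device IDs and its own compact indices, and it can be narrowed to the single `--main-gpu` device. The sketch below models that idea under stated assumptions; it is not the code behind these declarations. `DeviceMap`, `index_of`, `id_at` and `keep_only` are hypothetical names. Returning -1 for an unknown ID or index is an assumed convention.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical model of the backend's chosen devices: an ordered list of
// global device IDs, where a device's index is its position in the list.
class DeviceMap {
public:
    explicit DeviceMap(std::vector<int> ids) : ids_(std::move(ids)) {
        for (int id : ids_) {
            assert(id >= 0);  // IDs come from the "ID" column, so never negative
        }
    }

    int count() const { return static_cast<int>(ids_.size()); }

    // Device ID -> index (the role ggml_backend_sycl_get_device_index appears
    // to play); -1 if the ID was not chosen (assumed convention).
    int index_of(int device_id) const {
        for (int i = 0; i < count(); ++i) {
            if (ids_[i] == device_id) {
                return i;
            }
        }
        return -1;
    }

    // Index -> device ID (cf. ggml_backend_sycl_get_device_id); -1 if out of range.
    int id_at(int index) const {
        return (index >= 0 && index < count()) ? ids_[index] : -1;
    }

    // Narrow the list to main_gpu_id, in the spirit of --split-mode=none
    // --main-gpu=id. Fails, leaving the list unchanged, if the ID was not chosen.
    bool keep_only(int main_gpu_id) {
        if (index_of(main_gpu_id) < 0) {
            return false;
        }
        ids_ = {main_gpu_id};
        return true;
    }

private:
    std::vector<int> ids_;
};
```

Take the README's `GGML_SYCL_VISIBLE_DEVICES=2,1` example. Index 0 maps to ID 2 and index 1 maps to ID 1, so code that loops over indices never needs to know that ID 0 was skipped.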