Commit 7156413

Support multiple GPUs (split mode) on SYCL backend (#5806)
* support multiple cards: split-mode - layer|row
* rm warning
* rebase with master, support two new OPs, close feature for -sm=row, fix for unit test
* update news
* fix merge error
* update according to review comments
1 parent 9bf297a commit 7156413

8 files changed, +1534 -842 lines changed

README-sycl.md

Lines changed: 21 additions & 0 deletions
@@ -1,6 +1,7 @@
 # llama.cpp for SYCL
 
 - [Background](#background)
+- [News](#news)
 - [OS](#os)
 - [Intel GPU](#intel-gpu)
 - [Docker](#docker)
@@ -25,6 +26,21 @@ The llama.cpp for SYCL is used to support Intel GPUs.
 
 For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 
+## News
+
+- 2024.3
+  - Support multiple cards: **--split-mode**: [none|layer]; not support [row], it's on developing.
+  - Support to assign main GPU by **--main-gpu**, replace $GGML_SYCL_DEVICE.
+  - Support detecting all GPUs with level-zero and same top **Max compute units**.
+  - Support OPs
+    - hardsigmoid
+    - hardswish
+    - pool2d
+
+- 2024.1
+  - Create SYCL backend for Intel GPU.
+  - Support Windows build
+
 ## OS
 
 |OS|Status|Verified|
@@ -449,6 +465,7 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 |-|-|-|
 |GGML_SYCL_DEVICE|0 (default) or 1|Set the device id used. Check the device ids by default running output|
 |GGML_SYCL_DEBUG|0 (default) or 1|Enable log function by macro: GGML_SYCL_DEBUG|
+|ZES_ENABLE_SYSMAN| 0 (default) or 1|Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory.<br>Recommended to use when --split-mode = layer|
 
 ## Known Issue
 
@@ -458,6 +475,10 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 
   Solution: add **--no-mmap** or **--mmap 0**.
 
+- Split-mode: [row] is not supported
+
+  It's on developing.
+
 ## Q&A
 
 - Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.
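The News entry above says the backend uses all GPUs that share the same top **Max compute units** value. A minimal sketch of that selection rule follows, assuming the compute-unit counts have already been queried. `DeviceInfo` and `select_top_devices` are hypothetical names for illustration only and are not part of the SYCL backend.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical device record; the real backend obtains these values through SYCL.
struct DeviceInfo {
    std::string name;
    int max_compute_units;
};

// Return the indices of every device whose compute-unit count equals the
// largest count found in the list. An empty list yields an empty result.
std::vector<std::size_t> select_top_devices(const std::vector<DeviceInfo> & devices) {
    std::vector<std::size_t> selected;
    if (devices.empty()) {
        return selected;
    }

    // First pass: find the highest compute-unit count.
    int top = devices[0].max_compute_units;
    for (const DeviceInfo & d : devices) {
        top = std::max(top, d.max_compute_units);
    }

    // Second pass: keep only the devices that match it.
    for (std::size_t i = 0; i < devices.size(); i++) {
        if (devices[i].max_compute_units == top) {
            selected.push_back(i);
        }
    }
    return selected;
}
```

With two identical discrete cards and one smaller integrated GPU, this rule keeps only the two discrete cards, which is presumably why mixed systems end up splitting layers across matching devices only.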

common/common.cpp

Lines changed: 4 additions & 0 deletions
@@ -640,6 +640,10 @@ bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params) {
             } else if (arg_next == "layer") {
                 params.split_mode = LLAMA_SPLIT_MODE_LAYER;
             } else if (arg_next == "row") {
+#ifdef GGML_USE_SYCL
+                fprintf(stderr, "warning: The split mode value:[row] is not supported by llama.cpp with SYCL. It's developing.\nExit!\n");
+                exit(1);
+#endif // GGML_USE_SYCL
                 params.split_mode = LLAMA_SPLIT_MODE_ROW;
             } else {
                 invalid_param = true;
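The hunk above rejects `row` at parse time when the build targets SYCL. The same decision can be sketched as a small standalone function that reports failure to the caller instead of calling `exit(1)`. This is an illustration under assumptions, not the code in `common.cpp`: `SplitMode`, `parse_split_mode`, and the `row_supported` flag (standing in for the `GGML_USE_SYCL` guard) are hypothetical.

```cpp
#include <string>

// Hypothetical stand-ins for the LLAMA_SPLIT_MODE_* values used in the diff.
enum class SplitMode { None, Layer, Row };

// Parse a --split-mode value into `out`.
// Returns false, leaving `out` untouched, when the value is unknown or when
// "row" is requested but row_supported is false.
bool parse_split_mode(const std::string & arg, bool row_supported, SplitMode & out) {
    if (arg == "none") {
        out = SplitMode::None;
        return true;
    }
    if (arg == "layer") {
        out = SplitMode::Layer;
        return true;
    }
    if (arg == "row") {
        if (!row_supported) {
            return false;  // mirrors the SYCL build refusing row split
        }
        out = SplitMode::Row;
        return true;
    }
    return false;  // unknown value
}
```

Returning a status keeps the policy testable. The commit instead exits immediately, which is simpler for a command-line tool.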

examples/llama-bench/llama-bench.cpp

Lines changed: 6 additions & 11 deletions
@@ -123,20 +123,15 @@ static std::string get_gpu_info() {
     }
 #endif
 #ifdef GGML_USE_SYCL
-    int device_list[GGML_SYCL_MAX_DEVICES];
-    ggml_sycl_get_gpu_list(device_list, GGML_SYCL_MAX_DEVICES);
-
-    for (int i = 0; i < GGML_SYCL_MAX_DEVICES; i++) {
-        if (device_list[i] >0 ){
-            char buf[128];
-            ggml_sycl_get_device_description(i, buf, sizeof(buf));
-            id += buf;
+    int count = ggml_backend_sycl_get_device_count();
+    for (int i = 0; i < count; i++) {
+        char buf[128];
+        ggml_sycl_get_device_description(i, buf, sizeof(buf));
+        id += buf;
+        if (i < count - 1) {
             id += "/";
         }
     }
-    if (id.length() >2 ) {
-        id.pop_back();
-    }
 #endif
     // TODO: other backends
     return id;
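The rewritten loop adds the `/` separator only between entries, so the trailing `pop_back()` cleanup from the old code is no longer needed. A minimal sketch of the same joining pattern, with a plain vector of strings standing in for the descriptions returned by `ggml_sycl_get_device_description` (`join_device_names` is a hypothetical helper name):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Join device descriptions with "/" placed between entries only, so no
// trailing separator has to be trimmed afterwards.
std::string join_device_names(const std::vector<std::string> & names) {
    std::string id;
    for (std::size_t i = 0; i < names.size(); i++) {
        id += names[i];
        if (i + 1 < names.size()) {
            id += "/";
        }
    }
    return id;
}
```

Writing the check as `i + 1 < names.size()` avoids the unsigned underflow that `names.size() - 1` would cause on an empty list. The commit's `int count` version does not have that problem because `count` is signed.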

examples/sycl/ls-sycl-device.cpp

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 
 #include "ggml-sycl.h"
 
-int main(int argc, char ** argv) {
+int main() {
     ggml_backend_sycl_print_sycl_devices();
     return 0;
 }

examples/sycl/run-llama2.sh

Lines changed: 12 additions & 5 deletions
@@ -8,12 +8,19 @@ INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
 source /opt/intel/oneapi/setvars.sh
 
 if [ $# -gt 0 ]; then
-    export GGML_SYCL_DEVICE=$1
+    GGML_SYCL_DEVICE=$1
 else
-    export GGML_SYCL_DEVICE=0
+    GGML_SYCL_DEVICE=0
 fi
-echo GGML_SYCL_DEVICE=$GGML_SYCL_DEVICE
+echo "use $GGML_SYCL_DEVICE as main GPU"
 #export GGML_SYCL_DEBUG=1
-./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0
-#./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 5 -e -ngl 33 -t 1 -s 0
+
+
+#ZES_ENABLE_SYSMAN=1, Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory. Recommended to use when --split-mode = layer.
+
+#use all GPUs with same max compute units
+ZES_ENABLE_SYSMAN=1 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0
+
+#use main GPU only
+#ZES_ENABLE_SYSMAN=1 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0 -mg $GGML_SYCL_DEVICE -sm none
 