
Commit af3ba5d

[SYCL] update guide of SYCL backend (#5254)
* update guide for make installation, memory, gguf model link; remove todo for Windows build
* add Visual Studio install requirement
* update GPU device check
* update help of llama-bench
* fix grammar issues
1 parent e1e7210 commit af3ba5d

File tree

3 files changed (+77, -23 lines)


README-sycl.md

Lines changed: 55 additions & 9 deletions
@@ -42,6 +42,8 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 
 ## Intel GPU
 
+### Verified
+
 |Intel GPU| Status | Verified Model|
 |-|-|-|
 |Intel Data Center Max Series| Support| Max 1550|
@@ -50,6 +52,17 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 |Intel built-in Arc GPU| Support| built-in Arc GPU in Meteor Lake|
 |Intel iGPU| Support| iGPU in i5-1250P, i7-1165G7|
 
+Note: If the iGPU has fewer than 80 EUs (Execution Units), the inference speed will be too slow for practical use.
+
+### Memory
+
+Memory is a key limitation when running LLMs on GPUs.
+
+When llama.cpp runs, it prints a log line showing how much GPU memory is allocated, like `llm_load_tensors: buffer size = 3577.56 MiB`, so you can see the memory required in your case.
+
+For iGPU, make sure the shared memory from host memory is enough. For llama-2-7b.Q4_0, 8GB+ of host memory is recommended.
+
+For dGPU, make sure the device memory is enough. For llama-2-7b.Q4_0, 4GB+ of device memory is recommended.
 
 ## Linux

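The memory guidance above can be sketched as a quick log check. The helper below is illustrative and not part of llama.cpp; the log format is assumed from the example line `llm_load_tensors: buffer size = 3577.56 MiB` quoted in the guide.

```python
import re

def gpu_buffer_mib(log_text):
    """Extract the GPU buffer size in MiB from a llama.cpp load log.

    Assumes a log line like 'llm_load_tensors: buffer size = 3577.56 MiB'.
    Returns None if no such line is found.
    """
    m = re.search(r"llm_load_tensors:\s*buffer size\s*=\s*([\d.]+)\s*MiB", log_text)
    return float(m.group(1)) if m else None

log = "llm_load_tensors: buffer size = 3577.56 MiB"
needed = gpu_buffer_mib(log)
print(needed)             # 3577.56
print(needed < 4 * 1024)  # True: within the 4GB+ dGPU guideline
```

This only compares the model weight buffer against the guideline; actual usage also includes KV cache and scratch buffers, so treat it as a lower bound.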
@@ -105,7 +118,7 @@ source /opt/intel/oneapi/setvars.sh
 sycl-ls
 ```
 
-There should be one or more level-zero devices. Like **[ext_oneapi_level_zero:gpu:0]**.
+There should be one or more level-zero devices. Please confirm that at least one GPU is present, like **[ext_oneapi_level_zero:gpu:0]**.
 
 Output (example):
 ```
@@ -152,6 +165,8 @@ Note:
 
 1. Put model file to folder **models**
 
+You can download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) as an example.
+
 2. Enable oneAPI running environment
 
 ```
@@ -223,7 +238,13 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 
 Please install Intel GPU driver by official guide: [Install GPU Drivers](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/software/drivers.html).
 
-2. Install Intel® oneAPI Base toolkit.
+Note: **The driver is mandatory for the compute function**.
+
+2. Install Visual Studio.
+
+Please install [Visual Studio](https://visualstudio.microsoft.com/), which is required to enable the oneAPI environment on Windows.
+
+3. Install Intel® oneAPI Base toolkit.
 
 a. Please follow the procedure in [Get the Intel® oneAPI Base Toolkit ](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html).

@@ -252,23 +273,29 @@ In oneAPI command line:
 sycl-ls
 ```
 
-There should be one or more level-zero devices. Like **[ext_oneapi_level_zero:gpu:0]**.
+There should be one or more level-zero devices. Please confirm that at least one GPU is present, like **[ext_oneapi_level_zero:gpu:0]**.
 
 Output (example):
 ```
 [opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
 [opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
 [opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO [31.0.101.5186]
 [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28044]
-
 ```
 
-3. Install cmake & make
+4. Install cmake & make
 
-a. Download & install cmake for windows: https://cmake.org/download/
+a. Download & install cmake for Windows: https://cmake.org/download/
 
-b. Download & install make for windows provided by mingw-w64: https://www.mingw-w64.org/downloads/
+b. Download & install make for Windows provided by mingw-w64
+
+  - Download the binary package for Windows from https://github.com/niXman/mingw-builds-binaries/releases.
+
+    For example, [x86_64-13.2.0-release-win32-seh-msvcrt-rt_v11-rev1.7z](https://github.com/niXman/mingw-builds-binaries/releases/download/13.2.0-rt_v11-rev1/x86_64-13.2.0-release-win32-seh-msvcrt-rt_v11-rev1.7z).
+
+  - Unzip the binary package. In the **bin** sub-folder, rename **xxx-make.exe** to **make.exe**.
+
+  - Add the **bin** folder path to the Windows system PATH environment variable.
 
 ### Build locally:

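The `sycl-ls` check described in the hunks above can be automated with a small sketch. This is illustrative, not part of llama.cpp; it only pattern-matches the device prefix shown in the example output.

```python
def has_level_zero_gpu(sycl_ls_output):
    """Return True if any line of `sycl-ls` output reports a Level-Zero GPU,
    i.e. an entry like '[ext_oneapi_level_zero:gpu:0] ...'."""
    return any(line.startswith("[ext_oneapi_level_zero:gpu:")
               for line in sycl_ls_output.splitlines())

example = """\
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics
"""
print(has_level_zero_gpu(example))  # True
```

In practice you would feed it `subprocess.run(["sycl-ls"], capture_output=True, text=True).stdout` instead of the canned example string.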
@@ -309,6 +336,8 @@ Note:
 
 1. Put model file to folder **models**
 
+You can download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) as an example.
+
 2. Enable oneAPI running environment
 
 - In Search, input 'oneAPI'.
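The model link added above points at a Hugging Face `blob` page. As a side note, direct download URLs swap `blob` for `resolve`; this is a convention of huggingface.co URLs assumed here, not something stated in the guide.

```python
def hf_download_url(blob_url):
    """Turn a Hugging Face 'blob' page URL into the direct 'resolve'
    download URL (assumed huggingface.co URL convention)."""
    return blob_url.replace("/blob/", "/resolve/", 1)

print(hf_download_url(
    "https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf"))
```

The printed URL can be fetched with `wget` or `curl -L` straight into the **models** folder.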
@@ -419,8 +448,25 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 
 Forgot to enable the oneAPI running environment.
 
-## Todo
+- Meet a compile error.
+
+  Remove the **build** folder and try again.
+
+- I can **not** see **[ext_oneapi_level_zero:gpu:0]** after installing the GPU driver on Linux.
 
-- Support to build in Windows.
+  Please run **sudo sycl-ls**.
+
+  If you see it in the result, please add the video/render groups to your user ID:
+
+  ```
+  sudo usermod -aG render username
+  sudo usermod -aG video username
+  ```
+
+  Then **re-login**.
+
+  If you do not see it, please check the GPU driver installation steps again.
+
+## Todo
 
 - Support multiple cards.
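Before running the `usermod` commands in the troubleshooting entry above, you can check whether the groups are already present. This is an illustrative helper, not part of llama.cpp; it only inspects the group list printed by `id -nG`.

```shell
#!/bin/sh
# Illustrative: check whether a group list (as printed by `id -nG`)
# already contains the video and render groups used in the fix above.
check_gpu_groups() {
  groups_list=" $1 "
  for g in video render; do
    case "$groups_list" in
      *" $g "*) echo "$g: ok" ;;
      *)        echo "$g: missing" ;;
    esac
  done
}

check_gpu_groups "$(id -nG)"
```

If either group prints `missing`, run the corresponding `sudo usermod -aG <group> username` command and re-login.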

examples/llama-bench/README.md

Lines changed: 21 additions & 13 deletions
@@ -23,19 +23,23 @@ usage: ./llama-bench [options]
 
 options:
   -h, --help
-  -m, --model <filename>              (default: models/7B/ggml-model-q4_0.gguf)
-  -p, --n-prompt <n>                  (default: 512)
-  -n, --n-gen <n>                     (default: 128)
-  -b, --batch-size <n>                (default: 512)
-  --memory-f32 <0|1>                  (default: 0)
-  -t, --threads <n>                   (default: 16)
-  -ngl N, --n-gpu-layers <n>          (default: 99)
-  -mg i, --main-gpu <i>               (default: 0)
-  -mmq, --mul-mat-q <0|1>             (default: 1)
-  -ts, --tensor_split <ts0/ts1/..>
-  -r, --repetitions <n>               (default: 5)
-  -o, --output <csv|json|md|sql>      (default: md)
-  -v, --verbose                       (default: 0)
+  -m, --model <filename>              (default: models/7B/ggml-model-q4_0.gguf)
+  -p, --n-prompt <n>                  (default: 512)
+  -n, --n-gen <n>                     (default: 128)
+  -b, --batch-size <n>                (default: 512)
+  -ctk <t>, --cache-type-k <t>        (default: f16)
+  -ctv <t>, --cache-type-v <t>        (default: f16)
+  -t, --threads <n>                   (default: 112)
+  -ngl, --n-gpu-layers <n>            (default: 99)
+  -sm, --split-mode <none|layer|row>  (default: layer)
+  -mg, --main-gpu <i>                 (default: 0)
+  -nkvo, --no-kv-offload <0|1>        (default: 0)
+  -mmp, --mmap <0|1>                  (default: 1)
+  -mmq, --mul-mat-q <0|1>             (default: 1)
+  -ts, --tensor_split <ts0/ts1/..>    (default: 0)
+  -r, --repetitions <n>               (default: 5)
+  -o, --output <csv|json|md|sql>      (default: md)
+  -v, --verbose                       (default: 0)
 
 Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times.
 ```
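The comma-separated multi-value convention in the help text above can be sketched as a small command builder. This helper is illustrative only, not part of llama-bench; the flag names follow the help text, and the joining mirrors the note that multiple values can be separated with ','.

```python
def bench_cmd(model, n_prompt=(512,), n_gen=(128,), extra=()):
    """Compose a ./llama-bench invocation; each tuple of values is joined
    with ',' so one run benchmarks several settings."""
    cmd = ["./llama-bench", "-m", model,
           "-p", ",".join(map(str, n_prompt)),
           "-n", ",".join(map(str, n_gen))]
    cmd += list(extra)
    return cmd

print(bench_cmd("models/7B/ggml-model-q4_0.gguf",
                n_prompt=(512, 1024), extra=["-o", "csv"]))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell-quoting issues with the comma-joined values.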
@@ -51,6 +55,10 @@ Each test is repeated the number of times given by `-r`, and the results are ave
 
 For a description of the other options, see the [main example](../main/README.md).
 
+Note:
+
+- When using the SYCL backend, llama-bench may hang in some cases. Please set `--mmp 0` to work around it.
+
 ## Examples
 
 ### Text generation with different models

examples/sycl/win-run-llama2.bat

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 :: Copyright (C) 2024 Intel Corporation
 :: SPDX-License-Identifier: MIT
 
-INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
+set INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
 @call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force
 
