### Intel GPU
The SYCL backend supports the following Intel GPU families:
- Intel Data Center Max Series
- Intel Flex Series, Arc Series
- Intel Built-in Arc GPU
- Intel iGPU in Core CPU (11th Generation Core CPU and newer, refer to [oneAPI supported GPU](https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-base-toolkit-system-requirements.html#inpage-nav-1-1)).
#### Retrieve and prepare model

You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or simply download the [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as an example.
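For instance, the model file can be fetched directly (the `/resolve/` download path is an assumption derived from the `/blob/` page linked above):

```sh
# Assumed direct-download URL; Hugging Face serves raw files under /resolve/.
wget -P models https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf
```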
##### Check device
1. Enable oneAPI running environment
```sh
source /opt/intel/oneapi/setvars.sh
```
2. List devices information
Similar to the native `sycl-ls`, available SYCL devices can be queried as follows:
```sh
./build/bin/llama-ls-sycl-device
```
This command will only display the devices of the backend selected by SYCL. The default backend is level_zero. For example, in a system with two *Intel GPUs* the output would look like the following:
- Single device: Use the one device assigned by the user. The default device id is 0.
- Multiple devices: Automatically choose the devices with the same backend.
In both device selection modes, the default SYCL backend is level_zero. You can choose another backend supported by SYCL by setting the environment variable `ONEAPI_DEVICE_SELECTOR`.
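For instance, the following would switch the device query and subsequent runs to the OpenCL backend (the selector syntax shown is an assumption based on the oneAPI documentation; adjust it to your devices):

```sh
# Assumed selector format: "<backend>:<device index or *>".
export ONEAPI_DEVICE_SELECTOR="opencl:*"
./build/bin/llama-ls-sycl-device
```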
Examples:
- Use device 0:

```sh
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
```
- Use multiple devices:
```sh
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer
```
*Notes:*
- Upon execution, verify the selected device ID(s) in the output log, which may for instance be displayed as follows:
c. Verify installation
In the oneAPI command line, run the following to print the available SYCL devices:
```
sycl-ls.exe
```
There should be one or more *level-zero* GPU devices displayed as **[ext_oneapi_level_zero:gpu]**. Below is an example of such output, detecting an *Intel Iris Xe* GPU as a Level-zero SYCL device:
b. The new Visual Studio will install Ninja as default. (If not, please install it manually.)
### II. Build llama.cpp
You can download the release package for Windows directly; it includes the binary files and the oneAPI DLL files they depend on.
Choose one of the following methods to build from source code.
1. Script
```sh
.\examples\sycl\win-build-sycl.bat
```
2. CMake
On the oneAPI command line window, step into the llama.cpp main directory and run the following:
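A minimal sketch of such an invocation follows; the `GGML_SYCL` option and the compiler names are assumptions rather than this guide's verbatim commands, so consult the SYCL build documentation for the exact flags:

```sh
# Sketch only: option and compiler names are assumed, not taken from this guide.
cmake -B build -G "Ninja" -DGGML_SYCL=ON -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=icx
cmake --build build --config Release -j
```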
3. Visual Studio
You can use Visual Studio to open the llama.cpp folder as a CMake project. Choose one of the SYCL CMake presets (`x64-windows-sycl-release` or `x64-windows-sycl-debug`) before you compile the project.
*Notes:*
- For a minimal experimental setup, you can build the inference executable only, through `cmake --build build --config Release -j --target llama-cli`.
### III. Run the inference
#### Retrieve and prepare model
You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or simply download the [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as an example.
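On Windows, for instance, the bundled `curl.exe` can fetch the file (again, the `/resolve/` download path is an assumption derived from the `/blob/` page linked above):

```
:: Assumed direct-download URL; Hugging Face serves raw files under /resolve/.
curl.exe -L -o models\llama-2-7b.Q4_0.gguf https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf
```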
##### Check device
1. Enable oneAPI running environment
On the oneAPI command line window, run the following and step into the llama.cpp directory:
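For instance, assuming the default oneAPI install location and a checkout named `llama.cpp` (both paths are assumptions; adjust them to your setup):

```
:: Assumed default oneAPI install path and repository location.
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
cd llama.cpp
```

2. List devices information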
Similar to the native `sycl-ls`, available SYCL devices can be queried as follows:
```
build\bin\llama-ls-sycl-device.exe
```
This command will only display the devices of the backend selected by SYCL. The default backend is level_zero. For example, in a system with two *Intel GPUs* the output would look like the following: