For CI and performance test summary, please refer to the llama.cpp CI for SYCL Backend.

### Intel GPU

SYCL backend supports Intel GPU Family:

- Intel Data Center Max Series
- Intel Flex Series, Arc Series
- Intel Built-in Arc GPU
- Intel iGPU in Core CPU (11th Generation Core CPU and newer; refer to [oneAPI supported GPU](https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-base-toolkit-system-requirements.html#inpage-nav-1-1)).

#### Retrieve and prepare model

You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or simply download the [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as an example.
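
If you only need the example model, one way to fetch it from the command line is sketched below. This assumes the `huggingface-cli` tool (from the `huggingface_hub` Python package) is installed; the `models/` target directory is an arbitrary choice:

```sh
# Sketch: download the example GGUF file into ./models
huggingface-cli download TheBloke/Llama-2-7B-GGUF llama-2-7b.Q4_0.gguf --local-dir models
```
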
##### Check device

1. Enable oneAPI running environment

```sh
source /opt/intel/oneapi/setvars.sh
```
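
A quick way to confirm that the environment took effect is to query the oneAPI compiler, which `setvars.sh` puts on the `PATH` (a sketch; `icpx` ships with the oneAPI Base Toolkit):

```sh
# If the environment was sourced correctly, this prints the Intel oneAPI DPC++/C++ compiler version
icpx --version
```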

2. List devices information

Similar to the native `sycl-ls`, available SYCL devices can be queried as follows:

```sh
./build/bin/llama-ls-sycl-device
```

This command only displays the devices of the backend selected by SYCL; the default backend is level_zero. For example, in a system with 2 *Intel GPUs* the output would look like the following:

Two device selection modes are supported:

- Single device: Use one device assigned by the user. The default device id is 0.
- Multiple devices: Automatically choose the devices with the same Max compute-units.

In both device selection modes, the default SYCL backend is level_zero; you can choose another backend supported by SYCL by setting the environment variable `ONEAPI_DEVICE_SELECTOR`.
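
For example, a sketch of overriding the backend for a single run (the selector value follows the oneAPI runtime's `backend:device` syntax; `opencl:*` is just one possible choice):

```sh
# Sketch: list the SYCL devices exposed by the OpenCL backend instead of the default level_zero
ONEAPI_DEVICE_SELECTOR="opencl:*" ./build/bin/llama-ls-sycl-device
```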

- Use a single device:

```sh
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
```

- Use multiple devices:

```sh
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer
```

*Notes:*

- Upon execution, verify the selected device(s) ID(s) in the output log, which can for instance be displayed as follows:

c. Verify installation

In the oneAPI command line, run the following to print the available SYCL devices:

```
sycl-ls.exe
```
There should be one or more *level-zero* GPU devices displayed as **[ext_oneapi_level_zero:gpu]**. Below is an example of such output detecting an *Intel Iris Xe* GPU as a Level-zero SYCL device:

b. The new Visual Studio will install Ninja as default. (If not, please install it manually.)

### II. Build llama.cpp

You can download the release package for Windows directly; it includes the binary files and the required oneAPI DLL files.

Otherwise, choose one of the following methods to build from source code.

1. Script

```sh
.\examples\sycl\win-build-sycl.bat
```

2. CMake

On the oneAPI command line window, step into the llama.cpp main directory and run the following:
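
The exact CMake invocation is not shown in this excerpt; a typical SYCL configure-and-build on Windows might look like the following sketch (the `GGML_SYCL` option and the `icx` host compiler choice are assumptions based on the project's SYCL build instructions; the build step matches the form referenced in the *Notes* below):

```sh
# Sketch: configure with the SYCL backend enabled, then build (flags are assumptions, see above)
cmake -B build -G "Ninja" -DGGML_SYCL=ON -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j
```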

3. Visual Studio

You can use Visual Studio to open the llama.cpp folder as a CMake project. Choose the SYCL CMake presets (`x64-windows-sycl-release` or `x64-windows-sycl-debug`) before you compile the project.

*Notes:*

- In case of a minimal experimental setup, the user can build the inference executable only through `cmake --build build --config Release -j --target llama-cli`.

### III. Run the inference

#### Retrieve and prepare model

You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or simply download the [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as an example.
##### Check device

1. Enable oneAPI running environment

On the oneAPI command line window, run the following and step into the llama.cpp directory:
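
The commands themselves are not included in this excerpt; assuming a default oneAPI installation path and that llama.cpp was cloned into the current directory, they would typically look like this sketch:

```sh
rem Sketch: activate the oneAPI environment from the default install location, then enter the repository
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
cd llama.cpp
```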