### Intel GPU

The SYCL backend supports the Intel GPU family:

- Intel Data Center Max Series
- Intel Flex Series, Arc Series
- Intel Built-in Arc GPU
- Intel iGPU in Core CPU (11th Generation Core CPU and newer, refer to [oneAPI supported GPU](https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-base-toolkit-system-requirements.html#inpage-nav-1-1)).
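
As a quick sanity check that the GPU is visible to SYCL, you can list the devices with the standard oneAPI tools (a minimal sketch, assuming the oneAPI Base Toolkit is already installed in its default location; the same `setvars.sh` and `sycl-ls` tools are used later in this guide):

```sh
# Load the oneAPI environment, then list the devices the SYCL runtime can see
source /opt/intel/oneapi/setvars.sh
sycl-ls
```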
You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or simply download the [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as an example.
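
For instance, the example model can be fetched from the command line (a sketch; the `models/` target directory and the direct-download URL derived from the Hugging Face page above are assumptions):

```sh
# Download the example model into the models/ directory
mkdir -p models
wget -O models/llama-2-7b.Q4_0.gguf \
  https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf
```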
##### Check device

1. Enable oneAPI running environment

```sh
source /opt/intel/oneapi/setvars.sh
```

2. List device information

Similar to the native `sycl-ls`, available SYCL devices can be queried as follows:

```sh
./build/bin/llama-ls-sycl-device
```

An example of such a log on a system with one *Intel CPU* and one *Intel GPU* can look like the following:

- Use device 0:

```sh
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
```

or run by script:

```sh
./examples/sycl/run_llama2.sh 0
```

- Use multiple devices:

```sh
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer
```

Otherwise, you can run the script:

```sh
./examples/sycl/run_llama2.sh
```

*Notes:*
- Upon execution, verify the selected device(s) ID(s) in the output log, which can for instance be displayed as follows:
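
If you want to restrict which devices the SYCL runtime exposes before launching, the oneAPI `ONEAPI_DEVICE_SELECTOR` environment variable can be used (a sketch; the selector string `level_zero:0` assumes the target GPU is the first Level-Zero device reported by `./build/bin/llama-ls-sycl-device`):

```sh
# Expose only the first Level-Zero GPU to the SYCL runtime, then run as usual
ONEAPI_DEVICE_SELECTOR="level_zero:0" ZES_ENABLE_SYSMAN=1 \
  ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
```
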
c. Verify installation

In the oneAPI command line, run the following to print the available SYCL devices:

```
sycl-ls.exe
```

There should be one or more *level-zero* GPU devices displayed as **[ext_oneapi_level_zero:gpu]**. Below is an example of such output, detecting an *Intel Iris Xe* GPU as a Level-Zero SYCL device:

### II. Build llama.cpp

You can download the release package for Windows directly; it includes the binary files and the oneAPI DLL files they depend on.

Choose one of the following methods to build from source code.

1. Script

```sh
.\examples\sycl\win-build-sycl.bat
```

2. CMake

On the oneAPI command line window, step into the llama.cpp main directory and run the following:
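
As an illustration only, a SYCL-enabled configure-and-build typically looks something like the following (the `GGML_SYCL` option and the `cl`/`icx` compiler selection are assumptions; check them against the project's build documentation):

```sh
:: Configure with the SYCL backend enabled (flag names are assumptions)
cmake -B build -G "Ninja" -DGGML_SYCL=ON -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release

:: Build
cmake --build build --config Release -j
```
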
3. Visual Studio

You can use Visual Studio to open the llama.cpp folder as a CMake project. Choose the SYCL CMake presets (`x64-windows-sycl-release` or `x64-windows-sycl-debug`) before you compile the project.
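
The same presets can also be inspected and applied from a oneAPI command prompt (a sketch; the preset names come from the repository's `CMakePresets.json` and may change between versions):

```sh
:: Show the configure presets defined by the repository
cmake --list-presets

:: Configure using the release SYCL preset
cmake --preset x64-windows-sycl-release
```
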
*Notes:*
- In case of a minimal experimental setup, the user can build the inference executable only through `cmake --build build --config Release -j --target llama-cli`.
### III. Run the inference

#### Retrieve and prepare model

You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or simply download the [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as an example.
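
On Windows, the example model can likewise be fetched from the command line (a sketch; the `models\` target directory and the direct-download URL derived from the Hugging Face page above are assumptions):

```sh
:: Download the example model using the curl that ships with recent Windows versions
mkdir models
curl.exe -L -o models\llama-2-7b.Q4_0.gguf https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf
```
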
##### Check device

1. Enable oneAPI running environment

On the oneAPI command line window, run the following and step into the llama.cpp directory:
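
A minimal sketch of that step, assuming the default oneAPI installation path and that the repository was cloned to `C:\llama.cpp` (both paths are assumptions):

```sh
:: Activate the oneAPI environment, then change into the llama.cpp directory
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
cd C:\llama.cpp
```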