* disable mmap to fix memcpy crash, add missed cmd in guide, fix softmax
* refactor to disable mmap for SYCL backend
* fix compile error on other OSes
* refactor the solution: use a host buffer to fix it, instead of disabling mmap
* keep support for mmap()
* use a host buffer to reduce the number of malloc calls
* revert to the malloc/free solution, for thread safety
| Intel Data Center Max Series | Support | Max 1550, 1100 |
| Intel Data Center Flex Series | Support | Flex 170 |
| Intel Arc Series | Support | Arc 770, 730M |
| Intel built-in Arc GPU | Support | built-in Arc GPU in Meteor Lake |
It has the similar design of other llama.cpp BLAS-based paths such as *OpenBLAS*.

**Execution Unit (EU)**
- If the iGPU has less than 80 EUs, the inference speed will likely be too slow for practical use.

### Nvidia GPU

The BLAS acceleration on Nvidia GPU through oneAPI can be obtained using the Nvidia plugins for oneAPI and the cuBLAS backend of the upstream oneMKL library. Details and instructions on how to set up the runtime and library can be found in [this section](#i-setup-environment).

### Other Vendor GPU

**Verified devices**
| Ampere Series | Support | A100, A4000 |
| Ampere Series *(Mobile)* | Support | RTX 40 Series |

*Notes:*

- Support for Nvidia targets through oneAPI is currently limited to Linux platforms.
- Please make sure the native oneAPI MKL *(dedicated to Intel CPUs and GPUs)* is not "visible" at this stage, to properly set up and use the built-from-source oneMKL with the cuBLAS backend in llama.cpp for Nvidia GPUs.
## Docker

The docker build option is currently limited to *intel GPU* targets.

### Build image

```sh
# Using FP16
```
**Nvidia GPU**

In order to target Nvidia GPUs through SYCL, please make sure the CUDA/CUBLAS native requirements *-found [here](README.md#cuda)-* are installed.

Installation can be verified by running the following:

```sh
nvidia-smi
```

Please make sure at least one CUDA device is available, which can be displayed like this *(here an A100-40GB Nvidia GPU)*:
The base toolkit can be obtained from the official [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html) page.
**Adding support to Nvidia GPUs**

**oneAPI Plugin**: In order to enable SYCL support on Nvidia GPUs, please install the [Codeplay oneAPI Plugin for Nvidia GPUs](https://developer.codeplay.com/products/oneapi/nvidia/download). Users should also make sure the plugin version matches the installed base toolkit one *(previous step)* for a seamless "oneAPI on Nvidia GPU" setup.

**oneMKL for cuBlas**: The current oneMKL releases *(shipped with the oneAPI base-toolkit)* do not contain the cuBLAS backend. A build from source of the upstream [oneMKL](https://github.com/oneapi-src/oneMKL) with the *cuBLAS* backend enabled is thus required to run it on Nvidia GPUs.

```sh
git clone https://github.com/oneapi-src/oneMKL
```
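The clone above is followed by a configure-and-build step that enables the cuBLAS backend and disables the Intel ones. The exact option names depend on the oneMKL version, so treat the flags below as an assumption to verify against the oneMKL build documentation:

```shell
# Assumption: option names taken from the oneapi-src/oneMKL CMake build
# system; check the oneMKL build documentation for your installed version.
cd oneMKL
cmake -B buildWithCublas \
      -DCMAKE_CXX_COMPILER=icpx \
      -DENABLE_CUBLAS_BACKEND=ON \
      -DENABLE_MKLCPU_BACKEND=OFF \
      -DENABLE_MKLGPU_BACKEND=OFF \
      -DTARGET_DOMAINS=blas
cmake --build buildWithCublas --config Release
```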
When targeting an intel GPU, the user should expect one or more level-zero devices.

**Nvidia GPU**

Similarly, users targeting Nvidia GPUs should expect at least one SYCL-CUDA device [`ext_oneapi_cuda:gpu`], as below:
*Notes:*

- By default, `mmap` is used to read the model file. In some cases, it causes runtime hang issues. Please disable it by passing `--no-mmap` to `/bin/main` if faced with the issue.
- Upon execution, verify the selected device(s) ID(s) in the output log, which can for instance be displayed as follows:
Otherwise, run the `win-build-sycl.bat` wrapper which encapsulates the former instructions:
Note:

- By default, `mmap` is used to read the model file. In some cases, it causes runtime hang issues. Please disable it by passing `--no-mmap` to `main.exe` if faced with the issue.
- Upon execution, verify the selected device(s) ID(s) in the output log, which can for instance be displayed as follows:

```sh
use 1 SYCL GPUs: [0] with Max compute units:512
```
## Known Issues

- Hanging during startup

  llama.cpp uses *mmap* as the default mode for reading the model file and copying it to the GPU. In some systems, `memcpy` might behave abnormally and therefore hang.

  **Solution**: add the `--no-mmap` or `--mmap 0` flag to the `main` executable.

- `Split-mode:[row]` is not supported.
## Q&A
- General compiler error:

  - Remove the **build** folder or try a clean build.

- I can **not** see `[ext_oneapi_level_zero:gpu]` after installing the GPU driver on Linux.