@@ -123,10 +123,15 @@ should be used.

There is experimental support for DPC++ for CUDA devices.

- To enable support for CUDA devices, the following arguments need to be added to
- the CMake command when building the DPC++ compiler.
+ To enable support for CUDA devices, follow the instructions for the Linux
+ DPC++ toolchain, but replace the cmake command with the following one:
+

```
+ cmake -DCMAKE_BUILD_TYPE=Release \
+ -DLLVM_EXTERNAL_PROJECTS="llvm-spirv;sycl" \
+ -DLLVM_EXTERNAL_SYCL_SOURCE_DIR=$DPCPP_HOME/llvm/sycl \
+ -DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=$DPCPP_HOME/llvm/llvm-spirv \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda/ \
-DLLVM_ENABLE_PROJECTS="clang;llvm-spirv;sycl;libclc" \
-DSYCL_BUILD_PI_CUDA=ON \
@@ -145,6 +150,24 @@ above.

# Use DPC++ toolchain

+ ## Using the DPC++ toolchain on CUDA platforms
+
+ DPC++ toolchain support on CUDA platforms is still experimental.
+ Currently, the DPC++ toolchain requires a recent OpenCL implementation
+ on the system in order to link applications to the DPC++ runtime.
+ The OpenCL implementation is not used at runtime if the application uses
+ only the CUDA backend, but it must still be installed.
+
+ The OpenCL implementation provided by the CUDA SDK is OpenCL 1.2, which is
+ too old to link with the DPC++ runtime and lacks some symbols.
+
+ We recommend installing the low-level CPU runtime, following the instructions
+ in the next section.
+
+ Instead of installing the low-level CPU runtime, it is possible to build and
+ install the [Khronos ICD loader](https://github.com/KhronosGroup/OpenCL-ICD-Loader),
+ which contains all the required symbols.
+
## Install low level runtime

To run DPC++ applications on OpenCL devices, OpenCL implementation(s) must be
@@ -262,6 +285,9 @@ ninja check-all
If no OpenCL GPU/CPU runtimes are available, the corresponding tests are
skipped.

+ If CUDA support has been built, it is tested only if there are CUDA devices
+ available.
+
### Run Khronos\* SYCL\* conformance test suite (optional)

Khronos\* SYCL\* conformance test suite (CTS) is intended to validate
@@ -394,6 +420,19 @@ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
This `simple-sycl-app.exe` application doesn't specify SYCL device for
execution, so SYCL runtime will use `default_selector` logic to select one
of accelerators available in the system or SYCL host device.
+ In this case, the behaviour of the `default_selector` can be altered
+ using the `SYCL_BE` environment variable: setting it to `PI_CUDA` forces
+ the use of the CUDA backend (if available), and setting it to `PI_OPENCL`
+ forces the use of the OpenCL backend.
+
+ ```bash
+ SYCL_BE=PI_CUDA ./simple-sycl-app-cuda.exe
+ ```
+
+ The default is the OpenCL backend, if available.
+ If there are no OpenCL or CUDA devices available, the SYCL host device is used.
+ The SYCL host device executes the SYCL application directly on the host,
+ without using any low-level API.

Note: `nvptx64-nvidia-cuda-sycldevice` is usable with `-fsycl-targets`
if clang was built with the cmake option `SYCL_BUILD_PI_CUDA=ON`.
@@ -403,6 +442,7 @@ if clang was built with the cmake option `SYCL_BUILD_PI_CUDA=ON`.
./simple-sycl-app.exe
The results are correct!
```
+
**Note**:
Currently, when the application has been built with the CUDA target, the CUDA
backend must be selected at runtime using the `SYCL_BE` environment variable.
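
As a rough illustration of the `SYCL_BE` selection described above, the following shell sketch runs one command once per backend value. The function name `run_with_each_backend` is hypothetical and not part of the DPC++ toolchain; only the `SYCL_BE` variable and its `PI_CUDA` and `PI_OPENCL` values come from this guide.

```shell
# Hypothetical helper (not part of DPC++): runs the given command once for
# each SYCL_BE value mentioned in this guide and reports non-zero exit codes.
run_with_each_backend() {
    for be in PI_CUDA PI_OPENCL; do
        echo "== SYCL_BE=$be =="
        # The assignment applies only to this one invocation of the command.
        SYCL_BE="$be" "$@" || echo "(exit status $? with SYCL_BE=$be)"
    done
}
```

For example, `run_with_each_backend ./simple-sycl-app-cuda.exe` would launch the application from this guide under each setting, assuming the binary has already been built. The run with `PI_OPENCL` may fail for an application built only for the CUDA target, which is why the sketch reports the exit status instead of stopping.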
@@ -411,7 +451,7 @@ backend must be selected at runtime using the `SYCL_BE` environment variable.
SYCL_BE=PI_CUDA ./simple-sycl-app-cuda.exe
```

- NOTE: DPC++/SYCL developer can specify SYCL device for execution using device
+ NOTE: DPC++/SYCL developers can specify SYCL device for execution using device
selectors (e.g. `cl::sycl::cpu_selector`, `cl::sycl::gpu_selector`,
[Intel FPGA selector(s)](extensions/IntelFPGA/FPGASelector.md)) as
explained in following section [Code the program for a specific