== Notice

[%hardbreaks]
- Copyright (C) 2023-2023 Intel Corporation. All rights reserved.
+ Copyright (C) 2023-2024 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc.
@@ -54,11 +54,11 @@ This extension also depends on the following other SYCL extensions:

== Status

- This is an experimental extension specification, intended to provide early
- access to features and gather community feedback. Interfaces defined in
- this specification are implemented in DPC++, but they are not finalized
- and may change incompatibly in future versions of DPC++ without prior notice.
- *Shipping software products should not rely on APIs defined in
+ This is an experimental extension specification, intended to provide early
+ access to features and gather community feedback. Interfaces defined in
+ this specification are implemented in {dpcpp}, but they are not finalized
+ and may change incompatibly in future versions of {dpcpp} without prior notice.
+ *Shipping software products should not rely on APIs defined in
this specification.*

@@ -101,7 +101,8 @@ This extension adds the `opencl` enumerator to the `source_language`
enumeration, which indicates that a kernel bundle defines kernels in the
OpenCL C language.

- ```
+ [source,c++]
+ ----
namespace sycl::ext::oneapi::experimental {

enum class source_language : /*unspecified*/ {
@@ -110,7 +111,7 @@ enum class source_language : /*unspecified*/ {
};

} // namespace sycl::ext::oneapi::experimental
- ```
+ ----

=== Source code is text format

@@ -278,60 +279,106 @@ functions identify a kernel using the function name, exactly as it appears in
the OpenCL C source code.
For example, if the kernel is defined this way in OpenCL C:

- ```
+ [source,c++]
+ ----
__kernel
void foo(__global int *in, __global int *out) {/*...*/}
- ```
+ ----

Then the application's host code can query for the kernel like so:

- ```
+ [source,c++]
+ ----
sycl::kernel_bundle<sycl::bundle_state::executable> kb = /*...*/;
sycl::kernel k = kb.ext_oneapi_get_kernel("foo");
- ```
+ ----

=== Kernel argument restrictions

- When a kernel is defined in OpenCL C and invoked from SYCL via a `kernel`
- object, the arguments to the kernel are restricted to certain types.
- In general, the host application passes an argument value via
- `handler::set_arg` using one type and the kernel receives the argument value
- as a corresponding OpenCL C type.
- The following table lists the set of valid types for these kernel arguments:
-
+ The following table defines the set of OpenCL C kernel argument types that are
+ supported by this extension and explains how to pass each type of argument from
+ SYCL.

[%header,cols="1,1"]
|===
- |Type in SYCL host code
- |Type in OpenCL C kernel
+ |OpenCL C type
+ |Corresponding SYCL type

- |One of the OpenCL scalar types (e.g. `cl_int`, `cl_float`, etc.)
- |The corresponding OpenCL C type (e.g. `int`, `float`, etc.)
+ |One of the OpenCL C scalar types (e.g. `int`, `float`, etc.)
+ |A {cpp} type that is device copyable, which has the same width and data
+ representation.

- |A USM pointer.
- |A `+__global+` pointer of the corresponding type.
+ [_Note:_ Applications typically use the corresponding OpenCL type (e.g.
+ `cl_int`, `cl_float`, etc.)
+ _{endnote}_]

- |A class (or struct) that is device copyable in SYCL whose elements are
- composed of OpenCL scalar types or USM pointers.
- |A class (or struct) passed by value whose elements have the corresponding
- OpenCL C types.
+ |A `+__global+` pointer.
+ |Either a {cpp} pointer (typically a pointer to USM memory) or an `accessor`
+ whose target is `target::device`.

- |An `accessor` with `target::device` whose `DataT` is an OpenCL scalar type,
- a USM pointer, or a device copyable class (or struct) whose elements are
- composed of these types.
- |A `+__global+` pointer to the first element of the accessor's buffer.
- The pointer has the corresponding OpenCL C type.
+ |A `+__local+` pointer.
+ |A `local_accessor`.

- [_Note:_ The accessor's size is not passed as a kernel argument, so the host
- code must pass a separate argument with the size if this is desired.
+ [_Note:_ The `local_accessor` merely conveys the size of the local memory, such
+ that the kernel argument points to a local memory buffer of _N_ bytes, where
+ _N_ is the value returned by `local_accessor::byte_size`.
+ If the application wants to pass other information from the `local_accessor` to
+ the kernel (such as the value _N_), it must pass this as separate kernel
+ arguments.
_{endnote}_]

- |A `local_accessor` whose `DataT` is an OpenCL scalar type, a USM pointer, or a
- device copyable class (or struct) whose elements are composed of these types.
- |A `+__local+` pointer to the first element of the accessor's local memory.
- The pointer has the corresponding OpenCL C type.
+ |A class (or struct) passed by value.
+ |A {cpp} struct or class that is device copyable, which has the same size and
+ data representation as the OpenCL C struct.
+
+ [_Note:_ The SYCL argument must not contain any `accessor` or `local_accessor`
+ members because these types are not device copyable.
+ If the OpenCL C structure contains a pointer member, the corresponding SYCL
+ structure member is typically a USM pointer.
+ _{endnote}_]
|===

+ When data allocated on the host is accessed by the kernel via a pointer, the
+ application must ensure that the data has the same size and representation on
+ the host and inside the OpenCL C kernel.
+ Applications can use the OpenCL types (e.g. `cl_int`) for this purpose.
+
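The size and representation requirement can be checked at compile time. The following is a minimal sketch, not part of the extension: the struct `params`, its members, and the helper `layout_matches` are hypothetical, and `std::int32_t` and `float` stand in for `cl_int` and `cl_float` so that the block builds without the OpenCL headers. It assumes the OpenCL C side declares `struct params { int count; float scale; };` and relies on the fact that trivially copyable types are device copyable in SYCL 2020.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical host-side mirror of an OpenCL C struct declared as:
//   struct params { int count; float scale; };
// std::int32_t stands in for cl_int and float for cl_float (assumption made
// so the sketch compiles without <CL/opencl.h>).
struct params {
  std::int32_t count;
  float scale;
};

// OpenCL C fixes int and float at 32 bits, so the host mirror must be
// 8 bytes with `scale` at offset 4. A mismatch fails the build here instead
// of silently corrupting kernel arguments.
static_assert(sizeof(float) == 4, "float must be 32 bits wide");
static_assert(std::is_trivially_copyable_v<params>,
              "struct must be trivially copyable to be device copyable");
static_assert(sizeof(params) == 8, "unexpected padding in params");
static_assert(offsetof(params, scale) == 4, "unexpected member offset");

// Runtime-visible summary of the same checks (hypothetical helper).
constexpr bool layout_matches() {
  return sizeof(params) == 8 && offsetof(params, scale) == 4 &&
         std::is_trivially_copyable_v<params>;
}
```

Using the `cl_*` typedefs directly, as the text suggests, gives the same guarantee for scalar members; the assertions are mainly useful for catching padding differences in structures.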
+ === Iteration space and work-item functions
+
+ A `kernel` object created from OpenCL C source code must be launched either as
+ a single-task kernel or as an nd-range kernel.
+ Attempting to launch such a kernel with a simple range iteration space results
+ in undefined behavior.
+
+ If the kernel is launched as a single-task kernel, it is executed with a
+ 1-dimensional nd-range, with one work-group of one work-item.
+ Because it is launched as an nd-range kernel, the kernel can use features that
+ are normally prohibited in single-task kernels.
+ For example, the `local_accessor` type is allowed as a kernel argument, and the
+ kernel can use OpenCL C work-group collective functions and sub-group
+ functions.
+ Of course, these features have limited use because the kernel is launched with
+ just a single work-item.
+
+ If the kernel is launched as an nd-range kernel, the number of work-group
+ dimensions is the same as the number of dimensions in the `nd_range`.
+ The global size, local size, and the number of work-groups are determined in the
+ usual way from the `nd_range`.
+ If the OpenCL C kernel is decorated with the `reqd_work_group_size` attribute,
+ the local size in the `nd_range` must match this value.
+
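As an illustration of how these quantities relate, here is a minimal sketch in plain C++ that models the arithmetic without calling SYCL. The type `launch_shape` and the helpers `group_counts` and `matches_reqd_size` are hypothetical names introduced for this example. The sketch assumes the usual SYCL rule that each local size must evenly divide the corresponding global size, and it assumes the required size is already expressed in the same dimension order as the local size.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for the sizes carried by a sycl::nd_range<Dims>.
template <std::size_t Dims>
struct launch_shape {
  std::array<std::size_t, Dims> global;  // global size per dimension
  std::array<std::size_t, Dims> local;   // local (work-group) size per dimension
};

// Returns the number of work-groups in each dimension, or std::nullopt when a
// local size is zero or does not evenly divide the global size (assumed rule).
template <std::size_t Dims>
std::optional<std::array<std::size_t, Dims>>
group_counts(const launch_shape<Dims>& s) {
  std::array<std::size_t, Dims> groups{};
  for (std::size_t d = 0; d < Dims; ++d) {
    if (s.local[d] == 0 || s.global[d] % s.local[d] != 0) return std::nullopt;
    groups[d] = s.global[d] / s.local[d];
  }
  return groups;
}

// True when the local size equals the size named in reqd_work_group_size.
// `reqd` is assumed to use the same dimension order as `s.local`.
template <std::size_t Dims>
bool matches_reqd_size(const launch_shape<Dims>& s,
                       const std::array<std::size_t, Dims>& reqd) {
  return s.local == reqd;
}
```

Under this model, a single-task launch corresponds to `launch_shape<1>{{1}, {1}}`, which yields exactly one work-group of one work-item.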
+ The kernel may call the functions defined in section 6.15.1 "Work-Item
+ Functions" of the OpenCL C specification, with the following clarification.
+ Some of these functions take a `dimindx` parameter that selects a dimension
+ index.
+ This index has the opposite sense from SYCL, as described in section C.7.7
+ "OpenCL kernel conventions and SYCL" of the core SYCL specification.
+ To illustrate, consider a call to `get_global_size` from a kernel that is
+ invoked with a 3-dimensional `nd_range`.
+ Calling `get_global_size(0)` retrieves the global size from dimension 2 of the
+ `nd_range`, and calling `get_global_size(2)` retrieves the global size from
+ dimension 0 of the `nd_range`.
+
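The reversed index can be modeled with a one-line mapping. The sketch below is plain C++ and only illustrates the convention; `sycl_dimension` and `opencl_global_size` are hypothetical helpers, not functions of SYCL or OpenCL. The text above states the 3-dimensional case; applying the same reversal to 1 and 2 dimensions is an assumption based on the core SYCL convention it cites.

```cpp
#include <array>
#include <cstddef>

// Maps an OpenCL C `dimindx` to the nd_range dimension it selects, for a
// kernel launched with `ndims` dimensions. Requires dimindx < ndims.
constexpr std::size_t sycl_dimension(std::size_t dimindx, std::size_t ndims) {
  return ndims - 1 - dimindx;
}

// Hypothetical model of what get_global_size(dimindx) reports inside the
// kernel, given the global range of the nd_range used on the host.
// OpenCL C defines the result as 1 for an out-of-range dimindx.
template <std::size_t Dims>
constexpr std::size_t opencl_global_size(
    const std::array<std::size_t, Dims>& global, std::size_t dimindx) {
  return dimindx < Dims ? global[sycl_dimension(dimindx, Dims)] : 1;
}
```

For a host-side global range of `{4, 8, 16}`, this model reports 16 for `dimindx` 0 and 4 for `dimindx` 2, matching the description above.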

== Examples

@@ -340,9 +387,10 @@ _{endnote}_]
The following example shows a simple SYCL program that defines an OpenCL C
kernel as a string and then compiles and launches it.

- ```
+ [source,c++]
+ ----
#include <sycl/sycl.hpp>
- #include <OpenCL/opencl.h>
+ #include <CL/opencl.h>
namespace syclex = sycl::ext::oneapi::experimental;

int main() {
@@ -372,6 +420,7 @@ int main() {
  sycl::kernel k = kb_exe.ext_oneapi_get_kernel("my_kernel");

  constexpr int N = 4;
+   constexpr int WGSIZE = 1;
  cl_int input[N] = {0, 1, 2, 3};
  cl_int output[N] = {};

@@ -385,19 +434,21 @@ int main() {
    // Each argument to the kernel is a SYCL accessor.
    cgh.set_args(in, out);

-     // Invoke the kernel over a range.
-     cgh.parallel_for(sycl::range{N}, k);
+     // Invoke the kernel over an nd-range.
+     sycl::nd_range ndr{{N}, {WGSIZE}};
+     cgh.parallel_for(ndr, k);
  });
}
- ```
+ ----

=== Querying supported features and extensions

This example demonstrates how to query the version of OpenCL C that is
supported, how to query the supported features, and how to query the
supported extensions.

- ```
+ [source,c++]
+ ----
#include <iostream>
#include <sycl/sycl.hpp>
namespace syclex = sycl::ext::oneapi::experimental;
@@ -426,24 +477,4 @@ int main() {
std::cout << "Device supports online compilation with the OpenCL full profile\n";

}
- ```
-
-
- == Issues
-
- * Do we need to document some restrictions on the OpenCL C
- https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#work-item-functions[
- work-item functions] that the kernel can call, which depends on how the
- kernel was launched?
- For example, can a kernel launched with the simple `range` form of
- `parallel_for` call `get_local_size`?
- In OpenCL, there is only one way to launch kernels
- (`clEnqueueNDRangeKernel`), so it is always legal to call any of the
- work-item functions.
- If an OpenCL kernel is launched with a NULL `local_work_size` (which is
- roughly equivalent to SYCL's `range` form of `parallel_for`), the
- `get_local_size` function returns the local work-group size that is chosen by
- the implementation.
- Level Zero, similarly, has only one way to launch kernels.
- Therefore, maybe it is OK to let kernels in this extension call any of the
- work-item functions, regardless of how they are launched?
+ ----