Skip to content

Commit eca61f1

Browse files
authored
[SYCL][Doc] Clarify WI funcs in kernel compiler (#12891)
Clarify the SPIR-V and OpenCL kernel compiler specifications to explain how the SYCL iteration space maps to the SPIR-V / OpenCL C languages and clarify that these kernels can use their normal language mechanisms to find the current work-item's position in the iteration space. We also disallow launching a SPIR-V or OpenCL C kernel as a simple "range" kernel. This seems consistent with our view that a "range" kernel is not just a degenerate form of an nd-range kernel. Since SPIR-V and OpenCL C kernels always have access to nd-range features, it does not make sense to launch them as range kernels. This is also consistent with our decision to limit SYCL free function kernels to "nd-range" and "single-task" forms. Some other cleanup of these specifications also: * Clarify what happens when a `local_accessor` is passed as a kernel argument to a SPIR-V or OpenCL kernel. This was causing some confusion from users. * Reformat the table in the OpenCL spec describing kernel arguments so that it has the same layout as the equivalent SPIR-V table. * Fix the name of the OpenCL header file in the example.
1 parent 96d744f commit eca61f1

File tree

2 files changed

+166
-86
lines changed

2 files changed

+166
-86
lines changed

sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler_opencl.asciidoc

Lines changed: 98 additions & 67 deletions
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@
2121
== Notice
2222

2323
[%hardbreaks]
24-
Copyright (C) 2023-2023 Intel Corporation. All rights reserved.
24+
Copyright (C) 2023-2024 Intel Corporation. All rights reserved.
2525

2626
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
2727
of The Khronos Group Inc.
@@ -54,11 +54,11 @@ This extension also depends on the following other SYCL extensions:
5454

5555
== Status
5656

57-
This is an experimental extension specification, intended to provide early
58-
access to features and gather community feedback. Interfaces defined in
59-
this specification are implemented in DPC++, but they are not finalized
60-
and may change incompatibly in future versions of DPC++ without prior notice.
61-
*Shipping software products should not rely on APIs defined in
57+
This is an experimental extension specification, intended to provide early
58+
access to features and gather community feedback. Interfaces defined in
59+
this specification are implemented in {dpcpp}, but they are not finalized
60+
and may change incompatibly in future versions of {dpcpp} without prior notice.
61+
*Shipping software products should not rely on APIs defined in
6262
this specification.*
6363

6464

@@ -101,7 +101,8 @@ This extension adds the `opencl` enumerator to the `source_language`
101101
enumeration, which indicates that a kernel bundle defines kernels in the
102102
OpenCL C language.
103103

104-
```
104+
[source,c++]
105+
----
105106
namespace sycl::ext::oneapi::experimental {
106107
107108
enum class source_language : /*unspecified*/ {
@@ -110,7 +111,7 @@ enum class source_language : /*unspecified*/ {
110111
};
111112
112113
} // namespace sycl::ext::oneapi::experimental
113-
```
114+
----
114115

115116
=== Source code is text format
116117

@@ -278,60 +279,106 @@ functions identify a kernel using the function name, exactly as it appears in
278279
the OpenCL C source code.
279280
For example, if the kernel is defined this way in OpenCL C:
280281

281-
```
282+
[source,c++]
283+
----
282284
__kernel
283285
void foo(__global int *in, __global int *out) {/*...*/}
284-
```
286+
----
285287

286288
Then the application's host code can query for the kernel like so:
287289

288-
```
290+
[source,c++]
291+
----
289292
sycl::kernel_bundle<sycl::bundle_state::executable> kb = /*...*/;
290293
sycl::kernel k = kb.ext_oneapi_get_kernel("foo");
291-
```
294+
----
292295

293296
=== Kernel argument restrictions
294297

295-
When a kernel is defined in OpenCL C and invoked from SYCL via a `kernel`
296-
object, the arguments to the kernel are restricted to certain types.
297-
In general, the host application passes an argument value via
298-
`handler::set_arg` using one type and the kernel receives the argument value
299-
as a corresponding OpenCL C type.
300-
The following table lists the set of valid types for these kernel arguments:
301-
298+
The following table defines the set of OpenCL C kernel argument types that are
299+
supported by this extension and explains how to pass each type of argument from
300+
SYCL.
302301

303302
[%header,cols="1,1"]
304303
|===
305-
|Type in SYCL host code
306-
|Type in OpenCL C kernel
304+
|OpenCL C type
305+
|Corresponding SYCL type
307306

308-
|One of the OpenCL scalar types (e.g. `cl_int`, `cl_float`, etc.)
309-
|The corresponding OpenCL C type (e.g. `int`, `float`, etc.)
307+
|One of the OpenCL C scalar types (e.g. `int`, `float`, etc.)
308+
|A {cpp} type that is device copyable, which has the same width and data
309+
representation.
310310

311-
|A USM pointer.
312-
|A `+__global+` pointer of the corresponding type.
311+
[_Note:_ Applications typically use the corresponding OpenCL type (e.g.
312+
`cl_int`, `cl_float`, etc.)
313+
_{endnote}_]
313314

314-
|A class (or struct) that is device copyable in SYCL whose elements are
315-
composed of OpenCL scalar types or USM pointers.
316-
|A class (or struct) passed by value whose elements have the corresponding
317-
OpenCL C types.
315+
|A `+__global+` pointer.
316+
|Either a {cpp} pointer (typically a pointer to USM memory) or an `accessor`
317+
whose target is `target::device`.
318318

319-
|An `accessor` with `target::device` whose `DataT` is an OpenCL scalar type,
320-
a USM pointer, or a device copyable class (or struct) whose elements are
321-
composed of these types.
322-
|A `+__global+` pointer to the first element of the accessor's buffer.
323-
The pointer has the corresponding OpenCL C type.
319+
|A `+__local+` pointer.
320+
|A `local_accessor`.
324321

325-
[_Note:_ The accessor's size is not passed as a kernel argument, so the host
326-
code must pass a separate argument with the size if this is desired.
322+
[_Note:_ The `local_accessor` merely conveys the size of the local memory, such
323+
that the kernel argument points to a local memory buffer of _N_ bytes, where
324+
_N_ is the value returned by `local_accessor::byte_size`.
325+
If the application wants to pass other information from the `local_accessor` to
326+
the kernel (such as the value _N_), it must pass this as separate kernel
327+
arguments.
327328
_{endnote}_]
328329

329-
|A `local_accessor` whose `DataT` is an OpenCL scalar type, a USM pointer, or a
330-
device copyable class (or struct) whose elements are composed of these types.
331-
|A `+__local+` pointer to the first element of the accessor's local memory.
332-
The pointer has the corresponding OpenCL C type.
330+
|A class (or struct) passed by value.
331+
|A {cpp} struct or class that is device copyable, which has the same size and
332+
data representation as the OpenCL C struct.
333+
334+
[_Note:_ The SYCL argument must not contain any `accessor` or `local_accessor`
335+
members because these types are not device copyable.
336+
If the OpenCL C structure contains a pointer member, the corresponding SYCL
337+
structure member is typically a USM pointer.
338+
_{endnote}_]
333339
|===
334340

341+
When data allocated on the host is accessed by the kernel via a pointer, the
342+
application must ensure that the data has the same size and representation on
343+
the host and inside the OpenCL C kernel.
344+
Applications can use the OpenCL types (e.g. `cl_int`) for this purpose.
345+
346+
=== Iteration space and work-item functions
347+
348+
A `kernel` object created from OpenCL C source code must be launched either as
349+
a single-task kernel or as an nd-range kernel.
350+
Attempting to launch such a kernel with a simple range iteration space results
351+
in undefined behavior.
352+
353+
If the kernel is launched as a single-task kernel, it is executed with a
354+
1-dimensional nd-range, with one work-group of one work-item.
355+
Because it is launched as an nd-range kernel, the kernel can use features that
356+
are normally prohibited in single-task kernels.
357+
For example, the `local_accessor` type is allowed as a kernel argument, and the
358+
kernel can use OpenCL C work-group collective functions and sub-group
359+
functions.
360+
Of course, these features have limited use because the kernel is launched with
361+
just a single work-item.
362+
363+
If the kernel is launched as an nd-range kernel, the number of work-group
364+
dimensions is the same as the number of dimensions in the `nd_range`.
365+
The global size, local size, and the number of work-groups is determined in the
366+
usual way from the `nd_range`.
367+
If the OpenCL C kernel is decorated with the `reqd_work_group_size` attribute,
368+
the local size in the `nd_range` must match this value.
369+
370+
The kernel may call the functions defined in section 6.15.1 "Work-Item
371+
Functions" of the OpenCL C specification, with the following clarification.
372+
Some of these functions take a `dimindx` parameter that selects a dimension
373+
index.
374+
This index has the opposite sense from SYCL, as described in section C.7.7
375+
"OpenCL kernel conventions and SYCL" of the core SYCL specification.
376+
To illustrate, consider a call to `get_global_size` from a kernel that is
377+
invoked with a 3-dimensional `nd_range`.
378+
Calling `get_global_size(0)` retrieves the global size from dimension 2 of the
379+
`nd_range`, and calling `get_global_size(2)` retrieves the global size from
380+
dimension 0 of the `nd_range`.
381+
335382

336383
== Examples
337384

@@ -340,9 +387,10 @@ _{endnote}_]
340387
The following example shows a simple SYCL program that defines an OpenCL C
341388
kernel as a string and then compiles and launches it.
342389

343-
```
390+
[source,c++]
391+
----
344392
#include <sycl/sycl.hpp>
345-
#include <OpenCL/opencl.h>
393+
#include <CL/opencl.h>
346394
namespace syclex = sycl::ext::oneapi::experimental;
347395
348396
int main() {
@@ -372,6 +420,7 @@ int main() {
372420
sycl::kernel k = kb_exe.ext_oneapi_get_kernel("my_kernel");
373421
374422
constexpr int N = 4;
423+
constexpr int WGSIZE = 1;
375424
cl_int input[N] = {0, 1, 2, 3};
376425
cl_int output[N] = {};
377426
@@ -385,19 +434,21 @@ int main() {
385434
// Each argument to the kernel is a SYCL accessor.
386435
cgh.set_args(in, out);
387436
388-
// Invoke the kernel over a range.
389-
cgh.parallel_for(sycl::range{N}, k);
437+
// Invoke the kernel over an nd-range.
438+
sycl::nd_range ndr{{N}, {WGSIZE}};
439+
cgh.parallel_for(ndr, k);
390440
});
391441
}
392-
```
442+
----
393443

394444
=== Querying supported features and extensions
395445

396446
This example demonstrates how to query the version of OpenCL C that is
397447
supported, how to query the supported features, and how to query the
398448
supported extensions.
399449

400-
```
450+
[source,c++]
451+
----
401452
#include <iostream>
402453
#include <sycl/sycl.hpp>
403454
namespace syclex = sycl::ext::oneapi::experimental;
@@ -426,24 +477,4 @@ int main() {
426477
std::cout << "Device supports online compilation with the OpenCL full profile\n";
427478
428479
}
429-
```
430-
431-
432-
== Issues
433-
434-
* Do we need to document some restrictions on the OpenCL C
435-
https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#work-item-functions[
436-
work-item functions] that the kernel can call, which depends on how the
437-
kernel was launched?
438-
For example, can a kernel launched with the simple `range` form of
439-
`parallel_for` call `get_local_size`?
440-
In OpenCL, there is only one way to launch kernels
441-
(`clEnqueueNDRangeKernel`), so it is always legal to call any of the
442-
work-item functions.
443-
If an OpenCL kernel is launched with a NULL `local_work_size` (which is
444-
roughly equivalent to SYCL's `range` form of `parallel_for`), the
445-
`get_local_size` function returns the local work-group size that is chosen by
446-
the implementation.
447-
Level Zero, similarly, has only one way to launch kernels.
448-
Therefore, maybe it is OK to let kernels in this extension call any of the
449-
work-item functions, regardless of how they are launched?
480+
----

0 commit comments

Comments
 (0)