Skip to content

Commit cda2d8f

Browse files
committed
Expand design to include exported device functions
Expand the design to include the case when device functions are exported from a shared library, which is a new feature proposed in intel#3210.
1 parent 333bf43 commit cda2d8f

File tree

1 file changed

+122
-99
lines changed

1 file changed

+122
-99
lines changed

sycl/doc/OptionalDeviceFeatures.md

Lines changed: 122 additions & 99 deletions
Original file line numberDiff line numberDiff line change
@@ -22,9 +22,9 @@ attributes"][2] and [section 5.8.2 "Device function attributes"][3].
2222

2323
There are several categories of requirements covered by this design:
2424

25-
* The front-end compiler must issue a diagnostic in some cases when a kernel
26-
uses an optional feature. However, the front-end compiler must **not**
27-
generate a diagnostic in other cases.
25+
* The front-end compiler must issue a diagnostic in some cases when a kernel or
26+
device function uses an optional feature. However, the front-end compiler
27+
must **not** generate a diagnostic in other cases.
2828

2929
* The runtime must raise an exception when a kernel using optional features
3030
is submitted to a device that does not support those features. This
@@ -131,7 +131,7 @@ is an optional feature which is only supported on devices that have the
131131
describe the set of optional features that a kernel uses (with the exception
132132
of the required work-group or sub-group size).
133133

134-
As will see later, it will be very helpful to decorate all APIs in DPC++
134+
As we will see later, it will be very helpful to decorate all APIs in DPC++
135135
headers that correspond to optional kernel features with the
136136
`[[sycl::requires()]]` attribute. For example, the declaration of the
137137
`sycl::half` type would look like this:
@@ -248,40 +248,42 @@ AOT cases.
248248

249249
### JIT case
250250

251-
The JIT case requires some change to the way kernels are bundled together into
252-
device images. Currently, kernels are bundled together regardless of the
253-
features they use, and this can lead to JIT errors due to speculative
254-
compilation. Consider a device image that contains two kernels: `K1` uses no
255-
optional features and `K2` uses an optional feature that corresponds to aspect
256-
`A`. Now consider that the application submits kernel `K1` to a device that
257-
does not support aspect `A`. Since the two kernels are bundled together into
258-
one device image, the runtime really compiles both kernels for the device.
259-
Currently, this will raise a JIT exception because the compilation of kernel
260-
`K2` will fail when compiled for a device that does not support aspect `A`.
261-
262-
There are two ways to solve this problem. One is to change the way kernels are
263-
bundled into device images such that we never bundled two kernels together
264-
unless they required exactly the same set of device aspects. Doing this would
265-
avoid the error described above. However, we have elected for a different
266-
solution.
267-
268-
Instead, we will allow kernels to be bundled together as they currently are,
269-
but we will introduce extra decorations into the generated SPIR-V that allow
270-
the JIT compiler to discard kernels which require aspects that the device does
271-
not support. Although this solution requires an extension to SPIR-V, we think
272-
it is the better direction because it is aligned with the [device-if][4]
273-
feature, which will also requires this same SPIR-V extension.
251+
The JIT case requires some change to the way kernels and device functions are
252+
bundled together into device images. Currently, kernels and device functions
253+
are bundled together regardless of the features they use, and this can lead to
254+
JIT errors due to speculative compilation. Consider a device image that
255+
contains two kernels: `K1` uses no optional features and `K2` uses an optional
256+
feature that corresponds to aspect `A`. Now consider that the application
257+
submits kernel `K1` to a device that does not support aspect `A`. Since the
258+
two kernels are bundled together into one device image, the runtime really
259+
compiles both kernels for the device. Currently, this will raise a JIT
260+
exception because the compilation of kernel `K2` will fail when compiled for a
261+
device that does not support aspect `A`.
262+
263+
There are two ways to solve this problem. One is to change the way kernels and
264+
device functions are bundled into device images such that we never bundled two
265+
kernels or device functions together unless they require exactly the same set
266+
of device aspects. Doing this would avoid the error described above. However,
267+
we have elected for a different solution.
268+
269+
Instead, we will allow kernels and device functions to be bundled together as
270+
they currently are, but we will introduce extra decorations into the generated
271+
SPIR-V that allow the JIT compiler to discard kernels and device functions
272+
which require aspects that the device does not support. Although this solution
273+
requires an extension to SPIR-V, we think it is the better direction because it
274+
is aligned with the [device-if][4] feature, which will also requires this same
275+
SPIR-V extension.
274276

275277
[4]: <https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/DeviceIf/device_if.asciidoc>
276278

277279
The idea is to emit a SPIR-V specialization constant for each aspect that is
278-
required by a kernel in the device image. We then introduce a new SPIR-V
279-
"decoration" that tells the JIT compiler to discard a function if a
280-
specialization constant is `False`. The DPC++ runtime will set the values of
280+
required by a kernel or device function in the device image. We then introduce
281+
a new SPIR-V "decoration" that tells the JIT compiler to discard a function if
282+
a specialization constant is `False`. The DPC++ runtime will set the values of
281283
the specialization constants according to the target device, thus the JIT
282-
compiler discards (and does not compile) any kernels that use features which
283-
are not supported on that device. This avoids errors due to speculative
284-
compilation of kernels.
284+
compiler discards (and does not compile) any kernels or device functions that
285+
use features which are not supported on that device. This avoids errors due to
286+
speculative compilation of kernels or device functions.
285287

286288
#### Representation in SPIR-V
287289

@@ -343,12 +345,19 @@ mechanism.
343345
The post-link tool must be modified to add the SPIR-V `ConditionalINTEL`
344346
decorations to the appropriate functions and to emit the specialization
345347
constants that these decorations reference. This can be done with two passes
346-
over each kernel's static call tree.
348+
over the static call tree for each kernel and each exported device function.
347349

348-
The first pass operates on each kernel, iterating over all the functions in the
349-
kernel's static call tree to form the union of all aspects required by kernel.
350-
If the kernel is decorated with the `[[sycl::requires()]]`, those aspects are
351-
also added to the union.
350+
**NOTE**: In this context, "exported device function" means a device function
351+
that is exported from a shared library as defined by [Device Code Dynamic
352+
Linking][6].
353+
354+
[6]: <https://github.com/intel/llvm/pull/3210>
355+
356+
The first pass operates on each kernel and each exported device function,
357+
iterating over all the functions in the static call tree of that kernel or
358+
exported device function to form the union of all aspects it requires. If the
359+
kernel or exported device function is decorated with the
360+
`[[sycl::requires()]]`, those aspects are also added to the union.
352361

353362
**NOTE**: This first pass traverses the static call tree even for kernels that
354363
are decorated with the `[[sycl::requires()]]` attribute. Although the
@@ -360,43 +369,45 @@ aspects in the post-link tool because we have visibility to all device
360369
functions in the kernel's static call tree, even those that are defined in
361370
other translation units.
362371

363-
Once we have the full set of aspects used by each kernel, we do the following
364-
for each kernel:
372+
Once we have the full set of aspects used by each kernel and exported device
373+
function, we do the following for each:
365374

366-
* For each of the kernel's required aspects, emit an `OpSpecConstantTrue` op to
375+
* For each required aspects, emit an `OpSpecConstantTrue` instruction to
367376
represent this requirement. We maintain a set of "required specialization
368-
constants" for each kernel, which is used later. Add this specialization
369-
constant to that set. In addition, add an "aspect" entry to the device
370-
image's "SYCL/kernel reqs" property set, as described below. (We could
371-
instead emit `OpSpecConstantFalse`. It doesn't matter because the runtime
372-
will always provide a value for these specialization constants.)
377+
constants" for each kernel or exported device function, which is used later.
378+
Add this specialization constant to that set. In addition, add an "aspect"
379+
entry to the device image's "SYCL/requirements" property set, as described
380+
below. (We could instead emit `OpSpecConstantFalse`. It doesn't matter
381+
because the runtime will always provide a value for these specialization
382+
constants.)
373383

374384
* If the kernel function is decorated with the `[[reqd_work_group_size()]]`
375385
attribute, emit an `OpSpecConstantTrue` op to represent this requirement and
376386
add this also to the kernel's set of required specialization constants. In
377387
addition, add a "reqd\_work\_group\_size" entry to the device image's
378-
"SYCL/kernel reqs" property set.
388+
"SYCL/requirements" property set.
379389

380390
* If the kernel function is decorated with the `[[reqd_sub_group_size()]]`
381391
attribute, emit an `OpSpecConstantTrue` op to represent this requirement and
382392
add this also to the kernel's set of required specialization constants. In
383393
addition, add a "reqd\_sub\_group\_size" entry to the device image's
384-
"SYCL/kernel reqs" property set.
385-
386-
* If the kernel's set of required specialization constants is not empty, emit a
387-
series of `OpSpecConstantOp` ops with the `OpLogicalAnd` opcode to compute
388-
the expression `S1 && S2 && ...`, where `S1`, `S2`, etc. are the
389-
specialization constants in that set. In addition, emit a
390-
`ConditionalINTEL` decoration for the kernel's entry function which
391-
references the `S1 && S2 && ...` specialization constant.
392-
393-
The second pass propagates each kernel's required specialization constants back
394-
down the static call tree. This pass starts such that each kernel entry
395-
function has the set of required specialization constants as computed above.
396-
The set of required specialization constants for each remaining function `F` is
397-
computed as `P1 || P2 || ...`, where `P1`, `P2`, etc. are the parent functions
398-
of `F` in the static call tree. (Obviously, a `Pn` term can be omitted if the
399-
parent function has no required specialization constants.) Once we have this
394+
"SYCL/requirements" property set.
395+
396+
* If the kernel or exported device function's set of required specialization
397+
constants is not empty, emit a series of `OpSpecConstantOp` ops with the
398+
`OpLogicalAnd` opcode to compute the expression `S1 && S2 && ...`, where
399+
`S1`, `S2`, etc. are the specialization constants in that set. In addition,
400+
emit a `ConditionalINTEL` decoration for the function which references the
401+
`S1 && S2 && ...` specialization constant.
402+
403+
The second pass propagates each kernel or exported device function's required
404+
specialization constants back down the static call tree. This pass starts by
405+
assigning each each kernel entry function and each exported device function the
406+
set of required specialization constants that were computed above. The set of
407+
required specialization constants for each remaining function `F` is computed
408+
as `P1 || P2 || ...`, where `P1`, `P2`, etc. are the parent functions of `F` in
409+
the static call tree. (Obviously, a `Pn` term can be omitted if the parent
410+
function has no required specialization constants.) Once we have this
400411
information, we do the following for each function `F` that has a non-empty set
401412
of required specialization constants:
402413

@@ -413,10 +424,11 @@ are emitted and reuse them when possible, rather than emitting duplicates.
413424
#### New device image property set
414425

415426
A new device image property set is needed to inform the DPC++ runtime of the
416-
aspects that each kernel requires and the work-group or sub-group sizes it may
417-
require. This property set is named "SYCL/kernel reqs". The name of each
418-
property in the set is the name of a kernel in the device image. The value
419-
of each property has the following form:
427+
aspects that each kernel or exported device function requires and the
428+
work-group or sub-group sizes that each kernel requires. This property set is
429+
named "SYCL/requirements". The name of each property in the set is the name of
430+
a kernel or the name of an exported device function in the device image. The
431+
value of each property has the following form:
420432

421433
```
422434
[entry_count (uint32)]
@@ -456,9 +468,9 @@ Parameter | Definition
456468
`spec_id` | The SPIR-V `SpecId` decoration for the specialization constant that the post-link tool generated for this requirement.
457469

458470
Note that the post-link tool will generate a series of `OpSpecConstantOp` ops
459-
when the kernel has multiple requirements. However, each property list entry
460-
contains only the `SpecId` of the `OpSpecConstantTrue` op that is associated
461-
with a single requirement.
471+
when the kernel or exported device function has multiple requirements.
472+
However, each property list entry contains only the `SpecId` of the
473+
`OpSpecConstantTrue` op that is associated with a single requirement.
462474

463475
#### Modifications to the DPC++ runtime
464476

@@ -469,19 +481,26 @@ kernel's requirements, and it must raise an `errc::kernel_not_supported`
469481
exception if it does not.
470482

471483
When a kernel is submitted to a device, the runtime finds the device image that
472-
contains the kernel and also finds the kernel's entry in the "SYCL/kernel reqs"
473-
property set. This entry tells the set of requirements for the kernel. If the
474-
target device does not support all of these requirements, then the runtime
475-
raises `errc::kernel_not_supported`. This check can be done before the device
476-
image is JIT compiled, so the exception can be thrown synchronously.
477-
478-
Assuming this check passes, the first attempt to submit a kernel from a device
479-
image will cause it to be JIT compiled. The runtime must be modified to do the
480-
following:
481-
482-
* Compute the union of all requirements from all kernels in the
483-
"SYCL/kernel reqs" property set and their associated specialization
484-
constants.
484+
contains the kernel and also finds the kernel's entry in the
485+
"SYCL/requirements" property set. This entry tells the set of requirements for
486+
the kernel. If the target device does not support all of these requirements,
487+
then the runtime raises `errc::kernel_not_supported`. This check can be done
488+
before the device image is JIT compiled, so the exception can be thrown
489+
synchronously.
490+
491+
If the kernel imports device function symbols from a shared library as defined
492+
in [Device Code Dynamic Linking][6], the runtime first identifies all the
493+
device images that define these exported device functions. Before attempting
494+
to link them together, the runtime finds the entries for the exported device
495+
functions in their "SYCL/requirements" property sets and checks that the device
496+
supports all these requirements. If it does not, the runtime throws
497+
`errc::kernel_not_supported`.
498+
499+
Whenever the runtime submits a SPIR-V image to the backend for online
500+
compilation, it must do the following additional steps:
501+
502+
* Compute the union of all requirements from all entries in the image's
503+
"SYCL/requirements" property set.
485504

486505
* Query the target device to see whether it supports each of these
487506
requirements, yielding either `True` or `False` for each one.
@@ -493,7 +512,7 @@ Note that the runtime's cache of compiled device images does not need any
493512
special modification because the cache already needs to know the values of all
494513
the specialization constants that were used to compile the device image. We
495514
just need to make sure the cache is also aware of the specialization constants
496-
which correspond to the kernels' requirements.
515+
which correspond to the requirements from the "SYCL/requirements" property set.
497516

498517
#### Modifications to the GEN compiler
499518

@@ -507,18 +526,22 @@ functions.
507526

508527
The AOT case uses exactly the same solution as the JIT case described above,
509528
but there is one extra steps. For the AOT case, the post-link tool must set
510-
the values of the specialization constants that correspond to the kernel
511-
requirements, using the device named in the "-fsycl-targets" command line
512-
option. After doing this, the post-link tool calls the AOT compiler to
513-
generate native code from SPIR-V as it normally does. If more than one target
514-
device is specified, the post-link tool sets the specialization constants
515-
separately for each device before generating native code for that device.
516-
517-
Note that the native device image may not contain all kernels if there are
518-
kernels that use optional features. Nevertheless, the "SYCL/kernel reqs"
519-
property set still has entries for all kernel functions. If the application
520-
attempts to invoke one of the discarded kernels on a device (which does not
521-
support the kernel's features), the runtime will see that the kernel is not
522-
supported by using information from the "SYCL/kernel reqs" property set, and
523-
the runtime will raise an exception. Thus, the runtime will never attempt to
524-
invoke one of these discarded kernels.
529+
the values of the specialization constants that correspond to the requirements
530+
for the kernel or exported device function, using the device named in the
531+
"-fsycl-targets" command line option. After doing this, the post-link tool
532+
calls the AOT compiler to generate native code from SPIR-V as it normally does.
533+
If more than one target device is specified, the post-link tool sets the
534+
specialization constants separately for each device before generating native
535+
code for that device.
536+
537+
Note that the native device image may not contain all kernels or all exported
538+
device functions if they use optional features. Nevertheless, the
539+
"SYCL/requirements" property set still has entries for all kernel functions and
540+
all exported device functions. If the application attempts to invoke one of
541+
the discarded kernels on a device (which does not support the kernel's
542+
features), the runtime will see that the kernel is not supported by using
543+
information from the "SYCL/requirements" property set, and the runtime will
544+
raise an exception. Thus, the runtime will never attempt to invoke one of
545+
these discarded kernels. Likewise, if a kernel imports a discarded device
546+
function, the runtime will see that the device function is unsupported and
547+
will raise an exception before attempting to perform the dynamic link.

0 commit comments

Comments
 (0)