@@ -39,13 +39,13 @@ int main() {
39
39
```
40
40
41
41
In this use-case every work-item with even X dimension will trigger assertion
42
- failure. Assertion failure should be trigger a call to `std::abort()` at host as
42
+ failure. Assertion failure should trigger a call to `std::abort()` at host as
43
43
described in
44
44
[extension](extensions/Assert/SYCL_INTEL_ASSERT.asciidoc).
45
45
Even though multiple failures of the same or different assertions can happen in
46
46
multiple workitems, implementation is required to deliver at least one
47
47
assertion. The assertion failure message is printed to `stderr` by DPCPP
48
- Runtime.
48
+ Runtime or underlying backend .
49
49
50
50
When multiple kernels are enqueued and more than one fail at assertion, at least
51
51
single assertion should be reported.
@@ -93,15 +93,17 @@ Implementation of this function is supplied by Native Device Compiler for
93
93
safe approach or by DPCPP Compiler for fallback one.
94
94
95
95
In order to distinguish which implementation to use, DPCPP Runtime checks for
96
- `cl_intel_devicelib_cassert ` extension. If the extension isn't available, then
96
+ `PI_INTEL_DEVICELIB_CASSERT ` extension. If the extension isn't available, then
97
97
fallback implementation is used.
98
98
99
99
100
100
## Safe approach
101
101
102
102
This is the preferred approach and implementations should use it when possible.
103
103
It guarantees assertion failure notification delivery to the host regardless of
104
- kernel behavior which hit the assertion.
104
+ kernel behavior which hit the assertion. If backend suports the safe approach,
105
+ it must report this capability to DPCPP Runtime via the
106
+ `PI_INTEL_DEVICELIB_CASSERT` extension query.
105
107
106
108
The Native Device Compiler is responsible for providing implementation of
107
109
`__devicelib_assert_fail` which completely hides details of communication
@@ -125,10 +127,10 @@ The following sequence of events describes how user code gets notified:
125
127
126
128
## Fallback approach
127
129
128
- If Device-side Runtime doesn't support `__devicelib_assert_fail` then a fallback
129
- approach comes in place. The approach doesn't require any support from
130
- Device-side Runtime and Native Device Compiler. Neither it does from Low-level
131
- Runtime.
130
+ If Device-side Runtime doesn't support `__devicelib_assert_fail` (as reported
131
+ via `PI_INTEL_DEVICELIB_CASSERT` extension query) then a fallback approach comes
132
+ in place. The approach doesn't require any support from Device-side Runtime and
133
+ Native Device Compiler. Neither it does from Low-level Runtime.
132
134
133
135
Within this approach, a mutable program scope variable is introduced. This
134
136
variable stores a flag which says if an assert failure was encountered. Fallback
@@ -147,10 +149,15 @@ The following sequence of events describes how user code gets notified:
147
149
2. A host-task is enqueued to check value of assert failure flag.
148
150
3. The host task calls abort whenever assert failure flag is set.
149
151
152
+ DPCPP Runtime will automatically check if assertions are enabled in the kernel
153
+ being run, and won't enqueue the auxiliary kernels if assertions are not
154
+ enabled. So there is no host-side runtime overhead when assertion are not
155
+ enabled.
156
+
150
157
Illustrating this with an example, lets assume the user enqueues three kernels:
151
- - `Kernel #1`
152
- - `Kernel #2`
153
- - `Kernel #3`, which depends on `Kernel #1`
158
+ - `Kernel #1`, uses assert
159
+ - `Kernel #2`, uses assert
160
+ - `Kernel #3`, uses assert and depends on `Kernel #1`
154
161
155
162
The resulting graph will look like this: 
156
163
@@ -165,9 +172,15 @@ same binary image where fallback `__devicelib_assert_fail` resides.
165
172
declaration:</a>
166
173
167
174
```c++
175
+ namespace cl {
176
+ namespace sycl {
177
+ namespace detail {
168
178
struct AssertHappened {
169
179
int Flag = 0;
170
180
};
181
+ }
182
+ }
183
+ }
171
184
172
185
#ifdef __SYCL_DEVICE_ONLY__
173
186
extern SYCL_GLOBAL_VAR AssertHappened AssertHappenedMem;
@@ -189,49 +202,29 @@ implementation of `__devicelib_assert_fail`.
189
202
In DPCPP headers one can see if assert is enabled with status of ` NDEBUG ` macro
190
203
with ` #ifdef ` 's. This allows to enqueue a copy kernel and host task. The copy
191
204
kernel will copy ` AssertHappenedMem ` to host and host-task will check the ` Flag `
192
- value and ` abort() ` as needed.
205
+ value and ` abort() ` as needed. The kernel and host task are enqueued when
206
+ ` NDEBUG ` macro isn't defined.
193
207
194
208
When in DPCPP Runtime Library this knowledge is obtained from device binary
195
209
image descriptor's property sets.
196
210
197
- Each device image is supplied with an array of property sets:
198
- ``` c++
199
- struct pi_device_binary_struct {
200
- //...
201
- // Array of property sets
202
- pi_device_binary_property_set PropertySetsBegin;
203
- pi_device_binary_property_set PropertySetsEnd;
204
- };
205
- ```
206
- Each property set is represented by the following struct:
207
- ```c++
208
- // Named array of properties.
209
- struct _pi_device_binary_property_set_struct {
210
- char *Name; // the name
211
- pi_device_binary_property PropertiesBegin; // array start
212
- pi_device_binary_property PropertiesEnd; // array end
213
- };
214
- ```
215
- It contains name of property set and array of properties. Each property is
216
- represented by the following struct:
217
- ``` c++
218
- struct _pi_device_binary_property_struct {
219
- char * Name; // null-terminated property name
220
- void * ValAddr; // address of property value
221
- uint32_t Type; // _ pi_property_type
222
- uint64_t ValSize; // size of property value in bytes
223
- };
224
- ```
211
+ Each device image is supplied with an array of property sets. For description
212
+ of property sets see ` struct pi_device_binary_struct ` in
213
+ [ ` pi.h ` ] ( https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/detail/pi.h#L692 )
225
214
226
215
A distinct property set ` SYCL/assert used ` is added. In this set a property
227
- with the name of the kernel is added whenever the kernel uses assert. Use of
228
- assert is detected through call to `__devicelib_assert_fail` function after
229
- linking device binary image with wrapper device library (the `libsycl-crt`
230
- library).
231
-
232
- The property set and the underlying properties are added by `sycl-post-link`
233
- tool with help of building callgraph for each and every kernel in device binary
234
- image.
216
+ with the name of the kernel is added whenever the kernel uses assert. The use of
217
+ assert is detected by a specific LLVM IR pass invoked by the ` sycl-post-link `
218
+ tool which runs on linked device code, i.e. after linking with the ` libsycl-crt `
219
+ library which defines the assert function. The pass builds complete call graph
220
+ for a kernel, and sees if there's a call to ` __devicelib_assert_fail ` anywhere
221
+ in the graph. If found, ` sycl-post-link ` adds the property for the kernel.
222
+
223
+ The same is done for all indirect callable functions (marked with specific
224
+ attribute) found in the linked device code. Those are functions whose pointers
225
+ can be taken and passed around in device code. If a callgraph for any such
226
+ function has a call to ` __devicelib_assert_fail ` , then all kernels in the module
227
+ are conservatively marked as using asserts.
235
228
236
229
The added property is used for:
237
230
- deciding if online-linking against fallback devicelib is required;
@@ -340,7 +333,7 @@ void workload() {
340
333
```
341
334
342
335
These two files are compiled into a single binary application. There are four
343
- states of definedness of `NDEBUG` macro available:
336
+ states of definition of ` NDEBUG ` macro available:
344
337
345
338
| # | ` impl.cpp ` | ` main.cpp ` |
346
339
| - | ---------- | ---------- |
@@ -349,12 +342,12 @@ states of definedness of `NDEBUG` macro available:
349
342
| 3 | undefined | defined |
350
343
| 4 | undefined | undefined |
351
344
352
- States of definedness of `NDEBUG` macro defines the set of assertions which can
345
+ States of definition of ` NDEBUG ` macro defines the set of assertions which can
353
346
fail.
354
347
355
348
### Raising assert failure flag and reading it on host
356
349
357
- Each and every translation unit provided by user should have declaration of
350
+ All translation units provided by the user should have a declaration of the
358
351
assert flag read function available:
359
352
``` c++
360
353
int __devicelib_assert_read (void);
0 commit comments