:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]
- :stem: asciimath
+ :sectnums:
+ :sectnumlevels: 4

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


- == 1. Notice
+ == Notice

[%hardbreaks]
Copyright (C) Codeplay Software Limited. All rights reserved.
@@ -28,14 +29,14 @@ of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


- == 2. Contact
+ == Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


- == 3. Dependencies
+ == Dependencies

This extension is written against the SYCL 2020 revision 6 specification. All
references below to the "core SYCL specification" or to section numbers in the
@@ -45,15 +46,15 @@ This extension builds on top of the proposed SYCL graphs
https://github.com/reble/llvm/blob/sycl-graph-update/sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc[extension
proposal]. All references to the "graphs proposal" refer to this proposal.

- == 4. Status
+ == Status

This is a proposed extension specification, intended to gather community
feedback. Interfaces defined in this specification may not be implemented yet
or may be in a preliminary state. The specification itself may also change in
incompatible ways before it is finalized. *Shipping software products should
not rely on APIs defined in this specification.*

- == 5. Overview
+ == Overview

The SYCL graph
https://github.com/reble/llvm/blob/sycl-graph-update/sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc[extension
@@ -87,9 +88,9 @@ fusion of two or more kernels in a SYCL graph into a single kernel **at
runtime**. This requires the extension of the runtime with some sort of JIT
compiler to allow for the fusion of kernel functions at runtime.

- == 6. Specification
+ == Specification

- === 6.1. Feature test macro
+ === Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
@@ -108,11 +109,11 @@ supports.
|Initial version of this extension.
|===

- === 6.2. API modifications
+ === API modifications

- ==== 6.2.1. Properties
+ ==== Properties

- ===== 6.2.1.1. Graph Fusion Property
+ ===== Graph Fusion Property

The API for `command_graph<graph_state::modifiable>::finalize()` includes a
`property_list` parameter. The following property, defined by this extension,
@@ -124,7 +125,7 @@ sycl::ext::oneapi::experimental::property::command_graph::perform_fusion
```

The property is not prescriptive. Implementations are free to not perform fusion
- if it is not possible (see below section <<_6_5_limitations >>), fusion is not
+ if it is not possible (see below section <<_limitations >>), fusion is not
supported by the implementation, or the implementation decides not to perform
fusion for other reasons. It is not an error if an implementation does not
perform fusion even though the property is passed.
@@ -133,7 +134,7 @@ Implementations can provide a diagnostic message in case fusion was not
performed through an implementation-specified mechanism, but are not required to
do so.

- ===== 6.2.1.2. Barrier property
+ ===== Barrier property

The following property can be added to the `property_list` of the
`command_graph<graph_state::modifiable>::finalize()` API.
@@ -150,7 +151,7 @@ The property only takes effect if the
property is also part of the `property_list` of the same invocation of
`command_graph<...>::finalize()`.

- ===== 6.2.1.3. Local internalization property
+ ===== Local internalization property

The following property can be passed to three different APIs, namely:

@@ -187,7 +188,7 @@ no error if they do not perform internalization. Implementations can provide a
diagnostic message in case internalization was not performed through an
implementation-specified mechanism, but are not required to do so.

- ===== 6.2.1.4. Private internalization property
+ ===== Private internalization property

The following property can be passed to three different APIs, namely:

@@ -224,7 +225,7 @@ no error if they do not perform internalization. Implementations can provide a
diagnostic message in case internalization was not performed through an
implementation-specified mechanism, but are not required to do so.

- ==== 6.2.2. Device information descriptors
+ ==== Device information descriptors

To support querying whether a SYCL device and the underlying platform support
kernel fusion for graphs, the following device information descriptor is added
@@ -238,7 +239,7 @@ When passed to `device::get_info<...>()`, the function returns `true` if the
SYCL `device` and the underlying `platform` support kernel fusion for graphs.


- === 6.3. Linearization
+ === Linearization

In order to be able to perform kernel fusion, the commands in a graph must be
arranged in a valid sequential order.
@@ -252,7 +253,7 @@ partial order) is implementation defined. The linearization should be
deterministic, i.e. it should yield the same sequence when presented with the
same DAG.
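
As an illustration only, one way to obtain a deterministic linearization is Kahn's topological sort with ties broken by the smallest node index. The `linearize` function and the integer node numbering below are hypothetical and not part of this extension; the order an implementation actually chooses remains implementation defined.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper (not an API of this extension): returns a topological
// order of a DAG whose nodes are numbered 0..num_nodes-1. Each edge
// (from, to) means "from" must run before "to". Among all ready nodes the
// smallest index is always picked first, so the same DAG always yields the
// same sequence.
std::vector<std::size_t> linearize(
    std::size_t num_nodes,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
  std::vector<std::vector<std::size_t>> successors(num_nodes);
  std::vector<std::size_t> pending_preds(num_nodes, 0);
  for (const auto& edge : edges) {
    assert(edge.first < num_nodes && edge.second < num_nodes);
    successors[edge.first].push_back(edge.second);
    ++pending_preds[edge.second];
  }

  // Min-heap of nodes whose predecessors have all been emitted.
  std::priority_queue<std::size_t, std::vector<std::size_t>,
                      std::greater<std::size_t>>
      ready;
  for (std::size_t n = 0; n < num_nodes; ++n) {
    if (pending_preds[n] == 0) ready.push(n);
  }

  std::vector<std::size_t> order;
  order.reserve(num_nodes);
  while (!ready.empty()) {
    const std::size_t node = ready.top();
    ready.pop();
    order.push_back(node);
    for (std::size_t succ : successors[node]) {
      if (--pending_preds[succ] == 0) ready.push(succ);
    }
  }
  // If the input contained a cycle, fewer than num_nodes entries are returned.
  return order;
}
```

For a four-node graph with edges 3 -> 0 and 2 -> 1, this sketch produces the sequence 2, 1, 3, 0 on every call. Any other fixed tie-breaking rule would satisfy the determinism recommendation equally well.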

- === 6.4. Synchronization in kernels
+ === Synchronization in kernels

Group barriers semantics do not change in the fused kernel and barriers already
in the unfused kernels are preserved in the fused kernel. Despite this, it is
@@ -261,7 +262,7 @@ same work-group executing a fused kernel, a barrier is added between each of the
kernels being fused. This automatic insertion of additional barriers can be
deactivated through the property defined above.

- === 6.5. Limitations
+ === Limitations

Some scenarios might require fusion to be cancelled if some undesired scenarios
arise.
@@ -278,29 +279,29 @@ The following sections describe a number of scenarios that might require to
cancel fusion. Note that some implementations might be more capable/permissive
and might not abort fusion in all of these cases.

- ==== 6.5.1. Hierarchical Parallelism
+ ==== Hierarchical Parallelism

The extension does not support kernels using hierarchical parallelism. Although
some implementations might want to add support for this kind of kernels.

- ==== 6.5.2. Incompatible ND-ranges of the kernels to fuse
+ ==== Incompatible ND-ranges of the kernels to fuse

Incompatibility of ND-ranges will be determined by the kernel fusion
implementation. All implementations should support fusing kernels with the exact
same ND-ranges, but implementations might cancel fusion as soon as a kernel with
a different ND-range is submitted.

- ==== 6.5.3. Kernels with different dimensions
+ ==== Kernels with different dimensions

Similar to the previous one, it is implementation-defined whether or not to
support fusing kernels with different dimensionality.

- ==== 6.5.4. No intermediate representation
+ ==== No intermediate representation

In case any of the kernels to be fused does not come with an accessible
suitable intermediate representation, kernel fusion is canceled.

- ==== 6.5.5. Explicit memory operations and host tasks
+ ==== Explicit memory operations and host tasks

The graph proposal allows graphs to contain, next to device kernels, explicit
memory operations and host tasks. As both of these types of commands cannot be
@@ -311,13 +312,13 @@ It is valid to execute some memory operations and host tasks before all device
kernels and some after all device kernels, as long as that sequence is a valid
linearization.

- ==== 6.5.6. Multi-device graph
+ ==== Multi-device graph

Attempting to fuse a graph containing device kernels for more than one device
may lead to fusion being cancelled, as kernel fusion across multiple devices
and/or backends is generally not possible.

- === 6.6. Internalization
+ === Internalization

While avoiding repeated kernel launch overheads will most likely already improve
application performance, kernel fusion can deliver even higher performance gains
@@ -368,7 +369,7 @@ no internalization were to happen.
In sum this allows users to trigger internalization of a buffer or allocated
device memory by just specifying a single property.

- ==== 6.6.1. Buffer internalization
+ ==== Buffer internalization

In some cases, the user will specify different internalization targets for a
buffer and accessors to such buffer. When incompatible combinations are used, an
@@ -440,9 +441,9 @@ performed. If there is a mismatch between the two accessors (access
range, access offset, number of dimensions, data type), no
internalization is performed.
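
The mismatch rule above can be pictured as a field-by-field comparison. The following is a minimal sketch under stated assumptions: `AccessorInfo` and `accessors_match` are hypothetical names invented for illustration, not SYCL types or functions, and a real implementation would take these attributes from its own accessor metadata.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <typeindex>
#include <typeinfo>

// Hypothetical summary of the four accessor attributes named in the text.
// This is not a SYCL type.
struct AccessorInfo {
  std::size_t dimensions;             // number of dimensions (1 to 3)
  std::array<std::size_t, 3> range;   // access range, unused dimensions = 1
  std::array<std::size_t, 3> offset;  // access offset, unused dimensions = 0
  std::type_index element_type;       // data type of the accessed elements
};

// Returns true only if access range, access offset, number of dimensions and
// data type all agree. Under the rule described above, a false result would
// mean that no internalization is performed for this pair of accessors.
bool accessors_match(const AccessorInfo& a, const AccessorInfo& b) {
  assert(a.dimensions >= 1 && a.dimensions <= 3);
  assert(b.dimensions >= 1 && b.dimensions <= 3);
  return a.dimensions == b.dimensions && a.range == b.range &&
         a.offset == b.offset && a.element_type == b.element_type;
}
```

With this sketch, two one-dimensional `float` accessors over 64 elements compare equal, while changing only the offset or only the element type of one of them makes the check fail.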

- == 7. Examples
+ == Examples

- === 7.1. Buffer-based example
+ === Buffer-based example

```c++
#include <sycl/sycl.hpp>
@@ -530,7 +531,7 @@ int main() {
}
```

- === 7.2. USM-based example
+ === USM-based example

```c++
#include <sycl/sycl.hpp>
@@ -617,11 +618,12 @@ int main() {
}
```

- == 8. Contributors
+ == Contributors

Lukas Sommer, Codeplay +
Victor Lomüller, Codeplay +
Victor Perez, Codeplay +
+ Ewan Crawford, Codeplay +

== Revision History
