@@ -55,6 +55,13 @@ This extension builds on top of the experimental SYCL graphs
55
55
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc[extension
56
56
proposal]. All references to the "graphs proposal" refer to this proposal.
57
57
58
+ In addition, this extension also depends on the following other SYCL extensions:
59
+
60
+ * link:../experimental/sycl_ext_oneapi_properties.asciidoc[sycl_ext_oneapi_properties]
61
+ extension.
62
+ * link:../experimental/sycl_ext_oneapi_annotated_ptr.asciidoc[sycl_ext_oneapi_annotated_ptr]
63
+ extension.
64
+
58
65
== Status
59
66
60
67
This is a proposed extension specification, intended to gather community
@@ -198,9 +205,8 @@ different APIs, namely:
198
205
* The `accessor` constructor, giving a more granular control.
199
206
* The `buffer` constructor, in which case all the `accessors` derived from
200
207
this buffer will inherit this property (unless overridden).
201
- * The `property_list` parameter of `sycl::malloc_device()`,
202
- `sycl::aligned_alloc_device()`, `sycl::malloc_shared()`, or
203
- `sycl::aligned_alloc_shared()` to apply the property to an USM pointer.
208
+ * The property list parameter of `annotated_ptr`, to apply the property to a
209
+ USM pointer.
204
210
205
211
```c++
206
212
namespace sycl::ext::oneapi::experimental::property{
@@ -248,16 +254,22 @@ Implementations can provide a diagnostic message in case internalization was
248
254
not performed through an implementation-specified mechanism, but are not
249
255
required to do so.
250
256
257
+ In case the `access_scope` property is attached to `annotated_ptr`, the
258
+ properties should be inspected by an implementation when the `annotated_ptr` is
259
+ captured by a kernel lambda or otherwise passed as an argument to a kernel
260
+ function. Implementations are not required to track internalization-related
261
+ information from other USM pointers that may be used by a kernel, such as those
262
+ stored inside of structs or other data structures.
263
+
251
264
===== Internal memory property
252
265
253
266
The following property can be passed to three different APIs, namely:
254
267
255
268
* The `accessor` constructor, giving a more granular control.
256
269
* The `buffer` constructor, in which case all the `accessors` derived from
257
270
this buffer will inherit this property (unless overridden).
258
- * The `property_list` parameter of `sycl::malloc_device()`,
259
- `sycl::aligned_alloc_device()`, `sycl::malloc_shared()`, or
260
- `sycl::aligned_alloc_shared()` to apply the property to an USM pointer.
271
+ * The property list parameter of `annotated_ptr`, to apply the property to a
272
+ USM pointer.
261
273
262
274
```c++
263
275
sycl::ext::oneapi::experimental::property::fusion_internal_memory
@@ -277,6 +289,14 @@ Implementations can provide a diagnostic message in case internalization was
277
289
not performed through an implementation-specified mechanism, but are not
278
290
required to do so.
279
291
292
+ In case the `fusion_internal_memory` property is attached to `annotated_ptr`,
293
+ the properties should be inspected by an implementation when the
294
+ `annotated_ptr` is captured by a kernel lambda or otherwise passed as an
295
+ argument to a kernel function. Implementations are not required to track
296
+ internalization-related information from other USM pointers that may be used by
297
+ a kernel, such as those stored inside of structs or other data structures.
298
+
299
+
280
300
==== Device aspect
281
301
282
302
To support querying whether a SYCL device and the underlying platform support
@@ -418,9 +438,13 @@ https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_propertie
418
438
419
439
To this end, this extension allows the use of the property in more places than
420
440
defined in Table 52 in the SYCL specification. More concretely, this extension
421
- allows to use the property in the buffer constructor or the `property_list`
422
- parameter of `sycl::malloc_device()`, `sycl::aligned_alloc_device()`,
423
- `sycl::malloc_shared()` and `sycl::aligned_alloc_shared()`.
441
+ allows to use the property in the buffer constructor or the property list
442
+ parameter of `annotated_ptr<...>`. In case the `no_init` property is attached to
443
+ `annotated_ptr`, the properties should be inspected by an implementation when
444
+ the `annotated_ptr` is captured by a kernel lambda or otherwise passed as an
445
+ argument to a kernel function. Implementations are not required to track
446
+ internalization-related information from other USM pointers that may be used by
447
+ a kernel, such as those stored inside of structs or other data structures.
424
448
425
449
If the implementation's fusion compiler is not able to guarantee write-back of
426
450
the final result after internalization, values stored to an internalized
@@ -445,6 +469,14 @@ specializations of the `access_scope` property template defined in this
445
469
proposal can be used to inform the fusion compiler about the access pattern of
446
470
the kernels involved in fusion.
447
471
472
+ If an `annotated_ptr` is created with any of the properties relating to
473
+ internalization and captured by a kernel lambda or otherwise passed as an
474
+ argument to a kernel function participating in fusion, the underlying memory
475
+ must only be accessed via pointers that are also captured or passed as kernel
476
+ argument. Access to the underlying memory via a different pointer, such as
477
+ pointers stored inside of structs or other data structures results in undefined
478
+ behavior.
479
+
448
480
As already stated above, it depends on the implementation's capabilities which
449
481
properties need to be applied to a buffer or allocated device memory to enable
450
482
dataflow internalization. Implementations should document the necessary
@@ -506,11 +538,12 @@ properties must be combined as follows:
506
538
|===
507
539
508
540
In case different internalization targets are used for accessors to the same
509
- buffer, the following (commutative and associative) rules are followed:
541
+ buffer or for `annotated_ptr` pointing to the same underlying memory, the
542
+ following (commutative and associative) rules are followed:
510
543
511
544
[options="header"]
512
545
|===
513
- |Accessor~1~ Access Scope|Accessor~2~ Access Scope|Resulting Access Scope
546
+ |Accessor/Ptr ~1~ Access Scope|Accessor/Ptr ~2~ Access Scope|Resulting Access Scope
514
547
515
548
|None
516
549
|_Any_
@@ -528,7 +561,7 @@ buffer, the following (commutative and associative) rules are followed:
528
561
|Work Item
529
562
|===
530
563
531
- If no work-group size is specified or two accessors specify different
564
+ If no work-group size is specified or two kernels specify different
532
565
work-group sizes when attempting local internalization for any of the
533
566
kernels involved in the fusion, no internalization will be
534
567
performed. If there is a mismatch between the two accessors (access
@@ -672,10 +705,10 @@ int main() {
672
705
dOut = malloc_device<int>(q, dataSize);
673
706
674
707
// Specify internalization for an USM pointer
675
- dTmp = malloc_device<int>(
676
- q, dataSize,
677
- { sycl_ext::property::access_scope_work_item{},
678
- sycl_ext::property::fusion_internal_memory{}, no_init} );
708
+ dTmp = malloc_device<int>(q, dataSize)
709
+ auto annotatedTmp = sycl_ext::annotated_ptr(
710
+ dTmp, sycl_ext::property::access_scope_work_item{},
711
+ sycl_ext::property::fusion_internal_memory{}, no_init);
679
712
680
713
// This explicit memory operation is compatible with fusion, as it can be
681
714
// linearized before any device kernel in the graph.
@@ -690,7 +723,7 @@ int main() {
690
723
auto kernel1 = graph.add(
691
724
[&](handler &cgh) {
692
725
cgh.parallel_for<class KernelOne>(
693
- dataSize, [=](id<1> i) { tmp [i] = in1[i] + in2[i]; });
726
+ dataSize, [=](id<1> i) { annotatedTmp [i] = in1[i] + in2[i]; });
694
727
},
695
728
{sycl_ext::property::node::depends_on(copy_in1, copy_in2)});
696
729
@@ -702,7 +735,7 @@ int main() {
702
735
auto kernel2 = graph.add(
703
736
[&](handler &cgh) {
704
737
cgh.parallel_for<class KernelTwo>(
705
- dataSize, [=](id<1> i) { out[i] = tmp [i] * in3[i]; });
738
+ dataSize, [=](id<1> i) { out[i] = annotatedTmp [i] * in3[i]; });
706
739
},
707
740
{sycl_ext::property::node::depends_on(copy_in3, kernel1)});
708
741
@@ -740,4 +773,5 @@ int main() {
740
773
|3|2023-04-11|Lukas Sommer|*Update usage examples for graph API changes*
741
774
|4|2023-08-17|Lukas Sommer|*Update after graph extension has been merged*
742
775
|5|2023-09-01|Lukas Sommer|*Split internalization properties and change barrier*
776
+ |6|2023-09-13|Lukas Sommer|*Use annotated_ptr for USM internalization*
743
777
|========================================
0 commit comments