[SYCL][Doc] Use marray for sub-group loads/stores (#2167)
vec is only compatible with built-in scalar data types.
Using marray instead will allow sub-group loads/stores to eventually
support user-defined types and other C++ types like std::complex.
Using marray is also clearer, since the use of vec alongside sub-groups
(which may represent a SIMD vector in some implementations) has confused
some users.
Signed-off-by: John Pennycook <[email protected]>
sycl/doc/extensions/SubGroupAlgorithms/SYCL_INTEL_sub_group_algorithms.asciidoc
17 additions & 10 deletions
@@ -31,7 +31,7 @@ This document describes an extension which introduces a library of sub-group fun

 == Notice

-Copyright (c) 2020 Intel Corporation. All rights reserved.
+Copyright (c) 2020-2021 Intel Corporation. All rights reserved.

 == Status

@@ -55,6 +55,7 @@ This extension is written against the SYCL 1.2.1 specification, Revision 6 and t

 - +SYCL_INTEL_group_algorithms+
 - +SYCL_INTEL_sub_group+
+- +SYCL_INTEL_math_array+

 == Overview

@@ -82,9 +83,9 @@ It additionally introduces a number of functions that are currently specific to

 === Data Types

-All functions are supported for the fundamental scalar types supported by SYCL and instances of the SYCL +vec+ class. The fundamental scalar types (as defined in Section 6.5 of the SYCL 1.2.1 specification) are: +bool+, +char+, +signed char+, +unsigned char+, +short int+, +unsigned short int+, +int+, +unsigned int+, +long int+, +unsigned long int+, +long long int+, +unsigned long long int+, +size_t+, +float+, +double+, +half+.
+All functions are supported for the fundamental scalar types supported by SYCL and instances of the SYCL +vec+ and +marray+ classes. The fundamental scalar types (as defined in Section 6.5 of the SYCL 1.2.1 specification) are: +bool+, +char+, +signed char+, +unsigned char+, +short int+, +unsigned short int+, +int+, +unsigned int+, +long int+, +unsigned long int+, +long long int+, +unsigned long long int+, +size_t+, +float+, +double+, +half+.

-Functions with arguments of type +vec<T,N>+ are applied component-wise: they are semantically equivalent to N calls to a scalar function of type +T+.
+Functions with arguments of type +vec<T,N>+ or +marray<T,N>+ are applied component-wise: they are semantically equivalent to N calls to a scalar function of type +T+.

 === Functions

@@ -135,22 +136,27 @@ The load and store sub-group functions enable developers to assert that all work
 |Function|Description

 |+template <typename T> T load(sub_group sg, const T *src)+
-|Load contiguous data from _src_. Returns one element per work-item, corresponding to the memory location at _src_ + +get_local_id()+. The value of _src_ must be the same for all work-items in the sub-group. The address space information is deduced automatically. Only pointers to global and local address spaces are valid. Passing a pointer to other address spaces will cause the run time assertion.
+|Load contiguous data from _src_. Returns one element per work-item, corresponding to the memory location at _src_ + +get_local_id()+. The value of _src_ must be the same for all work-items in the sub-group. The address space information is deduced automatically. Only pointers to global and local address spaces are valid. Passing a pointer to other address spaces will cause the run time assertion. +T+ must be a _NumericType_.
-|Load contiguous data from _src_. Returns one element per work-item, corresponding to the memory location at _src_ + +get_local_id()+. The value of _src_ must be the same for all work-items in the sub-group. _Space_ must be +access::address_space::global_space+ or +access::address_space::local_space+.
+|Load contiguous data from _src_. Returns one element per work-item, corresponding to the memory location at _src_ + +get_local_id()+. The value of _src_ must be the same for all work-items in the sub-group. _Space_ must be +access::address_space::global_space+ or +access::address_space::local_space+. +T+ must be a _NumericType_.
-|Load contiguous data from _src_. Returns _N_ elements per work-item, corresponding to the _N_ memory locations at _src_ + +i+ * +get_max_local_range()+ + +get_local_id()+ for +i+ between 0 and _N_. The value of _src_ must be the same for all work-items in the sub-group. _Space_ must be +access::address_space::global_space+ or +access::address_space::local_space+.
+|Load contiguous data from _src_. Returns _N_ elements per work-item, corresponding to the _N_ memory locations at _src_ + +i+ * +get_max_local_range()+ + +get_local_id()+ for +i+ between 0 and _N_. The return type is implicitly convertible to +vec<T,N>+ (if +T+ is compatible with the +vec+ interface) and to +marray<T,N>+. The value of _src_ must be the same for all work-items in the sub-group. _Space_ must be +access::address_space::global_space+ or +access::address_space::local_space+. +T+ must be a _NumericType_.
-|Store contiguous data to _dst_. The value of _x_ from each work-item is written to the memory location at _dst_ + +get_local_id()+. The value of _dst_ must be the same for all work-items in the sub-group. The address space information is deduced automatically. Only pointers to global and local address spaces are valid. Passing a pointer to other address spaces will cause the run time assertion.
+|Store contiguous data to _dst_. The value of _x_ from each work-item is written to the memory location at _dst_ + +get_local_id()+. The value of _dst_ must be the same for all work-items in the sub-group. The address space information is deduced automatically. Only pointers to global and local address spaces are valid. Passing a pointer to other address spaces will cause the run time assertion. +T+ must be a _NumericType_.
-|Store contiguous data to _dst_. The value of _x_ from each work-item is written to the memory location at _dst_ + +get_local_id()+. The value of _dst_ must be the same for all work-items in the sub-group. _Space_ must be +access::address_space::global_space+ or +access::address_space::local_space+.
+|Store contiguous data to _dst_. The value of _x_ from each work-item is written to the memory location at _dst_ + +get_local_id()+. The value of _dst_ must be the same for all work-items in the sub-group. _Space_ must be +access::address_space::global_space+ or +access::address_space::local_space+. +T+ must be a _NumericType_.
-|Store contiguous data to _dst_. The _N_ elements from each work-item are written to the memory locations at _dst_ + +i+ * +get_max_local_range()+ + +get_local_id()+ for +i+ between 0 and _N_. The value of _dst_ must be the same for all work-items in the sub-group. _Space_ must be +access::address_space::global_space+ or +access::address_space::local_space+.
+|Store contiguous data to _dst_. The _N_ elements from each work-item are written to the memory locations at _dst_ + +i+ * +get_max_local_range()+ + +get_local_id()+ for +i+ between 0 and _N_. The value of _dst_ must be the same for all work-items in the sub-group. _Space_ must be +access::address_space::global_space+ or +access::address_space::local_space+. +T+ must be a _NumericType_.
+|Store contiguous data to _dst_. The _N_ elements from each work-item are written to the memory locations at _dst_ + +i+ * +get_max_local_range()+ + +get_local_id()+ for +i+ between 0 and _N_. The value of _dst_ must be the same for all work-items in the sub-group. _Space_ must be +access::address_space::global_space+ or +access::address_space::local_space+. +T+ must be a _NumericType_.
+|===
+
 |===

 == Issues
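The addressing rule in the load/store rows above (_src_ + +i+ * +get_max_local_range()+ + +get_local_id()+) can be checked with a host-side emulation. This is a sketch under stated assumptions, not the extension's API: `emulated_load` and `emulated_store` are hypothetical names, `std::array` stands in for the returned `marray<T,N>`, the local id and maximum local range are passed as plain integers, and "for +i+ between 0 and _N_" is read as 0 <= i < N, since exactly _N_ elements are transferred per work-item.

```cpp
#include <array>
#include <cstddef>

// Hypothetical emulation of the N-element sub-group load for one work-item.
// Element i comes from src[i * max_local_range + local_id], so consecutive
// work-items touch consecutive addresses for each value of i.
template <typename T, std::size_t N>
std::array<T, N> emulated_load(const T* src, std::size_t local_id,
                               std::size_t max_local_range) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {  // assumption: 0 <= i < N
    out[i] = src[i * max_local_range + local_id];
  }
  return out;
}

// Hypothetical emulation of the matching store: the inverse mapping.
template <typename T, std::size_t N>
void emulated_store(T* dst, const std::array<T, N>& x, std::size_t local_id,
                    std::size_t max_local_range) {
  for (std::size_t i = 0; i < N; ++i) {
    dst[i * max_local_range + local_id] = x[i];
  }
}
```

With a maximum local range of 4 and _N_ = 2, the work-item with local id 1 reads elements 1 and 5 of the source buffer; running the load and then the store for every local id copies the 8-element buffer unchanged.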
@@ -172,6 +178,7 @@ None.
 |Rev|Date|Author|Changes
 |1|2020-03-16|John Pennycook|*Initial public working draft*
 |2|2021-02-26|Vladimir Lazarev|*Add load/store method for raw pointers*
+|3|2021-04-06|John Pennycook|*Use sycl::marray in place of sycl::vec for load/store*