Reverts some changes made when switching from SubGroupNDRange to SubGroup extension:
- `max_sub_group_size` must take a work-group size in order to support OpenCL
- The sub-group `barrier` member function was removed by mistake
- The sub-group `shuffle` functions should be supported in addition to the higher-level `permute` and `shift_*` functions from the SubGroupAlgorithms extension
Signed-off-by: John Pennycook <[email protected]>
This extension is written against the SYCL 1.2.1 specification, Revision 6.
+This extension is written against the SYCL 1.2.1 specification, Revision 6 and the SYCL_INTEL_device_specific_kernel_queries extension.

== Overview

@@ -111,33 +111,37 @@ The device descriptors below are added to the +info::device+ enumeration class:
|Returns a vector_class of +size_t+ containing the set of sub-group sizes supported by the device.
|===

-An additional query for sub-group information is added to the +kernel+ class:
+An additional query is added to the +kernel+ class, enabling an input value to be passed to `get_info`. The original `get_info` query from the SYCL_INTEL_device_specific_kernel_queries extension should be used for queries that do not specify an input type.
|Query information from a kernel using the +info::kernel_device_specific+ descriptor for a specific device and input parameter. The expected value of the input parameter depends on the information being queried.
|===

-The kernel descriptors below are added as part of a new +info::kernel_sub_group+ enumeration class:
+The kernel descriptors below are added to the +info::kernel_device_specific+ enumeration class:
|Returns the required sub-group size specified by the kernel, or 0 (if not specified).
|===
@@ -155,7 +159,9 @@ To provide access to the +sub_group+ class, a new member function is added to th
|Return the sub-group to which the work-item belongs.
|===

-The member functions of the sub-group class provide a mechanism for a developer to query properties of a sub-group and a work-item's position in it.
+==== Core Member Functions
+
+The core member functions of the sub-group class provide a mechanism for a developer to query properties of a sub-group and a work-item's position in it.
A sub-group barrier synchronizes all work-items in a sub-group, and orders memory operations with a memory fence to all address spaces.
+
+|===
+|Member Functions|Description
+
+|+void barrier() const+
+|Execute a sub-group barrier.
+|===
+
+==== Shuffles
+
+The shuffle sub-group functions perform arbitrary communication between pairs of work-items in a sub-group. Common patterns -- such as shifting all values in a sub-group by a fixed number of work-items -- are exposed as specialized shuffles that may be accelerated in hardware.
+
+|===
+|Member Functions|Description
+
+|+template <typename T> T shuffle(T x, id<1> local_id) const+
+|Exchange values of _x_ between work-items in the sub-group in an arbitrary pattern. Returns the value of _x_ from the work-item with the specified id. The value of _local_id_ must be at least 0 and less than the sub-group size.
+
+|+template <typename T> T shuffle_down(T x, uint32_t delta) const+
+|Exchange values of _x_ between work-items in the sub-group via a shift. Returns the value of _x_ from the work-item whose id is _delta_ larger than the calling work-item. The value returned when the result of id + _delta_ is greater than or equal to the sub-group size is undefined.
+
+|+template <typename T> T shuffle_up(T x, uint32_t delta) const+
+|Exchange values of _x_ between work-items in the sub-group via a shift. Returns the value of _x_ from the work-item whose id is _delta_ smaller than the calling work-item. The value returned when the result of id - _delta_ is less than zero is undefined.
+
+|+template <typename T> T shuffle_xor(T x, id<1> mask) const+
+|Exchange pairs of values of _x_ between work-items in the sub-group. Returns the value of _x_ from the work-item whose id is equal to the exclusive-or of the calling work-item's id and _mask_. _mask_ must be a compile-time constant value that is the same for all work-items in the sub-group.
+|===
+
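The shuffle semantics described in the table can be sketched on the host in plain C++. This is only an illustrative model, not the extension's API: each vector element stands for the value of `x` held by one work-item, and the names `model_shuffle`, `model_shuffle_down`, `model_shuffle_up`, `model_shuffle_xor` and the `undefined_marker` parameter are hypothetical. Lanes whose result the extension leaves undefined are filled with the marker so they are easy to spot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model only. lanes[i] is the value of x supplied by the
// work-item whose sub-group local id is i. Nothing here is SYCL API.

// shuffle: lane i reads the value held by lane source_ids[i].
template <typename T>
std::vector<T> model_shuffle(const std::vector<T>& lanes,
                             const std::vector<std::size_t>& source_ids) {
  assert(source_ids.size() == lanes.size());
  std::vector<T> out;
  out.reserve(lanes.size());
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    assert(source_ids[i] < lanes.size());  // id must name a valid lane
    out.push_back(lanes[source_ids[i]]);
  }
  return out;
}

// shuffle_down: lane i reads from lane i + delta.
// Out-of-range reads are undefined in the extension; the model
// substitutes undefined_marker (an assumption of this sketch).
template <typename T>
std::vector<T> model_shuffle_down(const std::vector<T>& lanes,
                                  std::uint32_t delta, T undefined_marker) {
  std::vector<T> out(lanes.size(), undefined_marker);
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    const std::size_t src = i + delta;
    if (src < lanes.size()) out[i] = lanes[src];
  }
  return out;
}

// shuffle_up: lane i reads from lane i - delta, with the same
// treatment of undefined results as model_shuffle_down.
template <typename T>
std::vector<T> model_shuffle_up(const std::vector<T>& lanes,
                                std::uint32_t delta, T undefined_marker) {
  std::vector<T> out(lanes.size(), undefined_marker);
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    if (i >= delta) out[i] = lanes[i - delta];
  }
  return out;
}

// shuffle_xor: lane i reads from lane i ^ mask. The model assumes
// every partner id is in range (true for a power-of-two sub-group
// size and a mask smaller than that size).
template <typename T>
std::vector<T> model_shuffle_xor(const std::vector<T>& lanes,
                                 std::size_t mask) {
  std::vector<T> out;
  out.reserve(lanes.size());
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    const std::size_t src = i ^ mask;
    assert(src < lanes.size());
    out.push_back(lanes[src]);
  }
  return out;
}
```

With four lanes holding 10, 11, 12, 13, a `model_shuffle_down` by 1 leaves the last lane at the marker value, which mirrors the "undefined" wording in the table rather than any guaranteed hardware behavior.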
==== Sample Header

[source, c++]
@@ -222,6 +259,20 @@ struct sub_group {
   linear_id_type get_group_linear_id() const;
   range_type get_group_range() const;

+  void barrier() const;
+
+  template <typename T>
+  T shuffle(T x, id<1> local_id) const;
+
+  template <typename T>
+  T shuffle_down(T x, uint32_t delta) const;
+
+  template <typename T>
+  T shuffle_up(T x, uint32_t delta) const;
+
+  template <typename T>
+  T shuffle_xor(T x, id<1> mask) const;
+
};
} // intel
} // sycl
@@ -230,7 +281,19 @@ struct sub_group {

== Issues

-None.
+. Should sub-group query results for specific kernels depend on work-group size?
++
+--
+*RESOLVED*:
+Yes, this is required by OpenCL devices. Devices that do not require the work-group size can ignore the parameter.
+--
+
+. Should sub-group "shuffles" be member functions?
++
+--
+*RESOLVED*:
+Yes, the four shuffles in this extension are a defining feature of sub-groups. Higher-level algorithms (such as those in the +SubGroupAlgorithms+ proposal) may build on them, the same way as higher-level algorithms using work-groups build on work-group local memory.
+--
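The resolution above says that higher-level algorithms may build on the shuffles. As a rough host-side illustration of that layering, the sketch below builds a sum across all lanes from nothing but an xor exchange (a butterfly pattern). It is a plain C++ model under stated assumptions, not SYCL code: `xor_exchange` and `model_sub_group_sum` are hypothetical names, the lane count is assumed to be a power of two, and the mask is passed at run time for brevity even though the extension asks for a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in for shuffle_xor: lane i receives the value
// held by lane i ^ mask. Hypothetical helper, not the SYCL API.
std::vector<int> xor_exchange(const std::vector<int>& lanes,
                              std::size_t mask) {
  std::vector<int> out(lanes.size());
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    assert((i ^ mask) < lanes.size());
    out[i] = lanes[i ^ mask];
  }
  return out;
}

// Butterfly all-reduce: after log2(size) exchange steps every lane
// holds the sum of all original lane values.
std::vector<int> model_sub_group_sum(std::vector<int> lanes) {
  const std::size_t size = lanes.size();
  // Assumption of this sketch: a non-empty, power-of-two lane count.
  assert(size > 0 && (size & (size - 1)) == 0);
  for (std::size_t mask = size / 2; mask > 0; mask /= 2) {
    const std::vector<int> partner = xor_exchange(lanes, mask);
    for (std::size_t i = 0; i < size; ++i) {
      lanes[i] += partner[i];  // combine own value with partner's
    }
  }
  return lanes;
}
```

The reduction never touches another lane's data directly; all communication goes through the exchange helper, which is the sense in which an algorithm of this kind sits on top of the shuffle primitives.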

//. asd
//+
@@ -247,6 +310,10 @@ None.
|Rev|Date|Author|Changes
|1|2019-04-19|John Pennycook|*Initial public working draft*
|2|2020-03-16|John Pennycook|*Separate class definition from algorithms*