Skip to content

Commit 010f112

Browse files
authored
[SYCL][Doc] Update sub-group docs (#1565)
Reverts some changes made when switching from SubGroupNDRange to SubGroup extension: - `max_sub_group_size` must take a work-group size in order to support OpenCL - The sub-group `barrier` member function was removed by mistake - The sub-group `shuffle` functions should be supported in addition to the higher-level `permute` and `shift_*` functions from the SubGroupAlgorithms extension Signed-off-by: John Pennycook <[email protected]>
1 parent b7bdcbe commit 010f112

File tree

1 file changed

+80
-13
lines changed

1 file changed

+80
-13
lines changed

sycl/doc/extensions/SubGroup/SYCL_INTEL_sub_group.asciidoc

Lines changed: 80 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -51,7 +51,7 @@ John Pennycook, Intel (john 'dot' pennycook 'at' intel 'dot' com)
5151

5252
== Dependencies
5353

54-
This extension is written against the SYCL 1.2.1 specification, Revision 6.
54+
This extension is written against the SYCL 1.2.1 specification, Revision 6 and the SYCL_INTEL_device_specific_kernel_queries extension.
5555

5656
== Overview
5757

@@ -111,33 +111,37 @@ The device descriptors below are added to the +info::device+ enumeration class:
111111
|Returns a vector_class of +size_t+ containing the set of sub-group sizes supported by the device.
112112
|===
113113

114-
An additional query for sub-group information is added to the +kernel+ class:
114+
An additional query is added to the +kernel+ class, enabling an input value to be passed to `get_info`. The original `get_info` query from the SYCL_INTEL_device_specific_kernel_queries extension should be used for queries that do not specify an input type.
115115

116116
|===
117117
|Member Functions|Description
118118

119-
|+template <info::kernel_sub_group param>typename info::param_traits<info::kernel_sub_group, param>::return_type get_sub_group_info(const device &dev) const+
120-
|Query information from the sub-group from a kernel using the +info::kernel_sub_group+ descriptor for a specific device.
119+
|+template <info::kernel_device_specific param>typename info::param_traits<info::kernel_device_specific, param>::return_type get_info(const device &dev, typename info::param_traits<info::kernel_device_specific, param>::input_type value) const+
120+
|Query information from a kernel using the +info::kernel_device_specific+ descriptor for a specific device and input parameter. The expected value of the input parameter depends on the information being queried.
121121
|===
122122

123-
The kernel descriptors below are added as part of a new +info::kernel_sub_group+ enumeration class:
123+
The kernel descriptors below are added to the +info::kernel_device_specific+ enumeration class:
124124

125125
|===
126-
|Kernel Descriptors|Return Type|Description
126+
|Kernel Descriptors|Input Type|Return Type|Description
127127

128-
|+info::kernel_sub_group::max_num_sub_groups+
128+
|+info::kernel_device_specific::max_num_sub_groups+
129+
|N/A
129130
|+uint32_t+
130131
|Returns the maximum number of sub-groups for this kernel.
131132

132-
|+info::kernel_sub_group::compile_num_sub_groups+
133+
|+info::kernel_device_specific::compile_num_sub_groups+
134+
|N/A
133135
|+uint32_t+
134136
|Returns the number of sub-groups specified by the kernel, or 0 (if not specified).
135137

136-
|+info::kernel_sub_group::max_sub_group_size+
138+
|+info::kernel_device_specific::max_sub_group_size+
139+
|+range<D>+
137140
|+uint32_t+
138-
|Returns the maximum sub-group size for this kernel.
141+
|Returns the maximum sub-group size for this kernel launched with the specified work-group size.
139142

140-
|+info::kernel_sub_group::compile_sub_group_size+
143+
|+info::kernel_device_specific::compile_sub_group_size+
144+
|N/A
141145
|+uint32_t+
142146
|Returns the required sub-group size specified by the kernel, or 0 (if not specified).
143147
|===
@@ -155,7 +159,9 @@ To provide access to the +sub_group+ class, a new member function is added to th
155159
|Return the sub-group to which the work-item belongs.
156160
|===
157161

158-
The member functions of the sub-group class provide a mechanism for a developer to query properties of a sub-group and a work-item's position in it.
162+
==== Core Member Functions
163+
164+
The core member functions of the sub-group class provide a mechanism for a developer to query properties of a sub-group and a work-item's position in it.
159165

160166
|===
161167
|Member Functions|Description
@@ -199,6 +205,37 @@ parallel_for<class kernel>(..., [&](nd_item item)
199205
});
200206
----
201207

208+
==== Synchronization Functions
209+
210+
A sub-group barrier synchronizes all work-items in a sub-group, and orders memory operations with a memory fence to all address spaces.
211+
212+
|===
213+
|Member Functions|Description
214+
215+
|+void barrier() const+
216+
|Execute a sub-group barrier.
217+
|===
218+
219+
==== Shuffles
220+
221+
The shuffle sub-group functions perform arbitrary communication between pairs of work-items in a sub-group. Common patterns -- such as shifting all values in a sub-group by a fixed number of work-items -- are exposed as specialized shuffles that may be accelerated in hardware.
222+
223+
|===
224+
|Member Functions|Description
225+
226+
|+template <typename T> T shuffle(T x, id<1> local_id) const+
227+
|Exchange values of _x_ between work-items in the sub-group in an arbitrary pattern. Returns the value of _x_ from the work-item with the specified id. The value of _local_id_ must be between 0 and the sub-group size.
228+
229+
|+template <typename T> T shuffle_down(T x, uint32_t delta) const+
230+
|Exchange values of _x_ between work-items in the sub-group via a shift. Returns the value of _x_ from the work-item whose id is _delta_ larger than the calling work-item. The value returned when the result of id + _delta_ is greater than or equal to the sub-group size is undefined.
231+
232+
|+template <typename T> T shuffle_up(T x, uint32_t delta) const+
233+
|Exchange values of _x_ between work-items in the sub-group via a shift. Returns the value of _x_ from the work-item whose id is _delta_ smaller than the calling work-item. The value of returned when the result of id - _delta_ is less than zero is undefined.
234+
235+
|+template <typename T> T shuffle_xor(T x, id<1> mask) const+
236+
|Exchange pairs of values of _x_ between work-items in the sub-group. Returns the value of _x_ from the work-item whose id is equal to the exclusive-or of the calling work-item's id and _mask_. _mask_ must be a compile-time constant value that is the same for all work-items in the sub-group.
237+
|===
238+
202239
==== Sample Header
203240

204241
[source, c++]
@@ -222,6 +259,20 @@ struct sub_group {
222259
linear_id_type get_group_linear_id() const;
223260
range_type get_group_range() const;
224261
262+
void barrier() const;
263+
264+
template <typename T>
265+
T shuffle(T x, id<1> local_id) const;
266+
267+
template <typename T>
268+
T shuffle_down(T x, uint32_t delta) const;
269+
270+
template <typename T>
271+
T shuffle_up(T x, uint32_t delta) const;
272+
273+
template <typename T>
274+
T shuffle_xor(T x, id<1> mask) const;
275+
225276
};
226277
} // intel
227278
} // sycl
@@ -230,7 +281,19 @@ struct sub_group {
230281

231282
== Issues
232283

233-
None.
284+
. Should sub-group query results for specific kernels depend on work-group size?
285+
+
286+
--
287+
*RESOLVED*:
288+
Yes, this is required by OpenCL devices. Devices that do not require the work-group size can ignore the parameter.
289+
--
290+
291+
. Should sub-group "shuffles" be member functions?
292+
+
293+
--
294+
*RESOLVED*:
295+
Yes, the four shuffles in this extension are a defining feature of sub-groups. Higher-level algorithms (such as those in the +SubGroupAlgorithms+ proposal) may build on them, the same way as higher-level algorithms using work-groups build on work-group local memory.
296+
--
234297

235298
//. asd
236299
//+
@@ -247,6 +310,10 @@ None.
247310
|Rev|Date|Author|Changes
248311
|1|2019-04-19|John Pennycook|*Initial public working draft*
249312
|2|2020-03-16|John Pennycook|*Separate class definition from algorithms*
313+
|3|2020-04-21|John Pennycook|*Update max_sub_group_size query*
314+
|4|2020-04-21|John Pennycook|*Restore missing barrier function*
315+
|5|2020-04-21|John Pennycook|*Restore sub-group shuffles as member functions*
316+
|6|2020-04-22|John Pennycook|*Align with SYCL_INTEL_device_specific_kernel_queries*
250317
|========================================
251318
252319
//************************************************************************

0 commit comments

Comments
 (0)