Commit 4f3d7e1

[SYCL][Doc] Add named sub-group sizes extension (#5714)
This extension aims to simplify the process of using sub-groups by introducing the notion of named sub-group sizes, allowing developers to request a sub-group size that meets certain requirements at host compile-time and deferring the selection of a specific sub-group size until the kernel is compiled for a specific device. Signed-off-by: John Pennycook <[email protected]>
1 parent 960e9b9 commit 4f3d7e1

1 file changed

+304
-0
lines changed
@@ -0,0 +1,304 @@
1+
= sycl_ext_oneapi_named_sub_group_sizes
2+
3+
:source-highlighter: coderay
4+
:coderay-linenums-mode: table
5+
6+
// This section needs to be after the document title.
7+
:doctype: book
8+
:toc2:
9+
:toc: left
10+
:encoding: utf-8
11+
:lang: en
12+
:dpcpp: pass:[DPC++]
13+
14+
// Set the default source code type in this document to C++,
15+
// for syntax highlighting purposes. This is needed because
16+
// docbook uses c++ and html5 uses cpp.
17+
:language: {basebackend@docbook:c++:cpp}
18+
19+
20+
== Notice
21+
22+
[%hardbreaks]
23+
Copyright (C) 2019-2022 Intel Corporation. All rights reserved.
24+
25+
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
26+
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
27+
permission by Khronos.
28+
29+
30+
== Contact
31+
32+
To report problems with this extension, please open a new issue at:
33+
34+
https://github.com/intel/llvm/issues
35+
36+
37+
== Dependencies
38+
39+
This extension is written against the SYCL 2020 revision 4 specification. All
40+
references below to the "core SYCL specification" or to section numbers in the
41+
SYCL specification refer to that revision.
42+
43+
This extension also depends on the following other SYCL extensions:
44+
45+
* link:../experimental/sycl_ext_oneapi_properties.asciidoc[
46+
sycl_ext_oneapi_properties]
47+
48+
* link:../proposed/sycl_ext_oneapi_kernel_properties.asciidoc[
49+
sycl_ext_oneapi_kernel_properties]
50+
51+
52+
== Status
53+
54+
This is a proposed extension specification, intended to gather community
55+
feedback. Interfaces defined in this specification may not be implemented yet
56+
or may be in a preliminary state. The specification itself may also change in
57+
incompatible ways before it is finalized. Shipping software products should not
58+
rely on APIs defined in this specification.
59+
60+
61+
== Overview
62+
63+
SYCL provides a mechanism to set a required sub-group size for a kernel via
64+
an attribute, and the sycl_ext_oneapi_kernel_properties extension provides an
65+
equivalent property.
66+
67+
Either mechanism is sufficient when tuning individual kernels for specific
68+
devices, but their usage quickly becomes complicated in real-life scenarios
69+
because:
70+
71+
1. An integral sub-group size must be provided at host compile-time.
72+
73+
2. The sub-group sizes supported by a device are not known until run-time.
74+
75+
3. It is common for the same sub-group size to be used for all kernels
76+
(e.g. because the sub-group size is reflected in data structures).
77+
78+
Applications wishing to write portable sub-group code that can target multiple
79+
architectures must therefore multi-version their C++ code (e.g. via templates),
80+
dispatch to the correct kernel(s) based on the result of a run-time query, and
81+
repeat this process for every kernel individually.
82+
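The burden described above can be illustrated with a minimal sketch in plain C++. It is not SYCL code: `launch_kernel` and `dispatch` are hypothetical names, and the `supported` vector merely models the result of the run-time `info::device::sub_group_sizes` query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a kernel templated on its sub-group size.
// In real SYCL code, S would feed a required sub-group size attribute
// or property; here the function only reports which version ran.
template <std::size_t S>
std::size_t launch_kernel() {
  return S;
}

// Dispatch to the first compiled version that the device supports.
// `supported` models the result of a run-time device query.
inline std::size_t dispatch(const std::vector<std::size_t>& supported) {
  assert(!supported.empty() && "a device reports at least one size");
  auto has = [&](std::size_t s) {
    return std::find(supported.begin(), supported.end(), s) !=
           supported.end();
  };
  if (has(32)) return launch_kernel<32>();
  if (has(16)) return launch_kernel<16>();
  if (has(8)) return launch_kernel<8>();
  throw std::runtime_error("no compiled kernel version matches this device");
}
```

Every kernel in an application would need its own copy of this dispatch logic, which is the repetition that named sub-group sizes are intended to remove.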
83+
This extension aims to simplify the process of using sub-groups by introducing
84+
the notion of _named_ sub-group sizes, allowing developers to request a
85+
sub-group size that meets certain requirements at host compile-time and
86+
deferring the selection of a specific sub-group size until the kernel is
87+
compiled for a specific device.
88+
89+
This extension also defines the default behavior of sub-groups in SYCL code to
90+
improve the out-of-the-box experience for new developers, without preventing
91+
experts and existing developers from requesting the existing compiler behavior.
92+
93+
94+
== Specification
95+
96+
=== Feature test macro
97+
98+
This extension provides a feature-test macro as described in the core SYCL
99+
specification. An implementation supporting this extension must predefine the
100+
macro `SYCL_EXT_ONEAPI_NAMED_SUB_GROUP_SIZES` to one of the values defined in the
101+
table below. Applications can test for the existence of this macro to
102+
determine if the implementation supports this feature, or applications can test
103+
the macro's value to determine which of the extension's features the
104+
implementation supports.
105+
106+
[%header,cols="1,5"]
107+
|===
108+
|Value
109+
|Description
110+
111+
|1
112+
|The APIs of this experimental extension are not versioned, so the
113+
feature-test macro always has this value.
114+
|===
115+
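As a sketch of how an application might test the macro, the helpers below wrap the check in ordinary functions. The names `named_sub_group_sizes_version`, `has_named_sub_group_sizes` and `require_named_sub_group_sizes` are hypothetical and not part of this extension. In a real program the check would normally appear after the SYCL header has been included.

```cpp
#include <cassert>

// Returns the value of the feature-test macro, or 0 when the
// implementation does not define it.
constexpr int named_sub_group_sizes_version() {
#ifdef SYCL_EXT_ONEAPI_NAMED_SUB_GROUP_SIZES
  return SYCL_EXT_ONEAPI_NAMED_SUB_GROUP_SIZES;
#else
  return 0;
#endif
}

// True when any version of the extension is available.
constexpr bool has_named_sub_group_sizes() {
  return named_sub_group_sizes_version() >= 1;
}

// Hypothetical start-up check for builds that cannot run without the
// extension; compiled out when NDEBUG is defined.
inline void require_named_sub_group_sizes() {
  assert(has_named_sub_group_sizes() &&
         "sycl_ext_oneapi_named_sub_group_sizes is required");
}
```

Code paths that use `sub_group_size_primary` or `sub_group_size_automatic` can then be guarded by the same `#ifdef`, with a fallback to an integral `sub_group_size` elsewhere.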
116+
117+
=== Changes to sub-group behavior
118+
119+
Much of the behavior related to sub-groups in SYCL 2020 is
120+
implementation-defined. Different kernels may use different sub-group sizes,
121+
and even the same kernel may use different sub-group sizes on some devices (e.g. for
122+
different ND-range launch configurations).
123+
124+
The extension introduces simpler behavior for sub-groups:
125+
126+
- If no sub-group size property appears on a kernel or `SYCL_EXTERNAL`
127+
function, the default behavior of an implementation must be to compile and
128+
execute the kernel or function using a device's _primary_ sub-group size. The
129+
primary sub-group size must be compatible with all core language features.
130+
131+
- If a developer does not require a stable sub-group size across all kernels
132+
and kernel launches, they can explicitly request an _automatic_ sub-group
133+
size chosen by the implementation.
134+
135+
- Implementations are free to provide mechanisms which override the default
136+
sub-group behavior (e.g. via compiler flags), but developers must use such a
136+
mechanism explicitly in order to opt in to any change in behavior.
138+
139+
140+
=== Device queries
141+
142+
A new `info::device::primary_sub_group_size` device query is introduced to
143+
query a device's primary sub-group size.
144+
145+
[%header,cols="1,5,5"]
146+
|===
147+
|Device Descriptor
148+
|Return Type
149+
|Description
150+
151+
|`info::device::primary_sub_group_size`
152+
|`uint32_t`
153+
|Returns a sub-group size supported by this device that is guaranteed to support
154+
all core language features for the device.
155+
|===
156+
157+
158+
=== Properties
159+
160+
```c++
161+
namespace sycl {
162+
namespace ext {
163+
namespace oneapi {
164+
namespace experimental {
165+
166+
struct named_sub_group_size {
167+
static constexpr uint32_t primary = /* unspecified */;
168+
static constexpr uint32_t automatic = /* unspecified */;
169+
};
170+
171+
inline constexpr sub_group_size_key::value_t<named_sub_group_size::primary> sub_group_size_primary;
172+
173+
inline constexpr sub_group_size_key::value_t<named_sub_group_size::automatic> sub_group_size_automatic;
174+
175+
} // namespace experimental
176+
} // namespace oneapi
177+
} // namespace ext
178+
} // namespace sycl
179+
```
180+
181+
NOTE: The named sub-group size properties are deliberately designed to reuse as
182+
much of the existing `sub_group_size` property infrastructure as possible.
183+
Implementations are free to choose the integral value associated with each
184+
named sub-group size, but it is expected that many implementations will use
185+
values like 0 (which is otherwise not a meaningful sub-group size) or -1
186+
(which would otherwise correspond to a sub-group size so large it is unlikely
187+
any device would support it).
188+
189+
|===
190+
|Property|Description
191+
192+
|`sub_group_size_primary`
193+
|The `sub_group_size_primary` property adds the requirement that the kernel
194+
must be compiled and executed with the primary sub-group size of the device to
195+
which the kernel is submitted (as reported by the
196+
`info::device::primary_sub_group_size` query).
197+
198+
|`sub_group_size_automatic`
199+
|The `sub_group_size_automatic` property adds the requirement that the kernel
200+
can be compiled and executed with any of the valid sub-group sizes associated
201+
with the device to which the kernel is submitted (as reported by the
202+
`info::device::sub_group_sizes` query). The manner in which the sub-group size
203+
is selected is implementation-defined.
204+
205+
|===
206+
207+
At most one of the `sub_group_size`, `sub_group_size_primary` and
208+
`sub_group_size_automatic` properties may be associated with a kernel or
209+
device function.
210+
211+
NOTE: No special handling is required to detect this case, since
212+
`sub_group_size_primary` and `sub_group_size_automatic` are simply named
213+
shorthands for properties associated with `sub_group_size_key`.
214+
215+
There are special requirements whenever a device function defined in one
216+
translation unit makes a call to a device function that is defined in a second
217+
translation unit. In such a case, the second device function is always declared
218+
using `SYCL_EXTERNAL`. If the kernel calling these device functions is defined
219+
using a sub-group size property, the functions declared using `SYCL_EXTERNAL`
220+
must be similarly decorated to ensure that the same sub-group size is used.
221+
This decoration must exist in both the translation unit making the call and
222+
also in the translation unit that defines the function. If the sub-group size
223+
property is missing in the translation unit that makes the call, or if the
224+
sub-group size of the called function does not match the sub-group size of the
225+
calling function, the program is ill-formed and the compiler must raise a
226+
diagnostic.
227+
228+
Note that a compiler may choose a different sub-group size for each kernel and
229+
`SYCL_EXTERNAL` function using an automatic sub-group size. If kernels with an
230+
automatic sub-group size call `SYCL_EXTERNAL` functions using an automatic
231+
sub-group size, the program may be ill-formed. The behavior when
232+
`SYCL_EXTERNAL` is used in conjunction with an automatic sub-group size is
233+
implementation-defined, and code relying on specific behavior should not be
234+
expected to be portable across implementations. If a kernel calls a
235+
`SYCL_EXTERNAL` function with an incompatible sub-group size, the compiler must
236+
raise a diagnostic -- it is expected that this diagnostic will be raised during
237+
link-time, since this is the first time the compiler will see both translation
238+
units together.
239+
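The matching rules for `SYCL_EXTERNAL` functions can be modeled with a short sketch in plain C++. The types and the `check_call` function are hypothetical and only illustrate the rules; they are not an API of this extension. The sketch assumes matching is performed on the requested property rather than on a resolved integral size, so `primary` and an integral size never match, and any use of `automatic` is reported as implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

enum class size_kind { primary, automatic, integral };

// Hypothetical model of the sub-group size requested by a kernel or by
// a SYCL_EXTERNAL function.
struct size_request {
  size_kind kind;
  std::uint32_t value;  // only meaningful when kind == size_kind::integral
};

// Convenience constructor for an integral request.
inline size_request integral(std::uint32_t s) {
  assert(s > 0 && "an integral sub-group size must be positive");
  return {size_kind::integral, s};
}

enum class link_check { ok, mismatch, implementation_defined };

// Models the check a compiler might perform once it can see both the
// caller and the called SYCL_EXTERNAL function together.
inline link_check check_call(const size_request& caller,
                             const size_request& callee) {
  if (caller.kind == size_kind::automatic ||
      callee.kind == size_kind::automatic)
    return link_check::implementation_defined;
  if (caller.kind != callee.kind) return link_check::mismatch;
  if (caller.kind == size_kind::integral && caller.value != callee.value)
    return link_check::mismatch;
  return link_check::ok;
}
```

A result of `mismatch` corresponds to the case in which the program is ill-formed and a diagnostic must be raised.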
240+
241+
=== DPC++ compiler flags
242+
243+
This non-normative section describes command line flags that the DPC++ compiler
244+
supports. Other compilers are free to provide their own command line flags (if
245+
any).
246+
247+
The `-fsycl-default-sub-group-size` flag controls the default sub-group size
248+
used within a translation unit, which applies to all kernels and
249+
`SYCL_EXTERNAL` functions without an explicitly specified sub-group size.
250+
251+
If the argument passed to `-fsycl-default-sub-group-size` is an integer `S`,
252+
all kernels and `SYCL_EXTERNAL` functions without an explicitly specified
253+
sub-group size are compiled as-if `sub_group_size<S>` was specified as a
254+
property of that kernel or function.
255+
256+
If the argument passed to `-fsycl-default-sub-group-size` is a string `NAME`,
257+
all kernels and `SYCL_EXTERNAL` functions without an explicitly specified
258+
sub-group size are compiled as-if `sub_group_size_NAME` was
259+
specified as a property of that kernel or function.
260+
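The mapping from the flag's argument to a property can be sketched as a small function in plain C++. The name `default_property_for` is hypothetical, and the sketch does not reflect how the DPC++ driver actually parses its arguments. It also does not check that `NAME` is one of the named sizes defined by this extension.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Returns the spelling of the property that would apply to kernels and
// SYCL_EXTERNAL functions without an explicit sub-group size.
inline std::string default_property_for(const std::string& arg) {
  assert(!arg.empty() && "the flag requires an argument");
  const bool is_integer =
      std::all_of(arg.begin(), arg.end(),
                  [](unsigned char c) { return std::isdigit(c) != 0; });
  // An integer S maps to sub_group_size<S>.
  if (is_integer) return "sub_group_size<" + arg + ">";
  // Any other string NAME maps to sub_group_size_NAME.
  return "sub_group_size_" + arg;
}
```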
261+
262+
== Implementation notes
263+
264+
This non-normative section provides information about one possible
265+
implementation of this extension. It is not part of the specification of the
266+
extension's API.
267+
268+
The existing mechanism of describing a required sub-group size in SPIR-V may
269+
need to be augmented to support named sub-group sizes. The existing sub-group
270+
size descriptors could be used with reserved values (similar to the template
271+
arguments in the properties), or new descriptors could be created for each
272+
case.
273+
274+
Device compilers will need to be taught to interpret these named sub-group
275+
sizes as equivalent to a device-specific integral sub-group size at
276+
compile-time.
277+
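A minimal sketch of that interpretation step follows, in plain C++. It assumes the reserved encodings 0 and -1 (converted to `uint32_t`) mentioned in the note on properties; the extension leaves the actual values unspecified. `device_model` and `resolve` are hypothetical names, and choosing the largest supported size for the automatic case is an arbitrary stand-in for an implementation-defined policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Assumed reserved encodings for the named sub-group sizes.
constexpr std::uint32_t primary_tag = 0;
constexpr std::uint32_t automatic_tag =
    std::numeric_limits<std::uint32_t>::max();  // -1 as uint32_t

// Hypothetical model of the device properties a device compiler knows.
struct device_model {
  std::uint32_t primary;             // models primary_sub_group_size
  std::vector<std::uint32_t> sizes;  // models sub_group_sizes
};

// Maps a requested (possibly named) size to a concrete integral size.
inline std::uint32_t resolve(std::uint32_t requested,
                             const device_model& dev) {
  assert(!dev.sizes.empty() && "a device reports at least one size");
  if (requested == primary_tag) return dev.primary;
  if (requested == automatic_tag) {
    // Implementation-defined choice; this sketch picks the largest size.
    return *std::max_element(dev.sizes.begin(), dev.sizes.end());
  }
  if (std::find(dev.sizes.begin(), dev.sizes.end(), requested) ==
      dev.sizes.end())
    throw std::invalid_argument("sub-group size not supported by device");
  return requested;
}
```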
278+
279+
== Issues
280+
281+
. What should the sub-group size compatible with all features be called?
282+
+
283+
--
284+
*RESOLVED*: The name adopted is "primary", to convey that it is an integral
285+
part of sub-group support provided by the device. Other names considered are
286+
listed here for posterity: "default", "stable", "fixed", "core". These terms
287+
are easy to misunderstand (i.e. the "default" size may not be chosen by
288+
default, the "stable" size is unrelated to the software release cycle, the
289+
"fixed" sub-group size may change between devices or compiler releases, the
290+
"core" size is unrelated to hardware cores).
291+
--
292+
293+
. How does sub-group size interact with `SYCL_EXTERNAL` functions? The current
294+
behavior requires exact matching. Should this be relaxed to allow alternative
295+
implementations (e.g. link-time optimization, multi-versioning)?
296+
+
297+
--
298+
*RESOLVED*: Exact matching is required to ensure that developers can reason about
299+
the portability of their code across different implementations. Setting the
300+
default sub-group size to "primary" and providing an override flag to select
301+
"automatic" everywhere means that only advanced developers who are tuning
302+
sub-group size on a per-kernel basis will have to worry about potential
303+
matching issues.
304+
--
