-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL][Doc] Add named sub-group sizes extension #5714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
09872ed
[SYCL][Doc] Add named sub-group sizes extension
Pennycook 374aa57
Fix link
Pennycook d8a133b
Fix formatting
Pennycook d90b030
Move from experimental/ to proposed/
Pennycook 5786090
Fix feature macro name
Pennycook b2585f5
Clarify that compiler flags are DPC++-specific
Pennycook ee068fa
Use correct status for "proposed" extension
Pennycook b35f5b0
Tweak description of default behavior
Pennycook 23cca73
Replace enum with struct to avoid polluting sycl::
Pennycook File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
304 changes: 304 additions & 0 deletions
304
sycl/doc/extensions/proposed/sycl_ext_oneapi_named_sub_group_sizes.asciidoc
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,304 @@ | ||
= sycl_ext_oneapi_named_sub_group_sizes | ||
|
||
:source-highlighter: coderay | ||
:coderay-linenums-mode: table | ||
|
||
// This section needs to be after the document title. | ||
:doctype: book | ||
:toc2: | ||
:toc: left | ||
:encoding: utf-8 | ||
:lang: en | ||
:dpcpp: pass:[DPC++] | ||
|
||
// Set the default source code type in this document to C++, | ||
// for syntax highlighting purposes. This is needed because | ||
// docbook uses c++ and html5 uses cpp. | ||
:language: {basebackend@docbook:c++:cpp} | ||
|
||
|
||
== Notice | ||
|
||
[%hardbreaks] | ||
Copyright (C) 2019-2022 Intel Corporation. All rights reserved. | ||
|
||
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks | ||
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by | ||
permission by Khronos. | ||
|
||
|
||
== Contact | ||
|
||
To report problems with this extension, please open a new issue at: | ||
|
||
https://github.com/intel/llvm/issues | ||
|
||
|
||
== Dependencies | ||
|
||
This extension is written against the SYCL 2020 revision 4 specification. All | ||
references below to the "core SYCL specification" or to section numbers in the | ||
SYCL specification refer to that revision. | ||
|
||
This extension also depends on the following other SYCL extensions: | ||
|
||
* link:../experimental/sycl_ext_oneapi_properties.asciidoc[ | ||
sycl_ext_oneapi_properties] | ||
|
||
* link:../proposed/sycl_ext_oneapi_kernel_properties.asciidoc[ | ||
sycl_ext_oneapi_kernel_properties] | ||
|
||
|
||
== Status | ||
|
||
This is a proposed extension specification, intended to gather community | ||
feedback. Interfaces defined in this specification may not be implemented yet | ||
or may be in a preliminary state. The specification itself may also change in | ||
incompatible ways before it is finalized. Shipping software products should not | ||
rely on APIs defined in this specification. | ||
|
||
|
||
== Overview | ||
|
||
SYCL provides a mechanism to set a required sub-group size for a kernel via | ||
an attribute, and the sycl_ext_oneapi_kernel_properties extension provides an | ||
equivalent property. | ||
|
||
Either mechanism is sufficient when tuning individual kernels for specific | ||
devices, but their usage quickly becomes complicated in real-life scenarios | ||
because: | ||
|
||
1. An integral sub-group size must be provided at host compile-time. | ||
|
||
2. The sub-group sizes supported by a device are not known until run-time. | ||
|
||
3. It is common for the same sub-group size to be used for all kernels | ||
(e.g. because the sub-group size is reflected in data structures). | ||
|
||
Applications wishing to write portable sub-group code that can target multiple | ||
architectures must therefore multi-version their C++ code (e.g. via templates), | ||
dispatch to the correct kernel(s) based on the result of a run-time query, and | ||
repeat this process for every kernel individually. | ||
|
||
This extension aims to simplify the process of using sub-groups by introducing | ||
the notion of _named_ sub-group sizes, allowing developers to request a | ||
sub-group size that meets certain requirements at host compile-time and | ||
deferring the selection of a specific sub-group size until the kernel is | ||
compiled for a specific device. | ||
|
||
This extension also defines the default behavior of sub-groups in SYCL code to | ||
improve the out-of-the-box experience for new developers, without preventing | ||
experts and existing developers from requesting the existing compiler behavior. | ||
|
||
|
||
== Specification | ||
|
||
=== Feature test macro | ||
|
||
This extension provides a feature-test macro as described in the core SYCL | ||
specification. An implementation supporting this extension must predefine the | ||
macro `SYCL_EXT_ONEAPI_NAMED_SUB_GROUP_SIZES` to one of the values defined in the | ||
table below. Applications can test for the existence of this macro to | ||
determine if the implementation supports this feature, or applications can test | ||
the macro's value to determine which of the extension's features the | ||
implementation supports. | ||
|
||
[%header,cols="1,5"] | ||
|=== | ||
|Value | ||
|Description | ||
|
||
|1 | ||
|The APIs of this experimental extension are not versioned, so the | ||
feature-test macro always has this value. | ||
|=== | ||
|
||
|
||
=== Changes to sub-group behavior | ||
|
||
Much of the behavior related to sub-groups in SYCL 2020 is | ||
implementation-defined. Different kernels may use different sub-group sizes, | ||
and even the same kernel may use different kernels on some devices (e.g. for | ||
different ND-range launch configurations). | ||
|
||
The extension introduces simpler behavior for sub-groups: | ||
|
||
- If no sub-group size property appears on a kernel or `SYCL_EXTERNAL` | ||
function, the default behavior of an implementation must be to compile and | ||
execute the kernel or function using a device's _primary_ sub-group size. The | ||
primary sub-group size must be compatible with all core language features. | ||
|
||
- If a developer does not require a stable sub-group size across all kernels | ||
and kernel launches, they can explicitly request an _automatic_ sub-group | ||
size chosen by the implementation. | ||
|
||
- Implementations are free to provide mechanisms which override the default | ||
sub-group behavior (e.g. via compiler flags), but developers must use this | ||
mechanism explicitly in order to opt-in to any change in behavior. | ||
|
||
|
||
=== Device queries | ||
|
||
A new `info::device::primary_sub_group_size` device query is introduced to | ||
query a device's primary sub-group size. | ||
|
||
[%header,cols="1,5,5"] | ||
|=== | ||
|Device Descriptor | ||
|Return Type | ||
|Description | ||
|
||
|`info::device::primary_sub_group_size` | ||
|`uint32_t` | ||
|Return a sub-group size supported by this device that is guaranteed to support | ||
all core language features for the device. | ||
|=== | ||
|
||
|
||
=== Properties | ||
|
||
```c++ | ||
namespace sycl { | ||
namespace ext { | ||
namespace oneapi { | ||
namespace experimental { | ||
|
||
struct named_sub_group_size { | ||
static constexpr uint32_t primary = /* unspecified */, | ||
static constexpr uint32_t automatic = /* unspecified */, | ||
}; | ||
|
||
inline constexpr sub_group_size_key::value_t<named_sub_group_size::primary> sub_group_size_primary; | ||
|
||
inline constexpr sub_group_size_key::value_t<named_sub_group_size::automatic> sub_group_size_automatic; | ||
|
||
} // namespace experimental | ||
} // namespace oneapi | ||
} // namespace ext | ||
} // namespace sycl | ||
``` | ||
|
||
NOTE: The named sub-group size properties are deliberately designed to reuse as | ||
much of the existing `sub_group_size` property infrastructure as possible. | ||
Implementations are free to choose the integral value associated with each | ||
named sub-group type, but it is expected that many implementations will use | ||
values like 0 (which is otherwise not a meaningful sub-group size) or -1 | ||
(which would otherwise correspond to a sub-group size so large it is unlikely | ||
any device would support it). | ||
|
||
|=== | ||
|Property|Description | ||
|
||
|`sub_group_size_primary` | ||
|The `sub_group_size_primary` property adds the requirement that the kernel | ||
must be compiled and executed with the primary sub-group size of the device to | ||
which the kernel is submitted (as reported by the | ||
`info::device::primary_sub_group_size` query). | ||
|
||
|`sub_group_size_automatic` | ||
|The `sub_group_size_automatic` property adds the requirement that the kernel | ||
can be compiled and executed with any of the valid sub-group sizes associated | ||
with the device to which the kernel is submitted (as reported by the | ||
`info::device::sub_group_sizes` query). The manner in which the sub-group size | ||
is selected is implementation-defined. | ||
|
||
|=== | ||
|
||
At most one of the `sub_group_size`, `sub_group_size_primary` and | ||
`sub_group_size_automatic` properties may be associated with a kernel or | ||
device function. | ||
|
||
NOTE: No special handling is required to detect this case, since | ||
`sub_group_size_primary` and `sub_group_size_automatic` are simply named | ||
shorthands for properties associated with `sub_group_size_key`. | ||
|
||
There are special requirements whenever a device function defined in one | ||
translation unit makes a call to a device function that is defined in a second | ||
translation unit. In such a case, the second device function is always declared | ||
using `SYCL_EXTERNAL`. If the kernel calling these device functions is defined | ||
using a sub-group size property, the functions declared using `SYCL_EXTERNAL` | ||
must be similarly decorated to ensure that the same sub-group size is used. | ||
This decoration must exist in both the translation unit making the call and | ||
also in the translation unit that defines the function. If the sub-group size | ||
property is missing in the translation unit that makes the call, or if the | ||
sub-group size of the called function does not match the sub-group size of the | ||
calling function, the program is ill-formed and the compiler must raise a | ||
diagnostic. | ||
|
||
Note that a compiler may choose a different sub-group size for each kernel and | ||
`SYCL_EXTERNAL` function using an automatic sub-group size. If kernels with an | ||
automatic sub-group size call `SYCL_EXTERNAL` functions using an automatic | ||
sub-group size, the program may be ill-formed. The behavior when | ||
`SYCL_EXTERNAL` is used in conjunction with an automatic sub-group size is | ||
implementation-defined, and code relying on specific behavior should not be | ||
expected to be portable across implementations. If a kernel calls a | ||
`SYCL_EXTERNAL` function with an incompatible sub-group size, the compiler must | ||
raise a diagnostic -- it is expected that this diagnostic will be raised during | ||
link-time, since this is the first time the compiler will see both translation | ||
units together. | ||
|
||
|
||
=== DPC++ compiler flags | ||
|
||
This non-normative section describes command line flags that the DPC++ compiler | ||
supports. Other compilers are free to provide their own command line flags (if | ||
any). | ||
|
||
The `-fsycl-default-sub-group-size` flag controls the default sub-group size | ||
used within a translation unit, which applies to all kernels and | ||
`SYCL_EXTERNAL` functions without an explicitly specified sub-group size. | ||
|
||
If the argument passed to `-fsycl-default-sub-group-size` is an integer `S`, | ||
all kernels and `SYCL_EXTERNAL` functions without an explicitly specified | ||
sub-group size are compiled as-if `sub_group_size<S>` was specified as a | ||
property of that kernel or function. | ||
|
||
If the argument passed to `-fsycl-default-sub-group-size` is a string `NAME`, | ||
all kernels and `SYCL_EXTERNAL` functions without an explicitly specified | ||
sub-group size are compiled as-if `sub_group_size_NAME` was | ||
specified as a property of that kernel or function. | ||
|
||
|
||
== Implementation notes | ||
|
||
This non-normative section provides information about one possible | ||
implementation of this extension. It is not part of the specification of the | ||
extension's API. | ||
|
||
The existing mechanism of describing a required sub-group size in SPIR-V may | ||
need to be augmented to support named sub-group sizes. The existing sub-group | ||
size descriptors could be used with reserved values (similar to the template | ||
arguments in the properties), or new descriptors could be created for each | ||
case. | ||
|
||
Device compilers will need to be taught to interpret these named sub-group | ||
sizes as equivalent to a device-specific integral sub-group size at | ||
compile-time. | ||
|
||
|
||
== Issues | ||
|
||
. What should the sub-group size compatible with all features be called? | ||
+ | ||
-- | ||
*RESOLVED*: The name adopted is "primary", to convey that it is an integral | ||
part of sub-group support provided by the device. Other names considered are | ||
listed here for posterity: "default", "stable", "fixed", "core". These terms | ||
are easy to misunderstand (i.e. the "default" size may not be chosen by | ||
default, the "stable" size is unrelated to the software release cycle, the | ||
"fixed" sub-group size may change between devices or compiler releases, the | ||
"core" size is unrelated to hardware cores). | ||
-- | ||
|
||
. How does sub-group size interact with `SYCL_EXTERNAL` functions? The current | ||
behavior requires exact matching. Should this be relaxed to allow alternative | ||
implementations (e.g. link-time optimization, multi-versioning)? | ||
+ | ||
-- | ||
*RESOLVED*: Exact matching is required to ensure that developers can reason about | ||
the portability of their code across different implementations. Setting the | ||
default sub-group size to "primary" and providing an override flag to select | ||
"automatic" everywhere means that only advanced developers who are tuning | ||
sub-group size on a per-kernel basis will have to worry about potential | ||
matching issues. | ||
-- |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is an unscoped enum, the names
primary
andautomatic
are also in thesycl::ext::oneapi::experimental
namespace. I suspect that was not your intent? Unfortunately, you cannot just make this a scoped enum because then the type ofprimary
andautomatic
are no longer implicitly convertible touint32_t
. Maybe this is better?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right... I avoided making it a scoped enum precisely because of the
uint32_t
issue.What about:
I don't have a strong preference here, just floating this idea because I think it would allow the same syntax (
named_sub_group_size::primary
).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that looks good too. I don't have a strong preference between the two styles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone with the
struct
approach in 23cca73, simply because it meant changing fewer lines.I'd be happy to revisit this later.