[SYCL][Doc] Update sub-group extension docs #1330

Pennycook · 2020-03-16T16:05:55Z

Splits sub-group functionality into three extensions:

SubGroup (sub_group class and device queries)
SubGroupAlgorithms (GroupAlgorithm support and permute)
GroupMask (sub_group::mask_type and ballot)

Signed-off-by: John Pennycook [email protected]

When this moves to implementation, we should mark the existing sub-group functions as deprecated rather than suddenly removing them and replacing them with free-functions.

Splits sub-group functionality into three extensions: - SubGroup (sub_group class and device queries) - SubGroupAlgorithms (GroupAlgorithm support and permute) - GroupMask (sub_group::mask_type and ballot) Signed-off-by: John Pennycook <[email protected]>

sycl/doc/extensions/SubGroup/SYCL_INTEL_sub_group.asciidoc

Signed-off-by: John Pennycook <[email protected]>

AlexeySachkov · 2020-03-16T17:02:03Z

we should mark the existing sub-group functions as deprecated rather than suddenly removing them and replacing them with free-functions.

@Pennycook, we probably need to do the same with the spec, or at least let's leave a temporary file in the previous location of the spec, which redirects to the new specs and provides a link to the previous version as well

Signed-off-by: John Pennycook <[email protected]>

Pennycook · 2020-03-16T18:18:46Z

@Pennycook, we probably need to do the same with the spec, or at least let's leave a temporary file in the previous location of the spec, which redirects to the new specs and provides a link to the previous version as well

That's a good idea. I like the temporary file with redirects solution, because forcing people to go through that extra step means they can't accidentally miss that things have changed.

sycl/doc/extensions/GroupMask/SYCL_INTEL_group_mask.asciidoc

sycl/doc/extensions/SubGroup/SYCL_INTEL_sub_group.asciidoc

AlexeySachkov · 2020-03-16T18:03:39Z

sycl/doc/extensions/SubGroup/SYCL_INTEL_sub_group.asciidoc

+
+== Overview
+
+A sub-group represents an implementation-defined grouping of work-items in a work-group. The work-items within a sub-group can communicate and synchronize independently of work-items in other sub-groups, and sub-groups are therefore commonly mapped to SIMD hardware where it exists.


In OpenCL, subgroups are always 1-dimensional, do we want to clearly state the same for SYCL?

Within a work-group work-items may be divided into sub-groups. The mapping of work-items to
sub-groups is implementation-defined and may be queried at runtime. While sub-groups may be
used in multi-dimensional work-groups, each sub-group is 1-dimensional and any given workitem may query which sub-group it is a member of.

However, even if this spelling theoretically allows 2d-subgroups, methods of the sub_group class below suggest that they are 1-dimensional.

BTW, OpenCL also provides some guarantees on work-items to sub-groups mapping during kernel execution, between dispatches of a given kernel and query operations:

Work items are mapped into sub-groups through a combination of compile-time decisions and
the parameters of the dispatch. The mapping to sub-groups is invariant for the duration of a
kernel’s execution, across dispatches of a given kernel with the same work-group dimensions,
between dispatches and query operations consistent with the dispatch parameterization, and from
one work-group to another within the dispatch (excluding the trailing edge work-groups in the
presence of non-uniform work-group sizes). In addition, all sub-groups within a work-group will
be the same size, apart from the sub-group with the maximum index which may be smaller if the
size of the work-group is not evenly divisible by the size of the sub-groups.

In the degenerate case, a single sub-group must be supported for each work-group. In this
situation all sub-group scope functions are equivalent to their work-group level equivalents.

Updated to partially address this. We need to consider the degenerate case more, as I'm not sure we necessarily need to follow OpenCL here.

sycl/doc/extensions/SubGroup/SYCL_INTEL_sub_group.asciidoc

AlexeySachkov · 2020-03-16T18:19:56Z

sycl/doc/extensions/SubGroupAlgorithms/SYCL_INTEL_sub_group_algorithms.asciidoc

+The permute sub-group functions perform arbitrary communication between pairs of work-items in a sub-group.  Common patterns -- such as shifting all values in a sub-group by a fixed number of work-items, or reversing the order of all values in a sub-group -- are exposed as specialized functions that may be accelerated in hardware.
+
+|===
+|Function|Description


Previous version of the spec had equivalents of the following built-in functions from cl_intel_subgroups OpenCL extension: intel_sub_group_shuffle_down and intel_sub_group_shuffle_up, while this algorithms library doesn't describe them.

I see that shift_left and shift_right are the most closer ones, but unlike the OpenCL extension, delta has to be the same across all work-items in a sub-group, which basically disallows cycle shifts. I don't know how important that use-case is, but this functions will almost, I guess, result in undefined elements in some elements of sub-group

This is a deliberate change. Other architectures can only implement shuffle_down/shuffle_up more efficiently than a general shuffle if it's known to be a shift (i.e. that delta is uniform). Developers can mimic the original behavior by using permute where the indices are computed from offsets.

It hadn't occurred to me that developers might be using non-uniform delta to implement cycle shifts. If that use-case is important, we should consider adding an equivalent of std::rotate.

It hadn't occurred to me that developers might be using non-uniform delta to implement cycle shifts. If that use-case is important, we should consider adding an equivalent of std::rotate.

I don't have any data about amount of such usages, but I guess this shouldn't be a blocker for this PR - we can always add such if requested

sycl/doc/extensions/SubGroupAlgorithms/SYCL_INTEL_sub_group_algorithms.asciidoc

Signed-off-by: John Pennycook <[email protected]>

Text before tables now identifies which class is being discussed. Signed-off-by: John Pennycook <[email protected]>

Signed-off-by: John Pennycook <[email protected]>

Unclear what this means if non-uniform work-groups are not supported. Signed-off-by: John Pennycook <[email protected]>

Signed-off-by: John Pennycook <[email protected]>

sycl/doc/extensions/GroupMask/SYCL_INTEL_group_mask.asciidoc

ProGTX · 2020-03-17T12:55:31Z

sycl/doc/extensions/SubGroupNDRange/SubGroupNDRange.md

-    T inclusive_scan(T x, BinaryOp binary_op) const;
-
-    template <typename T, class BinaryOp>
-    T inclusive_scan(T x, BinaryOp binary_op, T init) const;


I'm not sure where is the best place to ask this, but inclusive_scan seems inconsistent with exclusive_scan and reduce - init is the second argument in the other two functions, but the third argument here. This is also true of the document here: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/GroupAlgorithms/SYCL_INTEL_group_algorithms.asciidoc

The inconsistency here is unfortunate, but we inherited it from standard C++. See std::reduce, std::exclusive_scan and std::inclusive_scan.

My understanding is that this ordering is aligned with the ability to provide defaults for each argument: an inclusive scan is well defined without an initial value, but an exclusive scan is not.

Thanks, I wasn't aware of that

It is more for a mathematical reason than a C++ inconsistency.

For example for + the neutral element is 0 but for * the neutral element is 1. Obviously for all other data types and other operations it it is different, not even talking about non-mathematical use of these prefix operations, such as implementing a parser with scans...

Signed-off-by: John Pennycook <[email protected]>

AlexeySachkov

I'm not sure that I'm the right person to approve this PR, but at least I don't have any other questions or comments, LGTM

* sycl: (1209 commits) [SYCL] Check exit status get_device_count_by_type [SYCL][Doc] Update sub-group extension docs (intel#1330) [SYCL][Doc] Add leader to GroupAlgorithms (intel#1297) [SYCL] Add SYCL headers search path to default compilation options (intel#1347) [SYCL][PI] Add interoperability with generic handles to device and program classes (intel#1244) Move SPIR devicelib to top level (intel#1276) [SYCL][Driver] Improve fat static library support (intel#1319) [SYCL] Remove image_api LIT (intel#1349) [SYCL] Fix headers location for check-sycl-deploy target [SYCL] Allow gcc asm statements in kernel code (intel#1341) [SYCL] Add Intel FPGA force_pow2_depth attribute (intel#1284) [SPIR-V][NFC] Fix for building llvm-spirv with -DLLVM_LINK_LLVM_DYLIB=ON (intel#1323) [SYCL][NFC] Fix execution graph dump (intel#1331) [SYCL][Doc] Release SYCL_INTEL_enqueue_barrier extension document (intel#1199) [SYCL][USM] Fix USM malloc_shared and free to handle zero byte (intel#1273) [SYCL] Fix undefined symbols in async_work_group_copy (intel#1243) [SYCL] Mark calls to barrier and work-item functions as convergent [SYCL][CUDA] Fix CUDA plug-in build with enabled assertions (intel#1325) [SYCL][Test] Add OpenCL requirement to test/ordered_queue/prop.cpp (intel#1335) [SYCL][CUDA] Improve CUDA backend documentation (intel#1293) ...

AlexeySachkov · 2020-04-15T09:06:21Z

sycl/doc/extensions/SubGroup/SYCL_INTEL_sub_group.asciidoc

+|+info::kernel_sub_group::max_num_sub_groups+
+|+uint32_t+
+|Returns the maximum number of sub-groups for this kernel.


@Pennycook,

I've just now realized that the list of info descriptors was reduced and previously we had:

info::kernel_sub_group::local_size_for_sub_group_count - removed now

info::kernel_sub_group::sub_group_count_for_ndrange - removed now

info::kernel_sub_group::max_sub_group_size_for_ndrange - replaced with info::kernel_sub_group::max_sub_group_size

I agree that (1) is not really needed and can be replaced by the following sequence:

ensure that desired number of sub-groups doesn't exceed max_num_sub_groups

query sub-group size selected for the kernel

multiple desired number of sub-groups by selected sub-group size

ensure that calculated value doesn't exceed maximum work-group size for a kernel

if previous check passed, use calculated work-group size

(2) is even easier to replace - once you know selected sub-group size, you can easily calculate amount of sub-groups per work-group

So, (1) and (2) depend on selected sub-groups size for a kernel, which is intended to be returned by (3) and here is a problem: according to OpenCL 2.1 spec, rev. 23:

Work items are mapped into sub-groups through a combination of compile-time decisions *and
the parameters of the dispatch*. The mapping to sub-groups is invariant for the duration of a
kernel’s execution, *across dispatches of a given kernel with the same work-group dimensions*,
between dispatches and query operations consistent with the dispatch parameterization, and from
one work-group to another within the dispatch (excluding the trailing edge work-groups in the
presence of non-uniform work-group sizes).

Spec explicitly states that selected sub-groups size for a particular kernel might be changed from one dispatch to another if work-group size is changed.

Strictly speaking, we cannot query exact max_sub_group_size from OpenCL RT without specifying exact work-group size which will be used for kernel dispatch.

Sorry for the delayed response. I've opened #1565 to address this.

Pennycook added the spec extension All issues/PRs related to extensions specifications label Mar 16, 2020

Pennycook assigned bader Mar 16, 2020

mkinsner reviewed Mar 16, 2020

View reviewed changes

sycl/doc/extensions/SubGroup/SYCL_INTEL_sub_group.asciidoc Outdated Show resolved Hide resolved

mkinsner previously approved these changes Mar 16, 2020

View reviewed changes

Pennycook added 2 commits March 16, 2020 09:45

[SYCL][Doc] Revision v1.2.1-6 to Revision 6

b4e84cf

Signed-off-by: John Pennycook <[email protected]>

[SYCL][Doc] Fix copyright year

3abdfa5

Signed-off-by: John Pennycook <[email protected]>

Pennycook dismissed mkinsner’s stale review via 3abdfa5 March 16, 2020 16:47

bader assigned AlexeySachkov and unassigned bader Mar 16, 2020

Pennycook added 2 commits March 16, 2020 11:10

[SYCL][Doc] Point from old extension to new

e7dc7c2

Signed-off-by: John Pennycook <[email protected]>

[SYCL][Doc] Link to archived sub-group extension

16ae2ff

Signed-off-by: John Pennycook <[email protected]>

AlexeySachkov reviewed Mar 16, 2020

View reviewed changes

Pennycook added 8 commits March 16, 2020 11:41

[SYCL][Doc] Fix typo in GroupMask

807dc5b

Signed-off-by: John Pennycook <[email protected]>

[SYCL][Doc] Space template must be local or global

ff98d10

Signed-off-by: John Pennycook <[email protected]>

[SYCL][Doc] Specify class before member functions

039050a

Text before tables now identifies which class is being discussed. Signed-off-by: John Pennycook <[email protected]>

[SYCL][Doc] Clarify sub-group dimensions and size

fae0eab

Signed-off-by: John Pennycook <[email protected]>

[SYCL][Doc] Clarify attribute propagation behavior

58b2f7d

Signed-off-by: John Pennycook <[email protected]>

[SYCL][Doc] Remove get_uniform_group_range()

99668d4

Unclear what this means if non-uniform work-groups are not supported. Signed-off-by: John Pennycook <[email protected]>

[SYCL][Doc] Fix group_mask operator definitions

0b0c474

Signed-off-by: John Pennycook <[email protected]>

[SYCL][Doc] Fix table formatting

b0825b0

Signed-off-by: John Pennycook <[email protected]>

AlexeySachkov reviewed Mar 17, 2020

View reviewed changes

sycl/doc/extensions/GroupMask/SYCL_INTEL_group_mask.asciidoc Outdated Show resolved Hide resolved

ProGTX reviewed Mar 17, 2020

View reviewed changes

[SYCL][Doc] << and >> perform logical shifts

11a73fb

Signed-off-by: John Pennycook <[email protected]>

AlexeySachkov approved these changes Mar 18, 2020

View reviewed changes

bader merged commit d9b178f into intel:sycl Mar 19, 2020

Pennycook deleted the sub-group-docs branch March 19, 2020 17:06

AlexeySachkov reviewed Apr 15, 2020

View reviewed changes


		== Overview

		A sub-group represents an implementation-defined grouping of work-items in a work-group. The work-items within a sub-group can communicate and synchronize independently of work-items in other sub-groups, and sub-groups are therefore commonly mapped to SIMD hardware where it exists.

[SYCL][Doc] Update sub-group extension docs #1330

[SYCL][Doc] Update sub-group extension docs #1330

Uh oh!

Conversation

Pennycook commented Mar 16, 2020

Uh oh!

Uh oh!

AlexeySachkov commented Mar 16, 2020

Uh oh!

Pennycook commented Mar 16, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AlexeySachkov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!