[SYCL][CUDA] Add sub-group shuffles #2623

Pennycook · 2020-10-09T18:41:03Z

Sub-group shuffles map to one of the following intrinsics:

- __nvvm_shfl_sync_idx_i32
- __nvvm_shfl_sync_up_i32
- __nvvm_shfl_sync_down_i32
- __nvvm_shfl_sync_xor_i32
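For readers unfamiliar with the four PTX shuffle modes listed above, the following host-side C++ model sketches what each one computes across a 32-lane warp. It is an illustration only, not the implementation in this PR. It assumes all 32 lanes participate (full mask) and the default width of 32, and the names Warp, iota_warp and shfl_* are hypothetical. Lanes whose source lane is out of range keep their own value, as CUDA documents for __shfl_up_sync and __shfl_down_sync.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of the four PTX shuffle modes, for
// illustration only. Assumes a 32-lane warp, a full participation mask,
// and the default width of 32.
constexpr int kWarpSize = 32;
using Warp = std::array<std::int32_t, kWarpSize>;

// Lane i holds the value i.
Warp iota_warp() {
  Warp w{};
  for (int lane = 0; lane < kWarpSize; ++lane) w[lane] = lane;
  return w;
}

// idx: each lane reads from the lane it names in `src` (taken modulo 32).
Warp shfl_idx(const Warp& v, const Warp& src) {
  Warp out{};
  for (int lane = 0; lane < kWarpSize; ++lane) {
    assert(src[lane] >= 0);
    out[lane] = v[src[lane] % kWarpSize];
  }
  return out;
}

// up: lane i reads lane i - delta; lanes with no source keep their own value.
Warp shfl_up(const Warp& v, int delta) {
  Warp out{};
  for (int lane = 0; lane < kWarpSize; ++lane)
    out[lane] = lane >= delta ? v[lane - delta] : v[lane];
  return out;
}

// down: lane i reads lane i + delta; lanes past the end keep their own value.
Warp shfl_down(const Warp& v, int delta) {
  Warp out{};
  for (int lane = 0; lane < kWarpSize; ++lane)
    out[lane] = lane + delta < kWarpSize ? v[lane + delta] : v[lane];
  return out;
}

// xor: lane i reads lane i ^ mask (a "butterfly" exchange).
Warp shfl_xor(const Warp& v, int mask) {
  assert(mask >= 0 && mask < kWarpSize);
  Warp out{};
  for (int lane = 0; lane < kWarpSize; ++lane) out[lane] = v[lane ^ mask];
  return out;
}
```

For example, shfl_xor(v, 1) swaps values between neighbouring lanes, which is the usual building block of butterfly reductions, and shfl_down(iota_warp(), 4)[0] is 4.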

Implemented in the SYCL headers instead of libclc for two reasons:

1) The SPIR-V implementation uses an extension (__spirv_SubgroupShuffleINTEL)
2) We currently need to use enable_if to generate different instruction sequences for some types, and these cases differ between SPIR-V/PTX.
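The second reason can be sketched as follows. The PTX intrinsics listed above move 32-bit values, so a generic shuffle has to pick a different instruction sequence depending on the type. This is a hypothetical host-side model, not the code in spirv.hpp: shuffle_u32 stands in for one native 32-bit shuffle (and counts how often it is called), a 32-bit type takes a single native shuffle, and any other trivially copyable type is split into 32-bit words that are shuffled one at a time. For simplicity, every lane reads from the same source lane.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical host-side model: a 32-lane sub-group where each lane holds one
// value of type T.
constexpr int kLanes = 32;
template <typename T>
using Lanes = std::array<T, kLanes>;

// Counts calls to the stand-in for one native 32-bit shuffle.
inline int g_native_shuffles = 0;

// Stand-in for one native 32-bit shuffle: read the word held by lane `src`.
std::uint32_t shuffle_u32(const Lanes<std::uint32_t>& words, int src) {
  assert(src >= 0 && src < kLanes);
  ++g_native_shuffles;
  return words[src];
}

// 32-bit types: a single native shuffle on the reinterpreted bits.
template <typename T>
std::enable_if_t<sizeof(T) == 4 && std::is_trivially_copyable_v<T>, T>
sub_group_shuffle(const Lanes<T>& values, int src) {
  Lanes<std::uint32_t> words{};
  for (int i = 0; i < kLanes; ++i) std::memcpy(&words[i], &values[i], 4);
  std::uint32_t bits = shuffle_u32(words, src);
  T result;
  std::memcpy(&result, &bits, 4);
  return result;
}

// Other trivially copyable types: one native shuffle per 32-bit word.
template <typename T>
std::enable_if_t<sizeof(T) != 4 && std::is_trivially_copyable_v<T>, T>
sub_group_shuffle(const Lanes<T>& values, int src) {
  constexpr std::size_t kWords = (sizeof(T) + 3) / 4;
  std::array<std::uint32_t, kWords> out{};
  for (std::size_t w = 0; w < kWords; ++w) {
    Lanes<std::uint32_t> words{};
    for (int i = 0; i < kLanes; ++i) {
      std::array<std::uint32_t, kWords> buf{};
      std::memcpy(buf.data(), &values[i], sizeof(T));
      words[i] = buf[w];
    }
    out[w] = shuffle_u32(words, src);
  }
  T result;
  std::memcpy(&result, out.data(), sizeof(T));
  return result;
}

// Lane i holds base + i.
template <typename T>
Lanes<T> make_lanes(T base) {
  Lanes<T> lanes{};
  for (int i = 0; i < kLanes; ++i) lanes[i] = static_cast<T>(base + i);
  return lanes;
}

// Number of native shuffles used to move one value of type T.
template <typename T>
int native_shuffles_for() {
  g_native_shuffles = 0;
  (void)sub_group_shuffle(make_lanes<T>(T{}), 0);
  return g_native_shuffles;
}
```

With this split, shuffling an int or a float issues one native shuffle and a double issues two; that per-type difference is the kind of choice enable_if is being used to make here.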

Signed-off-by: John Pennycook [email protected]

Pennycook · 2020-10-09T20:35:15Z

@bader: Is this failing because of the warnings, or something else? The warnings don't seem to be related to any of the changes here.

bader · 2020-10-11T12:51:10Z

> Is this failing because of the warnings

LIT tests do not fail because of warnings; warnings are ignored.

According to the log, reduction_nd_s0_rw.cpp.tmp.out returned a non-zero (i.e. unsuccessful) exit code: -6.

error: command failed with exit status: -6
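On the -6 itself: on POSIX hosts, lit runs commands through Python's subprocess module, which reports a child killed by signal N as exit status -N. So -6 most likely means the test binary was killed by signal 6, which is SIGABRT on Linux (typically raised by abort() or a failed assert()). The POSIX-only sketch below, built around a hypothetical helper name, shows that an aborting process is reported as terminated by SIGABRT rather than as a normal exit.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that calls std::abort() and returns the
// signal number that terminated it, or -1 if the child was not killed by a
// signal (or fork/waitpid failed). POSIX-only.
int signal_that_killed_aborting_child() {
  pid_t pid = fork();
  if (pid < 0) return -1;  // fork failed
  if (pid == 0) {
    std::abort();  // the child raises SIGABRT and never returns
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```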

Pennycook · 2020-10-12T14:11:36Z

Thanks, @bader. I saw the warning right before that error and assumed they were related, but I see the mistake now. I'll try to work out why reduction_nd_s0_rw.cpp is failing.

Pennycook · 2020-10-12T16:10:33Z

@bader: I can't reproduce the failure locally, and when I triggered a rebuild it passed.

bader · 2020-10-12T16:25:04Z

> I can't reproduce the failure locally, and when I triggered a rebuild it passed.

Sounds like either a corrupted environment on the BuildBot machine or a sporadic issue. I'll be watching for new reduction_nd_s0_rw.cpp failures.

Pennycook · 2020-10-14T00:35:25Z

@AlexeySachkov, @rbegam: Now that the tests are all passing, I think this is ready for review.

sycl/include/CL/sycl/detail/spirv.hpp

AlexeySachkov · 2020-10-14T10:24:52Z

sycl/test/sub_group/generic-shuffle.cpp

@@ -216,7 +213,7 @@ void check_struct(queue &Queue, Generator &Gen, size_t G = 256, size_t L = 64) {

 int main() {
  queue Queue;
-  if (!Queue.get_device().has_extension("cl_intel_subgroups")) {
+  if (Queue.get_device().is_host()) {


Strictly speaking, this patch doesn't bring shuffle support to all non-host devices: the "cl_intel_subgroups" extension is still required, but only for non-CUDA devices.
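The condition described in this comment can be written down as a small predicate. The sketch below is hypothetical: DeviceInfo and can_run_shuffle_tests are illustrative names standing in for the device queries a real SYCL test would make (is_host(), the backend, the extension list).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of what a test can query about a device.
struct DeviceInfo {
  bool is_host;
  bool is_cuda;
  std::vector<std::string> extensions;
};

bool has_extension(const DeviceInfo& d, const std::string& name) {
  return std::find(d.extensions.begin(), d.extensions.end(), name) !=
         d.extensions.end();
}

// Shuffle tests are skipped on the host device, run on CUDA without any
// extension, and otherwise still need cl_intel_subgroups.
bool can_run_shuffle_tests(const DeviceInfo& d) {
  if (d.is_host) return false;
  if (d.is_cuda) return true;
  return has_extension(d, "cl_intel_subgroups");
}
```

The `is_host()` check in the diff above only covers the first branch, which is the gap being pointed out.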

Pennycook · Oct 14, 2020

You're right. I wasn't sure what to do here, really; the fact that "cl_intel_subgroups" is required is backend-specific.

@gmlueck: Do we have the necessary infrastructure implemented to query whether a particular extension from sycl/docs/extensions is supported for a given device?

gmlueck · Oct 14, 2020

I guess you are thinking of adding a new "aspect" and then checking it like:

if (Queue.get_device().has(aspect::ext_intel_has_subgroups)) {
  // ... run the sub-group tests ...
}

Correct?

@glyons-intel has added support for some aspects already, so it probably wouldn't be too hard to add another.

Pennycook · Oct 14, 2020

Thanks, @gmlueck. That's the sort of thing I was thinking of. But as you noted elsewhere, we really have two options:

1) Define a new SYCL aspect describing devices that implement the sub-group extension
2) Commit to the path that all SYCL devices are intended to implement sub-groups
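To make the difference between the two options concrete, here is a hypothetical model of the test guard under each. The aspect enum, the device struct and the aspect name ext_intel_has_subgroups (floated earlier in this thread) are illustrative assumptions, not a released SYCL API.

```cpp
#include <cassert>
#include <set>

// Hypothetical model of aspect queries; the names are illustrative only.
enum class aspect { host, gpu, ext_intel_has_subgroups };

struct device {
  std::set<aspect> aspects;
  bool has(aspect a) const { return aspects.count(a) != 0; }
};

// Option 1: run the sub-group tests only on devices that advertise the aspect.
bool option1_runs_tests(const device& d) {
  return d.has(aspect::ext_intel_has_subgroups);
}

// Option 2: sub-groups are a core feature, so only the host device is skipped.
bool option2_runs_tests(const device& d) { return !d.has(aspect::host); }
```

Under option 2, a GPU that advertises no sub-group aspect still runs the tests, which matches the position that sub-groups are not optional in SYCL 2020.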

Option 1) addresses @AlexeySachkov's concern about whether the tests accurately reflect the current implementation. But I think I prefer 2) as the correct long-term direction. Sub-groups aren't an optional feature in the SYCL 2020 provisional specification, and we don't expect users to have to check for any sort of extension before using them.

If we go with 1), we should probably define aspects for all of our device-specific extensions and update all the tests accordingly. If we go with 2), we should phase out our use of __spirv_SubgroupShuffleINTEL in favor of the standard __spirv_OpGroupNonUniformShuffle.

Both of these seem fairly big jobs and outside of the scope of this PR. @AlexeySachkov, are you okay to defer the resolution here until a future PR?

Pennycook · Oct 14, 2020

While working on the group algorithms, I also stumbled across this check: https://github.com/intel/llvm/blob/sycl/sycl/test/group-algorithm/reduce.cpp#L67. I'd be okay with implementing something similar here in the short term, if that's preferred.

AlexeySachkov · Oct 15, 2020

> @AlexeySachkov, are you okay to defer the resolution here until a future PR?

Sure, no objections

AlexeySachkov

The change itself looks good to me. However, I do have a concern about the test changes: this is probably not critical, but it might cause inconvenience if someone runs the tests on low-level runtimes other than the ones we have in CI.

Pennycook added 2 commits October 9, 2020 14:12

[SYCL][CUDA] Enable sub-group shuffle tests

e9d7cc2

Signed-off-by: John Pennycook <[email protected]>

Pennycook added the enhancement (New feature or request), spec extension (All issues/PRs related to extensions specifications), and cuda (CUDA back-end) labels Oct 9, 2020

Pennycook requested review from AlexeySachkov and a team as code owners October 9, 2020 18:41

Pennycook requested a review from rbegam October 9, 2020 18:41

bader mentioned this pull request Oct 11, 2020

[SYCL][CUDA] Comping SYCL code with subgroups failing #1767

Closed

vladimirlaz reviewed Oct 14, 2020

sycl/include/CL/sycl/detail/spirv.hpp (resolved)

AlexeySachkov reviewed Oct 14, 2020

AlexeySachkov approved these changes Oct 14, 2020

vladimirlaz approved these changes Oct 15, 2020

romanovvlad merged commit f189e41 into intel:sycl Oct 15, 2020

Pennycook deleted the cuda-sub-groups branch January 28, 2021 18:23

kbenzie pushed a commit to kbenzie/intel-llvm that referenced this pull request Feb 17, 2025

Merge pull request intel#2623 from aarongreig/aaron/documentFixtures

af4f331

Document CTS fixtures and macros.

Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025

Merge pull request #2623 from aarongreig/aaron/documentFixtures

e02ed86

Document CTS fixtures and macros.
