Skip to content

[SYCL][CUDA] Add sub-group shuffles #2623

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 15, 2020
Merged

Conversation

Pennycook
Copy link
Contributor

Sub-group shuffles map to one of the following intrinsics:

  • __nvvm_shfl_sync_idx_i32
  • __nvvm_shfl_sync_up_i32
  • __nvvm_shfl_sync_down_i32
  • __nvvm_shfl_sync_xor_i32

Implemented in the SYCL headers instead of libclc for two reasons:

  1. The SPIR-V implementation uses an extension (__spirv_SubgroupShuffleINTEL)
  2. We currently need to use enable_if to generate different instruction
    sequences for some types, and these cases differ between SPIR-V/PTX.

Signed-off-by: John Pennycook [email protected]

Sub-group shuffles map to one of the following intrinsics:
- __nvvm_shfl_sync_idx_i32
- __nvvm_shfl_sync_up_i32
- __nvvm_shfl_sync_down_i32
- __nvvm_shfl_sync_xor_i32

Implemented in the SYCL headers instead of libclc for two reasons:
1) The SPIR-V implementation uses an extension (__spirv_SubgroupShuffleINTEL)
2) We currently need to use enable_if to generate different instruction
   sequences for some types, and these cases differ between SPIR-V/PTX.

Signed-off-by: John Pennycook <[email protected]>
@Pennycook Pennycook added enhancement New feature or request spec extension All issues/PRs related to extensions specifications cuda CUDA back-end labels Oct 9, 2020
@Pennycook Pennycook requested review from AlexeySachkov and a team as code owners October 9, 2020 18:41
@Pennycook Pennycook requested a review from rbegam October 9, 2020 18:41
@Pennycook
Copy link
Contributor Author

@bader: Is this failing because of the warnings, or something else? The warnings don't seem to be related to any of the changes here.

@bader
Copy link
Contributor

bader commented Oct 11, 2020

Is this failing because of the warnings

LIT tests do not fail because of warnings - warnings are ignored.

According to the log, reduction_nd_s0_rw.cpp.tmp.out returned non-zero (i.e. non-successful) error code - -6.

error: command failed with exit status: -6

@Pennycook
Copy link
Contributor Author

Thanks, @bader. I saw the warning right before that error and assumed they were related, but I see the mistake now. I'll try and work out why reduction_nd_s0_rw.cpp is failing.

@Pennycook
Copy link
Contributor Author

@bader: I can't reproduce the failure locally, and when I triggered a rebuild it passed.

@bader
Copy link
Contributor

bader commented Oct 12, 2020

I can't reproduce the failure locally, and when I triggered a rebuild it passed.

Sounds like either corrupted environment on BuildBot machine or sporadic issue. I'll be watching for new reduction_nd_s0_rw.cpp failures.

@Pennycook
Copy link
Contributor Author

@AlexeySachkov, @rbegam: Now that the tests are all passing, I think this is ready for review.

@@ -216,7 +213,7 @@ void check_struct(queue &Queue, Generator &Gen, size_t G = 256, size_t L = 64) {

int main() {
queue Queue;
if (!Queue.get_device().has_extension("cl_intel_subgroups")) {
if (Queue.get_device().is_host()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, this patch doesn't bring shuffles support to all non-host devices and "cl_intel_subgroups" extension is still required, but only for non-CUDA devices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I wasn't sure what to do here, really -- the fact that "cl_intel_subgroups" is required is backend-specific.

@gmlueck: Do we have the necessary infrastructure implemented to query whether a particular extension from sycl/docs/extensions is supported for a given device?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you are thinking of adding a new "aspect" and then checking it like:

if (Queue.get_device().has(aspect::ext_intel_has_subgroups)) {

Correct?

@glyons-intel has added support for some aspects already, so it probably wouldn't be too hard to add another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @gmlueck. That's the sort of thing I was thinking of. But as you noted elsewhere, we really have two options:

  1. Define a new SYCL aspect describing devices that implement the sub-group extension
  2. Commit to the path that all SYCL devices are intended to implement sub-groups

Option 1) addresses @AlexeySachkov's concern about whether the tests are accurately reflecting current implementation. But I think I prefer 2) as the correct long-term direction. Sub-groups aren't an optional feature in SYCL 2020 provisional, and we don't expect users to have to check any sort of extension before using them.

If we go with 1), we should probably define aspects for all of our device-specific extensions and update all the tests accordingly. If we go with 2), we should phase out our use of __spirv_SubgroupShuffleINTEL in favor of the standard __spirv_OpGroupNonUniformShuffle.

Both of these seem fairly big jobs and outside of the scope of this PR. @AlexeySachkov, are you okay to defer the resolution here until a future PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While working on the group algorithms, I also stumbled across this check: https://github.com/intel/llvm/blob/sycl/sycl/test/group-algorithm/reduce.cpp#L67. I'd be okay with implementing something similar here in the short-term, if that's preferred.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexeySachkov, are you okay to defer the resolution here until a future PR?

Sure, no objections

Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change itself looks good to me. However, I do have concern about test changes: this is probably not critical, but might create inconveniences if someone launched tests on low-level runtimes other than we have in CI.

@romanovvlad romanovvlad merged commit f189e41 into intel:sycl Oct 15, 2020
@Pennycook Pennycook deleted the cuda-sub-groups branch January 28, 2021 18:23
kbenzie pushed a commit to kbenzie/intel-llvm that referenced this pull request Feb 17, 2025
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda CUDA back-end enhancement New feature or request spec extension All issues/PRs related to extensions specifications
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants