-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL][CUDA] Add sub-group shuffles #2623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Sub-group shuffles map to one of the following intrinsics: - __nvvm_shfl_sync_idx_i32 - __nvvm_shfl_sync_up_i32 - __nvvm_shfl_sync_down_i32 - __nvvm_shfl_sync_xor_i32 Implemented in the SYCL headers instead of libclc for two reasons: 1) The SPIR-V implementation uses an extension (__spirv_SubgroupShuffleINTEL) 2) We currently need to use enable_if to generate different instruction sequences for some types, and these cases differ between SPIR-V/PTX. Signed-off-by: John Pennycook <[email protected]>
Signed-off-by: John Pennycook <[email protected]>
@bader: Is this failing because of the warnings, or something else? The warnings don't seem to be related to any of the changes here. |
LIT tests do not fail because of warnings - warnings are ignored. According to the log, reduction_nd_s0_rw.cpp.tmp.out returned non-zero (i.e. non-successful) error code -
|
Thanks, @bader. I saw the warning right before that error and assumed they were related, but I see the mistake now. I'll try and work out why reduction_nd_s0_rw.cpp is failing. |
@bader: I can't reproduce the failure locally, and when I triggered a rebuild it passed. |
Sounds like either corrupted environment on BuildBot machine or sporadic issue. I'll be watching for new |
@AlexeySachkov, @rbegam: Now that the tests are all passing, I think this is ready for review. |
@@ -216,7 +213,7 @@ void check_struct(queue &Queue, Generator &Gen, size_t G = 256, size_t L = 64) { | |||
|
|||
int main() { | |||
queue Queue; | |||
if (!Queue.get_device().has_extension("cl_intel_subgroups")) { | |||
if (Queue.get_device().is_host()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, this patch doesn't bring shuffles support to all non-host devices and "cl_intel_subgroups" extension is still required, but only for non-CUDA devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I wasn't sure what to do here, really -- the fact that "cl_intel_subgroups" is required is backend-specific.
@gmlueck: Do we have the necessary infrastructure implemented to query whether a particular extension from sycl/docs/extensions is supported for a given device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you are thinking of adding a new "aspect" and then checking it like:
if (Queue.get_device().has(aspect::ext_intel_has_subgroups)) {
Correct?
@glyons-intel has added support for some aspects already, so it probably wouldn't be too hard to add another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @gmlueck. That's the sort of thing I was thinking of. But as you noted elsewhere, we really have two options:
- Define a new SYCL aspect describing devices that implement the sub-group extension
- Commit to the path that all SYCL devices are intended to implement sub-groups
Option 1) addresses @AlexeySachkov's concern about whether the tests are accurately reflecting current implementation. But I think I prefer 2) as the correct long-term direction. Sub-groups aren't an optional feature in SYCL 2020 provisional, and we don't expect users to have to check any sort of extension before using them.
If we go with 1), we should probably define aspects for all of our device-specific extensions and update all the tests accordingly. If we go with 2), we should phase out our use of __spirv_SubgroupShuffleINTEL
in favor of the standard __spirv_OpGroupNonUniformShuffle
.
Both of these seem fairly big jobs and outside of the scope of this PR. @AlexeySachkov, are you okay to defer the resolution here until a future PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While working on the group algorithms, I also stumbled across this check: https://github.com/intel/llvm/blob/sycl/sycl/test/group-algorithm/reduce.cpp#L67. I'd be okay with implementing something similar here in the short-term, if that's preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexeySachkov, are you okay to defer the resolution here until a future PR?
Sure, no objections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change itself looks good to me. However, I do have concern about test changes: this is probably not critical, but might create inconveniences if someone launched tests on low-level runtimes other than we have in CI.
Document CTS fixtures and macros.
Document CTS fixtures and macros.
Sub-group shuffles map to one of the following intrinsics:
Implemented in the SYCL headers instead of libclc for two reasons:
sequences for some types, and these cases differ between SPIR-V/PTX.
Signed-off-by: John Pennycook [email protected]