-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL][CUDA][libclc] Added atomics with scopes and memory orders #4820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ers in libclc for CUDA.
# Conflicts: # libclc/ptx-nvidiacl/libspirv/SPV_EXT_shader_atomic_float_add/atomicfaddext.cl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for a huge pull request, but I think for these changes it make sense they are reviewed together.
Okay to make reviewing process easier it better to split changes into multiple commits. For instance, we are adding support for 10 atomic operations, so one way to split the patch is create a separate commit for each operation. It should be much easier to understand what's going on for one operation, so that once first commit in pull request is reviewed others should be easier to review. Please, consider such approach for future pull requests.
For this one, I suggest uploading clang and llvm project changes (i.e. non-libclc) to reviews.llvm.org to get feedback from NVPTX target maintainers.
Well I originally had multiple commits, but they were a bit of a mess, since I am learning a lot of this at the same time. I don't really think splitting this by operation makes sense, as a lot of code is common for all operations. |
This reverts part of commit 7d29406.
Co-authored-by: Alexey Bader <[email protected]>
/summary:run |
…tics.ll as suggested by review comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks way better. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but I think it's good overall. That said, I agree with @bader that everything in clang/* and llvm/* should be reviewed and merged upstream (see https://llvm.org/docs/Contributing.html).
TARGET_BUILTIN(__nvvm_atom_cta_acq_rel_cas_gen_ll, "LLiLLiD*LLiLLi", "n", SM_70) | ||
TARGET_BUILTIN(__nvvm_atom_sys_acq_rel_cas_gen_ll, "LLiLLiD*LLiLLi", "n", SM_70) | ||
|
||
BUILTIN(__nvvm_atom_add_global_i, "iiD*i", "n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot help but wonder if the builtins could select the right intrinsics without the space explicitly stated in the builtin's name, i.e. inferring it from the pointer's address space. Or maybe even as late as instruction selection?
@Naghasan - Do you have insight into whether or not that is possible?
I requested a review, but forgot to post a link to it. It is here: https://reviews.llvm.org/D112718 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with this, so I cannot comment on the functionality. FE changes LGTM otherwise. Please wait for review from someone more familiar with this before merge. If this is merged to llvm-project instead, please apply review comments to patch uploaded there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FE Changes look good to me.
@t4c1, it looks like the build is broken. Could you take a look at the failures here: http://ci.llvm.intel.com:8010/#builders/37/builds/14729, please? |
Thank you for the reminder. I forgot to make changes in libclc after modifying the builtins. Should be fixed now. |
/summary:run |
) Added tests for atomics with various memory orders and scopes. Reductions tests also have updated sm requirements, as they call work group atomics, which are now implemented and have higher sm requirements than device scoped ones. This adds tests for changes introduced in intel/llvm#4820 and intel/llvm#5192.
…ntel/llvm-test-suite#534) Added tests for atomics with various memory orders and scopes. Reductions tests also have updated sm requirements, as they call work group atomics, which are now implemented and have higher sm requirements than device scoped ones. This adds tests for changes introduced in intel#4820 and intel#5192.
Added libclc implementations for CUDA atomics, including for various scopes and memory orders. They are implemented using LLVM intrinsics and exposed as clang builtins, which are than used to implement functions in libclc.
Sorry for a huge pull request, but I think for these changes it make sense they are reviewed together.
EDIT: llvm-test-suite PR with tests for this is intel/llvm-test-suite#534