-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL][COMPAT] Add new file group_utils.hpp #17263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: intwanghao <[email protected]>
Signed-off-by: intwanghao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: intwanghao <[email protected]>
Signed-off-by: intwanghao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution.
Please in general can you provide a commit message here #17263 (comment) which will aid reviews and provide git history for future development.
An appropriate message will depend on the content of the contribution.
For example in this particular PR I think it would be good to explain to reviewers:
- What motivates this contribution in general
- Is this targeting a specific device: It looks like all of these algorithms are completely shared memory based, whereas for most modern GPUs such group algorithms have optimal implementations involving subgroup level shuffles, irrespective of whether the group algorithms are in subgroup or workgroup scope (For example see the implementations beginning in https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/group_algorithm.hpp). Presumably the target device is one which does not support shuffles natively? If not, why aren't shuffles used?
Also, all new functionality to intel/llvm requires in repo testing. For this case the appropriate location to add a test is sycl/test-e2e/syclcompat. Please can you add an appropriate test for these algorithms.
The above applies generally to #17264
Hi @JackAKirk, thanks for review. Helper functions in this file is used to migrate CUB functions. The motivation of this PR is to reduce the gap between syclcompat namespace and dpct namespace. The shared memory based implementation is initial version that ensure there are functionality mapping in sycl side can be used for migration. Also, the e2e test is added. |
Signed-off-by: intwanghao <[email protected]>
Signed-off-by: intwanghao <[email protected]>
Signed-off-by: intwanghao <[email protected]>
/// \param group_suffix The suffix of the group after the shuffle. | ||
template <int ElementsPerWorkItem, typename ItemT> | ||
__syclcompat_inline__ void | ||
shuffle_right(const ItemT &item, T (&input)[ElementsPerWorkItem], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this e.g. maps to
llvm/sycl/include/sycl/group_algorithm.hpp
Line 552 in 68cacbf
shift_group_right(Group g, T x, typename Group::linear_id_type delta = 1) { |
I won't go through all the cases. But I think most of these functions are already implemented in a better way that doesn't just rely on shared memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
I see thanks. |
Signed-off-by: intwanghao <[email protected]>
Signed-off-by: intwanghao <[email protected]>
Signed-off-by: intwanghao <[email protected]>
Signed-off-by: intwanghao <[email protected]>
Hi, @JackAKirk. Could you please help to review again? I have addressed the comments. |
Thanks, this just needs docs added to https://github.com/intel/llvm/blob/sycl/sycl/doc/syclcompat/README.md then it should be good to go. |
Signed-off-by: intwanghao <[email protected]>
All passing, ready to merge. @intel/llvm-gatekeepers |
Helper functions in this file is used to migrate CUB functions. The motivation of this PR is to reduce the gap between syclcompat namespace and dpct namespace.