
[SYCL] UR_KERNEL_SUB_GROUP_INFO_SUB_GROUP_SIZE_INTEL on Cuda and HIP #17137

Merged: 4 commits merged into intel:sycl on Mar 19, 2025

Conversation

jchlanda (Contributor)

For HIP the sub-group size can be either 32 or 64; it can be retrieved from the intel_reqd_sub_group_size metadata node.

CUDA only supports 32, which is enforced in the compiler; see SemaSYCL::addIntelReqdSubGroupSizeAttr.
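
A minimal standalone sketch of the two behaviours, assuming invented helper names (this is not the adapter code itself; the map mirrors the KernelReqdSubGroupSizeMD member added in the diff further down):

#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

// HIP: the value (32 or 64) is recorded per kernel name from the
// intel_reqd_sub_group_size metadata when the program is built.
uint32_t hipRequiredSubGroupSize(
    const std::unordered_map<std::string, uint32_t> &KernelReqdSubGroupSizeMD,
    const std::string &KernelName) {
  auto It = KernelReqdSubGroupSizeMD.find(KernelName);
  return It != KernelReqdSubGroupSizeMD.end() ? It->second : 0;
}

// CUDA: the compiler only ever accepts 32 for sycl::reqd_sub_group_size
// (enforced by SemaSYCL::addIntelReqdSubGroupSizeAttr), so 32 is returned.
uint32_t cudaRequiredSubGroupSize() { return 32; }

int main() {
  std::unordered_map<std::string, uint32_t> MD{{"my_kernel", 64}};
  std::cout << hipRequiredSubGroupSize(MD, "my_kernel") << "\n"; // prints 64
  std::cout << cudaRequiredSubGroupSize() << "\n";               // prints 32
}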

@jchlanda requested review from a team as code owners on February 24, 2025 14:07
@jchlanda (Contributor, Author)

Fixes: #14357

@aarongreig (Contributor) left a comment

UR LGTM

return ReturnValue(0);
// The only supported value of required sub-group size for CUDA devices is
// 32.
return ReturnValue(32);
Contributor

If I'm understanding correctly, the CUDA binaries will also carry the same metadata as HIP; we're just choosing to hard-code 32?

jchlanda (Contributor, Author)

Yes, you are right: decorating a kernel with sycl::reqd_sub_group_size(SG_SIZE) results in it having !intel_reqd_sub_group_size !14 attached. For CUDA, the value of that node can only ever be 32.

I'm in two minds about hard-coding that value, but copy-pasting the code that handles the metadata in question, when it can only ever yield 32, doesn't seem great either.

Contributor

Yeah... I lean ever so slightly towards not hard-coding it, but it's a toss-up. If we could share the 32 between the compiler and UR more programmatically, that would also be an option. But I'm not a huge fan of individually held assumptions.

I think what would help would be sharing more code between HIP and CUDA. Right now they're often just copy/pastes of each other. There's nothing CUDA- or HIP-specific about fetching and interpreting metadata from the program, for instance. And I suppose another argument for not hard-coding it is that it makes any eventual refactor for code sharing more trivial, as the copy/paste becomes obvious: it would be in the identical metadata code, not in the value retrieval code.

Anyway, these are ultimately just ideas; I don't know if it's worth it.
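
For illustration only, the kind of backend-agnostic helper being floated might look roughly like this (hypothetical names; nothing called ur::detail::findKernelMetadata exists in the adapters):

#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

namespace ur::detail {

// Fetching and interpreting per-kernel program metadata is not
// CUDA- or HIP-specific, so both adapters could share a lookup like this.
inline std::optional<uint32_t>
findKernelMetadata(const std::unordered_map<std::string, uint32_t> &MD,
                   const std::string &KernelName) {
  auto It = MD.find(KernelName);
  if (It == MD.end())
    return std::nullopt;
  return It->second;
}

} // namespace ur::detail

With something like that shared, the only backend-specific piece left would be the value itself: always 32 for CUDA, 32 or 64 from the metadata for HIP.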

jchlanda (Contributor, Author)

Yeah, a GPU adapter that would abstract away the duplication would be nice. You make a good point about providing the implementation to make a future refactor easier. Will extend the CUDA path as well.

@@ -39,6 +39,7 @@ struct ur_program_handle_t_ {
   std::unordered_map<std::string, std::string> GlobalIDMD;
   std::unordered_map<std::string, std::tuple<uint32_t, uint32_t, uint32_t>>
       KernelReqdWorkGroupSizeMD;
+  std::unordered_map<std::string, uint32_t> KernelReqdSubGroupSizeMD;
Contributor

Last time I changed the program metadata stuff I was wondering if all these separate maps for each program metadata item are the best idea, in terms of memory usage, cache efficiency, access times, etc.

Would std::unordered_map<std::string, struct KernelMetadata> bring any benefits, do we think?

jchlanda (Contributor, Author)

An honest answer is that I don't know, and without micro-benchmarking it, it's impossible to say.
I'm not too keen on the idea of bundling all possible metadata into one struct and storing that as a per-kernel-name value. Intuitively, I'd say most of the time that struct would just be storing 0-initialised bytes. I'm tempted to leave it as it is.
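
For reference, the bundled layout being discussed would look roughly like this (a hypothetical sketch covering only the per-kernel items visible in the diff above; field names are invented):

#include <cstdint>
#include <string>
#include <tuple>
#include <unordered_map>

// Hypothetical bundled layout, not code proposed in this PR.
struct KernelMetadata {
  std::tuple<uint32_t, uint32_t, uint32_t> ReqdWorkGroupSize{0, 0, 0};
  uint32_t ReqdSubGroupSize = 0; // 0 when the kernel has no required size
};

// One map keyed by kernel name, instead of one map per metadata item.
using KernelMetadataMap = std::unordered_map<std::string, KernelMetadata>;

For kernels without either attribute the struct would indeed just hold zero-initialised defaults, which is the trade-off weighed above.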

@frasercrmck (Contributor) left a comment

lots of food for thought but it's not essential

@jchlanda (Contributor, Author)

jchlanda commented Mar 5, 2025

Friendly ping: @uditagarwal97 @intel/llvm-reviewers-runtime

@jchlanda force-pushed the jakub/info_sub_group_size_intel branch from 14aa75c to 5c99321 on March 6, 2025 06:38
jchlanda added 3 commits March 6, 2025 06:59
For HIP the value of sub group size can either be 32 or 64, it can be
retrieved from `intel_reqd_sub_group_size` metadata node.

Cuda only supports 32, which is enforced in the compiler, see
[SemaSYCL::addIntelReqdSubGroupSizeAttr](https://github.com/intel/llvm/blob/sycl/clang/lib/Sema/SemaSYCLDeclAttr.cpp#L828).
@jchlanda force-pushed the jakub/info_sub_group_size_intel branch from 5c99321 to 4b4fc14 on March 6, 2025 11:59
@jchlanda (Contributor, Author)

Friendly ping @npmiller, @AlexeySachkov.

@jchlanda (Contributor, Author)

@AlexeySachkov are you happy with the changes in the patch?
Thank you.

@jchlanda (Contributor, Author)

@intel/llvm-gatekeepers I think this should be ready to land.

@sommerlukas (Contributor)

@jchlanda GitHub UI still says that merge is blocked due to a missing review from @intel/dpcpp-tools-reviewers, so the change to code owners might only take effect for new PRs.

@intel/dpcpp-tools-reviewers can we get a quick review on this one?

@maksimsab (Contributor) left a comment

tools part LGTM.

@npmiller (Contributor)

@intel/llvm-gatekeepers this should be ready to merge now

@uditagarwal97 (Contributor)

> @intel/llvm-gatekeepers this should be ready to merge now

CI test results are 2 weeks old. Can you please rebase and re-run the CI?

@jchlanda (Contributor, Author)

> tools part LGTM.

@maksimsab, while you're here, could I kindly ask you to have a look at this patch as well? It's missing a tools review and I didn't have much luck with pinging.
Thank you.

@npmiller (Contributor)

@intel/llvm-gatekeepers the fresh CI run passed, it should be good to merge now

@sommerlukas merged commit 0d5266b into intel:sycl on Mar 19, 2025
32 of 33 checks passed