Skip to content

[SYCL] Do not attach reqd_work_group_size info when multiple are detected #13523

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions llvm/lib/SYCLLowerIR/SYCLDeviceRequirements.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ static llvm::StringRef ExtractStringFromMDNodeOperand(const MDNode *N,
SYCLDeviceRequirements
llvm::computeDeviceRequirements(const module_split::ModuleDesc &MD) {
SYCLDeviceRequirements Reqs;
bool MultipleReqdWGSize = false;
// Process all functions in the module
for (const Function &F : MD.getModule()) {
if (auto *MDN = F.getMetadata("sycl_used_aspects")) {
Expand All @@ -64,6 +65,8 @@ llvm::computeDeviceRequirements(const module_split::ModuleDesc &MD) {
ExtractUnsignedIntegerFromMDNodeOperand(MDN, I));
if (!Reqs.ReqdWorkGroupSize.has_value())
Reqs.ReqdWorkGroupSize = NewReqdWorkGroupSize;
if (Reqs.ReqdWorkGroupSize != NewReqdWorkGroupSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 61 can we add a check for !MultipleReqdWGSize? There is no point in checking again if we already know multiple WG sizes are required.

Also can we call Reqs.ReqdWorkGroupSize.reset() after line 69 to keep all the code related to ReqdWorkGroupSize together?

MultipleReqdWGSize = true;
}

if (auto *MDN = F.getMetadata("sycl_joint_matrix")) {
Expand Down Expand Up @@ -99,6 +102,14 @@ llvm::computeDeviceRequirements(const module_split::ModuleDesc &MD) {
assert(*Reqs.SubGroupSize == static_cast<uint32_t>(MDValue));
}
}

// Usually, we would only expect one ReqdWGSize, as the module passed to
// this function would be split according to that. However, when splitting
// is disabled, this cannot be guaranteed. In this case, we reset the value,
// which makes so that no value is reqd_work_group_size data is attached in
// in the device image.
if (MultipleReqdWGSize)
Reqs.ReqdWorkGroupSize.reset();
Comment on lines +111 to +112
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a comment explaining this. My understanding is that strictly speaking, we expect only one value of reqd_work_group_size metadata because of per-optional-kernel-feature device code split that had supposed to happen before.

However, there is an exception when device code split is disabled, which causes kernels with different reqd_work_group_size requirements to be bundled together. I think that ideally we want to assert here that device code split is disabled and that otherwise MultipleReqdWGSize is false, but I'm not sure if we have access to that knowledge here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, there is an exception when device code split is disabled, which causes kernels with different reqd_work_group_size requirements to be bundled together. I think that ideally we want to assert here that device code split is disabled and that otherwise MultipleReqdWGSize is false, but I'm not sure if we have access to that knowledge here.

I agree, but yea, the device code split mode information is not present in this function. If we want to go that far, I think it makes sense to add it as a parameter to the function.

return Reqs;
}

Expand Down
28 changes: 28 additions & 0 deletions sycl/test-e2e/Regression/no-split-reqd-wg-size.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
// This test checks that with -fsycl-device-code-split=off, kernels
// with different reqd_work_group_size dimensions can be launched.

// RUN: %{build} -fsycl -fsycl-device-code-split=off -o %t.out
// RUN: %{run} %t.out

// UNSUPPORTED: hip

#include <sycl/detail/core.hpp>

using namespace sycl;

#define TEST(...) \
{ \
range globalRange(__VA_ARGS__); \
range localRange(__VA_ARGS__); \
nd_range NDRange(globalRange, localRange); \
q.parallel_for(NDRange, \
[=](auto) [[sycl::reqd_work_group_size(__VA_ARGS__)]] {}); \
}

int main(int argc, char **argv) {
queue q;
TEST(4);
TEST(4, 5);
TEST(4, 5, 6);
return 0;
}