-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Do not attach reqd_work_group_size info when multiple are detected #13523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
q.submit([&](handler &cgh) { | ||
cgh.parallel_for<class testNDRange>( | ||
NDRange, | ||
[=](nd_item<2> it) [[sycl::reqd_work_group_size(WGSIZE, WGSIZE)]] {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure exactly, but can sycl::reqd_work_group_size(WGSIZE, WGSIZE)
and sycl::reqd_work_group_size(WGSIZE)
be moved to some template or function parameter (+ move dimension to the template parameter), to not duplicate the same code in both kernel_launch_*
functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it seems attributes cannot accept parameter packs, so the best we can do is the macro version:
#define TEST(...) \
{ \
range globalRange(__VA_ARGS__); \
range localRange(__VA_ARGS__); \
nd_range NDRange(globalRange, localRange); \
q.parallel_for(NDRange, \
[=](auto) [[sycl::reqd_work_group_size(__VA_ARGS__)]] {}); \
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it seems attributes cannot accept parameter packs
We should create a tracker for that, because it is possible to add such support and we have done it already for other attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue: #13686
if (MultipleReqdWGSize) | ||
Reqs.ReqdWorkGroupSize.reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a comment explaining this. My understanding is that strictly speaking, we expect only one value of reqd_work_group_size
metadata because of per-optional-kernel-feature device code split that had supposed to happen before.
However, there is an exception when device code split is disabled, which causes kernels with different reqd_work_group_size
requirements to be bundled together. I think that ideally we want to assert here that device code split is disabled and that otherwise MultipleReqdWGSize
is false
, but I'm not sure if we have access to that knowledge here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, there is an exception when device code split is disabled, which causes kernels with different reqd_work_group_size requirements to be bundled together. I think that ideally we want to assert here that device code split is disabled and that otherwise MultipleReqdWGSize is false, but I'm not sure if we have access to that knowledge here.
I agree, but yea, the device code split mode information is not present in this function. If we want to go that far, I think it makes sense to add it as a parameter to the function.
@@ -64,6 +65,8 @@ llvm::computeDeviceRequirements(const module_split::ModuleDesc &MD) { | |||
ExtractUnsignedIntegerFromMDNodeOperand(MDN, I)); | |||
if (!Reqs.ReqdWorkGroupSize.has_value()) | |||
Reqs.ReqdWorkGroupSize = NewReqdWorkGroupSize; | |||
if (Reqs.ReqdWorkGroupSize != NewReqdWorkGroupSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On line 61 can we add a check for !MultipleReqdWGSize? There is no point in checking again if we already know multiple WG sizes are required.
Also can we call Reqs.ReqdWorkGroupSize.reset() after line 69 to keep all the code related to ReqdWorkGroupSize together?
Are we allowed to discard a requirement just because they are contradictory? From https://intel.github.io/llvm-docs/design/OptionalDeviceFeatures.html:
We can't honor this requirement if we discard reqd_work_group_size. |
The situation when we discard that metadata and therefore lose ability to emit that error can only happen when a user explicitly specifies non-standard Essentially this is a trade-off between user experience and being conformant. The problem with user experience we had is that we also have a check that local size passed to |
…16236) There was a bug (#13523) where a kernel couldn't be launched when `-fsycl-device-code-split=off` was used and multiple kernels with different required work group sizes were present. This issue was fixed by ensuring that the required work group size metadata is not attached to the device image when multiple required work group sizes are detected in a single module. However, there was a similar but related case that was not fixed by that PR, which is now demonstrated in the new test no-split-reqd-wg-size-2.cpp. This issue occurs when there is a single kernel with a required work group size and another kernel without one. In this case, the module doesn't contain multiple required work group sizes, so the required work group size metadata is still attached. As a result of the metadata being attached, the runtime cannot launch the kernel without a required work group size. This PR removes the logic of ensuring metadata is not attached when there are multiple required work group sizes, and instead adds logic that ensures the metadata is not attached when the split mode is `SPLIT_NONE`. This covers the old cases from the previous PR and the new case in this PR.
No description provided.