Skip to content

[SYCL] Do not attach reqd_work_group_size info when multiple are detected #13523

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 21, 2024

Conversation

jzc
Copy link
Contributor

@jzc jzc commented Apr 22, 2024

No description provided.

@jzc jzc requested review from a team as code owners April 22, 2024 20:09
@jzc jzc requested a review from dm-vodopyanov April 22, 2024 20:09
@jzc jzc temporarily deployed to WindowsCILock April 23, 2024 14:54 — with GitHub Actions Inactive
q.submit([&](handler &cgh) {
cgh.parallel_for<class testNDRange>(
NDRange,
[=](nd_item<2> it) [[sycl::reqd_work_group_size(WGSIZE, WGSIZE)]] {});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure exactly, but can sycl::reqd_work_group_size(WGSIZE, WGSIZE) and sycl::reqd_work_group_size(WGSIZE) be moved to some template or function parameter (+ move dimension to the template parameter), to not duplicate the same code in both kernel_launch_* functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it seems attributes cannot accept parameter packs, so the best we can do is the macro version:

#define TEST(...)                                                              \
  {                                                                            \
    range globalRange(__VA_ARGS__);                                            \
    range localRange(__VA_ARGS__);                                             \
    nd_range NDRange(globalRange, localRange);                                 \
    q.parallel_for(NDRange,                                                    \
                   [=](auto) [[sycl::reqd_work_group_size(__VA_ARGS__)]] {});  \
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it seems attributes cannot accept parameter packs

We should create a tracker for that, because it is possible to add such support and we have done it already for other attributes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an issue: #13686

Comment on lines +106 to +107
if (MultipleReqdWGSize)
Reqs.ReqdWorkGroupSize.reset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a comment explaining this. My understanding is that strictly speaking, we expect only one value of reqd_work_group_size metadata because of per-optional-kernel-feature device code split that had supposed to happen before.

However, there is an exception when device code split is disabled, which causes kernels with different reqd_work_group_size requirements to be bundled together. I think that ideally we want to assert here that device code split is disabled and that otherwise MultipleReqdWGSize is false, but I'm not sure if we have access to that knowledge here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, there is an exception when device code split is disabled, which causes kernels with different reqd_work_group_size requirements to be bundled together. I think that ideally we want to assert here that device code split is disabled and that otherwise MultipleReqdWGSize is false, but I'm not sure if we have access to that knowledge here.

I agree, but yea, the device code split mode information is not present in this function. If we want to go that far, I think it makes sense to add it as a parameter to the function.

@jzc jzc temporarily deployed to WindowsCILock April 23, 2024 15:40 — with GitHub Actions Inactive
@jzc jzc temporarily deployed to WindowsCILock April 24, 2024 14:06 — with GitHub Actions Inactive
@jzc jzc temporarily deployed to WindowsCILock April 24, 2024 14:45 — with GitHub Actions Inactive
@@ -64,6 +65,8 @@ llvm::computeDeviceRequirements(const module_split::ModuleDesc &MD) {
ExtractUnsignedIntegerFromMDNodeOperand(MDN, I));
if (!Reqs.ReqdWorkGroupSize.has_value())
Reqs.ReqdWorkGroupSize = NewReqdWorkGroupSize;
if (Reqs.ReqdWorkGroupSize != NewReqdWorkGroupSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 61 can we add a check for !MultipleReqdWGSize? There is no point in checking again if we already know multiple WG sizes are required.

Also can we call Reqs.ReqdWorkGroupSize.reset() after line 69 to keep all the code related to ReqdWorkGroupSize together?

@LU-JOHN
Copy link
Contributor

LU-JOHN commented May 7, 2024

Are we allowed to discard a requirement just because they are contradictory? From https://intel.github.io/llvm-docs/design/OptionalDeviceFeatures.html:

For a kernel that is decorated with the [[sycl::reqd_work_group_size(W)]] or [[sycl::reqd_sub_group_size(S)]] attribute, the exception must be thrown if the device does not support the work group size W or the sub-group size S.

We can't honor this requirement if we discard reqd_work_group_size.

@AlexeySachkov
Copy link
Contributor

Are we allowed to discard a requirement just because they are contradictory? From https://intel.github.io/llvm-docs/design/OptionalDeviceFeatures.html:

For a kernel that is decorated with the [[sycl::reqd_work_group_size(W)]] or [[sycl::reqd_sub_group_size(S)]] attribute, the exception must be thrown if the device does not support the work group size W or the sub-group size S.

We can't honor this requirement if we discard reqd_work_group_size.

The situation when we discard that metadata and therefore lose ability to emit that error can only happen when a user explicitly specifies non-standard -fsycl-device-code-split=off. We have not claimed to be fully conformant with the SYCL specification with that flag.

Essentially this is a trade-off between user experience and being conformant. The problem with user experience we had is that we also have a check that local size passed to parallel_for is the same as what is attached as an attribute to a kernel. Since we record an attribute on a per-device-image basis assuming that it is the same for all kernels, this caused false alarms, fully preventing users from launching any kernels. Disabled device code split path is essentially a default for FPGA devices and therefore we decided to go this way.

@dm-vodopyanov dm-vodopyanov merged commit 6934bcf into intel:sycl May 21, 2024
AlexeySachkov pushed a commit that referenced this pull request Dec 6, 2024
…16236)

There was a bug (#13523) where a
kernel couldn't be launched when `-fsycl-device-code-split=off` was used
and multiple kernels with different required work group sizes were
present. This issue was fixed by ensuring that the required work group
size metadata is not attached to the device image when multiple required
work group sizes are detected in a single module.

However, there was a similar but related case that was not fixed by that
PR, which is now demonstrated in the new test
no-split-reqd-wg-size-2.cpp. This issue occurs when there is a single
kernel with a required work group size and another kernel without one.
In this case, the module doesn't contain multiple required work group
sizes, so the required work group size metadata is still attached. As a
result of the metadata being attached, the runtime cannot launch the
kernel without a required work group size.

This PR removes the logic of ensuring metadata is not attached when
there are multiple required work group sizes, and instead adds logic
that ensures the metadata is not attached when the split mode is
`SPLIT_NONE`. This covers the old cases from the previous PR and the new
case in this PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants