Skip to content

[SYCL] Change unique-stable-name to just use the lambda's mangled name. #4457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Sep 3, 2021

Conversation

erichkeane
Copy link
Contributor

Currently, whether a lambda is used in a kernel is used to determine the
result of __builtin_sycl_unique_stable_name.This leads to a sitaution
where using a lambda in a kernel that is lexically higher than an
existing kernel will change the result of the builtin, causing a build
failure.

This patch replaces this with JUST the lambda's itanium-mangled name.
This has the limitation in that it is not stable/unique in light of
certain macro-expansions that differ between device and host, however
wording is being put into the SYCL standard to make those cases UB.

This DOES have one bit of fallout, which is to make sure that the name
on a Windows host does NOT have its lambda mangling 'number', which we
are doing by storing the value in the Device Lamba Mangling Number
(previously done for CUDA only for similar reasons!).

Additionally, this patch removes the 'mark_kernel_name' builtin, as this
results in it being a no-op.

Currently, whether a lambda is used in a kernel is used to determine the
result of `__builtin_sycl_unique_stable_name`.This leads to a sitaution
where using a lambda in a kernel that is lexically higher than an
existing kernel will change the result of the builtin, causing a build
failure.

This patch replaces this with JUST the lambda's itanium-mangled name.
This has the limitation in that it is not stable/unique in light of
certain macro-expansions that differ between device and host, however
wording is being put into the SYCL standard to make those cases UB.

This DOES have one bit of fallout, which is to make sure that the name
on a Windows host does NOT have its lambda mangling 'number', which we
are doing by storing the value in the Device Lamba Mangling Number
(previously done for CUDA only for similar reasons!).

Additionally, this patch removes the 'mark_kernel_name' builtin, as this
results in it being a no-op.
@erichkeane
Copy link
Contributor Author

Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, just minor nits.

Erich Keane added 4 commits September 2, 2021 07:41
The Unique-stable-** now make sure to use 'device mangling number'
always all of the time, rather than counting on Itanium to always set it
to 0, which would result in us using the 'lambda number'.  This patch
instead changes our ItaniumCXXABI to always set the device mangling
number to match the lambda mangling number, so we can always use the
device mangling number if it exists.

This patch also fixes all of the review comments so far.
…_name

minor conflict in : clang/test/CodeGenSYCL/unique_stable_name_windows_diff.cpp
dm-vodopyanov
dm-vodopyanov previously approved these changes Sep 2, 2021
Copy link
Contributor

@dm-vodopyanov dm-vodopyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sycl/ changes LGTM

? *DeviceNumber
: Lambda->getLambdaManglingNumber();
unsigned Number =
DeviceNumber ? *DeviceNumber : Lambda->getLambdaManglingNumber();

assert(Number > 0 && "Lambda should be mangled as an unnamed class");
if (Number > 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens for 3 here? Would it not conflict with 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because '3' would mangle the number '1', '1' would mangle 'nothing'.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, so there is {"unnamed class", nothing, 0,1,...} for {0, 1, 2, 3, ...} respectively. Did I get that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly. Isn't Itanium fun :)?

premanandrao
premanandrao previously approved these changes Sep 2, 2021
Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

AaronBallman
AaronBallman previously approved these changes Sep 2, 2021
sycl::queue q;
q.parallel_for(10, [](auto i) {});
q.parallel_for(10, w);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there would be value in adding a test that uses macros in a problematic way to demonstrate that we don't assert/crash/do bad things? Or do you think we have sufficient coverage for that already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see value in seeing if the compiler would crash/etc. But we DO know that the runtime behavior of those programs would be bad. I have no idea how to test those situations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to see if we do crash; I want to prove that we don't crash in a known-problematic situation.

Perhaps instead of a FE test, it'd make sense for a SYCL end-to-end test? I don't want to codify "this is the behavior you can rely on when you do this stuff", but I do think there's some benefit to showing "we know this code pattern is a sharp edge and we explicitly don't care what happens so long as 'compiler crash' isn't the behavior."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. The expected outcome is that the 'wrong' kernel would be called, or the runtime would fail (depending on how this is configured). We're intentionally making this "UB" in the standard to cover both of those cases.

Would it be ok if @AlexeySachkov or @romanovvlad could write a test in a separate patch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be totally fine to add a test in a separate review; no need to block this review on it. I agree on the expected outcome -- the value I was seeing in the test is showing that we don't care about the runtime behavior (you get what you get), but that we don't expect the FE to explode on it. However, if others don't think that's a particularly valuable test, I don't insist on one being added at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it makes a lot of sense to test UB. AFAIK users do not look at our tests to see what is code is legal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal isn't to "test UB", it's to ensure our product doesn't crash just because of UB. We added new assertions, we changed old ones, and we did it on a fast-turnaround patch on something we're now on our third iteration of due to issues discovered in the field. As I said, I don't insist on the test, but hopefully we're at least all talking about the same thing being tested when we make that decision.

This is a test for unnamed kernels, so if we don't have unnamed kernels,
it isn't meaningful.
@erichkeane erichkeane dismissed stale reviews from AaronBallman and premanandrao via 4eaeb32 September 2, 2021 18:38
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sycl/ changes LGTM

@erichkeane
Copy link
Contributor Author

@ping all! Looks like I need a CFE and a intel/llvm-reviews-runtime reviewer. All but the one bot have finished, so just waiting on that one now.

Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@AaronBallman
Copy link
Contributor

@ping all! Looks like I need a CFE and a intel/llvm-reviews-runtime reviewer. All but the one bot have finished, so just waiting on that one now.

Oops, it seemed my previous approval dropped. I re-approved for CFE.

@erichkeane
Copy link
Contributor Author

@dm-vodopyanov : Looks like I need a re-+1 from you or someone else from intel/llvm-reviews-runtime when you get a chance. Thanks!

@erichkeane
Copy link
Contributor Author

@bader : The failure is another one of the 'out of disk space errors': /localdisk2/sycl_ci/buildbot/worker/Lit_With_Cuda/llvm.src/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:8989:1: fatal error: error writing to /tmp/ccuEHlOL.s: No space left on device

Looks like it has been failed for hours. I've submitted a ticket internally to hopefully get this taken care of, but I thought you might have some ideas on who to ping as well.

Thanks!

@bader
Copy link
Contributor

bader commented Sep 3, 2021

Looks like it has been failed for hours. I've submitted a ticket internally to hopefully get this taken care of, but I thought you might have some ideas on who to ping as well.

AFAIK, @tfzhu is the contact point for CI system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants