Skip to content

[SYCL] Fix alignment of emulated specialization constants #6132

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 19, 2022

Conversation

npmiller
Copy link
Contributor

This patch solves #6093, and finishes to fix #5911

It doesn't change anything for native specialization constants, but correctly aligns emulated specialization constants based on type requirements.

Emulated specialization constant don't use the CompositeOffset mechanism, so this patch re-uses this field to communicate the necessary padding from the compiler pass in sycl-post-link to the runtime to ensure correct alignment.

With this patch the SYCL-CTS specialization constant tests are all passing with the CUDA plugin:

% ./bin/test_specialization_constants  
===============================================================================
All tests passed (56 assertions in 46 test cases)

Note this is on top of #6125

@npmiller npmiller requested review from a team as code owners May 10, 2022 09:53
@aelovikov-intel
Copy link
Contributor

It seems that some changes are done in sync in both sycl-post-link and inside SYCL runtime. What would be the behavior if we'd use the compilation toolchain (including sycl-post-link) before this change and runtime after it? Is it correct to assume that nothing that wouldn't be failing with old runtime would fail in this scenario?

@npmiller
Copy link
Contributor Author

It seems that some changes are done in sync in both sycl-post-link and inside SYCL runtime. What would be the behavior if we'd use the compilation toolchain (including sycl-post-link) before this change and runtime after it? Is it correct to assume that nothing that wouldn't be failing with old runtime would fail in this scenario?

I believe that would work fine yes.

This patch doesn't change the format of how the specialization constants are communicated between the compilation and the runtime, it simply re-uses a field that's unused for emulated spec constants to communicate the padding, and that field was previously always set to 0 for emulated specialization constants.

So the new runtime using an old binary will read that field and add it as padding but since it's always 0 it will end up with the exact same layout as the old runtime.

NativePrg, SpecIDDesc.ID, SpecIDDesc.Size,
SpecConsts.data() + SpecIDDesc.BlobOffset);
}
if (!DeviceCodeWasInCache &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any changes (other than some format changes) here. Can we do away with this change? Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a change, the whole block is now conditioned on InputImpl->get_bin_image_ref()->supportsSpecConstants() when before it was only the call to enableITTAnnotationsIfNeeded, this is necessary because piextProgramSetSpecializationConstant is only supposed to be called for native specialization constants.

It's a little hard to see because of the way it was written originally, "one-line" if with no brackets followed by a scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that this is part of #6125 , maybe I should fuse the two MRs.

if (It[0] != std::numeric_limits<std::uint32_t>::max()) {
// The map is not locked here because updateSpecConstSymMap() is
// only supposed to be called from c'tor.
MSpecConstSymMap[std::string{SCName}].push_back(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call can be moved out of the if-then-else (nit pick). Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted it and also re-worked this part of the code so it's more readable, mainly re-named the iterators.

@npmiller
Copy link
Contributor Author

Rebased since #6125 was merged, this now only contains the changes relating to the alignment issue.

// node of the previous spec constant to append a padding node. It
// can't be added in front of the current spec constant, as doing
// so would require the spec constant node to have a non-zero
// CompositeOffset which breaks accessing it in the runtime.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What part of the runtime makes this restriction? I am not opposed to appending it to the previous entry, but it would be a little cleaner to have it attached to the entry forcing the alignment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's two metadata nodes and the second one has a CompositeOffset of 0, the OffsetFromLast here goes negative and it causes issues:

And if we set the CompositeOffset correctly, then reading/writing the data won't work properly for emulated spec constants as it offsets the input data with it, see here:

So appending the padding works because it can have a CompositeOffset as it's not accessed, but putting it in front doesn't because then the data node needs a non-null CompositeOffset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! Thanks for the explanation. I will leave this unresolved to highlight it.

steffenlarsen
steffenlarsen previously approved these changes May 17, 2022
// Ensure correct alignment
if (CurrentOffset % Align != 0) {
// Compute necessary padding to correctly align the constant.
Padding = Size - CurrentOffset % Align;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry to jump in late. What if Size = 6, CurrentOffset = 5 and Align = 4? In this case Padding = 6 - 5 % 4 = 5, which means CurrentOffset becomes 10 on line 785, which is not aligned to 4. Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! It should be Align instead of Size here, fixed that and added the assert as suggested below.


// The spec constant map can't be empty as the first offset is 0
// and so it can't be misaligned.
assert(!SCMetadata.empty() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also suggest to add assert(CurrentOffset % Align == 0, "alignment calculation error");

// Ensure correct alignment
if (CurrentOffset % Align != 0) {
// Compute necessary padding to correctly align the constant.
Padding = Align - CurrentOffset % Align;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can lead to memory over-use. If CurrentOffset % Align == 0, then Padding should be 0, but it will be Align. Will something like that work?

assert(Align != 0 && isPowerOf2(Align));
CurrentOffset = (CurrentOffset + Align - 1) & ~(Align-1);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take this back. There is a check if (CurrentOffset % Align != 0) above, missed it

Copy link
Contributor

@asudarsa asudarsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@pvchupin pvchupin merged commit 0cec3c6 into intel:sycl May 19, 2022
yinyangsx pushed a commit to yinyangsx/llvm that referenced this pull request May 25, 2022
This patch solves intel#6093, and finishes to fix intel#5911

It doesn't change anything for native specialization constants, but correctly aligns emulated specialization constants based on type requirements.

Emulated specialization constant don't use the CompositeOffset mechanism, so this patch re-uses this field to communicate the necessary padding from the compiler pass in sycl-post-link to the runtime to ensure correct alignment.

With this patch the SYCL-CTS specialization constant tests are all passing with the CUDA plugin:

% ./bin/test_specialization_constants  
=======================
All tests passed (56 assertions in 46 test cases)

Note this is on top of intel#6125
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants