-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Fix alignment of emulated specialization constants #6132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It seems that some changes are done in sync in both sycl-post-link and inside SYCL runtime. What would be the behavior if we'd use the compilation toolchain (including sycl-post-link) before this change and runtime after it? Is it correct to assume that nothing that wouldn't be failing with old runtime would fail in this scenario? |
I believe that would work fine yes. This patch doesn't change the format of how the specialization constants are communicated between the compilation and the runtime, it simply re-uses a field that's unused for emulated spec constants to communicate the padding, and that field was previously always set to So the new runtime using an old binary will read that field and add it as padding but since it's always |
NativePrg, SpecIDDesc.ID, SpecIDDesc.Size, | ||
SpecConsts.data() + SpecIDDesc.BlobOffset); | ||
} | ||
if (!DeviceCodeWasInCache && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see any changes (other than some format changes) here. Can we do away with this change? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a change, the whole block is now conditioned on InputImpl->get_bin_image_ref()->supportsSpecConstants()
when before it was only the call to enableITTAnnotationsIfNeeded
, this is necessary because piextProgramSetSpecializationConstant
is only supposed to be called for native specialization constants.
It's a little hard to see because of the way it was written originally, "one-line" if with no brackets followed by a scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that this is part of #6125 , maybe I should fuse the two MRs.
if (It[0] != std::numeric_limits<std::uint32_t>::max()) { | ||
// The map is not locked here because updateSpecConstSymMap() is | ||
// only supposed to be called from c'tor. | ||
MSpecConstSymMap[std::string{SCName}].push_back( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call can be moved out of the if-then-else (nit pick). Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted it and also re-worked this part of the code so it's more readable, mainly re-named the iterators.
Rebased since #6125 was merged, this now only contains the changes relating to the alignment issue. |
// node of the previous spec constant to append a padding node. It | ||
// can't be added in front of the current spec constant, as doing | ||
// so would require the spec constant node to have a non-zero | ||
// CompositeOffset which breaks accessing it in the runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What part of the runtime makes this restriction? I am not opposed to appending it to the previous entry, but it would be a little cleaner to have it attached to the entry forcing the alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's two metadata nodes and the second one has a CompositeOffset
of 0
, the OffsetFromLast
here goes negative and it causes issues:
And if we set the CompositeOffset
correctly, then reading/writing the data won't work properly for emulated spec constants as it offsets the input data with it, see here:
So appending the padding works because it can have a CompositeOffset
as it's not accessed, but putting it in front doesn't because then the data node needs a non-null CompositeOffset
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see! Thanks for the explanation. I will leave this unresolved to highlight it.
// Ensure correct alignment | ||
if (CurrentOffset % Align != 0) { | ||
// Compute necessary padding to correctly align the constant. | ||
Padding = Size - CurrentOffset % Align; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry to jump in late. What if Size
= 6, CurrentOffset
= 5 and Align
= 4? In this case Padding
= 6 - 5 % 4 = 5, which means CurrentOffset
becomes 10 on line 785, which is not aligned to 4. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! It should be Align
instead of Size
here, fixed that and added the assert as suggested below.
|
||
// The spec constant map can't be empty as the first offset is 0 | ||
// and so it can't be misaligned. | ||
assert(!SCMetadata.empty() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also suggest to add assert(CurrentOffset % Align == 0, "alignment calculation error");
// Ensure correct alignment | ||
if (CurrentOffset % Align != 0) { | ||
// Compute necessary padding to correctly align the constant. | ||
Padding = Align - CurrentOffset % Align; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can lead to memory over-use. If CurrentOffset
% Align
== 0, then Padding
should be 0, but it will be Align. Will something like that work?
assert(Align != 0 && isPowerOf2(Align));
CurrentOffset = (CurrentOffset + Align - 1) & ~(Align-1);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take this back. There is a check if (CurrentOffset % Align != 0)
above, missed it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks
This patch solves intel#6093, and finishes to fix intel#5911 It doesn't change anything for native specialization constants, but correctly aligns emulated specialization constants based on type requirements. Emulated specialization constant don't use the CompositeOffset mechanism, so this patch re-uses this field to communicate the necessary padding from the compiler pass in sycl-post-link to the runtime to ensure correct alignment. With this patch the SYCL-CTS specialization constant tests are all passing with the CUDA plugin: % ./bin/test_specialization_constants ======================= All tests passed (56 assertions in 46 test cases) Note this is on top of intel#6125
This patch solves #6093, and finishes to fix #5911
It doesn't change anything for native specialization constants, but correctly aligns emulated specialization constants based on type requirements.
Emulated specialization constant don't use the
CompositeOffset
mechanism, so this patch re-uses this field to communicate the necessary padding from the compiler pass insycl-post-link
to the runtime to ensure correct alignment.With this patch the SYCL-CTS specialization constant tests are all passing with the CUDA plugin:
Note this is on top of #6125