-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Change sycl::reqd_work_group_size with optional dimensions #7450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Change sycl::reqd_work_group_size with optional dimensions #7450
Conversation
In the current implementation the sycl::reqd_work_group_size attribute sets the Y and Z dimension arguments optional. However, when the internal representation of the attribute is created it will be padded with 1's in the additional dimensions. An effect of this padding is that dimensionality information is lost, which has three big drawbacks: 1. SYCL work-group sizes are reversed, but only the specified dimensions are reversed. For example `sycl::reqd_work_group_size(1, 2, 3)` is for the backends the same as a backend work-group of size `<3,2,1>`, but `sycl::reqd_work_group_size(3)` corresonds to a backend work-group of `<3,1,1>` rather than `<1,1,3>`. 2. The SYCL runtime is supposed to throw an exception when a kernel is launched with a number of dimensions that does not match the `sycl::reqd_work_group_size`'s dimensionality. sycl-post-link generates kernel meta-information for the runtime which could be used to diagnose these, but since the attribute is padded with 1's the `reqd_work_group_size` metadata node knows no difference between it and a user-specified attribute with trailing 1's. 3. Sema cannot know the difference between two attributes where one was padded with 1's by the user and one was not, so it currently thinks these are equivalent. To fix these, this patch changes Sema to not add the padding and instead consider cases where the Y and Z dimensions are unset. This only affects the SYCL spelling of the attribute. Additionally, when generating the `reqd_work_group_size` attribute CodeGen will only generate a metadata value for dimensions that have been set. Signed-off-by: Larsen, Steffen <[email protected]>
Notes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need to go over this again because the requirements are confusing. I also noticed you refactored code out into Sema quite a bit. IIRC @AaronBallman and @smanna12 avoided this when they first worked on the patch to keep handling in line with community attribute handling. Tagging them to do a review as well since they implemented the original support.
if (!getLangOpts().SYCLIsDevice) { | ||
// On non-SYCL targets we add all dimensions in the order specified. | ||
AttrMDArgs.push_back( | ||
llvm::ConstantAsMetadata::get(Builder.getInt(*XDimVal))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-SYCL targets will always have 3 dimensions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my knowledge, yes. The documentation states that:
In OpenCL, all three arguments are required.
In SYCL, the attribute accepts either one, two, or three arguments; in each
form, the last (or only) argument is the index that increments fastest.
The number of arguments passed to the attribute must match the dimensionality
of the kernel the attribute is applied to.
clang/lib/Sema/SemaDeclAttr.cpp
Outdated
// If any of the operand is still value dependent, we can't test anything. | ||
const auto *MGValueExpr = dyn_cast<ConstantExpr>(MGValue); | ||
const auto *XDimExpr = dyn_cast<ConstantExpr>(XDim); | ||
const auto *YDimExpr = dyn_cast<ConstantExpr>(YDim); | ||
const auto *ZDimExpr = dyn_cast<ConstantExpr>(ZDim); | ||
const auto *YDimExpr = dyn_cast_or_null<ConstantExpr>(YDim); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this. Should the cast be done only after we check YDim and ZDim exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the casts and the confusing checks a little to hopefully make it a little clearer what the intention is, but note that even if YDim
and XDim
are null we'll only do an early exit if it's not targeting SYCL, so whether or not we do it before or after we check whether they exist it should still be in a dyn_cast_or_null
or equivalent.
clang/lib/Sema/SemaDeclAttr.cpp
Outdated
const auto *MWGSXDimExpr = dyn_cast<ConstantExpr>(MWGSXDim); | ||
const auto *MWGSYDimExpr = dyn_cast<ConstantExpr>(MWGSYDim); | ||
const auto *MWGSZDimExpr = dyn_cast<ConstantExpr>(MWGSZDim); | ||
|
||
if (!RWGSXDimExpr || !RWGSYDimExpr || !RWGSZDimExpr || !MWGSXDimExpr || | ||
!MWGSYDimExpr || !MWGSZDimExpr) | ||
if (!RWGSXDimExpr || (!RWGSYDimExpr && RWGSYDim) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thought here about this being confusing
MWGSZDimExpr->getResultAsAPSInt().getZExtValue() | ||
: RWGSXDimExpr->getResultAsAPSInt().getZExtValue() > | ||
MWGSXDimExpr->getResultAsAPSInt().getZExtValue(); | ||
FirstRWGDimExpr->getResultAsAPSInt().getZExtValue() > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you've reordered using a swap already, why are we checking FirstDim against Zdim?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Little confused about the required behavior for reqd_work_group_size
checks against max and how dimensionality plays a role here. Can you explain for SYCL and non SYCL targets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It generally becomes a little confusing because max_work_group_size
requires all 3 arguments at the moment, while reqd_work_group_size
does not. The way SYCL works we swap the first and last dimensions, which means when padding is added to bring the dimensions up to 3, the work-group size given to the backends are actually dependent on the dimensionality of the specified work groups.
So in this code there are potentially two flips; one for reqd_work_group_size
and one for max_work_group_size
. Since we know that the latter is always 3-dimensional we compare "first-dim" with the Z of max_work_group_size
as it is actually the first dimension from a backend-perspective. However, for reqd_work_group_size
we need the additional flipping logic because it is dependent on how many of the dimensions were actually specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It generally becomes a little confusing
LoL, @steffenlarsen you might be understating it a bit. ;-) I hope the SYCL language folks take this feedback as constructive criticism: this attribute has been a massive time sink and was a mistake to add to SYCL. It wasn't SYCL's attribute to modify in the first place, but then to swap argument orders when the arguments have meaningless names is setting everyone up for failure.
Please consider deprecating this attribute. At this point we have so much implementation experience and feedback that this was a design mistake that not reacting to it would be problematic. There's no way the SYCL variant of these attributes will be upstreamed to Clang as they are today (it's become "over my dead body" bad at this point given how often we struggle with this design). CC @intel/dpcpp-specification-reviewers and @bader for awareness.
That said, one thing I think is worth exploring at this point is making a wholly separate semantic attribute for the SYCL variant. We have the ability to make target-specific attributes that share a name with an attribute supported in another target -- we do this for the interrupt
attribute because so many targets support an attribute with that name but with different argument lists. I think we can try doing the same for these problematic ones. See uses of "interrupt"
in Attr.td and handleInterruptAttr()
in SemaDeclAttr.cpp; you can try modeling after that. It won't help solve the ergonomic problems for users, but it might at least make the implementation less onerous. (And if the implementation gets significantly less onerous, then perhaps this becomes more reasonable to upstream so long as it's well understood this means there's no ability to mix and match SYCL and OpenCL code because the attributes will become target specific.) @steffenlarsen you can ping me if you run into questions or problems with this (it's an experiment more than a definite path forward).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @AaronBallman ! As per your suggestion I have added a commit with an implementation that splits the attribute into two; ReqdWorkGroupSizeAttr
which should be handled like in upstream and SYCLReqdWorkGroupSizeAttr
which should have the behavior we need for SYCL. They share the same spellings but the latter is picked if we are in SYCL mode. I will do more cleanup soon, but this should at least give a taste of what it will look like, with a handful of changes actually being reverts from upstream LLVM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm largely responsible for much of the complexity in sycl::reqd_work_group_size
, so I apologize that this has become such a mess. I assumed that the sycl::
namespace would be enough to completely isolate the SYCL attribute from any existing OpenCL work.
The good news is that there is a path to deprecation here: sycl_ext_oneapi_kernel_properties provides equivalent functionality without exposing a user-facing attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline we will go ahead with the merge of the new variant of the changes. @AaronBallman - Please feel free to do post-commit review of this and we can address it accordingly.
clang/lib/Sema/SemaDeclAttr.cpp
Outdated
// If any of the operand is still value dependent, we can't test anything. | ||
const auto *NSWIValueExpr = dyn_cast<ConstantExpr>(NSWIValue); | ||
const auto *RWGSXDimExpr = dyn_cast<ConstantExpr>(RWGSXDim); | ||
const auto *RWGSZDimExpr = dyn_cast<ConstantExpr>(RWGSZDim); | ||
const auto *RWGSYDimExpr = dyn_cast_or_null<ConstantExpr>(RWGSYDim); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same points here
[[sycl::reqd_work_group_size(4, 1)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}} expected-note {{previous attribute is here}} | ||
[[sycl::reqd_work_group_size(4, 1, 1)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}} expected-note {{previous attribute is here}} | ||
[[sycl::reqd_work_group_size(1, 4)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}} expected-note {{previous attribute is here}} | ||
[[sycl::reqd_work_group_size(1, 1, 4)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest using different functions to test these various diagnostics instead of four_again for every check. Dimensionality and value checks is getting mixed up in here and its difficult to understand why the errors are getting triggered.
f18(); | ||
|
||
[[intel::max_work_group_size(16, 16, 1)]] void f19(); | ||
[[intel::max_work_group_size(1, 16, 16)]] void f19(); | ||
[[sycl::reqd_work_group_size(16, 16)]] void f19(); // OK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this ok? PR description says
sycl::reqd_work_group_size(3) corresonds to a backend work-group of <3,1,1> rather than <1,1,3>.
What is backend workgroup size corresponding to sycl::reqd_work_group_size(3,4) and how does it relate to checks with max_work_group_size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because max_work_group_size
, like reqd_work_group_size
, is flipped, but since it always has 3 arguments it is X and Z it swaps, while the reqd_work_group_size
here only has two arguments, so it only swaps those.
What is backend workgroup size corresponding to sycl::reqd_work_group_size(3,4) and how does it relate to checks with max_work_group_size?
sycl::reqd_work_group_size(3,4)
corresponds to <4,3,1>
which would correspond to max_work_group_size(1,3,4)
. Honestly I would prefer changing max_work_group_size
too to avoid this kind of confusion, but it is a little more complicated because I am not sure we want to restrict the dimensionality of it necessarily, like we do with reqd_work_group_size
. Index flipping is the gift that keeps on giving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No matter how many times I try to make sense of this, I don't think I ever will. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it is worth, I imagine most users will struggle with this one too.
In my opinion we should change the intel::max_work_group_size
attribute to allow 1-3 arguments too. That would make it easier for users to actually use this with the other work-group-related attributes and the actual work-group sizes specified for kernels at launch. @gmlueck & @Pennycook - Any objections to this? Would we want an error or a warning if there is a dimensionality mismatch? An error could break existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion we should change the intel::max_work_group_size attribute to allow 1-3 arguments too. That would make it easier for users to actually use this with the other work-group-related attributes and the actual work-group sizes specified for kernels at launch.
+1 from me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objections from me. If there are concerns about breaking existing code, my suggestion would be to use a warning for the attribute spelling and an error for the eventual compile-time properties spelling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion we should change the intel::max_work_group_size attribute to allow 1-3 arguments too. That would make it easier for users to actually use this with the other work-group-related attributes and the actual work-group sizes specified for kernels at launch.
@steffenlarsen, could you please create an issue about it so that we do not forget this discussion? You can assign to me if you do not plan to work on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
The reqd_work_group_size and work_group_size_hint metadata nodes are currently expected to have exactly 3 operands which are in turn translated to the 3 operands of their corresponding SPIR-V execution modes. This commit relaxes this requirement by allowing the metadata to have 1-3 operands, where the missing dimensions are padded with 1's in the corresponding execution mode. Note that the information about the dimensionality of the original metadata is lost after translation, so reverse translation will always result in the metadata having all three operands. Motivation comes from intel/llvm#7450 where instead of doing the implicit padding during LLVM IR generation we want to simplify the metadata to only have the operands specified. Not only does it relax the requirements, it also lets us preserve information about dimensionality up until it is translated into SPIR-V.
The reqd_work_group_size and work_group_size_hint metadata nodes are currently expected to have exactly 3 operands which are in turn translated to the 3 operands of their corresponding SPIR-V execution modes. This commit relaxes this requirement by allowing the metadata to have 1-3 operands, where the missing dimensions are padded with 1's in the corresponding execution mode. Note that the information about the dimensionality of the original metadata is lost after translation, so reverse translation will always result in the metadata having all three operands. Motivation comes from intel#7450 where instead of doing the implicit padding during LLVM IR generation we want to simplify the metadata to only have the operands specified. Not only does it relax the requirements, it also lets us preserve information about dimensionality up until it is translated into SPIR-V.
) The reqd_work_group_size and work_group_size_hint metadata nodes are currently expected to have exactly 3 operands which are in turn translated to the 3 operands of their corresponding SPIR-V execution modes. This commit relaxes this requirement by allowing the metadata to have 1-3 operands, where the missing dimensions are padded with 1's in the corresponding execution mode. Note that the information about the dimensionality of the original metadata is lost after translation, so reverse translation will always result in the metadata having all three operands. Motivation comes from #7450 where instead of doing the implicit padding during LLVM IR generation we want to simplify the metadata to only have the operands specified. Not only does it relax the requirements, it also lets us preserve information about dimensionality up until it is translated into SPIR-V. NOTE: This is a cherry-pick of KhronosGroup/SPIRV-LLVM-Translator#1726.
f18(); | ||
|
||
[[intel::max_work_group_size(16, 16, 1)]] void f19(); | ||
[[intel::max_work_group_size(1, 16, 16)]] void f19(); | ||
[[sycl::reqd_work_group_size(16, 16)]] void f19(); // OK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No matter how many times I try to make sense of this, I don't think I ever will. :-)
Signed-off-by: Larsen, Steffen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am OK with the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. LGTM
// arguments or subjects differ, should specify HasCustomParsing = 1 and | ||
// implement their own parsing and semantic handling requirements as-needed. | ||
// Additionally, they should ensure that the language options do not overlap. | ||
string ParseKind; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this class hold the language option used to distinguish the attribute variants? It seems a bit strange to me that we can use this mixin but never specify a language option (which is different from target-specific attributes where you have to specify the target).
That said, it might be confusing for this class and Attr
to both accept a language option, so we might want tablegen to yell at you if you do something wrong. e.g., if we leave the design as-is, maybe tablegen should bark if you don't specify the language option in the attribute definition, and if we switch the design up, maybe tablegen should bark if you specify the language option in two places (once here and once on the attribute).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first design I tried had a new LangOpts-like member as part of this class, but I decided for the current design because it did feel redundant when we already had LangOpts in Attr, which the attributes would be deriving from anyway. As such, I prefer to keep it as is and have checks for LangOpts being set in the attributes deriving from it. See #7947.
This commit refactors the SYCL path for work_group_size_hint, similar to the refactoring for reqd_work_group_size in intel#7450. It also fixes an issue where the dimensions of the work-group hint were not correctly reversed. Signed-off-by: Larsen, Steffen <[email protected]>
…icAttr (#7947) This commit adds asserts that all attributes deriving from LanguageOptionsSpecificAttr define both a ParseKind and a LangOpts value. Based on #7450 (comment) Signed-off-by: Larsen, Steffen <[email protected]>
This commit refactors the SYCL path for work_group_size_hint, similar to the refactoring for reqd_work_group_size in #7450. It also fixes an issue where the dimensions of the work-group hint were not correctly reversed. --------- Signed-off-by: Larsen, Steffen <[email protected]>
In the current implementation the sycl::reqd_work_group_size attribute sets the Y and Z dimension arguments optional. However, when the internal representation of the attribute is created it will be padded with 1's in the additional dimensions. An effect of this padding is that dimensionality information is lost, which has three big drawbacks:
sycl::reqd_work_group_size(1, 2, 3)
is for the backends the same as a backend work-group of size<3,2,1>
, butsycl::reqd_work_group_size(3)
corresonds to a backend work-group of<3,1,1>
rather than<1,1,3>
.sycl::reqd_work_group_size
's dimensionality. sycl-post-link generates kernel meta-information for the runtime which could be used to diagnose these, but since the attribute is padded with 1's thereqd_work_group_size
metadata node knows no difference between it and a user-specified attribute with trailing 1's.To fix these, this patch changes Sema to not add the padding and instead consider cases where the Y and Z dimensions are unset. This only affects the SYCL spelling of the attribute.
Additionally, when generating the
reqd_work_group_size
attribute CodeGen will only generate a metadata value for dimensions that have been set.