Skip to content

[SYCL] Change sycl::reqd_work_group_size with optional dimensions #7450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

steffenlarsen
Copy link
Contributor

In the current implementation the sycl::reqd_work_group_size attribute sets the Y and Z dimension arguments optional. However, when the internal representation of the attribute is created it will be padded with 1's in the additional dimensions. An effect of this padding is that dimensionality information is lost, which has three big drawbacks:

  1. SYCL work-group sizes are reversed, but only the specified dimensions are reversed. For example sycl::reqd_work_group_size(1, 2, 3) is for the backends the same as a backend work-group of size <3,2,1>, but sycl::reqd_work_group_size(3) corresonds to a backend work-group of <3,1,1> rather than <1,1,3>.
  2. The SYCL runtime is supposed to throw an exception when a kernel is launched with a number of dimensions that does not match the sycl::reqd_work_group_size's dimensionality. sycl-post-link generates kernel meta-information for the runtime which could be used to diagnose these, but since the attribute is padded with 1's the reqd_work_group_size metadata node knows no difference between it and a user-specified attribute with trailing 1's.
  3. Sema cannot know the difference between two attributes where one was padded with 1's by the user and one was not, so it currently thinks these are equivalent.

To fix these, this patch changes Sema to not add the padding and instead consider cases where the Y and Z dimensions are unset. This only affects the SYCL spelling of the attribute.
Additionally, when generating the reqd_work_group_size attribute CodeGen will only generate a metadata value for dimensions that have been set.

In the current implementation the sycl::reqd_work_group_size attribute
sets the Y and Z dimension arguments optional. However, when the
internal representation of the attribute is created it will be padded
with 1's in the additional dimensions. An effect of this padding is that
dimensionality information is lost, which has three big drawbacks:
 1. SYCL work-group sizes are reversed, but only the specified
    dimensions are reversed. For example
    `sycl::reqd_work_group_size(1, 2, 3)` is for the backends the same
    as a backend work-group of size `<3,2,1>`, but
    `sycl::reqd_work_group_size(3)` corresonds to a backend work-group
    of `<3,1,1>` rather than `<1,1,3>`.
 2. The SYCL runtime is supposed to throw an exception when a kernel
    is launched with a number of dimensions that does not match the
    `sycl::reqd_work_group_size`'s dimensionality. sycl-post-link
    generates kernel meta-information for the runtime which could be
    used to diagnose these, but since the attribute is padded with 1's
    the `reqd_work_group_size` metadata node knows no difference between
    it and a user-specified attribute with trailing 1's.
 3. Sema cannot know the difference between two attributes where one was
    padded with 1's by the user and one was not, so it currently thinks
    these are equivalent.

To fix these, this patch changes Sema to not add the padding and instead
consider cases where the Y and Z dimensions are unset. This only affects
the SYCL spelling of the attribute.
Additionally, when generating the `reqd_work_group_size` attribute
CodeGen will only generate a metadata value for dimensions that have
been set.

Signed-off-by: Larsen, Steffen <[email protected]>
@steffenlarsen
Copy link
Contributor Author

Notes:

  1. Opened as draft because the SPIR-V Translator will need a patch to be able to handle reqd_work_group_size metadata with fewer than 3 operands. Likewise sycl-post-link needs to be taught this as well, which will be done in a separate patch.
  2. This only affects the attribute for now. The corresponding kernel property still pads, although correctly. The padding on the property will be removed in a follow-up patch.

@steffenlarsen steffenlarsen requested review from elizabethandrews and a team November 18, 2022 16:41
Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to go over this again because the requirements are confusing. I also noticed you refactored code out into Sema quite a bit. IIRC @AaronBallman and @smanna12 avoided this when they first worked on the patch to keep handling in line with community attribute handling. Tagging them to do a review as well since they implemented the original support.

if (!getLangOpts().SYCLIsDevice) {
// On non-SYCL targets we add all dimensions in the order specified.
AttrMDArgs.push_back(
llvm::ConstantAsMetadata::get(Builder.getInt(*XDimVal)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-SYCL targets will always have 3 dimensions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my knowledge, yes. The documentation states that:

In OpenCL, all three arguments are required.

In SYCL, the attribute accepts either one, two, or three arguments; in each
form, the last (or only) argument is the index that increments fastest.
The number of arguments passed to the attribute must match the dimensionality
of the kernel the attribute is applied to.

// If any of the operand is still value dependent, we can't test anything.
const auto *MGValueExpr = dyn_cast<ConstantExpr>(MGValue);
const auto *XDimExpr = dyn_cast<ConstantExpr>(XDim);
const auto *YDimExpr = dyn_cast<ConstantExpr>(YDim);
const auto *ZDimExpr = dyn_cast<ConstantExpr>(ZDim);
const auto *YDimExpr = dyn_cast_or_null<ConstantExpr>(YDim);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this. Should the cast be done only after we check YDim and ZDim exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the casts and the confusing checks a little to hopefully make it a little clearer what the intention is, but note that even if YDim and XDim are null we'll only do an early exit if it's not targeting SYCL, so whether or not we do it before or after we check whether they exist it should still be in a dyn_cast_or_null or equivalent.

const auto *MWGSXDimExpr = dyn_cast<ConstantExpr>(MWGSXDim);
const auto *MWGSYDimExpr = dyn_cast<ConstantExpr>(MWGSYDim);
const auto *MWGSZDimExpr = dyn_cast<ConstantExpr>(MWGSZDim);

if (!RWGSXDimExpr || !RWGSYDimExpr || !RWGSZDimExpr || !MWGSXDimExpr ||
!MWGSYDimExpr || !MWGSZDimExpr)
if (!RWGSXDimExpr || (!RWGSYDimExpr && RWGSYDim) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thought here about this being confusing

MWGSZDimExpr->getResultAsAPSInt().getZExtValue()
: RWGSXDimExpr->getResultAsAPSInt().getZExtValue() >
MWGSXDimExpr->getResultAsAPSInt().getZExtValue();
FirstRWGDimExpr->getResultAsAPSInt().getZExtValue() >
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you've reordered using a swap already, why are we checking FirstDim against Zdim?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little confused about the required behavior for reqd_work_group_size checks against max and how dimensionality plays a role here. Can you explain for SYCL and non SYCL targets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It generally becomes a little confusing because max_work_group_size requires all 3 arguments at the moment, while reqd_work_group_size does not. The way SYCL works we swap the first and last dimensions, which means when padding is added to bring the dimensions up to 3, the work-group size given to the backends are actually dependent on the dimensionality of the specified work groups.

So in this code there are potentially two flips; one for reqd_work_group_size and one for max_work_group_size. Since we know that the latter is always 3-dimensional we compare "first-dim" with the Z of max_work_group_size as it is actually the first dimension from a backend-perspective. However, for reqd_work_group_size we need the additional flipping logic because it is dependent on how many of the dimensions were actually specified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It generally becomes a little confusing

LoL, @steffenlarsen you might be understating it a bit. ;-) I hope the SYCL language folks take this feedback as constructive criticism: this attribute has been a massive time sink and was a mistake to add to SYCL. It wasn't SYCL's attribute to modify in the first place, but then to swap argument orders when the arguments have meaningless names is setting everyone up for failure.

Please consider deprecating this attribute. At this point we have so much implementation experience and feedback that this was a design mistake that not reacting to it would be problematic. There's no way the SYCL variant of these attributes will be upstreamed to Clang as they are today (it's become "over my dead body" bad at this point given how often we struggle with this design). CC @intel/dpcpp-specification-reviewers and @bader for awareness.

That said, one thing I think is worth exploring at this point is making a wholly separate semantic attribute for the SYCL variant. We have the ability to make target-specific attributes that share a name with an attribute supported in another target -- we do this for the interrupt attribute because so many targets support an attribute with that name but with different argument lists. I think we can try doing the same for these problematic ones. See uses of "interrupt" in Attr.td and handleInterruptAttr() in SemaDeclAttr.cpp; you can try modeling after that. It won't help solve the ergonomic problems for users, but it might at least make the implementation less onerous. (And if the implementation gets significantly less onerous, then perhaps this becomes more reasonable to upstream so long as it's well understood this means there's no ability to mix and match SYCL and OpenCL code because the attributes will become target specific.) @steffenlarsen you can ping me if you run into questions or problems with this (it's an experiment more than a definite path forward).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AaronBallman ! As per your suggestion I have added a commit with an implementation that splits the attribute into two; ReqdWorkGroupSizeAttr which should be handled like in upstream and SYCLReqdWorkGroupSizeAttr which should have the behavior we need for SYCL. They share the same spellings but the latter is picked if we are in SYCL mode. I will do more cleanup soon, but this should at least give a taste of what it will look like, with a handful of changes actually being reverts from upstream LLVM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm largely responsible for much of the complexity in sycl::reqd_work_group_size, so I apologize that this has become such a mess. I assumed that the sycl:: namespace would be enough to completely isolate the SYCL attribute from any existing OpenCL work.

The good news is that there is a path to deprecation here: sycl_ext_oneapi_kernel_properties provides equivalent functionality without exposing a user-facing attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline we will go ahead with the merge of the new variant of the changes. @AaronBallman - Please feel free to do post-commit review of this and we can address it accordingly.

// If any of the operand is still value dependent, we can't test anything.
const auto *NSWIValueExpr = dyn_cast<ConstantExpr>(NSWIValue);
const auto *RWGSXDimExpr = dyn_cast<ConstantExpr>(RWGSXDim);
const auto *RWGSZDimExpr = dyn_cast<ConstantExpr>(RWGSZDim);
const auto *RWGSYDimExpr = dyn_cast_or_null<ConstantExpr>(RWGSYDim);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same points here

[[sycl::reqd_work_group_size(4, 1)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}} expected-note {{previous attribute is here}}
[[sycl::reqd_work_group_size(4, 1, 1)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}} expected-note {{previous attribute is here}}
[[sycl::reqd_work_group_size(1, 4)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}} expected-note {{previous attribute is here}}
[[sycl::reqd_work_group_size(1, 1, 4)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using different functions to test these various diagnostics instead of four_again for every check. Dimensionality and value checks is getting mixed up in here and its difficult to understand why the errors are getting triggered.

f18();

[[intel::max_work_group_size(16, 16, 1)]] void f19();
[[intel::max_work_group_size(1, 16, 16)]] void f19();
[[sycl::reqd_work_group_size(16, 16)]] void f19(); // OK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this ok? PR description says

sycl::reqd_work_group_size(3) corresonds to a backend work-group of <3,1,1> rather than <1,1,3>.

What is backend workgroup size corresponding to sycl::reqd_work_group_size(3,4) and how does it relate to checks with max_work_group_size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because max_work_group_size, like reqd_work_group_size, is flipped, but since it always has 3 arguments it is X and Z it swaps, while the reqd_work_group_size here only has two arguments, so it only swaps those.

What is backend workgroup size corresponding to sycl::reqd_work_group_size(3,4) and how does it relate to checks with max_work_group_size?

sycl::reqd_work_group_size(3,4) corresponds to <4,3,1> which would correspond to max_work_group_size(1,3,4). Honestly I would prefer changing max_work_group_size too to avoid this kind of confusion, but it is a little more complicated because I am not sure we want to restrict the dimensionality of it necessarily, like we do with reqd_work_group_size. Index flipping is the gift that keeps on giving.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No matter how many times I try to make sense of this, I don't think I ever will. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it is worth, I imagine most users will struggle with this one too.

In my opinion we should change the intel::max_work_group_size attribute to allow 1-3 arguments too. That would make it easier for users to actually use this with the other work-group-related attributes and the actual work-group sizes specified for kernels at launch. @gmlueck & @Pennycook - Any objections to this? Would we want an error or a warning if there is a dimensionality mismatch? An error could break existing code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion we should change the intel::max_work_group_size attribute to allow 1-3 arguments too. That would make it easier for users to actually use this with the other work-group-related attributes and the actual work-group sizes specified for kernels at launch.

+1 from me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections from me. If there are concerns about breaking existing code, my suggestion would be to use a warning for the attribute spelling and an error for the eventual compile-time properties spelling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion we should change the intel::max_work_group_size attribute to allow 1-3 arguments too. That would make it easier for users to actually use this with the other work-group-related attributes and the actual work-group sizes specified for kernels at launch.

@steffenlarsen, could you please create an issue about it so that we do not forget this discussion? You can assign to me if you do not plan to work on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MrSidims pushed a commit to KhronosGroup/SPIRV-LLVM-Translator that referenced this pull request Nov 22, 2022
The reqd_work_group_size and work_group_size_hint metadata nodes are currently expected to have exactly 3 operands which are in turn translated to the 3 operands of their corresponding SPIR-V execution modes. This commit relaxes this requirement by allowing the metadata to have 1-3 operands, where the missing dimensions are padded with 1's in the corresponding execution mode. Note that the information about the dimensionality of the original metadata is lost after translation, so reverse translation will always result in the metadata having all three operands.

Motivation comes from intel/llvm#7450 where instead of doing the implicit padding during LLVM IR generation we want to simplify the metadata to only have the operands specified. Not only does it relax the requirements, it also lets us preserve information about dimensionality up until it is translated into SPIR-V.
steffenlarsen added a commit to steffenlarsen/llvm that referenced this pull request Nov 22, 2022
The reqd_work_group_size and work_group_size_hint metadata nodes are currently expected to have exactly 3 operands which are in turn translated to the 3 operands of their corresponding SPIR-V execution modes. This commit relaxes this requirement by allowing the metadata to have 1-3 operands, where the missing dimensions are padded with 1's in the corresponding execution mode. Note that the information about the dimensionality of the original metadata is lost after translation, so reverse translation will always result in the metadata having all three operands.

Motivation comes from intel#7450 where instead of doing the implicit padding during LLVM IR generation we want to simplify the metadata to only have the operands specified. Not only does it relax the requirements, it also lets us preserve information about dimensionality up until it is translated into SPIR-V.
steffenlarsen added a commit that referenced this pull request Nov 24, 2022
)

The reqd_work_group_size and work_group_size_hint metadata nodes are
currently expected to have exactly 3 operands which are in turn
translated to the 3 operands of their corresponding SPIR-V execution
modes. This commit relaxes this requirement by allowing the metadata to
have 1-3 operands, where the missing dimensions are padded with 1's in
the corresponding execution mode. Note that the information about the
dimensionality of the original metadata is lost after translation, so
reverse translation will always result in the metadata having all three
operands.

Motivation comes from #7450 where instead of doing the
implicit padding during LLVM IR generation we want to simplify the
metadata to only have the operands specified. Not only does it relax the
requirements, it also lets us preserve information about dimensionality
up until it is translated into SPIR-V.

NOTE: This is a cherry-pick of
KhronosGroup/SPIRV-LLVM-Translator#1726.
@steffenlarsen steffenlarsen marked this pull request as ready for review November 28, 2022 14:45
@steffenlarsen steffenlarsen requested a review from a team as a code owner November 28, 2022 14:45
@steffenlarsen
Copy link
Contributor Author

Opened as draft because the SPIR-V Translator will need a patch to be able to handle reqd_work_group_size metadata with fewer than 3 operands. Likewise sycl-post-link needs to be taught this as well, which will be done in a separate patch.

This dependency has been resolved with #7471 and #7486.

f18();

[[intel::max_work_group_size(16, 16, 1)]] void f19();
[[intel::max_work_group_size(1, 16, 16)]] void f19();
[[sycl::reqd_work_group_size(16, 16)]] void f19(); // OK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No matter how many times I try to make sense of this, I don't think I ever will. :-)

Signed-off-by: Larsen, Steffen <[email protected]>
Copy link
Contributor

@smanna12 smanna12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK with the change.

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. LGTM

// arguments or subjects differ, should specify HasCustomParsing = 1 and
// implement their own parsing and semantic handling requirements as-needed.
// Additionally, they should ensure that the language options do not overlap.
string ParseKind;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this class hold the language option used to distinguish the attribute variants? It seems a bit strange to me that we can use this mixin but never specify a language option (which is different from target-specific attributes where you have to specify the target).

That said, it might be confusing for this class and Attr to both accept a language option, so we might want tablegen to yell at you if you do something wrong. e.g., if we leave the design as-is, maybe tablegen should bark if you don't specify the language option in the attribute definition, and if we switch the design up, maybe tablegen should bark if you specify the language option in two places (once here and once on the attribute).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first design I tried had a new LangOpts-like member as part of this class, but I decided for the current design because it did feel redundant when we already had LangOpts in Attr, which the attributes would be deriving from anyway. As such, I prefer to keep it as is and have checks for LangOpts being set in the attributes deriving from it. See #7947.

steffenlarsen added a commit to steffenlarsen/llvm that referenced this pull request Jan 9, 2023
This commit refactors the SYCL path for work_group_size_hint, similar to
the refactoring for reqd_work_group_size in
intel#7450. It also fixes an issue where
the dimensions of the work-group hint were not correctly reversed.

Signed-off-by: Larsen, Steffen <[email protected]>
bader pushed a commit that referenced this pull request Jan 9, 2023
…icAttr (#7947)

This commit adds asserts that all attributes deriving from
LanguageOptionsSpecificAttr define both a ParseKind and a LangOpts
value.

Based on #7450 (comment)

Signed-off-by: Larsen, Steffen <[email protected]>
bader pushed a commit that referenced this pull request Jan 11, 2023
This commit refactors the SYCL path for work_group_size_hint, similar to
the refactoring for reqd_work_group_size in
#7450. It also fixes an issue where
the dimensions of the work-group hint were not correctly reversed.

---------

Signed-off-by: Larsen, Steffen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants