[SYCL] Change sycl::reqd_work_group_size with optional dimensions #7450

steffenlarsen · 2022-11-18T16:37:58Z

In the current implementation the sycl::reqd_work_group_size attribute sets the Y and Z dimension arguments optional. However, when the internal representation of the attribute is created it will be padded with 1's in the additional dimensions. An effect of this padding is that dimensionality information is lost, which has three big drawbacks:

SYCL work-group sizes are reversed, but only the specified dimensions are reversed. For example sycl::reqd_work_group_size(1, 2, 3) is for the backends the same as a backend work-group of size <3,2,1>, but sycl::reqd_work_group_size(3) corresonds to a backend work-group of <3,1,1> rather than <1,1,3>.
The SYCL runtime is supposed to throw an exception when a kernel is launched with a number of dimensions that does not match the sycl::reqd_work_group_size's dimensionality. sycl-post-link generates kernel meta-information for the runtime which could be used to diagnose these, but since the attribute is padded with 1's the reqd_work_group_size metadata node knows no difference between it and a user-specified attribute with trailing 1's.
Sema cannot know the difference between two attributes where one was padded with 1's by the user and one was not, so it currently thinks these are equivalent.

To fix these, this patch changes Sema to not add the padding and instead consider cases where the Y and Z dimensions are unset. This only affects the SYCL spelling of the attribute.
Additionally, when generating the reqd_work_group_size attribute CodeGen will only generate a metadata value for dimensions that have been set.

In the current implementation the sycl::reqd_work_group_size attribute sets the Y and Z dimension arguments optional. However, when the internal representation of the attribute is created it will be padded with 1's in the additional dimensions. An effect of this padding is that dimensionality information is lost, which has three big drawbacks: 1. SYCL work-group sizes are reversed, but only the specified dimensions are reversed. For example `sycl::reqd_work_group_size(1, 2, 3)` is for the backends the same as a backend work-group of size `<3,2,1>`, but `sycl::reqd_work_group_size(3)` corresonds to a backend work-group of `<3,1,1>` rather than `<1,1,3>`. 2. The SYCL runtime is supposed to throw an exception when a kernel is launched with a number of dimensions that does not match the `sycl::reqd_work_group_size`'s dimensionality. sycl-post-link generates kernel meta-information for the runtime which could be used to diagnose these, but since the attribute is padded with 1's the `reqd_work_group_size` metadata node knows no difference between it and a user-specified attribute with trailing 1's. 3. Sema cannot know the difference between two attributes where one was padded with 1's by the user and one was not, so it currently thinks these are equivalent. To fix these, this patch changes Sema to not add the padding and instead consider cases where the Y and Z dimensions are unset. This only affects the SYCL spelling of the attribute. Additionally, when generating the `reqd_work_group_size` attribute CodeGen will only generate a metadata value for dimensions that have been set. Signed-off-by: Larsen, Steffen <[email protected]>

steffenlarsen · 2022-11-18T16:40:55Z

Notes:

Opened as draft because the SPIR-V Translator will need a patch to be able to handle reqd_work_group_size metadata with fewer than 3 operands. Likewise sycl-post-link needs to be taught this as well, which will be done in a separate patch.
This only affects the attribute for now. The corresponding kernel property still pads, although correctly. The padding on the property will be removed in a follow-up patch.

elizabethandrews

I'll need to go over this again because the requirements are confusing. I also noticed you refactored code out into Sema quite a bit. IIRC @AaronBallman and @smanna12 avoided this when they first worked on the patch to keep handling in line with community attribute handling. Tagging them to do a review as well since they implemented the original support.

clang/include/clang/Basic/Attr.td

elizabethandrews · 2022-11-21T10:33:46Z

clang/lib/CodeGen/CodeGenFunction.cpp

+    if (!getLangOpts().SYCLIsDevice) {
+      // On non-SYCL targets we add all dimensions in the order specified.
+      AttrMDArgs.push_back(
+          llvm::ConstantAsMetadata::get(Builder.getInt(*XDimVal)));


Non-SYCL targets will always have 3 dimensions?

To my knowledge, yes. The documentation states that:

In OpenCL, all three arguments are required.

In SYCL, the attribute accepts either one, two, or three arguments; in each
form, the last (or only) argument is the index that increments fastest.
The number of arguments passed to the attribute must match the dimensionality
of the kernel the attribute is applied to.

clang/lib/Sema/SemaDeclAttr.cpp

elizabethandrews · 2022-11-21T11:02:57Z

clang/lib/Sema/SemaDeclAttr.cpp

  // If any of the operand is still value dependent, we can't test anything.
  const auto *MGValueExpr = dyn_cast<ConstantExpr>(MGValue);
  const auto *XDimExpr = dyn_cast<ConstantExpr>(XDim);
-  const auto *YDimExpr = dyn_cast<ConstantExpr>(YDim);
-  const auto *ZDimExpr = dyn_cast<ConstantExpr>(ZDim);
+  const auto *YDimExpr = dyn_cast_or_null<ConstantExpr>(YDim);


Not sure about this. Should the cast be done only after we check YDim and ZDim exist?

I've moved the casts and the confusing checks a little to hopefully make it a little clearer what the intention is, but note that even if YDim and XDim are null we'll only do an early exit if it's not targeting SYCL, so whether or not we do it before or after we check whether they exist it should still be in a dyn_cast_or_null or equivalent.

elizabethandrews · 2022-11-21T11:11:38Z

clang/lib/Sema/SemaDeclAttr.cpp

  const auto *MWGSXDimExpr = dyn_cast<ConstantExpr>(MWGSXDim);
  const auto *MWGSYDimExpr = dyn_cast<ConstantExpr>(MWGSYDim);
  const auto *MWGSZDimExpr = dyn_cast<ConstantExpr>(MWGSZDim);

-  if (!RWGSXDimExpr || !RWGSYDimExpr || !RWGSZDimExpr || !MWGSXDimExpr ||
-      !MWGSYDimExpr || !MWGSZDimExpr)
+  if (!RWGSXDimExpr || (!RWGSYDimExpr && RWGSYDim) ||


Same thought here about this being confusing

elizabethandrews · 2022-11-21T11:25:19Z

clang/lib/Sema/SemaDeclAttr.cpp

-                MWGSZDimExpr->getResultAsAPSInt().getZExtValue()
-          : RWGSXDimExpr->getResultAsAPSInt().getZExtValue() >
-                MWGSXDimExpr->getResultAsAPSInt().getZExtValue();
+      FirstRWGDimExpr->getResultAsAPSInt().getZExtValue() >


If you've reordered using a swap already, why are we checking FirstDim against Zdim?

Little confused about the required behavior for reqd_work_group_size checks against max and how dimensionality plays a role here. Can you explain for SYCL and non SYCL targets

It generally becomes a little confusing because max_work_group_size requires all 3 arguments at the moment, while reqd_work_group_size does not. The way SYCL works we swap the first and last dimensions, which means when padding is added to bring the dimensions up to 3, the work-group size given to the backends are actually dependent on the dimensionality of the specified work groups.

So in this code there are potentially two flips; one for reqd_work_group_size and one for max_work_group_size. Since we know that the latter is always 3-dimensional we compare "first-dim" with the Z of max_work_group_size as it is actually the first dimension from a backend-perspective. However, for reqd_work_group_size we need the additional flipping logic because it is dependent on how many of the dimensions were actually specified.

It generally becomes a little confusing

LoL, @steffenlarsen you might be understating it a bit. ;-) I hope the SYCL language folks take this feedback as constructive criticism: this attribute has been a massive time sink and was a mistake to add to SYCL. It wasn't SYCL's attribute to modify in the first place, but then to swap argument orders when the arguments have meaningless names is setting everyone up for failure.

Please consider deprecating this attribute. At this point we have so much implementation experience and feedback that this was a design mistake that not reacting to it would be problematic. There's no way the SYCL variant of these attributes will be upstreamed to Clang as they are today (it's become "over my dead body" bad at this point given how often we struggle with this design). CC @intel/dpcpp-specification-reviewers and @bader for awareness.

That said, one thing I think is worth exploring at this point is making a wholly separate semantic attribute for the SYCL variant. We have the ability to make target-specific attributes that share a name with an attribute supported in another target -- we do this for the interrupt attribute because so many targets support an attribute with that name but with different argument lists. I think we can try doing the same for these problematic ones. See uses of "interrupt" in Attr.td and handleInterruptAttr() in SemaDeclAttr.cpp; you can try modeling after that. It won't help solve the ergonomic problems for users, but it might at least make the implementation less onerous. (And if the implementation gets significantly less onerous, then perhaps this becomes more reasonable to upstream so long as it's well understood this means there's no ability to mix and match SYCL and OpenCL code because the attributes will become target specific.) @steffenlarsen you can ping me if you run into questions or problems with this (it's an experiment more than a definite path forward).

Thanks @AaronBallman ! As per your suggestion I have added a commit with an implementation that splits the attribute into two; ReqdWorkGroupSizeAttr which should be handled like in upstream and SYCLReqdWorkGroupSizeAttr which should have the behavior we need for SYCL. They share the same spellings but the latter is picked if we are in SYCL mode. I will do more cleanup soon, but this should at least give a taste of what it will look like, with a handful of changes actually being reverts from upstream LLVM.

I'm largely responsible for much of the complexity in sycl::reqd_work_group_size, so I apologize that this has become such a mess. I assumed that the sycl:: namespace would be enough to completely isolate the SYCL attribute from any existing OpenCL work.

The good news is that there is a path to deprecation here: sycl_ext_oneapi_kernel_properties provides equivalent functionality without exposing a user-facing attribute.

As discussed offline we will go ahead with the merge of the new variant of the changes. @AaronBallman - Please feel free to do post-commit review of this and we can address it accordingly.

elizabethandrews · 2022-11-21T11:29:09Z

clang/lib/Sema/SemaDeclAttr.cpp

  // If any of the operand is still value dependent, we can't test anything.
  const auto *NSWIValueExpr = dyn_cast<ConstantExpr>(NSWIValue);
  const auto *RWGSXDimExpr = dyn_cast<ConstantExpr>(RWGSXDim);
-  const auto *RWGSZDimExpr = dyn_cast<ConstantExpr>(RWGSZDim);
+  const auto *RWGSYDimExpr = dyn_cast_or_null<ConstantExpr>(RWGSYDim);


Same points here

elizabethandrews · 2022-11-21T11:48:12Z

clang/test/SemaSYCL/reqd-work-group-size-device.cpp

+[[sycl::reqd_work_group_size(4, 1)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}} expected-note {{previous attribute is here}}
+[[sycl::reqd_work_group_size(4, 1, 1)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}} expected-note {{previous attribute is here}}
+[[sycl::reqd_work_group_size(1, 4)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}} expected-note {{previous attribute is here}}
+[[sycl::reqd_work_group_size(1, 1, 4)]] void four_again(); // expected-error {{attribute 'reqd_work_group_size' is already applied with different arguments}}


I would suggest using different functions to test these various diagnostics instead of four_again for every check. Dimensionality and value checks is getting mixed up in here and its difficult to understand why the errors are getting triggered.

elizabethandrews · 2022-11-21T11:59:23Z

clang/test/SemaSYCL/intel-max-work-group-size.cpp

 f18();

-[[intel::max_work_group_size(16, 16, 1)]] void f19();
+[[intel::max_work_group_size(1, 16, 16)]] void f19();
 [[sycl::reqd_work_group_size(16, 16)]] void f19(); // OK


Why is this ok? PR description says

sycl::reqd_work_group_size(3) corresonds to a backend work-group of <3,1,1> rather than <1,1,3>.

What is backend workgroup size corresponding to sycl::reqd_work_group_size(3,4) and how does it relate to checks with max_work_group_size?

This is because max_work_group_size, like reqd_work_group_size, is flipped, but since it always has 3 arguments it is X and Z it swaps, while the reqd_work_group_size here only has two arguments, so it only swaps those.

What is backend workgroup size corresponding to sycl::reqd_work_group_size(3,4) and how does it relate to checks with max_work_group_size?

sycl::reqd_work_group_size(3,4) corresponds to <4,3,1> which would correspond to max_work_group_size(1,3,4). Honestly I would prefer changing max_work_group_size too to avoid this kind of confusion, but it is a little more complicated because I am not sure we want to restrict the dimensionality of it necessarily, like we do with reqd_work_group_size. Index flipping is the gift that keeps on giving.

No matter how many times I try to make sense of this, I don't think I ever will. :-)

For what it is worth, I imagine most users will struggle with this one too.

In my opinion we should change the intel::max_work_group_size attribute to allow 1-3 arguments too. That would make it easier for users to actually use this with the other work-group-related attributes and the actual work-group sizes specified for kernels at launch. @gmlueck & @Pennycook - Any objections to this? Would we want an error or a warning if there is a dimensionality mismatch? An error could break existing code.

In my opinion we should change the intel::max_work_group_size attribute to allow 1-3 arguments too. That would make it easier for users to actually use this with the other work-group-related attributes and the actual work-group sizes specified for kernels at launch.

+1 from me

No objections from me. If there are concerns about breaking existing code, my suggestion would be to use a warning for the attribute spelling and an error for the eventual compile-time properties spelling.

In my opinion we should change the intel::max_work_group_size attribute to allow 1-3 arguments too. That would make it easier for users to actually use this with the other work-group-related attributes and the actual work-group sizes specified for kernels at launch.

@steffenlarsen, could you please create an issue about it so that we do not forget this discussion? You can assign to me if you do not plan to work on it.

Signed-off-by: Larsen, Steffen <[email protected]>

clang/utils/TableGen/ClangAttrEmitter.cpp

Signed-off-by: Larsen, Steffen <[email protected]>

The reqd_work_group_size and work_group_size_hint metadata nodes are currently expected to have exactly 3 operands which are in turn translated to the 3 operands of their corresponding SPIR-V execution modes. This commit relaxes this requirement by allowing the metadata to have 1-3 operands, where the missing dimensions are padded with 1's in the corresponding execution mode. Note that the information about the dimensionality of the original metadata is lost after translation, so reverse translation will always result in the metadata having all three operands. Motivation comes from intel/llvm#7450 where instead of doing the implicit padding during LLVM IR generation we want to simplify the metadata to only have the operands specified. Not only does it relax the requirements, it also lets us preserve information about dimensionality up until it is translated into SPIR-V.

The reqd_work_group_size and work_group_size_hint metadata nodes are currently expected to have exactly 3 operands which are in turn translated to the 3 operands of their corresponding SPIR-V execution modes. This commit relaxes this requirement by allowing the metadata to have 1-3 operands, where the missing dimensions are padded with 1's in the corresponding execution mode. Note that the information about the dimensionality of the original metadata is lost after translation, so reverse translation will always result in the metadata having all three operands. Motivation comes from intel#7450 where instead of doing the implicit padding during LLVM IR generation we want to simplify the metadata to only have the operands specified. Not only does it relax the requirements, it also lets us preserve information about dimensionality up until it is translated into SPIR-V.

) The reqd_work_group_size and work_group_size_hint metadata nodes are currently expected to have exactly 3 operands which are in turn translated to the 3 operands of their corresponding SPIR-V execution modes. This commit relaxes this requirement by allowing the metadata to have 1-3 operands, where the missing dimensions are padded with 1's in the corresponding execution mode. Note that the information about the dimensionality of the original metadata is lost after translation, so reverse translation will always result in the metadata having all three operands. Motivation comes from #7450 where instead of doing the implicit padding during LLVM IR generation we want to simplify the metadata to only have the operands specified. Not only does it relax the requirements, it also lets us preserve information about dimensionality up until it is translated into SPIR-V. NOTE: This is a cherry-pick of KhronosGroup/SPIRV-LLVM-Translator#1726.

steffenlarsen · 2022-11-28T14:46:26Z

Opened as draft because the SPIR-V Translator will need a patch to be able to handle reqd_work_group_size metadata with fewer than 3 operands. Likewise sycl-post-link needs to be taught this as well, which will be done in a separate patch.

This dependency has been resolved with #7471 and #7486.

clang/lib/Sema/SemaDeclAttr.cpp

clang/include/clang/Basic/Attr.td

clang/lib/CodeGen/TargetInfo.cpp

clang/test/SemaOpenCL/invalid-kernel-attrs.cl

premanandrao · 2022-11-29T20:30:45Z

clang/test/SemaSYCL/intel-max-work-group-size.cpp

 f18();

-[[intel::max_work_group_size(16, 16, 1)]] void f19();
+[[intel::max_work_group_size(1, 16, 16)]] void f19();
 [[sycl::reqd_work_group_size(16, 16)]] void f19(); // OK


No matter how many times I try to make sense of this, I don't think I ever will. :-)

Signed-off-by: Larsen, Steffen <[email protected]>

smanna12

I am OK with the change.

elizabethandrews

Sorry for the delay. LGTM

clang/include/clang/Basic/Attr.td

clang/lib/CodeGen/TargetInfo.cpp

AaronBallman · 2023-01-03T14:48:36Z

clang/include/clang/Basic/Attr.td

+  // arguments or subjects differ, should specify HasCustomParsing = 1 and
+  // implement their own parsing and semantic handling requirements as-needed.
+  // Additionally, they should ensure that the language options do not overlap.
+  string ParseKind;


Should this class hold the language option used to distinguish the attribute variants? It seems a bit strange to me that we can use this mixin but never specify a language option (which is different from target-specific attributes where you have to specify the target).

That said, it might be confusing for this class and Attr to both accept a language option, so we might want tablegen to yell at you if you do something wrong. e.g., if we leave the design as-is, maybe tablegen should bark if you don't specify the language option in the attribute definition, and if we switch the design up, maybe tablegen should bark if you specify the language option in two places (once here and once on the attribute).

The first design I tried had a new LangOpts-like member as part of this class, but I decided for the current design because it did feel redundant when we already had LangOpts in Attr, which the attributes would be deriving from anyway. As such, I prefer to keep it as is and have checks for LangOpts being set in the attributes deriving from it. See #7947.

This commit refactors the SYCL path for work_group_size_hint, similar to the refactoring for reqd_work_group_size in intel#7450. It also fixes an issue where the dimensions of the work-group hint were not correctly reversed. Signed-off-by: Larsen, Steffen <[email protected]>

…icAttr (#7947) This commit adds asserts that all attributes deriving from LanguageOptionsSpecificAttr define both a ParseKind and a LangOpts value. Based on #7450 (comment) Signed-off-by: Larsen, Steffen <[email protected]>

This commit refactors the SYCL path for work_group_size_hint, similar to the refactoring for reqd_work_group_size in #7450. It also fixes an issue where the dimensions of the work-group hint were not correctly reversed. --------- Signed-off-by: Larsen, Steffen <[email protected]>

steffenlarsen requested review from elizabethandrews and a team November 18, 2022 16:41

elizabethandrews reviewed Nov 21, 2022

View reviewed changes

steffenlarsen mentioned this pull request Nov 21, 2022

Allow variable number of operands in work-group metadata KhronosGroup/SPIRV-LLVM-Translator#1726

Merged

steffenlarsen added 2 commits November 21, 2022 06:03

Respond to comments

dde66cf

Signed-off-by: Larsen, Steffen <[email protected]>

Split ReqdWorkGroupSizeAttr by language options

1942f22

Signed-off-by: Larsen, Steffen <[email protected]>

steffenlarsen commented Nov 21, 2022

View reviewed changes

clang/utils/TableGen/ClangAttrEmitter.cpp Outdated Show resolved Hide resolved

Fix clang-tidy and minor refactoring

98414e7

Signed-off-by: Larsen, Steffen <[email protected]>

steffenlarsen mentioned this pull request Nov 22, 2022

[SPIR-V] Allow variable number of operands in work-group metadata #7486

Merged

steffenlarsen marked this pull request as ready for review November 28, 2022 14:45

steffenlarsen requested a review from a team as a code owner November 28, 2022 14:45

steffenlarsen mentioned this pull request Nov 28, 2022

[SYCL][CUDA] Allow varying program metadata in CUDA backend #7555

Merged

smanna12 reviewed Nov 28, 2022

View reviewed changes

clang/lib/Sema/SemaDeclAttr.cpp Show resolved Hide resolved

premanandrao reviewed Nov 29, 2022

View reviewed changes

AlexeySachkov mentioned this pull request Nov 29, 2022

[SYCL][Fusion] API for kernel fusion library #7465

Merged

Remove wrong not in comments

f5e44b5

Signed-off-by: Larsen, Steffen <[email protected]>

steffenlarsen requested review from AaronBallman, premanandrao, elizabethandrews and smanna12 December 6, 2022 14:57

premanandrao approved these changes Dec 6, 2022

View reviewed changes

smanna12 approved these changes Dec 6, 2022

View reviewed changes

This was referenced Dec 8, 2022

sycl::work_group_size_hint should be refactored #7699

Closed

Add SYCL extension specification for max_work_group_size #7700

Open

elizabethandrews reviewed Dec 8, 2022

View reviewed changes

clang/include/clang/Basic/Attr.td Show resolved Hide resolved

clang/lib/CodeGen/TargetInfo.cpp Show resolved Hide resolved

elizabethandrews approved these changes Dec 8, 2022

View reviewed changes

steffenlarsen merged commit b211d0d into intel:sycl Dec 15, 2022

AlexeySachkov mentioned this pull request Dec 23, 2022

reqd_work_group_size attribute is not reversed correctly #4224

Closed

AaronBallman reviewed Jan 3, 2023

View reviewed changes

steffenlarsen mentioned this pull request Jan 6, 2023

[SYCL] Add assert for LangOpts and ParseKind in LanguageOptionsSpecificAttr #7947

Merged

steffenlarsen mentioned this pull request Jan 9, 2023

[SYCL] Refactor SYCL path for work_group_size_hint #7961

Merged

steffenlarsen mentioned this pull request Feb 14, 2023

SYCL :: OptionalKernelFeatures/is_compatible.cpp LIT test fails on HIP #8146

Closed

jchlanda mentioned this pull request Oct 6, 2023

HIP reqd_work_group_size attribute #11453

Closed

[SYCL] Change sycl::reqd_work_group_size with optional dimensions #7450

[SYCL] Change sycl::reqd_work_group_size with optional dimensions #7450

Uh oh!

Conversation

steffenlarsen commented Nov 18, 2022

Uh oh!

steffenlarsen commented Nov 18, 2022

Uh oh!

elizabethandrews left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

steffenlarsen commented Nov 28, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

smanna12 left a comment

Choose a reason for hiding this comment

Uh oh!

elizabethandrews left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!