Skip to content

[SYCL][FPGA][NFC] Fix num_simd_work_items and reqd_work_group_size argument check #3337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

smanna12
Copy link
Contributor

@smanna12 smanna12 commented Mar 10, 2021

From Bug: #3252
The order of the arguments to reqd_work_group_size in SYCL and OpenCL are reversed.
In the OpenCL syntax, reqd_work_group_size(X,Y,Z), X is the index that increments the fastest,
followed by Y, then Z. If we preserve that definition of X, Y, and Z, then in SYCL the attribute
is specified as reqd_work_group_size(Z,Y,X). This gets more complicated in the case where fewer than
three arguments are provided. In such a case it's not the "X" argument that is omitted
(by the definitions of X, Y, and Z above) it's the Z dimension that's omitted. So the two argument
version has the syntax reqd_work_group_size(Y,X). Similarly, the one argument variant has syntax
reqd_work_group_size(X).

This patches fixes the problem on #3275 where we checked the wrong argument.
In SYCL, the num_simd_work_items must evenly divide the last argument to reqd_work_group_size regardless of how many arguments req_work_group_size has.

Signed-off-by: Soumi Manna [email protected]

…gument check

The order of the arguments to reqd_work_group_size in SYCL and OpenCL are reversed.
In the OpenCL syntax, reqd_work_group_size(X,Y,Z), X is the index that increments the fastest,
followed by Y, then Z. If we preserve that definition of X, Y, and Z, then in SYCL the attribute
is specified as reqd_work_group_size(Z,Y,X). This gets more complicated in the case where fewer than
three arguments are provided. In such a case it's not the "X" argument that is omitted
(by the definitions of X, Y, and Z above) it's the Z dimension that's omitted. So the two argument
version has the syntax reqd_work_group_size(Y,X). Similarly, the one argument variant has syntax
reqd_work_group_size(X).

This patches fixes the problem on intel#3275
where we checked the wrong argument. In SYCL, the num_simd_work_items must evenly divide the last argument
to reqd_work_group_size regardless of how many arguments req_work_group_size has.

Signed-off-by: Soumi Manna <[email protected]>
@smanna12 smanna12 changed the title [SYCL][FPGA][NFC] Fix num_simd_work_items and reqd_work_group_size ar… [SYCL][FPGA][NFC] Fix num_simd_work_items and reqd_work_group_size argument check Mar 10, 2021
@smanna12 smanna12 requested a review from AaronBallman March 10, 2021 15:47
@smanna12 smanna12 marked this pull request as ready for review March 10, 2021 15:47
Signed-off-by: Soumi Manna <[email protected]>
@@ -3151,11 +3151,14 @@ static void handleWorkGroupSize(Sema &S, Decl *D, const ParsedAttr &AL) {
return;
ZDimExpr = ZDim.get();

// If the declaration has an [[intel::num_simd_work_items()]] attribute,
// check to see if the last argument can be evenly divided by the
// num_simd_work_items attribute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// num_simd_work_items attribute
// num_simd_work_items attribute.

if (const auto *A = D->getAttr<SYCLIntelNumSimdWorkItemsAttr>()) {
int64_t NumSimdWorkItems =
A->getValue()->getIntegerConstantExpr(Ctx)->getSExtValue();

if (XDimVal.getZExtValue() % NumSimdWorkItems != 0) {
if (ZDimVal.getZExtValue() % NumSimdWorkItems != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is called for __attribute__((reqd_work_group_size)) as well as [[intel::reqd_work_group_size]] -- aren't X and Z only swapped for the intel attribute and not the OpenCL ones?

Copy link
Contributor Author

@smanna12 smanna12 Mar 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AaronBallman,
I have added a check to verify SYCL vs OpenCL spelling. I did not add any test for __attribute__((reqd_work_group_size)) since we can not use [[intel::num_simd_work_items]] and _attribute__((reqd_work_group_size)) together here

Signed-off-by: Soumi Manna <[email protected]>
Signed-off-by: Soumi Manna <[email protected]>
@smanna12 smanna12 requested a review from AaronBallman March 12, 2021 14:43
// can be evenly divided by the num_simd_work_items attribute.
struct TRIFuncObjBad1 {
[[intel::num_simd_work_items(3)]] // expected-error{{'num_simd_work_items' attribute must evenly divide the work-group size for the 'reqd_work_group_size' attribute}}
[[intel::reqd_work_group_size(5, 3, 3)]] // expected-note{{conflicting attribute is here}}
[[intel::reqd_work_group_size(5, 3, 5)]] // expected-note{{conflicting attribute is here}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please change the examples to something like (3, 3, 5) so we know which value the complaint is about? With (5, 3, 5) it is ambiguous as to which '5' the '3' does not divide. This comment applies to many of the cases in this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: Soumi Manna <[email protected]>
@smanna12 smanna12 requested a review from premanandrao March 12, 2021 18:11
Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with the changes. @elizabethandrews ?

@@ -2421,7 +2421,7 @@ device kernel, the attribute is not ignored and it is propagated to the kernel.
If the`` intel::reqd_work_group_size`` or ``cl::reqd_work_group_size``
attribute is specified on a declaration along with a
intel::num_simd_work_items attribute, the work group size attribute
argument (the first argument) must be evenly divisible by the argument specified
argument (the last argument) must be evenly divisible by the argument specified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation also needs to be updated for the OpenCL vs SYCL confusion. This would be a great place for a big warning to users because they're definitely going to be surprised by this SYCL design choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (const auto *A = D->getAttr<SYCLIntelNumSimdWorkItemsAttr>()) {
int64_t NumSimdWorkItems =
A->getValue()->getIntegerConstantExpr(Ctx)->getSExtValue();

if (XDimVal.getZExtValue() % NumSimdWorkItems != 0) {
unsigned checkWorkGroupSize =
AL.getSyntax() == AttributeCommonInfo::AS_GNU
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should document clearly that the GNU spelling of this attribute maps to the OpenCL semantics (because otherwise it's extremely unclear as to which semantics you get since there's no vendor information).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this code misses the [[cl::reqd_work_group_size]] spelling. We should be sure to add tests for both the GNU and CXX spellings so we don't accidentally regress this sort of thing in the future.

Copy link
Contributor Author

@smanna12 smanna12 Mar 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be sure to add tests for both the GNU and CXX spellings so we don't accidentally regress this sort of thing in the future.

@AaronBallman , just want to check with you that how can we add tests for GNU spelling here? Am i missing anything?

we can not use [[intel::num_simd_work_items]] and __attribute__((reqd_work_group_size)) together here to verify workgroup size (first argument) divides by num_simd_work_items.

We have __attribute__((reqd_work_group_size)) and __attribute__((num_simd_work_items)) in downstream as OpenCL spelling

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I'm not certain I understand the questions. You're already testing for the GNU spelling? Did you mean the [[cl:reqd_work_group_size]] spelling? You can do something like (AL.getSyntax() == AttributeCommonInfo::AS_CXX11 || AL.getSyntax() == AttributeCommonInfo::AS_C2x) && AL.hasScope() && AL.getScopeName()->isStr("cl") to get you there.

Copy link
Contributor Author

@smanna12 smanna12 Mar 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry i meant to say __attribute__((reqd_work_group_size)) - another GNU spelling.
[[cl:reqd_work_group_size]] - this is not the problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tests we need are:

// Merging cases.
[[intel::num_simd_work_items(2)]] void func1();
__attribute__((reqd_work_group_size(3,4,4))) void func1(); // Should diagnose

[[intel::num_simd_work_items(2)]] void func2();
[[cl::reqd_work_group_size(3, 4, 4)]] void func2(); // Should diagnose

[[intel::num_simd_work_items(2)]] void func3();
[[intel::reqd_work_group_size(3, 4, 4)]] void func3(); // Shockingly OK!

// Direct cases.
[[intel::num_simd_work_items(2)]] __attribute__((reqd_work_group_size(3,4,4))) void func1(); // Should diagnose
[[intel::num_simd_work_items(2)]] [[cl::reqd_work_group_size(3, 4, 4)]] void func2(); // Should diagnose

[[intel::num_simd_work_items(2)]] [[intel::reqd_work_group_size(3, 4, 4)]] void func3(); // Shockingly OK!

// It'd probably be good to add template cases along these same lines as well.

Or am I misunderstanding still? (Sorry, it's been a long week and I'm feeling a bit braindead.)

Copy link
Contributor Author

@smanna12 smanna12 Mar 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confused myself with the syntax of reqd_work_group_size attributes
kernel __attribute__((reqd_work_group_size(1,1,2))) void kernel13(){}

am I misunderstanding still?

No, you understood correctly. You answered my questions. Thanks @AaronBallman

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be sure to add tests for both the GNU and CXX spellings so we don't accidentally regress this sort of thing in the future.

Added test cases, Done.

if (const auto *A = D->getAttr<SYCLIntelNumSimdWorkItemsAttr>()) {
int64_t NumSimdWorkItems =
A->getValue()->getIntegerConstantExpr(Ctx)->getSExtValue();

if (XDimVal.getZExtValue() % NumSimdWorkItems != 0) {
unsigned checkWorkGroupSize =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned checkWorkGroupSize =
unsigned CheckWorkGroupSize =

per usual coding conventions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Optional<llvm::APSInt> ZDimVal = DeclAttr->getZDimVal(Context);

llvm::APSInt WorkGroupSize =
DeclAttr->getSyntax() == AttributeCommonInfo::AS_GNU ? *XDimVal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better done by adding an Accessor to the attribute in Attr.td. e.g.,

let Accessors = [Accessor<"usesOpenCLArgOrdering", [GNU<"reqd_work_group_size">, CXX11<"cl","reqd_work_group_size">]>];

then you can do DeclAttr->usesOpenCLArgOrdering() here. That will also fix the issue with the C++ OpenCL spelling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: Soumi Manna <[email protected]>
Signed-off-by: Soumi Manna <[email protected]>
Signed-off-by: Soumi Manna <[email protected]>
@@ -2418,26 +2418,44 @@ device kernel, the attribute is not ignored and it is propagated to the kernel.
[[intel::num_simd_work_items(N)]] void operator()() const {}
};

If the`` intel::reqd_work_group_size`` or ``cl::reqd_work_group_size``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more clear if we rearranged the changes to the docs. My recommendation is:

The docs for reqd_work_group_size is where most of these details should live. Those docs should be updated to explain what the three arguments are and that the argument which specifies the "work group size" differs depending on which spelling of the attribute is used. If possible, we should put an anchor in the docs for the work group size so that later links can direct users to that specific anchor.

The documentation that refers to reqd_work_group_size (like here) can then say "If the reqd_work_group_size attribute is specified on a declaration along with a intel::num_simd_work_items attribute, the requested work group size argument must be evenly divisible by the argument specified in the num_simd_work_items attribute.

This way, we keep all the "we made bad decisions about the argument ordering" words are all in one place (instead of repeated) and can refer to the oddity from whatever other attributes have this sort of interaction with the work group size.


if (*XDimVal % ArgVal != 0) {
llvm::APSInt WorkGroupSize =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have a testing hole here with a bug hiding in it -- won't this wind up with ZDimVal being llvm::None because the attribute has not yet been instantiated?

template <int X, int Y, int Z>
[[intel::reqd_work_group_size(X, Y, Z)]]
void func();

template <int X, int Y, int Z>
[[intel::num_simd_work_items(3)]] void func(); // When we go to process this attribute.

const auto *DeclXExpr = dyn_cast<ConstantExpr>(DeclAttr->getXDim());
const auto *DeclZExpr = dyn_cast<ConstantExpr>(DeclAttr->getZDim());

llvm::APSInt WorkGroupSize = DeclAttr->usesOpenCLArgOrdering()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar here -- I think this could crash because the work group size or the merge expression hasn't been evaluated yet (so we'd dereference a null pointer). I think on something like this:

template <int X, int Y, int Z, int N>
[[intel::reqd_work_group_size(X, Y, Z)]] void func();

template <int X, int Y, int Z, int N>
[[intel::num_simd_work_items(N)]] void func();

Copy link
Contributor Author

@smanna12 smanna12 Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried couple of examples below, no crash happens with current patch:

template <int X, int Y, int Z, int N>
[[intel::reqd_work_group_size(X, Y, Z)]] void func();

template <int X, int Y, int Z, int N>
[[intel::num_simd_work_items(N)]] void func();

int main() {
  func<3, 3, 5, 3>();
}

This does not crash.

test.cpp:5:3: error: 'num_simd_work_items' attribute must evenly divide the work-group size for the 'reqd_work_group_size' attribute
[[intel::num_simd_work_items(N)]] void func();
  ^
test.cpp:8:3: note: in instantiation of function template specialization 'func<3, 3, 5, 3>' requested here
  func<3, 3, 5, 3>();
  ^
test.cpp:2:3: note: conflicting attribute is here
[[intel::reqd_work_group_size(X, Y, Z)]] void func();
  ^
1 error generated.

Same for other test case. No crash happens with current patch:

template <int X, int Y, int Z>
[[intel::reqd_work_group_size(X, Y, Z)]]
void func();

template <int X, int Y, int Z>
[[intel::num_simd_work_items(3)]] void func(); // When we go to process this attribute.

int main() {
  func<3, 3, 5>();
}

test1.cpp:6:3: error: 'num_simd_work_items' attribute must evenly divide the work-group size for the 'reqd_work_group_size' attribute
[[intel::num_simd_work_items(3)]] void func(); // When we go to process this attribute.
  ^
test1.cpp:9:3: note: in instantiation of function template specialization 'func<3, 3, 5>' requested here
  func<3, 3, 5>();
  ^
test1.cpp:2:3: note: conflicting attribute is here
[[intel::reqd_work_group_size(X, Y, Z)]]
  ^
1 error generated.

Am i missing something here? Did you mean to test something different examples here?

Copy link
Contributor Author

@smanna12 smanna12 Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would crash. I think you meant to test this example. The crash happens here because the work group size or the merge expression hasn't been evaluated yet for existing implementation of reqd_work_group_size attribute( plan to refactor this attribute later - outside of this PR)

template <int X, int Y, int Z>
[[intel::num_simd_work_items(3)]] void func();

template <int X, int Y, int Z>
[[intel::reqd_work_group_size(X, Y, Z)]]
void func();

int main() {
func<3, 3, 5>();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible my test cases weren't quite right. The cases I'm worried about are when either the arg expression from num_simd_work_items is null (because it's not a constant expression we've evaluated yet) or the work group size argument from reqd_work_group_size is null (for the same reason).

(Note how we're checking for a null MergeExpr a few lines higher in this same function.)


struct func1 {
[[intel::num_simd_work_items(4)]]
[[cl::reqd_work_group_size(4, 5, 5)]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When possible, do not pick 2 numbers the same.
I know in the SYCL spec there are a lot of such bad examples... :-(
What about 3 different prime numbers?

@smanna12 smanna12 closed this May 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants