-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][FPGA][NFC] Fix num_simd_work_items and reqd_work_group_size argument check #3337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…gument check The order of the arguments to reqd_work_group_size in SYCL and OpenCL are reversed. In the OpenCL syntax, reqd_work_group_size(X,Y,Z), X is the index that increments the fastest, followed by Y, then Z. If we preserve that definition of X, Y, and Z, then in SYCL the attribute is specified as reqd_work_group_size(Z,Y,X). This gets more complicated in the case where fewer than three arguments are provided. In such a case it's not the "X" argument that is omitted (by the definitions of X, Y, and Z above) it's the Z dimension that's omitted. So the two argument version has the syntax reqd_work_group_size(Y,X). Similarly, the one argument variant has syntax reqd_work_group_size(X). This patches fixes the problem on intel#3275 where we checked the wrong argument. In SYCL, the num_simd_work_items must evenly divide the last argument to reqd_work_group_size regardless of how many arguments req_work_group_size has. Signed-off-by: Soumi Manna <[email protected]>
Signed-off-by: Soumi Manna <[email protected]>
clang/lib/Sema/SemaDeclAttr.cpp
Outdated
@@ -3151,11 +3151,14 @@ static void handleWorkGroupSize(Sema &S, Decl *D, const ParsedAttr &AL) { | |||
return; | |||
ZDimExpr = ZDim.get(); | |||
|
|||
// If the declaration has an [[intel::num_simd_work_items()]] attribute, | |||
// check to see if the last argument can be evenly divided by the | |||
// num_simd_work_items attribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// num_simd_work_items attribute | |
// num_simd_work_items attribute. |
clang/lib/Sema/SemaDeclAttr.cpp
Outdated
if (const auto *A = D->getAttr<SYCLIntelNumSimdWorkItemsAttr>()) { | ||
int64_t NumSimdWorkItems = | ||
A->getValue()->getIntegerConstantExpr(Ctx)->getSExtValue(); | ||
|
||
if (XDimVal.getZExtValue() % NumSimdWorkItems != 0) { | ||
if (ZDimVal.getZExtValue() % NumSimdWorkItems != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is called for __attribute__((reqd_work_group_size))
as well as [[intel::reqd_work_group_size]]
-- aren't X and Z only swapped for the intel attribute and not the OpenCL ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AaronBallman,
I have added a check to verify SYCL vs OpenCL spelling. I did not add any test for __attribute__((reqd_work_group_size))
since we can not use [[intel::num_simd_work_items]]
and _attribute__((reqd_work_group_size))
together here
Signed-off-by: Soumi Manna <[email protected]>
Signed-off-by: Soumi Manna <[email protected]>
// can be evenly divided by the num_simd_work_items attribute. | ||
struct TRIFuncObjBad1 { | ||
[[intel::num_simd_work_items(3)]] // expected-error{{'num_simd_work_items' attribute must evenly divide the work-group size for the 'reqd_work_group_size' attribute}} | ||
[[intel::reqd_work_group_size(5, 3, 3)]] // expected-note{{conflicting attribute is here}} | ||
[[intel::reqd_work_group_size(5, 3, 5)]] // expected-note{{conflicting attribute is here}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please change the examples to something like (3, 3, 5) so we know which value the complaint is about? With (5, 3, 5) it is ambiguous as to which '5' the '3' does not divide. This comment applies to many of the cases in this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Soumi Manna <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with the changes. @elizabethandrews ?
@@ -2421,7 +2421,7 @@ device kernel, the attribute is not ignored and it is propagated to the kernel. | |||
If the`` intel::reqd_work_group_size`` or ``cl::reqd_work_group_size`` | |||
attribute is specified on a declaration along with a | |||
intel::num_simd_work_items attribute, the work group size attribute | |||
argument (the first argument) must be evenly divisible by the argument specified | |||
argument (the last argument) must be evenly divisible by the argument specified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This documentation also needs to be updated for the OpenCL vs SYCL confusion. This would be a great place for a big warning to users because they're definitely going to be surprised by this SYCL design choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
clang/lib/Sema/SemaDeclAttr.cpp
Outdated
if (const auto *A = D->getAttr<SYCLIntelNumSimdWorkItemsAttr>()) { | ||
int64_t NumSimdWorkItems = | ||
A->getValue()->getIntegerConstantExpr(Ctx)->getSExtValue(); | ||
|
||
if (XDimVal.getZExtValue() % NumSimdWorkItems != 0) { | ||
unsigned checkWorkGroupSize = | ||
AL.getSyntax() == AttributeCommonInfo::AS_GNU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should document clearly that the GNU spelling of this attribute maps to the OpenCL semantics (because otherwise it's extremely unclear as to which semantics you get since there's no vendor information).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this code misses the [[cl::reqd_work_group_size]]
spelling. We should be sure to add tests for both the GNU and CXX spellings so we don't accidentally regress this sort of thing in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be sure to add tests for both the GNU and CXX spellings so we don't accidentally regress this sort of thing in the future.
@AaronBallman , just want to check with you that how can we add tests for GNU spelling here? Am i missing anything?
we can not use [[intel::num_simd_work_items]]
and __attribute__((reqd_work_group_size))
together here to verify workgroup size (first argument) divides by num_simd_work_items.
We have __attribute__((reqd_work_group_size))
and __attribute__((num_simd_work_items))
in downstream as OpenCL spelling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I'm not certain I understand the questions. You're already testing for the GNU spelling? Did you mean the [[cl:reqd_work_group_size]]
spelling? You can do something like (AL.getSyntax() == AttributeCommonInfo::AS_CXX11 || AL.getSyntax() == AttributeCommonInfo::AS_C2x) && AL.hasScope() && AL.getScopeName()->isStr("cl")
to get you there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry i meant to say __attribute__((reqd_work_group_size))
- another GNU spelling.
[[cl:reqd_work_group_size]] - this is not the problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the tests we need are:
// Merging cases.
[[intel::num_simd_work_items(2)]] void func1();
__attribute__((reqd_work_group_size(3,4,4))) void func1(); // Should diagnose
[[intel::num_simd_work_items(2)]] void func2();
[[cl::reqd_work_group_size(3, 4, 4)]] void func2(); // Should diagnose
[[intel::num_simd_work_items(2)]] void func3();
[[intel::reqd_work_group_size(3, 4, 4)]] void func3(); // Shockingly OK!
// Direct cases.
[[intel::num_simd_work_items(2)]] __attribute__((reqd_work_group_size(3,4,4))) void func1(); // Should diagnose
[[intel::num_simd_work_items(2)]] [[cl::reqd_work_group_size(3, 4, 4)]] void func2(); // Should diagnose
[[intel::num_simd_work_items(2)]] [[intel::reqd_work_group_size(3, 4, 4)]] void func3(); // Shockingly OK!
// It'd probably be good to add template cases along these same lines as well.
Or am I misunderstanding still? (Sorry, it's been a long week and I'm feeling a bit braindead.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confused myself with the syntax of reqd_work_group_size attributes
kernel __attribute__((reqd_work_group_size(1,1,2))) void kernel13(){}
am I misunderstanding still?
No, you understood correctly. You answered my questions. Thanks @AaronBallman
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be sure to add tests for both the GNU and CXX spellings so we don't accidentally regress this sort of thing in the future.
Added test cases, Done.
clang/lib/Sema/SemaDeclAttr.cpp
Outdated
if (const auto *A = D->getAttr<SYCLIntelNumSimdWorkItemsAttr>()) { | ||
int64_t NumSimdWorkItems = | ||
A->getValue()->getIntegerConstantExpr(Ctx)->getSExtValue(); | ||
|
||
if (XDimVal.getZExtValue() % NumSimdWorkItems != 0) { | ||
unsigned checkWorkGroupSize = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsigned checkWorkGroupSize = | |
unsigned CheckWorkGroupSize = |
per usual coding conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
clang/lib/Sema/SemaDeclAttr.cpp
Outdated
Optional<llvm::APSInt> ZDimVal = DeclAttr->getZDimVal(Context); | ||
|
||
llvm::APSInt WorkGroupSize = | ||
DeclAttr->getSyntax() == AttributeCommonInfo::AS_GNU ? *XDimVal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be better done by adding an Accessor
to the attribute in Attr.td. e.g.,
let Accessors = [Accessor<"usesOpenCLArgOrdering", [GNU<"reqd_work_group_size">, CXX11<"cl","reqd_work_group_size">]>];
then you can do DeclAttr->usesOpenCLArgOrdering()
here. That will also fix the issue with the C++ OpenCL spelling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Soumi Manna <[email protected]>
Signed-off-by: Soumi Manna <[email protected]>
Signed-off-by: Soumi Manna <[email protected]>
@@ -2418,26 +2418,44 @@ device kernel, the attribute is not ignored and it is propagated to the kernel. | |||
[[intel::num_simd_work_items(N)]] void operator()() const {} | |||
}; | |||
|
|||
If the`` intel::reqd_work_group_size`` or ``cl::reqd_work_group_size`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more clear if we rearranged the changes to the docs. My recommendation is:
The docs for reqd_work_group_size
is where most of these details should live. Those docs should be updated to explain what the three arguments are and that the argument which specifies the "work group size" differs depending on which spelling of the attribute is used. If possible, we should put an anchor in the docs for the work group size so that later links can direct users to that specific anchor.
The documentation that refers to reqd_work_group_size
(like here) can then say "If the reqd_work_group_size
attribute is specified on a declaration along with a intel::num_simd_work_items
attribute, the requested work group size argument must be evenly divisible by the argument specified in the num_simd_work_items
attribute.
This way, we keep all the "we made bad decisions about the argument ordering" words are all in one place (instead of repeated) and can refer to the oddity from whatever other attributes have this sort of interaction with the work group size.
|
||
if (*XDimVal % ArgVal != 0) { | ||
llvm::APSInt WorkGroupSize = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might have a testing hole here with a bug hiding in it -- won't this wind up with ZDimVal
being llvm::None
because the attribute has not yet been instantiated?
template <int X, int Y, int Z>
[[intel::reqd_work_group_size(X, Y, Z)]]
void func();
template <int X, int Y, int Z>
[[intel::num_simd_work_items(3)]] void func(); // When we go to process this attribute.
const auto *DeclXExpr = dyn_cast<ConstantExpr>(DeclAttr->getXDim()); | ||
const auto *DeclZExpr = dyn_cast<ConstantExpr>(DeclAttr->getZDim()); | ||
|
||
llvm::APSInt WorkGroupSize = DeclAttr->usesOpenCLArgOrdering() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar here -- I think this could crash because the work group size or the merge expression hasn't been evaluated yet (so we'd dereference a null pointer). I think on something like this:
template <int X, int Y, int Z, int N>
[[intel::reqd_work_group_size(X, Y, Z)]] void func();
template <int X, int Y, int Z, int N>
[[intel::num_simd_work_items(N)]] void func();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried couple of examples below, no crash happens with current patch:
template <int X, int Y, int Z, int N>
[[intel::reqd_work_group_size(X, Y, Z)]] void func();
template <int X, int Y, int Z, int N>
[[intel::num_simd_work_items(N)]] void func();
int main() {
func<3, 3, 5, 3>();
}
This does not crash.
test.cpp:5:3: error: 'num_simd_work_items' attribute must evenly divide the work-group size for the 'reqd_work_group_size' attribute
[[intel::num_simd_work_items(N)]] void func();
^
test.cpp:8:3: note: in instantiation of function template specialization 'func<3, 3, 5, 3>' requested here
func<3, 3, 5, 3>();
^
test.cpp:2:3: note: conflicting attribute is here
[[intel::reqd_work_group_size(X, Y, Z)]] void func();
^
1 error generated.
Same for other test case. No crash happens with current patch:
template <int X, int Y, int Z>
[[intel::reqd_work_group_size(X, Y, Z)]]
void func();
template <int X, int Y, int Z>
[[intel::num_simd_work_items(3)]] void func(); // When we go to process this attribute.
int main() {
func<3, 3, 5>();
}
test1.cpp:6:3: error: 'num_simd_work_items' attribute must evenly divide the work-group size for the 'reqd_work_group_size' attribute
[[intel::num_simd_work_items(3)]] void func(); // When we go to process this attribute.
^
test1.cpp:9:3: note: in instantiation of function template specialization 'func<3, 3, 5>' requested here
func<3, 3, 5>();
^
test1.cpp:2:3: note: conflicting attribute is here
[[intel::reqd_work_group_size(X, Y, Z)]]
^
1 error generated.
Am i missing something here? Did you mean to test something different examples here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would crash. I think you meant to test this example. The crash happens here because the work group size or the merge expression hasn't been evaluated yet for existing implementation of reqd_work_group_size attribute( plan to refactor this attribute later - outside of this PR)
template <int X, int Y, int Z>
[[intel::num_simd_work_items(3)]] void func();
template <int X, int Y, int Z>
[[intel::reqd_work_group_size(X, Y, Z)]]
void func();
int main() {
func<3, 3, 5>();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible my test cases weren't quite right. The cases I'm worried about are when either the arg expression from num_simd_work_items
is null (because it's not a constant expression we've evaluated yet) or the work group size argument from reqd_work_group_size
is null (for the same reason).
(Note how we're checking for a null MergeExpr
a few lines higher in this same function.)
|
||
struct func1 { | ||
[[intel::num_simd_work_items(4)]] | ||
[[cl::reqd_work_group_size(4, 5, 5)]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When possible, do not pick 2 numbers the same.
I know in the SYCL spec there are a lot of such bad examples... :-(
What about 3 different prime numbers?
From Bug: #3252
The order of the arguments to reqd_work_group_size in SYCL and OpenCL are reversed.
In the OpenCL syntax, reqd_work_group_size(X,Y,Z), X is the index that increments the fastest,
followed by Y, then Z. If we preserve that definition of X, Y, and Z, then in SYCL the attribute
is specified as reqd_work_group_size(Z,Y,X). This gets more complicated in the case where fewer than
three arguments are provided. In such a case it's not the "X" argument that is omitted
(by the definitions of X, Y, and Z above) it's the Z dimension that's omitted. So the two argument
version has the syntax reqd_work_group_size(Y,X). Similarly, the one argument variant has syntax
reqd_work_group_size(X).
This patches fixes the problem on #3275 where we checked the wrong argument.
In SYCL, the num_simd_work_items must evenly divide the last argument to reqd_work_group_size regardless of how many arguments req_work_group_size has.
Signed-off-by: Soumi Manna [email protected]