Skip to content

[SYCL][FPGA][NFC] Fix num_simd_work_items and reqd_work_group_size argument check #3337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions clang/include/clang/Basic/Attr.td
Original file line number Diff line number Diff line change
Expand Up @@ -2793,6 +2793,9 @@ def ReqdWorkGroupSize : InheritableAttr {
ExprArgument<"YDim", /*optional*/1>,
ExprArgument<"ZDim", /*optional*/1>];
let Subjects = SubjectList<[Function], ErrorDiag>;
let Accessors = [Accessor<"usesOpenCLArgOrdering",
[GNU<"reqd_work_group_size">,
CXX11<"cl","reqd_work_group_size">]>];
let AdditionalMembers = [{
ArrayRef<const Expr *> dimensions() const {
return {getXDim(), getYDim(), getZDim()};
Expand Down
30 changes: 24 additions & 6 deletions clang/include/clang/Basic/AttrDocs.td
Original file line number Diff line number Diff line change
Expand Up @@ -2418,26 +2418,44 @@ device kernel, the attribute is not ignored and it is propagated to the kernel.
[[intel::num_simd_work_items(N)]] void operator()() const {}
};

If the`` intel::reqd_work_group_size`` or ``cl::reqd_work_group_size``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more clear if we rearranged the changes to the docs. My recommendation is:

The docs for reqd_work_group_size is where most of these details should live. Those docs should be updated to explain what the three arguments are and that the argument which specifies the "work group size" differs depending on which spelling of the attribute is used. If possible, we should put an anchor in the docs for the work group size so that later links can direct users to that specific anchor.

The documentation that refers to reqd_work_group_size (like here) can then say "If the reqd_work_group_size attribute is specified on a declaration along with a intel::num_simd_work_items attribute, the requested work group size argument must be evenly divisible by the argument specified in the num_simd_work_items attribute.

This way, we keep all the "we made bad decisions about the argument ordering" words are all in one place (instead of repeated) and can refer to the oddity from whatever other attributes have this sort of interaction with the work group size.

attribute is specified on a declaration along with a
intel::num_simd_work_items attribute, the work group size attribute
argument (the first argument) must be evenly divisible by the argument specified
If the ``intel::reqd_work_group_size`` attribute is specified on a declaration
along with a ``intel::num_simd_work_items`` attribute, the work group size
(the last argument) must be evenly divisible by the argument specified
in the ``intel::num_simd_work_items`` attribute.

.. code-block:: c++

struct func {
[[intel::num_simd_work_items(4)]]
[[intel::reqd_work_group_size(64, 64, 64)]]
[[intel::reqd_work_group_size(5, 5, 64)]]
void operator()() const {}
};

struct bar {
[[intel::reqd_work_group_size(64, 64, 64)]]
[[intel::reqd_work_group_size(5, 5, 64)]]
[[intel::num_simd_work_items(4)]]
void operator()() const {}
};

If the ``cl::reqd_work_group_size`` or ``__attribute__((reqd_work_group_size()))``
attribute is specified on a declaration along with a ``intel::num_simd_work_items``
attribute, the work group size (the first argument) must be evenly divisible by
the argument specified in the ``intel::num_simd_work_items`` attribute.
GNU and ``cl::reqd_work_group_size`` spelling of ReqdWorkGroupSizeAttr maps to
the OpenCL semantic. The first and last argument are only swapped for the Intel
attributes in SYCL and not the OpenCL ones.

.. code-block:: c++

struct func1 {
[[intel::num_simd_work_items(4)]]
[[cl::reqd_work_group_size(4, 5, 5)]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When possible, do not pick 2 numbers the same.
I know in the SYCL spec there are a lot of such bad examples... :-(
What about 3 different prime numbers?

void operator()() const {}
};

[[intel::num_simd_work_items(2)]]
__attribute__((reqd_work_group_size(4,3,3))) void bar();

}];
}

Expand Down
65 changes: 59 additions & 6 deletions clang/lib/Sema/SemaDeclAttr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3152,11 +3152,28 @@ static void handleWorkGroupSize(Sema &S, Decl *D, const ParsedAttr &AL) {
return;
ZDimExpr = ZDim.get();

// If the declaration has an [[intel::num_simd_work_items()]] attribute,
// check to see if the first argument of [[cl::reqd_work_group_size()]]
// or __attribute__((reqd_work_group_size)) attribute and last argument
// of [intel::reqd_work_group_size()]] attribute can be evenly divided by
// the num_simd_work_items attribute. GNU and [[cl::reqd_work_group_size]]
// spelling of ReqdWorkGroupSizeAttr maps to the OpenCL
// semantics. First and last argument are only swapped for the Intel
// atributes in SYCL and not the OpenCL ones.
if (const auto *A = D->getAttr<SYCLIntelNumSimdWorkItemsAttr>()) {
int64_t NumSimdWorkItems =
A->getValue()->getIntegerConstantExpr(Ctx)->getSExtValue();

if (XDimVal.getZExtValue() % NumSimdWorkItems != 0) {
bool UsesOpenCLArgOrdering =
((AL.getSyntax() == AttributeCommonInfo::AS_CXX11 ||
AL.getSyntax() == AttributeCommonInfo::AS_C2x) &&
AL.hasScope() && AL.getScopeName()->isStr("cl")) ||
AL.getSyntax() == AttributeCommonInfo::AS_GNU;

unsigned WorkGroupSize = UsesOpenCLArgOrdering ? XDimVal.getZExtValue()
: ZDimVal.getZExtValue();

if (WorkGroupSize % NumSimdWorkItems != 0) {
S.Diag(A->getLocation(), diag::err_sycl_num_kernel_wrong_reqd_wg_size)
<< A << AL;
S.Diag(AL.getLoc(), diag::note_conflicting_attribute);
Expand Down Expand Up @@ -3302,14 +3319,24 @@ void Sema::AddSYCLIntelNumSimdWorkItemsAttr(Decl *D,
return;
}
}

// If the declaration has an [[intel::reqd_work_group_size]] attribute,
// check to see if the first argument can be evenly divided by the
// num_simd_work_items attribute.
// If the declaration has a [[intel::reqd_work_group_size()]]
// attribute, check to see if the last argument can be evenly
// divided by the num_simd_work_items attribute.
// If the declaration has a __attribute__((reqd_work_group_size))
// or [[cl::reqd_work_group_size()]] attribute, check to see if the
// first argument can be evenly divided by the num_simd_work_items
// attribute. GNU and [[cl::reqd_work_group_size()]] spelling of
// ReqdWorkGroupSizeAttr maps to the OpenCL semantics. First and last
// argument are only swapped for the Intel atributes in SYCL and not
// the OpenCL ones.
if (const auto *DeclAttr = D->getAttr<ReqdWorkGroupSizeAttr>()) {
Optional<llvm::APSInt> XDimVal = DeclAttr->getXDimVal(Context);
Optional<llvm::APSInt> ZDimVal = DeclAttr->getZDimVal(Context);

if (*XDimVal % ArgVal != 0) {
llvm::APSInt WorkGroupSize =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have a testing hole here with a bug hiding in it -- won't this wind up with ZDimVal being llvm::None because the attribute has not yet been instantiated?

template <int X, int Y, int Z>
[[intel::reqd_work_group_size(X, Y, Z)]]
void func();

template <int X, int Y, int Z>
[[intel::num_simd_work_items(3)]] void func(); // When we go to process this attribute.

DeclAttr->usesOpenCLArgOrdering() ? *XDimVal : *ZDimVal;

if (WorkGroupSize % ArgVal != 0) {
Diag(CI.getLoc(), diag::err_sycl_num_kernel_wrong_reqd_wg_size)
<< CI << DeclAttr;
Diag(DeclAttr->getLocation(), diag::note_conflicting_attribute);
Expand All @@ -3335,6 +3362,32 @@ SYCLIntelNumSimdWorkItemsAttr *Sema::MergeSYCLIntelNumSimdWorkItemsAttr(
return nullptr;
}
}
// If the declaration has a [[intel::reqd_work_group_size()]]
// attribute, check to see if the last argument can be evenly
// divided by the num_simd_work_items attribute.
// If the declaration has a __attribute__((reqd_work_group_size))
// or [[cl::reqd_work_group_size()]] attribute, check to see if the
// first argument can be evenly divided by the num_simd_work_items
// attribute. GNU and [[cl::reqd_work_group_size()]] spelling of
// ReqdWorkGroupSizeAttr maps to the OpenCL semantics. First and last
// argument are only swapped for the Intel atributes in SYCL and not
// the OpenCL ones.
if (const auto *DeclAttr = D->getAttr<ReqdWorkGroupSizeAttr>()) {
const auto *MergeExpr = dyn_cast<ConstantExpr>(A.getValue());
const auto *DeclXExpr = dyn_cast<ConstantExpr>(DeclAttr->getXDim());
const auto *DeclZExpr = dyn_cast<ConstantExpr>(DeclAttr->getZDim());

llvm::APSInt WorkGroupSize = DeclAttr->usesOpenCLArgOrdering()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar here -- I think this could crash because the work group size or the merge expression hasn't been evaluated yet (so we'd dereference a null pointer). I think on something like this:

template <int X, int Y, int Z, int N>
[[intel::reqd_work_group_size(X, Y, Z)]] void func();

template <int X, int Y, int Z, int N>
[[intel::num_simd_work_items(N)]] void func();

Copy link
Contributor Author

@smanna12 smanna12 Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried couple of examples below, no crash happens with current patch:

template <int X, int Y, int Z, int N>
[[intel::reqd_work_group_size(X, Y, Z)]] void func();

template <int X, int Y, int Z, int N>
[[intel::num_simd_work_items(N)]] void func();

int main() {
  func<3, 3, 5, 3>();
}

This does not crash.

test.cpp:5:3: error: 'num_simd_work_items' attribute must evenly divide the work-group size for the 'reqd_work_group_size' attribute
[[intel::num_simd_work_items(N)]] void func();
  ^
test.cpp:8:3: note: in instantiation of function template specialization 'func<3, 3, 5, 3>' requested here
  func<3, 3, 5, 3>();
  ^
test.cpp:2:3: note: conflicting attribute is here
[[intel::reqd_work_group_size(X, Y, Z)]] void func();
  ^
1 error generated.

Same for other test case. No crash happens with current patch:

template <int X, int Y, int Z>
[[intel::reqd_work_group_size(X, Y, Z)]]
void func();

template <int X, int Y, int Z>
[[intel::num_simd_work_items(3)]] void func(); // When we go to process this attribute.

int main() {
  func<3, 3, 5>();
}

test1.cpp:6:3: error: 'num_simd_work_items' attribute must evenly divide the work-group size for the 'reqd_work_group_size' attribute
[[intel::num_simd_work_items(3)]] void func(); // When we go to process this attribute.
  ^
test1.cpp:9:3: note: in instantiation of function template specialization 'func<3, 3, 5>' requested here
  func<3, 3, 5>();
  ^
test1.cpp:2:3: note: conflicting attribute is here
[[intel::reqd_work_group_size(X, Y, Z)]]
  ^
1 error generated.

Am i missing something here? Did you mean to test something different examples here?

Copy link
Contributor Author

@smanna12 smanna12 Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would crash. I think you meant to test this example. The crash happens here because the work group size or the merge expression hasn't been evaluated yet for existing implementation of reqd_work_group_size attribute( plan to refactor this attribute later - outside of this PR)

template <int X, int Y, int Z>
[[intel::num_simd_work_items(3)]] void func();

template <int X, int Y, int Z>
[[intel::reqd_work_group_size(X, Y, Z)]]
void func();

int main() {
func<3, 3, 5>();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible my test cases weren't quite right. The cases I'm worried about are when either the arg expression from num_simd_work_items is null (because it's not a constant expression we've evaluated yet) or the work group size argument from reqd_work_group_size is null (for the same reason).

(Note how we're checking for a null MergeExpr a few lines higher in this same function.)

? DeclXExpr->getResultAsAPSInt()
: DeclZExpr->getResultAsAPSInt();

if (WorkGroupSize % MergeExpr->getResultAsAPSInt() != 0) {
Diag(A.getLoc(), diag::err_sycl_num_kernel_wrong_reqd_wg_size)
<< &A << DeclAttr;
Diag(DeclAttr->getLocation(), diag::note_conflicting_attribute);
return nullptr;
}
}
return ::new (Context)
SYCLIntelNumSimdWorkItemsAttr(Context, A, A.getValue());
}
Expand Down
Loading