Skip to content

Commit 25f12ff

Browse files
committed
Merge branch 'sycl' into llvmspirv_pulldown
2 parents a159325 + ef62cad commit 25f12ff

File tree

56 files changed

+1080
-991
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

56 files changed

+1080
-991
lines changed

clang/include/clang/Basic/Attr.td

Lines changed: 6 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -1704,20 +1704,14 @@ def SYCLIntelMaxWorkGroupSize : InheritableAttr {
17041704
let LangOpts = [SYCLIsDevice, SilentlyIgnoreSYCLIsHost];
17051705
let Subjects = SubjectList<[Function], ErrorDiag>;
17061706
let AdditionalMembers = [{
1707-
std::optional<llvm::APSInt> getXDimVal() const {
1708-
if (const auto *CE = dyn_cast<ConstantExpr>(getXDim()))
1709-
return CE->getResultAsAPSInt();
1710-
return std::nullopt;
1707+
unsigned getXDimVal() const {
1708+
return cast<ConstantExpr>(getXDim())->getResultAsAPSInt().getExtValue();
17111709
}
1712-
std::optional<llvm::APSInt> getYDimVal() const {
1713-
if (const auto *CE = dyn_cast<ConstantExpr>(getYDim()))
1714-
return CE->getResultAsAPSInt();
1715-
return std::nullopt;
1710+
unsigned getYDimVal() const {
1711+
return cast<ConstantExpr>(getYDim())->getResultAsAPSInt().getExtValue();
17161712
}
1717-
std::optional<llvm::APSInt> getZDimVal() const {
1718-
if (const auto *CE = dyn_cast<ConstantExpr>(getZDim()))
1719-
return CE->getResultAsAPSInt();
1720-
return std::nullopt;
1713+
unsigned getZDimVal() const {
1714+
return cast<ConstantExpr>(getZDim())->getResultAsAPSInt().getExtValue();
17211715
}
17221716
}];
17231717
let Documentation = [SYCLIntelMaxWorkGroupSizeAttrDocs];

clang/include/clang/Basic/AttrDocs.td

Lines changed: 42 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -2980,18 +2980,32 @@ argument to **clEnqueueNDRangeKernel** (in OpenCL) or to
29802980
**parallel_for** in SYCL. This allows the compiler to optimize the
29812981
generated code appropriately for the kernel to which attribute is applied.
29822982

2983-
While semantic of this attribute is the same between OpenCL and SYCL,
2984-
spelling is a bit different:
2985-
2986-
SYCL 2020 describes the ``[[sycl::reqd_work_group_size(X, Y, Z)]]`` spelling
2987-
in detail. This attribute indicates that the kernel must be launched with the
2988-
specified work-group size. The order of the arguments matches the constructor
2989-
of the group class. Each argument to the attribute must be an integral constant
2990-
expression. The dimensionality of the attribute variant used must match the
2991-
dimensionality of the work-group used to invoke the kernel. This spelling
2992-
allows the Y and Z arguments to be optional. If not provided by the user, the
2993-
value of Y and Z defaults to 1. See section 5.8.1 Kernel Attributes for more
2994-
details.
2983+
The arguments to ``reqd_work_group_size`` are ordered based on which index
2984+
increments the fastest. In OpenCL, the first argument is the index that
2985+
increments the fastest. In SYCL, the last argument is the index that increments
2986+
the fastest.
2987+
2988+
In OpenCL C, this attribute is available with the GNU spelling
2989+
(``__attribute__((reqd_work_group_size(X, Y, Z)))``) and all three arguments
2990+
are required. See section 6.7.2 Optional Attribute Qualifiers of OpenCL 1.2
2991+
specification for details.
2992+
2993+
.. code-block:: c++
2994+
2995+
__kernel __attribute__((reqd_work_group_size(8, 16, 32))) void test() {}
2996+
2997+
In SYCL, the attribute accepts either one, two, or three arguments; in each
2998+
form, the last (or only) argument is the index that increments fastest. The
2999+
number of arguments passed to the attribute must match the dimensionality of
3000+
the kernel the attribute is applied to.
3001+
3002+
SYCL 2020 describes the ``[[sycl::reqd_work_group_size(dim0, dim1, dim2)]]``
3003+
spelling in detail. This attribute indicates that the kernel must be launched
3004+
with the specified work-group size. The order of the arguments matches the
3005+
constructor of the ``range`` class. Each argument to the attribute must be an
3006+
integral constant expression. The dimensionality of the attribute variant used
3007+
must match the dimensionality of the work-group used to invoke the kernel. See
3008+
section 5.8.1 Kernel Attributes for more details.
29953009

29963010
In SYCL 1.2.1 mode, the ``cl::reqd_work_group_size`` and
29973011
``sycl::reqd_work_group_size`` attributes are propagated from the function they
@@ -3016,18 +3030,9 @@ attributes are not propagated to the kernel.
30163030
template <int N, int N1, int N2>
30173031
[[sycl::reqd_work_group_size(N, N1, N2)]] void func() {}
30183032

3019-
The ``[[cl::reqd_work_group_size(X, Y, Z)]]`` and
3020-
``__attribute__((reqd_work_group_size(X, Y, Z)))`` spellings are both
3033+
The ``[[cl::reqd_work_group_size(dim0, dim1, dim2)]]`` and
3034+
``__attribute__((reqd_work_group_size(dim0, dim1, dim2)))`` spellings are both
30213035
deprecated in SYCL 2020.
3022-
3023-
In OpenCL C, this attribute is available with the GNU spelling
3024-
(``__attribute__((reqd_work_group_size(X, Y, Z)))``), see section
3025-
6.7.2 Optional Attribute Qualifiers of OpenCL 1.2 specification for details.
3026-
3027-
.. code-block:: c++
3028-
3029-
__kernel __attribute__((reqd_work_group_size(8, 16, 32))) void test() {}
3030-
30313036
}];
30323037
}
30333038

@@ -3041,6 +3046,15 @@ unsigned. The number of dimensional values defined provide additional
30413046
information to the compiler on the dimensionality most likely to be used when
30423047
launching the kernel at runtime.
30433048

3049+
The arguments to ``work_group_size_hint`` are ordered based on which index
3050+
increments the fastest. In OpenCL, the first argument is the index that
3051+
increments the fastest. In SYCL, the last argument is the index that increments
3052+
the fastest.
3053+
3054+
In OpenCL C, this attribute is available with the GNU spelling
3055+
(``__attribute__((work_group_size_hint(X, Y, Z)))``) and all three arguments
3056+
are required.
3057+
30443058
The GNU spelling is deprecated in SYCL mode.
30453059

30463060
.. code-block:: c++
@@ -3052,15 +3066,6 @@ The GNU spelling is deprecated in SYCL mode.
30523066
[[sycl::work_group_size_hint(2, 2, 2)]] void operator()() const {}
30533067
};
30543068

3055-
The arguments to ``work_group_size_hint`` are ordered based on which index
3056-
increments the fastest. In OpenCL, the first argument is the index that
3057-
increments the fastest, and in SYCL, the last argument is the index that
3058-
increments the fastest.
3059-
3060-
In OpenCL C, this attribute is available with the GNU spelling
3061-
(``__attribute__((work_group_size_hint(X, Y, Z)))``) and all
3062-
three arguments are required.
3063-
30643069
In SYCL, the attribute accepts either one, two, or three arguments; in each
30653070
form, the last (or only) argument is the index that increments fastest. The
30663071
number of arguments passed to the attribute must match the dimensionality of
@@ -3077,9 +3082,11 @@ def SYCLIntelMaxWorkGroupSizeAttrDocs : Documentation {
30773082
let Heading = "intel::max_work_group_size";
30783083
let Content = [{
30793084
Applies to a device function/lambda function. Indicates the maximum dimensions
3080-
of a work group. Values must be positive integers. This is similar to
3081-
reqd_work_group_size, but allows work groups that are smaller or equal to the
3082-
specified sizes.
3085+
of a work group. Values must be positive integers. This attribute behaves
3086+
similarly to ``reqd_work_group_size``, but allows work groups that are smaller
3087+
or equal to the specified sizes. The dimensionality behaves the same as with
3088+
the SYCL ``reqd_work_group_size`` attribute, but *all* dimensions must be
3089+
provided.
30833090

30843091
In SYCL 1.2.1 mode, the ``intel::max_work_group_size`` attribute is propagated
30853092
from the function it is applied to onto the kernel which calls the function.

clang/lib/CodeGen/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,7 @@ set(LLVM_LINK_COMPONENTS
2929
ScalarOpts
3030
Support
3131
SYCLLowerIR
32+
SYCLNativeCPUUtils
3233
Target
3334
TargetParser
3435
TransformUtils

clang/lib/CodeGen/CodeGenFunction.cpp

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -826,9 +826,9 @@ void CodeGenFunction::EmitKernelMetadata(const FunctionDecl *FD,
826826
// Attributes arguments (first and third) are reversed on SYCLDevice.
827827
if (getLangOpts().SYCLIsDevice) {
828828
llvm::Metadata *AttrMDArgs[] = {
829-
llvm::ConstantAsMetadata::get(Builder.getInt(*A->getZDimVal())),
830-
llvm::ConstantAsMetadata::get(Builder.getInt(*A->getYDimVal())),
831-
llvm::ConstantAsMetadata::get(Builder.getInt(*A->getXDimVal()))};
829+
llvm::ConstantAsMetadata::get(Builder.getInt32(A->getZDimVal())),
830+
llvm::ConstantAsMetadata::get(Builder.getInt32(A->getYDimVal())),
831+
llvm::ConstantAsMetadata::get(Builder.getInt32(A->getXDimVal()))};
832832
Fn->setMetadata("max_work_group_size",
833833
llvm::MDNode::get(Context, AttrMDArgs));
834834
}

clang/lib/CodeGen/Targets/NVPTX.cpp

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -252,13 +252,13 @@ void NVPTXTargetCodeGenInfo::setTargetAttributes(
252252
bool HasMaxWorkGroupSize = false;
253253
bool HasMinWorkGroupPerCU = false;
254254
if (const auto *MWGS = FD->getAttr<SYCLIntelMaxWorkGroupSizeAttr>()) {
255-
auto MaxThreads = (*MWGS->getZDimVal()).getExtValue() *
256-
(*MWGS->getYDimVal()).getExtValue() *
257-
(*MWGS->getXDimVal()).getExtValue();
258-
if (MaxThreads > 0) {
259-
addNVVMMetadata(F, "maxntidx", MaxThreads);
260-
HasMaxWorkGroupSize = true;
261-
}
255+
HasMaxWorkGroupSize = true;
256+
// We must index-flip between SYCL's notation, X,Y,Z (aka dim0,dim1,dim2)
257+
// with the fastest-moving dimension rightmost, to CUDA's, where X is the
258+
// fastest-moving dimension.
259+
addNVVMMetadata(F, "maxntidx", MWGS->getZDimVal());
260+
addNVVMMetadata(F, "maxntidy", MWGS->getYDimVal());
261+
addNVVMMetadata(F, "maxntidz", MWGS->getXDimVal());
262262
}
263263

264264
auto attrValue = [&](Expr *E) {
@@ -275,7 +275,7 @@ void NVPTXTargetCodeGenInfo::setTargetAttributes(
275275
<< MWGPCU << 0;
276276
} else {
277277
// The value is guaranteed to be > 0, pass it to the metadata.
278-
addNVVMMetadata(F, "minnctapersm", attrValue(MWGPCU->getValue()));
278+
addNVVMMetadata(F, "minctasm", attrValue(MWGPCU->getValue()));
279279
HasMinWorkGroupPerCU = true;
280280
}
281281
}

clang/test/CodeGenSYCL/launch_bounds_nvptx.cpp

Lines changed: 49 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
// compute unit and maximum work groups per multi-processor attributes, that
55
// correspond to CUDA's launch bounds. Expect max_work_group_size,
66
// min_work_groups_per_cu and max_work_groups_per_mp that are mapped to
7-
// maxntidx, minnctapersm, maxclusterrank PTX directives respectively.
7+
// maxntid[xyz], minctasm, and maxclusterrank NVVM annotations respectively.
88

99
#include "sycl.hpp"
1010

@@ -13,24 +13,24 @@ queue q;
1313

1414
class Foo {
1515
public:
16-
[[intel::max_work_group_size(8, 8, 8), intel::min_work_groups_per_cu(2),
16+
[[intel::max_work_group_size(2, 4, 8), intel::min_work_groups_per_cu(2),
1717
intel::max_work_groups_per_mp(4)]] void
1818
operator()() const {}
1919
};
2020

2121
template <int N> class Functor {
2222
public:
23-
[[intel::max_work_group_size(N, 8, 8), intel::min_work_groups_per_cu(N),
23+
[[intel::max_work_group_size(N, 4, 8), intel::min_work_groups_per_cu(N),
2424
intel::max_work_groups_per_mp(N)]] void
2525
operator()() const {}
2626
};
2727

2828
template <int N>
29-
[[intel::max_work_group_size(N, 8, 8), intel::min_work_groups_per_cu(N),
29+
[[intel::max_work_group_size(N, 4, 8), intel::min_work_groups_per_cu(N),
3030
intel::max_work_groups_per_mp(N)]] void
3131
zoo() {}
3232

33-
[[intel::max_work_group_size(8, 8, 8), intel::min_work_groups_per_cu(2),
33+
[[intel::max_work_group_size(2, 4, 8), intel::min_work_groups_per_cu(2),
3434
intel::max_work_groups_per_mp(4)]] void
3535
bar() {}
3636

@@ -42,7 +42,7 @@ int main() {
4242

4343
// Test attribute is applied on lambda.
4444
h.single_task<class kernel_name2>(
45-
[] [[intel::max_work_group_size(8, 8, 8),
45+
[] [[intel::max_work_group_size(2, 4, 8),
4646
intel::min_work_groups_per_cu(2),
4747
intel::max_work_groups_per_mp(4)]] () {});
4848

@@ -65,41 +65,61 @@ int main() {
6565
// CHECK: define dso_local void @{{.*}}kernel_name4() #0 {{.*}} !min_work_groups_per_cu ![[MWGPC:[0-9]+]] !max_work_groups_per_mp ![[MWGPM:[0-9]+]] !max_work_group_size ![[MWGS:[0-9]+]]
6666
// CHECK: define dso_local void @{{.*}}kernel_name5() #0 {{.*}} !min_work_groups_per_cu ![[MWGPC_MWGPM_2:[0-9]+]] !max_work_groups_per_mp ![[MWGPC_MWGPM_2]] !max_work_group_size ![[MWGS_3:[0-9]+]]
6767

68-
// CHECK: {{.*}}@{{.*}}kernel_name1, !"maxntidx", i32 512}
69-
// CHECK: {{.*}}@{{.*}}kernel_name1, !"minnctapersm", i32 2}
68+
// CHECK: {{.*}}@{{.*}}kernel_name1, !"maxntidx", i32 8}
69+
// CHECK: {{.*}}@{{.*}}kernel_name1, !"maxntidy", i32 4}
70+
// CHECK: {{.*}}@{{.*}}kernel_name1, !"maxntidz", i32 2}
71+
// CHECK: {{.*}}@{{.*}}kernel_name1, !"minctasm", i32 2}
7072
// CHECK: {{.*}}@{{.*}}kernel_name1, !"maxclusterrank", i32 4}
71-
// CHECK: {{.*}}@{{.*}}Foo{{.*}}, !"maxntidx", i32 512}
72-
// CHECK: {{.*}}@{{.*}}Foo{{.*}}, !"minnctapersm", i32 2}
73+
// CHECK: {{.*}}@{{.*}}Foo{{.*}}, !"maxntidx", i32 8}
74+
// CHECK: {{.*}}@{{.*}}Foo{{.*}}, !"maxntidy", i32 4}
75+
// CHECK: {{.*}}@{{.*}}Foo{{.*}}, !"maxntidz", i32 2}
76+
// CHECK: {{.*}}@{{.*}}Foo{{.*}}, !"minctasm", i32 2}
7377
// CHECK: {{.*}}@{{.*}}Foo{{.*}}, !"maxclusterrank", i32 4}
74-
// CHECK: {{.*}}@{{.*}}kernel_name2, !"maxntidx", i32 512}
75-
// CHECK: {{.*}}@{{.*}}kernel_name2, !"minnctapersm", i32 2}
78+
// CHECK: {{.*}}@{{.*}}kernel_name2, !"maxntidx", i32 8}
79+
// CHECK: {{.*}}@{{.*}}kernel_name2, !"maxntidy", i32 4}
80+
// CHECK: {{.*}}@{{.*}}kernel_name2, !"maxntidz", i32 2}
81+
// CHECK: {{.*}}@{{.*}}kernel_name2, !"minctasm", i32 2}
7682
// CHECK: {{.*}}@{{.*}}kernel_name2, !"maxclusterrank", i32 4}
77-
// CHECK: {{.*}}@{{.*}}main{{.*}}, !"maxntidx", i32 512}
78-
// CHECK: {{.*}}@{{.*}}main{{.*}}, !"minnctapersm", i32 2}
83+
// CHECK: {{.*}}@{{.*}}main{{.*}}, !"maxntidx", i32 8}
84+
// CHECK: {{.*}}@{{.*}}main{{.*}}, !"maxntidy", i32 4}
85+
// CHECK: {{.*}}@{{.*}}main{{.*}}, !"maxntidz", i32 2}
86+
// CHECK: {{.*}}@{{.*}}main{{.*}}, !"minctasm", i32 2}
7987
// CHECK: {{.*}}@{{.*}}main{{.*}}, !"maxclusterrank", i32 4}
80-
// CHECK: {{.*}}@{{.*}}kernel_name3, !"maxntidx", i32 384}
81-
// CHECK: {{.*}}@{{.*}}kernel_name3, !"minnctapersm", i32 6}
88+
// CHECK: {{.*}}@{{.*}}kernel_name3, !"maxntidx", i32 8}
89+
// CHECK: {{.*}}@{{.*}}kernel_name3, !"maxntidy", i32 4}
90+
// CHECK: {{.*}}@{{.*}}kernel_name3, !"maxntidz", i32 6}
91+
// CHECK: {{.*}}@{{.*}}kernel_name3, !"minctasm", i32 6}
8292
// CHECK: {{.*}}@{{.*}}kernel_name3, !"maxclusterrank", i32 6}
83-
// CHECK: {{.*}}@{{.*}}Functor{{.*}}, !"maxntidx", i32 384}
84-
// CHECK: {{.*}}@{{.*}}Functor{{.*}}, !"minnctapersm", i32 6}
93+
// CHECK: {{.*}}@{{.*}}Functor{{.*}}, !"maxntidx", i32 8}
94+
// CHECK: {{.*}}@{{.*}}Functor{{.*}}, !"maxntidy", i32 4}
95+
// CHECK: {{.*}}@{{.*}}Functor{{.*}}, !"maxntidz", i32 6}
96+
// CHECK: {{.*}}@{{.*}}Functor{{.*}}, !"minctasm", i32 6}
8597
// CHECK: {{.*}}@{{.*}}Functor{{.*}}, !"maxclusterrank", i32 6}
86-
// CHECK: {{.*}}@{{.*}}kernel_name4, !"maxntidx", i32 512}
87-
// CHECK: {{.*}}@{{.*}}kernel_name4, !"minnctapersm", i32 2}
98+
// CHECK: {{.*}}@{{.*}}kernel_name4, !"maxntidx", i32 8}
99+
// CHECK: {{.*}}@{{.*}}kernel_name4, !"maxntidy", i32 4}
100+
// CHECK: {{.*}}@{{.*}}kernel_name4, !"maxntidz", i32 2}
101+
// CHECK: {{.*}}@{{.*}}kernel_name4, !"minctasm", i32 2}
88102
// CHECK: {{.*}}@{{.*}}kernel_name4, !"maxclusterrank", i32 4}
89-
// CHECK: {{.*}}@{{.*}}bar{{.*}}, !"maxntidx", i32 512}
90-
// CHECK: {{.*}}@{{.*}}bar{{.*}}, !"minnctapersm", i32 2}
103+
// CHECK: {{.*}}@{{.*}}bar{{.*}}, !"maxntidx", i32 8}
104+
// CHECK: {{.*}}@{{.*}}bar{{.*}}, !"maxntidy", i32 4}
105+
// CHECK: {{.*}}@{{.*}}bar{{.*}}, !"maxntidz", i32 2}
106+
// CHECK: {{.*}}@{{.*}}bar{{.*}}, !"minctasm", i32 2}
91107
// CHECK: {{.*}}@{{.*}}bar{{.*}}, !"maxclusterrank", i32 4}
92-
// CHECK: {{.*}}@{{.*}}kernel_name5, !"maxntidx", i32 1024}
93-
// CHECK: {{.*}}@{{.*}}kernel_name5, !"minnctapersm", i32 16}
108+
// CHECK: {{.*}}@{{.*}}kernel_name5, !"maxntidx", i32 8}
109+
// CHECK: {{.*}}@{{.*}}kernel_name5, !"maxntidy", i32 4}
110+
// CHECK: {{.*}}@{{.*}}kernel_name5, !"maxntidz", i32 16}
111+
// CHECK: {{.*}}@{{.*}}kernel_name5, !"minctasm", i32 16}
94112
// CHECK: {{.*}}@{{.*}}kernel_name5, !"maxclusterrank", i32 16}
95-
// CHECK: {{.*}}@{{.*}}zoo{{.*}}, !"maxntidx", i32 1024}
96-
// CHECK: {{.*}}@{{.*}}zoo{{.*}}, !"minnctapersm", i32 16}
113+
// CHECK: {{.*}}@{{.*}}zoo{{.*}}, !"maxntidx", i32 8}
114+
// CHECK: {{.*}}@{{.*}}zoo{{.*}}, !"maxntidy", i32 4}
115+
// CHECK: {{.*}}@{{.*}}zoo{{.*}}, !"maxntidz", i32 16}
116+
// CHECK: {{.*}}@{{.*}}zoo{{.*}}, !"minctasm", i32 16}
97117
// CHECK: {{.*}}@{{.*}}zoo{{.*}}, !"maxclusterrank", i32 16}
98118

99119
// CHECK: ![[MWGPC]] = !{i32 2}
100120
// CHECK: ![[MWGPM]] = !{i32 4}
101-
// CHECK: ![[MWGS]] = !{i32 8, i32 8, i32 8}
121+
// CHECK: ![[MWGS]] = !{i32 8, i32 4, i32 2}
102122
// CHECK: ![[MWGPC_MWGPM]] = !{i32 6}
103-
// CHECK: ![[MWGS_2]] = !{i32 8, i32 8, i32 6}
123+
// CHECK: ![[MWGS_2]] = !{i32 8, i32 4, i32 6}
104124
// CHECK: ![[MWGPC_MWGPM_2]] = !{i32 16}
105-
// CHECK: ![[MWGS_3]] = !{i32 8, i32 8, i32 16}
125+
// CHECK: ![[MWGS_3]] = !{i32 8, i32 4, i32 16}

0 commit comments

Comments
 (0)