Skip to content

[AMDGPU] Adding the amdgpu-num-work-groups function attribute #79035

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Mar 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions clang/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -177,6 +177,12 @@ Removed Compiler Flags

Attribute Changes in Clang
--------------------------
- Introduced a new function attribute ``__attribute__((amdgpu_max_num_work_groups(x, y, z)))`` or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/work_groups/workgroup/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing attributes that have workgroup spelled as two separate words: flat-work-group-size, reqd_work_group_size.
Pls let me know if you still want workgroup as one word.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh. The ISA manuals usually use "workgroup". read_work_group_size came from OpenCL. We made up amdgpu_flat_work_group_size

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backend facing parts seem more consistently to be workgroup. e.g in the HSA metadata, we have .workgroup_size_hint, .reqd_workgroup_size. As horrible as it is, maybe it's best to keep it this way for the clang attribute, and rename all the backend bits to be "workgroup"?

Copy link
Contributor Author

@jwanggit86 jwanggit86 Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of flat workgroup size, the LLVM attribute is called amdgpu-flat-work-group-size, but the metadata is .max_flat_workgroup_size. I suppose we can copy that and change the metadata from .max_num_work_groups_x to .max_num_workgroups_x and so on.

Do you want the LLVM attribute to be changed from amdgpu-max-num-work-groups to amdgpu-max-num-workgroups as well? Note that in the file AMDGPUUsage.rst the word work-group with a hyphen is used a lot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of the AMDGPUUsage work-groups are actually wrong based on the actual code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ".max_num_workgroups" "amdgpu-max-num-workgroups" and "amdgpu_max_num_work_groups" is the most consistent with the existing uses

``[[clang::amdgpu_max_num_work_groups(x, y, z)]]`` for the AMDGPU target. This attribute can be
attached to HIP or OpenCL kernel function definitions to provide an optimization hint. The parameters
``x``, ``y``, and ``z`` specify the maximum number of workgroups for the respective dimensions,
and each must be a positive integer when provided. The parameter ``x`` is required, while ``y`` and
``z`` are optional with default value of 1.

Improvements to Clang's diagnostics
-----------------------------------
Expand Down
7 changes: 7 additions & 0 deletions clang/include/clang/Basic/Attr.td
Original file line number Diff line number Diff line change
Expand Up @@ -2039,6 +2039,13 @@ def AMDGPUNumVGPR : InheritableAttr {
let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">;
}

def AMDGPUMaxNumWorkGroups : InheritableAttr {
let Spellings = [Clang<"amdgpu_max_num_work_groups", 0>];
let Args = [ExprArgument<"MaxNumWorkGroupsX">, ExprArgument<"MaxNumWorkGroupsY", 1>, ExprArgument<"MaxNumWorkGroupsZ", 1>];
let Documentation = [AMDGPUMaxNumWorkGroupsDocs];
let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">;
}

def AMDGPUKernelCall : DeclOrTypeAttr {
let Spellings = [Clang<"amdgpu_kernel">];
let Documentation = [Undocumented];
Expand Down
27 changes: 27 additions & 0 deletions clang/include/clang/Basic/AttrDocs.td
Original file line number Diff line number Diff line change
Expand Up @@ -2713,6 +2713,33 @@ An error will be given if:
}];
}

def AMDGPUMaxNumWorkGroupsDocs : Documentation {
let Category = DocCatAMDGPUAttributes;
let Content = [{
This attribute specifies the max number of work groups when the kernel
is dispatched.

Clang supports the
``__attribute__((amdgpu_max_num_work_groups(<x>, <y>, <z>)))`` or
``[[clang::amdgpu_max_num_work_groups(<x>, <y>, <z>)]]`` attribute for the
AMDGPU target. This attribute may be attached to HIP or OpenCL kernel function
definitions and is an optimization hint.

The ``<x>`` parameter specifies the maximum number of work groups in the x dimension.
Similarly ``<y>`` and ``<z>`` are for the y and z dimensions respectively.
Each of the three values must be greater than 0 when provided. The ``<x>`` parameter
is required, while ``<y>`` and ``<z>`` are optional with default value of 1.

If specified, the AMDGPU target backend might be able to produce better machine
code.

An error will be given if:
- Specified values violate subtarget specifications;
- Specified values are not compatible with values provided through other
attributes.
}];
}

def DocCatCallingConvs : DocumentationCategory<"Calling Conventions"> {
let Content = [{
Clang supports several different calling conventions, depending on the target
Expand Down
10 changes: 10 additions & 0 deletions clang/include/clang/Sema/Sema.h
Original file line number Diff line number Diff line change
Expand Up @@ -3911,6 +3911,16 @@ class Sema final {
void addAMDGPUWavesPerEUAttr(Decl *D, const AttributeCommonInfo &CI,
Expr *Min, Expr *Max);

/// Create an AMDGPUMaxNumWorkGroupsAttr attribute.
AMDGPUMaxNumWorkGroupsAttr *
CreateAMDGPUMaxNumWorkGroupsAttr(const AttributeCommonInfo &CI, Expr *XExpr,
Expr *YExpr, Expr *ZExpr);

/// addAMDGPUMaxNumWorkGroupsAttr - Adds an amdgpu_max_num_work_groups
/// attribute to a particular declaration.
void addAMDGPUMaxNumWorkGroupsAttr(Decl *D, const AttributeCommonInfo &CI,
Expr *XExpr, Expr *YExpr, Expr *ZExpr);

DLLImportAttr *mergeDLLImportAttr(Decl *D, const AttributeCommonInfo &CI);
DLLExportAttr *mergeDLLExportAttr(Decl *D, const AttributeCommonInfo &CI);
MSInheritanceAttr *mergeMSInheritanceAttr(Decl *D,
Expand Down
23 changes: 23 additions & 0 deletions clang/lib/CodeGen/Targets/AMDGPU.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -356,6 +356,29 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
if (NumVGPR != 0)
F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR));
}

if (const auto *Attr = FD->getAttr<AMDGPUMaxNumWorkGroupsAttr>()) {
uint32_t X = Attr->getMaxNumWorkGroupsX()
->EvaluateKnownConstInt(M.getContext())
.getExtValue();
// Y and Z dimensions default to 1 if not specified
uint32_t Y = Attr->getMaxNumWorkGroupsY()
? Attr->getMaxNumWorkGroupsY()
->EvaluateKnownConstInt(M.getContext())
.getExtValue()
: 1;
uint32_t Z = Attr->getMaxNumWorkGroupsZ()
? Attr->getMaxNumWorkGroupsZ()
->EvaluateKnownConstInt(M.getContext())
.getExtValue()
: 1;

llvm::SmallString<32> AttrVal;
llvm::raw_svector_ostream OS(AttrVal);
OS << X << ',' << Y << ',' << Z;

F->addFnAttr("amdgpu-max-num-workgroups", AttrVal.str());
}
}

/// Emits control constants used to change per-architecture behaviour in the
Expand Down
62 changes: 62 additions & 0 deletions clang/lib/Sema/SemaDeclAttr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8078,6 +8078,65 @@ static void handleAMDGPUNumVGPRAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
D->addAttr(::new (S.Context) AMDGPUNumVGPRAttr(S.Context, AL, NumVGPR));
}

static bool
checkAMDGPUMaxNumWorkGroupsArguments(Sema &S, Expr *XExpr, Expr *YExpr,
Expr *ZExpr,
const AMDGPUMaxNumWorkGroupsAttr &Attr) {
if (S.DiagnoseUnexpandedParameterPack(XExpr) ||
(YExpr && S.DiagnoseUnexpandedParameterPack(YExpr)) ||
(ZExpr && S.DiagnoseUnexpandedParameterPack(ZExpr)))
return true;

// Accept template arguments for now as they depend on something else.
// We'll get to check them when they eventually get instantiated.
if (XExpr->isValueDependent() || (YExpr && YExpr->isValueDependent()) ||
(ZExpr && ZExpr->isValueDependent()))
return false;

uint32_t NumWG = 0;
Expr *Exprs[3] = {XExpr, YExpr, ZExpr};
for (int i = 0; i < 3; i++) {
if (Exprs[i]) {
if (!checkUInt32Argument(S, Attr, Exprs[i], NumWG, i,
/*StrictlyUnsigned=*/true))
return true;
if (NumWG == 0) {
S.Diag(Attr.getLoc(), diag::err_attribute_argument_is_zero)
<< &Attr << Exprs[i]->getSourceRange();
return true;
}
}
}

return false;
}

AMDGPUMaxNumWorkGroupsAttr *
Sema::CreateAMDGPUMaxNumWorkGroupsAttr(const AttributeCommonInfo &CI,
Expr *XExpr, Expr *YExpr, Expr *ZExpr) {
AMDGPUMaxNumWorkGroupsAttr TmpAttr(Context, CI, XExpr, YExpr, ZExpr);

if (checkAMDGPUMaxNumWorkGroupsArguments(*this, XExpr, YExpr, ZExpr, TmpAttr))
return nullptr;

return ::new (Context)
AMDGPUMaxNumWorkGroupsAttr(Context, CI, XExpr, YExpr, ZExpr);
}

void Sema::addAMDGPUMaxNumWorkGroupsAttr(Decl *D, const AttributeCommonInfo &CI,
Expr *XExpr, Expr *YExpr,
Expr *ZExpr) {
if (auto *Attr = CreateAMDGPUMaxNumWorkGroupsAttr(CI, XExpr, YExpr, ZExpr))
D->addAttr(Attr);
}

static void handleAMDGPUMaxNumWorkGroupsAttr(Sema &S, Decl *D,
const ParsedAttr &AL) {
Expr *YExpr = (AL.getNumArgs() > 1) ? AL.getArgAsExpr(1) : nullptr;
Expr *ZExpr = (AL.getNumArgs() > 2) ? AL.getArgAsExpr(2) : nullptr;
S.addAMDGPUMaxNumWorkGroupsAttr(D, AL, AL.getArgAsExpr(0), YExpr, ZExpr);
}

static void handleX86ForceAlignArgPointerAttr(Sema &S, Decl *D,
const ParsedAttr &AL) {
// If we try to apply it to a function pointer, don't warn, but don't
Expand Down Expand Up @@ -9182,6 +9241,9 @@ ProcessDeclAttribute(Sema &S, Scope *scope, Decl *D, const ParsedAttr &AL,
case ParsedAttr::AT_AMDGPUNumVGPR:
handleAMDGPUNumVGPRAttr(S, D, AL);
break;
case ParsedAttr::AT_AMDGPUMaxNumWorkGroups:
handleAMDGPUMaxNumWorkGroupsAttr(S, D, AL);
break;
case ParsedAttr::AT_AVRSignal:
handleAVRSignalAttr(S, D, AL);
break;
Expand Down
29 changes: 29 additions & 0 deletions clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -607,6 +607,29 @@ static void instantiateDependentAMDGPUWavesPerEUAttr(
S.addAMDGPUWavesPerEUAttr(New, Attr, MinExpr, MaxExpr);
}

static void instantiateDependentAMDGPUMaxNumWorkGroupsAttr(
Sema &S, const MultiLevelTemplateArgumentList &TemplateArgs,
const AMDGPUMaxNumWorkGroupsAttr &Attr, Decl *New) {
EnterExpressionEvaluationContext Unevaluated(
S, Sema::ExpressionEvaluationContext::ConstantEvaluated);

ExprResult ResultX = S.SubstExpr(Attr.getMaxNumWorkGroupsX(), TemplateArgs);
if (!ResultX.isUsable())
return;
ExprResult ResultY = S.SubstExpr(Attr.getMaxNumWorkGroupsY(), TemplateArgs);
if (!ResultY.isUsable())
return;
ExprResult ResultZ = S.SubstExpr(Attr.getMaxNumWorkGroupsZ(), TemplateArgs);
if (!ResultZ.isUsable())
return;

Expr *XExpr = ResultX.getAs<Expr>();
Expr *YExpr = ResultY.getAs<Expr>();
Expr *ZExpr = ResultZ.getAs<Expr>();

S.addAMDGPUMaxNumWorkGroupsAttr(New, Attr, XExpr, YExpr, ZExpr);
}

// This doesn't take any template parameters, but we have a custom action that
// needs to happen when the kernel itself is instantiated. We need to run the
// ItaniumMangler to mark the names required to name this kernel.
Expand Down Expand Up @@ -792,6 +815,12 @@ void Sema::InstantiateAttrs(const MultiLevelTemplateArgumentList &TemplateArgs,
*AMDGPUFlatWorkGroupSize, New);
}

if (const auto *AMDGPUMaxNumWorkGroups =
dyn_cast<AMDGPUMaxNumWorkGroupsAttr>(TmplAttr)) {
instantiateDependentAMDGPUMaxNumWorkGroupsAttr(
*this, TemplateArgs, *AMDGPUMaxNumWorkGroups, New);
}

if (const auto *ParamAttr = dyn_cast<HLSLParamModifierAttr>(TmplAttr)) {
instantiateDependentHLSLParamModifierAttr(*this, TemplateArgs, ParamAttr,
New);
Expand Down
35 changes: 35 additions & 0 deletions clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu
Original file line number Diff line number Diff line change
Expand Up @@ -40,18 +40,53 @@ __attribute__((amdgpu_num_vgpr(64))) // expected-no-diagnostics
__global__ void num_vgpr_64() {
// CHECK: define{{.*}} amdgpu_kernel void @_Z11num_vgpr_64v() [[NUM_VGPR_64:#[0-9]+]]
}
__attribute__((amdgpu_max_num_work_groups(32, 4, 2))) // expected-no-diagnostics
__global__ void max_num_work_groups_32_4_2() {
// CHECK: define{{.*}} amdgpu_kernel void @_Z26max_num_work_groups_32_4_2v() [[MAX_NUM_WORK_GROUPS_32_4_2:#[0-9]+]]
}
__attribute__((amdgpu_max_num_work_groups(32))) // expected-no-diagnostics
__global__ void max_num_work_groups_32() {
// CHECK: define{{.*}} amdgpu_kernel void @_Z22max_num_work_groups_32v() [[MAX_NUM_WORK_GROUPS_32_1_1:#[0-9]+]]
}
__attribute__((amdgpu_max_num_work_groups(32,1))) // expected-no-diagnostics
__global__ void max_num_work_groups_32_1() {
// CHECK: define{{.*}} amdgpu_kernel void @_Z24max_num_work_groups_32_1v() [[MAX_NUM_WORK_GROUPS_32_1_1:#[0-9]+]]
}



template<unsigned a>
__attribute__((amdgpu_max_num_work_groups(a, 4, 2)))
__global__ void template_a_4_2_max_num_work_groups() {}
template __global__ void template_a_4_2_max_num_work_groups<32>();
// CHECK: define{{.*}} amdgpu_kernel void @_Z34template_a_4_2_max_num_work_groupsILj32EEvv() [[MAX_NUM_WORK_GROUPS_32_4_2:#[0-9]+]]

template<unsigned a>
__attribute__((amdgpu_max_num_work_groups(32, a, 2)))
__global__ void template_32_a_2_max_num_work_groups() {}
template __global__ void template_32_a_2_max_num_work_groups<4>();
// CHECK: define{{.*}} amdgpu_kernel void @_Z35template_32_a_2_max_num_work_groupsILj4EEvv() [[MAX_NUM_WORK_GROUPS_32_4_2:#[0-9]+]]

template<unsigned a>
__attribute__((amdgpu_max_num_work_groups(32, 4, a)))
__global__ void template_32_4_a_max_num_work_groups() {}
template __global__ void template_32_4_a_max_num_work_groups<2>();
// CHECK: define{{.*}} amdgpu_kernel void @_Z35template_32_4_a_max_num_work_groupsILj2EEvv() [[MAX_NUM_WORK_GROUPS_32_4_2:#[0-9]+]]

// Make sure this is silently accepted on other targets.
// NAMD-NOT: "amdgpu-flat-work-group-size"
// NAMD-NOT: "amdgpu-waves-per-eu"
// NAMD-NOT: "amdgpu-num-vgpr"
// NAMD-NOT: "amdgpu-num-sgpr"
// NAMD-NOT: "amdgpu-max-num-work-groups"

// DEFAULT-DAG: attributes [[FLAT_WORK_GROUP_SIZE_DEFAULT]] = {{.*}}"amdgpu-flat-work-group-size"="1,1024"{{.*}}"uniform-work-group-size"="true"
// MAX1024-DAG: attributes [[FLAT_WORK_GROUP_SIZE_DEFAULT]] = {{.*}}"amdgpu-flat-work-group-size"="1,1024"
// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = {{.*}}"amdgpu-flat-work-group-size"="32,64"
// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = {{.*}}"amdgpu-waves-per-eu"="2"
// CHECK-DAG: attributes [[NUM_SGPR_32]] = {{.*}}"amdgpu-num-sgpr"="32"
// CHECK-DAG: attributes [[NUM_VGPR_64]] = {{.*}}"amdgpu-num-vgpr"="64"
// CHECK-DAG: attributes [[MAX_NUM_WORK_GROUPS_32_4_2]] = {{.*}}"amdgpu-max-num-workgroups"="32,4,2"
// CHECK-DAG: attributes [[MAX_NUM_WORK_GROUPS_32_1_1]] = {{.*}}"amdgpu-max-num-workgroups"="32,1,1"

// NOUB-NOT: "uniform-work-group-size"="true"
47 changes: 47 additions & 0 deletions clang/test/CodeGenOpenCL/amdgpu-attrs.cl
Original file line number Diff line number Diff line change
Expand Up @@ -139,6 +139,46 @@ kernel void reqd_work_group_size_32_2_1_flat_work_group_size_16_128() {
// CHECK: define{{.*}} amdgpu_kernel void @reqd_work_group_size_32_2_1_flat_work_group_size_16_128() [[FLAT_WORK_GROUP_SIZE_16_128:#[0-9]+]]
}

__attribute__((amdgpu_max_num_work_groups(1, 1, 1))) // expected-no-diagnostics
kernel void max_num_work_groups_1_1_1() {
// CHECK: define{{.*}} amdgpu_kernel void @max_num_work_groups_1_1_1() [[MAX_NUM_WORK_GROUPS_1_1_1:#[0-9]+]]
}

__attribute__((amdgpu_max_num_work_groups(32, 1, 1))) // expected-no-diagnostics
kernel void max_num_work_groups_32_1_1() {
// CHECK: define{{.*}} amdgpu_kernel void @max_num_work_groups_32_1_1() [[MAX_NUM_WORK_GROUPS_32_1_1:#[0-9]+]]
}

__attribute__((amdgpu_max_num_work_groups(32, 8, 1))) // expected-no-diagnostics
kernel void max_num_work_groups_32_8_1() {
// CHECK: define{{.*}} amdgpu_kernel void @max_num_work_groups_32_8_1() [[MAX_NUM_WORK_GROUPS_32_8_1:#[0-9]+]]
}

__attribute__((amdgpu_max_num_work_groups(1, 1, 32))) // expected-no-diagnostics
kernel void max_num_work_groups_1_1_32() {
// CHECK: define{{.*}} amdgpu_kernel void @max_num_work_groups_1_1_32() [[MAX_NUM_WORK_GROUPS_1_1_32:#[0-9]+]]
}

__attribute__((amdgpu_max_num_work_groups(1, 8, 32))) // expected-no-diagnostics
kernel void max_num_work_groups_1_8_32() {
// CHECK: define{{.*}} amdgpu_kernel void @max_num_work_groups_1_8_32() [[MAX_NUM_WORK_GROUPS_1_8_32:#[0-9]+]]
}

__attribute__((amdgpu_max_num_work_groups(4, 8, 32))) // expected-no-diagnostics
kernel void max_num_work_groups_4_8_32() {
// CHECK: define{{.*}} amdgpu_kernel void @max_num_work_groups_4_8_32() [[MAX_NUM_WORK_GROUPS_4_8_32:#[0-9]+]]
}

__attribute__((amdgpu_max_num_work_groups(32))) // expected-no-diagnostics
kernel void max_num_work_groups_32() {
// CHECK: define{{.*}} amdgpu_kernel void @max_num_work_groups_32() [[MAX_NUM_WORK_GROUPS_32_1_1:#[0-9]+]]
}

__attribute__((amdgpu_max_num_work_groups(32,1))) // expected-no-diagnostics
kernel void max_num_work_groups_32_1() {
// CHECK: define{{.*}} amdgpu_kernel void @max_num_work_groups_32_1() [[MAX_NUM_WORK_GROUPS_32_1_1:#[0-9]+]]
}

void a_function() {
// CHECK: define{{.*}} void @a_function() [[A_FUNCTION:#[0-9]+]]
}
Expand Down Expand Up @@ -189,5 +229,12 @@ kernel void default_kernel() {
// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_SGPR_32_NUM_VGPR_64]] = {{.*}} "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32" "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"
// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_SGPR_32_NUM_VGPR_64]] = {{.*}} "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32" "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2,4"

// CHECK-DAG: attributes [[MAX_NUM_WORK_GROUPS_1_1_1]] = {{.*}} "amdgpu-max-num-workgroups"="1,1,1"
// CHECK-DAG: attributes [[MAX_NUM_WORK_GROUPS_32_1_1]] = {{.*}} "amdgpu-max-num-workgroups"="32,1,1"
// CHECK-DAG: attributes [[MAX_NUM_WORK_GROUPS_32_8_1]] = {{.*}} "amdgpu-max-num-workgroups"="32,8,1"
// CHECK-DAG: attributes [[MAX_NUM_WORK_GROUPS_1_1_32]] = {{.*}} "amdgpu-max-num-workgroups"="1,1,32"
// CHECK-DAG: attributes [[MAX_NUM_WORK_GROUPS_1_8_32]] = {{.*}} "amdgpu-max-num-workgroups"="1,8,32"
// CHECK-DAG: attributes [[MAX_NUM_WORK_GROUPS_4_8_32]] = {{.*}} "amdgpu-max-num-workgroups"="4,8,32"

// CHECK-DAG: attributes [[A_FUNCTION]] = {{.*}}
// CHECK-DAG: attributes [[DEFAULT_KERNEL_ATTRS]] = {{.*}} "amdgpu-flat-work-group-size"="1,256"
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

// CHECK: #pragma clang attribute supports the following attributes:
// CHECK-NEXT: AMDGPUFlatWorkGroupSize (SubjectMatchRule_function)
// CHECK-NEXT: AMDGPUMaxNumWorkGroups (SubjectMatchRule_function)
// CHECK-NEXT: AMDGPUNumSGPR (SubjectMatchRule_function)
// CHECK-NEXT: AMDGPUNumVGPR (SubjectMatchRule_function)
// CHECK-NEXT: AMDGPUWavesPerEU (SubjectMatchRule_function)
Expand Down
Loading