[AMDGPU] Adding the amdgpu-num-work-groups function attribute #79035

jwanggit86 · 2024-01-22T18:50:51Z

A new function attribute named amdgpu-num-work-groups is added. This attribute, which consists of three integers, lets programmers tell the compiler how many workgroups will be launched in each of the three dimensions, so that the compiler can optimize based on that information.

llvmbot · 2024-01-22T18:51:10Z

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-amdgpu

Author: Jun Wang (jwanggit86)

Full diff: https://github.com/llvm/llvm-project/pull/79035.diff

13 Files Affected:

(modified) clang/include/clang/Basic/Attr.td (+7)
(modified) clang/include/clang/Basic/AttrDocs.td (+24)
(modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (+13)
(modified) clang/lib/Sema/SemaDeclAttr.cpp (+22)
(modified) clang/test/Misc/pragma-attribute-supported-attributes-list.test (+1)
(modified) llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp (+8)
(modified) llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp (+5)
(modified) llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h (+3)
(modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp (+2)
(modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h (+10)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+53)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (+19)
(added) llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-work-groups.ll (+65)

diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 78a9229aeaf081..5251858ac3bfd7 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -2031,6 +2031,13 @@ def AMDGPUNumVGPR : InheritableAttr {
   let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">;
 }
 
+def AMDGPUNumWorkGroups : InheritableAttr {
+  let Spellings = [Clang<"amdgpu_num_work_groups", 0>];
+  let Args = [UnsignedArgument<"NumWorkGroupsX">, UnsignedArgument<"NumWorkGroupsY">, UnsignedArgument<"NumWorkGroupsZ">];
+  let Documentation = [AMDGPUNumWorkGroupsDocs];
+  let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">;
+}
+
 def AMDGPUKernelCall : DeclOrTypeAttr {
   let Spellings = [Clang<"amdgpu_kernel">];
   let Documentation = [Undocumented];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 9e8190614fbe8a..268d15eddab16f 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -2702,6 +2702,30 @@ An error will be given if:
   }];
 }
 
+def AMDGPUNumWorkGroupsDocs : Documentation {
+  let Category = DocCatAMDGPUAttributes;
+  let Content = [{
+The number of work groups specifies the number of work groups when the kernel
+is dispatched.
+
+Clang supports the
+``__attribute__((amdgpu_num_work_groups(<x>, <y>, <z>)))`` attribute for the
+AMDGPU target. This attribute may be attached to a kernel function definition
+and is an optimization hint.
+
+``<x>`` parameter specifies the maximum number of work groups in the x dimentsion.
+Similarly ``<y>`` and ``<z>`` are for the y and z dimensions respectively.
+
+If specified, the AMDGPU target backend might be able to produce better machine
+code.
+
+An error will be given if:
+  - Specified values violate subtarget specifications;
+  - Specified values are not compatible with values provided through other
+    attributes.
+  }];
+}
+
 def DocCatCallingConvs : DocumentationCategory<"Calling Conventions"> {
   let Content = [{
 Clang supports several different calling conventions, depending on the target
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 03ac6b78598fc8..93321efd26462c 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -356,6 +356,19 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
     if (NumVGPR != 0)
       F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR));
   }
+
+  if (const auto *Attr = FD->getAttr<AMDGPUNumWorkGroupsAttr>()) {
+    uint32_t X = Attr->getNumWorkGroupsX();
+    uint32_t Y = Attr->getNumWorkGroupsY();
+    uint32_t Z = Attr->getNumWorkGroupsZ();
+
+    if (X != 0 && Y != 0 && Z != 0) {
+      std::string AttrVal = llvm::utostr(X) + std::string(", ") +
+                            llvm::utostr(Y) + std::string(", ") +
+                            llvm::utostr(Z);
+      F->addFnAttr("amdgpu-num-work-groups", AttrVal);
+    }
+  }
 }
 
 /// Emits control constants used to change per-architecture behaviour in the
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index a482919356e1bc..dc1c951031d58c 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -8072,6 +8072,25 @@ static void handleAMDGPUNumVGPRAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   D->addAttr(::new (S.Context) AMDGPUNumVGPRAttr(S.Context, AL, NumVGPR));
 }
 
+static void handleAMDGPUNumWorkGroupsAttr(Sema &S, Decl *D,
+                                          const ParsedAttr &AL) {
+  uint32_t NumWGX = 0;
+  uint32_t NumWGY = 0;
+  uint32_t NumWGZ = 0;
+  Expr *NumWGXExpr = AL.getArgAsExpr(0);
+  Expr *NumWGYExpr = AL.getArgAsExpr(1);
+  Expr *NumWGZExpr = AL.getArgAsExpr(2);
+  if (!checkUInt32Argument(S, AL, NumWGXExpr, NumWGX))
+    return;
+  if (!checkUInt32Argument(S, AL, NumWGYExpr, NumWGY))
+    return;
+  if (!checkUInt32Argument(S, AL, NumWGZExpr, NumWGZ))
+    return;
+
+  D->addAttr(::new (S.Context) AMDGPUNumWorkGroupsAttr(S.Context, AL, NumWGX,
+                                                       NumWGY, NumWGZ));
+}
+
 static void handleX86ForceAlignArgPointerAttr(Sema &S, Decl *D,
                                               const ParsedAttr &AL) {
   // If we try to apply it to a function pointer, don't warn, but don't
@@ -9170,6 +9189,9 @@ ProcessDeclAttribute(Sema &S, Scope *scope, Decl *D, const ParsedAttr &AL,
   case ParsedAttr::AT_AMDGPUNumVGPR:
     handleAMDGPUNumVGPRAttr(S, D, AL);
     break;
+  case ParsedAttr::AT_AMDGPUNumWorkGroups:
+    handleAMDGPUNumWorkGroupsAttr(S, D, AL);
+    break;
   case ParsedAttr::AT_AVRSignal:
     handleAVRSignalAttr(S, D, AL);
     break;
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index e476c15b35ded9..3d12656612eb06 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -6,6 +6,7 @@
 // CHECK-NEXT: AMDGPUFlatWorkGroupSize (SubjectMatchRule_function)
 // CHECK-NEXT: AMDGPUNumSGPR (SubjectMatchRule_function)
 // CHECK-NEXT: AMDGPUNumVGPR (SubjectMatchRule_function)
+// CHECK-NEXT: AMDGPUNumWorkGroups (SubjectMatchRule_function)
 // CHECK-NEXT: AMDGPUWavesPerEU (SubjectMatchRule_function)
 // CHECK-NEXT: AVRSignal (SubjectMatchRule_function)
 // CHECK-NEXT: AbiTag (SubjectMatchRule_record_not_is_union, SubjectMatchRule_variable, SubjectMatchRule_function, SubjectMatchRule_namespace)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
index 74e9cd7d09654c..b1eb701d18db89 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
@@ -494,6 +494,14 @@ MetadataStreamerMsgPackV4::getHSAKernelProps(const MachineFunction &MF,
 
   Kern[".max_flat_workgroup_size"] =
       Kern.getDocument()->getNode(MFI.getMaxFlatWorkGroupSize());
+  unsigned NumWGX = MFI.getNumWorkGroupsX();
+  unsigned NumWGY = MFI.getNumWorkGroupsY();
+  unsigned NumWGZ = MFI.getNumWorkGroupsZ();
+  if (NumWGX != 0 && NumWGY != 0 && NumWGZ != 0) {
+    Kern[".num_work_groups_x"] = Kern.getDocument()->getNode(NumWGX);
+    Kern[".num_work_groups_y"] = Kern.getDocument()->getNode(NumWGY);
+    Kern[".num_work_groups_z"] = Kern.getDocument()->getNode(NumWGZ);
+  }
   Kern[".sgpr_spill_count"] =
       Kern.getDocument()->getNode(MFI.getNumSpilledSGPRs());
   Kern[".vgpr_spill_count"] =
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
index f19c5766856408..c02d7c6387c11d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
@@ -1108,3 +1108,8 @@ void GCNUserSGPRUsageInfo::allocKernargPreloadSGPRs(unsigned NumSGPRs) {
 unsigned GCNUserSGPRUsageInfo::getNumFreeUserSGPRs() {
   return AMDGPU::getMaxNumUserSGPRs(ST) - NumUsedUserSGPRs;
 }
+
+SmallVector<unsigned>
+AMDGPUSubtarget::getNumWorkGroups(const Function &F) const {
+  return AMDGPU::getIntegerVecAttribute(F, "amdgpu-num-work-groups", 3);
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
index b72697973be7a1..90c394b6e3b252 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
@@ -288,6 +288,9 @@ class AMDGPUSubtarget {
   /// 2) dimension.
   unsigned getMaxWorkitemID(const Function &Kernel, unsigned Dimension) const;
 
+  /// Return the number of work groups for the function.
+  SmallVector<unsigned> getNumWorkGroups(const Function &F) const;
+
   /// Return true if only a single workitem can be active in a wave.
   bool isSingleLaneExecution(const Function &Kernel) const;
 
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
index e8142244b7db69..58f5fe415e84d2 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
@@ -46,6 +46,8 @@ SIMachineFunctionInfo::SIMachineFunctionInfo(const Function &F,
   const GCNSubtarget &ST = *static_cast<const GCNSubtarget *>(STI);
   FlatWorkGroupSizes = ST.getFlatWorkGroupSizes(F);
   WavesPerEU = ST.getWavesPerEU(F);
+  NumWorkGroups = ST.getNumWorkGroups(F);
+  assert(NumWorkGroups.size() == 3);
 
   Occupancy = ST.computeOccupancy(F, getLDSSize());
   CallingConv::ID CC = F.getCallingConv();
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
index dc63ae44c528db..29c75309bdebe6 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
@@ -426,6 +426,9 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction,
 
   const AMDGPUGWSResourcePseudoSourceValue GWSResourcePSV;
 
+  // Default/requested number of work groups for the function.
+  SmallVector<unsigned> NumWorkGroups = {0, 0, 0};
+
 private:
   unsigned NumUserSGPRs = 0;
   unsigned NumSystemSGPRs = 0;
@@ -1095,6 +1098,13 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction,
 
   // \returns true if a function needs or may need AGPRs.
   bool usesAGPRs(const MachineFunction &MF) const;
+
+  /// \returns Default/requested number of work groups for this function.
+  SmallVector<unsigned> getNumWorkGroups() const { return NumWorkGroups; }
+
+  unsigned getNumWorkGroupsX() const { return NumWorkGroups[0]; }
+  unsigned getNumWorkGroupsY() const { return NumWorkGroups[1]; }
+  unsigned getNumWorkGroupsZ() const { return NumWorkGroups[2]; }
 };
 
 } // end namespace llvm
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index b4f7fc456f0bdd..9ef02b83f55b43 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -11,6 +11,7 @@
 #include "AMDGPUAsmUtils.h"
 #include "AMDKernelCodeT.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/BinaryFormat/ELF.h"
 #include "llvm/IR/Attributes.h"
 #include "llvm/IR/Constants.h"
@@ -1261,6 +1262,58 @@ getIntegerPairAttribute(const Function &F, StringRef Name,
   return Ints;
 }
 
+SmallVector<unsigned> getIntegerVecAttribute(const Function &F, StringRef Name,
+                                             unsigned Size) {
+  assert(Size > 2);
+  SmallVector<unsigned> Default(Size, 0);
+
+  Attribute A = F.getFnAttribute(Name);
+  if (!A.isStringAttribute())
+    return Default;
+
+  SmallVector<unsigned> Vals(Size, 0);
+
+  LLVMContext &Ctx = F.getContext();
+
+  StringRef S = A.getValueAsString();
+  unsigned i = 0;
+  for (; !S.empty() && i < Size; i++) {
+    std::pair<StringRef, StringRef> Strs = S.split(',');
+    unsigned IntVal;
+    if (Strs.first.trim().getAsInteger(0, IntVal)) {
+      Ctx.emitError("can't parse integer attribute " + Strs.first + " in " +
+                    Name);
+      return Default;
+    }
+    Vals[i] = IntVal;
+    S = Strs.second;
+  }
+
+  if (!S.empty() || i < Size) {
+    Ctx.emitError("attribute " + Name +
+                  " has incorrect number of integers; expected " +
+                  llvm::utostr(Size));
+    return Default;
+  }
+  return Vals;
+}
+
+unsigned getUnsignedIntegerAttribute(const Function &F, StringRef Name,
+                                     unsigned Default) {
+  Attribute A = F.getFnAttribute(Name);
+  if (!A.isStringAttribute())
+    return Default;
+
+  LLVMContext &Ctx = F.getContext();
+  unsigned IntVal = Default;
+  StringRef Str = A.getValueAsString();
+  if (Str.trim().getAsInteger(0, IntVal)) {
+    Ctx.emitError("can't parse integer attribute " + Name);
+    return Default;
+  }
+  return IntVal;
+}
+
 unsigned getVmcntBitMask(const IsaVersion &Version) {
   return (1 << (getVmcntBitWidthLo(Version.Major) +
                 getVmcntBitWidthHi(Version.Major))) -
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
index 351563e957f14a..ee2b68c13617a8 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
@@ -818,6 +818,15 @@ bool shouldEmitConstantsToTextSection(const Triple &TT);
 /// to integer.
 int getIntegerAttribute(const Function &F, StringRef Name, int Default);
 
+/// \returns Unsigned Integer value requested using \p F's \p Name attribute.
+///
+/// \returns \p Default if attribute is not present.
+///
+/// \returns \p Default and emits error if requested value cannot be converted
+/// to integer.
+unsigned getUnsignedIntegerAttribute(const Function &F, StringRef Name,
+                                     unsigned Default);
+
 /// \returns A pair of integer values requested using \p F's \p Name attribute
 /// in "first[,second]" format ("second" is optional unless \p OnlyFirstRequired
 /// is false).
@@ -832,6 +841,16 @@ getIntegerPairAttribute(const Function &F, StringRef Name,
                         std::pair<unsigned, unsigned> Default,
                         bool OnlyFirstRequired = false);
 
+/// \returns Generate a vector of integer values requested using \p F's \p Name
+/// attribute.
+///
+/// \returns true if exactly Size (>2) number of integers are found in the
+/// attribute.
+///
+/// \returns false if any error occurs.
+SmallVector<unsigned> getIntegerVecAttribute(const Function &F, StringRef Name,
+                                             unsigned Size);
+
 /// Represents the counter values to wait for in an s_waitcnt instruction.
 ///
 /// Large values (including the maximum possible integer) can be used to
diff --git a/llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-work-groups.ll b/llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-work-groups.ll
new file mode 100644
index 00000000000000..6fc6de91d1d030
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-work-groups.ll
@@ -0,0 +1,65 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s
+
+; Attribute not specified.
+; CHECK-LABEL: {{^}}empty_no_attribute:
+define amdgpu_kernel void @empty_no_attribute() {
+entry:
+  ret void
+}
+
+; Ignore if number of work groups for x dimension is 0.
+; CHECK-LABEL: {{^}}empty_num_work_groups_x0:
+define amdgpu_kernel void @empty_num_work_groups_x0() #0 {
+entry:
+  ret void
+}
+attributes #0 = {"amdgpu-num-work-groups"="0,2,3"}
+
+; Ignore if number of work groups for y dimension is 0.
+; CHECK-LABEL: {{^}}empty_num_work_groups_y0:
+define amdgpu_kernel void @empty_num_work_groups_y0() #1 {
+entry:
+  ret void
+}
+attributes #1 = {"amdgpu-num-work-groups"="1,0,3"}
+
+; Ignore if number of work groups for z dimension is 0.
+; CHECK-LABEL: {{^}}empty_num_work_groups_z0:
+define amdgpu_kernel void @empty_num_work_groups_z0() #2 {
+entry:
+  ret void
+}
+attributes #2 = {"amdgpu-num-work-groups"="1,2,0"}
+
+; CHECK-LABEL: {{^}}empty_num_work_groups_1_2_3:
+define amdgpu_kernel void @empty_num_work_groups_1_2_3() #3 {
+entry:
+  ret void
+}
+attributes #3 = {"amdgpu-num-work-groups"="1,2,3"}
+
+; CHECK-LABEL: {{^}}empty_num_work_groups_1024_1024_1024:
+define amdgpu_kernel void @empty_num_work_groups_1024_1024_1024() #4 {
+entry:
+  ret void
+}
+attributes #4 = {"amdgpu-num-work-groups"="1024,1024,1024"}
+
+
+; CHECK: .amdgpu_metadata
+; CHECK:        .name:           empty_no_attribute
+; CHECK-NEXT:   .private_segment_fixed_size: 0
+; CHECK:        .name:           empty_num_work_groups_x0
+; CHECK-NEXT:   .private_segment_fixed_size: 0
+; CHECK:        .name:           empty_num_work_groups_y0
+; CHECK-NEXT:   .private_segment_fixed_size: 0
+; CHECK:        .name:           empty_num_work_groups_z0
+; CHECK-NEXT:   .private_segment_fixed_size: 0
+; CHECK:        .name:           empty_num_work_groups_1_2_3
+; CHECK-NEXT:   .num_work_groups_x: 1
+; CHECK-NEXT:   .num_work_groups_y: 2
+; CHECK-NEXT:   .num_work_groups_z: 3
+; CHECK:        .name:           empty_num_work_groups_1024_1024_1024
+; CHECK-NEXT:   .num_work_groups_x: 1024
+; CHECK-NEXT:   .num_work_groups_y: 1024
+; CHECK-NEXT:   .num_work_groups_z: 1024
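
The parsing rule that `getIntegerVecAttribute` applies in the diff above (split on commas, trim each field, require exactly three integers) can be sketched in standalone C++. This is only an illustration under stated assumptions: `parseTriple` and `trim` are hypothetical names, the sketch accepts decimal input only (the patch calls `getAsInteger` with radix 0, which auto-detects the base), and it returns `std::nullopt` where the patch emits an error and returns a zero-filled default.

```cpp
#include <array>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Trim ASCII spaces and tabs from both ends of a string view.
static std::string_view trim(std::string_view s) {
  const auto first = s.find_first_not_of(" \t");
  if (first == std::string_view::npos)
    return {};
  const auto last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

// Hypothetical helper: parse exactly three comma-separated unsigned integers.
// Returns std::nullopt on a malformed field or a wrong number of fields.
std::optional<std::array<unsigned, 3>> parseTriple(std::string_view text) {
  std::array<unsigned, 3> vals{0, 0, 0};
  std::size_t i = 0;
  std::string_view rest = text;
  for (; !rest.empty() && i < vals.size(); ++i) {
    // Split off the text before the first comma; the remainder is parsed next.
    const auto comma = rest.find(',');
    const std::string_view field = trim(rest.substr(0, comma));
    rest = (comma == std::string_view::npos) ? std::string_view{}
                                             : rest.substr(comma + 1);
    if (field.empty())
      return std::nullopt;
    unsigned value = 0;
    const char *begin = field.data();
    const char *end = begin + field.size();
    const auto [ptr, ec] = std::from_chars(begin, end, value);
    // Reject fields that are not entirely a decimal unsigned integer.
    if (ec != std::errc{} || ptr != end)
      return std::nullopt;
    vals[i] = value;
  }
  // Leftover text means too many fields; i < 3 means too few.
  if (!rest.empty() || i < vals.size())
    return std::nullopt;
  return vals;
}
```

With this sketch, `parseTriple("1, 2, 3")` yields the three values, while `"1,2"`, `"1,2,3,4"`, and `"a,2,3"` are all rejected.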

jwanggit86 · 2024-01-29T17:59:26Z

@arsenm @krzysz00 Any comments?

krzysz00 · 2024-01-29T19:31:01Z

Do we want to also get min-num-work-groups and max-num-work-groups versions?

jwanggit86 · 2024-01-29T19:58:20Z

@krzysz00 Are you asking for something like the following:

"amdgpu-min-num-work-groups"="1,2,3", "amdgpu-max-num-work-groups"="4,5,6"

When both are given, min must be <= max.

krzysz00 · 2024-01-29T20:34:27Z

I'm suggesting that this might be a more general design and that there might be more uses for it.

jwanggit86 · 2024-01-29T22:30:56Z

@krzysz00 Let me make sure I understand the requirements correctly. Based on my understanding, the following are the requirements. Pls let me know if there are any mistakes.

Create a new function attribute for the number of workgroups (maybe 2 attributes, one for max, and one for min).
The function attribute consists of 3 unsigned integers, one for each of the x, y, and z dimensions.
If the numbers are all valid, they are listed in the metadata section of the .s file.
What the compiler does with the numbers is left for future work.

krzysz00 · 2024-01-30T15:25:32Z

Yeah, that's my proposal for metadata that's useful to record, especially since min == max gives the present case

jwanggit86 · 2024-02-05T20:35:48Z

@krzysz00 Code has been updated. Pls take a look when convenient. Pls note the following:
(1) Two attributes are now supported, one for min and one for max num of workgroups.
(2) It is allowed to only specify one of the two attributes.
(3) An attribute is ignored if any one of the 3 numbers (for x,y,z) is 0.
(4) When both attributes are valid, we make sure max >= min element-wise. If not, both are ignored.
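
Rules (2) through (4) as stated in this comment can be sketched in standalone C++. This is an illustration of the min/max design described at this point in the thread, not the code from the patch; `Dim3`, `WorkGroupBounds`, `dropIfAnyZero`, and `resolveBounds` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <optional>

using Dim3 = std::array<unsigned, 3>;

// Either bound may be absent, per rule (2).
struct WorkGroupBounds {
  std::optional<Dim3> min;
  std::optional<Dim3> max;
};

// Rule (3): an attribute is ignored if any of its three numbers is 0.
static std::optional<Dim3> dropIfAnyZero(std::optional<Dim3> v) {
  if (!v)
    return std::nullopt;
  for (unsigned d : *v)
    if (d == 0)
      return std::nullopt;
  return v;
}

// Rule (4): when both bounds survive, require max >= min element-wise;
// otherwise both are ignored.
WorkGroupBounds resolveBounds(std::optional<Dim3> min,
                              std::optional<Dim3> max) {
  WorkGroupBounds result{dropIfAnyZero(min), dropIfAnyZero(max)};
  if (result.min && result.max) {
    for (std::size_t i = 0; i < 3; ++i)
      if ((*result.max)[i] < (*result.min)[i])
        return {};
  }
  return result;
}
```

For example, `resolveBounds(Dim3{1, 2, 3}, Dim3{4, 5, 6})` keeps both bounds, while swapping the two arguments drops both.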

arsenm

One attribute, with a range, would be better than two attributes. This is how it is handled in the similar cases.

I also think this should be in terms of work items, not workgroups

arsenm

One attribute

krzysz00 · 2024-02-06T15:35:24Z

@arsenm Are you suggesting that these should instead be a range of minimum/maximum number of workitems globally?

arsenm · 2024-02-06T15:48:14Z

> @arsenm Are you suggesting that these should instead be a range of minimum/maximum number of workitems globally?

That's how all of the other attributes we already have do this. amdgpu-waves-per-eu is a single min, max pair. Same with amdgpu-flat-work-group-size

Although this one is weird because it's dimensional. I'm also wondering what the use of the minimum dispatch size would be? I can see some minimal use for the maximum

jwanggit86 · 2024-02-06T18:17:22Z

I thought about having one attribute with 6 numbers. Then you have to provide 6 numbers when using it. In the current design, either the min or the max attribute can be omitted.

arsenm

Needs documentation in AMDGPUUsage. Should also clarify behavior of 0

arsenm · 2024-02-12T12:26:18Z

clang/include/clang/Basic/Attr.td

+def AMDGPUNumWorkGroups : InheritableAttr {
+  let Spellings = [Clang<"amdgpu_num_work_groups", 0>];


max_num? num_work_groups implies an exact match

Changed name to "max_num_work_groups".

arsenm · 2024-02-12T12:27:06Z

clang/lib/CodeGen/Targets/AMDGPU.cpp

+      std::string AttrVal = llvm::utostr(X) + std::string(", ") +
+                            llvm::utostr(Y) + std::string(", ") +
+                            llvm::utostr(Z);


SmallString + raw_svector_ostream

Changed as suggested.
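
The idea behind this suggestion is to write the three values through a stream instead of concatenating temporary strings. A minimal sketch follows, using `std::ostringstream` as a standard-library stand-in for the `SmallString` + `raw_svector_ostream` pair named above; `formatTriple` is a hypothetical name and the bare comma separator is an assumption taken from the test IR in the diff.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: format three dimensions as "x,y,z" via a stream.
std::string formatTriple(unsigned x, unsigned y, unsigned z) {
  std::ostringstream os;
  os << x << ',' << y << ',' << z;
  return os.str();
}
```

In the actual patch, the stream would write into a stack-allocated buffer; this sketch only shows the formatting pattern.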

arsenm · 2024-02-12T12:27:50Z

clang/lib/CodeGen/Targets/AMDGPU.cpp

+    uint32_t Y = Attr->getNumWorkGroupsY();
+    uint32_t Z = Attr->getNumWorkGroupsZ();
+
+    if (X != 0 && Y != 0 && Z != 0) {


shouldn't it try to set this if any dimension has a relevant value?

Yeah, this seems like a 'bad' user interface. If '0' means 'nothing', we should probably reject that in Sema.

My understanding is that 0 is not allowed. If any of the 3 numbers is 0 the attribute is rejected.

How do you represent I know one dimension but not another?

My understanding is that <X, 1, 1> would mean all workgroups are in the x-dimension only. Similarly <X, Y, 1> would mean only the x- and y-dimensions are involved.

arsenm · 2024-02-12T12:28:18Z

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

@@ -1108,3 +1108,8 @@ void GCNUserSGPRUsageInfo::allocKernargPreloadSGPRs(unsigned NumSGPRs) {
 unsigned GCNUserSGPRUsageInfo::getNumFreeUserSGPRs() {
  return AMDGPU::getMaxNumUserSGPRs(ST) - NumUsedUserSGPRs;
 }
+
+SmallVector<unsigned>


std::array<3>?

This function calls getIntegerVecAttribute(), which returns a SmallVector. If this is changed to std::array<3>, the other function has to be changed as well.

arsenm · 2024-02-12T12:30:37Z

Also should have follow up patch to propagate in AMDGPUAttributor, and another to lower to !range in AMDGPULowerKernelAttributes

erichkeane · 2024-02-12T15:02:54Z

clang/include/clang/Basic/AttrDocs.td

+def AMDGPUNumWorkGroupsDocs : Documentation {
+  let Category = DocCatAMDGPUAttributes;
+  let Content = [{
+The number of work groups specifies the number of work groups when the kernel


I think this needs more elaboration... "The number of work groups specifies the number of work groups" is a tautology.

Updated. However, the phrase "number of work groups" itself appears to be clear enough.

erichkeane · 2024-02-12T15:03:24Z

clang/include/clang/Basic/AttrDocs.td

+is dispatched.
+
+Clang supports the
+``__attribute__((amdgpu_num_work_groups(<x>, <y>, <z>)))`` attribute for the


I think we prefer spelling it [[clang::...]] now in documentation. I personally would prefer we move everyone to that as well.

Added [[clang::...]] as suggested.

erichkeane · 2024-02-12T15:04:22Z

clang/include/clang/Basic/AttrDocs.td

+
+Clang supports the
+``__attribute__((amdgpu_num_work_groups(<x>, <y>, <z>)))`` attribute for the
+AMDGPU target. This attribute may be attached to a kernel function definition


'kernel function definition' probably needs more elaboration as well, we now have ~3 different 'kinds' of kernel function definitions.

Would it be better to say "This attribute may be attached to HIP or OpenCL kernel function..." ?

That would be preferable, yes

Changed to what's suggested above.

erichkeane · 2024-02-12T15:05:16Z

clang/include/clang/Basic/AttrDocs.td

+AMDGPU target. This attribute may be attached to a kernel function definition
+and is an optimization hint.
+
+``<x>`` parameter specifies the maximum number of work groups in the x dimentsion.


Suggested change

``<x>`` parameter specifies the maximum number of work groups in the x dimentsion.

``<x>`` parameter specifies the maximum number of work groups in the x dimension.

Also, we should be more clear/elaborate more what x, y, and z dimensions mean here. One thing I note is that OpenCL (IIRC?) actually reverses these? So it is VERY important that we document both order and meaning explicitly.

The 3 numbers specify the 3 dimensions for a 3D grid of threads. Conventionally the dimensions are referred to as x, y, and z. See attributes such as reqd_work_group_size.

erichkeane · 2024-02-12T15:06:25Z

clang/include/clang/Basic/Attr.td

@@ -2031,6 +2031,13 @@ def AMDGPUNumVGPR : InheritableAttr {
  let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">;
 }

+def AMDGPUNumWorkGroups : InheritableAttr {
+  let Spellings = [Clang<"amdgpu_num_work_groups", 0>];
+  let Args = [UnsignedArgument<"NumWorkGroupsX">, UnsignedArgument<"NumWorkGroupsY">, UnsignedArgument<"NumWorkGroupsZ">];


UnsignedArgument doesn't allow dependent values. Is this something we're OK with? Typically we'd want to support some level of template support, which means you have to store as expressions and convert when needed (then TreeTransform it properly).

Changed UnsignedArgument to AttrArgument.

erichkeane · 2024-02-12T15:07:19Z

clang/lib/CodeGen/Targets/AMDGPU.cpp

+    uint32_t Y = Attr->getNumWorkGroupsY();
+    uint32_t Z = Attr->getNumWorkGroupsZ();
+
+    if (X != 0 && Y != 0 && Z != 0) {


Yeah, this seems like a 'bad' user interface. If '0' means 'nothing', we should probably reject that in Sema.

erichkeane · 2024-02-12T15:08:44Z

clang/lib/Sema/SemaDeclAttr.cpp

+  if (!checkUInt32Argument(S, AL, NumWGZExpr, NumWGZ))
+    return;
+
+  D->addAttr(::new (S.Context) AMDGPUNumWorkGroupsAttr(S.Context, AL, NumWGX,


Based on the above, if any of these are zero, this attribute has no effect. We should diagnose based on the value of X, Y, and Z, then only create it in the AST if it has an effect.

Add check in this function to ensure addAttr is called only when none of the 3 numbers are 0. The check in AMDGPU.cpp is removed.

github-actions · 2024-02-17T00:52:30Z

✅ With the latest revision this PR passed the C/C++ code formatter.

erichkeane

Reviewed the CFE component, didn't look at LLVM.

erichkeane · 2024-02-20T15:43:23Z

clang/include/clang/Basic/AttrDocs.td

+AMDGPU target. This attribute may be attached to HIP or OpenCL kernel function
+definitions and is an optimization hint.
+
+``<x>`` parameter specifies the maximum number of work groups in the x dimension.


Suggested change

``<x>`` parameter specifies the maximum number of work groups in the x dimension.

The ``<x>`` parameter specifies the maximum number of work groups in the x dimension.

erichkeane · 2024-02-20T15:43:58Z

clang/include/clang/Basic/AttrDocs.td

+
+``<x>`` parameter specifies the maximum number of work groups in the x dimension.
+Similarly ``<y>`` and ``<z>`` are for the y and z dimensions respectively.
+Each of the three numbers must be >=1. The attribute is ignored if any of the


Suggested change

Each of the three numbers must be >=1. The attribute is ignored if any of the

Each of the three values must be greater than zero. The attribute is ignored if any of the

erichkeane · 2024-02-20T15:44:15Z

clang/include/clang/Basic/AttrDocs.td

+
+``<x>`` parameter specifies the maximum number of work groups in the x dimension.
+Similarly ``<y>`` and ``<z>`` are for the y and z dimensions respectively.
+Each of the three numbers must be >=1. The attribute is ignored if any of the


I don't think 'ignored' is the right semantics here: that should diagnose.

erichkeane · 2024-02-20T15:45:10Z

clang/lib/Sema/SemaDeclAttr.cpp

+  if (!checkUInt32Argument(S, AL, NumWGZExpr, NumWGZ))
+    return;
+
+  if (NumWGX != 0 && NumWGY != 0 && NumWGZ != 0)


This needs to diagnose.

Pls elaborate.

Silently ignoring this is not acceptable. We need to diagnose this as an error/warning in the compiler.
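
The difference between silently ignoring a zero and diagnosing it can be sketched in standalone C++. This is a hypothetical illustration, not the Sema code: `diagnoseMaxNumWorkGroups` is an invented name, the message text is made up, and a real implementation would report through the compiler's diagnostic engine instead of returning a string.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical check: returns a diagnostic message for the first zero
// dimension, or std::nullopt when all three values are acceptable.
std::optional<std::string>
diagnoseMaxNumWorkGroups(const std::array<unsigned, 3> &dims) {
  static const char *const names[] = {"x", "y", "z"};
  for (std::size_t i = 0; i < dims.size(); ++i) {
    if (dims[i] == 0)
      return std::string("'amdgpu_max_num_work_groups' ") + names[i] +
             " dimension must be greater than 0";
  }
  return std::nullopt;
}
```

A caller would attach the attribute only when no message is returned, so a zero is always reported to the user and never dropped without notice.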

arsenm · 2024-03-06T13:57:50Z

clang/docs/ReleaseNotes.rst

@@ -137,6 +137,12 @@ Removed Compiler Flags

 Attribute Changes in Clang
 --------------------------
+- Introduced a new function attribute ``__attribute__((amdgpu_max_num_work_groups(x, y, z)))`` or


s/work_groups/workgroup/

There are existing attributes that have workgroup spelled as two separate words: flat-work-group-size, reqd_work_group_size.
Pls let me know if you still want workgroup as one word.

Ugh. The ISA manuals usually use "workgroup". reqd_work_group_size came from OpenCL. We made up amdgpu_flat_work_group_size

The backend facing parts seem more consistently to be workgroup. e.g in the HSA metadata, we have .workgroup_size_hint, .reqd_workgroup_size. As horrible as it is, maybe it's best to keep it this way for the clang attribute, and rename all the backend bits to be "workgroup"?

In the case of flat workgroup size, the LLVM attribute is called amdgpu-flat-work-group-size, but the metadata is .max_flat_workgroup_size. I suppose we can copy that and change the metadata from .max_num_work_groups_x to .max_num_workgroups_x and so on.

Do you want the LLVM attribute to be changed from amdgpu-max-num-work-groups to amdgpu-max-num-workgroups as well? Note that in the file AMDGPUUsage.rst the word work-group with a hyphen is used a lot.

I think some of the AMDGPUUsage work-groups are actually wrong based on the actual code.

I think ".max_num_workgroups" "amdgpu-max-num-workgroups" and "amdgpu_max_num_work_groups" is the most consistent with the existing uses


Endilll

Sema.h changes look good to me.

shiltian · 2024-12-11T17:24:46Z

What is the best case (from the compiler optimization perspective) that we can expect from amdgpu-max-num-workgroups? For example, is a smaller value better? I’d imagine that with a smaller value, the compiler can assume each workgroup has access to more resources. If that’s the case, is 1 the best-case scenario? Obviously, 0 doesn’t make much sense.

@jwanggit86 @arsenm

krzysz00 · 2024-12-11T20:02:54Z

The main case I had in mind when adding the annotation was range()-like information: that is, the ability to infer nsw and friends on workgroup IDs and dimensions

shiltian · 2024-12-11T21:43:22Z

@krzysz00 Do you suggest that the actual value doesn't matter? Like, 128 is not better or worse than 256.

arsenm · 2024-12-11T22:15:27Z

> If that’s the case, is 1 the best-case scenario?

Yes

> @krzysz00 Do you suggest that the actual value doesn't matter? Like, 128 is not better or worse than 256.

This enables known bits optimizations. Less is always better, but the benefits can be marginal.

krzysz00 · 2024-12-12T01:09:34Z

Oh, yeah, agreed that known-bits data is marginal ... but a big pile of marginal improvements stacks up.

arsenm · 2024-12-12T01:23:38Z

> Oh, yeah, agreed that known-bits data is marginal ... but a big pile of marginal improvements stacks up.

The main benefit would be if we can reduce 64-bit indexing calculations down to 32-bit. Not sure if that's implemented anywhere
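
The arithmetic behind the known-bits and 32-bit indexing remarks can be sketched in standalone C++. This is an illustration of the reasoning only, not something the thread says the compiler implements; `activeBitsOfId` and `indexFitsIn32Bits` are hypothetical names, and the flat index formula `group_id * group_size + local_id` is an assumption about how a kernel might compute a global index.

```cpp
#include <cstdint>

// With at most maxGroups workgroups in a dimension, IDs lie in
// [0, maxGroups - 1]. Returns how many low bits of the ID can be non-zero;
// all higher bits are known to be zero.
constexpr unsigned activeBitsOfId(std::uint32_t maxGroups) {
  unsigned bits = 0;
  for (std::uint32_t top = (maxGroups == 0 ? 0 : maxGroups - 1); top != 0;
       top >>= 1)
    ++bits;
  return bits;
}

// True if group_id * groupSize + local_id cannot exceed 32 bits, given
// group_id < maxGroups and local_id < groupSize. The largest index is
// maxGroups * groupSize - 1, so the product must be at most 2^32.
constexpr bool indexFitsIn32Bits(std::uint32_t maxGroups,
                                 std::uint32_t groupSize) {
  return maxGroups != 0 && groupSize != 0 &&
         static_cast<std::uint64_t>(maxGroups) * groupSize <=
             (std::uint64_t{1} << 32);
}
```

This matches the "less is always better" remark: a bound of 128 leaves 7 possibly non-zero bits in the ID and a bound of 256 leaves 8, so the gain from a tighter bound is real but small.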

jwanggit86 added the backend:AMDGPU label Jan 22, 2024

jwanggit86 requested a review from arsenm January 22, 2024 18:50

llvmbot added the clang, clang:frontend, and clang:codegen labels Jan 22, 2024

jwanggit86 mentioned this pull request Jan 22, 2024

[AMDGPU] Adding the amdgpu-num-work-groups function attribute #75647

Closed

jwanggit86 requested a review from krzysz00 January 23, 2024 22:33

jwanggit86 force-pushed the attr-num-workgroups-xyz branch from 8ed74e1 to c4e460b Compare February 5, 2024 00:24

arsenm reviewed Feb 6, 2024

arsenm requested changes Feb 6, 2024

jwanggit86 requested a review from arsenm February 9, 2024 23:12

arsenm reviewed Feb 12, 2024

AaronBallman requested a review from erichkeane February 12, 2024 12:36

erichkeane reviewed Feb 12, 2024

jwanggit86 requested review from arsenm and erichkeane February 19, 2024 19:35

erichkeane reviewed Feb 20, 2024

arsenm reviewed Mar 6, 2024

jwanggit86 requested a review from arsenm March 6, 2024 19:28

Jun Wang added 15 commits March 7, 2024 12:48

Support 2 attributes: one for min and one for max number of work groups.

f267295

Revert "Support 2 attributes: one for min and one for max number of work groups." This reverts commit c4e460b.

dcb8f78

Minor change to the test file

94ca4a7

Update based on code review.

b29bf4a

Code formatting.

9ac9f8c

Update AMDGPUUsage.rst and AttrDocs.td.

2945cc7

Add (1) diagnostics for the attribute (2) test cases in clang.

ed862b9

Fix formatting and AttrDocs.td.

42bc76e

Allow the attribute's elements to be template arguments.

e2df1e3

Updated release notes; added codegen testcases; minor code changes.

5469b1d

Fix formatting for release note.

7da2015

Make y and z parameters optional with default value of 1.

7e6209a

Added test cases for largest allowed value.

4d0ab6e

Change the LLVM attribute name from amdgpu-max-num-work-groups to amdgpu-max-num-workgroups; clang attribute name unchanged.

15cb5b5

jwanggit86 force-pushed the attr-num-workgroups-xyz branch from 36f195a to 15cb5b5 Compare March 7, 2024 19:54

jwanggit86 requested a review from Endilll as a code owner March 7, 2024 19:54

arsenm approved these changes Mar 12, 2024

Endilll reviewed Mar 12, 2024

jwanggit86 merged commit c4e517f into llvm:main Mar 12, 2024

jwanggit86 deleted the attr-num-workgroups-xyz branch March 12, 2024 17:31

		def AMDGPUNumWorkGroups : InheritableAttr {
		let Spellings = [Clang<"amdgpu_num_work_groups", 0>];

-``<x>`` parameter specifies the maximum number of work groups in the x dimentsion.
+``<x>`` parameter specifies the maximum number of work groups in the x dimension.

-``<x>`` parameter specifies the maximum number of work groups in the x dimension.
+The ``<x>`` parameter specifies the maximum number of work groups in the x dimension.

-Each of the three numbers must be >=1. The attribute is ignored if any of the
+Each of the three values must be greater than zero. The attribute is ignored if any of the
