[mlir][AMDGPU] Set uniform-work-group-size=true by default #79077

krzysz00 · 2024-01-22T23:41:52Z

GPU kernels generated via typical MLIR mechanisms make the assumption that all workgroups are of uniform size, and so, as in OpenMP, it is appropriate to set the "uniform-work-group-size"="true" attribute on these functions by default. This commit makes that choice.

In the event it is needed,t his commit adds
rocdl.uniform_work_group_size as an attribute to be set on LLVM functions that can be used to override the default.

llvmbot · 2024-01-22T23:42:24Z

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-llvm

Author: Krzysztof Drewniak (krzysz00)

Changes

GPU kernels generated via typical MLIR mechanisms make the assumption that all workgroups are of uniform size, and so, as in OpenMP, it is appropriate to set the "uniform-work-group-size"="true" attribute on these functions by default. This commit makes that choiec.

In the event it is needed,t his commit adds
rocdl.uniform_wrok_group_size as an attribute to be set on LLVM functions that can be used to override the default.

Full diff: https://github.com/llvm/llvm-project/pull/79077.diff

3 Files Affected:

(modified) mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td (+6)
(modified) mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp (+19-1)
(modified) mlir/test/Target/LLVMIR/rocdl.mlir (+8-1)

diff --git a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
index 48b830ae34f2922..92d5e7127821604 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
@@ -37,6 +37,12 @@ def ROCDL_Dialect : Dialect {
     static constexpr ::llvm::StringLiteral getReqdWorkGroupSizeAttrName() {
       return ::llvm::StringLiteral("rocdl.reqd_work_group_size");
     }
+    /// MLIR's gpu-related infrastructure effectively assume uniform workgroup
+    /// sizes, so this attribute defaults to "true" on `rocdl.kernel` functions.
+    /// It is provided here to allow overriding this assumption.
+    static constexpr ::llvm::StringLiteral getUniformWorkGroupSizeAttrName() {
+      return ::llvm::StringLiteral("rocdl.uniform_work_group_size");
+    }
 
     /// The address space value that represents global memory.
     static constexpr unsigned kGlobalMemoryAddressSpace = 1;
diff --git a/mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp
index cbce23fd580e755..9d44c087b41226c 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp
@@ -99,6 +99,12 @@ class ROCDLDialectLLVMIRTranslationInterface
       if (!llvmFunc->hasFnAttribute("amdgpu-flat-work-group-size")) {
         llvmFunc->addFnAttr("amdgpu-flat-work-group-size", "1,256");
       }
+      // MLIR's GPU kernel APIs all assume and produce uniformly-sized
+      // workgroups, so the lowering of the `rocdl.kernel` marker encodes this
+      // assumption. This assumption may be overridden by setting
+      // `rocdl.uniform_work_group_size` on a given function.
+      if (!llvmFunc->hasFnAttribute("uniform-work-group-size"))
+        llvmFunc->addFnAttr("uniform-work-group-size", "true");
     }
     // Override flat-work-group-size
     // TODO: update clients to rocdl.flat_work_group_size instead,
@@ -133,7 +139,19 @@ class ROCDLDialectLLVMIRTranslationInterface
       llvmAttrValue.append(value.getValue());
       llvmFunc->addFnAttr("amdgpu-flat-work-group-size", llvmAttrValue);
     }
-
+    if (ROCDL::ROCDLDialect::getUniformWorkGroupSizeAttrName() ==
+        attribute.getName()) {
+      auto func = dyn_cast<LLVM::LLVMFuncOp>(op);
+      if (!func)
+        return failure();
+      auto value = dyn_cast<BoolAttr>(attribute.getValue());
+      if (!value)
+        return failure();
+      llvm::Function *llvmFunc =
+          moduleTranslation.lookupFunction(func.getName());
+      llvmFunc->addFnAttr("uniform-work-group-size",
+                          value.getValue() ? "true" : "false");
+    }
     // Set reqd_work_group_size metadata
     if (ROCDL::ROCDLDialect::getReqdWorkGroupSizeAttrName() ==
         attribute.getName()) {
diff --git a/mlir/test/Target/LLVMIR/rocdl.mlir b/mlir/test/Target/LLVMIR/rocdl.mlir
index 3c9c70711ae2304..0041b959a5a1d26 100644
--- a/mlir/test/Target/LLVMIR/rocdl.mlir
+++ b/mlir/test/Target/LLVMIR/rocdl.mlir
@@ -56,6 +56,12 @@ llvm.func @known_block_sizes()
   llvm.return
 }
 
+llvm.func @kernel_func_no_uniform_work_groups() attributes {rocdl.kernel, rocdl.uniform_work_group_size = false} {
+  // CHECK-LABEL: amdgpu_kernel void @kernel_func_no_uniform_work_groups()
+  // CHECK: #[[$KERNEL_NO_UNIFORM_WORK_GROUPS_ATTRS:[0-9]+]]
+  llvm.return
+}
+
 llvm.func @rocdl.lane_id() -> i32 {
   // CHECK: [[mbcntlo:%.+]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)
   // CHECK-NEXT: call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 [[mbcntlo]])
@@ -489,8 +495,9 @@ llvm.func @rocdl_8bit_floats(%source: i32, %stoch: i32) -> i32 {
   llvm.return %source5 : i32
 }
 
-// CHECK-DAG: attributes #[[$KERNEL_ATTRS]] = { "amdgpu-flat-work-group-size"="1,256" }
+// CHECK-DAG: attributes #[[$KERNEL_ATTRS]] = { "amdgpu-flat-work-group-size"="1,256" "uniform-work-group-size"="true" }
 // CHECK-DAG: attributes #[[$KERNEL_WORKGROUP_ATTRS]] = { "amdgpu-flat-work-group-size"="1,1024"
 // CHECK-DAG: attributes #[[$KNOWN_BLOCK_SIZE_ATTRS]] = { "amdgpu-flat-work-group-size"="128,128"
+// CHECK-DAG: attributes #[[$KERNEL_NO_UNIFORM_WORK_GROUPS_ATTRS]] = { "amdgpu-flat-work-group-size"="1,256" "uniform-work-group-size"="false" }
 // CHECK-DAG: ![[$RANGE]] = !{i32 0, i32 64}
 // CHECK-DAG: ![[$REQD_WORK_GROUP_SIZE]] = !{i32 16, i32 4, i32 2}

arsenm · 2024-02-13T18:38:10Z

mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp

+      // workgroups, so the lowering of the `rocdl.kernel` marker encodes this
+      // assumption. This assumption may be overridden by setting
+      // `rocdl.uniform_work_group_size` on a given function.
+      if (!llvmFunc->hasFnAttribute("uniform-work-group-size"))


Why do you need to check if the attribute is already set on the function? Aren't you constructing this fresh IR?

I don't have guarantees about whether we'll process rocdl.kernel or rocdl.uniform_work_group_size first, so if I've already encountered a manual definition of uniform work group size, I don't want to overwrite it with the default

This seems like a peculiar process but I have no idea how MLIR does things. I'd expect it to look like clang where the full set of attributes is considered when initially emitting the IR attributes

Yeah, this is a hook amendOperation. It gets called when you have a rocdl attribute on an operation being translated, once per attribute.
So given

llvm.func @foo(...) attributes {llvm.something = N : i64, rocdl.kernel, rocdl.uniform_work_group_size = true}

The LLVM dialect is responsible for translating llvm.func into LLVM IR, and then you'll end up with three amendOperation() calls in some order, one to the LLVM dialect's, two to ROCDL's.

arsenm · 2024-02-13T19:33:05Z

Typo wrok in description

fabianmcg

Overall LGTM, see single comments.

mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp

fabianmcg

LGTM! Thanks for adding the errors!
Small NIT, is there a reason for Twine(attribute.getName()) + "..." instead of attribute.getName() + "..." ?. I think the latter should also create a Twine.

GPU kernels generated via typical MLIR mechanisms make the assumption that all workgroups are of uniform size, and so, as in OpenMP, it is appropriate to set the "uniform-work-group-size"="true" attribute on these functions by default. This commit makes that choiec. In the event it is needed,t his commit adds `rocdl.uniform_wrok_group_size` as an attribute to be set on LLVM functions that can be used to override the default. In addition, add proper failure messages to translation

krzysz00 · 2024-02-27T18:33:00Z

@fabianmcg Re the nit, I'm pretty sure that + on StringAttr (or StringRef, either way) and a literal doesn't create a Twine by default, so one of the arguments has to get an explicit constructor.

llvmbot added mlir:llvm mlir labels Jan 22, 2024

krzysz00 force-pushed the uniform-work-group-mlir branch from a12f902 to 1eedc8a Compare January 23, 2024 20:43

krzysz00 requested review from arsenm, joker-eph and fabianmcg February 13, 2024 15:41

arsenm reviewed Feb 13, 2024

View reviewed changes

fabianmcg reviewed Feb 15, 2024

View reviewed changes

mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp Outdated Show resolved Hide resolved

mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp Outdated Show resolved Hide resolved

krzysz00 requested a review from fabianmcg February 19, 2024 17:37

fabianmcg approved these changes Feb 20, 2024

View reviewed changes

krzysz00 force-pushed the uniform-work-group-mlir branch from 4efa86b to f25985d Compare February 27, 2024 16:52

krzysz00 merged commit 563f414 into llvm:main Feb 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[mlir][AMDGPU] Set uniform-work-group-size=true by default #79077

[mlir][AMDGPU] Set uniform-work-group-size=true by default #79077

Uh oh!

krzysz00 commented Jan 22, 2024 •

edited

Loading

Uh oh!

llvmbot commented Jan 22, 2024 •

edited

Loading

Uh oh!

arsenm Feb 13, 2024

Uh oh!

krzysz00 Feb 13, 2024

Uh oh!

arsenm Feb 14, 2024

Uh oh!

krzysz00 Feb 14, 2024

Uh oh!

arsenm commented Feb 13, 2024

Uh oh!

fabianmcg left a comment

Uh oh!

Uh oh!

Uh oh!

fabianmcg left a comment •

edited

Loading

Uh oh!

krzysz00 commented Feb 27, 2024

Uh oh!

Uh oh!

[mlir][AMDGPU] Set uniform-work-group-size=true by default #79077

[mlir][AMDGPU] Set uniform-work-group-size=true by default #79077

Uh oh!

Conversation

krzysz00 commented Jan 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm Feb 13, 2024

Choose a reason for hiding this comment

Uh oh!

krzysz00 Feb 13, 2024

Choose a reason for hiding this comment

Uh oh!

arsenm Feb 14, 2024

Choose a reason for hiding this comment

Uh oh!

krzysz00 Feb 14, 2024

Choose a reason for hiding this comment

Uh oh!

arsenm commented Feb 13, 2024

Uh oh!

fabianmcg left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

fabianmcg left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

krzysz00 commented Feb 27, 2024

Uh oh!

Uh oh!

krzysz00 commented Jan 22, 2024 •

edited

Loading

llvmbot commented Jan 22, 2024 •

edited

Loading

fabianmcg left a comment •

edited

Loading