-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[mlir][AMDGPU] Set uniform-work-group-size=true by default #79077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-llvm Author: Krzysztof Drewniak (krzysz00) ChangesGPU kernels generated via typical MLIR mechanisms make the assumption that all workgroups are of uniform size, and so, as in OpenMP, it is appropriate to set the "uniform-work-group-size"="true" attribute on these functions by default. This commit makes that choiec. In the event it is needed,t his commit adds Full diff: https://github.com/llvm/llvm-project/pull/79077.diff 3 Files Affected:
diff --git a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
index 48b830ae34f2922..92d5e7127821604 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
@@ -37,6 +37,12 @@ def ROCDL_Dialect : Dialect {
static constexpr ::llvm::StringLiteral getReqdWorkGroupSizeAttrName() {
return ::llvm::StringLiteral("rocdl.reqd_work_group_size");
}
+ /// MLIR's gpu-related infrastructure effectively assume uniform workgroup
+ /// sizes, so this attribute defaults to "true" on `rocdl.kernel` functions.
+ /// It is provided here to allow overriding this assumption.
+ static constexpr ::llvm::StringLiteral getUniformWorkGroupSizeAttrName() {
+ return ::llvm::StringLiteral("rocdl.uniform_work_group_size");
+ }
/// The address space value that represents global memory.
static constexpr unsigned kGlobalMemoryAddressSpace = 1;
diff --git a/mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp
index cbce23fd580e755..9d44c087b41226c 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp
@@ -99,6 +99,12 @@ class ROCDLDialectLLVMIRTranslationInterface
if (!llvmFunc->hasFnAttribute("amdgpu-flat-work-group-size")) {
llvmFunc->addFnAttr("amdgpu-flat-work-group-size", "1,256");
}
+ // MLIR's GPU kernel APIs all assume and produce uniformly-sized
+ // workgroups, so the lowering of the `rocdl.kernel` marker encodes this
+ // assumption. This assumption may be overridden by setting
+ // `rocdl.uniform_work_group_size` on a given function.
+ if (!llvmFunc->hasFnAttribute("uniform-work-group-size"))
+ llvmFunc->addFnAttr("uniform-work-group-size", "true");
}
// Override flat-work-group-size
// TODO: update clients to rocdl.flat_work_group_size instead,
@@ -133,7 +139,19 @@ class ROCDLDialectLLVMIRTranslationInterface
llvmAttrValue.append(value.getValue());
llvmFunc->addFnAttr("amdgpu-flat-work-group-size", llvmAttrValue);
}
-
+ if (ROCDL::ROCDLDialect::getUniformWorkGroupSizeAttrName() ==
+ attribute.getName()) {
+ auto func = dyn_cast<LLVM::LLVMFuncOp>(op);
+ if (!func)
+ return failure();
+ auto value = dyn_cast<BoolAttr>(attribute.getValue());
+ if (!value)
+ return failure();
+ llvm::Function *llvmFunc =
+ moduleTranslation.lookupFunction(func.getName());
+ llvmFunc->addFnAttr("uniform-work-group-size",
+ value.getValue() ? "true" : "false");
+ }
// Set reqd_work_group_size metadata
if (ROCDL::ROCDLDialect::getReqdWorkGroupSizeAttrName() ==
attribute.getName()) {
diff --git a/mlir/test/Target/LLVMIR/rocdl.mlir b/mlir/test/Target/LLVMIR/rocdl.mlir
index 3c9c70711ae2304..0041b959a5a1d26 100644
--- a/mlir/test/Target/LLVMIR/rocdl.mlir
+++ b/mlir/test/Target/LLVMIR/rocdl.mlir
@@ -56,6 +56,12 @@ llvm.func @known_block_sizes()
llvm.return
}
+llvm.func @kernel_func_no_uniform_work_groups() attributes {rocdl.kernel, rocdl.uniform_work_group_size = false} {
+ // CHECK-LABEL: amdgpu_kernel void @kernel_func_no_uniform_work_groups()
+ // CHECK: #[[$KERNEL_NO_UNIFORM_WORK_GROUPS_ATTRS:[0-9]+]]
+ llvm.return
+}
+
llvm.func @rocdl.lane_id() -> i32 {
// CHECK: [[mbcntlo:%.+]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)
// CHECK-NEXT: call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 [[mbcntlo]])
@@ -489,8 +495,9 @@ llvm.func @rocdl_8bit_floats(%source: i32, %stoch: i32) -> i32 {
llvm.return %source5 : i32
}
-// CHECK-DAG: attributes #[[$KERNEL_ATTRS]] = { "amdgpu-flat-work-group-size"="1,256" }
+// CHECK-DAG: attributes #[[$KERNEL_ATTRS]] = { "amdgpu-flat-work-group-size"="1,256" "uniform-work-group-size"="true" }
// CHECK-DAG: attributes #[[$KERNEL_WORKGROUP_ATTRS]] = { "amdgpu-flat-work-group-size"="1,1024"
// CHECK-DAG: attributes #[[$KNOWN_BLOCK_SIZE_ATTRS]] = { "amdgpu-flat-work-group-size"="128,128"
+// CHECK-DAG: attributes #[[$KERNEL_NO_UNIFORM_WORK_GROUPS_ATTRS]] = { "amdgpu-flat-work-group-size"="1,256" "uniform-work-group-size"="false" }
// CHECK-DAG: ![[$RANGE]] = !{i32 0, i32 64}
// CHECK-DAG: ![[$REQD_WORK_GROUP_SIZE]] = !{i32 16, i32 4, i32 2}
|
a12f902
to
1eedc8a
Compare
// workgroups, so the lowering of the `rocdl.kernel` marker encodes this | ||
// assumption. This assumption may be overridden by setting | ||
// `rocdl.uniform_work_group_size` on a given function. | ||
if (!llvmFunc->hasFnAttribute("uniform-work-group-size")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to check if the attribute is already set on the function? Aren't you constructing this fresh IR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have guarantees about whether we'll process rocdl.kernel
or rocdl.uniform_work_group_size
first, so if I've already encountered a manual definition of uniform work group size, I don't want to overwrite it with the default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a peculiar process but I have no idea how MLIR does things. I'd expect it to look like clang where the full set of attributes is considered when initially emitting the IR attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a hook amendOperation
. It gets called when you have a rocdl
attribute on an operation being translated, once per attribute.
So given
llvm.func @foo(...) attributes {llvm.something = N : i64, rocdl.kernel, rocdl.uniform_work_group_size = true}
The LLVM dialect is responsible for translating llvm.func
into LLVM IR, and then you'll end up with three amendOperation()
calls in some order, one to the LLVM dialect's, two to ROCDL's.
Typo wrok in description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, see single comments.
mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp
Outdated
Show resolved
Hide resolved
mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for adding the errors!
Small NIT, is there a reason for Twine(attribute.getName()) + "..."
instead of attribute.getName() + "..."
?. I think the latter should also create a Twine.
GPU kernels generated via typical MLIR mechanisms make the assumption that all workgroups are of uniform size, and so, as in OpenMP, it is appropriate to set the "uniform-work-group-size"="true" attribute on these functions by default. This commit makes that choiec. In the event it is needed,t his commit adds `rocdl.uniform_wrok_group_size` as an attribute to be set on LLVM functions that can be used to override the default. In addition, add proper failure messages to translation
4efa86b
to
f25985d
Compare
@fabianmcg Re the nit, I'm pretty sure that |
GPU kernels generated via typical MLIR mechanisms make the assumption that all workgroups are of uniform size, and so, as in OpenMP, it is appropriate to set the "uniform-work-group-size"="true" attribute on these functions by default. This commit makes that choice.
In the event it is needed,t his commit adds
rocdl.uniform_work_group_size
as an attribute to be set on LLVM functions that can be used to override the default.