-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[MLIR][GPUToLLVMSPV] Fix subgroup ops mangling #126111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-gpu Author: Artem Kroviakov (akroviakov) ChangesI try to use LLVM's SPIRV backend to compile a GPU module into a binary string that is later consumed by OpenCL to create a program like this: const char *build_flags = "-cl-kernel-arg-info -x spir -cl-std=CL3.0";
err = clBuildProgram(program, 1, &device, build_flags, NULL, NULL); The current mangling does not seem to work:
The proposed fix (add parameter list to mangling) resolves the issue. Am I doing something wrong or why does the current mangled names not work for me? Related question: __kernel void kernel(__global int *data) {
*data = get_sub_group_local_id();
}; leads to the following call in IGC dump: call spir_func i32 @<!-- -->_Z40__spirv_BuiltInSubgroupLocalInvocationIdv() #<!-- -->0 Why doesn't llvm-spv pass lower to spirv built-ins (as the pass name suggests) and uses OpenCL instead? Full diff: https://github.com/llvm/llvm-project/pull/126111.diff 2 Files Affected:
diff --git a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
index 8b6b553f6eed054..26bfa97fa34d058 100644
--- a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
+++ b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
@@ -401,13 +401,13 @@ struct GPUSubgroupOpConversion final : ConvertOpToLLVMPattern<SubgroupOp> {
ConversionPatternRewriter &rewriter) const final {
constexpr StringRef funcName = [] {
if constexpr (std::is_same_v<SubgroupOp, gpu::SubgroupIdOp>) {
- return "_Z16get_sub_group_id";
+ return "_Z16get_sub_group_idv";
} else if constexpr (std::is_same_v<SubgroupOp, gpu::LaneIdOp>) {
- return "_Z22get_sub_group_local_id";
+ return "_Z22get_sub_group_local_idv";
} else if constexpr (std::is_same_v<SubgroupOp, gpu::NumSubgroupsOp>) {
- return "_Z18get_num_sub_groups";
+ return "_Z18get_num_sub_groupsv";
} else if constexpr (std::is_same_v<SubgroupOp, gpu::SubgroupSizeOp>) {
- return "_Z18get_sub_group_size";
+ return "_Z18get_sub_group_sizev";
}
}();
diff --git a/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir b/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir
index c2930971dbcf9b9..b3a9c33eb66d911 100644
--- a/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir
+++ b/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir
@@ -546,28 +546,28 @@ gpu.module @kernels {
// Lowering of subgroup query operations
-// CHECK-DAG: llvm.func spir_funccc @_Z18get_sub_group_size() -> i32 attributes {no_unwind, will_return}
-// CHECK-DAG: llvm.func spir_funccc @_Z18get_num_sub_groups() -> i32 attributes {no_unwind, will_return}
-// CHECK-DAG: llvm.func spir_funccc @_Z22get_sub_group_local_id() -> i32 attributes {no_unwind, will_return}
-// CHECK-DAG: llvm.func spir_funccc @_Z16get_sub_group_id() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z18get_sub_group_sizev() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z18get_num_sub_groupsv() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z22get_sub_group_local_idv() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z16get_sub_group_idv() -> i32 attributes {no_unwind, will_return}
gpu.module @subgroup_operations {
// CHECK-LABEL: @gpu_subgroup
func.func @gpu_subgroup() {
- // CHECK: %[[SG_ID:.*]] = llvm.call spir_funccc @_Z16get_sub_group_id() {no_unwind, will_return} : () -> i32
+ // CHECK: %[[SG_ID:.*]] = llvm.call spir_funccc @_Z16get_sub_group_idv() {no_unwind, will_return} : () -> i32
// CHECK-32-NOT: llvm.zext
// CHECK-64 %{{.*}} = llvm.zext %[[SG_ID]] : i32 to i64
%0 = gpu.subgroup_id : index
- // CHECK: %[[SG_LOCAL_ID:.*]] = llvm.call spir_funccc @_Z22get_sub_group_local_id() {no_unwind, will_return} : () -> i32
+ // CHECK: %[[SG_LOCAL_ID:.*]] = llvm.call spir_funccc @_Z22get_sub_group_local_idv() {no_unwind, will_return} : () -> i32
// CHECK-32-NOT: llvm.zext
// CHECK-64: %{{.*}} = llvm.zext %[[SG_LOCAL_ID]] : i32 to i64
%1 = gpu.lane_id
- // CHECK: %[[NUM_SGS:.*]] = llvm.call spir_funccc @_Z18get_num_sub_groups() {no_unwind, will_return} : () -> i32
+ // CHECK: %[[NUM_SGS:.*]] = llvm.call spir_funccc @_Z18get_num_sub_groupsv() {no_unwind, will_return} : () -> i32
// CHECK-32-NOT: llvm.zext
// CHECK-64: %{{.*}} = llvm.zext %[[NUM_SGS]] : i32 to i64
%2 = gpu.num_subgroups : index
- // CHECK: %[[SG_SIZE:.*]] = llvm.call spir_funccc @_Z18get_sub_group_size() {no_unwind, will_return} : () -> i32
+ // CHECK: %[[SG_SIZE:.*]] = llvm.call spir_funccc @_Z18get_sub_group_sizev() {no_unwind, will_return} : () -> i32
// CHECK-32-NOT: llvm.zext
// CHECK-64: %{{.*}} = llvm.zext %[[SG_SIZE]] : i32 to i64
%3 = gpu.subgroup_size : index
|
There seem to be no automatic reviewers, I suppose I should ping the original author @FMarno . |
@akroviakov Sorry, I must have made a mistake there. I see that |
The change makes sense to me. Maybe @VyacheslavLevytskyy as the main contributor and maintainer of the LLVM SPIR-V backend can chime in here and advise on the best representation for these builtins, including OpenCL vs SPIR-V. |
@akroviakov May I ask you please to paste here a reproducer as a LLVM IR input (i.e., an intermediate step of producing SPIR-V from MLIR input) that results eventually into |
As an example, this mlir: module @gemm attributes {gpu.container_module} {
gpu.module @kernel {
gpu.func @store_constant(%ptr: !llvm.ptr<1>) kernel {
%const_val = arith.constant 42.0 : f32
%laneid = gpu.lane_id
%laneid _i64 = arith.index_cast %laneid : index to i64
%ptr_next_1 = llvm.getelementptr %ptr[%laneid] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, i32
llvm.store %const_val, %ptr_next_1 : f32, !llvm.ptr<1>
gpu.return
}
} results in this LLVM IR that later leads to ; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"
target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-n8:16:32:64-G1"
target triple = "spirv64-unknown-unknown"
; Function Attrs: mustprogress nounwind willreturn
declare spir_func i32 @_Z22get_sub_group_local_id() local_unnamed_addr #0
; Function Attrs: mustprogress nounwind willreturn
define spir_kernel void @store_constant(ptr addrspace(1) writeonly captures(none) %0) local_unnamed_addr #0 !intel_reqd_sub_group_size !1 {
%2 = tail call spir_func i32 @_Z22get_sub_group_local_id() #1
%3 = zext i32 %2 to i64
%4 = getelementptr i32, ptr addrspace(1) %0, i64 %3
store float 4.200000e+01, ptr addrspace(1) %4, align 4
ret void
}
attributes #0 = { mustprogress nounwind willreturn }
attributes #1 = { nounwind willreturn }
!llvm.module.flags = !{!0}
!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = !{i32 16} |
@akroviakov I understand a rationale behind using an exact mangled name of a bultin in the version that a target environment recognizes. You wouldn't rely on a middleware SPIR-V encoding layer then, and the function call is passing over the SPIR-V backend business logic. However, the root cause of this PR's problem is not mangling but the fact that this builtin was not known to the SPIR-V backend. Please have a look to #127242 -- it add support for this and more OpenCL builtins, so that your lowering should work properly now in either of 3 forms: In other words, after #127242 you no longer need to rely on the exact mangled name, the SPIR-V backend will now correctly recognize and map such OpenCL builtins (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_built_in_variables) to SPIR-V builtin variables (https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_builtin). |
Thank you for addressing it in SPIR-V backend. I close this PR in favor of #127242 . |
I try to use LLVM's SPIRV backend to compile a GPU module into a binary string that is later consumed by OpenCL to create a program like this:
The current mangling does not seem to work:
The proposed fix (add parameter list to mangling) resolves the issue. Am I doing something wrong or why does the current mangled names not work for me?
Related question:
The similar call in OpenCL:
leads to the following call in IGC (as an example) dump:
Why doesn't llvm-spv pass lower to spirv built-ins (as the pass name suggests) and uses OpenCL instead?