Skip to content

[MLIR][GPUToLLVMSPV] Fix subgroup ops mangling #126111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

akroviakov
Copy link
Contributor

@akroviakov akroviakov commented Feb 6, 2025

I try to use LLVM's SPIRV backend to compile a GPU module into a binary string that is later consumed by OpenCL to create a program like this:

  const char *build_flags = "-cl-kernel-arg-info -x spir -cl-std=CL3.0";
  err = clBuildProgram(program, 1, &device, build_flags, NULL, NULL);

The current mangling does not seem to work:

unresolved external symbol _Z22get_sub_group_local_id

The proposed fix (add parameter list to mangling) resolves the issue. Am I doing something wrong or why does the current mangled names not work for me?


Related question:
The similar call in OpenCL:

__kernel void kernel(__global int *data) { 
*data = get_sub_group_local_id();
};

leads to the following call in IGC (as an example) dump:

call spir_func i32 @_Z40__spirv_BuiltInSubgroupLocalInvocationIdv() #0

Why doesn't llvm-spv pass lower to spirv built-ins (as the pass name suggests) and uses OpenCL instead?

@llvmbot
Copy link
Member

llvmbot commented Feb 6, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-gpu

Author: Artem Kroviakov (akroviakov)

Changes

I try to use LLVM's SPIRV backend to compile a GPU module into a binary string that is later consumed by OpenCL to create a program like this:

  const char *build_flags = "-cl-kernel-arg-info -x spir -cl-std=CL3.0";
  err = clBuildProgram(program, 1, &device, build_flags, NULL, NULL);

The current mangling does not seem to work:

unresolved external symbol _Z22get_sub_group_local_id

The proposed fix (add parameter list to mangling) resolves the issue. Am I doing something wrong or why does the current mangled names not work for me?


Related question:
The similar call in OpenCL:

__kernel void kernel(__global int *data) { 
*data = get_sub_group_local_id();
};

leads to the following call in IGC dump:

call spir_func i32 @<!-- -->_Z40__spirv_BuiltInSubgroupLocalInvocationIdv() #<!-- -->0

Why doesn't llvm-spv pass lower to spirv built-ins (as the pass name suggests) and uses OpenCL instead?


Full diff: https://github.com/llvm/llvm-project/pull/126111.diff

2 Files Affected:

  • (modified) mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp (+4-4)
  • (modified) mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir (+8-8)
diff --git a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
index 8b6b553f6eed054..26bfa97fa34d058 100644
--- a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
+++ b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
@@ -401,13 +401,13 @@ struct GPUSubgroupOpConversion final : ConvertOpToLLVMPattern<SubgroupOp> {
                   ConversionPatternRewriter &rewriter) const final {
     constexpr StringRef funcName = [] {
       if constexpr (std::is_same_v<SubgroupOp, gpu::SubgroupIdOp>) {
-        return "_Z16get_sub_group_id";
+        return "_Z16get_sub_group_idv";
       } else if constexpr (std::is_same_v<SubgroupOp, gpu::LaneIdOp>) {
-        return "_Z22get_sub_group_local_id";
+        return "_Z22get_sub_group_local_idv";
       } else if constexpr (std::is_same_v<SubgroupOp, gpu::NumSubgroupsOp>) {
-        return "_Z18get_num_sub_groups";
+        return "_Z18get_num_sub_groupsv";
       } else if constexpr (std::is_same_v<SubgroupOp, gpu::SubgroupSizeOp>) {
-        return "_Z18get_sub_group_size";
+        return "_Z18get_sub_group_sizev";
       }
     }();
 
diff --git a/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir b/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir
index c2930971dbcf9b9..b3a9c33eb66d911 100644
--- a/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir
+++ b/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir
@@ -546,28 +546,28 @@ gpu.module @kernels {
 
 // Lowering of subgroup query operations
 
-// CHECK-DAG: llvm.func spir_funccc @_Z18get_sub_group_size() -> i32 attributes {no_unwind, will_return}
-// CHECK-DAG: llvm.func spir_funccc @_Z18get_num_sub_groups() -> i32 attributes {no_unwind, will_return}
-// CHECK-DAG: llvm.func spir_funccc @_Z22get_sub_group_local_id() -> i32 attributes {no_unwind, will_return}
-// CHECK-DAG: llvm.func spir_funccc @_Z16get_sub_group_id() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z18get_sub_group_sizev() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z18get_num_sub_groupsv() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z22get_sub_group_local_idv() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z16get_sub_group_idv() -> i32 attributes {no_unwind, will_return}
 
 
 gpu.module @subgroup_operations {
 // CHECK-LABEL: @gpu_subgroup
   func.func @gpu_subgroup() {
-    // CHECK:       %[[SG_ID:.*]] = llvm.call spir_funccc @_Z16get_sub_group_id() {no_unwind, will_return} : () -> i32
+    // CHECK:       %[[SG_ID:.*]] = llvm.call spir_funccc @_Z16get_sub_group_idv() {no_unwind, will_return} : () -> i32
     // CHECK-32-NOT:                llvm.zext
     // CHECK-64           %{{.*}} = llvm.zext %[[SG_ID]] : i32 to i64
     %0 = gpu.subgroup_id : index
-    // CHECK: %[[SG_LOCAL_ID:.*]] = llvm.call spir_funccc @_Z22get_sub_group_local_id() {no_unwind, will_return}  : () -> i32
+    // CHECK: %[[SG_LOCAL_ID:.*]] = llvm.call spir_funccc @_Z22get_sub_group_local_idv() {no_unwind, will_return}  : () -> i32
     // CHECK-32-NOT:                llvm.zext
     // CHECK-64:          %{{.*}} = llvm.zext %[[SG_LOCAL_ID]] : i32 to i64
     %1 = gpu.lane_id
-    // CHECK:     %[[NUM_SGS:.*]] = llvm.call spir_funccc @_Z18get_num_sub_groups() {no_unwind, will_return} : () -> i32
+    // CHECK:     %[[NUM_SGS:.*]] = llvm.call spir_funccc @_Z18get_num_sub_groupsv() {no_unwind, will_return} : () -> i32
     // CHECK-32-NOT:                llvm.zext
     // CHECK-64:          %{{.*}} = llvm.zext %[[NUM_SGS]] : i32 to i64
     %2 = gpu.num_subgroups : index
-    // CHECK:     %[[SG_SIZE:.*]] = llvm.call spir_funccc @_Z18get_sub_group_size() {no_unwind, will_return} : () -> i32
+    // CHECK:     %[[SG_SIZE:.*]] = llvm.call spir_funccc @_Z18get_sub_group_sizev() {no_unwind, will_return} : () -> i32
     // CHECK-32-NOT:                llvm.zext
     // CHECK-64:          %{{.*}} = llvm.zext %[[SG_SIZE]] : i32 to i64
     %3 = gpu.subgroup_size : index

@akroviakov
Copy link
Contributor Author

There seem to be no automatic reviewers, I suppose I should ping the original author @FMarno .

@FMarno
Copy link
Contributor

FMarno commented Feb 7, 2025

@akroviakov Sorry, I must have made a mistake there. I see that v is the type encoding for a "void" parameter. It was working for the SPIR-V driver I was using at the time. Potentially the SPIRV built-in names would work also.
I've passed this on to my colleague @sommerlukas.

@adam-smnk adam-smnk requested a review from sommerlukas February 7, 2025 14:57
@sommerlukas
Copy link
Contributor

The change makes sense to me.

Maybe @VyacheslavLevytskyy as the main contributor and maintainer of the LLVM SPIR-V backend can chime in here and advise on the best representation for these builtins, including OpenCL vs SPIR-V.

@VyacheslavLevytskyy
Copy link
Contributor

@akroviakov May I ask you please to paste here a reproducer as a LLVM IR input (i.e., an intermediate step of producing SPIR-V from MLIR input) that results eventually into unresolved external symbol _Z22get_sub_group_local_id error? I'd like to have a closer look of what's going on the SPIR-V backend side.

@akroviakov
Copy link
Contributor Author

As an example, this mlir:

module @gemm attributes {gpu.container_module} {
  gpu.module @kernel {
    gpu.func @store_constant(%ptr: !llvm.ptr<1>) kernel {
      %const_val = arith.constant 42.0 : f32
      %laneid = gpu.lane_id
      %laneid _i64 = arith.index_cast %laneid : index to i64
      %ptr_next_1 = llvm.getelementptr %ptr[%laneid] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, i32
      llvm.store %const_val, %ptr_next_1 : f32, !llvm.ptr<1>
      gpu.return
    }
  }

results in this LLVM IR that later leads to unresolved external symbol error:

; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"
target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-n8:16:32:64-G1"
target triple = "spirv64-unknown-unknown"

; Function Attrs: mustprogress nounwind willreturn
declare spir_func i32 @_Z22get_sub_group_local_id() local_unnamed_addr #0

; Function Attrs: mustprogress nounwind willreturn
define spir_kernel void @store_constant(ptr addrspace(1) writeonly captures(none) %0) local_unnamed_addr #0 !intel_reqd_sub_group_size !1 {
  %2 = tail call spir_func i32 @_Z22get_sub_group_local_id() #1
  %3 = zext i32 %2 to i64
  %4 = getelementptr i32, ptr addrspace(1) %0, i64 %3
  store float 4.200000e+01, ptr addrspace(1) %4, align 4
  ret void
}

attributes #0 = { mustprogress nounwind willreturn }
attributes #1 = { nounwind willreturn }

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = !{i32 16}

@VyacheslavLevytskyy
Copy link
Contributor

@akroviakov I understand a rationale behind using an exact mangled name of a bultin in the version that a target environment recognizes. You wouldn't rely on a middleware SPIR-V encoding layer then, and the function call is passing over the SPIR-V backend business logic. However, the root cause of this PR's problem is not mangling but the fact that this builtin was not known to the SPIR-V backend. Please have a look to #127242 -- it add support for this and more OpenCL builtins, so that your lowering should work properly now in either of 3 forms:
%r = tail call spir_func i32 @get_sub_group_local_id()
or
%r = tail call spir_func i32 @_Z22get_sub_group_local_id()
or
%r = tail call spir_func i32 @_Z22get_sub_group_local_idv()

In other words, after #127242 you no longer need to rely on the exact mangled name, the SPIR-V backend will now correctly recognize and map such OpenCL builtins (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_built_in_variables) to SPIR-V builtin variables (https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_builtin).

@akroviakov
Copy link
Contributor Author

Thank you for addressing it in SPIR-V backend. I close this PR in favor of #127242 .

@akroviakov akroviakov closed this Feb 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants