[MLIR][GPUToLLVMSPV] Fix subgroup ops mangling #126111

akroviakov · 2025-02-06T18:56:55Z

I try to use LLVM's SPIRV backend to compile a GPU module into a binary string that is later consumed by OpenCL to create a program like this:

  const char *build_flags = "-cl-kernel-arg-info -x spir -cl-std=CL3.0";
  err = clBuildProgram(program, 1, &device, build_flags, NULL, NULL);

The current mangling does not seem to work:

unresolved external symbol _Z22get_sub_group_local_id

The proposed fix (add parameter list to mangling) resolves the issue. Am I doing something wrong or why does the current mangled names not work for me?

Related question:
The similar call in OpenCL:

__kernel void kernel(__global int *data) { 
*data = get_sub_group_local_id();
};

leads to the following call in IGC (as an example) dump:

call spir_func i32 @_Z40__spirv_BuiltInSubgroupLocalInvocationIdv() #0

Why doesn't llvm-spv pass lower to spirv built-ins (as the pass name suggests) and uses OpenCL instead?

llvmbot · 2025-02-06T18:57:28Z

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-gpu

Author: Artem Kroviakov (akroviakov)

Changes

I try to use LLVM's SPIRV backend to compile a GPU module into a binary string that is later consumed by OpenCL to create a program like this:

  const char *build_flags = "-cl-kernel-arg-info -x spir -cl-std=CL3.0";
  err = clBuildProgram(program, 1, &amp;device, build_flags, NULL, NULL);

The current mangling does not seem to work:

unresolved external symbol _Z22get_sub_group_local_id

The proposed fix (add parameter list to mangling) resolves the issue. Am I doing something wrong or why does the current mangled names not work for me?

Related question:
The similar call in OpenCL:

__kernel void kernel(__global int *data) { 
*data = get_sub_group_local_id();
};

leads to the following call in IGC dump:

call spir_func i32 @<!-- -->_Z40__spirv_BuiltInSubgroupLocalInvocationIdv() #<!-- -->0

Why doesn't llvm-spv pass lower to spirv built-ins (as the pass name suggests) and uses OpenCL instead?

Full diff: https://github.com/llvm/llvm-project/pull/126111.diff

2 Files Affected:

(modified) mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp (+4-4)
(modified) mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir (+8-8)

diff --git a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
index 8b6b553f6eed054..26bfa97fa34d058 100644
--- a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
+++ b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
@@ -401,13 +401,13 @@ struct GPUSubgroupOpConversion final : ConvertOpToLLVMPattern<SubgroupOp> {
                   ConversionPatternRewriter &rewriter) const final {
     constexpr StringRef funcName = [] {
       if constexpr (std::is_same_v<SubgroupOp, gpu::SubgroupIdOp>) {
-        return "_Z16get_sub_group_id";
+        return "_Z16get_sub_group_idv";
       } else if constexpr (std::is_same_v<SubgroupOp, gpu::LaneIdOp>) {
-        return "_Z22get_sub_group_local_id";
+        return "_Z22get_sub_group_local_idv";
       } else if constexpr (std::is_same_v<SubgroupOp, gpu::NumSubgroupsOp>) {
-        return "_Z18get_num_sub_groups";
+        return "_Z18get_num_sub_groupsv";
       } else if constexpr (std::is_same_v<SubgroupOp, gpu::SubgroupSizeOp>) {
-        return "_Z18get_sub_group_size";
+        return "_Z18get_sub_group_sizev";
       }
     }();
 
diff --git a/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir b/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir
index c2930971dbcf9b9..b3a9c33eb66d911 100644
--- a/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir
+++ b/mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir
@@ -546,28 +546,28 @@ gpu.module @kernels {
 
 // Lowering of subgroup query operations
 
-// CHECK-DAG: llvm.func spir_funccc @_Z18get_sub_group_size() -> i32 attributes {no_unwind, will_return}
-// CHECK-DAG: llvm.func spir_funccc @_Z18get_num_sub_groups() -> i32 attributes {no_unwind, will_return}
-// CHECK-DAG: llvm.func spir_funccc @_Z22get_sub_group_local_id() -> i32 attributes {no_unwind, will_return}
-// CHECK-DAG: llvm.func spir_funccc @_Z16get_sub_group_id() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z18get_sub_group_sizev() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z18get_num_sub_groupsv() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z22get_sub_group_local_idv() -> i32 attributes {no_unwind, will_return}
+// CHECK-DAG: llvm.func spir_funccc @_Z16get_sub_group_idv() -> i32 attributes {no_unwind, will_return}
 
 
 gpu.module @subgroup_operations {
 // CHECK-LABEL: @gpu_subgroup
   func.func @gpu_subgroup() {
-    // CHECK:       %[[SG_ID:.*]] = llvm.call spir_funccc @_Z16get_sub_group_id() {no_unwind, will_return} : () -> i32
+    // CHECK:       %[[SG_ID:.*]] = llvm.call spir_funccc @_Z16get_sub_group_idv() {no_unwind, will_return} : () -> i32
     // CHECK-32-NOT:                llvm.zext
     // CHECK-64           %{{.*}} = llvm.zext %[[SG_ID]] : i32 to i64
     %0 = gpu.subgroup_id : index
-    // CHECK: %[[SG_LOCAL_ID:.*]] = llvm.call spir_funccc @_Z22get_sub_group_local_id() {no_unwind, will_return}  : () -> i32
+    // CHECK: %[[SG_LOCAL_ID:.*]] = llvm.call spir_funccc @_Z22get_sub_group_local_idv() {no_unwind, will_return}  : () -> i32
     // CHECK-32-NOT:                llvm.zext
     // CHECK-64:          %{{.*}} = llvm.zext %[[SG_LOCAL_ID]] : i32 to i64
     %1 = gpu.lane_id
-    // CHECK:     %[[NUM_SGS:.*]] = llvm.call spir_funccc @_Z18get_num_sub_groups() {no_unwind, will_return} : () -> i32
+    // CHECK:     %[[NUM_SGS:.*]] = llvm.call spir_funccc @_Z18get_num_sub_groupsv() {no_unwind, will_return} : () -> i32
     // CHECK-32-NOT:                llvm.zext
     // CHECK-64:          %{{.*}} = llvm.zext %[[NUM_SGS]] : i32 to i64
     %2 = gpu.num_subgroups : index
-    // CHECK:     %[[SG_SIZE:.*]] = llvm.call spir_funccc @_Z18get_sub_group_size() {no_unwind, will_return} : () -> i32
+    // CHECK:     %[[SG_SIZE:.*]] = llvm.call spir_funccc @_Z18get_sub_group_sizev() {no_unwind, will_return} : () -> i32
     // CHECK-32-NOT:                llvm.zext
     // CHECK-64:          %{{.*}} = llvm.zext %[[SG_SIZE]] : i32 to i64
     %3 = gpu.subgroup_size : index

akroviakov · 2025-02-07T08:55:29Z

There seem to be no automatic reviewers, I suppose I should ping the original author @FMarno .

FMarno · 2025-02-07T10:22:45Z

@akroviakov Sorry, I must have made a mistake there. I see that v is the type encoding for a "void" parameter. It was working for the SPIR-V driver I was using at the time. Potentially the SPIRV built-in names would work also.
I've passed this on to my colleague @sommerlukas.

sommerlukas · 2025-02-07T15:01:07Z

The change makes sense to me.

Maybe @VyacheslavLevytskyy as the main contributor and maintainer of the LLVM SPIR-V backend can chime in here and advise on the best representation for these builtins, including OpenCL vs SPIR-V.

VyacheslavLevytskyy · 2025-02-13T17:22:48Z

@akroviakov May I ask you please to paste here a reproducer as a LLVM IR input (i.e., an intermediate step of producing SPIR-V from MLIR input) that results eventually into unresolved external symbol _Z22get_sub_group_local_id error? I'd like to have a closer look of what's going on the SPIR-V backend side.

akroviakov · 2025-02-15T10:09:25Z

As an example, this mlir:

module @gemm attributes {gpu.container_module} {
  gpu.module @kernel {
    gpu.func @store_constant(%ptr: !llvm.ptr<1>) kernel {
      %const_val = arith.constant 42.0 : f32
      %laneid = gpu.lane_id
      %laneid _i64 = arith.index_cast %laneid : index to i64
      %ptr_next_1 = llvm.getelementptr %ptr[%laneid] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, i32
      llvm.store %const_val, %ptr_next_1 : f32, !llvm.ptr<1>
      gpu.return
    }
  }

results in this LLVM IR that later leads to unresolved external symbol error:

; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"
target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-n8:16:32:64-G1"
target triple = "spirv64-unknown-unknown"

; Function Attrs: mustprogress nounwind willreturn
declare spir_func i32 @_Z22get_sub_group_local_id() local_unnamed_addr #0

; Function Attrs: mustprogress nounwind willreturn
define spir_kernel void @store_constant(ptr addrspace(1) writeonly captures(none) %0) local_unnamed_addr #0 !intel_reqd_sub_group_size !1 {
  %2 = tail call spir_func i32 @_Z22get_sub_group_local_id() #1
  %3 = zext i32 %2 to i64
  %4 = getelementptr i32, ptr addrspace(1) %0, i64 %3
  store float 4.200000e+01, ptr addrspace(1) %4, align 4
  ret void
}

attributes #0 = { mustprogress nounwind willreturn }
attributes #1 = { nounwind willreturn }

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = !{i32 16}

VyacheslavLevytskyy · 2025-02-17T12:37:29Z

@akroviakov I understand a rationale behind using an exact mangled name of a bultin in the version that a target environment recognizes. You wouldn't rely on a middleware SPIR-V encoding layer then, and the function call is passing over the SPIR-V backend business logic. However, the root cause of this PR's problem is not mangling but the fact that this builtin was not known to the SPIR-V backend. Please have a look to #127242 -- it add support for this and more OpenCL builtins, so that your lowering should work properly now in either of 3 forms:
%r = tail call spir_func i32 @get_sub_group_local_id()
or
%r = tail call spir_func i32 @_Z22get_sub_group_local_id()
or
%r = tail call spir_func i32 @_Z22get_sub_group_local_idv()

In other words, after #127242 you no longer need to rely on the exact mangled name, the SPIR-V backend will now correctly recognize and map such OpenCL builtins (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_built_in_variables) to SPIR-V builtin variables (https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_builtin).

akroviakov · 2025-02-17T14:40:21Z

Thank you for addressing it in SPIR-V backend. I close this PR in favor of #127242 .

[MLIR][GPUToLLVMSPV] Fix subgroup ops mangling

887a0e6

llvmbot added mlir:gpu mlir labels Feb 6, 2025

adam-smnk requested a review from sommerlukas February 7, 2025 14:57

VyacheslavLevytskyy mentioned this pull request Feb 17, 2025

[SPIR-V] Built-in variables: mapping from an OpenCL to SPIR-V BuiltIn and SPIR-V friendly builtins for Image Read/Write instructions #127242

Merged

akroviakov closed this Feb 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[MLIR][GPUToLLVMSPV] Fix subgroup ops mangling #126111

[MLIR][GPUToLLVMSPV] Fix subgroup ops mangling #126111

Uh oh!

akroviakov commented Feb 6, 2025 •

edited

Loading

Uh oh!

llvmbot commented Feb 6, 2025 •

edited

Loading

Uh oh!

akroviakov commented Feb 7, 2025

Uh oh!

FMarno commented Feb 7, 2025

Uh oh!

sommerlukas commented Feb 7, 2025

Uh oh!

VyacheslavLevytskyy commented Feb 13, 2025

Uh oh!

akroviakov commented Feb 15, 2025

Uh oh!

VyacheslavLevytskyy commented Feb 17, 2025

Uh oh!

akroviakov commented Feb 17, 2025

Uh oh!

Uh oh!

[MLIR][GPUToLLVMSPV] Fix subgroup ops mangling #126111

[MLIR][GPUToLLVMSPV] Fix subgroup ops mangling #126111

Uh oh!

Conversation

akroviakov commented Feb 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Feb 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

akroviakov commented Feb 7, 2025

Uh oh!

FMarno commented Feb 7, 2025

Uh oh!

sommerlukas commented Feb 7, 2025

Uh oh!

VyacheslavLevytskyy commented Feb 13, 2025

Uh oh!

akroviakov commented Feb 15, 2025

Uh oh!

VyacheslavLevytskyy commented Feb 17, 2025

Uh oh!

akroviakov commented Feb 17, 2025

Uh oh!

Uh oh!

akroviakov commented Feb 6, 2025 •

edited

Loading

llvmbot commented Feb 6, 2025 •

edited

Loading