
[mlir] lower min/maxnum to libdevice calls #127323


Merged — 2 commits merged into llvm:main on Feb 15, 2025
Conversation

@ftynse ftynse commented Feb 15, 2025

Introduce lowering from the arith.minnumf/maxnumf operations to the corresponding Nvidia libdevice calls. This requires reordering the pattern population methods so that the libdevice-targeting patterns take priority over the default patterns that lower Arith dialect operations to LLVM IR intrinsics. The tests are placed in a separate file because the existing gpu-to-nvvm.mlir file has a mode that forces Arith dialect operations to be preserved as is, without a separate FileCheck prefix to differentiate the modes.

@llvmbot

llvmbot commented Feb 15, 2025

@llvm/pr-subscribers-mlir-gpu

Author: Oleksandr "Alex" Zinenko (ftynse)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/127323.diff

2 Files Affected:

  • (modified) mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp (+7-1)
  • (added) mlir/test/Conversion/GPUToNVVM/gpu-arith-ops-to-nvvm.mlir (+21)
diff --git a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
index 35330f870e6ae..c1a4d31861d3b 100644
--- a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+++ b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -378,6 +378,8 @@ struct LowerGpuOpsToNVVMOpsPass final
     RewritePatternSet llvmPatterns(m.getContext());
     LLVMConversionTarget target(getContext());
 
+    populateGpuToNVVMConversionPatterns(converter, llvmPatterns);
+
     llvm::SmallDenseSet<StringRef> allowedDialectsSet(allowedDialects.begin(),
                                                       allowedDialects.end());
     for (Dialect *dialect : getContext().getLoadedDialects()) {
@@ -407,7 +409,6 @@ struct LowerGpuOpsToNVVMOpsPass final
                                                      llvmPatterns);
     }
 
-    populateGpuToNVVMConversionPatterns(converter, llvmPatterns);
     populateGpuWMMAToNVVMConversionPatterns(converter, llvmPatterns);
     if (this->hasRedux)
       populateGpuSubgroupReduceOpLoweringPattern(converter, llvmPatterns);
@@ -552,6 +553,11 @@ void mlir::populateGpuToNVVMConversionPatterns(
 
   populateOpPatterns<arith::RemFOp>(converter, patterns, "__nv_fmodf",
                                     "__nv_fmod");
+  populateOpPatterns<arith::MaxNumFOp>(converter, patterns, "__nv_fmaxf",
+                                       "__nv_fmax");
+  populateOpPatterns<arith::MinNumFOp>(converter, patterns, "__nv_fminf",
+                                       "__nv_fmin");
+
   populateIntOpPatterns<math::AbsIOp>(converter, patterns, "__nv_abs");
   populateOpPatterns<math::AbsFOp>(converter, patterns, "__nv_fabsf",
                                    "__nv_fabs");
diff --git a/mlir/test/Conversion/GPUToNVVM/gpu-arith-ops-to-nvvm.mlir b/mlir/test/Conversion/GPUToNVVM/gpu-arith-ops-to-nvvm.mlir
new file mode 100644
index 0000000000000..2b1c5a7fef8fe
--- /dev/null
+++ b/mlir/test/Conversion/GPUToNVVM/gpu-arith-ops-to-nvvm.mlir
@@ -0,0 +1,21 @@
+// RUN: mlir-opt %s -convert-gpu-to-nvvm -split-input-file | FileCheck %s
+
+gpu.module @test_module_54 {
+  // CHECK: llvm.func @__nv_fmaxf(f32, f32) -> f32
+  // CHECK: llvm.func @__nv_fminf(f32, f32) -> f32
+  // CHECK: llvm.func @__nv_fmax(f64, f64) -> f64
+  // CHECK: llvm.func @__nv_fmin(f64, f64) -> f64
+  // CHECK-LABEL: @gpu_fminmax
+  func.func @gpu_fminmax(%arg1_f32: f32, %arg2_f32: f32, %arg1_f64: f64, %arg2_f64: f64) 
+      -> (f32, f32, f64, f64) {
+    // CHECK: llvm.call @__nv_fmaxf
+    %max_f32 = arith.maxnumf %arg1_f32, %arg2_f32 : f32
+    // CHECK: llvm.call @__nv_fminf
+    %min_f32 = arith.minnumf %arg1_f32, %arg2_f32 : f32
+    // CHECK: llvm.call @__nv_fmax(
+    %max_f64 = arith.maxnumf %arg1_f64, %arg2_f64 : f64
+    // CHECK: llvm.call @__nv_fmin(
+    %min_f64 = arith.minnumf %arg1_f64, %arg2_f64 : f64
+    return %max_f32, %min_f32, %max_f64, %min_f64 : f32, f32, f64, f64
+  }
+}

@wsmoses wsmoses merged commit 963ff1c into llvm:main Feb 15, 2025
8 checks passed
@ftynse ftynse deleted the minmax-lowering branch February 16, 2025 13:01
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
Co-authored-by: William Moses <[email protected]>
3 participants