[mlir][OpenMP] map argument to reduction initialization region #86979

tblah · 2024-03-28T17:03:56Z

The argument to the initialization region of reduction declarations was never mapped. This meant that if this argument was accessed inside the initialization region, that mlir operation would be translated to an llvm operation with a null argument (failing verification).

Adding the mapping ensures that the right LLVM value can be found when inlining and converting the initialization region.

We have to separately establish and clean up these mappings for each use of the reduction declaration because repeated usage of the same declaration will inline it using a different concrete value for the block argument.

This argument was never used previously because for most cases the initialized value depends only upon the type of the reduction, not on the original variable. It is needed now so that we can read the array extents for the local copy from the mold.

Flang support for reductions on assumed shape arrays patch 2/3

llvmbot · 2024-03-28T17:04:26Z

@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-mlir

Author: Tom Eccles (tblah)

Changes

The argument to the initialization region of reduction declarations was never mapped. This meant that if this argument was accessed inside the initialization region, that mlir operation would be translated to an llvm operation with a null argument (failing verification).

Adding the mapping ensures that the right LLVM value can be found when inlining and converting the initialization region.

We have to separately establish and clean up these mappings for each use of the reduction declaration because repeated usage of the same declaration will inline it using a different concrete value for the block argument.

Full diff: https://github.com/llvm/llvm-project/pull/86979.diff

2 Files Affected:

(modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+35)
(added) mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir (+147)

diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index cacf2c37e38d1c..c4bf6a20ebe71c 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -825,6 +825,25 @@ static void allocByValReductionVars(
   }
 }
 
+/// Map input argument to all reduction initialization regions
+template <typename T>
+static void
+mapInitializationArg(T loop, LLVM::ModuleTranslation &moduleTranslation,
+                     SmallVectorImpl<omp::DeclareReductionOp> &reductionDecls,
+                     unsigned i) {
+  // map input argument to the initialization region
+  mlir::omp::DeclareReductionOp &reduction = reductionDecls[i];
+  Region &initializerRegion = reduction.getInitializerRegion();
+  Block &entry = initializerRegion.front();
+  assert(entry.getNumArguments() == 1 &&
+         "the initialization region has one argument");
+
+  mlir::Value mlirSource = loop.getReductionVars()[i];
+  llvm::Value *llvmSource = moduleTranslation.lookupValue(mlirSource);
+  assert(llvmSource && "lookup reduction var");
+  moduleTranslation.mapValue(entry.getArgument(0), llvmSource);
+}
+
 /// Collect reduction info
 template <typename T>
 static void collectReductionInfo(
@@ -902,6 +921,10 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder,
       loop.getRegion().getArguments().take_back(loop.getNumReductionVars());
   for (unsigned i = 0; i < loop.getNumReductionVars(); ++i) {
     SmallVector<llvm::Value *> phis;
+
+    // map block argument to initializer region
+    mapInitializationArg(loop, moduleTranslation, reductionDecls, i);
+
     if (failed(inlineConvertOmpRegions(reductionDecls[i].getInitializerRegion(),
                                        "omp.reduction.neutral", builder,
                                        moduleTranslation, &phis)))
@@ -925,6 +948,11 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder,
       builder.CreateStore(phis[0], privateReductionVariables[i]);
       // the rest was handled in allocByValReductionVars
     }
+
+    // forget the mapping for the initializer region because we might need a
+    // different mapping if this reduction declaration is re-used for a
+    // different variable
+    moduleTranslation.forgetMapping(reductionDecls[i].getInitializerRegion());
   }
 
   // Store the mapping between reduction variables and their private copies on
@@ -1118,6 +1146,9 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
             opInst.getNumReductionVars());
     for (unsigned i = 0; i < opInst.getNumReductionVars(); ++i) {
       SmallVector<llvm::Value *> phis;
+
+      // map the block argument
+      mapInitializationArg(opInst, moduleTranslation, reductionDecls, i);
       if (failed(inlineConvertOmpRegions(
               reductionDecls[i].getInitializerRegion(), "omp.reduction.neutral",
               builder, moduleTranslation, &phis)))
@@ -1144,6 +1175,10 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
         builder.CreateStore(phis[0], privateReductionVariables[i]);
         // the rest is done in allocByValReductionVars
       }
+
+      // clear block argument mapping in case it needs to be re-created with a
+      // different source for another use of the same reduction decl
+      moduleTranslation.forgetMapping(reductionDecls[i].getInitializerRegion());
     }
 
     // Store the mapping between reduction variables and their private copies on
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
new file mode 100644
index 00000000000000..b3cbb8aeebef6a
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
@@ -0,0 +1,147 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// Test that the block argument to the initialization region of
+// omp.declare_reduction gets mapped properly when translating to LLVMIR.
+
+module {
+  omp.declare_reduction @add_reduction_byref_box_Uxf64 : !llvm.ptr init {
+  ^bb0(%arg0: !llvm.ptr):
+// test usage of %arg0:
+    %11 = llvm.load %arg0 : !llvm.ptr -> !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)>
+    omp.yield(%arg0 : !llvm.ptr)
+  } combiner {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+    omp.yield(%arg0 : !llvm.ptr)
+  }
+
+  llvm.func internal @_QFPreduce(%arg0: !llvm.ptr {fir.bindc_name = "r"}, %arg1: !llvm.ptr {fir.bindc_name = "r2"}) attributes {sym_visibility = "private"} {
+  %8 = llvm.mlir.constant(1 : i32) : i32
+  %9 = llvm.mlir.constant(10 : i32) : i32
+  %10 = llvm.mlir.constant(0 : i32) : i32
+    omp.parallel {
+      %83 = llvm.mlir.constant(1 : i64) : i64
+      %84 = llvm.alloca %83 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)> : (i64) -> !llvm.ptr
+      %86 = llvm.mlir.constant(1 : i64) : i64
+      %87 = llvm.alloca %86 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)> : (i64) -> !llvm.ptr
+// test multiple reduction variables to ensure they don't intefere with eachother
+// when inlining the reduction init region multiple times
+      omp.wsloop byref reduction(@add_reduction_byref_box_Uxf64 %84 -> %arg3 : !llvm.ptr, @add_reduction_byref_box_Uxf64 %87 -> %arg4 : !llvm.ptr)  for  (%arg2) : i32 = (%10) to (%9) inclusive step (%8) {
+        omp.yield
+      }
+      omp.terminator
+    }
+    llvm.return
+  }
+}
+
+// CHECK-LABEL: define internal void @_QFPreduce(ptr %{{.*}}, ptr %{{.*}})
+// CHECK:         br label %entry
+// CHECK:       entry:                                            ; preds = %[[VAL_1:.*]]
+// CHECK:         %[[VAL_2:.*]] = call i32 @__kmpc_global_thread_num(ptr @1)
+// CHECK:         br label %[[VAL_3:.*]]
+// CHECK:       omp_parallel:                                     ; preds = %entry
+// CHECK:         call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @1, i32 0, ptr @_QFPreduce..omp_par)
+// CHECK:         br label %[[VAL_4:.*]]
+// CHECK:       omp.par.outlined.exit:                            ; preds = %[[VAL_3]]
+// CHECK:         br label %[[VAL_5:.*]]
+// CHECK:       omp.par.exit.split:                               ; preds = %[[VAL_4]]
+// CHECK:         ret void
+// CHECK:       omp.par.entry:
+// CHECK:         %[[VAL_6:.*]] = alloca i32, align 4
+// CHECK:         %[[VAL_7:.*]] = load i32, ptr %[[VAL_8:.*]], align 4
+// CHECK:         store i32 %[[VAL_7]], ptr %[[VAL_6]], align 4
+// CHECK:         %[[VAL_9:.*]] = load i32, ptr %[[VAL_6]], align 4
+// CHECK:         %[[VAL_10:.*]] = alloca i32, align 4
+// CHECK:         %[[VAL_11:.*]] = alloca i32, align 4
+// CHECK:         %[[VAL_12:.*]] = alloca i32, align 4
+// CHECK:         %[[VAL_13:.*]] = alloca i32, align 4
+// CHECK:         %[[VAL_14:.*]] = alloca [2 x ptr], align 8
+// CHECK:         br label %[[VAL_15:.*]]
+// CHECK:       omp.par.region:                                   ; preds = %[[VAL_16:.*]]
+// CHECK:         br label %[[VAL_17:.*]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_15]]
+// CHECK:         %[[VAL_18:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
+// CHECK:         %[[VAL_19:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
+// CHECK:         %[[VAL_20:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_18]], align 8
+// CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
+// CHECK:         store ptr %[[VAL_18]], ptr %[[VAL_21]], align 8
+// CHECK:         %[[VAL_22:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_19]], align 8
+// CHECK:         %[[VAL_23:.*]] = alloca ptr, align 8
+// CHECK:         store ptr %[[VAL_19]], ptr %[[VAL_23]], align 8
+// CHECK:         br label %[[VAL_24:.*]]
+// CHECK:       omp_loop.preheader:                               ; preds = %[[VAL_17]]
+// CHECK:         store i32 0, ptr %[[VAL_11]], align 4
+// CHECK:         store i32 10, ptr %[[VAL_12]], align 4
+// CHECK:         store i32 1, ptr %[[VAL_13]], align 4
+// CHECK:         %[[VAL_25:.*]] = call i32 @__kmpc_global_thread_num(ptr @1)
+// CHECK:         call void @__kmpc_for_static_init_4u(ptr @1, i32 %[[VAL_25]], i32 34, ptr %[[VAL_10]], ptr %[[VAL_11]], ptr %[[VAL_12]], ptr %[[VAL_13]], i32 1, i32 0)
+// CHECK:         %[[VAL_26:.*]] = load i32, ptr %[[VAL_11]], align 4
+// CHECK:         %[[VAL_27:.*]] = load i32, ptr %[[VAL_12]], align 4
+// CHECK:         %[[VAL_28:.*]] = sub i32 %[[VAL_27]], %[[VAL_26]]
+// CHECK:         %[[VAL_29:.*]] = add i32 %[[VAL_28]], 1
+// CHECK:         br label %[[VAL_30:.*]]
+// CHECK:       omp_loop.header:                                  ; preds = %[[VAL_31:.*]], %[[VAL_24]]
+// CHECK:         %[[VAL_32:.*]] = phi i32 [ 0, %[[VAL_24]] ], [ %[[VAL_33:.*]], %[[VAL_31]] ]
+// CHECK:         br label %[[VAL_34:.*]]
+// CHECK:       omp_loop.cond:                                    ; preds = %[[VAL_30]]
+// CHECK:         %[[VAL_35:.*]] = icmp ult i32 %[[VAL_32]], %[[VAL_29]]
+// CHECK:         br i1 %[[VAL_35]], label %[[VAL_36:.*]], label %[[VAL_37:.*]]
+// CHECK:       omp_loop.exit:                                    ; preds = %[[VAL_34]]
+// CHECK:         call void @__kmpc_for_static_fini(ptr @1, i32 %[[VAL_25]])
+// CHECK:         %[[VAL_38:.*]] = call i32 @__kmpc_global_thread_num(ptr @1)
+// CHECK:         call void @__kmpc_barrier(ptr @2, i32 %[[VAL_38]])
+// CHECK:         br label %[[VAL_39:.*]]
+// CHECK:       omp_loop.after:                                   ; preds = %[[VAL_37]]
+// CHECK:         %[[VAL_40:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_14]], i64 0, i64 0
+// CHECK:         store ptr %[[VAL_21]], ptr %[[VAL_40]], align 8
+// CHECK:         %[[VAL_41:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_14]], i64 0, i64 1
+// CHECK:         store ptr %[[VAL_23]], ptr %[[VAL_41]], align 8
+// CHECK:         %[[VAL_42:.*]] = call i32 @__kmpc_global_thread_num(ptr @1)
+// CHECK:         %[[VAL_43:.*]] = call i32 @__kmpc_reduce(ptr @1, i32 %[[VAL_42]], i32 2, i64 16, ptr %[[VAL_14]], ptr @.omp.reduction.func, ptr @.gomp_critical_user_.reduction.var)
+// CHECK:         switch i32 %[[VAL_43]], label %[[VAL_44:.*]] [
+// CHECK:           i32 1, label %[[VAL_45:.*]]
+// CHECK:           i32 2, label %[[VAL_46:.*]]
+// CHECK:         ]
+// CHECK:       reduce.switch.atomic:                             ; preds = %[[VAL_39]]
+// CHECK:         unreachable
+// CHECK:       reduce.switch.nonatomic:                          ; preds = %[[VAL_39]]
+// CHECK:         %[[VAL_47:.*]] = load ptr, ptr %[[VAL_21]], align 8
+// CHECK:         %[[VAL_48:.*]] = load ptr, ptr %[[VAL_23]], align 8
+// CHECK:         call void @__kmpc_end_reduce(ptr @1, i32 %[[VAL_42]], ptr @.gomp_critical_user_.reduction.var)
+// CHECK:         br label %[[VAL_44]]
+// CHECK:       reduce.finalize:                                  ; preds = %[[VAL_45]], %[[VAL_39]]
+// CHECK:         %[[VAL_49:.*]] = call i32 @__kmpc_global_thread_num(ptr @1)
+// CHECK:         call void @__kmpc_barrier(ptr @2, i32 %[[VAL_49]])
+// CHECK:         br label %[[VAL_50:.*]]
+// CHECK:       omp.region.cont:                                  ; preds = %[[VAL_44]]
+// CHECK:         br label %[[VAL_51:.*]]
+// CHECK:       omp.par.pre_finalize:                             ; preds = %[[VAL_50]]
+// CHECK:         br label %[[VAL_52:.*]]
+// CHECK:       omp_loop.body:                                    ; preds = %[[VAL_34]]
+// CHECK:         %[[VAL_53:.*]] = add i32 %[[VAL_32]], %[[VAL_26]]
+// CHECK:         %[[VAL_54:.*]] = mul i32 %[[VAL_53]], 1
+// CHECK:         %[[VAL_55:.*]] = add i32 %[[VAL_54]], 0
+// CHECK:         br label %[[VAL_56:.*]]
+// CHECK:       omp.wsloop.region:                                ; preds = %[[VAL_36]]
+// CHECK:         br label %[[VAL_57:.*]]
+// CHECK:       omp.region.cont2:                                 ; preds = %[[VAL_56]]
+// CHECK:         br label %[[VAL_31]]
+// CHECK:       omp_loop.inc:                                     ; preds = %[[VAL_57]]
+// CHECK:         %[[VAL_33]] = add nuw i32 %[[VAL_32]], 1
+// CHECK:         br label %[[VAL_30]]
+// CHECK:       omp.par.outlined.exit.exitStub:                   ; preds = %[[VAL_51]]
+// CHECK:         ret void
+// CHECK:         %[[VAL_58:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_59:.*]], i64 0, i64 0
+// CHECK:         %[[VAL_60:.*]] = load ptr, ptr %[[VAL_58]], align 8
+// CHECK:         %[[VAL_61:.*]] = load ptr, ptr %[[VAL_60]], align 8
+// CHECK:         %[[VAL_62:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_63:.*]], i64 0, i64 0
+// CHECK:         %[[VAL_64:.*]] = load ptr, ptr %[[VAL_62]], align 8
+// CHECK:         %[[VAL_65:.*]] = load ptr, ptr %[[VAL_64]], align 8
+// CHECK:         %[[VAL_66:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_59]], i64 0, i64 1
+// CHECK:         %[[VAL_67:.*]] = load ptr, ptr %[[VAL_66]], align 8
+// CHECK:         %[[VAL_68:.*]] = load ptr, ptr %[[VAL_67]], align 8
+// CHECK:         %[[VAL_69:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_63]], i64 0, i64 1
+// CHECK:         %[[VAL_70:.*]] = load ptr, ptr %[[VAL_69]], align 8
+// CHECK:         %[[VAL_71:.*]] = load ptr, ptr %[[VAL_70]], align 8
+// CHECK:         ret void
+

kiranchandramohan · 2024-04-03T07:42:42Z

@tblah Is the use case to derive the bounds from the box of the host variable? Either way, could you add the motivation or the reason for the difference of why the argument was not used before but is required now?

mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir

kiranchandramohan

LG.

The argument to the initialization region of reduction declarations was never mapped. This meant that if this argument was accessed inside the initialization region, that mlir operation would be translated to an llvm operation with a null argument (failing verification). Adding the mapping ensures that the right LLVM value can be found when inlining and converting the initialization region. We have to separately establish and clean up these mappings for each use of the reduction declaration because repeated usage of the same declaration will inline it using a different concrete value for the block argument.

Patch 1: #86978 Patch 2: #86979

tblah requested review from DavidTruby, ftynse, skatrak, kiranchandramohan and Leporacanthicus March 28, 2024 17:03

llvmbot added mlir:llvm mlir mlir:openmp flang:openmp labels Mar 28, 2024

tblah mentioned this pull request Mar 28, 2024

[flang][OpenMP] lower reductions of assumed shape arrays #86982

Merged

kiranchandramohan reviewed Apr 3, 2024

View reviewed changes

mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir Outdated Show resolved Hide resolved

kiranchandramohan approved these changes Apr 3, 2024

View reviewed changes

Base automatically changed from users/tblah/omp_assumed_shape_reduction_1 to main April 4, 2024 09:46

tblah added 2 commits April 4, 2024 09:48

Simplify test

e92d23c

tblah force-pushed the users/tblah/omp_assumed_shape_reduction_2 branch from 92bd5f1 to e92d23c Compare April 4, 2024 09:55

tblah merged commit 099ecdf into main Apr 4, 2024

tblah deleted the users/tblah/omp_assumed_shape_reduction_2 branch April 4, 2024 09:55

tblah added a commit that referenced this pull request Apr 4, 2024

[flang][OpenMP] lower reductions of assumed shape arrays (#86982)

dbd6eb6

Patch 1: #86978 Patch 2: #86979

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[mlir][OpenMP] map argument to reduction initialization region #86979

[mlir][OpenMP] map argument to reduction initialization region #86979

Uh oh!

tblah commented Mar 28, 2024 •

edited

Loading

Uh oh!

llvmbot commented Mar 28, 2024 •

edited

Loading

Uh oh!

kiranchandramohan commented Apr 3, 2024 •

edited

Loading

Uh oh!

Uh oh!

kiranchandramohan left a comment

Uh oh!

Uh oh!

[mlir][OpenMP] map argument to reduction initialization region #86979

[mlir][OpenMP] map argument to reduction initialization region #86979

Uh oh!

Conversation

tblah commented Mar 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Mar 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kiranchandramohan commented Apr 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

kiranchandramohan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tblah commented Mar 28, 2024 •

edited

Loading

llvmbot commented Mar 28, 2024 •

edited

Loading

kiranchandramohan commented Apr 3, 2024 •

edited

Loading