[flang][OpenMP] lower simple array reductions #84958

tblah · 2024-03-12T17:36:29Z

This has been tested with arrays with compile-time constant bounds. Allocatable arrays and arrays with non-constant bounds are not yet supported. User-defined reduction functions are also not yet supported.

The design is intended to work for arrays with non-constant bounds too without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I haven't fixed yet).

We need some way to get these runtime bounds into the reduction init and combiner regions. To keep things simple for now I opted to always box the array arguments so the box can be passed as one argument and the lower bounds and extents read from the box. This has the disadvantage of resulting in fir.box_dim operations inside of the critical section. If these prove to be a performance issue, we could follow OpenACC reading box lower bounds and extents before the reduction and passing them as block arguments to the reduction init and combiner regions. I would prefer to keep things simple for now.

Note: this implementation only works when the HLFIR lowering is used. I don't think it is worth supporting FIR-only lowering because the plan is for that to be removed soon.

OpenMP array reductions 6/6
Previous PR: #84957

llvmbot · 2024-03-12T17:36:59Z

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-fir-hlfir

Author: Tom Eccles (tblah)

Changes

This has been tested with arrays with compile-time constant bounds. Allocatable arrays and arrays with non-constant bounds are not yet supported. User-defined reduction functions are also not yet supported.

The design is intended to work for arrays with non-constant bounds too without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I haven't fixed yet).

We need some way to get these runtime bounds into the reduction init and combiner regions. To keep things simple for now I opted to always box the array arguments so the box can be passed as one argument and the lower bounds and extents read from the box. This has the disadvantage of resulting in fir.box_dim operations inside of the critical section. If these prove to be a performance issue, we could follow OpenACC reading box lower bounds and extents before the reduction and passing them as block arguments to the reduction init and combiner regions. I would prefer to keep things simple for now.

Note: this implementation only works when the HLFIR lowering is used. I don't think it is worth supporting FIR-only lowering because the plan is for that to be removed soon.

OpenMP array reductions 6/6

Patch is 29.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/84958.diff

5 Files Affected:

(modified) flang/lib/Lower/OpenMP/ReductionProcessor.cpp (+230-53)
(modified) flang/lib/Lower/OpenMP/ReductionProcessor.h (+1-1)
(removed) flang/test/Lower/OpenMP/Todo/reduction-arrays.f90 (-15)
(added) flang/test/Lower/OpenMP/parallel-reduction-array.f90 (+74)
(added) flang/test/Lower/OpenMP/wsloop-reduction-array.f90 (+84)

diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
index e6a63dd4b939ce..de2f11f5b9512e 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
@@ -13,6 +13,7 @@
 #include "ReductionProcessor.h"
 
 #include "flang/Lower/AbstractConverter.h"
+#include "flang/Optimizer/Builder/HLFIRTools.h"
 #include "flang/Optimizer/Builder/Todo.h"
 #include "flang/Optimizer/Dialect/FIRType.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
@@ -92,10 +93,42 @@ std::string ReductionProcessor::getReductionName(llvm::StringRef name,
   if (isByRef)
     byrefAddition = "_byref";
 
-  return (llvm::Twine(name) +
-          (ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) +
-          llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition)
-      .str();
+  if (fir::isa_trivial(ty))
+    return (llvm::Twine(name) +
+            (ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) +
+            llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition)
+        .str();
+
+  // creates a name like reduction_i_64_box_ux4x3
+  if (auto boxTy = mlir::dyn_cast_or_null<fir::BaseBoxType>(ty)) {
+    // TODO: support for allocatable boxes:
+    // !fir.box<!fir.heap<!fir.array<...>>>
+    fir::SequenceType seqTy = fir::unwrapRefType(boxTy.getEleTy())
+                                  .dyn_cast_or_null<fir::SequenceType>();
+    if (!seqTy)
+      return {};
+
+    std::string prefix = getReductionName(
+        name, fir::unwrapSeqOrBoxedSeqType(ty), /*isByRef=*/false);
+    if (prefix.empty())
+      return {};
+    std::stringstream tyStr;
+    tyStr << prefix << "_box_";
+    bool first = true;
+    for (std::int64_t extent : seqTy.getShape()) {
+      if (first)
+        first = false;
+      else
+        tyStr << "x";
+      if (extent == seqTy.getUnknownExtent())
+        tyStr << 'u'; // I'm not sure that '?' is safe in symbol names
+      else
+        tyStr << extent;
+    }
+    return (tyStr.str() + byrefAddition).str();
+  }
+
+  return {};
 }
 
 std::string ReductionProcessor::getReductionName(
@@ -283,6 +316,156 @@ mlir::Value ReductionProcessor::createScalarCombiner(
   return reductionOp;
 }
 
+/// Create reduction combiner region for reduction variables which are boxed
+/// arrays
+static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc,
+                           ReductionProcessor::ReductionIdentifier redId,
+                           fir::BaseBoxType boxTy, mlir::Value lhs,
+                           mlir::Value rhs) {
+  fir::SequenceType seqTy =
+      mlir::dyn_cast_or_null<fir::SequenceType>(boxTy.getEleTy());
+  // TODO: support allocatable arrays: !fir.box<!fir.heap<!fir.array<...>>>
+  if (!seqTy || seqTy.hasUnknownShape())
+    TODO(loc, "Unsupported boxed type in OpenMP reduction");
+
+  // load fir.ref<fir.box<...>>
+  mlir::Value lhsAddr = lhs;
+  lhs = builder.create<fir::LoadOp>(loc, lhs);
+  rhs = builder.create<fir::LoadOp>(loc, rhs);
+
+  const unsigned rank = seqTy.getDimension();
+  llvm::SmallVector<mlir::Value> extents;
+  extents.reserve(rank);
+  llvm::SmallVector<mlir::Value> lbAndExtents;
+  lbAndExtents.reserve(rank * 2);
+
+  // Get box lowerbounds and extents:
+  mlir::Type idxTy = builder.getIndexType();
+  for (unsigned i = 0; i < rank; ++i) {
+    // TODO: ideally we want to hoist box reads out of the critical section.
+    // We could do this by having box dimensions in block arguments like
+    // OpenACC does
+    mlir::Value dim = builder.createIntegerConstant(loc, idxTy, i);
+    auto dimInfo =
+        builder.create<fir::BoxDimsOp>(loc, idxTy, idxTy, idxTy, lhs, dim);
+    extents.push_back(dimInfo.getExtent());
+    lbAndExtents.push_back(dimInfo.getLowerBound());
+    lbAndExtents.push_back(dimInfo.getExtent());
+  }
+
+  auto shapeShiftTy = fir::ShapeShiftType::get(builder.getContext(), rank);
+  auto shapeShift =
+      builder.create<fir::ShapeShiftOp>(loc, shapeShiftTy, lbAndExtents);
+
+  // Iterate over array elements, applying the equivalent scalar reduction:
+
+  // A hlfir::elemental here gets inlined with a temporary so create the
+  // loop nest directly.
+  // This function already controls all of the code in this region so we
+  // know this won't miss any opportuinties for clever elemental inlining
+  hlfir::LoopNest nest =
+      hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true);
+  builder.setInsertionPointToStart(nest.innerLoop.getBody());
+  mlir::Type refTy = fir::ReferenceType::get(seqTy.getEleTy());
+  auto lhsEleAddr = builder.create<fir::ArrayCoorOp>(
+      loc, refTy, lhs, shapeShift, /*slice=*/mlir::Value{},
+      nest.oneBasedIndices, /*typeparms=*/mlir::ValueRange{});
+  auto rhsEleAddr = builder.create<fir::ArrayCoorOp>(
+      loc, refTy, rhs, shapeShift, /*slice=*/mlir::Value{},
+      nest.oneBasedIndices, /*typeparms=*/mlir::ValueRange{});
+  auto lhsEle = builder.create<fir::LoadOp>(loc, lhsEleAddr);
+  auto rhsEle = builder.create<fir::LoadOp>(loc, rhsEleAddr);
+  mlir::Value scalarReduction = ReductionProcessor::createScalarCombiner(
+      builder, loc, redId, refTy, lhsEle, rhsEle);
+  builder.create<fir::StoreOp>(loc, scalarReduction, lhsEleAddr);
+
+  builder.setInsertionPointAfter(nest.outerLoop);
+  builder.create<mlir::omp::YieldOp>(loc, lhsAddr);
+}
+
+// generate combiner region for reduction operations
+static void genCombiner(fir::FirOpBuilder &builder, mlir::Location loc,
+                        ReductionProcessor::ReductionIdentifier redId,
+                        mlir::Type ty, mlir::Value lhs, mlir::Value rhs,
+                        bool isByRef) {
+  ty = fir::unwrapRefType(ty);
+
+  if (fir::isa_trivial(ty)) {
+    mlir::Value lhsLoaded = builder.loadIfRef(loc, lhs);
+    mlir::Value rhsLoaded = builder.loadIfRef(loc, rhs);
+
+    mlir::Value result = ReductionProcessor::createScalarCombiner(
+        builder, loc, redId, ty, lhsLoaded, rhsLoaded);
+    if (isByRef) {
+      builder.create<fir::StoreOp>(loc, result, lhs);
+      builder.create<mlir::omp::YieldOp>(loc, lhs);
+    } else {
+      builder.create<mlir::omp::YieldOp>(loc, result);
+    }
+    return;
+  }
+  // all arrays should have been boxed
+  if (auto boxTy = mlir::dyn_cast<fir::BaseBoxType>(ty)) {
+    genBoxCombiner(builder, loc, redId, boxTy, lhs, rhs);
+    return;
+  }
+
+  TODO(loc, "OpenMP genCombiner for unsupported reduction variable type");
+}
+
+static mlir::Value
+createReductionInitRegion(fir::FirOpBuilder &builder, mlir::Location loc,
+                          const ReductionProcessor::ReductionIdentifier redId,
+                          mlir::Type type, bool isByRef) {
+  mlir::Type ty = fir::unwrapRefType(type);
+  mlir::Value initValue = ReductionProcessor::getReductionInitValue(
+      loc, fir::unwrapSeqOrBoxedSeqType(ty), redId, builder);
+
+  if (fir::isa_trivial(ty)) {
+    if (isByRef) {
+      mlir::Value alloca = builder.create<fir::AllocaOp>(loc, ty);
+      builder.createStoreWithConvert(loc, initValue, alloca);
+      return alloca;
+    }
+    // by val
+    return initValue;
+  }
+
+  // all arrays are boxed
+  if (auto boxTy = mlir::dyn_cast_or_null<fir::BaseBoxType>(ty)) {
+    assert(isByRef && "passing arrays by value is unsupported");
+    // TODO: support allocatable arrays: !fir.box<!fir.heap<!fir.array<...>>>
+    mlir::Type innerTy = fir::extractSequenceType(boxTy);
+    if (!mlir::isa<fir::SequenceType>(innerTy))
+      TODO(loc, "Unsupported boxed type for reduction");
+    // Create the private copy from the initial fir.box:
+    hlfir::Entity source = hlfir::Entity{builder.getBlock()->getArgument(0)};
+
+    // from hlfir::createTempFromMold() but with the allocation changed to
+    // use alloca so that we don't have to free it
+    assert(source.isArray());
+    mlir::Type sequenceType =
+        hlfir::getFortranElementOrSequenceType(source.getType());
+    const char *tmpName = "omp.reduction.array.init";
+    mlir::Value shape = hlfir::genShape(loc, builder, source);
+    auto extents = hlfir::getIndexExtents(loc, builder, shape);
+    mlir::Value alloc = builder.createTemporary(loc, sequenceType, tmpName,
+                                                extents, /*lenParams=*/{});
+    auto declareOp =
+        builder.create<hlfir::DeclareOp>(loc, alloc, tmpName, shape);
+    hlfir::Entity temp = hlfir::Entity{declareOp.getBase()};
+
+    // Put the temporary inside of a box:
+    hlfir::Entity box = hlfir::genVariableBox(loc, builder, temp);
+    builder.create<hlfir::AssignOp>(loc, initValue, box);
+    mlir::Value boxAlloca = builder.create<fir::AllocaOp>(loc, ty);
+    builder.create<fir::StoreOp>(loc, box, boxAlloca);
+    return boxAlloca;
+  }
+
+  TODO(loc, "createReductionInitRegion for unsupported type");
+}
+
 mlir::omp::ReductionDeclareOp ReductionProcessor::createReductionDecl(
     fir::FirOpBuilder &builder, llvm::StringRef reductionOpName,
     const ReductionIdentifier redId, mlir::Type type, mlir::Location loc,
@@ -290,6 +473,9 @@ mlir::omp::ReductionDeclareOp ReductionProcessor::createReductionDecl(
   mlir::OpBuilder::InsertionGuard guard(builder);
   mlir::ModuleOp module = builder.getModule();
 
+  if (reductionOpName.empty())
+    TODO(loc, "Reduction of some types is not supported");
+
   auto decl =
       module.lookupSymbol<mlir::omp::ReductionDeclareOp>(reductionOpName);
   if (decl)
@@ -306,14 +492,9 @@ mlir::omp::ReductionDeclareOp ReductionProcessor::createReductionDecl(
                       decl.getInitializerRegion().end(), {type}, {loc});
   builder.setInsertionPointToEnd(&decl.getInitializerRegion().back());
 
-  mlir::Value init = getReductionInitValue(loc, type, redId, builder);
-  if (isByRef) {
-    mlir::Value alloca = builder.create<fir::AllocaOp>(loc, valTy);
-    builder.createStoreWithConvert(loc, init, alloca);
-    builder.create<mlir::omp::YieldOp>(loc, alloca);
-  } else {
-    builder.create<mlir::omp::YieldOp>(loc, init);
-  }
+  mlir::Value init =
+      createReductionInitRegion(builder, loc, redId, type, isByRef);
+  builder.create<mlir::omp::YieldOp>(loc, init);
 
   builder.createBlock(&decl.getReductionRegion(),
                       decl.getReductionRegion().end(), {type, type},
@@ -322,19 +503,7 @@ mlir::omp::ReductionDeclareOp ReductionProcessor::createReductionDecl(
   builder.setInsertionPointToEnd(&decl.getReductionRegion().back());
   mlir::Value op1 = decl.getReductionRegion().front().getArgument(0);
   mlir::Value op2 = decl.getReductionRegion().front().getArgument(1);
-  mlir::Value outAddr = op1;
-
-  op1 = builder.loadIfRef(loc, op1);
-  op2 = builder.loadIfRef(loc, op2);
-
-  mlir::Value reductionOp =
-      createScalarCombiner(builder, loc, redId, type, op1, op2);
-  if (isByRef) {
-    builder.create<fir::StoreOp>(loc, reductionOp, outAddr);
-    builder.create<mlir::omp::YieldOp>(loc, outAddr);
-  } else {
-    builder.create<mlir::omp::YieldOp>(loc, reductionOp);
-  }
+  genCombiner(builder, loc, redId, type, op1, op2, isByRef);
 
   return decl;
 }
@@ -390,6 +559,7 @@ void ReductionProcessor::addReductionDecl(
 
   // initial pass to collect all recuction vars so we can figure out if this
   // should happen byref
+  fir::FirOpBuilder &builder = converter.getFirOpBuilder();
   for (const Fortran::parser::OmpObject &ompObject : objectList.v) {
     if (const auto *name{
             Fortran::parser::Unwrap<Fortran::parser::Name>(ompObject)}) {
@@ -397,8 +567,27 @@ void ReductionProcessor::addReductionDecl(
         if (reductionSymbols)
           reductionSymbols->push_back(symbol);
         mlir::Value symVal = converter.getSymbolAddress(*symbol);
-        if (auto declOp = symVal.getDefiningOp<hlfir::DeclareOp>())
+        auto redType = mlir::cast<fir::ReferenceType>(symVal.getType());
+
+        // all arrays must be boxed so that we have convenient access to all the
+        // information needed to iterate over the array
+        if (mlir::isa<fir::SequenceType>(redType.getEleTy())) {
+          hlfir::Entity entity{symVal};
+          entity = genVariableBox(currentLocation, builder, entity);
+          mlir::Value box = entity.getBase();
+
+          // Always pass the box by reference so that the OpenMP dialect
+          // verifiers don't need to know anything about fir.box
+          auto alloca =
+              builder.create<fir::AllocaOp>(currentLocation, box.getType());
+          builder.create<fir::StoreOp>(currentLocation, box, alloca);
+
+          symVal = alloca;
+          redType = mlir::cast<fir::ReferenceType>(symVal.getType());
+        } else if (auto declOp = symVal.getDefiningOp<hlfir::DeclareOp>()) {
           symVal = declOp.getBase();
+        }
+
         reductionVars.push_back(symVal);
       }
     }
@@ -424,31 +613,19 @@ void ReductionProcessor::addReductionDecl(
            "Reduction of some intrinsic operators is not supported");
       break;
     }
-    for (const Fortran::parser::OmpObject &ompObject : objectList.v) {
-      if (const auto *name{
-              Fortran::parser::Unwrap<Fortran::parser::Name>(ompObject)}) {
-        if (const Fortran::semantics::Symbol * symbol{name->symbol}) {
-          mlir::Value symVal = converter.getSymbolAddress(*symbol);
-          if (auto declOp = symVal.getDefiningOp<hlfir::DeclareOp>())
-            symVal = declOp.getBase();
-          auto redType = symVal.getType().cast<fir::ReferenceType>();
-          if (redType.getEleTy().isa<fir::LogicalType>())
-            decl = createReductionDecl(
-                firOpBuilder,
-                getReductionName(intrinsicOp, firOpBuilder.getI1Type(),
-                                 isByRef),
-                redId, redType, currentLocation, isByRef);
-          else if (redType.getEleTy().isIntOrIndexOrFloat()) {
-            decl = createReductionDecl(
-                firOpBuilder, getReductionName(intrinsicOp, redType, isByRef),
-                redId, redType, currentLocation, isByRef);
-          } else {
-            TODO(currentLocation, "Reduction of some types is not supported");
-          }
-          reductionDeclSymbols.push_back(mlir::SymbolRefAttr::get(
-              firOpBuilder.getContext(), decl.getSymName()));
-        }
-      }
+    for (mlir::Value symVal : reductionVars) {
+      auto redType = mlir::cast<fir::ReferenceType>(symVal.getType());
+      if (redType.getEleTy().isa<fir::LogicalType>())
+        decl = createReductionDecl(
+            firOpBuilder,
+            getReductionName(intrinsicOp, firOpBuilder.getI1Type(), isByRef),
+            redId, redType, currentLocation, isByRef);
+      else
+        decl = createReductionDecl(
+            firOpBuilder, getReductionName(intrinsicOp, redType, isByRef),
+            redId, redType, currentLocation, isByRef);
+      reductionDeclSymbols.push_back(mlir::SymbolRefAttr::get(
+          firOpBuilder.getContext(), decl.getSymName()));
     }
   } else if (const auto *reductionIntrinsic =
                  std::get_if<Fortran::parser::ProcedureDesignator>(
@@ -465,8 +642,8 @@ void ReductionProcessor::addReductionDecl(
             if (auto declOp = symVal.getDefiningOp<hlfir::DeclareOp>())
               symVal = declOp.getBase();
             auto redType = symVal.getType().cast<fir::ReferenceType>();
-            assert(redType.getEleTy().isIntOrIndexOrFloat() &&
-                   "Unsupported reduction type");
+            if (!redType.getEleTy().isIntOrIndexOrFloat())
+              TODO(currentLocation, "User Defined Reduction on arrays");
             decl = createReductionDecl(
                 firOpBuilder,
                 getReductionName(getRealName(*reductionIntrinsic).ToString(),
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.h b/flang/lib/Lower/OpenMP/ReductionProcessor.h
index 679580f2a3cac7..0d581bee546dea 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.h
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.h
@@ -107,7 +107,7 @@ class ReductionProcessor {
   /// Creates an OpenMP reduction declaration and inserts it into the provided
   /// symbol table. The declaration has a constant initializer with the neutral
   /// value `initValue`, and the reduction combiner carried over from `reduce`.
-  /// TODO: Generalize this for non-integer types, add atomic region.
+  /// TODO: add atomic region.
   static mlir::omp::ReductionDeclareOp
   createReductionDecl(fir::FirOpBuilder &builder,
                       llvm::StringRef reductionOpName,
diff --git a/flang/test/Lower/OpenMP/Todo/reduction-arrays.f90 b/flang/test/Lower/OpenMP/Todo/reduction-arrays.f90
deleted file mode 100644
index a21611faf248ca..00000000000000
--- a/flang/test/Lower/OpenMP/Todo/reduction-arrays.f90
+++ /dev/null
@@ -1,15 +0,0 @@
-! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
-! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
-
-! CHECK: not yet implemented: Reduction of some types is not supported
-subroutine reduction_array(y)
-  integer :: x(100), y(100,100)
-  !$omp parallel
-  !$omp do reduction(+:x)
-  do i=1, 100
-    x = x + y(:,i)
-  end do
-  !$omp end do
-  !$omp end parallel
-  print *, x
-end subroutine
diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array.f90
new file mode 100644
index 00000000000000..bb2ccfb8825012
--- /dev/null
+++ b/flang/test/Lower/OpenMP/parallel-reduction-array.f90
@@ -0,0 +1,74 @@
+! RUN: bbc -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+program reduce
+integer, dimension(3) :: i = 0
+
+!$omp parallel reduction(+:i)
+i(1) = 1
+i(2) = 2
+i(3) = 3
+!$omp end parallel
+
+print *,i
+end program
+
+! CHECK-LABEL:   omp.reduction.declare @add_reduction_i_32_box_3_byref : !fir.ref<!fir.box<!fir.array<3xi32>>> init {
+! CHECK:         ^bb0(%[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.array<3xi32>>>):
+! CHECK:           %[[VAL_1:.*]] = fir.alloca !fir.array<3xi32> {bindc_name = "omp.reduction.array.init"}
+! CHECK:           %[[VAL_2:.*]] = arith.constant 0 : i32
+! CHECK:           %[[VAL_3:.*]] = fir.load %[[VAL_0]] : !fir.ref<!fir.box<!fir.array<3xi32>>>
+! CHECK:           %[[VAL_4:.*]] = arith.constant 3 : index
+! CHECK:           %[[VAL_5:.*]] = fir.shape %[[VAL_4]] : (index) -> !fir.shape<1>
+! CHECK:           %[[VAL_6:.*]]:2 = hlfir.declare %[[VAL_1]](%[[VAL_5]]) {uniq_name = "omp.reduction.array.init"} : (!fir.ref<!fir.array<3xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<3xi32>>, !fir.ref<!fir.array<3xi32>>)
+! CHECK:           %[[VAL_7:.*]] = fir.embox %[[VAL_6]]#0(%[[VAL_5]]) : (!fir.ref<!fir.array<3xi32>>, !fir.shape<1>) -> !fir.box<!fir.array<3xi32>>
+! CHECK:           hlfir.assign %[[VAL_2]] to %[[VAL_7]] : i32, !fir.box<!fir.array<3xi32>>
+! CHECK:           %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<3xi32>>
+! CHECK:           fir.store %[[VAL_7]] to %[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>
+! CHECK:           omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>)
+! CHECK:         } combiner {
+! CHECK:         ^bb0(%[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.array<3xi32>>>, %[[VAL_1:.*]]: !fir.ref<!fir.box<!fir.array<3xi32>>>):
+! CHECK:           %[[VAL_2:.*]] = fir.load %[[VAL_0]] : !fir.ref<!fir.box<!fir.array<3xi32>>>
+! CHECK:           %[[VAL_3:.*]] = fir.load %[[VAL_1]] : !fir.ref<!fir.box<!fir.array<3xi32>>>
+! CHECK:           %[[VAL_4:.*]] = arith.constant 0 : index
+! CHECK:           %[[VAL_5:.*]]:3 = fir.box_dims %[[VAL_2]], %[[VAL_4]] : (!fir.box<!fir.array<3xi32>>, index) -> (index, index, index)
+! CHECK:           %[[VAL_6:.*]] = fir.shape_shift %[[VAL_5]]#0, %[[VAL_5]]#1 : (index, index) -> !fir.shapeshift<1>
+! CHECK:           %[[VAL_7:.*]] = arith.constant 1 : index
+! CHECK:           fir.do_loop %[[VAL_8:.*]] = %[[VAL_7]] to %[[VAL_5]]#1 step %[[VAL_7]] unordered {
+! CHECK:             %[[VAL_9:.*]] = fir.array_coor %[[VAL_2]](%[[VAL_6]]) %[[VAL_8]] : (!fir.box<!fir.array<3xi32>>, !fir.shapeshift<1>, index) -> !fir.ref<i32>
+! CHECK:             %[[VAL_10:.*]] = fir.array_coor %[[VAL_3]](%[[VAL_6]]) %[[VAL_8]] : (!fir.box<!fir.array<3xi32>>, !fir.shapeshift<1>, index) -> !fir.ref<i32>
+! CHECK:             %[[VAL_11:.*]] = fir.load %[[VAL_9]] : !fir.ref<i32>
+! CHECK:             %[[VAL_12:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
+! CHECK:             %[[VAL_13:...
[truncated]

DavidTruby · 2024-03-15T14:58:56Z

flang/lib/Lower/OpenMP/ReductionProcessor.cpp

  for (const Fortran::parser::OmpObject &ompObject : objectList.v) {
    if (const auto *name{
            Fortran::parser::Unwrap<Fortran::parser::Name>(ompObject)}) {
      if (const Fortran::semantics::Symbol * symbol{name->symbol}) {
        if (reductionSymbols)
          reductionSymbols->push_back(symbol);
        mlir::Value symVal = converter.getSymbolAddress(*symbol);
-        if (auto declOp = symVal.getDefiningOp<hlfir::DeclareOp>())
+        auto redType = mlir::cast<fir::ReferenceType>(symVal.getType());


Is this cast definitely safe? Will symVal always be a reference type here?

The existing upstream code already does this, I was mostly moving code around here. See line 434 in the old version of the code.

This even predates the by-ref reduction changes. See

llvm-project/flang/lib/Lower/OpenMP/ReductionProcessor.cpp

Line 347 in f18d78b

symVal.getType().cast<fir::ReferenceType>().getEleTy();

For example, if you look at the llvm code we generate by-val reductions (which should be identical after my changes), it loaded the reduction variables unconditionally, so this must be some kind of pointer.

kiranchandramohan

Nice work. Have a few minor comments.

kiranchandramohan · 2024-03-17T20:00:10Z

flang/lib/Lower/OpenMP/ReductionProcessor.cpp

+  if (fir::isa_trivial(ty))
+    return (llvm::Twine(name) +
+            (ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) +
+            llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition)
+        .str();
+
+  // creates a name like reduction_i_64_box_ux4x3
+  if (auto boxTy = mlir::dyn_cast_or_null<fir::BaseBoxType>(ty)) {
+    // TODO: support for allocatable boxes:
+    // !fir.box<!fir.heap<!fir.array<...>>>
+    fir::SequenceType seqTy = fir::unwrapRefType(boxTy.getEleTy())
+                                  .dyn_cast_or_null<fir::SequenceType>();
+    if (!seqTy)
+      return {};
+
+    std::string prefix = getReductionName(
+        name, fir::unwrapSeqOrBoxedSeqType(ty), /*isByRef=*/false);
+    if (prefix.empty())
+      return {};
+    std::stringstream tyStr;
+    tyStr << prefix << "_box_";
+    bool first = true;
+    for (std::int64_t extent : seqTy.getShape()) {
+      if (first)
+        first = false;
+      else
+        tyStr << "x";
+      if (extent == seqTy.getUnknownExtent())
+        tyStr << 'u'; // I'm not sure that '?' is safe in symbol names
+      else
+        tyStr << extent;
+    }
+    return (tyStr.str() + byrefAddition).str();
+  }
+
+  return {};


There is a getTypeAsString (

llvm-project/flang/lib/Optimizer/Dialect/FIRType.cpp

Line 520 in 07b18c5

std::string getTypeAsString(mlir::Type ty, const fir::KindMapping &kindMap,

). Can that be used? If it changes existing reduction names then we can move to this in a separate patch.

Yeah that looks much better, thanks! I will do it in a separate patch because I think it might change existing names.

OK, that is fine.

kiranchandramohan · 2024-03-17T20:02:38Z

flang/lib/Lower/OpenMP/ReductionProcessor.cpp

+            if (!redType.getEleTy().isIntOrIndexOrFloat())
+              TODO(currentLocation, "User Defined Reduction on arrays");


Nit: We don't support complex type now. So this could be due to being a complex type as well.

kiranchandramohan · 2024-03-17T22:24:40Z

flang/lib/Lower/OpenMP/ReductionProcessor.cpp

+    hlfir::Entity source = hlfir::Entity{builder.getBlock()->getArgument(0)};
+
+    // from hlfir::createTempFromMold() but with the allocation changed to
+    // use alloca so that we don't have to free it


What if this is inside a loop finally? We might need to add some stacksave and stack restore. Can we add a TODO for this case?

LLVM will hoist the allocation for these simple cases, but yeah once we support arrays with a dynamic shape, the box dimensions will only be read at this point and so the allocation cannot be moved.

Really, the reduction declare operation needs a deallocation region to undo any allocations in the initialization region. But that is beyond the scope of this patch.

That should come as part of support for delayed privatization. The privatization region will have an alloc and dealloc regions. So you can skip this for now.

kiranchandramohan · 2024-03-17T22:31:32Z

flang/lib/Lower/OpenMP/ReductionProcessor.cpp

+    // Create the private copy from the initial fir.box:
+    hlfir::Entity source = hlfir::Entity{builder.getBlock()->getArgument(0)};
+
+    // from hlfir::createTempFromMold() but with the allocation changed to


Can we pull this out into a function either in this patch or later?

kiranchandramohan · 2024-03-17T22:36:36Z

flang/test/Lower/OpenMP/parallel-reduction-array.f90

+i(1) = 1
+i(2) = 2
+i(3) = 3


Is the test written to deliberately not use the reduction operation in the body?

I was just trying to create a test with a very simple body. Do you want a test with the reduction operation in the body as well?

Yes, it would be good to have the test match the default way that it will be used.

kiranchandramohan

For compile-time constant bounds we should switch to a non-box version sometime later.

Thanks for the changes.

kiranchandramohan · 2024-03-18T14:25:51Z

flang/lib/Optimizer/Builder/HLFIRTools.cpp

@@ -1108,6 +1108,35 @@ hlfir::createTempFromMold(mlir::Location loc, fir::FirOpBuilder &builder,
  return {hlfir::Entity{declareOp.getBase()}, isHeapAlloc};
 }

+hlfir::Entity hlfir::createStackTempFromMold(mlir::Location loc,


@ergawy FYI: This patch has a createStackTempFromMold which might be helpful.

…code (#84957) Moving extractSequenceType to FIRType.h so that this can also be used from OpenMP. OpenMP array reductions 5/6 Previous PR: #84955 Next PR: #84958

This has been tested with arrays with compile-time constant bounds. Allocatable arrays and arrays with non-constant bounds are not yet supported. User-defined reduction functions are also not yet supported. The design is intended to work for arrays with non-constant bounds too without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I haven't fixed yet). We need some way to get these runtime bounds into the reduction init and combiner regions. To keep things simple for now I opted to always box the array arguments so the box can be passed as one argument and the lower bounds and extents read from the box. This has the disadvantage of resulting in fir.box_dim operations inside of the critical section. If these prove to be a performance issue, we could follow OpenACC reading box lower bounds and extents before the reduction and passing them as block arguments to the reduction init and combiner regions. I would prefer to keep things simple for now. Note: this implementation only works when the HLFIR lowering is used. I don't think it is worth supporting FIR-only lowering because the plan is for that to be removed soon.

…code (llvm#84957) Moving extractSequenceType to FIRType.h so that this can also be used from OpenMP. OpenMP array reductions 5/6 Previous PR: llvm#84955 Next PR: llvm#84958

This has been tested with arrays with compile-time constant bounds. Allocatable arrays and arrays with non-constant bounds are not yet supported. User-defined reduction functions are also not yet supported. The design is intended to work for arrays with non-constant bounds too without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I haven't fixed yet). We need some way to get these runtime bounds into the reduction init and combiner regions. To keep things simple for now I opted to always box the array arguments so the box can be passed as one argument and the lower bounds and extents read from the box. This has the disadvantage of resulting in fir.box_dim operations inside of the critical section. If these prove to be a performance issue, we could follow OpenACC reading box lower bounds and extents before the reduction and passing them as block arguments to the reduction init and combiner regions. I would prefer to keep things simple for now. Note: this implementation only works when the HLFIR lowering is used. I don't think it is worth supporting FIR-only lowering because the plan is for that to be removed soon. OpenMP array reductions 6/6 Previous PR: llvm#84957

Following up on a review comment: llvm#84958 (comment) Reductions might be inlined inside of a loop so stack allocations are not safe. Normally flang allocates arrays on the stack. Allocatable arrays have a different type: fir.box<fir.heap<fir.array<...>>> instead of fir.box<fir.array<...>>. This patch will allocate all arrays on the heap. Reductions on allocatable arrays still aren't supported (but I will get to this soon).

Following up on a review comment: #84958 (comment) Reductions might be inlined inside of a loop so stack allocations are not safe. Normally flang allocates arrays on the stack. Allocatable arrays have a different type: fir.box<fir.heap<fir.array<...>>> instead of fir.box<fir.array<...>>. This patch will allocate all arrays on the heap. Reductions on allocatable arrays still aren't supported (but I will get to this soon).

tblah requested review from DavidTruby, ergawy, skatrak, kiranchandramohan, Leporacanthicus and kparzysz March 12, 2024 17:36

llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Mar 12, 2024

tblah mentioned this pull request Mar 12, 2024

[flang][NFC] move extractSequenceType helper out of OpenACC to share code #84957

Merged

DavidTruby reviewed Mar 15, 2024

View reviewed changes

kiranchandramohan reviewed Mar 17, 2024

View reviewed changes

kiranchandramohan approved these changes Mar 18, 2024

View reviewed changes

tblah mentioned this pull request Mar 18, 2024

[flang][OpenMP] simplify getReductionName #85666

Merged

tblah force-pushed the users/tblah/omp_array_reduction_5 branch from 2ff12fa to d4485b1 Compare March 20, 2024 10:09

Base automatically changed from users/tblah/omp_array_reduction_5 to main March 20, 2024 10:09

tblah added 5 commits March 20, 2024 10:12

Update TODO message

d0b6f12

Add TODO for stack allocations inside of a loop

e6155a5

Move createStackTempFromMold into a function

209cd38

Add more tests

e30d18a

tblah force-pushed the users/tblah/omp_array_reduction_6 branch from 8eaefa5 to e30d18a Compare March 20, 2024 10:34

tblah merged commit 197f3ec into main Mar 20, 2024

tblah deleted the users/tblah/omp_array_reduction_6 branch March 20, 2024 10:35

tblah mentioned this pull request Mar 20, 2024

[Flang][OpenMP] Reduction support for arrays #58186

Closed

tblah mentioned this pull request Apr 5, 2024

[flang][OpenMP] Allocate array reduction variables on the heap #87773

Merged

		if (!redType.getEleTy().isIntOrIndexOrFloat())
		TODO(currentLocation, "User Defined Reduction on arrays");

[flang][OpenMP] lower simple array reductions #84958

[flang][OpenMP] lower simple array reductions #84958

Uh oh!

Conversation

tblah commented Mar 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Mar 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kiranchandramohan left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kiranchandramohan left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tblah commented Mar 12, 2024 •

edited

Loading

llvmbot commented Mar 12, 2024 •

edited

Loading