Skip to content

[Flang][OpenMP] Align map clause generation and fix issue with non-shared allocations for assumed shape/size descriptor types #97855

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 23, 2024

Conversation

agozillon
Copy link
Contributor

This PR aims to unify the map argument generation behavior across both the implicit capture (captured in a target region) and the explicit capture (process map), currently the varPtr field of the MapInfo for the same variable will be different depending on how it's captured. This PR tries to align that across the generations of MapInfoOp in the OpenMP lowering.

Currently, I have opted to utilise the rawInput (input memref to a HLFIR DeclareInfoOp) as opposed to the addr field which includes more information. The side affect of this is that we have to deal with BoxTypes less often, which will result in simpler maps in these cases. The negative side affect of this is that we don't have access to the bounds information through the resulting value, however, I believe the bounds information we require in our case is still appropriately stored in the map bounds, and this seems to be the case from testing so far.

The other fix is for cases where we end up with a BoxType argument into a function (certain assumed shape and sizes cases do this) that has no fir.ref wrapping it. As we need the Box to be a reference type to actually utilise the operation to access the base address stored inside and create the correct mappings we currently generate an intermediate allocation in these cases, and then store into it, and utilise this as the map argument, as opposed to the original.

However, as we were not sharing the same intermediate allocation across all of the maps for a variable, this resulted in errors in certain cases when detatching/attatching the data e.g. via enter and exit. This PR adjusts this for cases

Currently we only maintain tracking of all intermediate allocations for the current function scope, as opposed to module. Primarily as the only case I am aware of that this is required is in cases where we pass certain types of arguments to functions (so I opted to minimize the overhead of the pass for now). It could likely be extended to module scope if required if we find other cases where it's applicable and causing issues.

@llvmbot
Copy link
Member

llvmbot commented Jul 5, 2024

@llvm/pr-subscribers-offload
@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-fir-hlfir

Author: None (agozillon)

Changes

This PR aims to unify the map argument generation behavior across both the implicit capture (captured in a target region) and the explicit capture (process map), currently the varPtr field of the MapInfo for the same variable will be different depending on how it's captured. This PR tries to align that across the generations of MapInfoOp in the OpenMP lowering.

Currently, I have opted to utilise the rawInput (input memref to a HLFIR DeclareInfoOp) as opposed to the addr field which includes more information. The side affect of this is that we have to deal with BoxTypes less often, which will result in simpler maps in these cases. The negative side affect of this is that we don't have access to the bounds information through the resulting value, however, I believe the bounds information we require in our case is still appropriately stored in the map bounds, and this seems to be the case from testing so far.

The other fix is for cases where we end up with a BoxType argument into a function (certain assumed shape and sizes cases do this) that has no fir.ref wrapping it. As we need the Box to be a reference type to actually utilise the operation to access the base address stored inside and create the correct mappings we currently generate an intermediate allocation in these cases, and then store into it, and utilise this as the map argument, as opposed to the original.

However, as we were not sharing the same intermediate allocation across all of the maps for a variable, this resulted in errors in certain cases when detatching/attatching the data e.g. via enter and exit. This PR adjusts this for cases

Currently we only maintain tracking of all intermediate allocations for the current function scope, as opposed to module. Primarily as the only case I am aware of that this is required is in cases where we pass certain types of arguments to functions (so I opted to minimize the overhead of the pass for now). It could likely be extended to module scope if required if we find other cases where it's applicable and causing issues.


Patch is 29.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/97855.diff

9 Files Affected:

  • (modified) flang/include/flang/Optimizer/Transforms/Passes.td (+3-3)
  • (modified) flang/lib/Lower/OpenMP/ClauseProcessor.cpp (+6-10)
  • (modified) flang/lib/Lower/OpenMP/ClauseProcessor.h (+3-7)
  • (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+20-12)
  • (modified) flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp (+32-13)
  • (modified) flang/test/Lower/OpenMP/array-bounds.f90 (+6-7)
  • (modified) flang/test/Lower/OpenMP/common-block-map.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/derived-type-map.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/target.f90 (+11-12)
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index b3ed9acad36df4..fac2f609447bfd 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -341,11 +341,11 @@ def LoopVersioning : Pass<"loop-versioning", "mlir::func::FuncOp"> {
 }
 
 def OMPMapInfoFinalizationPass
-    : Pass<"omp-map-info-finalization"> {
+    : Pass<"omp-map-info-finalization", "mlir::func::FuncOp"> {
   let summary = "expands OpenMP MapInfo operations containing descriptors";
   let description = [{
-    Expands MapInfo operations containing descriptor types into multiple 
-    MapInfo's for each pointer element in the descriptor that requires 
+    Expands MapInfo operations containing descriptor types into multiple
+    MapInfo's for each pointer element in the descriptor that requires
     explicit individual mapping by the OpenMP runtime.
   }];
   let dependentDialects = ["mlir::omp::OpenMPDialect"];
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index d507e58b164dd8..3ad1d2e7a01181 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -970,25 +970,21 @@ bool ClauseProcessor::processMap(
                   object.ref(), clauseLocation, asFortran, bounds,
                   treatIndexAsSection);
 
-          auto origSymbol = converter.getSymbolAddress(*object.sym());
-          mlir::Value symAddr = info.addr;
-          if (origSymbol && fir::isTypeWithDescriptor(origSymbol.getType()))
-            symAddr = origSymbol;
-
           // Explicit map captures are captured ByRef by default,
           // optimisation passes may alter this to ByCopy or other capture
           // types to optimise
+          mlir::Value baseOp = info.rawInput;
           auto location = mlir::NameLoc::get(
               mlir::StringAttr::get(firOpBuilder.getContext(), asFortran.str()),
-              symAddr.getLoc());
+              baseOp.getLoc());
           mlir::omp::MapInfoOp mapOp = createMapInfoOp(
-              firOpBuilder, location, symAddr,
+              firOpBuilder, location, baseOp,
               /*varPtrPtr=*/mlir::Value{}, asFortran.str(), bounds,
               /*members=*/{}, /*membersIndex=*/mlir::DenseIntElementsAttr{},
               static_cast<
                   std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
                   mapTypeBits),
-              mlir::omp::VariableCaptureKind::ByRef, symAddr.getType());
+              mlir::omp::VariableCaptureKind::ByRef, baseOp.getType());
 
           if (object.sym()->owner().IsDerivedType()) {
             addChildIndexAndMapToParent(object, parentMemberIndices, mapOp,
@@ -997,9 +993,9 @@ bool ClauseProcessor::processMap(
             result.mapVars.push_back(mapOp);
             ptrMapSyms->push_back(object.sym());
             if (mapSymTypes)
-              mapSymTypes->push_back(symAddr.getType());
+              mapSymTypes->push_back(baseOp.getType());
             if (mapSymLocs)
-              mapSymLocs->push_back(symAddr.getLoc());
+              mapSymLocs->push_back(baseOp.getLoc());
           }
         }
       });
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index 43795d5c253996..44b97749c9cb21 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -212,22 +212,18 @@ bool ClauseProcessor::processMotionClauses(lower::StatementContext &stmtCtx,
                   object.ref(), clauseLocation, asFortran, bounds,
                   treatIndexAsSection);
 
-          auto origSymbol = converter.getSymbolAddress(*object.sym());
-          mlir::Value symAddr = info.addr;
-          if (origSymbol && fir::isTypeWithDescriptor(origSymbol.getType()))
-            symAddr = origSymbol;
-
           // Explicit map captures are captured ByRef by default,
           // optimisation passes may alter this to ByCopy or other capture
           // types to optimise
+          mlir::Value baseOp = info.rawInput;
           mlir::omp::MapInfoOp mapOp = createMapInfoOp(
-              firOpBuilder, clauseLocation, symAddr,
+              firOpBuilder, clauseLocation, baseOp,
               /*varPtrPtr=*/mlir::Value{}, asFortran.str(), bounds,
               /*members=*/{}, /*membersIndex=*/mlir::DenseIntElementsAttr{},
               static_cast<
                   std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
                   mapTypeBits),
-              mlir::omp::VariableCaptureKind::ByRef, symAddr.getType());
+              mlir::omp::VariableCaptureKind::ByRef, baseOp.getType());
 
           if (object.sym()->owner().IsDerivedType()) {
             addChildIndexAndMapToParent(object, parentMemberIndices, mapOp,
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 17804ff58edc03..f393d1c5fd7129 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -1670,6 +1670,12 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
     if (dsp.getAllSymbolsToPrivatize().contains(&sym))
       return;
 
+    // Structure component symbols don't have bindings, and can only be
+    // explicitly mapped individually. If a member is captured implicitly
+    // we map the entirety of the derived type when we find its symbol.
+    if (sym.owner().IsDerivedType())
+      return;
+
     // if the symbol is part of an already mapped common block, do not make a
     // map for it.
     if (const Fortran::semantics::Symbol *common =
@@ -1677,16 +1683,18 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
       if (llvm::find(mapSyms, common) != mapSyms.end())
         return;
 
-    if (llvm::find(mapSyms, &sym) == mapSyms.end()) {
-      mlir::Value baseOp = converter.getSymbolAddress(sym);
-      if (!baseOp)
-        if (const auto *details =
-                sym.template detailsIf<semantics::HostAssocDetails>()) {
-          baseOp = converter.getSymbolAddress(details->symbol());
-          converter.copySymbolBinding(details->symbol(), sym);
-        }
+    // If we come across a symbol without a symbol address, we
+    // return as we cannot process it, this is intended as a
+    // catch all early exit for symbols that do not have a
+    // corresponding extended value. Such as subroutines,
+    // interfaces and named blocks.
+    if (!converter.getSymbolAddress(sym))
+      return;
 
-      if (baseOp) {
+    if (llvm::find(mapSyms, &sym) == mapSyms.end()) {
+      if (const auto *details =
+              sym.template detailsIf<semantics::HostAssocDetails>())
+        converter.copySymbolBinding(details->symbol(), sym);
         llvm::SmallVector<mlir::Value> bounds;
         std::stringstream name;
         fir::ExtendedValue dataExv = converter.getSymbolExtendedValue(sym);
@@ -1694,13 +1702,14 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
 
         lower::AddrAndBoundsInfo info = getDataOperandBaseAddr(
             converter, firOpBuilder, sym, converter.getCurrentLocation());
+        mlir::Value baseOp = info.rawInput;
         if (mlir::isa<fir::BaseBoxType>(
-                fir::unwrapRefType(info.addr.getType())))
+                fir::unwrapRefType(baseOp.getType())))
           bounds = lower::genBoundsOpsFromBox<mlir::omp::MapBoundsOp,
                                               mlir::omp::MapBoundsType>(
               firOpBuilder, converter.getCurrentLocation(), dataExv, info);
         if (mlir::isa<fir::SequenceType>(
-                fir::unwrapRefType(info.addr.getType()))) {
+                fir::unwrapRefType(baseOp.getType()))) {
           bool dataExvIsAssumedSize =
               semantics::IsAssumedSizeArray(sym.GetUltimate());
           bounds = lower::genBaseBoundsOps<mlir::omp::MapBoundsOp,
@@ -1755,7 +1764,6 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
         mapSyms.push_back(&sym);
         mapLocs.push_back(baseOp.getLoc());
         mapTypes.push_back(baseOp.getType());
-      }
     }
   };
   lower::pft::visitAllSymbols(eval, captureImplicitMap);
diff --git a/flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp b/flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp
index 35203fe89f5bc4..b871467d570297 100644
--- a/flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp
+++ b/flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp
@@ -51,6 +51,15 @@ class OMPMapInfoFinalizationPass
     : public fir::impl::OMPMapInfoFinalizationPassBase<
           OMPMapInfoFinalizationPass> {
 
+  /// Tracks any intermediate function/subroutine local allocations we
+  /// generate for the descriptors of box type dummy arguments, so that
+  /// we can retrieve it for subsequent reuses within the functions
+  /// scope
+  std::map</*descriptor opaque pointer=*/ void *,
+           /*corresponding local alloca=*/ fir::AllocaOp>
+      localBoxAllocas;
+
+
   void genDescriptorMemberMaps(mlir::omp::MapInfoOp op,
                                fir::FirOpBuilder &builder,
                                mlir::Operation *target) {
@@ -75,14 +84,26 @@ class OMPMapInfoFinalizationPass
     // perform an alloca and then store to it and retrieve the data from the new
     // alloca.
     if (mlir::isa<fir::BaseBoxType>(descriptor.getType())) {
-      mlir::OpBuilder::InsertPoint insPt = builder.saveInsertionPoint();
-      mlir::Block *allocaBlock = builder.getAllocaBlock();
-      assert(allocaBlock && "No alloca block found for this top level op");
-      builder.setInsertionPointToStart(allocaBlock);
-      auto alloca = builder.create<fir::AllocaOp>(loc, descriptor.getType());
-      builder.restoreInsertionPoint(insPt);
-      builder.create<fir::StoreOp>(loc, descriptor, alloca);
-      descriptor = alloca;
+      // if we have already created a local allocation for this BoxType,
+      // we must be sure to re-use it so that we end up with the same
+      // allocations being utilised for the same descriptor across all map uses,
+      // this prevents runtime issues such as not appropriately releasing or
+      // deleting all mapped data.
+      auto find = localBoxAllocas.find(descriptor.getAsOpaquePointer());
+      if (find != localBoxAllocas.end()) {
+        builder.create<fir::StoreOp>(loc, descriptor, find->second);
+        descriptor = find->second;
+      } else {
+        mlir::OpBuilder::InsertPoint insPt = builder.saveInsertionPoint();
+        mlir::Block *allocaBlock = builder.getAllocaBlock();
+        assert(allocaBlock && "No alloca block found for this top level op");
+        builder.setInsertionPointToStart(allocaBlock);
+        auto alloca = builder.create<fir::AllocaOp>(loc, descriptor.getType());
+        builder.restoreInsertionPoint(insPt);
+        builder.create<fir::StoreOp>(loc, descriptor, alloca);
+        localBoxAllocas[descriptor.getAsOpaquePointer()] = alloca;
+        descriptor = alloca;
+      }
     }
 
     mlir::Value baseAddrAddr = builder.create<fir::BoxOffsetOp>(
@@ -228,14 +249,12 @@ class OMPMapInfoFinalizationPass
   // operation (usually function) containing the MapInfoOp because this pass
   // will mutate siblings of MapInfoOp.
   void runOnOperation() override {
-    mlir::ModuleOp module =
-        mlir::dyn_cast_or_null<mlir::ModuleOp>(getOperation());
-    if (!module)
-      module = getOperation()->getParentOfType<mlir::ModuleOp>();
+    mlir::func::FuncOp func = getOperation();
+    mlir::ModuleOp module = func->getParentOfType<mlir::ModuleOp>();
     fir::KindMapping kindMap = fir::getKindMapping(module);
     fir::FirOpBuilder builder{module, std::move(kindMap)};
 
-    getOperation()->walk([&](mlir::omp::MapInfoOp op) {
+    func->walk([&](mlir::omp::MapInfoOp op) {
       // TODO: Currently only supports a single user for the MapInfoOp, this
       // is fine for the moment as the Fortran Frontend will generate a
       // new MapInfoOp per Target operation for the moment. However, when/if
diff --git a/flang/test/Lower/OpenMP/array-bounds.f90 b/flang/test/Lower/OpenMP/array-bounds.f90
index f235d5041ab26a..55462daddfbc22 100644
--- a/flang/test/Lower/OpenMP/array-bounds.f90
+++ b/flang/test/Lower/OpenMP/array-bounds.f90
@@ -15,12 +15,12 @@
 !HOST:  %[[C2:.*]] = arith.constant 1 : index
 !HOST:  %[[C3:.*]] = arith.constant 4 : index
 !HOST:  %[[BOUNDS0:.*]] = omp.map.bounds   lower_bound(%[[C2]] : index) upper_bound(%[[C3]] : index) extent(%[[C10]] : index) stride(%[[C1]] : index) start_idx(%[[C1]] : index)
-!HOST:  %[[MAP0:.*]] = omp.map.info var_ptr(%[[READ_DECL]]#0 : !fir.ref<!fir.array<10xi32>>, !fir.array<10xi32>)   map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS0]]) -> !fir.ref<!fir.array<10xi32>> {name = "sp_read(2:5)"}
+!HOST:  %[[MAP0:.*]] = omp.map.info var_ptr(%[[READ_DECL]]#1 : !fir.ref<!fir.array<10xi32>>, !fir.array<10xi32>)   map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS0]]) -> !fir.ref<!fir.array<10xi32>> {name = "sp_read(2:5)"}
 !HOST:  %[[C4:.*]] = arith.constant 1 : index
 !HOST:  %[[C5:.*]] = arith.constant 1 : index
 !HOST:  %[[C6:.*]] = arith.constant 4 : index
 !HOST:  %[[BOUNDS1:.*]] = omp.map.bounds   lower_bound(%[[C5]] : index) upper_bound(%[[C6]] : index) extent(%[[C10_0]] : index) stride(%[[C4]] : index) start_idx(%[[C4]] : index)
-!HOST:  %[[MAP1:.*]] = omp.map.info var_ptr(%[[WRITE_DECL]]#0 : !fir.ref<!fir.array<10xi32>>, !fir.array<10xi32>)   map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS1]]) -> !fir.ref<!fir.array<10xi32>> {name = "sp_write(2:5)"}
+!HOST:  %[[MAP1:.*]] = omp.map.info var_ptr(%[[WRITE_DECL]]#1 : !fir.ref<!fir.array<10xi32>>, !fir.array<10xi32>)   map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS1]]) -> !fir.ref<!fir.array<10xi32>> {name = "sp_write(2:5)"}
 !HOST:  omp.target map_entries(%[[MAP0]] -> %{{.*}}, %[[MAP1]] -> %{{.*}}, {{.*}} -> {{.*}} : !fir.ref<!fir.array<10xi32>>, !fir.ref<!fir.array<10xi32>>, !fir.ref<i32>) {
 
 subroutine read_write_section()
@@ -65,9 +65,10 @@ subroutine assumed_shape_array(arr_read_write)
     end subroutine assumed_shape_array
 
 
+
+
 !HOST-LABEL: func.func @_QMassumed_array_routinesPassumed_size_array(
 !HOST-SAME: %[[ARG0:.*]]: !fir.ref<!fir.array<?xi32>> {fir.bindc_name = "arr_read_write"}) {
-!HOST: %[[INTERMEDIATE_ALLOCA:.*]] = fir.alloca !fir.box<!fir.array<?xi32>>
 !HOST: %[[ARG0_SHAPE:.*]] = fir.shape %{{.*}} : (index) -> !fir.shape<1>
 !HOST: %[[ARG0_DECL:.*]]:2 = hlfir.declare %[[ARG0]](%[[ARG0_SHAPE]]) dummy_scope %{{[0-9]+}} {fortran_attrs = #fir.var_attrs<intent_inout>, uniq_name = "_QMassumed_array_routinesFassumed_size_arrayEarr_read_write"} : (!fir.ref<!fir.array<?xi32>>, !fir.shape<1>, !fir.dscope) -> (!fir.box<!fir.array<?xi32>>, !fir.ref<!fir.array<?xi32>>)
 !HOST: %[[ALLOCA:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QMassumed_array_routinesFassumed_size_arrayEi"}
@@ -75,10 +76,8 @@ end subroutine assumed_shape_array
 !HOST: %[[C4_1:.*]] = arith.subi %c4, %c1{{.*}} : index
 !HOST: %[[EXT:.*]] = arith.addi %[[C4_1]], %c1{{.*}} : index
 !HOST: %[[BOUNDS:.*]] = omp.map.bounds lower_bound(%c1{{.*}} : index) upper_bound(%c4{{.*}} : index) extent(%[[EXT]] : index) stride(%[[DIMS0]]#2 : index) start_idx(%c1{{.*}} : index) {stride_in_bytes = true}
-!HOST: %[[VAR_PTR_PTR:.*]] = fir.box_offset %[[INTERMEDIATE_ALLOCA]] base_addr : (!fir.ref<!fir.box<!fir.array<?xi32>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
-!HOST: %[[MAP_INFO_MEMBER:.*]] = omp.map.info var_ptr(%[[INTERMEDIATE_ALLOCA]] : !fir.ref<!fir.box<!fir.array<?xi32>>>, !fir.array<?xi32>) var_ptr_ptr(%[[VAR_PTR_PTR]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS]]) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>> {name = ""}
-!HOST: %[[MAP:.*]] = omp.map.info var_ptr(%[[INTERMEDIATE_ALLOCA]] : !fir.ref<!fir.box<!fir.array<?xi32>>>, !fir.box<!fir.array<?xi32>>) map_clauses(tofrom) capture(ByRef) members(%[[MAP_INFO_MEMBER]] : [0] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) -> !fir.ref<!fir.array<?xi32>> {name = "arr_read_write(2:5)"}
-!HOST: omp.target map_entries(%[[MAP_INFO_MEMBER]] -> %{{.*}}, %[[MAP]] -> %{{.*}}, {{.*}} -> {{.*}} : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>, !fir.ref<!fir.array<?xi32>>, !fir.ref<i32>) {
+!HOST: %[[MAP:.*]] = omp.map.info var_ptr(%[[ARG0_DECL]]#1 : !fir.ref<!fir.array<?xi32>>, !fir.array<?xi32>) map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS]]) -> !fir.ref<!fir.array<?xi32>> {name = "arr_read_write(2:5)"}
+!HOST: omp.target map_entries(%[[MAP]] -> %{{.*}}, {{.*}} -> {{.*}} : !fir.ref<!fir.array<?xi32>>, !fir.ref<i32>) {
     subroutine assumed_size_array(arr_read_write)
         integer, intent(inout) :: arr_read_write(*)
 
diff --git a/flang/test/Lower/OpenMP/common-block-map.f90 b/flang/test/Lower/OpenMP/common-block-map.f90
index 5033129683a8eb..0c423efd5eef49 100644
--- a/flang/test/Lower/OpenMP/common-block-map.f90
+++ b/flang/test/Lower/OpenMP/common-block-map.f90
@@ -40,7 +40,7 @@ subroutine map_full_block
 !CHECK: %[[COORD:.*]] = fir.coordinate_of %[[CB_CONV]], %[[INDEX]] : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
 !CHECK: %[[CONV:.*]] = fir.convert %[[COORD]] : (!fir.ref<i8>) -> !fir.ref<i32>
 !CHECK: %[[CB_MEMBER_2:.*]]:2 = hlfir.declare %[[CONV]] {uniq_name = "_QFmap_mix_of_membersEvar2"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-!CHECK: %[[MAP_EXP:.*]] = omp.map.info var_ptr(%[[CB_MEMBER_2]]#0 : !fir.ref<i32>, i32) map_clauses(tofrom) capture(ByRef) -> !fir.ref<i32> {name = "var2"}
+!CHECK: %[[MAP_EXP:.*]] = omp.map.info var_ptr(%[[CB_MEMBER_2]]#1 : !fir.ref<i32>, i32) map_clauses(tofrom) capture(ByRef) -> !fir.ref<i32> {name = "var2"}
 !CHECK: %[[MAP_IMP:.*]] = omp.map.info var_ptr(%[[CB_MEMBER_1]]#1 : !fir.ref<i32>, i32) map_clauses(implicit, exit_release_or_enter_alloc) capture(ByCopy) -> !fir.ref<i32> {name = "var1"}
 !CHECK: omp.target map_entries(%[[MAP_EXP]] -> %[[ARG_EXP:.*]], %[[MAP_IMP]] -> %[[ARG_IMP:.*]] : !fir.ref<i32>, !fir.ref<i32>) {
 !CHECK: ^bb0(%[[ARG_EXP]]: !fir.ref<i32>, %[[ARG_IMP]]: !fir.ref<i32>):
diff --git a/flang/test/Lower/OpenMP/derived-type-map.f90 b/flang/test/Lower/OpenMP/derived-type-map.f90
index 6121b450f06206..30b89e90470b0a 100644
--- a/flang/test/Lower/OpenMP/derived-type-map.f90
+++ b/flang/test/Lower/OpenMP/derived-type-map.f90
@@ -21,7 +21,7 @@ end subroutine mapType_derived_implicit
 
 !CHECK: %[[ALLOCA:.*]] = fir.alloca !fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}> {bindc_name = "scalar_arr", uniq_name = "_QFmaptype_derived_explicitEscalar_arr"}
 !CHECK: %[[DECLARE:.*]]:2 = hlfir.declare %[[ALLOCA]] {uniq_name = "_QFmaptype_derived_explicitEscalar_arr"} : (!fir.ref<!fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}>>) -> (!fir.ref<!fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}>>, !fir.ref<!fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}>>)
-!CHECK: %[[MAP:.*]] = omp.map.info var_ptr(%[[DECLARE]]#0 : !fir.ref<!fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}>>, !fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}>) map_clauses(tofrom) capture(ByRef) -> !fir.ref<!fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}>> {name = "scalar_arr"}
+!CHECK: %[[MAP:.*]] = omp.map.info var_ptr(%[[DECLARE]]#1 : !fir.ref<!fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}>>, !fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}>) map_clauses(tofrom) capture(ByRef) -> !fir.ref<!fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}>> {name = "scalar_arr"}
 !CHECK:  omp.target map_entries(%[[MAP]] -> %[[ARG0:.*]] : !fir.ref<!fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}>>) {
 !CHECK:    ^bb0(%[[ARG0]]: !fir.ref<!fir.type<_QFmaptype_derived_explicitTscalar_and_array{real:f32,array:!fir.array<10xi32>,int:i32}>>):
 subroutine mapType_derived_explicit
diff --git a/flang/te...
[truncated]

Copy link

github-actions bot commented Jul 5, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

// Explicit map captures are captured ByRef by default,
// optimisation passes may alter this to ByCopy or other capture
// types to optimise
mlir::Value baseOp = info.rawInput;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeanPerier @tblah I am not the most familiar with HLFIR (or FIR) yet unfortunately, however, I was interested in knowing if there was any possible side effects of utilising the memref/input from a HLFIR declare operation as opposed to the other possible result which seems to carry more information?

I believe, at least for the moment, we don't need the extra information provided by the HLFIR DeclareOp's other result from my (possibly flawed) understanding, As we already generate and carry the bounds around as part of the MapInfoOp, and we don't need any further information it currently provides from what I can tell. The main benefit of utilising the rawInput (result 1, of the declare op when there, otherwise just the symbol adcress) is it simplifies the maps we generate in certain scenarios as we end up processing less BoxTypes (as in certain cases, the HLFIR declare op's result 0 provides the more complex to map BoxType as opposed to the simpler original input type, presumably to help carry more information).

Essentially wondering if utilising the rawInput in all cases (at least that we currently handle) seems reasonable and sane from a FIR/HLFIR perspective. If we ever required the more complex result, it would be possible to do so, just with a slight extra layer of complexity to the lowering and a (likely mild) performance hit from extra mappings in certain cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if you are sure you don't need the extra information, using the raw input will avoid unnecessary embox operations etc (for the unused result).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @tblah, at least for the moment I don't believe we need the extra information (and testing against the little local map test suite I have made it seems to function as correctly as it did previously), as we tend to package the bounds as a seperate input to Map Info and at least for the moment that's all we care about, it could change in the future but we can likely cross that bridge if/when we come to a situation that requires it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur with Tom. If you are not directly mapping this "raw" value to the symbol in the OpenMP context, and are not using hlfir tools with this value, you are fine.

Copy link
Contributor Author

@agozillon agozillon Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur with Tom. If you are not directly mapping this "raw" value to the symbol in the OpenMP context, and are not using hlfir tools with this value, you are fine.

Thank you very much @jeanPerier! In this case, as the target region is IFA, there's a block argument generation and symbol rebinding that occurs for all map.info operations to corresponding block arguments (that correspond to symbols used inside of the region). However, this binding only extends for the scope of the target region and seems to not pose a problem when utilising the "raw" value as the map input. Would this be something you think would cause an issue (from testing so far, it doesn't appear to)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this binding only extends for the scope of the target region and seems to not pose a problem when utilising the "raw" value as the map input. Would this be something you think would cause an issue (from testing so far, it doesn't appear to)?

As long as this new binding inside the OpenMP region happens by generating a new hlfir.declare with the non default lower bounds (also passed as block arguments?), then it is fine, even if the hlfir.declare input is the "raw" value (just like outside of the OpenMP region).

If you tested something like with n different from 1, you are fine:

subroutine omp_target_implicit_bounds(n, m)
   integer(8) :: n,m
   integer :: a(n:m)
   !$omp target
      a(11) = 22
   !$omp end target
end subroutine omp_target_implicit_bounds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subroutine omp_target_implicit_bounds(n, m)
    integer(8) :: n,m
    integer :: a(n:m)
   !$omp target
       a(4) = 22
       a(15) = 2222
   !$omp end target
    print *, a
! Results in:  22 0 0 0 0 0 0 0 0 0 0 2222
end subroutine omp_target_implicit_bounds

subroutine omp_target_passed_in_implicit_bounds(n, m, a)
    integer(8) :: n,m
    integer :: a(n:m)
   !$omp target
       a(4) = 22
       a(15) = 2222
   !$omp end target
end subroutine omp_target_passed_in_implicit_bounds

program test
    integer(8) :: n,m
    integer :: array(20)
    n = 4
    m = 15
    call omp_target_implicit_bounds(n,m)
    call omp_target_passed_in_implicit_bounds(n, m, array)
    print *, array
    ! Results in: 22 0 0 0 0 0 0 0 0 0 0 2222 0 0 0 0 0 0 0 0
end program

Thank you very much @jeanPerier, I took your test and expanded it a bit into a full program, it seems to function as I would expect (although, I am far from a Fortran expert) providing the same results when there is a target region and when there's not.

From IR introspection, it does appear we create a hlfir.declare inside of the target region, and specify a shapeshift (lower bound and extent) calculated from mapped in bounds information.

So, it seems if the above assessment is correct, using the raw input for the moment is fine. At least from a correctness perspective, I am not sure which would come with a higher performance tax (if either).

@agozillon
Copy link
Contributor Author

Small ping on this for some reviewer attention please if at all possible! Thank you very much ahead of time.

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Andrew, only minor comments from me. There are some tests that you had to update, but are there any new cases handled this patch for which adding a new test makes sense?

@@ -341,11 +341,11 @@ def LoopVersioning : Pass<"loop-versioning", "mlir::func::FuncOp"> {
}

def OMPMapInfoFinalizationPass
: Pass<"omp-map-info-finalization"> {
: Pass<"omp-map-info-finalization", "mlir::func::FuncOp"> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes it possible for the pass manager to run this pass in parallel on all func.func present in the module, rather than sequentially as it was done before. Is it guaranteed there won't be race conditions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always been able to be ran in parallel, at least up until a month or so ago when we tweaked all the passes, only then was the postfix removed :-) although, I am more than happy to continue to have it ran sequentially as it simplifies things quite a bit, so I've reverted the change!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually looking a bit further into it, it's still ran as a FuncOp pass, I've just made it a little bit more explicit with these additions. Not what I would have expected in this case! But in either case, happy to make it a ModuleOp pass, I've yet to encounter any race condition issues, but considering the pass is in flux quite a bit, it's likely better safe than sorry and we can convert it to a FuncOp pass and iron out any possible issues when it's a little more stable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an issue with it running in parallel for each function. If it already used to run like that, then my guess is that it's not a problem here, so it's fine by me either way. The main thing to be wary of in that case is accessing other functions or global operations, and perhaps creating new global symbols or making changes to module attributes.

builder.restoreInsertionPoint(insPt);
builder.create<fir::StoreOp>(loc, descriptor, alloca);
descriptor = alloca;
// if we have already created a local allocation for this BoxType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// if we have already created a local allocation for this BoxType,
// If we have already created a local allocation for this BoxType,


integer :: n
integer :: a(n)
!CHECK: %[[VAL_14:.*]] = omp.map.bounds lower_bound(%c0{{.*}} : index) upper_bound(%[[UB]] : index) extent(%[[DIMS0]]#1 : index) stride(%[[DIMS0]]#2 : index) start_idx(%c1{{.*}} : index) {stride_in_bytes = true}
!CHECK: %[[VAL_14:.*]] = omp.map.bounds lower_bound(%c0{{.*}} : index) upper_bound(%[[UB]] : index) extent(%[[VAL_7]] : index) stride(%c1{{.*}} : index) start_idx(%c1{{.*}} : index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the removal of the stride_in_bytes related to this patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we're moving to utilising the "raw input" across all cases of map.info generation for now (until a case arises where we need the alternative, sometimes more "complex" variation, can see prior discussion in the comments for some more background information on this, as well as the description of the PR i believe) to simplify and unify things we end up with a non-BoxType in certain cases where previously we used to have a BoxType, in some of these cases we generate the bounds differently e.g. line 1729~/1733 of Lower/OpenMP.cpp this can result in certain differences like the above, where we end up with a stride no longer in bytes (you can see it the effect partially in the way we end up with a constant input for stride now instead of the previous access into the BoxType's dimensions, which is an access into the fortran descriptor), basically we end up with a SequenceType instead of a BoxType due to utilization of the "raw input" which means the bounds are generated slightly differently. A side affect of this is that we end up with simpler, likely quicker mapping in a lot of cases, at the cost of less information on the device, for the moment this doesn't appear to have any side affects from testing, but I am keeping an eye on it for the future.

Copy link
Contributor Author

@agozillon agozillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new cases really being handled, just an overall simplification of the code and making both implicit map and explicit maps aligned. However, there's also a minor bug fix via sharing locally made temporary allocations that are load/stored into (fixed by both the OMPMapInfoFinalization pass changes and to a lesser extent by avoiding some cases with the swap to raw input, but the former should hopefully catch any cases the latter doesn't avoid). I am more than happy to add a test case for the bug fix to the runtime tests as that's primarily where the problem will be detected, as it occurs as a runtime mapping error!

I'll update the test and have a look into swapping back to a module only pass, it'll require a bit more complexity in the pass as I'll have to work out a way to appropriately track which allocations belong to which function. Likely not too difficult though!


integer :: n
integer :: a(n)
!CHECK: %[[VAL_14:.*]] = omp.map.bounds lower_bound(%c0{{.*}} : index) upper_bound(%[[UB]] : index) extent(%[[DIMS0]]#1 : index) stride(%[[DIMS0]]#2 : index) start_idx(%c1{{.*}} : index) {stride_in_bytes = true}
!CHECK: %[[VAL_14:.*]] = omp.map.bounds lower_bound(%c0{{.*}} : index) upper_bound(%[[UB]] : index) extent(%[[VAL_7]] : index) stride(%c1{{.*}} : index) start_idx(%c1{{.*}} : index)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we're moving to utilising the "raw input" across all cases of map.info generation for now (until a case arises where we need the alternative, sometimes more "complex" variation, can see prior discussion in the comments for some more background information on this, as well as the description of the PR i believe) to simplify and unify things we end up with a non-BoxType in certain cases where previously we used to have a BoxType, in some of these cases we generate the bounds differently e.g. line 1729~/1733 of Lower/OpenMP.cpp this can result in certain differences like the above, where we end up with a stride no longer in bytes (you can see it the effect partially in the way we end up with a constant input for stride now instead of the previous access into the BoxType's dimensions, which is an access into the fortran descriptor), basically we end up with a SequenceType instead of a BoxType due to utilization of the "raw input" which means the bounds are generated slightly differently. A side affect of this is that we end up with simpler, likely quicker mapping in a lot of cases, at the cost of less information on the device, for the moment this doesn't appear to have any side affects from testing, but I am keeping an eye on it for the future.

@@ -341,11 +341,11 @@ def LoopVersioning : Pass<"loop-versioning", "mlir::func::FuncOp"> {
}

def OMPMapInfoFinalizationPass
: Pass<"omp-map-info-finalization"> {
: Pass<"omp-map-info-finalization", "mlir::func::FuncOp"> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always been able to be ran in parallel, at least up until a month or so ago when we tweaked all the passes, only then was the postfix removed :-) although, I am more than happy to continue to have it ran sequentially as it simplifies things quite a bit, so I've reverted the change!

@agozillon
Copy link
Contributor Author

agozillon commented Aug 2, 2024

The recent commit adds a runtime test case this PR fixes, as well as moves the OMPMapInfoFinalization pass to work on a per module rather than per function basis to make sure it's performed sequentially, at least for the moment, as whilst there's no race conditions I am aware of at the moment, this pass undergoes a lot of maintenance/extension at the moment, so likely best to keep it simple until it stabilizes (for my own sanity as well as others).

Otherwise I tried to address the questions, hopefully the answers were helpful! If not I am of course happy to try and answer again, I am aware I am not always the best at explaining unfortunately :-)

@agozillon
Copy link
Contributor Author

Small ping for some reviewer attention on this PR please, if at all possible. Thank you ahead of time as always! :-)

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Andrew, this LGTM. I'd suggest waiting for at least a second reviewer on this one, because it's likely some details here are flying over my head... 😅

@agozillon
Copy link
Contributor Author

Thank you Andrew, this LGTM. I'd suggest waiting for at least a second reviewer on this one, because it's likely some details here are flying over my head... 😅

No worries, I'll do that! Thank you very much for the review, it's greatly appreciated!

If one other person could volunteer and give a review of this PR it would be greatly appreciated, it would be nice to land it in the near future :-)

Copy link
Contributor

@jsjodin jsjodin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@TIFitis TIFitis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

@agozillon
Copy link
Contributor Author

Thank you very much everyone for the reviews! I'll update and land this tomorrow or sometime during this week.

…ared allocations for assumed shape/size descriptor types

This PR aims to unify the map argument generation behavior across both the implicit capture (captured in a target region) and the explicit capture (process map), currently the varPtr field of the MapInfo for the same variable will be different depending on how it's captured. This PR tries to align that across the generations of MapInfoOp in the OpenMP lowering.

Currently, I have opted to utilise the rawInput (input memref to a HLFIR DeclareInfoOp) as opposed to the addr field which includes more information. The side affect of this is that we have to deal with BoxTypes less often, which will result in simpler maps in these cases. The negative side affect of this is that we don't have access to the bounds information through the resulting value, however, I believe the bounds information we require in our case is still appropriately stored in the map bounds, and this seems to be the case from testing so far.

The other fix is for cases where we end up with a BoxType argument into a function (certain assumed shape and sizes cases do this) that has no fir.ref wrapping it. As we need the Box to be a reference type to actually utilise the operation to access the base address stored inside and create the correct mappings we currently generate an intermediate allocation in these cases, and then store into it, and utilise this as the map argument, as opposed to the original.

However, as we were not sharing the same intermediate allocation across all of the maps for a variable, this resulted in errors in certain cases when detatching/attatching the data e.g. via enter and exit. This PR adjusts this for cases

Currently we only maintain tracking of all intermediate allocations for the current function scope, as opposed to module. Primarily as the only case I am aware of that this is required is in cases where we pass certain types of arguments to functions (so I opted to minimize the overhead of the pass for now). It could likely be extended to module scope if required if we find other cases where it's applicable and causing issues.
@agozillon
Copy link
Contributor Author

Updated to resolve conflicts, will now wait until it passes at least the Linux CI before landing the PR, thank you for the reviews everyone it's greatly appreciated.

@agozillon agozillon merged commit f4cf93f into llvm:main Aug 23, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category offload
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants