Skip to content

[Flang][OpenMP] Derived type explicit allocatable member mapping #113557

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 16, 2024

Conversation

agozillon
Copy link
Contributor

This PR is one of 3 in a PR stack, this is the primary change set which seeks to extend the current derived type explicit member mapping support to handle descriptor member mapping at arbitrary levels of nesting. The PR stack seems to do this reasonably (from testing so far) but as you can create quite complex mappings with derived types (in particular when adding allocatable derived types or arrays of allocatable derived types) I imagine there will be hiccups, which I am more than happy to address. There will also be further extensions to this work to handle the implicit auto-magical mapping of descriptor members in derived types and a few other changes planned for the future (with some ideas on optimizing things).

The changes in this PR primarily occur in the OpenMP lowering and the OMPMapInfoFinalization pass.

In the OpenMP lowering several utility functions were added or extended to support the generation of appropriate intermediate member mappings which are currently required when the parent (or multiple parents) of a mapped member are descriptor types. We need to map the entirety of these types or do a "deep copy" for lack of a better term, where we map both the base address and the descriptor as without the copying of both of these we lack the information in the case of the descriptor to access the member or attach the pointers data to the pointer and in the latter case we require the base address to map the chunk of data. Currently we do not segment descriptor based derived types as we do with regular non-descriptor derived types, we effectively map their entirety in all cases at the moment, I hope to address this at some point in the future as it adds a fair bit of a performance penalty to having nestings of allocatable derived types as an example. The process of mapping all intermediate descriptor members in a members path only occurs if a member has an allocatable or object parent in its symbol path or the member itself is a member or allocatable. This occurs in the
createParentSymAndGenIntermediateMaps function, which will also generate the appropriate address for the allocatable member within the derived type to use as a the varPtr field of the map (for intermediate allocatable maps and final allocatable mappings). In this case it's necessary as we can't utilise the usual Fortran::lower functionality such as gatherDataOperandAddrAndBounds without causing issues later in the lowering due to extra allocas being spawned which seem to affect the pointer attachment (at least this is my current assumption, it results in memory access errors on the device due to incorrect map information generation). This is similar to why we do not use the MLIR value generated for this and utilise the original symbol provided when mapping descriptor types external to derived types. Hopefully this can be rectified in the future so this function can be simplified and more closely aligned to the other type mappings. We also make use of fir::CoordinateOp as opposed to the HLFIR version as the HLFIR version doesn't support the appropriate lowering to FIR necessary at the moment, we also cannot use a single CoordinateOp (similarly to a single GEP) as when we index through a descriptor operation (BoxType) we encounter issues later in the lowering, however in either case we need access to intermediate descriptors so individual CoordinateOp's aid this (although, being able to compress them into a smaller amount of CoordinateOp's may simplify the IR and perhaps result in a better end product, something to consider for the future).

The other large change area was in the OMPMapInfoFinalization pass, where the pass had to be extended to support the expansion of box types (or multiple nestings of box types) within derived types, or box type derived types. This requires expanding each BoxType mapping from one into two maps and then modifying all of the existing member indices of the overarching parent mapping to account for the addition of these new members alongside adjusting the existing member indices to support the addition of these new maps which extend the original member indices (as a base address of a box type is currently considered a member of the box type at a position of 0 as when lowered to LLVM-IR it's a pointer contained at this position in the descriptor type, however, this means extending mapped children of this expanded descriptor type to additionally incorporate the new member index in the correct location in its own index list). I believe there is a reasonable amount of comments that should aid in understanding this better, alongside the test alterations for the pass.

A subset of the changes were also aimed at making some of the utilities for packing and unpacking the DenseIntElementsAttr containing the member indices shareable across the lowering and OMPMapInfoFinalization, this required moving some functions to the Lower/Support/Utils.h header, and transforming the lowering structure containing the member index data into something more similar to the version used in OMPMapInfoFinalization. There we also some other attempts at tidying things up in relation to the member index data generation in the lowering, some of which required creating a logical operator for the OpenMP ID class so it can be utilised as a map key (it simply utilises the symbol address for the moment as ordering isn't particularly important).

Otherwise I have added a set of new tests encompassing some of the mappings currently supported by this PR (unfortunately as you can have arbitrary nestings of all shapes and types it's not very feasible to cover them all).

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Oct 24, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 24, 2024

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-offload
@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-openmp

Author: None (agozillon)

Changes

This PR is one of 3 in a PR stack, this is the primary change set which seeks to extend the current derived type explicit member mapping support to handle descriptor member mapping at arbitrary levels of nesting. The PR stack seems to do this reasonably (from testing so far) but as you can create quite complex mappings with derived types (in particular when adding allocatable derived types or arrays of allocatable derived types) I imagine there will be hiccups, which I am more than happy to address. There will also be further extensions to this work to handle the implicit auto-magical mapping of descriptor members in derived types and a few other changes planned for the future (with some ideas on optimizing things).

The changes in this PR primarily occur in the OpenMP lowering and the OMPMapInfoFinalization pass.

In the OpenMP lowering several utility functions were added or extended to support the generation of appropriate intermediate member mappings which are currently required when the parent (or multiple parents) of a mapped member are descriptor types. We need to map the entirety of these types or do a "deep copy" for lack of a better term, where we map both the base address and the descriptor as without the copying of both of these we lack the information in the case of the descriptor to access the member or attach the pointers data to the pointer and in the latter case we require the base address to map the chunk of data. Currently we do not segment descriptor based derived types as we do with regular non-descriptor derived types, we effectively map their entirety in all cases at the moment, I hope to address this at some point in the future as it adds a fair bit of a performance penalty to having nestings of allocatable derived types as an example. The process of mapping all intermediate descriptor members in a members path only occurs if a member has an allocatable or object parent in its symbol path or the member itself is a member or allocatable. This occurs in the
createParentSymAndGenIntermediateMaps function, which will also generate the appropriate address for the allocatable member within the derived type to use as a the varPtr field of the map (for intermediate allocatable maps and final allocatable mappings). In this case it's necessary as we can't utilise the usual Fortran::lower functionality such as gatherDataOperandAddrAndBounds without causing issues later in the lowering due to extra allocas being spawned which seem to affect the pointer attachment (at least this is my current assumption, it results in memory access errors on the device due to incorrect map information generation). This is similar to why we do not use the MLIR value generated for this and utilise the original symbol provided when mapping descriptor types external to derived types. Hopefully this can be rectified in the future so this function can be simplified and more closely aligned to the other type mappings. We also make use of fir::CoordinateOp as opposed to the HLFIR version as the HLFIR version doesn't support the appropriate lowering to FIR necessary at the moment, we also cannot use a single CoordinateOp (similarly to a single GEP) as when we index through a descriptor operation (BoxType) we encounter issues later in the lowering, however in either case we need access to intermediate descriptors so individual CoordinateOp's aid this (although, being able to compress them into a smaller amount of CoordinateOp's may simplify the IR and perhaps result in a better end product, something to consider for the future).

The other large change area was in the OMPMapInfoFinalization pass, where the pass had to be extended to support the expansion of box types (or multiple nestings of box types) within derived types, or box type derived types. This requires expanding each BoxType mapping from one into two maps and then modifying all of the existing member indices of the overarching parent mapping to account for the addition of these new members alongside adjusting the existing member indices to support the addition of these new maps which extend the original member indices (as a base address of a box type is currently considered a member of the box type at a position of 0 as when lowered to LLVM-IR it's a pointer contained at this position in the descriptor type, however, this means extending mapped children of this expanded descriptor type to additionally incorporate the new member index in the correct location in its own index list). I believe there is a reasonable amount of comments that should aid in understanding this better, alongside the test alterations for the pass.

A subset of the changes were also aimed at making some of the utilities for packing and unpacking the DenseIntElementsAttr containing the member indices shareable across the lowering and OMPMapInfoFinalization, this required moving some functions to the Lower/Support/Utils.h header, and transforming the lowering structure containing the member index data into something more similar to the version used in OMPMapInfoFinalization. There we also some other attempts at tidying things up in relation to the member index data generation in the lowering, some of which required creating a logical operator for the OpenMP ID class so it can be utilised as a map key (it simply utilises the symbol address for the moment as ordering isn't particularly important).

Otherwise I have added a set of new tests encompassing some of the mappings currently supported by this PR (unfortunately as you can have arbitrary nestings of all shapes and types it's not very feasible to cover them all).


Patch is 245.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/113557.diff

20 Files Affected:

  • (modified) flang/include/flang/Optimizer/Builder/FIRBuilder.h (+5)
  • (modified) flang/lib/Lower/OpenMP/ClauseProcessor.cpp (+38-24)
  • (modified) flang/lib/Lower/OpenMP/ClauseProcessor.h (+1-2)
  • (modified) flang/lib/Lower/OpenMP/Clauses.h (+11)
  • (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+2-2)
  • (modified) flang/lib/Lower/OpenMP/Utils.cpp (+362-132)
  • (modified) flang/lib/Lower/OpenMP/Utils.h (+78-17)
  • (modified) flang/lib/Optimizer/Builder/FIRBuilder.cpp (+20)
  • (modified) flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp (+320-121)
  • (modified) flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp (+1-1)
  • (modified) flang/test/Fir/convert-to-llvm-openmp-and-fir.fir (+168-2)
  • (modified) flang/test/Integration/OpenMP/map-types-and-sizes.f90 (+444-27)
  • (modified) flang/test/Lower/OpenMP/allocatable-array-bounds.f90 (+3-3)
  • (modified) flang/test/Lower/OpenMP/allocatable-map.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/array-bounds.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/declare-target-link-tarop-cap.f90 (+2-2)
  • (added) flang/test/Lower/OpenMP/derived-type-allocatable-map.f90 (+161)
  • (modified) flang/test/Lower/OpenMP/derived-type-map.f90 (+5-5)
  • (modified) flang/test/Lower/OpenMP/target.f90 (+2-2)
  • (modified) flang/test/Transforms/omp-map-info-finalization.fir (+260-55)
diff --git a/flang/include/flang/Optimizer/Builder/FIRBuilder.h b/flang/include/flang/Optimizer/Builder/FIRBuilder.h
index 09f7b892f1ecbe..b772c523caba49 100644
--- a/flang/include/flang/Optimizer/Builder/FIRBuilder.h
+++ b/flang/include/flang/Optimizer/Builder/FIRBuilder.h
@@ -215,6 +215,11 @@ class FirOpBuilder : public mlir::OpBuilder, public mlir::OpBuilder::Listener {
                             llvm::ArrayRef<mlir::Value> lenParams,
                             bool asTarget = false);
 
+  /// Create a two dimensional ArrayAttr containing integer data as
+  /// IntegerAttrs, effectively: ArrayAttr<ArrayAttr<IntegerAttr>>>.
+  mlir::ArrayAttr create2DI64ArrayAttr(
+      llvm::SmallVectorImpl<llvm::SmallVector<int64_t>> &intData);
+
   /// Create a temporary using `fir.alloca`. This function does not hoist.
   /// It is the callers responsibility to set the insertion point if
   /// hoisting is required.
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index fbc031f3a93d7d..305606073c6fe0 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -881,14 +881,15 @@ void ClauseProcessor::processMapObjects(
     lower::StatementContext &stmtCtx, mlir::Location clauseLocation,
     const omp::ObjectList &objects,
     llvm::omp::OpenMPOffloadMappingFlags mapTypeBits,
-    std::map<const semantics::Symbol *,
-             llvm::SmallVector<OmpMapMemberIndicesData>> &parentMemberIndices,
+    std::map<Object, OmpMapParentAndMemberData> &parentMemberIndices,
     llvm::SmallVectorImpl<mlir::Value> &mapVars,
     llvm::SmallVectorImpl<const semantics::Symbol *> &mapSyms) const {
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+
   for (const omp::Object &object : objects) {
     llvm::SmallVector<mlir::Value> bounds;
     std::stringstream asFortran;
+    std::optional<omp::Object> parentObj;
 
     lower::AddrAndBoundsInfo info =
         lower::gatherDataOperandAddrAndBounds<mlir::omp::MapBoundsOp,
@@ -897,24 +898,43 @@ void ClauseProcessor::processMapObjects(
             object.ref(), clauseLocation, asFortran, bounds,
             treatIndexAsSection);
 
+    mlir::Value baseOp = info.rawInput;
+    if (object.sym()->owner().IsDerivedType()) {
+      omp::ObjectList objectList = gatherObjectsOf(object, semaCtx);
+      assert(!objectList.empty() &&
+             "could not find parent objects of derived type member");
+      parentObj = objectList[0];
+      parentMemberIndices.emplace(parentObj.value(),
+                                  OmpMapParentAndMemberData{});
+
+      if (isMemberOrParentAllocatableOrPointer(object, semaCtx)) {
+        llvm::SmallVector<int64_t> indices;
+        generateMemberPlacementIndices(object, indices, semaCtx);
+        baseOp = createParentSymAndGenIntermediateMaps(
+            clauseLocation, converter, semaCtx, stmtCtx, objectList, indices,
+            parentMemberIndices[parentObj.value()], asFortran.str(),
+            mapTypeBits);
+      }
+    }
+
     // Explicit map captures are captured ByRef by default,
     // optimisation passes may alter this to ByCopy or other capture
     // types to optimise
-    mlir::Value baseOp = info.rawInput;
     auto location = mlir::NameLoc::get(
         mlir::StringAttr::get(firOpBuilder.getContext(), asFortran.str()),
         baseOp.getLoc());
     mlir::omp::MapInfoOp mapOp = createMapInfoOp(
         firOpBuilder, location, baseOp,
         /*varPtrPtr=*/mlir::Value{}, asFortran.str(), bounds,
-        /*members=*/{}, /*membersIndex=*/mlir::DenseIntElementsAttr{},
+        /*members=*/{}, /*membersIndex=*/mlir::ArrayAttr{},
         static_cast<
             std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
             mapTypeBits),
         mlir::omp::VariableCaptureKind::ByRef, baseOp.getType());
 
-    if (object.sym()->owner().IsDerivedType()) {
-      addChildIndexAndMapToParent(object, parentMemberIndices, mapOp, semaCtx);
+    if (parentObj.has_value()) {
+      parentMemberIndices[parentObj.value()].addChildIndexAndMapToParent(
+          object, mapOp, semaCtx);
     } else {
       mapVars.push_back(mapOp);
       mapSyms.push_back(object.sym());
@@ -932,9 +952,7 @@ bool ClauseProcessor::processMap(
   llvm::SmallVector<const semantics::Symbol *> localMapSyms;
   llvm::SmallVectorImpl<const semantics::Symbol *> *ptrMapSyms =
       mapSyms ? mapSyms : &localMapSyms;
-  std::map<const semantics::Symbol *,
-           llvm::SmallVector<OmpMapMemberIndicesData>>
-      parentMemberIndices;
+  std::map<Object, OmpMapParentAndMemberData> parentMemberIndices;
 
   auto process = [&](const omp::clause::Map &clause,
                      const parser::CharBlock &source) {
@@ -994,17 +1012,15 @@ bool ClauseProcessor::processMap(
   };
 
   bool clauseFound = findRepeatableClause<omp::clause::Map>(process);
-  insertChildMapInfoIntoParent(converter, parentMemberIndices, result.mapVars,
-                               *ptrMapSyms);
+  insertChildMapInfoIntoParent(converter, semaCtx, stmtCtx, parentMemberIndices,
+                               result.mapVars, *ptrMapSyms);
 
   return clauseFound;
 }
 
 bool ClauseProcessor::processMotionClauses(lower::StatementContext &stmtCtx,
                                            mlir::omp::MapClauseOps &result) {
-  std::map<const semantics::Symbol *,
-           llvm::SmallVector<OmpMapMemberIndicesData>>
-      parentMemberIndices;
+  std::map<Object, OmpMapParentAndMemberData> parentMemberIndices;
   llvm::SmallVector<const semantics::Symbol *> mapSymbols;
 
   auto callbackFn = [&](const auto &clause, const parser::CharBlock &source) {
@@ -1025,8 +1041,9 @@ bool ClauseProcessor::processMotionClauses(lower::StatementContext &stmtCtx,
   clauseFound =
       findRepeatableClause<omp::clause::From>(callbackFn) || clauseFound;
 
-  insertChildMapInfoIntoParent(converter, parentMemberIndices, result.mapVars,
-                               mapSymbols);
+  insertChildMapInfoIntoParent(
+      converter, semaCtx, stmtCtx, parentMemberIndices, result.mapVars, mapSymbols);
+
   return clauseFound;
 }
 
@@ -1088,9 +1105,7 @@ bool ClauseProcessor::processEnter(
 bool ClauseProcessor::processUseDeviceAddr(
     lower::StatementContext &stmtCtx, mlir::omp::UseDeviceAddrClauseOps &result,
     llvm::SmallVectorImpl<const semantics::Symbol *> &useDeviceSyms) const {
-  std::map<const semantics::Symbol *,
-           llvm::SmallVector<OmpMapMemberIndicesData>>
-      parentMemberIndices;
+  std::map<Object, OmpMapParentAndMemberData> parentMemberIndices;
   bool clauseFound = findRepeatableClause<omp::clause::UseDeviceAddr>(
       [&](const omp::clause::UseDeviceAddr &clause,
           const parser::CharBlock &source) {
@@ -1103,7 +1118,7 @@ bool ClauseProcessor::processUseDeviceAddr(
                           useDeviceSyms);
       });
 
-  insertChildMapInfoIntoParent(converter, parentMemberIndices,
+  insertChildMapInfoIntoParent(converter, semaCtx, stmtCtx, parentMemberIndices,
                                result.useDeviceAddrVars, useDeviceSyms);
   return clauseFound;
 }
@@ -1111,9 +1126,8 @@ bool ClauseProcessor::processUseDeviceAddr(
 bool ClauseProcessor::processUseDevicePtr(
     lower::StatementContext &stmtCtx, mlir::omp::UseDevicePtrClauseOps &result,
     llvm::SmallVectorImpl<const semantics::Symbol *> &useDeviceSyms) const {
-  std::map<const semantics::Symbol *,
-           llvm::SmallVector<OmpMapMemberIndicesData>>
-      parentMemberIndices;
+  std::map<Object, OmpMapParentAndMemberData> parentMemberIndices;
+
   bool clauseFound = findRepeatableClause<omp::clause::UseDevicePtr>(
       [&](const omp::clause::UseDevicePtr &clause,
           const parser::CharBlock &source) {
@@ -1126,7 +1140,7 @@ bool ClauseProcessor::processUseDevicePtr(
                           useDeviceSyms);
       });
 
-  insertChildMapInfoIntoParent(converter, parentMemberIndices,
+  insertChildMapInfoIntoParent(converter, semaCtx, stmtCtx, parentMemberIndices,
                                result.useDevicePtrVars, useDeviceSyms);
   return clauseFound;
 }
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index f34121c70d0b44..cf2fc362a52d8c 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -166,8 +166,7 @@ class ClauseProcessor {
       lower::StatementContext &stmtCtx, mlir::Location clauseLocation,
       const omp::ObjectList &objects,
       llvm::omp::OpenMPOffloadMappingFlags mapTypeBits,
-      std::map<const semantics::Symbol *,
-               llvm::SmallVector<OmpMapMemberIndicesData>> &parentMemberIndices,
+      std::map<Object, OmpMapParentAndMemberData> &parentMemberIndices,
       llvm::SmallVectorImpl<mlir::Value> &mapVars,
       llvm::SmallVectorImpl<const semantics::Symbol *> &mapSyms) const;
 
diff --git a/flang/lib/Lower/OpenMP/Clauses.h b/flang/lib/Lower/OpenMP/Clauses.h
index 1e911a20468575..7b7331f43b989e 100644
--- a/flang/lib/Lower/OpenMP/Clauses.h
+++ b/flang/lib/Lower/OpenMP/Clauses.h
@@ -51,6 +51,13 @@ struct IdTyTemplate {
     return designator == other.designator;
   }
 
+  // Defining an "ordering" which allows types derived from this to be
+  // utilised in maps and other containers that require comparison
+  // operators for ordering
+  bool operator<(const IdTyTemplate &other) const {
+    return symbol < other.symbol;
+  }
+
   operator bool() const { return symbol != nullptr; }
 };
 
@@ -72,6 +79,10 @@ struct ObjectT<Fortran::lower::omp::IdTyTemplate<Fortran::lower::omp::ExprTy>,
   Fortran::semantics::Symbol *sym() const { return identity.symbol; }
   const std::optional<ExprTy> &ref() const { return identity.designator; }
 
+  bool operator<(const ObjectT<IdTy, ExprTy> &other) const {
+    return identity < other.identity;
+  }
+
   IdTy identity;
 };
 } // namespace tomp::type
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index fc54da8babe63e..99632750ff0f28 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -999,7 +999,7 @@ static void genBodyOfTargetOp(
             firOpBuilder, copyVal.getLoc(), copyVal,
             /*varPtrPtr=*/mlir::Value{}, name.str(), bounds,
             /*members=*/llvm::SmallVector<mlir::Value>{},
-            /*membersIndex=*/mlir::DenseIntElementsAttr{},
+            /*membersIndex=*/mlir::ArrayAttr{},
             static_cast<
                 std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
                 llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT),
@@ -1781,7 +1781,7 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
       mlir::Value mapOp = createMapInfoOp(
           firOpBuilder, location, baseOp, /*varPtrPtr=*/mlir::Value{},
           name.str(), bounds, /*members=*/{},
-          /*membersIndex=*/mlir::DenseIntElementsAttr{},
+          /*membersIndex=*/mlir::ArrayAttr{},
           static_cast<
               std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
               mapFlag),
diff --git a/flang/lib/Lower/OpenMP/Utils.cpp b/flang/lib/Lower/OpenMP/Utils.cpp
index a09d91540ec222..d96a3c7a3c3714 100644
--- a/flang/lib/Lower/OpenMP/Utils.cpp
+++ b/flang/lib/Lower/OpenMP/Utils.cpp
@@ -13,6 +13,8 @@
 #include "Utils.h"
 
 #include "Clauses.h"
+#include <DirectivesCommon.h>
+
 #include <flang/Lower/AbstractConverter.h>
 #include <flang/Lower/ConvertType.h>
 #include <flang/Lower/PFTBuilder.h>
@@ -21,11 +23,10 @@
 #include <flang/Parser/parse-tree.h>
 #include <flang/Parser/tools.h>
 #include <flang/Semantics/tools.h>
+#include <iterator>
+#include <llvm-14/llvm/ADT/STLExtras.h>
 #include <llvm/Support/CommandLine.h>
 
-#include <algorithm>
-#include <numeric>
-
 llvm::cl::opt<bool> treatIndexAsSection(
     "openmp-treat-index-as-section",
     llvm::cl::desc("In the OpenMP data clauses treat `a(N)` as `a(N:N)`."),
@@ -119,10 +120,10 @@ void gatherFuncAndVarSyms(
 
 mlir::omp::MapInfoOp
 createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
-                mlir::Value baseAddr, mlir::Value varPtrPtr, std::string name,
-                llvm::ArrayRef<mlir::Value> bounds,
-                llvm::ArrayRef<mlir::Value> members,
-                mlir::DenseIntElementsAttr membersIndex, uint64_t mapType,
+                mlir::Value baseAddr, mlir::Value varPtrPtr,
+                llvm::StringRef name, mlir::ArrayRef<mlir::Value> bounds,
+                mlir::ArrayRef<mlir::Value> members,
+                mlir::ArrayAttr membersIndex, uint64_t mapType,
                 mlir::omp::VariableCaptureKind mapCaptureType, mlir::Type retTy,
                 bool partialMap) {
   if (auto boxTy = llvm::dyn_cast<fir::BaseBoxType>(baseAddr.getType())) {
@@ -145,11 +146,294 @@ createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
       builder.getIntegerAttr(builder.getIntegerType(64, false), mapType),
       builder.getAttr<mlir::omp::VariableCaptureKindAttr>(mapCaptureType),
       builder.getStringAttr(name), builder.getBoolAttr(partialMap));
-
   return op;
 }
 
-static int
+// This function gathers the individual omp::Object's that make up an
+// larger omp::Object symbol.
+//
+// For example, provided the larger symbol: "parent%child%member", this
+// function breaks it up into it's constituent components ("parent",
+// "child", "member"), so we can access each individual component and
+// introspect details, important to note this function breaks it up from
+// RHS to LHS ("member" to "parent") and then we reverse it so that the
+// returned omp::ObjectList is LHS to RHS, with the "parent" at the
+// beginning.
+omp::ObjectList gatherObjectsOf(omp::Object derivedTypeMember,
+                                semantics::SemanticsContext &semaCtx) {
+  omp::ObjectList objList;
+  std::optional<omp::Object> baseObj = derivedTypeMember;
+  while (baseObj.has_value()) {
+    objList.push_back(baseObj.value());
+    baseObj = getBaseObject(baseObj.value(), semaCtx);
+  }
+  return omp::ObjectList{llvm::reverse(objList)};
+}
+
+// This function generates a series of indices from a provided omp::Object,
+// that devolves to an ArrayRef symbol, e.g. "array(2,3,4)", this function
+// would generate a series of indices of "[1][2][3]" for the above example,
+// offsetting by -1 to account for the non-zero fortran indexes.
+//
+// These indices can then be provided to a coordinate operation or other
+// GEP-like operation to access the relevant positional member of the
+// array.
+//
+// It is of note that the function only supports subscript integers currently
+// and not Triplets i.e. Array(1:2:3).
+static void generateArrayIndices(lower::AbstractConverter &converter,
+                                 fir::FirOpBuilder &firOpBuilder,
+                                 lower::StatementContext &stmtCtx,
+                                 mlir::Location clauseLocation,
+                                 llvm::SmallVectorImpl<mlir::Value> &indices,
+                                 omp::Object object) {
+  auto maybeRef = evaluate::ExtractDataRef(*object.ref());
+  if (!maybeRef)
+    return;
+
+  auto *arr = std::get_if<evaluate::ArrayRef>(&maybeRef->u);
+  if (!arr)
+    return;
+
+  for (auto v : arr->subscript()) {
+    if (std::holds_alternative<Triplet>(v.u)) {
+      llvm_unreachable("Triplet indexing in map clause is unsupported");
+    } else {
+      auto expr =
+          std::get<Fortran::evaluate::IndirectSubscriptIntegerExpr>(v.u);
+      mlir::Value subscript =
+          fir::getBase(converter.genExprValue(toEvExpr(expr.value()), stmtCtx));
+      mlir::Value one = firOpBuilder.createIntegerConstant(
+          clauseLocation, firOpBuilder.getIndexType(), 1);
+      subscript = firOpBuilder.createConvert(
+          clauseLocation, firOpBuilder.getIndexType(), subscript);
+      indices.push_back(firOpBuilder.create<mlir::arith::SubIOp>(
+          clauseLocation, subscript, one));
+    }
+  }
+}
+
+/// When mapping members of derived types, there is a chance that one of the
+/// members along the way to a mapped member is an descriptor. In which case
+/// we have to make sure we generate a map for those along the way otherwise
+/// we will be missing a chunk of data required to actually map the member
+/// type to device. This function effectively generates these maps and the
+/// appropriate data accesses required to generate these maps. It will avoid
+/// creating duplicate maps, as duplicates are just as bad as unmapped
+/// descriptor data in a lot of cases for the runtime (and unnecessary
+/// data movement should be avoided where possible).
+///
+/// As an example for the following mapping:
+///
+/// type :: vertexes
+///     integer(4), allocatable :: vertexx(:)
+///     integer(4), allocatable :: vertexy(:)
+/// end type vertexes
+///
+/// type :: dtype
+///     real(4) :: i
+///     type(vertexes), allocatable :: vertexes(:)
+/// end type dtype
+///
+/// type(dtype), allocatable :: alloca_dtype
+///
+/// !$omp target map(tofrom: alloca_dtype%vertexes(N1)%vertexx)
+///
+/// The below HLFIR/FIR is generated (trimmed for conciseness):
+///
+/// On the first iteration we index into the record type alloca_dtype
+/// to access "vertexes", we then generate a map for this descriptor
+/// alongside bounds to indicate we only need the 1 member, rather than
+/// the whole array block in this case (In theory we could map its
+/// entirety at the cost of data transfer bandwidth).
+///
+/// %13:2 = hlfir.declare ... "alloca_dtype" ...
+/// %39 = fir.load %13#0 : ...
+/// %40 = fir.coordinate_of %39, %c1 : ...
+/// %51 = omp.map.info var_ptr(%40 : ...) map_clauses(to) capture(ByRef) ...
+/// %52 = fir.load %40 : ...
+///
+/// Second iteration generating access to "vertexes(N1) utilising the N1 index
+/// %53 = load N1 ...
+/// %54 = fir.convert %53 : (i32) -> i64
+/// %55 = fir.convert %54 : (i64) -> index
+/// %56 = arith.subi %55, %c1 : index
+/// %57 = fir.coordinate_of %52, %56 : ...
+///
+/// Still in the second iteration we access the allocatable member "vertexx",
+/// we return %58 from the function and provide it to the final and "main"
+/// map of processMap (generated by the record type segment of the below
+/// function), if this were not the final symbol in the list, i.e. we accessed
+/// a member below vertexx, we would have generated the map below as we did in
+/// the first iteration and then continue to generate further coordinates to
+/// access further components as required.
+///
+/// %58 = fir.coordinate_of %57, %c0 : ...
+/// %61 = omp.map.info var_ptr(%58 : ...) map_clauses(to) capture(ByRef) ...
+///
+/// Parent mapping containing prior generated mapped members, generated at
+/// a later step but here to showcase the "end" result
+///
+/// omp.map.info var_ptr(%13#1 : ...) map_clauses(to) capture(ByRef)
+///   members(%50, %61 : [0, 1, 0], [0, 1, 0] : ...
+///
+/// \param objectList - The list of omp::Object symbol data for each parent
+///  to the mapped member (also includes the mapped member), generated via
+///  gatherObjectsOf.
+/// \param indices - List of index data associated with the mapped member
+///   symbol, which identifies the placement of the member in its parent,
+///   this helps generate the appropriate member accesses. These indices
+///   can be generated via generateMemberPlacementIndices.
+/// \param asFortran - A string generated from the mapped variable to be
+///   associated with the main map, generally (but not restricted to)
+///   generated via gatherDataOperandAddrAndBounds or other
+///   DirectiveCommons.hpp utilities.
+/// \param mapTypeBits - The map flags that will be associated with the
+///   generated maps, minus alterations of the TO and FROM bits for the
+///   intermediate components to prevent accidental overwriting on device
+///   write back.
+mlir::Value createParentSymAndGenIntermediateMaps(
+    mlir::Location clauseLocation, lower::AbstractConverter &converter,
+    semantics::SemanticsContext &semaCtx, lower::StatementContext &stmtCtx,
+    omp::ObjectList &objectList, llvm::SmallVector<int64_t> &indices,
+    OmpMapParentAndMemberData &parentMemberIndices, std::string asFortran,
+    llvm::omp::OpenMPOffloadMappingFlags mapTypeBits) {
+  fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+
+  /// Checks if an omp::Obj...
[truncated]

Copy link

github-actions bot commented Oct 24, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-3 branch from 8f0c362 to 89747db Compare October 24, 2024 12:49
@agozillon
Copy link
Contributor Author

Would love a review on this whenever you all get a bit of spare time! Thank you very much for your help and time it's greatly appreciated :-)

@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-2 branch from 113f7a9 to b27db91 Compare October 30, 2024 17:53
@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-3 branch from 89747db to 28f4bf9 Compare October 30, 2024 17:56
@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-2 branch from b27db91 to 70265b8 Compare October 31, 2024 11:58
@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-3 branch 3 times, most recently from 23663c6 to c7016d2 Compare October 31, 2024 14:17
Copy link
Member

@ergawy ergawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Andrew. Compared the new PR against my comments in the old one and seems ok to me.

However, the PR is huuuuge and I cannot claim I got every little detail of what is going on, therefore, it would be better to wait for other approvals.

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Andrew for all this work. I went through your PR and I have some comments, the vast majority of which are only nits.

Generally, this looks ok to me, though it's a very large and complex patch, so I'm putting a good amount of trust in unit tests and in that you know what you're doing 😅. Not sure if anything can be done here to make it simpler for other people to follow, since you've already introduced a good amount of comments.

@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-2 branch from 70265b8 to d0373c8 Compare November 8, 2024 04:24
@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-3 branch from c7016d2 to 2a5db63 Compare November 8, 2024 04:25
@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-2 branch from d0373c8 to 8dcb02f Compare November 8, 2024 04:33
@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-3 branch from 2a5db63 to 3f95334 Compare November 8, 2024 04:34
Copy link
Contributor Author

@agozillon agozillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR to incorporate the nits where feasible and replied in comment where it wasn't quite possible for various reasons, think there's only two cases of that though! Thank you very much for the reviews everyone, it's greatly appreciated.

P.S. sorry for the noise from the pushes, accidentally rebased everything on main instead of each of the relevant PR stack components when performing a rebase!

llvm::SmallVector<mlir::Value> intermBounds;
if (i + 1 < objectList.size() &&
objectList[i + 1].sym()->IsObjectArray()) {
std::stringstream intermFortran;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was going more for a shorthand of "intermediate" :-) but not fussed if you think interim works as well! And I did misspell the indices, from the rest, so need to tweak it all to be inline anyway!

@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-3 branch from 3f95334 to aa16b36 Compare November 8, 2024 06:48
Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Andrew for all your work on this, LGTM!

@agozillon
Copy link
Contributor Author

Thank you very much @skatrak and @ergawy, I'll land this PR stack on either Friday or the coming Monday, going to give a few days leeway incase anyone else wishes to make any comments!

@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-2 branch from 8dcb02f to 7a6a02b Compare November 16, 2024 11:02
@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-3 branch from aa16b36 to 64946da Compare November 16, 2024 11:03
@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-2 branch from 7a6a02b to c366a1a Compare November 16, 2024 11:26
Base automatically changed from users/agozillo/alloca-member-map-2 to main November 16, 2024 11:26
This PR is one of 3 in a PR stack, this is the primary change set which seeks
to extend the current derived type explicit member mapping support to
handle descriptor member mapping at arbitrary levels of nesting. The PR
stack seems to do this reasonably (from testing so far) but as you can
create quite complex mappings with derived types (in particular when adding
allocatable derived types or arrays of allocatable derived types) I imagine
there will be hiccups, which I am more than happy to address. There will
also be further extensions to this work to handle the implicit auto-magical
mapping of descriptor members in derived types and a few other changes
planned for the future (with some ideas on optimizing things).

The changes in this PR primarily occur in the OpenMP lowering and
the OMPMapInfoFinalization pass.

In the OpenMP lowering several utility functions were added or extended
to support the generation of appropriate intermediate member mappings
which are currently required when the parent (or multiple parents) of a
mapped member are descriptor types. We need to map the entirety of
these types or do a "deep copy" for lack of a better term, where we map
both the base address and the descriptor as without the copying of both
of these we lack the information in the case of the descriptor to access the
member or attach the pointers data to the pointer and in the latter case we
require the base address to map the chunk of data. Currently we do not
segment descriptor based derived types as we do with regular
non-descriptor derived types, we effectively map their entirety in all
cases at the moment, I hope to address this at some point in the future as
it adds a fair bit of a performance penalty to having nestings of allocatable
derived types as an example. The process of mapping all intermediate
descriptor members in a members path only occurs if a member has
an allocatable or object parent in its symbol path or the member itself
is a member or allocatable. This occurs in the
createParentSymAndGenIntermediateMaps function, which will also
generate the appropriate address for the allocatable member
within the derived type to use as a the varPtr field of the map (for
intermediate allocatable maps and final allocatable mappings). In
this case it's necessary as we can't utilise the usual Fortran::lower
functionality such as gatherDataOperandAddrAndBounds without
causing issues later in the lowering due to extra allocas being spawned
which seem to affect the pointer attachment (at least this is my
current assumption, it results in memory access errors on the device
due to incorrect map information generation). This is similar
to why we do not use the MLIR value generated for this and utilise
the original symbol provided when mapping descriptor types external
to derived types. Hopefully this can be rectified in the future so this
function can be simplified and more closely aligned to the other type
mappings. We also make use of fir::CoordinateOp as opposed to the
HLFIR version as the HLFIR version doesn't support the appropriate
lowering to FIR necessary at the moment, we also cannot use a
single CoordinateOp (similarly to a single GEP) as when we index
through a descriptor operation (BoxType) we encounter issues later
in the lowering, however in either case we need access to intermediate
descriptors so individual CoordinateOp's aid this (although, being
able to compress them into a smaller amount of CoordinateOp's may
simplify the IR and perhaps result in a better end product, something
to consider for the future).

The other large change area was in the OMPMapInfoFinalization pass,
where the pass had to be extended to support the expansion of box
types (or multiple nestings of box types) within derived types, or box
type derived types. This requires expanding each BoxType mapping
from one into two maps and then modifying all of the existing
member indices of the overarching parent mapping to account for
the addition of these new members alongside adjusting the existing
member indices to support the addition of these new maps which
extend the original member indices (as a base address of a box type
is currently considered a member of the box type at a position of
0 as when lowered to LLVM-IR it's a pointer contained at this position
in the descriptor type, however, this means extending mapped children
of this expanded descriptor type to additionally incorporate the new
member index in the correct location in its own index list). I believe
there is a reasonable amount of comments that should aid in
understanding this better, alongside the test alterations for the pass.

A subset of the changes were also aimed at making some of the utilities
for packing and unpacking the DenseIntElementsAttr
containing the member indices shareable across the lowering and
OMPMapInfoFinalization, this required moving some functions to the
Lower/Support/Utils.h header, and transforming the lowering structure
containing the member index data into something more similar to the
version used in OMPMapInfoFinalization. There we also some other
attempts at tidying things up in relation to the member index data
generation in the lowering, some of which required creating a logical
operator for the OpenMP ID class so it can be utilised as a map key
(it simply utilises the symbol address for the moment as ordering
isn't particularly important).

Otherwise I have added a set of new tests encompassing some of
the mappings currently supported by this PR (unfortunately as
you can have arbitrary nestings of all shapes and types it's not
very feasible to cover them all).
@agozillon agozillon force-pushed the users/agozillo/alloca-member-map-3 branch from 64946da to f3d5168 Compare November 16, 2024 11:27
@agozillon agozillon merged commit e508bac into main Nov 16, 2024
5 of 8 checks passed
@agozillon agozillon deleted the users/agozillo/alloca-member-map-3 branch November 16, 2024 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants