Skip to content

[Flang][OpenMP][MLIR] Initial array section mapping MLIR -> LLVM-IR lowering utilising omp.bounds #68689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 30, 2023

Conversation

agozillon
Copy link
Contributor

@agozillon agozillon commented Oct 10, 2023

This patch seeks to add initial lowering of OpenMP array sections within target region map clauses from MLIR to LLVM IR.

This patch seeks to support fixed sized contiguous (don't think OpenMP supports anything other than contiguous sections from my reading but i could be wrong) arrays initially, before looking toward assumed size and shaped arrays. The patch also currently does not include stride, it's left for future work.

Although, assumed size works in some fashion (dummy arguments) with some minor alterations to the OMPEarlyOutliner, so it is possible changes made in the IsolatedFromAbove series may allow this to work with no further required patches.

It utilises the generated omp.bounds to calculate the size of the mapped OpenMP array (both for sectioned and un-sectioned arrays) as well as the offset to be passed to the kernel argument structure.

Alongside these changes some refactoring of how map data is handled is attempted, using a new MapData structure to keep track of information utilised in the lowering of mapped values.

The initial addition of a more complex createDeviceArgumentAccessor that utilises capture kinds similarly to (and loosely based on) Clang to generate different kernel argument accesses is also added.

A similar function for altering how the kernel argument is passed to the kernel argument structure on the host is also utilised (createAlteredByCaptureMap), which allows modification of the pointer/basePointer based on their capture (and bounds information). It's of note ByRef, is the default for explicit mappings and ByCopy will be the default for implicit captures, so the former is currently tested in this patch and the latter is not for the moment.

@llvmbot
Copy link
Member

llvmbot commented Oct 10, 2023

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-mlir-openmp

Author: None (agozillon)

Changes

This patch seeks to add initial lowering of OpenMP array sections within target region map clauses from MLIR to LLVM IR.

This patch seeks to support fixed sized arrays initially, before looking toward assumed size and shaped arrays.

Although, assumed size works in some fashion (dummy arguments) with some minor alterations to the OMPEarlyOutliner, so it is possible changes made in the IsolatedFromAbove series may allow this to work with no further required patches.

It utilises the generated omp.bounds to calculate the size of the mapped OpenMP array (both for sectioned and un-sectioned arrays) as well as the offset to be passed to the kernel argument structure.

Alongside these changes some refactoring of how map data is handled is attempted, using a new MapData structure to keep track of information utilised in the lowering of mapped values.

The initial addition of a more complex createDeviceArgumentAccessor that utilises capture kinds similarly to (and loosely based on) Clang to generate different kernel argument accesses is also added.

A similar function for altering how the kernel argument is passed to the kernel argument structure on the host is also utilised (createAlteredByCaptureMap), which allows modification of the pointer/basePointer based on their capture (and bounds information). It's of note ByRef, is the default for explicit mappings and ByCopy will be the default for implicit captures, so the former is currently tested in this patch and the latter is not for the moment.


Patch is 49.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/68689.diff

8 Files Affected:

  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+372-140)
  • (added) mlir/test/Target/LLVMIR/omptarget-array-sectioning-device.mlir (+78)
  • (added) mlir/test/Target/LLVMIR/omptarget-array-sectioning-host.mlir (+80)
  • (modified) mlir/test/Target/LLVMIR/omptarget-llvm.mlir (+43-18)
  • (added) openmp/libomptarget/test/offloading/fortran/basic-target-region-1D-array-section.f90 (+27)
  • (added) openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array-section.f90 (+39)
  • (added) openmp/libomptarget/test/offloading/fortran/basic-target-region-3D-array.f90 (+45)
  • (added) openmp/libomptarget/test/offloading/fortran/basic-target-region-array.f90 (+29)
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 1ec3bb8e7562a9e..e2a408ffd810e2e 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1528,13 +1528,6 @@ convertOmpThreadprivate(Operation &opInst, llvm::IRBuilderBase &builder,
   return success();
 }
 
-int64_t getSizeInBytes(DataLayout &DL, const Type &type) {
-  if (isa<LLVM::LLVMPointerType>(type))
-    return DL.getTypeSize(cast<LLVM::LLVMPointerType>(type).getElementType());
-
-  return 0;
-}
-
 static llvm::OffloadEntriesInfoManager::OMPTargetDeviceClauseKind
 convertToDeviceClauseKind(mlir::omp::DeclareTargetDeviceType deviceClause) {
   switch (deviceClause) {
@@ -1629,13 +1622,153 @@ getRefPtrIfDeclareTarget(mlir::Value value,
   return nullptr;
 }
 
+// A small helper structure to contain data gathered
+// for map lowering and coalese it into one area and
+// avoiding extra computations such as searches in the
+// llvm module for lowered mapped varibles or checking
+// if something is declare target (and retrieving the
+// value).
+struct MapData {
+  bool isDeclareTarget = false;
+  mlir::Operation *mapClause;
+  llvm::Value *basePointer;
+  llvm::Value *pointer;
+  llvm::Value *kernelValue;
+  llvm::Type *underlyingType;
+  llvm::Value *sizeInBytes;
+};
+
+uint64_t getArrayElementSizeInBits(LLVM::LLVMArrayType arrTy, DataLayout &dl) {
+  if (auto nestedArrTy = llvm::dyn_cast_if_present<LLVM::LLVMArrayType>(
+          arrTy.getElementType()))
+    return getArrayElementSizeInBits(nestedArrTy, dl);
+  return dl.getTypeSizeInBits(arrTy.getElementType());
+}
+
+// This function calculates the size to be offloaded for a specified type, given
+// its associated map clause (which can contain bounds information which affects
+// the total size), this size is calculated based on the underlying element type
+// e.g. given a 1-D array of ints, we will calculate the size from the integer
+// type * number of elements in the array. This size can be used in other
+// calculations but is ultimately used as an argument to the OpenMP runtimes
+// kernel argument structure which is generated through the combinedInfo data
+// structures.
+// This function is somewhat equivalent to Clang's getExprTypeSize inside of
+// CGOpenMPRuntime.cpp.
+llvm::Value *getSizeInBytes(DataLayout &dl, const mlir::Type &type,
+                            Operation *clauseOp, llvm::IRBuilderBase &builder,
+                            LLVM::ModuleTranslation &moduleTranslation) {
+  // utilising getTypeSizeInBits instead of getTypeSize as getTypeSize gives
+  // the size in inconsistent byte or bit format.
+  uint64_t underlyingTypeSzInBits = dl.getTypeSizeInBits(type);
+  if (auto ptrTy = llvm::dyn_cast_if_present<LLVM::LLVMPointerType>(type)) {
+    underlyingTypeSzInBits = dl.getTypeSizeInBits(ptrTy.getElementType());
+    if (auto arrTy = llvm::dyn_cast_if_present<LLVM::LLVMArrayType>(
+            ptrTy.getElementType())) {
+      underlyingTypeSzInBits = getArrayElementSizeInBits(arrTy, dl);
+    }
+  }
+
+  if (auto memberClause =
+          mlir::dyn_cast_if_present<mlir::omp::MapInfoOp>(clauseOp)) {
+    // This calculates the size to transfer based on bounds and the underlying
+    // element type, provided bounds have been specified (Fortran
+    // pointers/allocatables/target and arrays that have sections specified fall
+    // into this as well).
+    if (!memberClause.getBounds().empty()) {
+      llvm::Value *elementCount = nullptr;
+      for (auto bounds : memberClause.getBounds()) {
+        if (auto boundOp = mlir::dyn_cast_if_present<mlir::omp::DataBoundsOp>(
+                bounds.getDefiningOp())) {
+
+          if (!elementCount) {
+            elementCount = builder.CreateAdd(
+                builder.CreateSub(
+                    moduleTranslation.lookupValue(boundOp.getUpperBound()),
+                    moduleTranslation.lookupValue(boundOp.getLowerBound())),
+                builder.getInt64(1));
+          } else {
+            elementCount = builder.CreateMul(
+                elementCount,
+                builder.CreateAdd(
+                    builder.CreateSub(
+                        moduleTranslation.lookupValue(boundOp.getUpperBound()),
+                        moduleTranslation.lookupValue(boundOp.getLowerBound())),
+                    builder.getInt64(1)));
+          }
+        }
+      }
+
+      // The size in bytes x number of elements, the sizeInBytes stored is
+      // the underyling types size, e.g. if ptr<i32>, it'll be the i32's
+      // size, so we do some on the fly runtime math to get the size in
+      // bytes from the extent (ub - lb) * sizeInBytes. NOTE: This may need
+      // some adjustment for members with more complex types.
+      return builder.CreateMul(elementCount,
+                               builder.getInt64(underlyingTypeSzInBits / 8));
+    }
+  }
+
+  return builder.getInt64(underlyingTypeSzInBits / 8);
+}
+
+// Get the underlying LLVM type, this will bypass the pointer and
+// access the underlying type. Which bypasses llvm's opaque pointers
+// to get the underlying type via MLIR.
+llvm::Type *getLLVMIRType(mlir::Type inputType,
+                          LLVM::ModuleTranslation &moduleTranslation) {
+  llvm::Type *type = moduleTranslation.convertType(inputType);
+  if (auto pointerType =
+          llvm::dyn_cast<mlir::omp::PointerLikeType>(inputType)) {
+    if (auto eleType = pointerType.getElementType()) {
+      type = moduleTranslation.convertType(eleType);
+    }
+  }
+  return type;
+}
+
+void collectMapDataFromMapOperands(llvm::SmallVectorImpl<MapData> &mapData,
+                                   llvm::SmallVectorImpl<Value> &mapOperands,
+                                   LLVM::ModuleTranslation &moduleTranslation,
+                                   DataLayout &dl,
+                                   llvm::IRBuilderBase &builder) {
+  auto getMapData = [&](Value val, Operation *clauseOp) -> MapData {
+    MapData map;
+    if (llvm::Value *refPtr = getRefPtrIfDeclareTarget(
+            val, moduleTranslation)) { // declare target
+      map.kernelValue = moduleTranslation.lookupValue(val);
+      map.isDeclareTarget = true;
+      map.basePointer = refPtr;
+      map.pointer = map.kernelValue;
+    } else { // regular mapped variable
+      map.kernelValue = moduleTranslation.lookupValue(val);
+      map.isDeclareTarget = false;
+      map.basePointer = map.kernelValue;
+      map.pointer = map.kernelValue;
+    }
+
+    map.sizeInBytes =
+        getSizeInBytes(dl, val.getType(), clauseOp, builder, moduleTranslation);
+    map.underlyingType = getLLVMIRType(val.getType(), moduleTranslation);
+    map.mapClause = clauseOp;
+
+    return map;
+  };
+
+  for (mlir::Value mapValue : mapOperands) {
+    if (auto mapOp = mlir::dyn_cast_if_present<mlir::omp::MapInfoOp>(
+            mapValue.getDefiningOp())) {
+      mapData.push_back(getMapData(mapOp.getVarPtr(), mapOp));
+    }
+  }
+}
+
 // Generate all map related information and fill the combinedInfo.
 static void genMapInfos(llvm::IRBuilderBase &builder,
                         LLVM::ModuleTranslation &moduleTranslation,
-                        DataLayout &DL,
+                        DataLayout &dl,
                         llvm::OpenMPIRBuilder::MapInfosTy &combinedInfo,
-                        const SmallVector<Value> &mapOperands,
-                        const ArrayAttr &mapTypes,
+                        llvm::SmallVectorImpl<MapData> &mapData,
                         const SmallVector<Value> &devPtrOperands = {},
                         const SmallVector<Value> &devAddrOperands = {},
                         bool isTargetParams = false) {
@@ -1650,58 +1783,39 @@ static void genMapInfos(llvm::IRBuilderBase &builder,
     combinedInfo.Names.clear();
   };
 
-  auto findMapInfo = [&combinedInfo](llvm::Value *val, unsigned &index) {
-    index = 0;
-    for (auto basePtr : combinedInfo.BasePointers) {
-      if (basePtr == val)
-        return true;
-      index++;
-    }
-    return false;
-  };
-
-  unsigned index = 0;
-  for (const auto &mapOp : mapOperands) {
-    // Unlike dev_ptr and dev_addr operands these map operands point
-    // to a map entry operation which contains further information
-    // on the variable being mapped and how it should be mapped.
-    auto mapInfoOp =
-        mlir::dyn_cast<mlir::omp::MapInfoOp>(mapOp.getDefiningOp());
-
-    // TODO: Only LLVMPointerTypes are handled.
-    if (!mapInfoOp.getType().isa<LLVM::LLVMPointerType>())
-      return fail();
-
-    llvm::Value *mapOpValue =
-        moduleTranslation.lookupValue(mapInfoOp.getVarPtr());
-
-    llvm::Value *refPtr =
-        getRefPtrIfDeclareTarget(mapInfoOp.getVarPtr(), moduleTranslation);
-
-    combinedInfo.BasePointers.emplace_back(refPtr ? refPtr : mapOpValue);
-    combinedInfo.Pointers.emplace_back(mapOpValue);
-    combinedInfo.DevicePointers.emplace_back(
-        llvm::OpenMPIRBuilder::DeviceInfoTy::None);
-    combinedInfo.Names.emplace_back(LLVM::createMappingInformation(
-        mapInfoOp.getVarPtr().getLoc(), *ompBuilder));
-
-    auto mapFlag = llvm::omp::OpenMPOffloadMappingFlags(
-        mapTypes[index].cast<IntegerAttr>().getUInt());
+  for (MapData &mapValue : mapData) {
+    auto mapOp = mlir::dyn_cast<mlir::omp::MapInfoOp>(mapValue.mapClause);
+    auto mapFlag =
+        llvm::omp::OpenMPOffloadMappingFlags(mapOp.getMapType().value());
 
     // Declare Target Mappings are excluded from being marked as
     // OMP_MAP_TARGET_PARAM as they are not passed as parameters, they're marked
     // with OMP_MAP_PTR_AND_OBJ instead.
-    if (refPtr)
+    if (mapValue.isDeclareTarget)
       mapFlag |= llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_PTR_AND_OBJ;
     else if (isTargetParams)
       mapFlag |= llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TARGET_PARAM;
 
+    combinedInfo.BasePointers.emplace_back(mapValue.basePointer);
+    combinedInfo.Pointers.emplace_back(mapValue.pointer);
+    combinedInfo.DevicePointers.emplace_back(
+        llvm::OpenMPIRBuilder::DeviceInfoTy::None);
+    combinedInfo.Names.emplace_back(
+        LLVM::createMappingInformation(mapOp.getLoc(), *ompBuilder));
     combinedInfo.Types.emplace_back(mapFlag);
-    combinedInfo.Sizes.emplace_back(
-        builder.getInt64(getSizeInBytes(DL, mapInfoOp.getVarPtr().getType())));
-    index++;
+    combinedInfo.Sizes.emplace_back(mapValue.sizeInBytes);
   }
 
+  auto findMapInfo = [&combinedInfo](llvm::Value *val, unsigned &index) {
+    index = 0;
+    for (llvm::Value *basePtr : combinedInfo.BasePointers) {
+      if (basePtr == val)
+        return true;
+      index++;
+    }
+    return false;
+  };
+
   auto addDevInfos = [&, fail](auto devOperands, auto devOpType) -> void {
     for (const auto &devOp : devOperands) {
       // TODO: Only LLVMPointerTypes are handled.
@@ -1747,19 +1861,6 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder,
 
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
 
-  auto getMapTypes = [](mlir::OperandRange mapOperands,
-                        mlir::MLIRContext *ctx) {
-    SmallVector<mlir::Attribute> mapTypes;
-    for (auto mapValue : mapOperands) {
-      if (mapValue.getDefiningOp()) {
-        auto mapOp =
-            mlir::dyn_cast<mlir::omp::MapInfoOp>(mapValue.getDefiningOp());
-        mapTypes.push_back(mapOp.getMapTypeAttr());
-      }
-    }
-    return mlir::ArrayAttr::get(ctx, mapTypes);
-  };
-
   LogicalResult result =
       llvm::TypeSwitch<Operation *, LogicalResult>(op)
           .Case([&](omp::DataOp dataOp) {
@@ -1773,8 +1874,6 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder,
                   deviceID = intAttr.getInt();
 
             mapOperands = dataOp.getMapOperands();
-            mapTypes =
-                getMapTypes(dataOp.getMapOperands(), dataOp->getContext());
             useDevPtrOperands = dataOp.getUseDevicePtr();
             useDevAddrOperands = dataOp.getUseDeviceAddr();
             return success();
@@ -1793,8 +1892,6 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder,
                   deviceID = intAttr.getInt();
             RTLFn = llvm::omp::OMPRTL___tgt_target_data_begin_mapper;
             mapOperands = enterDataOp.getMapOperands();
-            mapTypes = getMapTypes(enterDataOp.getMapOperands(),
-                                   enterDataOp->getContext());
             return success();
           })
           .Case([&](omp::ExitDataOp exitDataOp) {
@@ -1812,8 +1909,6 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder,
 
             RTLFn = llvm::omp::OMPRTL___tgt_target_data_end_mapper;
             mapOperands = exitDataOp.getMapOperands();
-            mapTypes = getMapTypes(exitDataOp.getMapOperands(),
-                                   exitDataOp->getContext());
             return success();
           })
           .Default([&](Operation *op) {
@@ -1826,17 +1921,20 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder,
 
   using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;
 
+  llvm::SmallVector<MapData> mapData;
+  collectMapDataFromMapOperands(mapData, mapOperands, moduleTranslation, DL,
+                                builder);
+
   // Fill up the arrays with all the mapped variables.
   llvm::OpenMPIRBuilder::MapInfosTy combinedInfo;
   auto genMapInfoCB =
       [&](InsertPointTy codeGenIP) -> llvm::OpenMPIRBuilder::MapInfosTy & {
     builder.restoreIP(codeGenIP);
-    if (auto DataOp = dyn_cast<omp::DataOp>(op)) {
-      genMapInfos(builder, moduleTranslation, DL, combinedInfo, mapOperands,
-                  mapTypes, useDevPtrOperands, useDevAddrOperands);
+    if (auto dataOp = dyn_cast<omp::DataOp>(op)) {
+      genMapInfos(builder, moduleTranslation, DL, combinedInfo, mapData,
+                  useDevPtrOperands, useDevAddrOperands);
     } else {
-      genMapInfos(builder, moduleTranslation, DL, combinedInfo, mapOperands,
-                  mapTypes);
+      genMapInfos(builder, moduleTranslation, DL, combinedInfo, mapData);
     }
     return combinedInfo;
   };
@@ -1988,38 +2086,53 @@ static bool targetOpSupported(Operation &opInst) {
 }
 
 static void
-handleDeclareTargetMapVar(llvm::ArrayRef<Value> mapOperands,
+handleDeclareTargetMapVar(llvm::ArrayRef<MapData> mapData,
                           LLVM::ModuleTranslation &moduleTranslation,
                           llvm::IRBuilderBase &builder) {
-  for (const mlir::Value &mapOp : mapOperands) {
-    auto mapInfoOp =
-        mlir::dyn_cast<mlir::omp::MapInfoOp>(mapOp.getDefiningOp());
-    llvm::Value *mapOpValue =
-        moduleTranslation.lookupValue(mapInfoOp.getVarPtr());
-    if (auto *declareTarget = getRefPtrIfDeclareTarget(mapInfoOp.getVarPtr(),
-                                                       moduleTranslation)) {
-      // The user's iterator will get invalidated if we modify an element,
+  for (const auto &mapValue : mapData) {
+    // In the case of declare target mapped variables, the basePointer is
+    // the reference pointer generated by the convertDeclareTargetAttr
+    // method. Whereas the kernelValue is the original variable, so for
+    // the device we must replace all uses of this original global variable
+    // (stored in kernelValue) with the reference pointer (stored in
+    // basePointer for declare target mapped variables), as for device the
+    // data is mapped into this reference pointer and should be loaded
+    // from it, the original variable is discarded. On host both exist and
+    // metadata is generated (elsewhere in the convertDeclareTargetAttr)
+    // function to link the two variables in the runtime and then both the
+    // reference pointer and the pointer are assigned in the kernel argument
+    // structure for the host.
+    if (mapValue.isDeclareTarget) {
+      // The users iterator will get invalidated if we modify an element,
       // so we populate this vector of uses to alter each user on an individual
       // basis to emit its own load (rather than one load for all).
       llvm::SmallVector<llvm::User *> userVec;
-      for (llvm::User *user : mapOpValue->users())
+      for (llvm::User *user : mapValue.kernelValue->users())
         userVec.push_back(user);
 
-      for (llvm::User *user : userVec) {
+      for (auto *user : userVec) {
         if (auto *insn = dyn_cast<llvm::Instruction>(user)) {
-          auto *load = builder.CreateLoad(
-              moduleTranslation.convertType(mapInfoOp.getVarPtr().getType()),
-              declareTarget);
+          auto *load = builder.CreateLoad(mapValue.basePointer->getType(),
+                                          mapValue.basePointer);
           load->moveBefore(insn);
-          user->replaceUsesOfWith(mapOpValue, load);
+          user->replaceUsesOfWith(mapValue.kernelValue, load);
         }
       }
     }
   }
 }
 
+// This currently implements a very light version of Clang's
+// EmitParmDecl's handling of direct argument handling as well
+// as a portion of the argument access generation based on
+// capture types found at the end of emitOutlinedFunctionPrologue
+// in Clang. The indirect path handling of EmitParmDecl's may be
+// required for future work, but a direct 1-to-1 copy doesn't seem
+// possible as the logic is rather scattered throughout Clang's
+// lowering and perhaps we wish to deviate slightly.
 static llvm::IRBuilderBase::InsertPoint
-createDeviceArgumentAccessor(llvm::Argument &arg, llvm::Value *input,
+createDeviceArgumentAccessor(llvm::SmallVectorImpl<MapData> &mapData,
+                             llvm::Argument &arg, llvm::Value *input,
                              llvm::Value *&retVal, llvm::IRBuilderBase &builder,
                              llvm::OpenMPIRBuilder &ompBuilder,
                              LLVM::ModuleTranslation &moduleTranslation,
@@ -2027,22 +2140,152 @@ createDeviceArgumentAccessor(llvm::Argument &arg, llvm::Value *input,
                              llvm::IRBuilderBase::InsertPoint codeGenIP) {
   builder.restoreIP(allocaIP);
 
-  llvm::Value *addr =
+  mlir::omp::VariableCaptureKind capture =
+      mlir::omp::VariableCaptureKind::ByRef;
+  llvm::Type *inputType = input->getType();
+
+  // Find the associated MapData entry for the current input
+  for (const MapData &map : mapData) {
+    if (map.kernelValue == input) {
+      if (auto mapOp =
+              mlir::dyn_cast_if_present<mlir::omp::MapInfoOp>(map.mapClause)) {
+        capture = mapOp.getMapCaptureType().value_or(
+            mlir::omp::VariableCaptureKind::ByRef);
+      }
+
+      inputType = map.underlyingType;
+      break;
+    }
+  }
+
+  unsigned int allocaAS = ompBuilder.M.getDataLayout().getAllocaAddrSpace();
+  unsigned int defaultAS =
+      ompBuilder.M.getDataLayout().getProgramAddressSpace();
+
+  // Create the alloca for the argument the current point.
+  llvm::Value *v =
       builder.CreateAlloca(arg.getType()->isPointerTy()
                                ? arg.getType()
                                : llvm::Type::getInt64Ty(builder.getContext()),
-                           ompBuilder.M.getDataLayout().getAllocaAddrSpace());
-  llvm::Value *addrAscast =
-      builder.CreatePointerBitCastOrAddrSpaceCast(addr, input->getType());
-  builder.CreateStore(&arg, addrAscast);
+                           allocaAS);
+
+  if (allocaAS != defaultAS) {
+    v = builder.CreatePointerBitCastOrAddrSpaceCast(
+        v, arg.getType()->getPointerTo(defaultAS));
+  }
+
+  builder.CreateStore(&arg, v);
 
   builder.restoreIP(codeGenIP);
 
-  retVal = builder.CreateLoad(arg.getType(), addrAscast);
+  switch (capture) {
+  case mlir::omp::VariableCaptureKind::ByCopy: {
+    if (inputType->isPointerTy()) {
+      retVal = v;
+      return builder.saveIP();
+    }
+
+    // Ignore conversions like int -> uint.
+    if (v->getType() == inputType->getPointerTo()) {
+      retVal = v;
+      return builder.saveIP();
+    }
+
+    assert(false && "Currently unsupported OMPTargetVarCaptureByCopy Type");
+    break;
+  }
+  case mlir::omp::VariableCaptureKind::ByRef: {
+    retVal = builder.CreateAlignedLoad(
+        v->getType(), v,
+        ompBuilder.M.getDataLayout().get...
[truncated]

@agozillon
Copy link
Contributor Author

A large portion of this patch is unfortunately the addition of the MapData structure and the addition of utilising byref/bycopy but they're rather linked together. If there's a wish to split the patch please do suggest how you would like the split (e.g. what portions into seperate patches) done,

@agozillon
Copy link
Contributor Author

Small ping for some reviewer attention on this if at all possible please.

@kiranchandramohan
Copy link
Contributor

@TIFitis could you review? As this is changing code that you initially wrote and subsequently modified by @agozillon.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alongside these changes some refactoring of how map data is handled is attempted, using a new MapData structure to keep track of information utilised in the lowering of mapped values.

The initial addition of a more complex createDeviceArgumentAccessor that utilises capture kinds similarly to (and loosely based on) Clang to generate different kernel argument accesses is also added.

A similar function for altering how the kernel argument is passed to the kernel argument structure on the host is also utilised (createAlteredByCaptureMap), which allows modification of the pointer/basePointer based on their capture (and bounds information). It's of note ByRef, is the default for explicit mappings and ByCopy will be the default for implicit captures, so the former is currently tested in this patch and the latter is not for the moment.

Would it be possible to push all the refactorings directly as an NFC patch? And then have the changes necessary for array-sections only in this patch?

// structures.
// This function is somewhat equivalent to Clang's getExprTypeSize inside of
// CGOpenMPRuntime.cpp.
llvm::Value *getSizeInBytes(DataLayout &dl, const mlir::Type &type,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the following example:

subroutine omp_target(a, b, c)
   integer, intent(in) :: a, b, c
   integer :: x(a, b, c)
   !$omp target map(tofrom : x)
   !$omp end target
end subroutine omp_target

Here's a slice of the llvm IR generated:

  %.offload_sizes = alloca [1 x i64], align 8
  %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8
  %4 = load i32, ptr %0, align 4
  %5 = sext i32 %4 to i64
  %6 = icmp sgt i64 %5, 0
  %7 = select i1 %6, i64 %5, i64 0
  %8 = load i32, ptr %1, align 4
  %9 = sext i32 %8 to i64
  %10 = icmp sgt i64 %9, 0
  %11 = select i1 %10, i64 %9, i64 0
  %12 = load i32, ptr %2, align 4
  %13 = sext i32 %12 to i64
  %14 = icmp sgt i64 %13, 0
  %15 = select i1 %14, i64 %13, i64 0
  %16 = mul i64 1, %7
  %17 = mul i64 %16, %11
  %18 = mul i64 %17, %15
  %19 = alloca i32, i64 %18, align 4
  %20 = sub i64 %7, 1
  %21 = sub i64 %11, 1
  %22 = sub i64 %15, 1
  br label %entry

entry:                                            ; preds = %3
  %23 = sub i64 %20, 0
  %24 = add i64 %23, 1
  %25 = sub i64 %21, 0
  %26 = add i64 %25, 1
  %27 = mul i64 %24, %26
  %28 = sub i64 %22, 0
  %29 = add i64 %28, 1
  %30 = mul i64 %27, %29
  %31 = mul i64 %30, 4
  %34 = getelementptr inbounds [1 x i64], ptr %.offload_sizes, i32 0, i32 0
  store i64 %31, ptr %34, align 8

The above code basically tries to recompute %30 which is the same as %18 already present in the alloca instruction.

My view is that we don't need and should neither use nor generate a boundsOp unless explicit bounds have been provided by the user.

Others however, have already expressed that we would like to have a boundsOp present at all times and use it whenever possible as it favours a single solution for all cases. I am not strictly against this, but I prefer the former way of doing things.

And from what I can tell Clang also reuses %18 for the offload_size here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to always utilise the bounds as it makes calculating these trivial for all scenarios (fixed/ assumed shape / assumed size). Although, it is a shame we are re-computing the value in this case. I am fine with either method we choose though, so I'll leave it up to reviewers. There's a fairly length discussion relating to this topic here: #67164

Please do let me know which direction the reviewers wish to go so the patch can progress!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we currently have a mechanism to get all the information about arrays represented as descriptors in the LLVM Dialect. Without such a mechanism, we cannot get all the information to do the mapping. omp.bounds provides all the information currently, so I think we should stick with that for uniformity.

Copy link
Contributor Author

@agozillon agozillon Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer the uniformity and ease of use, but perhaps I'm to close to the source on this one. I do understand the dislike for the excess generated instructions though, and the possible performance impact if subsequent optimisation passes can't tidy it up.

However, it's not the only thing I've encountered recently during the lowering that will need some polishing off when we can. Allocatables for some odd reason spawn multiple allocas of the descriptor + data structure that get the same value assigned to them, and then only one of them is actually used. Unsure why (perhaps someone else does and it has a good reason or maybe it's an OpenMP lowering oddity), but it is something that might be worth a look into in the future.

// llvm module for lowered mapped varibles or checking
// if something is declare target (and retrieving the
// value).
struct MapData {
Copy link
Member

@TIFitis TIFitis Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the data in this struct seems to overlap with llvm::OpenMPIRBuilder::MapInfosTy.

Instead of creating a new structTy, I'd rather inherit the existing MapInfosTy and add to it.

Similar to struct MapCombinedInfoTy : llvm::OpenMPIRBuilder::MapInfosTy in clang/lib/CodeGen/CGOpenMPRuntime.cpp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good idea, I can certainly try to do so!

@agozillon
Copy link
Contributor Author

Alongside these changes some refactoring of how map data is handled is attempted, using a new MapData structure to keep track of information utilised in the lowering of mapped values.
The initial addition of a more complex createDeviceArgumentAccessor that utilises capture kinds similarly to (and loosely based on) Clang to generate different kernel argument accesses is also added.
A similar function for altering how the kernel argument is passed to the kernel argument structure on the host is also utilised (createAlteredByCaptureMap), which allows modification of the pointer/basePointer based on their capture (and bounds information). It's of note ByRef, is the default for explicit mappings and ByCopy will be the default for implicit captures, so the former is currently tested in this patch and the latter is not for the moment.

Would it be possible to push all the refactorings directly as an NFC patch? And then have the changes necessary for array-sections only in this patch?

I can try to do so.

@agozillon
Copy link
Contributor Author

Updated the PR to tidy up the test a little using the recommended optimisation passes and utilising the MapInfoTy from the OMPIRBuilder, thank you both for your suggestions.

I will try to split the PR up on Monday or Tuesday next week, but feel free to further review this PR until they become available.

The plan for splitting up the PR will likely be into 3 patches:

  1. NFC changes that are refactoring for the follow up additions
  2. The addition of ByRef/ByCopy handling for values on the device/host border (marshalling data on host and accessing data
    on the device) which begins to replicate Clang a bit more closely and allows a wider range of map behaviour
  3. The array sectioning specific lowering changes

There will likely be a dependency for these patches based on ordering e.g. Patch 3 needs 2+1, 2 needs 1 and 1 will ideally be self contained. I am not sure based on the wording of the original request for splinting up comment @kiranchandramohan if you'd prefer if I just pushed the NFC changes directly without review (patch 1 in this case) or if you'd prefer it to go through review still?

@kiranchandramohan
Copy link
Contributor

@agozillon NFC changes can go directly in without a review.

If @TIFitis is OK with the current changes in this patch and can approve then this can go in without breaking up into patches.

@agozillon
Copy link
Contributor Author

most recent update is a rebase and attempt at addressing the review comments from @TIFitis via the implementation of an overloaded append and a more appropriate name for MapData (now MapInfoData)

Copy link
Member

@TIFitis TIFitis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍🏽

@agozillon
Copy link
Contributor Author

LGTM 👍🏽

Thank you very much for the review @TIFitis, greatly appreciated!

@kiranchandramohan I'd love a final acceptance from you if at all possible.

I will land the patch tomorrow afternoon after another rebase, if no one has any further comments they wish to add.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need more time to go through this. Have left a few comments.

if (auto arrTy = llvm::dyn_cast_if_present<LLVM::LLVMArrayType>(
ptrTy.getElementType())) {
underlyingTypeSzInBits = getArrayElementSizeInBits(arrTy, dl);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should else StructType be a TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it's actually covered by the default case, it will calculate the full structure size in bits. We primarily just need the if to get the element type size to calculate the array size when using bounds.

If we ever wish to drop the default bounds generation for arrays, we'd have to alter this segment slightly to get full array size calculation, but that wouldn't be too difficult.

for (auto bounds : memberClause.getBounds()) {
if (auto boundOp = mlir::dyn_cast_if_present<mlir::omp::DataBoundsOp>(
bounds.getDefiningOp())) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A short comment regarding this computation something like (elemCount*[UB-LB]+1)?

Comment on lines 1761 to 1746
if (llvm::Value *refPtr = getRefPtrIfDeclareTarget(
mapOp.getVarPtr(), moduleTranslation)) { // declare target
mapData.OriginalValue.push_back(
moduleTranslation.lookupValue(mapOp.getVarPtr()));
mapData.IsDeclareTarget.push_back(true);
mapData.BasePointers.push_back(refPtr);
mapData.Pointers.push_back(mapData.OriginalValue.back());
} else { // regular mapped variable
mapData.OriginalValue.push_back(
moduleTranslation.lookupValue(mapOp.getVarPtr()));
mapData.IsDeclareTarget.push_back(false);
mapData.BasePointers.push_back(mapData.OriginalValue.back());
mapData.Pointers.push_back(mapData.OriginalValue.back());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hoist out the common values.

userVec.push_back(user);

for (llvm::User *user : userVec) {
for (auto *user : userVec) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Was there a need for the type change here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think this might have just slipped in during some change set, thank you.

@@ -38,7 +38,7 @@ module attributes {omp.is_target_device = false} {
// CHECK: store ptr %[[ADDR_B]], ptr %[[GEP2]], align 8
// CHECK: %[[GEP3:.*]] = getelementptr { ptr, ptr, ptr }, ptr %[[STRUCTARG]], i32 0, i32 2
// CHECK: store ptr %[[ADDR_C]], ptr %[[GEP3]], align 8
// CHECK: call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @1, i32 1, ptr @__omp_offloading_[[DEV]]_[[FIL]]_omp_target_region__l[[LINE]]..omp_par, ptr %[[STRUCTARG]])
// CHECK: call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @4, i32 1, ptr @__omp_offloading_[[DEV]]_[[FIL]]_omp_target_region__l[[LINE]]..omp_par, ptr %[[STRUCTARG]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can @4 be captured and checked? Or alternatively you could use {{.*}} if it is not significant.

Comment on lines 35 to 49
// CHECK: @.offload_sizes = private unnamed_addr constant [2 x i64] [i64 36, i64 108]
// CHECK: @.offload_maptypes = private unnamed_addr constant [2 x i64] [i64 35, i64 35]
// CHECKL: @.offload_mapnames = private constant [2 x ptr] [ptr @0, ptr @1]

// CHECK: define void @_3d_target_array_section()

// CHECK: %1 = getelementptr inbounds [2 x ptr], ptr %.offload_baseptrs, i32 0, i32 0
// CHECK: store ptr @_QFEinarray, ptr %1, align 8
// CHECK: %2 = getelementptr inbounds [2 x ptr], ptr %.offload_ptrs, i32 0, i32 0
// CHECK: store ptr getelementptr inbounds ([3 x [3 x [3 x i32]]], ptr @_QFEinarray, i64 0, i64 1, i64 0, i64 0), ptr %2, align 8

// CHECK: %4 = getelementptr inbounds [2 x ptr], ptr %.offload_baseptrs, i32 0, i32 1
// CHECK: store ptr @_QFEoutarray, ptr %4, align 8
// CHECK: %5 = getelementptr inbounds [2 x ptr], ptr %.offload_ptrs, i32 0, i32 1
// CHECK: store ptr @_QFEoutarray, ptr %5, align 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capture values as variables that need to be checked. Checking value numbers in particular can break.

@agozillon
Copy link
Contributor Author

rebased and attempted to resolve the latest reviewer comments

@agozillon
Copy link
Contributor Author

I will need more time to go through this. Have left a few comments.

No problem, take your time.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments.

@agozillon
Copy link
Contributor Author

Rebased and tried to address @kiranchandramohan's comments added two new mlir tests for bycopy vs byref lowering for host and device. And expanded on the comment for createDeviceArgumentAccessor.

I haven't changed the bounds sizing for the 3D Array sectioning test as I wasn't too sure if you were just asking for clarity on it or not! I am happy to change it though if desired.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine except for the getElementType.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agozillon for the changes. I missed one point, I recently converted all OpenMP translation tests to use opaque pointers. You will have to do that for all the tests before submitting, i.e remove the element-type from the !llvm.ptr.

LGTM.

@agozillon
Copy link
Contributor Author

Thanks @agozillon for the changes. I missed one point, I recently converted all OpenMP translation tests to use opaque pointers. You will have to do that for all the tests before submitting, i.e remove the element-type from the !llvm.ptr.

LGTM.

Thank you very much @kiranchandramohan I'll upstream on Monday afternoon, give a small amount of leeway for other people to have more input.

I'll fix the tests before upstreaming, I think I modified a few on rebase, but the newer ones do indeed need to be fixed to the new opaque pointers, thank you for reminding me.

…owering utilising omp.bounds

This patch seeks to add initial lowering of OpenMP array
sections within target region map clauses from MLIR to
LLVM IR.

This patch seeks to support fixed sized arrays initially,
before looking toward assumed size and shaped arrays.

Although, assumed size works in some fashion (dummy
arguments) with some minor alterations to the OMPEarlyOutliner,
so it is possible changes made in the IsolatedFromAbove series
may allow this to work with no further required patches.

It utilises the generated omp.bounds to calculate the size
of the mapped OpenMP array (both for sectioned and
un-sectioned arrays) as well as the offset to be passed to
the kernel argument structure.

Alongside these changes some refactoring of how map data is
handled is attempted, using a new MapData structure to keep
track of information utilised in the lowering of mapped values.

The initial addition of a more complex
createDeviceArgumentAccessor that utilises capture
kinds similarly to (and loosely based on) Clang to
generate different kernel argument accesses is also
added.

A similar function for altering how the kernel argument
is passed to the kernel argument structure on the host
is also utilised (createAlteredByCaptureMap), which
allows modification of the pointer/basePointer based
on their capture (and bounds information). It's of note
ByRef, is the default for explicit mappings and ByCopy
will be the default for implicit captures, so the former
is currently tested in this patch and the latter is not
for the moment.
@agozillon
Copy link
Contributor Author

Rebased and removed final typed-pointers from the tests and did some final local checks, will merge when the final in PR tests complete.

So please do speak up if anyone has any final comments they wish to make or any want to delay this PR.

@agozillon agozillon merged commit 6a62707 into llvm:main Oct 30, 2023
tsitdikov added a commit to tsitdikov/llvm-project that referenced this pull request Oct 30, 2023
All usages of the variable have been removed in llvm#68689, we now need to clean it up.
jreiffers pushed a commit that referenced this pull request Oct 30, 2023
All usages of the variable have been removed in
#68689, we now need to clean it
up.
@agozillon
Copy link
Contributor Author

seems to be some related gfortran regressions caused by the PR looking into them at the moment.

@agozillon
Copy link
Contributor Author

The gfortran regressions should be fixed by: 68c3846 which re-introduces the very basic implicit capture we have currently that I a little too hastily removed in this PR. It appears while we don't use it in any of our tests, we do depend on the behavior in the regression test suite to some extent. So until the IFA series of patches lands, we must continue to keep this little bit of behavior to satisfy the regression tests, when the IFA patches from @TIFitis land this segment can likely be removed (can be done in the IFA patches themselves, or I can do so after the fact).

Unfortunately, that means at the moment we occasionally (such as when using do loops) capture constants that we don't handle very well yet. Hopefully the IFA patch fixes this when it lands, but if it doesn't it's something I'm happy to look into as it's resolved downstream, so should just need a little nudge somewhere in the right direction. But in any case, I had to change some tests to utilise do while's with explicit arguments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants