[mlir][OpenMP] - Implement lowering from MLIR to LLVMIR for private clause on target constructs #105471

Closed

Conversation

bhandarkar-pranav
Contributor

This patch adds support for lowering the private clause when it is used while offloading. We implement this by inlining the alloc region of omp.private at the start of the target region and adjusting the uses so that the values yielded from the alloc region are used in the target region (sketched below).

At this point, we do not handle the following (the first of the two below is being worked on):

  • allocatables, because we do not handle dealloc regions in this patch.
  • firstprivate, because there is no support yet for lowering firstprivate to MLIR.
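For illustration, here is a minimal MLIR input of the shape this lowering handles (a sketch with made-up names; the added test below shows the real cases):

omp.private {type = private} @x.privatizer : !llvm.ptr alloc {
^bb0(%arg0: !llvm.ptr):
  %0 = llvm.mlir.constant(1 : i64) : i64
  %1 = llvm.alloca %0 x i32 : (i64) -> !llvm.ptr
  omp.yield(%1 : !llvm.ptr)
}
llvm.func @foo() {
  %0 = llvm.mlir.constant(1 : i64) : i64
  %1 = llvm.alloca %0 x i32 : (i64) -> !llvm.ptr
  omp.target private(@x.privatizer %1 -> %arg0 : !llvm.ptr) {
  ^bb0(%arg0: !llvm.ptr):
    // After lowering, uses of %arg0 refer to an alloca inlined from the
    // privatizer's alloc region at the start of this region.
    omp.terminator
  }
  llvm.return
}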

@llvmbot
Member

llvmbot commented Aug 21, 2024

@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-offload

Author: Pranav Bhandarkar (bhandarkar-pranav)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/105471.diff

5 Files Affected:

  • (modified) mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp (+6-1)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+131-1)
  • (modified) mlir/test/Dialect/OpenMP/invalid.mlir (+1-1)
  • (added) mlir/test/Target/LLVMIR/openmp-target-private.mlir (+73)
  • (added) offload/test/offloading/fortran/target-private-map.f90 (+41)
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index 11780f84697b15..31d97313aedb7a 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -2494,7 +2494,12 @@ LogicalResult PrivateClauseOp::verify() {
              << "Did not expect any values to be yielded.";
     }
 
-    if (yieldedTypes.size() == 1 && yieldedTypes.front() == symType)
+    if (yieldedTypes.size() != 1)
+      return mlir::emitError(terminator->getLoc())
+             << "Expected exactly 1 yielded value, but got "
+             << yieldedTypes.size();
+
+    if (yieldedTypes.front() == symType)
       return success();
 
     auto error = mlir::emitError(yieldOp.getLoc())
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 458d05d5059db7..af48b5379140c4 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -13,6 +13,7 @@
 #include "mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h"
 #include "mlir/Analysis/TopologicalSortUtils.h"
 #include "mlir/Dialect/LLVMIR/LLVMDialect.h"
+#include "mlir/Dialect/OpenMP/OpenMPClauseOperands.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
 #include "mlir/IR/IRMapping.h"
@@ -1480,6 +1481,7 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
               counter);
 
           clone.setSymName(cloneName);
+
           return {privVar, clone};
         }
       }
@@ -3241,12 +3243,140 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
   DataLayout dl = DataLayout(opInst.getParentOfType<ModuleOp>());
   SmallVector<Value> mapVars = targetOp.getMapVars();
   llvm::Function *llvmOutlinedFn = nullptr;
-
   // TODO: It can also be false if a compile-time constant `false` IF clause is
   // specified.
   bool isOffloadEntry =
       isTargetDevice || !ompBuilder->Config.TargetTriples.empty();
 
+  if (!targetOp.getPrivateVars().empty()) {
+    auto privateVars = targetOp.getPrivateVars();
+    auto privateSyms = targetOp.getPrivateSyms();
+    auto &firstTargetRegion = opInst.getRegion(0);
+    auto &firstTargetBlock = firstTargetRegion.front();
+    auto *regionArgsStart = firstTargetBlock.getArguments().begin();
+    auto *privArgsStart = regionArgsStart + targetOp.getMapVars().size();
+    auto *privArgsEnd = privArgsStart + targetOp.getPrivateVars().size();
+    BitVector blockArgsBV(firstTargetBlock.getNumArguments(), false);
+    struct omp::PrivateClauseOps newPrivateClauses;
+    MutableArrayRef argSubRangePrivates(privArgsStart, privArgsEnd);
+    for (auto [privVar, privatizerNameAttr, blockArg] :
+         llvm::zip_equal(privateVars, *privateSyms, argSubRangePrivates)) {
+
+      SymbolRefAttr privSym = llvm::cast<SymbolRefAttr>(privatizerNameAttr);
+
+      // 1. Clone the privatizer so that we can make changes to it freely
+      MLIRContext &context = moduleTranslation.getContext();
+      mlir::IRRewriter rewriter(&context);
+      omp::PrivateClauseOp privatizer =
+          SymbolTable::lookupNearestSymbolFrom<omp::PrivateClauseOp>(&opInst,
+                                                                     privSym);
+      // Handle only private for now. Also, not handling allocatables yet.
+      if (privatizer.getDataSharingType() !=
+              omp::DataSharingClauseType::Private ||
+          !privatizer.getDeallocRegion().empty()) {
+        newPrivateClauses.privateSyms.push_back(privSym);
+        newPrivateClauses.privateVars.push_back(privVar);
+        continue;
+      }
+
+      auto &allocRegion = privatizer.getAllocRegion();
+      // `alloc` region should have only 1 argument
+      assert(allocRegion.getNumArguments() == 1 &&
+             "alloc region of omp::PrivateClauseOp should have one argument");
+      auto allocRegionArg = allocRegion.getArgument(0);
+      // Assuming we have the following targetOp and Privatizer
+      //
+      // omp.private {type = private} @x.privatizer : !llvm.ptr alloc {
+      //   %1 = alloca...
+      //   omp.yield(%1)
+      // }
+      // omp.target map(..) map(..) private(privVar) {
+      //  ^bb0(map_arg0, ..., map_argn, priv_arg0):
+      //    ...
+      //    ...
+      //   store %v, %priv_arg0
+      //   omp.terminator
+      // }
+      // Roughly, we do the following:
+      // Split the first block (bb0) of the first region of the target into
+      // two blocks. Then clone the alloc region of the privatizer between the
+      // two new blocks. When cloning, replace the alloc argument with privVar.
+      // We'll then have
+      //
+      // omp.target map(..) map(..) private(privVar) {
+      //  ^bb0(map_arg0, ..., map_argn, priv_arg0, ..., priv_argn):
+      //  ^bb1:  (cloned region)  // no predecessor
+      //   %1 = alloca...
+      //   omp.yield(%1)
+      //  ^bb2:                   // no predecessor
+      //    ...
+      //    ...
+      //   store %v, %priv_arg0
+      //   omp.terminator
+      // }
+      // Next, add an unconditional branch from ^bb0 to ^bb1 and change the
+      // yield in the last block of the cloned alloc region to an unconditional
+      // branch before replacing all uses of 'priv_arg0' with the yielded value
+      // to finally get the following
+      //
+      // omp.target map(..) map(..) private(privVar) {
+      //  ^bb0(map_arg0, ..., map_argn, priv_arg0, ..., priv_argn):
+      //   llvm.br ^bb1
+      //  ^bb1:  (cloned region) // pred: ^bb0
+      //   %1 = alloca...
+      //   llvm.br ^bb2
+      //  ^bb2:                  // pred: ^bb1
+      //    ...
+      //    ...
+      //   store %v, %1          // %priv_arg0 replaced with the yield value
+      //   omp.terminator
+      // }
+      Block &firstBlock = firstTargetRegion.front();
+      Block *newBlock = rewriter.splitBlock(&firstBlock, firstBlock.begin());
+      rewriter.setInsertionPointToStart(&firstBlock);
+      Location loc = targetOp.getLoc();
+      rewriter.create<LLVM::BrOp>(loc, ValueRange(), newBlock);
+
+      IRMapping cloneMap;
+      cloneMap.map(allocRegionArg, privVar);
+      auto secondBlockIter = std::next(firstTargetRegion.begin(), 1);
+      allocRegion.cloneInto(&firstTargetRegion, secondBlockIter, cloneMap);
+
+      unsigned allocRegNumBlocks = allocRegion.getBlocks().size();
+      secondBlockIter = std::next(firstTargetRegion.begin(), 1);
+      auto clonedAllocRegionEndIter =
+          std::next(secondBlockIter, (allocRegNumBlocks - 1));
+      Block &clonedAllocRegEndBlock = *clonedAllocRegionEndIter;
+
+      Operation *br = firstBlock.getTerminator();
+      LLVM::BrOp brOp = dyn_cast<LLVM::BrOp>(br);
+      Operation *yield = clonedAllocRegEndBlock.getTerminator();
+      omp::YieldOp yieldOp = dyn_cast<omp::YieldOp>(yield);
+      auto newValue = yieldOp.getResults().front();
+      rewriter.setInsertionPointAfter(yield);
+      auto *oldSucc = brOp.getSuccessor();
+      brOp.setSuccessor(&*secondBlockIter);
+      // TODO: Consider cloning brOp and adding it to clonedAllocRegEndBlock.
+      // TODO: Can we not simply merge clonedAllocRegEndBlock and oldSucc?
+      rewriter.create<LLVM::BrOp>(loc, ValueRange(), oldSucc);
+      blockArgsBV.set(blockArg.getArgNumber());
+      rewriter.replaceAllUsesWith(blockArg, newValue);
+      rewriter.eraseOp(yield);
+    }
+    // Do some fix ups now.
+    // First, remove the blockArguments that we just privatized
+    firstTargetBlock.eraseArguments(blockArgsBV);
+    // Then remove the private vars and privatizers that we have
+    // processed, i.e., privatized just now.
+    if (newPrivateClauses.privateSyms.empty()) {
+      targetOp.getPrivateVarsMutable().clear();
+      targetOp.removePrivateSymsAttr();
+    } else {
+      targetOp.setPrivateSymsAttr(mlir::ArrayAttr::get(
+          targetOp.getContext(), newPrivateClauses.privateSyms));
+      targetOp.getPrivateVarsMutable().assign(newPrivateClauses.privateVars);
+    }
+  }
   LogicalResult bodyGenStatus = success();
   using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;
   auto bodyCB = [&](InsertPointTy allocaIP,
diff --git a/mlir/test/Dialect/OpenMP/invalid.mlir b/mlir/test/Dialect/OpenMP/invalid.mlir
index 1d1d93f0977588..e3267c520490a2 100644
--- a/mlir/test/Dialect/OpenMP/invalid.mlir
+++ b/mlir/test/Dialect/OpenMP/invalid.mlir
@@ -2219,7 +2219,7 @@ omp.private {type = private} @x.privatizer : i32 alloc {
 
 omp.private {type = private} @x.privatizer : i32 alloc {
 ^bb0(%arg0: i32):
-  // expected-error @below {{Invalid yielded value. Expected type: 'i32', got: None}}
+  // expected-error @below {{Expected exactly 1 yielded value, but got 0}}
   omp.yield
 }
 
diff --git a/mlir/test/Target/LLVMIR/openmp-target-private.mlir b/mlir/test/Target/LLVMIR/openmp-target-private.mlir
new file mode 100644
index 00000000000000..38c93b3f0568f8
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-target-private.mlir
@@ -0,0 +1,73 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+omp.private {type = private} @simple_var.privatizer : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i64) : i64
+  %1 = llvm.alloca %0 x i32 {bindc_name = "simple_var", pinned} : (i64) -> !llvm.ptr
+  omp.yield(%1 : !llvm.ptr)
+}
+llvm.func @target_map_single_private() attributes {fir.internal_name = "_QPtarget_map_single_private"} {
+  %0 = llvm.mlir.constant(1 : i64) : i64
+  %1 = llvm.alloca %0 x i32 {bindc_name = "simple_var"} : (i64) -> !llvm.ptr
+  %2 = llvm.mlir.constant(1 : i64) : i64
+  %3 = llvm.alloca %2 x i32 {bindc_name = "a"} : (i64) -> !llvm.ptr
+  %4 = llvm.mlir.constant(2 : i32) : i32
+  llvm.store %4, %3 : i32, !llvm.ptr
+  %5 = omp.map.info var_ptr(%3 : !llvm.ptr, i32) map_clauses(to) capture(ByRef) -> !llvm.ptr {name = "a"}
+  omp.target map_entries(%5 -> %arg0 : !llvm.ptr) private(@simple_var.privatizer %1 -> %arg1 : !llvm.ptr) {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+    %6 = llvm.mlir.constant(10 : i32) : i32
+    %7 = llvm.load %arg0 : !llvm.ptr -> i32
+    %8 = llvm.add %7, %6 : i32
+    llvm.store %8, %arg1 : i32, !llvm.ptr
+    omp.terminator
+  }
+  llvm.return
+}
+// CHECK: define internal void @__omp_offloading_fd00
+// CHECK-DAG: %[[PRIV_ALLOC:.*]] = alloca i32, i64 1, align 4
+// CHECK-DAG: %[[ADD:.*]] = add i32 {{.*}}, 10
+// CHECK-DAG: store i32 %[[ADD]], ptr %[[PRIV_ALLOC]], align 4
+
+omp.private {type = private} @n.privatizer : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i64) : i64
+  %1 = llvm.alloca %0 x f32 {bindc_name = "n", pinned} : (i64) -> !llvm.ptr
+  omp.yield(%1 : !llvm.ptr)
+}
+llvm.func @target_map_2_privates() attributes {fir.internal_name = "_QPtarget_map_2_privates"} {
+  %0 = llvm.mlir.constant(1 : i64) : i64
+  %1 = llvm.alloca %0 x i32 {bindc_name = "simple_var"} : (i64) -> !llvm.ptr
+  %2 = llvm.mlir.constant(1 : i64) : i64
+  %3 = llvm.alloca %2 x f32 {bindc_name = "n"} : (i64) -> !llvm.ptr
+  %4 = llvm.mlir.constant(1 : i64) : i64
+  %5 = llvm.alloca %4 x i32 {bindc_name = "a"} : (i64) -> !llvm.ptr
+  %6 = llvm.mlir.constant(2 : i32) : i32
+  llvm.store %6, %5 : i32, !llvm.ptr
+  %7 = omp.map.info var_ptr(%5 : !llvm.ptr, i32) map_clauses(to) capture(ByRef) -> !llvm.ptr {name = "a"}
+  omp.target map_entries(%7 -> %arg0 : !llvm.ptr) private(@simple_var.privatizer %1 -> %arg1 : !llvm.ptr, @n.privatizer %3 -> %arg2 : !llvm.ptr) {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr):
+    %8 = llvm.mlir.constant(1.100000e+01 : f32) : f32
+    %9 = llvm.mlir.constant(10 : i32) : i32
+    %10 = llvm.load %arg0 : !llvm.ptr -> i32
+    %11 = llvm.add %10, %9 : i32
+    llvm.store %11, %arg1 : i32, !llvm.ptr
+    %12 = llvm.load %arg1 : !llvm.ptr -> i32
+    %13 = llvm.sitofp %12 : i32 to f32
+    %14 = llvm.fadd %13, %8  {fastmathFlags = #llvm.fastmath<contract>} : f32
+    llvm.store %14, %arg2 : f32, !llvm.ptr
+    omp.terminator
+  }
+  llvm.return
+}
+
+// CHECK: define internal void @__omp_offloading_fd00
+// CHECK-DAG: %[[PRIV_I32_ALLOC:.*]] = alloca i32, i64 1, align 4
+// CHECK-DAG: %[[PRIV_FLOAT_ALLOC:.*]] = alloca float, i64 1, align 4
+// CHECK-DAG: %[[ADD_I32:.*]] = add i32 {{.*}}, 10
+// CHECK-DAG: store i32 %[[ADD_I32]], ptr %[[PRIV_I32_ALLOC]], align 4
+// CHECK-DAG: %[[LOAD_I32_AGAIN:.*]] = load i32, ptr %6, align 4
+// CHECK-DAG: %[[CAST_TO_FLOAT:.*]] = sitofp i32 %[[LOAD_I32_AGAIN]] to float
+// CHECK-DAG: %[[ADD_FLOAT:.*]] = fadd contract float %[[CAST_TO_FLOAT]], 1.100000e+01
+// CHECK-DAG: store float %[[ADD_FLOAT]], ptr %[[PRIV_FLOAT_ALLOC]], align 4
+      
diff --git a/offload/test/offloading/fortran/target-private-map.f90 b/offload/test/offloading/fortran/target-private-map.f90
new file mode 100644
index 00000000000000..861a1886c03844
--- /dev/null
+++ b/offload/test/offloading/fortran/target-private-map.f90
@@ -0,0 +1,41 @@
+! Test target private
+! REQUIRES: flang, amdgcn-amd-amdhsa
+! UNSUPPORTED: nvptx64-nvidia-cuda
+! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
+! UNSUPPORTED: aarch64-unknown-linux-gnu
+! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+! UNSUPPORTED: x86_64-pc-linux-gnu
+! UNSUPPORTED: x86_64-pc-linux-gnu-LTO
+
+! RUN: %libomptarget-compile-fortran-run-and-check-generic
+program main
+  integer :: a = 0
+  call target_private(a)
+  print*, "======= FORTRAN Test passed! ======="
+  print*, "foo(a) should not return 20, got " , a
+  if (a /= 20) then
+     stop 0
+  else
+     stop 1
+  end if
+  
+  !       stop 0
+end program main
+subroutine target_private(r)
+  implicit none
+  integer, dimension(2) :: simple_vars
+  integer :: a, r
+  ! set a to 5
+  a = 5
+  simple_vars(1) = a
+  simple_vars(2) = a
+  !$omp target map(tofrom: simple_vars) private(a)
+  ! Without private(a), a would be firstprivate, meaning its value would be 5.
+  ! With private(a), its value would be uninitialized, which means it'd have
+  ! a very small chance of being 5.
+  ! So, simple_vars(1|2) should be 5 + a_private
+  simple_vars(1) = simple_vars(1) + a
+  simple_vars(2) = simple_vars(2) + a
+  !$omp end target
+  r = simple_vars(1) + simple_vars(2)
+end subroutine target_private

Contributor

@Dinistro Dinistro left a comment

A few drive-by comments. Note that this will require a review from someone more familiar with the exact semantics of this.

// TODO: It can also be false if a compile-time constant `false` IF clause is
// specified.
bool isOffloadEntry =
isTargetDevice || !ompBuilder->Config.TargetTriples.empty();

if (!targetOp.getPrivateVars().empty()) {
auto privateVars = targetOp.getPrivateVars();
Contributor

Nit: Only use auto when the type is given on the RHS or too complex to write out.

https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable
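For instance (illustrative only, reusing names that appear in this patch):

// OK: the type is spelled on the right-hand side.
auto privatizer = SymbolTable::lookupNearestSymbolFrom<omp::PrivateClauseOp>(
    &opInst, privSym);
// Prefer the explicit type when the RHS does not make it obvious.
OperandRange privateVars = targetOp.getPrivateVars();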

Contributor Author

Thank you. I am not a big fan of auto everywhere myself. Thanks for pointing me to this rule for LLVM. It makes a lot of sense.

Comment on lines +34 to +35
! With private(a), its value would be uninitialized, which means it'd have
! a very small chance of being 5.
Contributor

Ultra nit: This case is UB, so anything can happen. Not sure if there would be another case that would not invoke UB in case of missing private support?

Comment on lines +28 to +30
// CHECK-DAG: %[[PRIV_ALLOC:.*]] = alloca i32, i64 1, align 4
// CHECK-DAG: %[[ADD:.*]] = add i32 {{.*}}, 10
// CHECK-DAG: store i32 %[[ADD]], ptr %[[PRIV_ALLOC]], align 4
Contributor

Nit: Avoid using CHECK-DAG when possible. CHECK-DAG directives are more expensive than normal CHECKs and imply that there is really a DAG ordering happening.

Contributor Author

I used CHECK-DAG because the def-use chain between the interesting instructions spans different basic blocks. As a result, the test itself becomes subject to the order in which blocks are laid out. Isn't CHECK-DAG meant to address such situations?

Contributor

It is usually safe to assume that the block order doesn't change. While the order is not relevant (apart from the entry block), it is deterministic.
In my experience, CHECK-DAG is mainly used for constructs that are very susceptible to order changes, like locations or complicated attributes.

Note that this is not necessarily a big issue, and if you assume that this order might indeed change at some point in the future, feel free to keep it as is.
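For reference, a minimal sketch of the difference (made-up directives for illustration, not taken from this test):

// With plain CHECK, the matches must appear in the output in this order:
// CHECK: alloca i32
// CHECK: alloca float
//
// With CHECK-DAG, the same two lines may match in either order, as long as
// both match before whatever the next plain CHECK directive matches:
// CHECK-DAG: alloca i32
// CHECK-DAG: alloca float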

Contributor

Just out of curiosity, what would be some examples of OpenMP constructs which might be susceptible to order changes and therefore require the more rigorous CHECK-DAG?

Member

@ergawy ergawy left a comment

Thanks @bhandarkar-pranav. Just a few small comments.

Comment on lines +3374 to +3376
targetOp.setPrivateSymsAttr(mlir::ArrayAttr::get(
targetOp.getContext(), newPrivateClauses.privateSyms));
targetOp.getPrivateVarsMutable().assign(newPrivateClauses.privateVars);
Member

What do we do with the un-privatized variables now?

I am thinking we should emit an error in this case. We have openmp-enable-delayed-privatization-staging to hide incomplete implementations of delayed privatization, so we can just emit an error that clearly says which variables were not successfully privatized.
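A possible shape for that hard failure, following the naming already used in this patch (a sketch, not part of the PR):

// Instead of silently skipping unsupported privatizers, fail the translation.
if (privatizer.getDataSharingType() != omp::DataSharingClauseType::Private ||
    !privatizer.getDeallocRegion().empty())
  return opInst.emitError()
         << "failed to privatize variable with privatizer " << privSym
         << ": only non-allocatable private variables are currently supported "
            "on omp.target";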

Contributor Author

That would convert what is essentially a silent failure right now (i.e., not getting lowered to LLVM IR) into a hard failure. This change would mean omp.private for allocatables will start failing with errors. For firstprivate we already fail early, during parsing. Let me know if you still want me to make this change. I am leaning towards doing what you suggest, because a hard failure is better than failing silently, but let me know if this changes what you'd prefer.

Member

I think a hard failure would be good. At least it will allow someone working on the code to find the proper extension point to support allocatables (or whatever is not supported) at this layer.

@bhandarkar-pranav
Contributor Author

@ergawy - I have made the changes based on your review, except for the one change that I commented about. Please take a look when you get a chance.

@bhandarkar-pranav
Contributor Author

@ergawy - I am on vacation for a week. In the meantime, if you have any comments on this PR, please feel free to add them. I'll come back and take a look on 4th Sept. I'll also have the PR that deals with allocatables up shortly after I return.

Member

@ergawy ergawy left a comment

LGTM. Just a small comment about adding the error message we discussed.


Contributor

@tblah tblah left a comment

Care is taken to undo any changes to the MLIR performed during delayed privatization for ParallelOp. Here you are leaving lasting changes. I'm not sure whether it matters or not that an mlir translation changes the mlir. I don't think it would be good to add this unless there is already an existing precedent?

I'm worried that allocas already in the body region will now come after the branches. Reviewers for reduction insisted it was very important that allocas are concentrated in the entry block as much as possible (this is acceptable to me if some later LLVM pass is guaranteed to hoist the allocas back to the entry block).

// Handle only private for now. Also, not handling allocatables yet.
if (privatizer.getDataSharingType() !=
omp::DataSharingClauseType::Private ||
!privatizer.getDeallocRegion().empty()) {
Contributor

Otherwise could you return to indicate failure?

auto &targetRegion = opInst.getRegion(0);
auto &firstTargetBlock = targetRegion.front();
auto *regionArgsStart = firstTargetBlock.getArguments().begin();
auto *privArgsStart = regionArgsStart + targetOp.getMapVars().size();
Member

Just to remove any target-specific logic as much as possible (we can probably outline this into a util that other ops use as well, for example distribute, which I am working on now).

Suggested change
auto *privArgsStart = regionArgsStart + targetOp.getMapVars().size();
auto *privArgsStart = regionArgsStart + targetOp.getPrivateVars().getBeginOperandIndex();

Member

Just as a heads-up, I'm planning to create a PR stack this week to revamp the handling of clause-related block arguments such as these. At that point we won't have to worry so much about the order they appear in, but instead we'll be able to access them through an MLIR interface.

Contributor Author

Excellent. That would be great, because the present usage is extremely brittle.

@bhandarkar-pranav
Contributor Author

bhandarkar-pranav commented Sep 23, 2024

Care is taken to undo any changes to the MLIR performed during delayed privatization for ParallelOp. Here you are leaving lasting changes. I'm not sure whether it matters or not that an mlir translation changes the mlir. I don't think it would be good to add this unless there is already an existing precedent?

I understand what you mean; ultimately I decided that it was perhaps OK to leave lasting changes because compilation from here on moves to LLVM IR.

I'm worried that allocas already in the body region will now come after the branches. Reviewers for reduction insisted it was very important that allocas are concentrated in the entry block as much as possible (this is acceptable to me if some later LLVM pass is guaranteed to hoist the allocas back to the entry block).

I hadn't thought about this. However, there is a mechanism in the OMPIRBuilder that records functions that need their constant allocas hoisted. The finalize method of OMPIRBuilder then hoists such allocas.

In any case, based on your suggestions, I have opened a new PR to translate private on omp.target: #109668. Could you please take a look? It is a much simpler change than this PR, and of course your two concerns above will be non-issues in that approach. Thank you for the suggestion.
@ergawy - FYI.

@bhandarkar-pranav
Contributor Author

Closing in favor of #109668
