-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[mlir][OpenMP] - Implement lowering from MLIR to LLVMIR for private
clause on target
constructs
#105471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mlir][OpenMP] - Implement lowering from MLIR to LLVMIR for private
clause on target
constructs
#105471
Conversation
… clause on `target` constructs This patch contains support to lower the `private` clause when used while offloading. We implement this by inlining the `alloc` region of `omp.private` at the start of the `target` region and adjusting the uses so that the values yielded from the `alloc` region are used in the `target` region. At this point, we do not handle the following (the first of the two below is being worked on) - allocatables because we do not handle `dealloc` regions in this patch. - firstprivate because there isn't support to lower firstprivate to mlir yet.
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-offload Author: Pranav Bhandarkar (bhandarkar-pranav) ChangesThis patch contains support to lower the At this point, we do not handle the following (the first of the two below is being worked on)
Full diff: https://github.com/llvm/llvm-project/pull/105471.diff 5 Files Affected:
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index 11780f84697b15..31d97313aedb7a 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -2494,7 +2494,12 @@ LogicalResult PrivateClauseOp::verify() {
<< "Did not expect any values to be yielded.";
}
- if (yieldedTypes.size() == 1 && yieldedTypes.front() == symType)
+ if (yieldedTypes.size() != 1)
+ return mlir::emitError(terminator->getLoc())
+ << "Expected exactly 1 yielded value, but got "
+ << yieldedTypes.size();
+
+ if (yieldedTypes.front() == symType)
return success();
auto error = mlir::emitError(yieldOp.getLoc())
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 458d05d5059db7..af48b5379140c4 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -13,6 +13,7 @@
#include "mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h"
#include "mlir/Analysis/TopologicalSortUtils.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
+#include "mlir/Dialect/OpenMP/OpenMPClauseOperands.h"
#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
#include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
#include "mlir/IR/IRMapping.h"
@@ -1480,6 +1481,7 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
counter);
clone.setSymName(cloneName);
+
return {privVar, clone};
}
}
@@ -3241,12 +3243,140 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
DataLayout dl = DataLayout(opInst.getParentOfType<ModuleOp>());
SmallVector<Value> mapVars = targetOp.getMapVars();
llvm::Function *llvmOutlinedFn = nullptr;
-
// TODO: It can also be false if a compile-time constant `false` IF clause is
// specified.
bool isOffloadEntry =
isTargetDevice || !ompBuilder->Config.TargetTriples.empty();
+ if (!targetOp.getPrivateVars().empty()) {
+ auto privateVars = targetOp.getPrivateVars();
+ auto privateSyms = targetOp.getPrivateSyms();
+ auto &firstTargetRegion = opInst.getRegion(0);
+ auto &firstTargetBlock = firstTargetRegion.front();
+ auto *regionArgsStart = firstTargetBlock.getArguments().begin();
+ auto *privArgsStart = regionArgsStart + targetOp.getMapVars().size();
+ auto *privArgsEnd = privArgsStart + targetOp.getPrivateVars().size();
+ BitVector blockArgsBV(firstTargetBlock.getNumArguments(), false);
+ struct omp::PrivateClauseOps newPrivateClauses;
+ MutableArrayRef argSubRangePrivates(privArgsStart, privArgsEnd);
+ for (auto [privVar, privatizerNameAttr, blockArg] :
+ llvm::zip_equal(privateVars, *privateSyms, argSubRangePrivates)) {
+
+ SymbolRefAttr privSym = llvm::cast<SymbolRefAttr>(privatizerNameAttr);
+
+ // 1. Clone the privatizer so that we can make changes to it freely
+ MLIRContext &context = moduleTranslation.getContext();
+ mlir::IRRewriter rewriter(&context);
+ omp::PrivateClauseOp privatizer =
+ SymbolTable::lookupNearestSymbolFrom<omp::PrivateClauseOp>(&opInst,
+ privSym);
+ // Handle only private for now. Also, not handling allocatables yet.
+ if (privatizer.getDataSharingType() !=
+ omp::DataSharingClauseType::Private ||
+ !privatizer.getDeallocRegion().empty()) {
+ newPrivateClauses.privateSyms.push_back(privSym);
+ newPrivateClauses.privateVars.push_back(privVar);
+ continue;
+ }
+
+ auto &allocRegion = privatizer.getAllocRegion();
+ // `alloc` region should have only 1 argument
+ assert(allocRegion.getNumArguments() == 1 &&
+ "alloc region of omp::PrivateClauseOp should have one argument");
+ auto allocRegionArg = allocRegion.getArgument(0);
+ // Assuming we have the following targetOp and Privatizer
+ //
+ // omp.private {type = private} @x.privatizer : !llvm.ptr alloc {
+ // %1 = alloca...
+ // omp.yield(%1)
+ // }
+ // omp.target map(..) map(..) private(privVar) {
+ // ^bb0(map_arg0, ..., map_argn, priv_arg0):
+ // ...
+ // ...
+ // store %v, %priv_arg0
+ // omp.terminator
+ // }
+ // Roughly, we do the following -
+ // Split the first block (bb0) of the first region of the target into
+ // two block. Then clone the alloc region of the privatizer between the
+ // two new blocks. When cloning replace the alloc argument with privVar.
+ // We'll then have
+ //
+ // omp.target map(..) map(..) private(privVar) {
+ // ^bb0(map_arg0, ..., map_argn, priv_arg0, ..., priv_argn):
+ // ^bb1: (cloned region) // no predecessor
+ // %1 = alloca...
+ // omp.yield(%1)
+ // ^bb2: // no predecessor
+ // ...
+ // ...
+ // store %v, %priv_arg0
+ // omp.terminator
+ // }
+ // Next, add an unconditional branch from ^bb0 to ^bb1 and change the
+ // yield in the last block of the cloned alloc region to an unconditional
+ // branch before replacing all uses of 'priv_arg0' with the yielded value
+ // to finally get the following
+ //
+ // omp.target map(..) map(..) private(privVar) {
+ // ^bb0(map_arg0, ..., map_argn, priv_arg0, ..., priv_argn):
+ // llvm.br ^bb1
+ // ^bb1: (cloned region) // pred: ^bb0
+ // %1 = alloca...
+ // llvm.br ^bb2
+ // ^bb2: // pred: ^bb1
+ // ...
+ // ...
+ // store %v, %1 // %priv_arg0 replaced with the yield value
+ // omp.terminator
+ // }
+ Block &firstBlock = firstTargetRegion.front();
+ Block *newBlock = rewriter.splitBlock(&firstBlock, firstBlock.begin());
+ rewriter.setInsertionPointToStart(&firstBlock);
+ Location loc = targetOp.getLoc();
+ rewriter.create<LLVM::BrOp>(loc, ValueRange(), newBlock);
+
+ IRMapping cloneMap;
+ cloneMap.map(allocRegionArg, privVar);
+ auto secondBlockIter = std::next(firstTargetRegion.begin(), 1);
+ allocRegion.cloneInto(&firstTargetRegion, secondBlockIter, cloneMap);
+
+ unsigned allocRegNumBlocks = allocRegion.getBlocks().size();
+ secondBlockIter = std::next(firstTargetRegion.begin(), 1);
+ auto clonedAllocRegionEndIter =
+ std::next(secondBlockIter, (allocRegNumBlocks - 1));
+ Block &clonedAllocRegEndBlock = *clonedAllocRegionEndIter;
+
+ Operation *br = firstBlock.getTerminator();
+ LLVM::BrOp brOp = dyn_cast<LLVM::BrOp>(br);
+ Operation *yield = clonedAllocRegEndBlock.getTerminator();
+ omp::YieldOp yieldOp = dyn_cast<omp::YieldOp>(yield);
+ auto newValue = yieldOp.getResults().front();
+ rewriter.setInsertionPointAfter(yield);
+ auto *oldSucc = brOp.getSuccessor();
+ brOp.setSuccessor(&*secondBlockIter);
+ // TODO:Consider cloning brOp and adding it to clonedAllocRegEngBlock
+ // TODO: Can we not simply merge clonedAllocRegEngBlock and oldSucc?
+ rewriter.create<LLVM::BrOp>(loc, ValueRange(), oldSucc);
+ blockArgsBV.set(blockArg.getArgNumber());
+ rewriter.replaceAllUsesWith(blockArg, newValue);
+ rewriter.eraseOp(yield);
+ }
+ // Do some fix ups now.
+ // First, remove the blockArguments that we just privatized
+ firstTargetBlock.eraseArguments(blockArgsBV);
+ // Then remove the private vars and privatizers that we have
+ // processed i.e privatized just now.
+ if (newPrivateClauses.privateSyms.empty()) {
+ targetOp.getPrivateVarsMutable().clear();
+ targetOp.removePrivateSymsAttr();
+ } else {
+ targetOp.setPrivateSymsAttr(mlir::ArrayAttr::get(
+ targetOp.getContext(), newPrivateClauses.privateSyms));
+ targetOp.getPrivateVarsMutable().assign(newPrivateClauses.privateVars);
+ }
+ }
LogicalResult bodyGenStatus = success();
using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;
auto bodyCB = [&](InsertPointTy allocaIP,
diff --git a/mlir/test/Dialect/OpenMP/invalid.mlir b/mlir/test/Dialect/OpenMP/invalid.mlir
index 1d1d93f0977588..e3267c520490a2 100644
--- a/mlir/test/Dialect/OpenMP/invalid.mlir
+++ b/mlir/test/Dialect/OpenMP/invalid.mlir
@@ -2219,7 +2219,7 @@ omp.private {type = private} @x.privatizer : i32 alloc {
omp.private {type = private} @x.privatizer : i32 alloc {
^bb0(%arg0: i32):
- // expected-error @below {{Invalid yielded value. Expected type: 'i32', got: None}}
+ // expected-error @below {{Expected exactly 1 yielded value, but got 0}}
omp.yield
}
diff --git a/mlir/test/Target/LLVMIR/openmp-target-private.mlir b/mlir/test/Target/LLVMIR/openmp-target-private.mlir
new file mode 100644
index 00000000000000..38c93b3f0568f8
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-target-private.mlir
@@ -0,0 +1,73 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+omp.private {type = private} @simple_var.privatizer : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %1 = llvm.alloca %0 x i32 {bindc_name = "simple_var", pinned} : (i64) -> !llvm.ptr
+ omp.yield(%1 : !llvm.ptr)
+}
+llvm.func @target_map_single_private() attributes {fir.internal_name = "_QPtarget_map_single_private"} {
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %1 = llvm.alloca %0 x i32 {bindc_name = "simple_var"} : (i64) -> !llvm.ptr
+ %2 = llvm.mlir.constant(1 : i64) : i64
+ %3 = llvm.alloca %2 x i32 {bindc_name = "a"} : (i64) -> !llvm.ptr
+ %4 = llvm.mlir.constant(2 : i32) : i32
+ llvm.store %4, %3 : i32, !llvm.ptr
+ %5 = omp.map.info var_ptr(%3 : !llvm.ptr, i32) map_clauses(to) capture(ByRef) -> !llvm.ptr {name = "a"}
+ omp.target map_entries(%5 -> %arg0 : !llvm.ptr) private(@simple_var.privatizer %1 -> %arg1 : !llvm.ptr) {
+ ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+ %6 = llvm.mlir.constant(10 : i32) : i32
+ %7 = llvm.load %arg0 : !llvm.ptr -> i32
+ %8 = llvm.add %7, %6 : i32
+ llvm.store %8, %arg1 : i32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
+}
+// CHECK: define internal void @__omp_offloading_fd00
+// CHECK-DAG: %[[PRIV_ALLOC:.*]] = alloca i32, i64 1, align 4
+// CHECK-DAG: %[[ADD:.*]] = add i32 {{.*}}, 10
+// CHECK-DAG: store i32 %[[ADD]], ptr %[[PRIV_ALLOC]], align 4
+
+omp.private {type = private} @n.privatizer : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %1 = llvm.alloca %0 x f32 {bindc_name = "n", pinned} : (i64) -> !llvm.ptr
+ omp.yield(%1 : !llvm.ptr)
+}
+llvm.func @target_map_2_privates() attributes {fir.internal_name = "_QPtarget_map_2_privates"} {
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %1 = llvm.alloca %0 x i32 {bindc_name = "simple_var"} : (i64) -> !llvm.ptr
+ %2 = llvm.mlir.constant(1 : i64) : i64
+ %3 = llvm.alloca %2 x f32 {bindc_name = "n"} : (i64) -> !llvm.ptr
+ %4 = llvm.mlir.constant(1 : i64) : i64
+ %5 = llvm.alloca %4 x i32 {bindc_name = "a"} : (i64) -> !llvm.ptr
+ %6 = llvm.mlir.constant(2 : i32) : i32
+ llvm.store %6, %5 : i32, !llvm.ptr
+ %7 = omp.map.info var_ptr(%5 : !llvm.ptr, i32) map_clauses(to) capture(ByRef) -> !llvm.ptr {name = "a"}
+ omp.target map_entries(%7 -> %arg0 : !llvm.ptr) private(@simple_var.privatizer %1 -> %arg1 : !llvm.ptr, @n.privatizer %3 -> %arg2 : !llvm.ptr) {
+ ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr):
+ %8 = llvm.mlir.constant(1.100000e+01 : f32) : f32
+ %9 = llvm.mlir.constant(10 : i32) : i32
+ %10 = llvm.load %arg0 : !llvm.ptr -> i32
+ %11 = llvm.add %10, %9 : i32
+ llvm.store %11, %arg1 : i32, !llvm.ptr
+ %12 = llvm.load %arg1 : !llvm.ptr -> i32
+ %13 = llvm.sitofp %12 : i32 to f32
+ %14 = llvm.fadd %13, %8 {fastmathFlags = #llvm.fastmath<contract>} : f32
+ llvm.store %14, %arg2 : f32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
+}
+
+// CHECK: define internal void @__omp_offloading_fd00
+// CHECK-DAG: %[[PRIV_I32_ALLOC:.*]] = alloca i32, i64 1, align 4
+// CHECK-DAG: %[[PRIV_FLOAT_ALLOC:.*]] = alloca float, i64 1, align 4
+// CHECK-DAG: %[[ADD_I32:.*]] = add i32 {{.*}}, 10
+// CHECK-DAG: store i32 %[[ADD_I32]], ptr %[[PRIV_I32_ALLOC]], align 4
+// CHECK-DAG: %[[LOAD_I32_AGAIN:.*]] = load i32, ptr %6, align 4
+// CHECK-DAG: %[[CAST_TO_FLOAT:.*]] = sitofp i32 %[[LOAD_I32_AGAIN]] to float
+// CHECK-DAG: %[[ADD_FLOAT:.*]] = fadd contract float %[[CAST_TO_FLOAT]], 1.100000e+01
+// CHECK-DAG: store float %[[ADD_FLOAT]], ptr %[[PRIV_FLOAT_ALLOC]], align 4
+
diff --git a/offload/test/offloading/fortran/target-private-map.f90 b/offload/test/offloading/fortran/target-private-map.f90
new file mode 100644
index 00000000000000..861a1886c03844
--- /dev/null
+++ b/offload/test/offloading/fortran/target-private-map.f90
@@ -0,0 +1,41 @@
+! Test target private
+! REQUIRES: flang, amdgcn-amd-amdhsa
+! UNSUPPORTED: nvptx64-nvidia-cuda
+! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
+! UNSUPPORTED: aarch64-unknown-linux-gnu
+! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+! UNSUPPORTED: x86_64-pc-linux-gnu
+! UNSUPPORTED: x86_64-pc-linux-gnu-LTO
+
+! RUN: %libomptarget-compile-fortran-run-and-check-generic
+program main
+ integer :: a = 0
+ call target_private(a)
+ print*, "======= FORTRAN Test passed! ======="
+ print*, "foo(a) should not return 20, got " , a
+ if (a /= 20) then
+ stop 0
+ else
+ stop 1
+ end if
+
+ ! stop 0
+end program main
+subroutine target_private(r)
+ implicit none
+ integer, dimension(2) :: simple_vars
+ integer :: a, r
+ ! set a to 10
+ a = 5
+ simple_vars(1) = a
+ simple_vars(2) = a
+ !$omp target map(tofrom: simple_vars) private(a)
+ ! Without private(a), a would be firstprivate, meaning it's value would be 5
+ ! with private(a), it's value would be uninitialized, which means it'd have
+ ! a very small chance of being 5.
+ ! So, simple_vars(1|2) should be 5 + a_private
+ simple_vars(1) = simple_vars(1) + a
+ simple_vars(2) = simple_vars(2) + a
+ !$omp end target
+ r = simple_vars(1) + simple_vars(2)
+end subroutine target_private
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few drive by comments. Note that this will require a review from someone more familiar with the exact semantics of this.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
Outdated
Show resolved
Hide resolved
// TODO: It can also be false if a compile-time constant `false` IF clause is | ||
// specified. | ||
bool isOffloadEntry = | ||
isTargetDevice || !ompBuilder->Config.TargetTriples.empty(); | ||
|
||
if (!targetOp.getPrivateVars().empty()) { | ||
auto privateVars = targetOp.getPrivateVars(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Only use auto
when the type is given on the RHS or too complex to write out.
https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I am not a big fan of auto everywhere myself. Thanks for pointing me to this rule for LLVM. It makes a lot of sense.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
Outdated
Show resolved
Hide resolved
! with private(a), it's value would be uninitialized, which means it'd have | ||
! a very small chance of being 5. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultra nit: This case is UB, so anything can happen. Not sure if there would be another case that would not invoke UB in case of missing private
support?
// CHECK-DAG: %[[PRIV_ALLOC:.*]] = alloca i32, i64 1, align 4 | ||
// CHECK-DAG: %[[ADD:.*]] = add i32 {{.*}}, 10 | ||
// CHECK-DAG: store i32 %[[ADD]], ptr %[[PRIV_ALLOC]], align 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Avoid using CHECK-DAG
when possible. They are more expensive than normal CHECK
s and imply that there is really a DAG ordering happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used CHECk-DAG
because the def-use
chain between the interesting instructions_ spans different basic blocks. As such then the test itself becomes subject to the order in which blocks are laid out. Isn't CHECK-DAG
meant to address such situations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is usually save to assume that the block order doesn't change. While the order is not relevant (apart from the entry block), it is deterministic.
In my experience, CHECK-DAG
is mainly used for constructs that are very susceptible to order changes, like locations or complicated attributes.
Note that this is not necessarily a big issue, and if you assume that this order might indeed change at some point in the future, feel free to keep it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, what would be some examples of OpenMP constructs which might be susceptible to order changes and therefore require the more rigorous CHECK-DAG
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bhandarkar-pranav. Just a few small comments.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
Outdated
Show resolved
Hide resolved
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
Outdated
Show resolved
Hide resolved
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
Outdated
Show resolved
Hide resolved
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
Outdated
Show resolved
Hide resolved
targetOp.setPrivateSymsAttr(mlir::ArrayAttr::get( | ||
targetOp.getContext(), newPrivateClauses.privateSyms)); | ||
targetOp.getPrivateVarsMutable().assign(newPrivateClauses.privateVars); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we do with the un-privatized variables now?
I am thinking, we should emit an error in this case. We have openmp-enable-delayed-privatization-staging
to hide incomplete implementations of delayed privatization so we can just emit an error to clearly say which variables were not successfully privatized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would convert what is essentially silently failing right now (ie not getting lowered to LLVMIR silently) into hard failures. This change would mean omp.private
for allocatables will start failing with errors. For firstprivate
we already fail early during parsing. Let me know if you still want me to make this change. I am leaning towards doing what you are suggesting because a hard failure is better than silently failing but let me know if you change your mind about what you'd prefer in light of this comment of mine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a hard failure would good. At least it will allow someone working on the code to find the proper extension point to support allocatables (or whatever is not supported) on this layer.
@ergawy - I have made changes based on your review except that one change that I have commented about. Please take a look when you get a chance. |
@ergawy - I am on vacation for a week. In the meantime, if you have any comments on this PR, please feel free to add them. I'll come back and take a look on 4th Sept. I'll also have the PR that deals with allocatables up shortly after I return. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a small comment about adding the error message we discussed.
targetOp.setPrivateSymsAttr(mlir::ArrayAttr::get( | ||
targetOp.getContext(), newPrivateClauses.privateSyms)); | ||
targetOp.getPrivateVarsMutable().assign(newPrivateClauses.privateVars); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a hard failure would good. At least it will allow someone working on the code to find the proper extension point to support allocatables (or whatever is not supported) on this layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Care is taken to undo any changes to the MLIR performed during delayed privatization for ParallelOp. Here you are leaving lasting changes. I'm not sure whether it matters or not that an mlir translation changes the mlir. I don't think it would be good to add this unless there is already an existing precedent?
I'm worried that allocas already in the body region will now come after the branches. Reviewers for reduction insisted it was very important that allocas are concentrated in the entry block as much as possible (this is acceptable to me if some later LLVM pass is guaranteed to hoist the allocas back to the entry block).
// Handle only private for now. Also, not handling allocatables yet. | ||
if (privatizer.getDataSharingType() != | ||
omp::DataSharingClauseType::Private || | ||
!privatizer.getDeallocRegion().empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise could you return to indicate failure?
auto &targetRegion = opInst.getRegion(0); | ||
auto &firstTargetBlock = targetRegion.front(); | ||
auto *regionArgsStart = firstTargetBlock.getArguments().begin(); | ||
auto *privArgsStart = regionArgsStart + targetOp.getMapVars().size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to remove any target
-specific logic as much as possible (we probably can outline this into a util that other ops use as well, for example distribute
which I am working on now).
auto *privArgsStart = regionArgsStart + targetOp.getMapVars().size(); | |
auto *privArgsStart = regionArgsStart + targetOp.getPrivateVars().getBeginOperandIndex(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a heads-up, I'm planning to create a PR stack this week to revamp the handling of clause-related block arguments such as these. At that point we won't have to worry so much about the order they appear in, but instead we'll be able to access them through an MLIR interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent. That would be great, because the present usage is extremely brittle.
I understand what you mean and ultimately I decided that it was perhaps ok to leave lasting changes because compilation from here on moves to LLVM IR.
I hadn't thought about this. However, there is a mechanism in the Anyway though, based on your suggestions, I have opened a new PR to translate |
Closing in favor of #109668 |
This patch contains support to lower the
private
clause when used while offloading. We implement this by inlining thealloc
region ofomp.private
at the start of thetarget
region and adjusting the uses so that the values yielded from thealloc
region are used in thetarget
region.At this point, we do not handle the following (the first of the two below is being worked on)
dealloc
regions in this patch.