-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[flang][OpenMP] fix lastprivate for allocatables #99686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1) Do not deallocate allocatable private variables if they are lastprivate at the end of the openmp region because their address will have been transferred to the host variable when the variable is copied. 2) Don't use copyHostAssociateVar for allocatable variables. It isn't clear to me whether or not this should be addressed in copyHostAssociateVar instead of inside OpenMP. I opted for OpenMP to minimise how many things I effected. copyHostAssociateVar will not update the destination variable if the destination variable was unallocated. This is incorrect because assignment inside of the openmp block can cause the allocation status of the variable to change. Furthermore, copyHostAssociateVar seems to only copy the variable address not other metadata like the size of the allocation. Reallocation by assignment could cause this to change.
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-flang-openmp Author: Tom Eccles (tblah) Changes
Non-OpenMP reviewers: should this actually be done inside of OpenMP reviewers: I am aware that this will change with delayed privatization. My intention here is to squash the bug before the LLVM branch Full diff: https://github.com/llvm/llvm-project/pull/99686.diff 2 Files Affected:
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index c1d4c089df3b2..597d62f469d75 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -79,7 +79,8 @@ void DataSharingProcessor::processStep2(mlir::Operation *op, bool isLoop) {
void DataSharingProcessor::insertDeallocs() {
for (const semantics::Symbol *sym : allPrivatizedSymbols)
- if (semantics::IsAllocatable(sym->GetUltimate())) {
+ if (semantics::IsAllocatable(sym->GetUltimate()) &&
+ !sym->test(semantics::Symbol::Flag::OmpLastPrivate)) {
if (!useDelayedPrivatization) {
converter.createHostAssociateVarCloneDealloc(*sym);
continue;
@@ -127,8 +128,51 @@ void DataSharingProcessor::copyFirstPrivateSymbol(
void DataSharingProcessor::copyLastPrivateSymbol(
const semantics::Symbol *sym,
[[maybe_unused]] mlir::OpBuilder::InsertPoint *lastPrivIP) {
- if (sym->test(semantics::Symbol::Flag::OmpLastPrivate))
- converter.copyHostAssociateVar(*sym, lastPrivIP);
+ if (sym->test(semantics::Symbol::Flag::OmpLastPrivate)) {
+ bool allocatable = semantics::IsAllocatable(sym->GetUltimate());
+ if (!allocatable) {
+ converter.copyHostAssociateVar(*sym, lastPrivIP);
+ return;
+ }
+
+ // copyHostAssociateVar doesn't work properly if the privatised copy was
+ // reallocated (e.g. by assignment): it will only copy if the ultimate
+ // symbol was already allocated, and it only copies data so any reallocated
+ // lengths etc are lost
+
+ // 1) Fetch the original copy of the variable.
+ assert(sym->has<Fortran::semantics::HostAssocDetails>() &&
+ "No host-association found");
+ const Fortran::semantics::Symbol &hsym = sym->GetUltimate();
+ Fortran::lower::SymbolBox hsb = symTable->lookupOneLevelUpSymbol(hsym);
+ assert(hsb && "Host symbol box not found");
+
+ // 2) Fetch the copied one that will mask the original.
+ Fortran::lower::SymbolBox sb = symTable->shallowLookupSymbol(sym);
+ assert(sb && "Host-associated symbol box not found");
+ assert(hsb.getAddr() != sb.getAddr() &&
+ "Host and associated symbol boxes are the same");
+
+ // 3) Perform the assignment.
+ fir::FirOpBuilder &builder = converter.getFirOpBuilder();
+ mlir::Location loc = converter.genLocation(sym->name());
+ mlir::OpBuilder::InsertPoint insPt = builder.saveInsertionPoint();
+ if (lastPrivIP && lastPrivIP->isSet())
+ builder.restoreInsertionPoint(*lastPrivIP);
+ else
+ builder.setInsertionPointAfter(sb.getAddr().getDefiningOp());
+
+ hlfir::Entity dst{hsb.getAddr()};
+ hlfir::Entity src{sb.getAddr()};
+ builder.create<hlfir::AssignOp>(
+ loc, src, dst, /*isWholeAllocatableAssignment=*/allocatable,
+ /*keepLhsLengthInAllocatableAssignment=*/false, /*temporary_lhs=*/true);
+
+ if (lastPrivIP && lastPrivIP->isSet() &&
+ sym->test(Fortran::semantics::Symbol::Flag::OmpLastPrivate)) {
+ builder.restoreInsertionPoint(insPt);
+ }
+ }
}
void DataSharingProcessor::collectOmpObjectListSymbol(
diff --git a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90 b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
new file mode 100644
index 0000000000000..201c93576bda2
--- /dev/null
+++ b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
@@ -0,0 +1,44 @@
+! RUN: %flang_fc1 -emit-hlfir -o - -fopenmp %s | FileCheck %s
+! RUN: bbc -emit-hlfir -o - -fopenmp %s | FileCheck %s
+
+program lastprivate_allocatable
+ integer, allocatable :: a
+ integer :: i
+ ! a is unallocated here
+ !$omp parallel do lastprivate(a)
+ do i=1,1
+ a = 42
+ enddo
+ !$omp end parallel do
+ ! a should be allocated here
+end program
+
+! CHECK-LABEL: func.func @_QQmain()
+! CHECK: %[[VAL_0:.*]] = fir.alloca !fir.box<!fir.heap<i32>> {bindc_name = "a", uniq_name = "_QFEa"}
+! CHECK: %[[VAL_1:.*]] = fir.zero_bits !fir.heap<i32>
+! CHECK: %[[VAL_2:.*]] = fir.embox %[[VAL_1]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
+! CHECK: fir.store %[[VAL_2]] to %[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK: %[[VAL_3:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
+! CHECK: omp.parallel {
+! create original copy of private variable
+! CHECK: %[[VAL_16:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
+! CHECK: %[[VAL_17:.*]] = fir.alloca i32 {bindc_name = "i", pinned, uniq_name = "_QFEi"}
+! CHECK: %[[VAL_18:.*]]:2 = hlfir.declare %[[VAL_17]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+! CHECK: omp.wsloop {
+! CHECK: omp.loop_nest
+! [...]
+! if this is the last iteration
+! CHECK: fir.if %{{.*}} {
+! store loop IV
+! CHECK: fir.store %{{.*}} to %[[VAL_18]]#1 : !fir.ref<i32>
+! assign private variable to original copy: realloc and temporary_lhs
+! CHECK: hlfir.assign %[[VAL_16]]#0 to %[[VAL_3]]#0 realloc temporary_lhs : !fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT: }
+! CHECK-NEXT: omp.yield
+! CHECK-NEXT: }
+! CHECK-NEXT: omp.terminator
+! CHECK-NEXT: }
+! [no deallocation here]
+! CHECK-NEXT: omp.terminator
+! CHECK-NEXT: }
+
|
I was mistaken about the deallocation. Firstly, all of the other private copies on threads that don't execute the last iteration need to be freed. Secondly, the assign copies the data not the pointer.
38b2249
to
fc63462
Compare
I thought this was needed to avoid calling destructors, but that was theoretical. It turns out when this is used, arrays will not be assigned with different numbers of elements.
✅ With the latest revision this PR passed the C/C++ code formatter. |
It need not be. copyHostAssociateVar has a few customers and only OpenMP uses it for lastprivatisation. Previously the symbol table was not available in OpenMP.cpp. So this could only be done in the bridge. With changes to support recursive lowering, OpenMP.cpp has the Symbol table available. |
At the moment delayed privatisation has no support for lastprivate. When we add support for it then this new change should be usable.
In later work, we should investigate whether this is needed. |
I presume you mean Now that I am looking closer, it isn't clear to me whether calling the finalization would actually be a bug. The OpenMP 5.2 standard (section 5.4.5) says the update should occur as though by intrinsic assignment (which I believe would run finalization). But I might have missed something, because the finalization would have already run inside each thread when the private copy was overwritten. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG.
builder.create<hlfir::AssignOp>( | ||
loc, src, dst, /*isWholeAllocatableAssignment=*/allocatable, | ||
/*keepLhsLengthInAllocatableAssignment=*/false, | ||
/*temporary_lhs=*/false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this code is placed in the isAllocatable
part of copyVarHLFIR
in Bridge.cpp, will there be lot of tests that have to be updated?
Can we do this at some point in the future after the branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tom. Moving this into the copyHostAssociateVar/copyVarHLFIR would make sense to me.
I updated the allocatable handling part that was broken a while ago with the assumption that reallocation was not a thing inside the OpenMP region (#69441). If it is, then what you are doing is the right approach..
Thanks for taking a look. Explicit reallocation is not allowed inside of an OpenMP region but I think allocation is allowed implicitly via intrinsic assignment or privatisation (which works the same as intrinsic assignment). One of our internal tests failed when I tried to put this in copyHostAssociateVar. I will investigate this and post a followup patch if I get it working. |
Summary: Don't use `copyHostAssociateVar` for allocatable variables. It isn't clear to me whether or not this should be addressed in `copyHostAssociateVar` instead of inside OpenMP. I opted for OpenMP to minimise how many things I effected. `copyHostAssociateVar` will not update the destination variable if the destination variable was unallocated. This is incorrect because assignment inside of the openmp block can cause the allocation status of the variable to change. Furthermore, `copyHostAssociateVar` seems to only copy the variable address not other metadata like the size of the allocation. Reallocation by assignment could cause this to change. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251074
Don't use
copyHostAssociateVar
for allocatable variables. It isn'tclear to me whether or not this should be addressed in
copyHostAssociateVar
instead of inside OpenMP. I opted for OpenMPto minimise how many things I effected.
copyHostAssociateVar
willnot update the destination variable if the destination variable
was unallocated. This is incorrect because assignment inside of the
openmp block can cause the allocation status of the variable to
change. Furthermore,
copyHostAssociateVar
seems to only copy thevariable address not other metadata like the size of the allocation.
Reallocation by assignment could cause this to change.
Non-OpenMP reviewers: should this actually be done inside of
copyHostAssociateVar
?OpenMP reviewers: I am aware that this will change with delayed privatization. My intention here is to squash the bug before the LLVM branch