Skip to content

[flang][OpenMP] fix lastprivate for allocatables #99686

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 22, 2024

Conversation

tblah
Copy link
Contributor

@tblah tblah commented Jul 19, 2024

Don't use copyHostAssociateVar for allocatable variables. It isn't
clear to me whether or not this should be addressed in
copyHostAssociateVar instead of inside OpenMP. I opted for OpenMP
to minimise how many things I effected. copyHostAssociateVar will
not update the destination variable if the destination variable
was unallocated. This is incorrect because assignment inside of the
openmp block can cause the allocation status of the variable to
change. Furthermore, copyHostAssociateVar seems to only copy the
variable address not other metadata like the size of the allocation.
Reallocation by assignment could cause this to change.

Non-OpenMP reviewers: should this actually be done inside of copyHostAssociateVar?

OpenMP reviewers: I am aware that this will change with delayed privatization. My intention here is to squash the bug before the LLVM branch

1) Do not deallocate allocatable private variables if they are
   lastprivate at the end of the openmp region because their address
   will have been transferred to the host variable when the variable
   is copied.
2) Don't use copyHostAssociateVar for allocatable variables. It isn't
   clear to me whether or not this should be addressed in
   copyHostAssociateVar instead of inside OpenMP. I opted for OpenMP
   to minimise how many things I effected. copyHostAssociateVar will
   not update the destination variable if the destination variable
   was unallocated. This is incorrect because assignment inside of the
   openmp block can cause the allocation status of the variable to
   change. Furthermore, copyHostAssociateVar seems to only copy the
   variable address not other metadata like the size of the allocation.
   Reallocation by assignment could cause this to change.
@llvmbot
Copy link
Member

llvmbot commented Jul 19, 2024

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-openmp

Author: Tom Eccles (tblah)

Changes
  1. Do not deallocate allocatable private variables if they are
    lastprivate at the end of the openmp region because their address
    will have been transferred to the host variable when the variable
    is copied.
  2. Don't use copyHostAssociateVar for allocatable variables. It isn't
    clear to me whether or not this should be addressed in
    copyHostAssociateVar instead of inside OpenMP. I opted for OpenMP
    to minimise how many things I effected. copyHostAssociateVar will
    not update the destination variable if the destination variable
    was unallocated. This is incorrect because assignment inside of the
    openmp block can cause the allocation status of the variable to
    change. Furthermore, copyHostAssociateVar seems to only copy the
    variable address not other metadata like the size of the allocation.
    Reallocation by assignment could cause this to change.

Non-OpenMP reviewers: should this actually be done inside of copyHostAssociateVar?

OpenMP reviewers: I am aware that this will change with delayed privatization. My intention here is to squash the bug before the LLVM branch


Full diff: https://github.com/llvm/llvm-project/pull/99686.diff

2 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/DataSharingProcessor.cpp (+47-3)
  • (added) flang/test/Lower/OpenMP/lastprivate-allocatable.f90 (+44)
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index c1d4c089df3b2..597d62f469d75 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -79,7 +79,8 @@ void DataSharingProcessor::processStep2(mlir::Operation *op, bool isLoop) {
 
 void DataSharingProcessor::insertDeallocs() {
   for (const semantics::Symbol *sym : allPrivatizedSymbols)
-    if (semantics::IsAllocatable(sym->GetUltimate())) {
+    if (semantics::IsAllocatable(sym->GetUltimate()) &&
+        !sym->test(semantics::Symbol::Flag::OmpLastPrivate)) {
       if (!useDelayedPrivatization) {
         converter.createHostAssociateVarCloneDealloc(*sym);
         continue;
@@ -127,8 +128,51 @@ void DataSharingProcessor::copyFirstPrivateSymbol(
 void DataSharingProcessor::copyLastPrivateSymbol(
     const semantics::Symbol *sym,
     [[maybe_unused]] mlir::OpBuilder::InsertPoint *lastPrivIP) {
-  if (sym->test(semantics::Symbol::Flag::OmpLastPrivate))
-    converter.copyHostAssociateVar(*sym, lastPrivIP);
+  if (sym->test(semantics::Symbol::Flag::OmpLastPrivate)) {
+    bool allocatable = semantics::IsAllocatable(sym->GetUltimate());
+    if (!allocatable) {
+      converter.copyHostAssociateVar(*sym, lastPrivIP);
+      return;
+    }
+
+    // copyHostAssociateVar doesn't work properly if the privatised copy was
+    // reallocated (e.g. by assignment): it will only copy if the ultimate
+    // symbol was already allocated, and it only copies data so any reallocated
+    // lengths etc are lost
+
+    // 1) Fetch the original copy of the variable.
+    assert(sym->has<Fortran::semantics::HostAssocDetails>() &&
+           "No host-association found");
+    const Fortran::semantics::Symbol &hsym = sym->GetUltimate();
+    Fortran::lower::SymbolBox hsb = symTable->lookupOneLevelUpSymbol(hsym);
+    assert(hsb && "Host symbol box not found");
+
+    // 2) Fetch the copied one that will mask the original.
+    Fortran::lower::SymbolBox sb = symTable->shallowLookupSymbol(sym);
+    assert(sb && "Host-associated symbol box not found");
+    assert(hsb.getAddr() != sb.getAddr() &&
+           "Host and associated symbol boxes are the same");
+
+    // 3) Perform the assignment.
+    fir::FirOpBuilder &builder = converter.getFirOpBuilder();
+    mlir::Location loc = converter.genLocation(sym->name());
+    mlir::OpBuilder::InsertPoint insPt = builder.saveInsertionPoint();
+    if (lastPrivIP && lastPrivIP->isSet())
+      builder.restoreInsertionPoint(*lastPrivIP);
+    else
+      builder.setInsertionPointAfter(sb.getAddr().getDefiningOp());
+
+    hlfir::Entity dst{hsb.getAddr()};
+    hlfir::Entity src{sb.getAddr()};
+    builder.create<hlfir::AssignOp>(
+        loc, src, dst, /*isWholeAllocatableAssignment=*/allocatable,
+        /*keepLhsLengthInAllocatableAssignment=*/false, /*temporary_lhs=*/true);
+
+    if (lastPrivIP && lastPrivIP->isSet() &&
+        sym->test(Fortran::semantics::Symbol::Flag::OmpLastPrivate)) {
+      builder.restoreInsertionPoint(insPt);
+    }
+  }
 }
 
 void DataSharingProcessor::collectOmpObjectListSymbol(
diff --git a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90 b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
new file mode 100644
index 0000000000000..201c93576bda2
--- /dev/null
+++ b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
@@ -0,0 +1,44 @@
+! RUN: %flang_fc1 -emit-hlfir -o - -fopenmp %s | FileCheck %s
+! RUN: bbc -emit-hlfir -o - -fopenmp %s | FileCheck %s
+
+program lastprivate_allocatable
+  integer, allocatable :: a
+  integer :: i
+  ! a is unallocated here
+  !$omp parallel do lastprivate(a)
+  do i=1,1
+    a = 42
+  enddo
+  !$omp end parallel do
+  ! a should be allocated here
+end program
+
+! CHECK-LABEL:   func.func @_QQmain()
+! CHECK:           %[[VAL_0:.*]] = fir.alloca !fir.box<!fir.heap<i32>> {bindc_name = "a", uniq_name = "_QFEa"}
+! CHECK:           %[[VAL_1:.*]] = fir.zero_bits !fir.heap<i32>
+! CHECK:           %[[VAL_2:.*]] = fir.embox %[[VAL_1]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
+! CHECK:           fir.store %[[VAL_2]] to %[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK:           %[[VAL_3:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
+! CHECK:           omp.parallel {
+!                    create original copy of private variable
+! CHECK:             %[[VAL_16:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
+! CHECK:             %[[VAL_17:.*]] = fir.alloca i32 {bindc_name = "i", pinned, uniq_name = "_QFEi"}
+! CHECK:             %[[VAL_18:.*]]:2 = hlfir.declare %[[VAL_17]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+! CHECK:             omp.wsloop {
+! CHECK:               omp.loop_nest
+!                        [...]
+!                        if this is the last iteration
+! CHECK:                 fir.if %{{.*}} {
+!                          store loop IV
+! CHECK:                   fir.store %{{.*}} to %[[VAL_18]]#1 : !fir.ref<i32>
+!                          assign private variable to original copy: realloc and temporary_lhs
+! CHECK:                   hlfir.assign %[[VAL_16]]#0 to %[[VAL_3]]#0 realloc temporary_lhs : !fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT:            }
+! CHECK-NEXT:            omp.yield
+! CHECK-NEXT:          }
+! CHECK-NEXT:          omp.terminator
+! CHECK-NEXT:        }
+!                    [no deallocation here]
+! CHECK-NEXT:        omp.terminator
+! CHECK-NEXT:      }
+

I was mistaken about the deallocation. Firstly, all of the other private
copies on threads that don't execute the last iteration need to be
freed. Secondly, the assign copies the data not the pointer.
@tblah tblah force-pushed the ecclescake/privatised_assign branch from 38b2249 to fc63462 Compare July 20, 2024 11:03
I thought this was needed to avoid calling destructors, but that was
theoretical. It turns out when this is used, arrays will not be assigned
with different numbers of elements.
Copy link

github-actions bot commented Jul 20, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@kiranchandramohan
Copy link
Contributor

Non-OpenMP reviewers: should this actually be done inside of copyHostAssociateVar?

It need not be. copyHostAssociateVar has a few customers and only OpenMP uses it for lastprivatisation. Previously the symbol table was not available in OpenMP.cpp. So this could only be done in the bridge. With changes to support recursive lowering, OpenMP.cpp has the Symbol table available.

@kiranchandramohan
Copy link
Contributor

kiranchandramohan commented Jul 20, 2024

OpenMP reviewers: I am aware that this will change with delayed privatization. My intention here is to squash the bug before the LLVM branch

At the moment delayed privatisation has no support for lastprivate. When we add support for it then this new change should be usable.

realloc_lhs

In later work, we should investigate whether this is needed.

@tblah
Copy link
Contributor Author

tblah commented Jul 22, 2024

realloc_lhs

In later work, we should investigate whether this is needed.

I presume you mean temporary_lhs. This is tricky - the lhs of the assignment is not a temporary so it is a missuse of the flag. I originally included it because the documentation for hlfir.assign says that it prevents finalization of the lhs or lhs subobjects. However, when I tried more testcases it turned out this prevented arrays of different lengths from being assigned. This seems to contradict the documentation for realloc, but I wanted to get this landed before the branch if possible.

Now that I am looking closer, it isn't clear to me whether calling the finalization would actually be a bug. The OpenMP 5.2 standard (section 5.4.5) says the update should occur as though by intrinsic assignment (which I believe would run finalization). But I might have missed something, because the finalization would have already run inside each thread when the private copy was overwritten.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG.

Comment on lines +167 to +170
builder.create<hlfir::AssignOp>(
loc, src, dst, /*isWholeAllocatableAssignment=*/allocatable,
/*keepLhsLengthInAllocatableAssignment=*/false,
/*temporary_lhs=*/false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code is placed in the isAllocatable part of copyVarHLFIR in Bridge.cpp, will there be lot of tests that have to be updated?

Can we do this at some point in the future after the branch?

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tom. Moving this into the copyHostAssociateVar/copyVarHLFIR would make sense to me.

I updated the allocatable handling part that was broken a while ago with the assumption that reallocation was not a thing inside the OpenMP region (#69441). If it is, then what you are doing is the right approach..

@tblah
Copy link
Contributor Author

tblah commented Jul 22, 2024

Thanks Tom. Moving this into the copyHostAssociateVar/copyVarHLFIR would make sense to me.

I updated the allocatable handling part that was broken a while ago with the assumption that reallocation was not a thing inside the OpenMP region (#69441). If it is, then what you are doing is the right approach..

Thanks for taking a look. Explicit reallocation is not allowed inside of an OpenMP region but I think allocation is allowed implicitly via intrinsic assignment or privatisation (which works the same as intrinsic assignment).

One of our internal tests failed when I tried to put this in copyHostAssociateVar. I will investigate this and post a followup patch if I get it working.

@tblah tblah merged commit 2e6558b into llvm:main Jul 22, 2024
7 checks passed
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
Don't use `copyHostAssociateVar` for allocatable variables. It isn't
clear to me whether or not this should be addressed in
`copyHostAssociateVar` instead of inside OpenMP. I opted for OpenMP
to minimise how many things I effected. `copyHostAssociateVar` will
not update the destination variable if the destination variable
was unallocated. This is incorrect because assignment inside of the
openmp block can cause the allocation status of the variable to
change. Furthermore, `copyHostAssociateVar` seems to only copy the
variable address not other metadata like the size of the allocation.
Reallocation by assignment could cause this to change.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251074
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants