Skip to content

[flang][OpenMP] Make lastprivate work with reallocated variables #106559

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 5, 2024

Conversation

luporl
Copy link
Contributor

@luporl luporl commented Aug 29, 2024

Fixes #100951

@llvmbot llvmbot added flang:runtime flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Aug 29, 2024
@luporl luporl requested a review from tblah August 29, 2024 14:08
@llvmbot
Copy link
Member

llvmbot commented Aug 29, 2024

@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-runtime

Author: Leandro Lupori (luporl)

Changes

Fixes #100951


Full diff: https://github.com/llvm/llvm-project/pull/106559.diff

7 Files Affected:

  • (modified) flang/lib/Lower/Bridge.cpp (+4-7)
  • (modified) flang/runtime/assign.cpp (+1-1)
  • (modified) flang/test/Lower/OpenMP/copyprivate2.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/lastprivate-allocatable.f90 (+42-12)
  • (modified) flang/test/Lower/OpenMP/task2.f90 (+2-2)
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index c48daba8cf7fab..1152cba8c5a228 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -1253,19 +1253,16 @@ class FirConverter : public Fortran::lower::AbstractConverter {
       r = hlfir::loadTrivialScalar(loc, *builder, r);
       builder->create<hlfir::AssignOp>(
           loc, r, l,
-          /*isWholeAllocatableAssignment=*/false,
+          /*isWholeAllocatableAssignment=*/isAllocatable,
           /*keepLhsLengthInAllocatableAssignment=*/false,
           /*temporary_lhs=*/true);
     };
 
     if (isAllocatable) {
       // Deep copy allocatable if it is allocated.
-      // Note that when allocated, the RHS is already allocated with the LHS
-      // shape for copy on entry in createHostAssociateVarClone.
-      // For lastprivate, this assumes that the RHS was not reallocated in
-      // the OpenMP region.
-      lhs = hlfir::derefPointersAndAllocatables(loc, *builder, lhs);
-      mlir::Value addr = hlfir::genVariableRawAddress(loc, *builder, lhs);
+      hlfir::Entity temp =
+          hlfir::derefPointersAndAllocatables(loc, *builder, lhs);
+      mlir::Value addr = hlfir::genVariableRawAddress(loc, *builder, temp);
       mlir::Value isAllocated = builder->genIsNotNullAddr(loc, addr);
       builder->genIfThen(loc, isAllocated)
           .genThen([&]() {
diff --git a/flang/runtime/assign.cpp b/flang/runtime/assign.cpp
index c3c9b0ba10ab33..d558ada51cd21a 100644
--- a/flang/runtime/assign.cpp
+++ b/flang/runtime/assign.cpp
@@ -591,7 +591,7 @@ void RTDEF(AssignTemporary)(Descriptor &to, const Descriptor &from,
     }
   }
 
-  Assign(to, from, terminator, PolymorphicLHS);
+  Assign(to, from, terminator, MaybeReallocate | PolymorphicLHS);
 }
 
 void RTDEF(CopyInAssign)(Descriptor &temp, const Descriptor &var,
diff --git a/flang/test/Lower/OpenMP/copyprivate2.f90 b/flang/test/Lower/OpenMP/copyprivate2.f90
index 38235e8ec79c36..45d497314107ba 100644
--- a/flang/test/Lower/OpenMP/copyprivate2.f90
+++ b/flang/test/Lower/OpenMP/copyprivate2.f90
@@ -27,7 +27,7 @@
 !CHECK-NEXT:    %[[DST_BOX:.*]] = fir.load %[[DST]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
 !CHECK:         fir.if %{{.*}} {
 !CHECK-NEXT:      %[[SRC_BOX:.*]] = fir.load %[[SRC]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
-!CHECK-NEXT:      hlfir.assign %[[SRC_BOX]] to %[[DST_BOX]] temporary_lhs : !fir.box<!fir.heap<!fir.array<?xi32>>>,
+!CHECK-NEXT:      hlfir.assign %[[SRC_BOX]] to %[[DST]]#0 realloc temporary_lhs : !fir.box<!fir.heap<!fir.array<?xi32>>>,
 !CHECK-SAME:        !fir.box<!fir.heap<!fir.array<?xi32>>>
 !CHECK-NEXT:    }
 !CHECK-NEXT:    return
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
index 47e163014fe868..707c0b1eb1bb6a 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
@@ -61,7 +61,7 @@ subroutine delayed_privatization_private(var1, l1)
 
 ! CHECK-NEXT:  fir.if %[[COPY_COND]] {
 ! CHECK-NEXT:    %[[PRIV_ORIG_ARG_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]]
-! CHECK-NEXT:    hlfir.assign %[[PRIV_ORIG_ARG_VAL]] to %[[PRIV_BASE_VAL]] temporary_lhs
+! CHECK-NEXT:    hlfir.assign %[[PRIV_ORIG_ARG_VAL]] to %[[PRIV_PRIV_ARG]] realloc temporary_lhs
 ! CHECK-NEXT:   }
 ! CHECK-NEXT:   omp.yield
 ! CHECK-NEXT: }
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
index 5f09371bbaba2e..b945084a4347ff 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
@@ -35,7 +35,7 @@ subroutine delayed_privatization_allocatable
 ! CHECK-NEXT:    %[[ORIG_BASE_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]]
 ! CHECK-NEXT:    %[[ORIG_BASE_ADDR:.*]] = fir.box_addr %[[ORIG_BASE_VAL]]
 ! CHECK-NEXT:    %[[ORIG_BASE_LD:.*]] = fir.load %[[ORIG_BASE_ADDR]]
-! CHECK-NEXT:    hlfir.assign %[[ORIG_BASE_LD]] to %[[PRIV_BASE_BOX]] temporary_lhs
+! CHECK-NEXT:    hlfir.assign %[[ORIG_BASE_LD]] to %[[PRIV_PRIV_ARG]] realloc temporary_lhs
 ! CHECK-NEXT:  }
 
 ! RUN: %flang -c -emit-llvm -fopenmp -mmlir --openmp-enable-delayed-privatization \
diff --git a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90 b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
index 41bbb182aade23..29d857afee80cf 100644
--- a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
@@ -1,18 +1,6 @@
 ! RUN: %flang_fc1 -emit-hlfir -o - -fopenmp %s | FileCheck %s
 ! RUN: bbc -emit-hlfir -o - -fopenmp %s | FileCheck %s
 
-program lastprivate_allocatable
-  integer, allocatable :: a
-  integer :: i
-  ! a is unallocated here
-  !$omp parallel do lastprivate(a)
-  do i=1,1
-    a = 42
-  enddo
-  !$omp end parallel do
-  ! a should be allocated here
-end program
-
 ! CHECK-LABEL:   func.func @_QQmain()
 ! CHECK:           %[[VAL_0:.*]] = fir.alloca !fir.box<!fir.heap<i32>> {bindc_name = "a", uniq_name = "_QFEa"}
 ! CHECK:           %[[VAL_1:.*]] = fir.zero_bits !fir.heap<i32>
@@ -36,3 +24,45 @@ program lastprivate_allocatable
 ! CHECK-NEXT:            }
 ! CHECK-NEXT:            omp.yield
 ! CHECK-NEXT:          }
+program lastprivate_allocatable
+  integer, allocatable :: a
+  integer :: i
+  ! a is unallocated here
+  !$omp parallel do lastprivate(a)
+  do i=1,1
+    a = 42
+  enddo
+  !$omp end parallel do
+  ! a should be allocated here
+end program
+
+! CHECK-LABEL:  func @_QPlastprivate_realloc()
+! CHECK:          %[[A:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFlastprivate_reallocEa"} :
+! CHECK-SAME:       (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>) ->
+! CHECK-SAME:       (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>)
+! CHECK:          omp.parallel {
+! CHECK:            %[[A_PRIV:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFlastprivate_reallocEa"} :
+! CHECK-SAME:         (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>) ->
+! CHECK-SAME:         (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>)
+! CHECK:            omp.sections {
+! CHECK:              omp.section {
+! CHECK:                fir.if %{{.*}} {
+! CHECK:                  %[[TEMP:.*]] = fir.load %[[A_PRIV:.*]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>
+! CHECK:                  hlfir.assign %[[TEMP]] to %[[A]]#0 realloc temporary_lhs : !fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>,
+! CHECK-SAME:               !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>
+! CHECK:                }
+! CHECK:              }
+! CHECK:            }
+! CHECK:          }
+subroutine lastprivate_realloc()
+  complex, allocatable :: a(:)
+
+  allocate(a(2))
+  !$omp parallel
+    !$omp sections lastprivate(a)
+      !$omp section
+        deallocate(a)
+        allocate(a(3))
+    !$omp end sections
+  !$omp end parallel
+end subroutine
diff --git a/flang/test/Lower/OpenMP/task2.f90 b/flang/test/Lower/OpenMP/task2.f90
index ce491d95e93972..63a38ee6207b4e 100644
--- a/flang/test/Lower/OpenMP/task2.f90
+++ b/flang/test/Lower/OpenMP/task2.f90
@@ -21,8 +21,8 @@ subroutine omp_task_nested_allocatable_firstprivate
 !CHECK:           %[[PRIV_A_BOX:.*]] = fir.load %[[PRIV_A]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
 !CHECK:           fir.if %{{.*}} {
 !CHECK:             %[[TEMP:.*]] = fir.load %[[A]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
-!CHECK:             hlfir.assign %[[TEMP]] to %[[PRIV_A_BOX]] temporary_lhs :
-!CHECK-SAME:          !fir.box<!fir.heap<!fir.array<?xi32>>>, !fir.box<!fir.heap<!fir.array<?xi32>>>
+!CHECK:             hlfir.assign %[[TEMP]] to %[[PRIV_A]]#0 realloc temporary_lhs :
+!CHECK-SAME:          !fir.box<!fir.heap<!fir.array<?xi32>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
 !CHECK:           }
     !$omp task default(firstprivate)
       a = 2

! CHECK-SAME: (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>)
! CHECK: omp.sections {
! CHECK: omp.section {
! CHECK: fir.if %{{.*}} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this if correct? On my reading of the full generated code (omitted from the test case here) it seems to be testing whether the original (outside of the parallel region) copy of the variable is allocated. I would imagine that we should still perform the assignment even if it is not.

There is no such if statement in the lastprivate_allocatable testcase (the if statement is checking whether it is the final iteration of the loop). The "last" section is judged lexically so it doesn't need an if statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if was already being generated for allocatables in copyVarHLFIR, so I didn't want to change it.
It is indeed testing whether the original copy of the variable is allocated.
If I understood the standard correctly, it has this restriction:

A variable that appears in a lastprivate clause must be definable.
(https://www.openmp.org/spec-html/5.2/openmpsu39.html)

The copy must also be allocated upon exit of the last section:

If the original list item has the ALLOCATABLE attribute, the corresponding list item of which the value is assigned to the original item must have an allocation status of allocated upon exit from the sequentially last iteration or lexically last structured block sequence associated with a sections construct.

We don't check this, but since this is non-conforming code it should be OK. OTOH, following this logic, it would also be OK to not check if the original variable is allocated and let the behavior be undefined, at least for lastprivate.

There is no such if statement in the lastprivate_allocatable testcase (the if statement is checking whether it is the final iteration of the loop).

Hmm, interesting, I hadn't noticed that. In fact, I think that DataSharingProcessor::copyLastPrivateSymbol could be almost entirely replaced by this new version of copyHostAssociateVar/copyVarHLFIR.

I don't know if, besides lastprivate, all other uses of copyHostAssociateVar can skip the if allocated part.
But if they do and If others agree we can remove it.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the original list item does not have the POINTER attribute, its update occurs as if by intrinsic assignment unless it has a type bound procedure as a defined assignment.

If an allocated allocatable is assigned to an unallocated allocatable, the lhs is allocated as part of the assignment. So I think we should still do the assignment, even if the list item was unallocated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll remove this if and adjust the tests.

BTW, according to the sentence that you have quoted above, when there is a type bound procedure as a defined assignment it should be used, but it's not, because we set AssignOp's temporary_lhs flag to false in Bridge.cpp.

There is a comment in the code saying that temporary_lhs also avoids finalization calls, but it seems to me we shouldn't avoid them:

Finalization of a list item of a finalizable type or subobjects of a list item of a finalizable type occurs at the end of the region. The order in which any final subroutines for different variables of a finalizable type are called is unspecified.
(OMP 5.2, 5.3 List Item Privatization)

So I intend to also change temporary_lhs to false to fix these nonconformities. This is already being done in DataSharingProcessor::copyLastPrivateSymbol. Please let me know if you disagree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

@luporl
Copy link
Contributor Author

luporl commented Sep 3, 2024

Removed if allocated check and set temporary_lhs to false.
The tests were adjusted and no regressions were found in GFortran and Fujitsu testsuites.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, thanks for the updates!

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The code I wrote here intended to do like the previous lowering code to FIR, it was not quite clear to me if allocation/reallocation could happen on the device. Supporting it makes some sense if the OpenMP standard does not explicitly prevents it.

Now it can use copyHostAssociateVar() for allocatables too.
@luporl
Copy link
Contributor Author

luporl commented Sep 4, 2024

Thanks for the reviews @jeanPerier and @tblah.

I have pushed one last commit with a change that I forgot to include previously.
It simplifies DataSharingProcessor::copyLastPrivateSymbol, which now can use copyHostAssociateVar() for allocatables too.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something in this PR has broken firstprivate with an unallocated variable:

program repo
  integer, allocatable :: b

  !$omp parallel firstprivate(b)
  b = 1
  !$omp end parallel
end program

For me this now produces a segfault, whereas it used to work correctly before this PR. I wonder if the if statement we removed from copyVarHLFIR was guarding this case in firstprivate. I think the right solution is to emit the if only for firstprivate.

@luporl
Copy link
Contributor Author

luporl commented Sep 5, 2024

It seems the standard indeed allows an allocatable variable to be unallocated in a firstprivate clause:

If the original list item that does not have the POINTER attribute has the allocation status of unallocated, the new list items will have the same status.
(OMP 5.2 - 5.4.4 firstprivate Clause - https://www.openmp.org/spec-html/5.2/openmpsu38.html)

I have modified copyVarHLFIR to emit the if in this case.

The if is not needed with copyprivate:

Any list item with the ALLOCATABLE attribute must have the allocation status of allocated when the intrinsic assignment is performed.
(OMP 5.2 - 5.7.2 copyprivate Clause - https://www.openmp.org/spec-html/5.2/openmpsu59.html)

@luporl
Copy link
Contributor Author

luporl commented Sep 5, 2024

But copyin needs the is allocated check:

If the original list item is unallocated or unassociated, the copy of the other thread inherits the declared type parameters and the default type parameter values from the original list item.

Am I forgetting any other copy clause?

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sticking with this. This passes our internal tests now and I agree with your reading of those sections of the standard.

@luporl
Copy link
Contributor Author

luporl commented Sep 5, 2024

I just realized that copyin of allocatables is not yet implemented, so no further changes are needed for now.

ClauseProcessor.cpp:624: not yet implemented: pointer or allocatable variables in Copyin clause

@luporl luporl merged commit 797f011 into llvm:main Sep 5, 2024
8 checks passed
@luporl luporl deleted the luporl-omp-copy-realloc branch September 5, 2024 17:55
DavidTruby added a commit to DavidTruby/llvm-project that referenced this pull request Sep 6, 2024
The copyin clause currently forbids pointer and allocatable variables,
which are allowed by the OpenMP 1.1 and 3.0 specifications respectively.

Since llvm#106559 it is sufficient to remove the TODO check to get correct
behaviour.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang:runtime flang Flang issues not falling into any other category
Projects
None yet
4 participants