-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[flang][OpenMP] Make lastprivate work with reallocated variables #106559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-flang-runtime Author: Leandro Lupori (luporl) ChangesFixes #100951 Full diff: https://github.com/llvm/llvm-project/pull/106559.diff 7 Files Affected:
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index c48daba8cf7fab..1152cba8c5a228 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -1253,19 +1253,16 @@ class FirConverter : public Fortran::lower::AbstractConverter {
r = hlfir::loadTrivialScalar(loc, *builder, r);
builder->create<hlfir::AssignOp>(
loc, r, l,
- /*isWholeAllocatableAssignment=*/false,
+ /*isWholeAllocatableAssignment=*/isAllocatable,
/*keepLhsLengthInAllocatableAssignment=*/false,
/*temporary_lhs=*/true);
};
if (isAllocatable) {
// Deep copy allocatable if it is allocated.
- // Note that when allocated, the RHS is already allocated with the LHS
- // shape for copy on entry in createHostAssociateVarClone.
- // For lastprivate, this assumes that the RHS was not reallocated in
- // the OpenMP region.
- lhs = hlfir::derefPointersAndAllocatables(loc, *builder, lhs);
- mlir::Value addr = hlfir::genVariableRawAddress(loc, *builder, lhs);
+ hlfir::Entity temp =
+ hlfir::derefPointersAndAllocatables(loc, *builder, lhs);
+ mlir::Value addr = hlfir::genVariableRawAddress(loc, *builder, temp);
mlir::Value isAllocated = builder->genIsNotNullAddr(loc, addr);
builder->genIfThen(loc, isAllocated)
.genThen([&]() {
diff --git a/flang/runtime/assign.cpp b/flang/runtime/assign.cpp
index c3c9b0ba10ab33..d558ada51cd21a 100644
--- a/flang/runtime/assign.cpp
+++ b/flang/runtime/assign.cpp
@@ -591,7 +591,7 @@ void RTDEF(AssignTemporary)(Descriptor &to, const Descriptor &from,
}
}
- Assign(to, from, terminator, PolymorphicLHS);
+ Assign(to, from, terminator, MaybeReallocate | PolymorphicLHS);
}
void RTDEF(CopyInAssign)(Descriptor &temp, const Descriptor &var,
diff --git a/flang/test/Lower/OpenMP/copyprivate2.f90 b/flang/test/Lower/OpenMP/copyprivate2.f90
index 38235e8ec79c36..45d497314107ba 100644
--- a/flang/test/Lower/OpenMP/copyprivate2.f90
+++ b/flang/test/Lower/OpenMP/copyprivate2.f90
@@ -27,7 +27,7 @@
!CHECK-NEXT: %[[DST_BOX:.*]] = fir.load %[[DST]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
!CHECK: fir.if %{{.*}} {
!CHECK-NEXT: %[[SRC_BOX:.*]] = fir.load %[[SRC]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
-!CHECK-NEXT: hlfir.assign %[[SRC_BOX]] to %[[DST_BOX]] temporary_lhs : !fir.box<!fir.heap<!fir.array<?xi32>>>,
+!CHECK-NEXT: hlfir.assign %[[SRC_BOX]] to %[[DST]]#0 realloc temporary_lhs : !fir.box<!fir.heap<!fir.array<?xi32>>>,
!CHECK-SAME: !fir.box<!fir.heap<!fir.array<?xi32>>>
!CHECK-NEXT: }
!CHECK-NEXT: return
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
index 47e163014fe868..707c0b1eb1bb6a 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
@@ -61,7 +61,7 @@ subroutine delayed_privatization_private(var1, l1)
! CHECK-NEXT: fir.if %[[COPY_COND]] {
! CHECK-NEXT: %[[PRIV_ORIG_ARG_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]]
-! CHECK-NEXT: hlfir.assign %[[PRIV_ORIG_ARG_VAL]] to %[[PRIV_BASE_VAL]] temporary_lhs
+! CHECK-NEXT: hlfir.assign %[[PRIV_ORIG_ARG_VAL]] to %[[PRIV_PRIV_ARG]] realloc temporary_lhs
! CHECK-NEXT: }
! CHECK-NEXT: omp.yield
! CHECK-NEXT: }
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
index 5f09371bbaba2e..b945084a4347ff 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
@@ -35,7 +35,7 @@ subroutine delayed_privatization_allocatable
! CHECK-NEXT: %[[ORIG_BASE_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]]
! CHECK-NEXT: %[[ORIG_BASE_ADDR:.*]] = fir.box_addr %[[ORIG_BASE_VAL]]
! CHECK-NEXT: %[[ORIG_BASE_LD:.*]] = fir.load %[[ORIG_BASE_ADDR]]
-! CHECK-NEXT: hlfir.assign %[[ORIG_BASE_LD]] to %[[PRIV_BASE_BOX]] temporary_lhs
+! CHECK-NEXT: hlfir.assign %[[ORIG_BASE_LD]] to %[[PRIV_PRIV_ARG]] realloc temporary_lhs
! CHECK-NEXT: }
! RUN: %flang -c -emit-llvm -fopenmp -mmlir --openmp-enable-delayed-privatization \
diff --git a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90 b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
index 41bbb182aade23..29d857afee80cf 100644
--- a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
@@ -1,18 +1,6 @@
! RUN: %flang_fc1 -emit-hlfir -o - -fopenmp %s | FileCheck %s
! RUN: bbc -emit-hlfir -o - -fopenmp %s | FileCheck %s
-program lastprivate_allocatable
- integer, allocatable :: a
- integer :: i
- ! a is unallocated here
- !$omp parallel do lastprivate(a)
- do i=1,1
- a = 42
- enddo
- !$omp end parallel do
- ! a should be allocated here
-end program
-
! CHECK-LABEL: func.func @_QQmain()
! CHECK: %[[VAL_0:.*]] = fir.alloca !fir.box<!fir.heap<i32>> {bindc_name = "a", uniq_name = "_QFEa"}
! CHECK: %[[VAL_1:.*]] = fir.zero_bits !fir.heap<i32>
@@ -36,3 +24,45 @@ program lastprivate_allocatable
! CHECK-NEXT: }
! CHECK-NEXT: omp.yield
! CHECK-NEXT: }
+program lastprivate_allocatable
+ integer, allocatable :: a
+ integer :: i
+ ! a is unallocated here
+ !$omp parallel do lastprivate(a)
+ do i=1,1
+ a = 42
+ enddo
+ !$omp end parallel do
+ ! a should be allocated here
+end program
+
+! CHECK-LABEL: func @_QPlastprivate_realloc()
+! CHECK: %[[A:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFlastprivate_reallocEa"} :
+! CHECK-SAME: (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>) ->
+! CHECK-SAME: (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>)
+! CHECK: omp.parallel {
+! CHECK: %[[A_PRIV:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFlastprivate_reallocEa"} :
+! CHECK-SAME: (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>) ->
+! CHECK-SAME: (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>)
+! CHECK: omp.sections {
+! CHECK: omp.section {
+! CHECK: fir.if %{{.*}} {
+! CHECK: %[[TEMP:.*]] = fir.load %[[A_PRIV:.*]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>
+! CHECK: hlfir.assign %[[TEMP]] to %[[A]]#0 realloc temporary_lhs : !fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>,
+! CHECK-SAME: !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>
+! CHECK: }
+! CHECK: }
+! CHECK: }
+! CHECK: }
+subroutine lastprivate_realloc()
+ complex, allocatable :: a(:)
+
+ allocate(a(2))
+ !$omp parallel
+ !$omp sections lastprivate(a)
+ !$omp section
+ deallocate(a)
+ allocate(a(3))
+ !$omp end sections
+ !$omp end parallel
+end subroutine
diff --git a/flang/test/Lower/OpenMP/task2.f90 b/flang/test/Lower/OpenMP/task2.f90
index ce491d95e93972..63a38ee6207b4e 100644
--- a/flang/test/Lower/OpenMP/task2.f90
+++ b/flang/test/Lower/OpenMP/task2.f90
@@ -21,8 +21,8 @@ subroutine omp_task_nested_allocatable_firstprivate
!CHECK: %[[PRIV_A_BOX:.*]] = fir.load %[[PRIV_A]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
!CHECK: fir.if %{{.*}} {
!CHECK: %[[TEMP:.*]] = fir.load %[[A]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
-!CHECK: hlfir.assign %[[TEMP]] to %[[PRIV_A_BOX]] temporary_lhs :
-!CHECK-SAME: !fir.box<!fir.heap<!fir.array<?xi32>>>, !fir.box<!fir.heap<!fir.array<?xi32>>>
+!CHECK: hlfir.assign %[[TEMP]] to %[[PRIV_A]]#0 realloc temporary_lhs :
+!CHECK-SAME: !fir.box<!fir.heap<!fir.array<?xi32>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
!CHECK: }
!$omp task default(firstprivate)
a = 2
|
! CHECK-SAME: (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.complex<4>>>>>) | ||
! CHECK: omp.sections { | ||
! CHECK: omp.section { | ||
! CHECK: fir.if %{{.*}} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this if
correct? On my reading of the full generated code (omitted from the test case here) it seems to be testing whether the original (outside of the parallel region) copy of the variable is allocated. I would imagine that we should still perform the assignment even if it is not.
There is no such if statement in the lastprivate_allocatable
testcase (the if statement is checking whether it is the final iteration of the loop). The "last" section is judged lexically so it doesn't need an if statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
was already being generated for allocatables in copyVarHLFIR
, so I didn't want to change it.
It is indeed testing whether the original copy of the variable is allocated.
If I understood the standard correctly, it has this restriction:
A variable that appears in a lastprivate clause must be definable.
(https://www.openmp.org/spec-html/5.2/openmpsu39.html)
The copy must also be allocated upon exit of the last section:
If the original list item has the ALLOCATABLE attribute, the corresponding list item of which the value is assigned to the original item must have an allocation status of allocated upon exit from the sequentially last iteration or lexically last structured block sequence associated with a sections construct.
We don't check this, but since this is non-conforming code it should be OK. OTOH, following this logic, it would also be OK to not check if the original variable is allocated and let the behavior be undefined, at least for lastprivate.
There is no such if statement in the lastprivate_allocatable testcase (the if statement is checking whether it is the final iteration of the loop).
Hmm, interesting, I hadn't noticed that. In fact, I think that DataSharingProcessor::copyLastPrivateSymbol
could be almost entirely replaced by this new version of copyHostAssociateVar/copyVarHLFIR
.
I don't know if, besides lastprivate, all other uses of copyHostAssociateVar
can skip the if allocated
part.
But if they do and If others agree we can remove it.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the original list item does not have the POINTER attribute, its update occurs as if by intrinsic assignment unless it has a type bound procedure as a defined assignment.
If an allocated allocatable is assigned to an unallocated allocatable, the lhs is allocated as part of the assignment. So I think we should still do the assignment, even if the list item was unallocated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll remove this if
and adjust the tests.
BTW, according to the sentence that you have quoted above, when there is a type bound procedure as a defined assignment
it should be used, but it's not, because we set AssignOp
's temporary_lhs
flag to false in Bridge.cpp
.
There is a comment in the code saying that temporary_lhs
also avoids finalization calls, but it seems to me we shouldn't avoid them:
Finalization of a list item of a finalizable type or subobjects of a list item of a finalizable type occurs at the end of the region. The order in which any final subroutines for different variables of a finalizable type are called is unspecified.
(OMP 5.2, 5.3 List Item Privatization)
So I intend to also change temporary_lhs
to false to fix these nonconformities. This is already being done in DataSharingProcessor::copyLastPrivateSymbol
. Please let me know if you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
Removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, thanks for the updates!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The code I wrote here intended to do like the previous lowering code to FIR, it was not quite clear to me if allocation/reallocation could happen on the device. Supporting it makes some sense if the OpenMP standard does not explicitly prevents it.
Now it can use copyHostAssociateVar() for allocatables too.
Thanks for the reviews @jeanPerier and @tblah. I have pushed one last commit with a change that I forgot to include previously. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something in this PR has broken firstprivate with an unallocated variable:
program repo
integer, allocatable :: b
!$omp parallel firstprivate(b)
b = 1
!$omp end parallel
end program
For me this now produces a segfault, whereas it used to work correctly before this PR. I wonder if the if
statement we removed from copyVarHLFIR was guarding this case in firstprivate. I think the right solution is to emit the if only for firstprivate.
It seems the standard indeed allows an allocatable variable to be unallocated in a firstprivate clause:
I have modified The
|
But copyin needs the
Am I forgetting any other copy clause? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sticking with this. This passes our internal tests now and I agree with your reading of those sections of the standard.
I just realized that
|
The copyin clause currently forbids pointer and allocatable variables, which are allowed by the OpenMP 1.1 and 3.0 specifications respectively. Since llvm#106559 it is sufficient to remove the TODO check to get correct behaviour.
Fixes #100951