-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[flang][OpenMP] Move reductions from loop
to teams
when loop
is mapped to distribute
#132920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-flang-openmp Author: Kareem Ergawy (ergawy) ChangesFollow-up to #132003, in particular, see This PR extends reduction support for subroutine bar
implicit none
integer :: x, i
!$omp teams loop reduction(+: x)
DO i = 1, 5
call foo()
END DO
end subroutine Note the following:
Full diff: https://github.com/llvm/llvm-project/pull/132920.diff 2 Files Affected:
diff --git a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
index 74ad6330b11a7..8858de075d193 100644
--- a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
@@ -59,8 +59,36 @@ class GenericLoopConversionPattern
case GenericLoopCombinedInfo::TeamsLoop:
if (teamsLoopCanBeParallelFor(loopOp))
rewriteToDistributeParallelDo(loopOp, rewriter);
- else
+ else {
+ auto teamsOp = llvm::cast<mlir::omp::TeamsOp>(loopOp->getParentOp());
+ auto teamsBlockArgIface =
+ llvm::cast<mlir::omp::BlockArgOpenMPOpInterface>(*teamsOp);
+ auto loopBlockArgIface =
+ llvm::cast<mlir::omp::BlockArgOpenMPOpInterface>(*loopOp);
+
+ for (unsigned i = 0; i < loopBlockArgIface.numReductionBlockArgs();
+ ++i) {
+ mlir::BlockArgument loopRedBlockArg =
+ loopBlockArgIface.getReductionBlockArgs()[i];
+ mlir::BlockArgument teamsRedBlockArg =
+ teamsBlockArgIface.getReductionBlockArgs()[i];
+ rewriter.replaceAllUsesWith(loopRedBlockArg, teamsRedBlockArg);
+ }
+
+ for (unsigned i = 0; i < loopBlockArgIface.numReductionBlockArgs();
+ ++i) {
+ loopOp.getRegion().eraseArgument(
+ loopBlockArgIface.getReductionBlockArgsStart());
+ }
+
+ loopOp.removeReductionModAttr();
+ loopOp.getReductionVarsMutable().clear();
+ loopOp.removeReductionByrefAttr();
+ loopOp.removeReductionSymsAttr();
+
rewriteToDistribute(loopOp, rewriter);
+ }
+
break;
}
diff --git a/flang/test/Lower/OpenMP/loop-directive.f90 b/flang/test/Lower/OpenMP/loop-directive.f90
index 954985e2d64f1..44337ddfbcb12 100644
--- a/flang/test/Lower/OpenMP/loop-directive.f90
+++ b/flang/test/Lower/OpenMP/loop-directive.f90
@@ -358,3 +358,40 @@ subroutine multi_block_teams
end select
!$omp end target teams
end subroutine
+
+
+! Verifies that reductions are hoisted to the parent `teams` directive and removed
+! from the `loop` dreictive when `loop` is mapped to `distribute`.
+
+! CHECK-LABEL: func.func @_QPteams_loop_cannot_be_parallel_for_with_reductions
+subroutine teams_loop_cannot_be_parallel_for_with_reductions
+ implicit none
+ integer :: x, y, i, p
+
+ ! CHECK: %[[ADD_RED:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "_QF{{.*}}Ex"}
+ ! CHECK: %[[MUL_RED:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "_QF{{.*}}Ey"}
+ ! CHECK: omp.teams reduction(
+ ! CHECK-SAME: @add_reduction_i32 %[[ADD_RED]]#0 -> %[[ADD_RED_ARG:[^[:space:]]*]],
+ ! CHECK-SAME: @multiply_reduction_i32 %[[MUL_RED]]#0 -> %[[MUL_RED_ARG:.*]] : {{.*}}) {
+
+ ! CHECK: omp.distribute private(@{{.*}} %{{.*}} -> %{{.*}}, @{{.*}} %{{.*}} -> %{{.*}} : {{.*}}) {
+ ! CHECK: %[[ADD_RED_DECL:.*]]:2 = hlfir.declare %[[ADD_RED_ARG]] {uniq_name = "_QF{{.*}}Ex"}
+ ! CHECK: %[[MUL_RED_DECL:.*]]:2 = hlfir.declare %[[MUL_RED_ARG]] {uniq_name = "_QF{{.*}}Ey"}
+
+ ! CHECK: %[[ADD_RES:.*]] = arith.addi %{{.*}}, %{{.*}} : i32
+ ! CHECK: hlfir.assign %[[ADD_RES]] to %[[ADD_RED_DECL]]#0 : i32, !fir.ref<i32>
+
+ ! CHECK: %[[MUL_RES:.*]] = arith.muli %{{.*}}, %{{.*}} : i32
+ ! CHECK: hlfir.assign %[[MUL_RES]] to %[[MUL_RED_DECL]]#0 : i32, !fir.ref<i32>
+ ! CHECK: omp.yield
+ ! CHECK: }
+ ! CHECK: omp.terminator
+ ! CHECK: }
+ !$omp teams loop reduction(+: x) reduction(*: y) private(p)
+ DO i = 1, 5
+ call foo()
+ x = x + i
+ y = y * i
+ p = 42
+ END DO
+end subroutine
|
loop
to teams
when loop
is mapped to distribute
loop
to teams
when loop
is mapped to distribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with some nitpicking left.
! CHECK: omp.terminator | ||
! CHECK: } | ||
!$omp teams loop reduction(+: x) reduction(*: y) private(p) | ||
DO i = 1, 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DO i = 1, 5 | |
do i = 1, 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
…ed to `distribute` Follow-up to #132003, in particular, see #132003 (comment). This PR extends reduction support for `loop` directives. Consider the following scenario: ```fortran subroutine bar implicit none integer :: x, i !$omp teams loop reduction(+: x) DO i = 1, 5 call foo() END DO end subroutine ``` Note the following: * According to the spec, the `reduction` clause will be attached to `loop` during earlier stages in the compiler. * Additionally, `loop` cannot be mapped to `distribute parallel for` due to the call to a foreign function inside the loop's body. * Therefore, `loop` must be mapped to `distribute`. * However, `distribute` does not have `reduction` clauses. * As a result, we have to move the `reduction`s from the `loop` to its parent `teams` directive, which is what is done by this PR.
008ca28
to
d368f44
Compare
Follow-up to #132003, in particular, see #132003 (comment).
This PR extends reduction support for
loop
directives. Consider the following scenario:Note the following:
reduction
clause will be attached toloop
during earlier stages in the compiler.loop
cannot be mapped todistribute parallel for
due to the call to a foreign function inside the loop's body.loop
must be mapped todistribute
.distribute
does not havereduction
clauses.reduction
s from theloop
to its parentteams
directive, which is what is done by this PR.