Skip to content

[flang][OpenMP] Move reductions from loop to teams when loop is mapped to distribute #132920

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 4, 2025

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Mar 25, 2025

Follow-up to #132003, in particular, see #132003 (comment).

This PR extends reduction support for loop directives. Consider the following scenario:

subroutine bar
  implicit none
  integer :: x, i

  !$omp teams loop reduction(+: x)
  DO i = 1, 5
    call foo()
  END DO
end subroutine

Note the following:

  • According to the spec, the reduction clause will be attached to loop during earlier stages in the compiler.
  • Additionally, loop cannot be mapped to distribute parallel for due to the call to a foreign function inside the loop's body.
  • Therefore, loop must be mapped to distribute.
  • However, distribute does not have reduction clauses.
  • As a result, we have to move the reductions from the loop to its parent teams directive, which is what is done by this PR.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Mar 25, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 25, 2025

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-openmp

Author: Kareem Ergawy (ergawy)

Changes

Follow-up to #132003, in particular, see
#132003 (comment).

This PR extends reduction support for loop directives. Consider the following scenario:

subroutine bar
  implicit none
  integer :: x, i

  !$omp teams loop reduction(+: x)
  DO i = 1, 5
    call foo()
  END DO
end subroutine

Note the following:

  • According to the spec, the reduction clause will be attached to loop during earlier stages in the compiler.
  • Additionally, loop cannot be mapped to distribute parallel for due to the call to a foreign function inside the loop's body.
  • Therefore, loop must be mapped to distribute.
  • However, distribute does not have reduction clauses.
  • As a result, we have to move the reductions from the loop to its parent teams directive, which is what is done by this PR.

Full diff: https://github.com/llvm/llvm-project/pull/132920.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp (+29-1)
  • (modified) flang/test/Lower/OpenMP/loop-directive.f90 (+37)
diff --git a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
index 74ad6330b11a7..8858de075d193 100644
--- a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
@@ -59,8 +59,36 @@ class GenericLoopConversionPattern
     case GenericLoopCombinedInfo::TeamsLoop:
       if (teamsLoopCanBeParallelFor(loopOp))
         rewriteToDistributeParallelDo(loopOp, rewriter);
-      else
+      else {
+        auto teamsOp = llvm::cast<mlir::omp::TeamsOp>(loopOp->getParentOp());
+        auto teamsBlockArgIface =
+            llvm::cast<mlir::omp::BlockArgOpenMPOpInterface>(*teamsOp);
+        auto loopBlockArgIface =
+            llvm::cast<mlir::omp::BlockArgOpenMPOpInterface>(*loopOp);
+
+        for (unsigned i = 0; i < loopBlockArgIface.numReductionBlockArgs();
+             ++i) {
+          mlir::BlockArgument loopRedBlockArg =
+              loopBlockArgIface.getReductionBlockArgs()[i];
+          mlir::BlockArgument teamsRedBlockArg =
+              teamsBlockArgIface.getReductionBlockArgs()[i];
+          rewriter.replaceAllUsesWith(loopRedBlockArg, teamsRedBlockArg);
+        }
+
+        for (unsigned i = 0; i < loopBlockArgIface.numReductionBlockArgs();
+             ++i) {
+          loopOp.getRegion().eraseArgument(
+              loopBlockArgIface.getReductionBlockArgsStart());
+        }
+
+        loopOp.removeReductionModAttr();
+        loopOp.getReductionVarsMutable().clear();
+        loopOp.removeReductionByrefAttr();
+        loopOp.removeReductionSymsAttr();
+
         rewriteToDistribute(loopOp, rewriter);
+      }
+
       break;
     }
 
diff --git a/flang/test/Lower/OpenMP/loop-directive.f90 b/flang/test/Lower/OpenMP/loop-directive.f90
index 954985e2d64f1..44337ddfbcb12 100644
--- a/flang/test/Lower/OpenMP/loop-directive.f90
+++ b/flang/test/Lower/OpenMP/loop-directive.f90
@@ -358,3 +358,40 @@ subroutine multi_block_teams
   end select
   !$omp end target teams
 end subroutine
+
+
+! Verifies that reductions are hoisted to the parent `teams` directive and removed
+! from the `loop` dreictive when `loop` is mapped to `distribute`.
+
+! CHECK-LABEL: func.func @_QPteams_loop_cannot_be_parallel_for_with_reductions
+subroutine teams_loop_cannot_be_parallel_for_with_reductions
+  implicit none
+  integer :: x, y, i, p
+
+  ! CHECK: %[[ADD_RED:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "_QF{{.*}}Ex"}
+  ! CHECK: %[[MUL_RED:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "_QF{{.*}}Ey"}
+  ! CHECK: omp.teams reduction(
+  ! CHECK-SAME:  @add_reduction_i32 %[[ADD_RED]]#0 -> %[[ADD_RED_ARG:[^[:space:]]*]], 
+  ! CHECK-SAME:  @multiply_reduction_i32 %[[MUL_RED]]#0 -> %[[MUL_RED_ARG:.*]] : {{.*}}) {
+
+  ! CHECK:       omp.distribute private(@{{.*}} %{{.*}} -> %{{.*}}, @{{.*}} %{{.*}} -> %{{.*}} : {{.*}}) {
+  ! CHECK:         %[[ADD_RED_DECL:.*]]:2 = hlfir.declare %[[ADD_RED_ARG]] {uniq_name = "_QF{{.*}}Ex"}
+  ! CHECK:         %[[MUL_RED_DECL:.*]]:2 = hlfir.declare %[[MUL_RED_ARG]] {uniq_name = "_QF{{.*}}Ey"}
+
+  ! CHECK:         %[[ADD_RES:.*]] = arith.addi %{{.*}}, %{{.*}} : i32
+  ! CHECK:         hlfir.assign %[[ADD_RES]] to %[[ADD_RED_DECL]]#0 : i32, !fir.ref<i32>
+
+  ! CHECK:         %[[MUL_RES:.*]] = arith.muli %{{.*}}, %{{.*}} : i32
+  ! CHECK:         hlfir.assign %[[MUL_RES]] to %[[MUL_RED_DECL]]#0 : i32, !fir.ref<i32>
+  ! CHECK:         omp.yield
+  ! CHECK:       }
+  ! CHECK:       omp.terminator
+  ! CHECK: }
+  !$omp teams loop reduction(+: x) reduction(*: y) private(p)
+  DO i = 1, 5
+    call foo()
+    x = x + i
+    y = y * i
+    p = 42
+  END DO
+end subroutine

@ergawy ergawy changed the title [flang][OpenMP] Move reductions loop to teams when loop is mapped to distribute [flang][OpenMP] Move reductions from loop to teams when loop is mapped to distribute Mar 25, 2025
Copy link
Contributor

@mjklemm mjklemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with some nitpicking left.

! CHECK: omp.terminator
! CHECK: }
!$omp teams loop reduction(+: x) reduction(*: y) private(p)
DO i = 1, 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DO i = 1, 5
do i = 1, 5

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

ergawy added 2 commits April 3, 2025 22:57
…ed to `distribute`

Follow-up to #132003, in particular, see
#132003 (comment).

This PR extends reduction support for `loop` directives. Consider the
following scenario:
```fortran
subroutine bar
  implicit none
  integer :: x, i

  !$omp teams loop reduction(+: x)
  DO i = 1, 5
    call foo()
  END DO
end subroutine
```
Note the following:
* According to the spec, the `reduction` clause will be attached to
  `loop` during earlier stages in the compiler.
* Additionally, `loop` cannot be mapped to `distribute parallel for`
  due to the call to a foreign function inside the loop's body.
* Therefore, `loop` must be mapped to `distribute`.
* However, `distribute` does not have `reduction` clauses.
* As a result, we have to move the `reduction`s from the `loop` to its
  parent `teams` directive, which is what is done by this PR.
@ergawy ergawy force-pushed the users/ergawy/loop_to_distribute_reduction branch from 008ca28 to d368f44 Compare April 4, 2025 04:02
@ergawy ergawy merged commit 6333f84 into main Apr 4, 2025
11 checks passed
@ergawy ergawy deleted the users/ergawy/loop_to_distribute_reduction branch April 4, 2025 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants