Skip to content

[flang][OpenMP] Enable lastprivate with collapse #93778

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

NimishMishra
Copy link
Contributor

This PR enables the lowering of lastprivate in presence of collapse clause.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels May 30, 2024
@llvmbot
Copy link
Member

llvmbot commented May 30, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: None (NimishMishra)

Changes

This PR enables the lowering of lastprivate in presence of collapse clause.


Full diff: https://github.com/llvm/llvm-project/pull/93778.diff

2 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/DataSharingProcessor.cpp (+1-6)
  • (modified) flang/test/Lower/OpenMP/wsloop-collapse.f90 (+33)
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 557a9685024c5..8aa37035ddee5 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -143,7 +143,7 @@ void DataSharingProcessor::collectOmpObjectListSymbol(
 }
 
 void DataSharingProcessor::collectSymbolsForPrivatization() {
-  bool hasCollapse = false;
+
   for (const omp::Clause &clause : clauses) {
     if (const auto &privateClause =
             std::get_if<omp::clause::Private>(&clause.u)) {
@@ -157,16 +157,11 @@ void DataSharingProcessor::collectSymbolsForPrivatization() {
       const ObjectList &objects = std::get<ObjectList>(lastPrivateClause->t);
       collectOmpObjectListSymbol(objects, explicitlyPrivatizedSymbols);
       hasLastPrivateOp = true;
-    } else if (std::get_if<omp::clause::Collapse>(&clause.u)) {
-      hasCollapse = true;
     }
   }
 
   for (auto *sym : explicitlyPrivatizedSymbols)
     allPrivatizedSymbols.insert(sym);
-
-  if (hasCollapse && hasLastPrivateOp)
-    TODO(converter.getCurrentLocation(), "Collapse clause with lastprivate");
 }
 
 bool DataSharingProcessor::needBarrier() {
diff --git a/flang/test/Lower/OpenMP/wsloop-collapse.f90 b/flang/test/Lower/OpenMP/wsloop-collapse.f90
index 67351ca275efb..3e862c25a8089 100644
--- a/flang/test/Lower/OpenMP/wsloop-collapse.f90
+++ b/flang/test/Lower/OpenMP/wsloop-collapse.f90
@@ -94,3 +94,36 @@ program wsloop_collapse
 !CHECK:         return
 !CHECK:       }
 end program wsloop_collapse
+
+subroutine collapse_with_lastprivate
+!CHECK: %[[VAL_I:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFcollapse_with_lastprivateEi"}
+!CHECK: %[[VAL_I_DECLARE:.*]]:2 = hlfir.declare %[[VAL_I]] {{.*}}
+!CHECK: %[[VAL_J:.*]] = fir.alloca i32 {bindc_name = "j", uniq_name = "_QFcollapse_with_lastprivateEj"}
+!CHECK: %[[VAL_J_DECLARE:.*]]:2 = hlfir.declare %[[VAL_J]] {{.*}}
+!CHECK: %[[VAL_K:.*]] = fir.alloca i32 {bindc_name = "k", uniq_name = "_QFcollapse_with_lastprivateEk"}
+!CHECK: %[[VAL_K_DECLARE:.*]]:2 = hlfir.declare %[[VAL_K]] {{.}}
+!CHECK: %[[VAL_X:.*]] = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFcollapse_with_lastprivateEx"}
+!CHECK: %[[VAL_X_DECLARE:.*]]:2 = hlfir.declare %[[VAL_X]] {{.*}}
+
+        integer :: x, i, j, k
+
+!CHECK: omp.parallel {
+!CHECK: %[[INNER_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFcollapse_with_lastprivateEx"}
+!CHECK: %[[INNER_X_DECLARE:.*]]:2 = hlfir.declare %[[INNER_X]] {{.*}}
+        !$omp parallel do lastprivate(x) collapse(2)
+!CHECK: omp.wsloop {
+!CHECK: omp.loop_nest (%[[ARG0:.*]], %[[ARG1:.*]]) : i32 = (%[[LB1:.*]], %[[LB2:.*]]) to (%[[UB1:.*]], %[[UB2:.*]]) inclusive step (%[[STEP1:.*]], %[[STEP2:.*]]) {
+           do i = 1, 10
+             do j = 1, 20
+!CHECK: %[[LOOP_CONTROL:.*]]:2 = fir.do_loop %[[ARG2:.*]] = {{.*}}
+               do k = 1, 30
+                 x = x + 1
+               end do
+             end do
+           end do
+!CHECK: fir.if {{.*}} {
+!CHECK: %[[LOADED_X:.*]] = fir.load %[[INNER_X_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: hlfir.assign %[[LOADED_X]] to %[[VAL_X_DECLARE]]#0 temporary_lhs : i32, !fir.ref<i32>
+!CHECK: }
+        !$omp end parallel do
+end subroutine

@llvmbot
Copy link
Member

llvmbot commented May 30, 2024

@llvm/pr-subscribers-flang-openmp

Author: None (NimishMishra)

Changes

This PR enables the lowering of lastprivate in presence of collapse clause.


Full diff: https://github.com/llvm/llvm-project/pull/93778.diff

2 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/DataSharingProcessor.cpp (+1-6)
  • (modified) flang/test/Lower/OpenMP/wsloop-collapse.f90 (+33)
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 557a9685024c5..8aa37035ddee5 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -143,7 +143,7 @@ void DataSharingProcessor::collectOmpObjectListSymbol(
 }
 
 void DataSharingProcessor::collectSymbolsForPrivatization() {
-  bool hasCollapse = false;
+
   for (const omp::Clause &clause : clauses) {
     if (const auto &privateClause =
             std::get_if<omp::clause::Private>(&clause.u)) {
@@ -157,16 +157,11 @@ void DataSharingProcessor::collectSymbolsForPrivatization() {
       const ObjectList &objects = std::get<ObjectList>(lastPrivateClause->t);
       collectOmpObjectListSymbol(objects, explicitlyPrivatizedSymbols);
       hasLastPrivateOp = true;
-    } else if (std::get_if<omp::clause::Collapse>(&clause.u)) {
-      hasCollapse = true;
     }
   }
 
   for (auto *sym : explicitlyPrivatizedSymbols)
     allPrivatizedSymbols.insert(sym);
-
-  if (hasCollapse && hasLastPrivateOp)
-    TODO(converter.getCurrentLocation(), "Collapse clause with lastprivate");
 }
 
 bool DataSharingProcessor::needBarrier() {
diff --git a/flang/test/Lower/OpenMP/wsloop-collapse.f90 b/flang/test/Lower/OpenMP/wsloop-collapse.f90
index 67351ca275efb..3e862c25a8089 100644
--- a/flang/test/Lower/OpenMP/wsloop-collapse.f90
+++ b/flang/test/Lower/OpenMP/wsloop-collapse.f90
@@ -94,3 +94,36 @@ program wsloop_collapse
 !CHECK:         return
 !CHECK:       }
 end program wsloop_collapse
+
+subroutine collapse_with_lastprivate
+!CHECK: %[[VAL_I:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFcollapse_with_lastprivateEi"}
+!CHECK: %[[VAL_I_DECLARE:.*]]:2 = hlfir.declare %[[VAL_I]] {{.*}}
+!CHECK: %[[VAL_J:.*]] = fir.alloca i32 {bindc_name = "j", uniq_name = "_QFcollapse_with_lastprivateEj"}
+!CHECK: %[[VAL_J_DECLARE:.*]]:2 = hlfir.declare %[[VAL_J]] {{.*}}
+!CHECK: %[[VAL_K:.*]] = fir.alloca i32 {bindc_name = "k", uniq_name = "_QFcollapse_with_lastprivateEk"}
+!CHECK: %[[VAL_K_DECLARE:.*]]:2 = hlfir.declare %[[VAL_K]] {{.}}
+!CHECK: %[[VAL_X:.*]] = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFcollapse_with_lastprivateEx"}
+!CHECK: %[[VAL_X_DECLARE:.*]]:2 = hlfir.declare %[[VAL_X]] {{.*}}
+
+        integer :: x, i, j, k
+
+!CHECK: omp.parallel {
+!CHECK: %[[INNER_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFcollapse_with_lastprivateEx"}
+!CHECK: %[[INNER_X_DECLARE:.*]]:2 = hlfir.declare %[[INNER_X]] {{.*}}
+        !$omp parallel do lastprivate(x) collapse(2)
+!CHECK: omp.wsloop {
+!CHECK: omp.loop_nest (%[[ARG0:.*]], %[[ARG1:.*]]) : i32 = (%[[LB1:.*]], %[[LB2:.*]]) to (%[[UB1:.*]], %[[UB2:.*]]) inclusive step (%[[STEP1:.*]], %[[STEP2:.*]]) {
+           do i = 1, 10
+             do j = 1, 20
+!CHECK: %[[LOOP_CONTROL:.*]]:2 = fir.do_loop %[[ARG2:.*]] = {{.*}}
+               do k = 1, 30
+                 x = x + 1
+               end do
+             end do
+           end do
+!CHECK: fir.if {{.*}} {
+!CHECK: %[[LOADED_X:.*]] = fir.load %[[INNER_X_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: hlfir.assign %[[LOADED_X]] to %[[VAL_X_DECLARE]]#0 temporary_lhs : i32, !fir.ref<i32>
+!CHECK: }
+        !$omp end parallel do
+end subroutine

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@@ -143,7 +143,7 @@ void DataSharingProcessor::collectOmpObjectListSymbol(
}

void DataSharingProcessor::collectSymbolsForPrivatization() {
bool hasCollapse = false;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: an empty line at the start of a function looks wrong to me (feel free to ignore)

@kiranchandramohan kiranchandramohan requested a review from luporl May 31, 2024 16:16
@kiranchandramohan
Copy link
Contributor

Have you checked whether the condition generation for the last iteration is correct when collapse is specified?

Copy link
Contributor

@luporl luporl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @kiranchandramohan mentioned, DataSharingProcessor::insertLastPrivateCompare is currently assuming collapse == 1.

While this patch may produce correct results, it would result in the lastprivate variable being assigned to on every iteration of the inner loops, when on the last iteration of the outer loop, which doesn't seem right.

However, checking all related iteration variables during many iterations would not be very efficient. IIRC, some time ago it was suggested that lastprivate handling should be moved to omp dialect, to make it possible to update the lastprivate variable in the end of the loop and avoid iteration variable comparisons.

@kiranchandramohan
Copy link
Contributor

As @kiranchandramohan mentioned, DataSharingProcessor::insertLastPrivateCompare is currently assuming collapse == 1.

While this patch may produce correct results, it would result in the lastprivate variable being assigned to on every iteration of the inner loops, when on the last iteration of the outer loop, which doesn't seem right.

However, checking all related iteration variables during many iterations would not be very efficient. IIRC, some time ago it was suggested that lastprivate handling should be moved to omp dialect, to make it possible to update the lastprivate variable in the end of the loop and avoid iteration variable comparisons.

The previous suggestion was to create an omp.lastprivate operation that takes the place of the fir.if that we insert for the lastprivate comparison. Now that @ergawy has made progress with the delayed privatization, you might be able to use the infrastructure he created to do this. The idea is to use the %.omp.is_last variable (which identifies whether this is the last iteration or not) that is given as input to the __kmpc_for_static_init_4 openmp runtime call. You can use this in the exit block of the loop and do a comparison and branch to either do the lastprivate update (if it is the last iteration) or exit. All this to be done during translation or in the OpenMP IRBuilder.

call void @__kmpc_for_static_init_4(ptr @1, i32 %2, i32 34, ptr %.omp.is_last, ptr %.omp.lb, ptr %.omp.ub, ptr %.omp.stride, i32 1, i32 1)
; ... intervening code
omp.loop.exit:                                    ; preds = %omp.inner.for.end
  call void @__kmpc_for_static_fini(ptr @1, i32 %2)
  %14 = load i32, ptr %.omp.is_last, align 4
  %15 = icmp ne i32 %14, 0
  br i1 %15, label %.omp.lastprivate.then, label %.omp.lastprivate.done

.omp.lastprivate.then:                            ; preds = %omp.loop.exit
  %16 = load i32, ptr %x2, align 4
  store i32 %16, ptr %0, align 4
  br label %.omp.lastprivate.done

.omp.lastprivate.done:                            ; preds = %.omp.lastprivate.then, %omp.loop.exit
  ret void

Copy link
Member

@ergawy ergawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, just one nit.

!CHECK: %[[LOADED_X:.*]] = fir.load %[[INNER_X_DECLARE]]#0 : !fir.ref<i32>
!CHECK: hlfir.assign %[[LOADED_X]] to %[[VAL_X_DECLARE]]#0 temporary_lhs : i32, !fir.ref<i32>
!CHECK: }
!$omp end parallel do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: similar to #93786 (comment), can you add the terminators to make the region structure more obvious?

@DavidTruby
Copy link
Member

Hi, just removing the current check doesn't quite do the right thing, as the value of the lastprivate variable will be a random one of the last values of the inner loops, rather than the definite final one. To implement this we need to check that we have reached the final value of all the loop indices, not just the outer one.

I've submitted a patch with this approach here: #99500

@NimishMishra
Copy link
Contributor Author

Hi, just removing the current check doesn't quite do the right thing, as the value of the lastprivate variable will be a random one of the last values of the inner loops, rather than the definite final one. To implement this we need to check that we have reached the final value of all the loop indices, not just the outer one.

I've submitted a patch with this approach here: #99500

Hi @DavidTruby

Thanks for the PR. I see what you mean. I'll close this then.

I've an internal reprioritization for a few days; I do plan to return to working on this and implementing the delayed privatization approach, as comments here (and your PR) mention. But in the meantime, we can go ahead with your PR.

@NimishMishra
Copy link
Contributor Author

Closing as the issue is covered by #99500

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants