[flang][OpenMP] Fix LASTPRIVATE for iteration variables #69773

luporl · 2023-10-20T20:28:51Z

Iteration variables behave slightly different with LASTPRIVATE,
as they must be assigned the value that the copy would have after
sequential execution of the loop. This case is now handled.

This patch also fixes LASTPRIVATE for loops where the step is
different from 1.

Fixes #64055

Iteration variables behave slightly different with LASTPRIVATE, as they must be assigned the value that the copy would have after sequential execution of the loop. This case is now handled. This patch also fixes LASTPRIVATE for loops where the step is different from 1. Fixes llvm#64055

llvmbot · 2023-10-20T20:36:20Z

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-fir-hlfir

Author: Leandro Lupori (luporl)

Changes

Iteration variables behave slightly different with LASTPRIVATE,
as they must be assigned the value that the copy would have after
sequential execution of the loop. This case is now handled.

This patch also fixes LASTPRIVATE for loops where the step is
different from 1.

Fixes #64055

Full diff: https://github.com/llvm/llvm-project/pull/69773.diff

5 Files Affected:

(modified) flang/lib/Lower/OpenMP.cpp (+43-15)
(modified) flang/test/Lower/OpenMP/FIR/lastprivate-commonblock.f90 (+12-6)
(modified) flang/test/Lower/OpenMP/FIR/parallel-lastprivate-clause-scalar.f90 (+48-12)
(modified) flang/test/Lower/OpenMP/lastprivate-commonblock.f90 (+7-1)
(added) flang/test/Lower/OpenMP/lastprivate-iv.f90 (+66)

diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index 5f5e968eaaa6414..91250fbaf42d6dd 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -109,6 +109,7 @@ class DataSharingProcessor {
   bool hasLastPrivateOp;
   mlir::OpBuilder::InsertPoint lastPrivIP;
   mlir::OpBuilder::InsertPoint insPt;
+  mlir::Value loopIV;
   // Symbols in private, firstprivate, and/or lastprivate clauses.
   llvm::SetVector<const Fortran::semantics::Symbol *> privatizedSymbols;
   llvm::SetVector<const Fortran::semantics::Symbol *> defaultSymbols;
@@ -157,6 +158,11 @@ class DataSharingProcessor {
   // dealocation code as well.
   void processStep1();
   void processStep2(mlir::Operation *op, bool isLoop);
+
+  void setLoopIV(mlir::Value iv) {
+    assert(!loopIV && "Loop iteration variable already set");
+    loopIV = iv;
+  }
 };
 
 void DataSharingProcessor::processStep1() {
@@ -270,7 +276,6 @@ void DataSharingProcessor ::insertBarrier() {
 }
 
 void DataSharingProcessor::insertLastPrivateCompare(mlir::Operation *op) {
-  mlir::arith::CmpIOp cmpOp;
   bool cmpCreated = false;
   mlir::OpBuilder::InsertPoint localInsPt = firOpBuilder.saveInsertionPoint();
   for (const Fortran::parser::OmpClause &clause : opClauseList.v) {
@@ -349,18 +354,17 @@ void DataSharingProcessor::insertLastPrivateCompare(mlir::Operation *op) {
           }
         }
       } else if (mlir::isa<mlir::omp::WsLoopOp>(op)) {
-        mlir::Operation *lastOper = op->getRegion(0).back().getTerminator();
-        firOpBuilder.setInsertionPoint(lastOper);
-
         // Update the original variable just before exiting the worksharing
         // loop. Conversion as follows:
         //
         //                       omp.wsloop {
         // omp.wsloop {            ...
         //    ...                  store
-        //    store       ===>     %cmp = llvm.icmp "eq" %iv %ub
-        //    omp.yield            fir.if %cmp {
-        // }                         ^%lpv_update_blk:
+        //    store       ===>     %v = arith.addi %iv, %step
+        //    omp.yield            %cmp = %step < 0 ? %v < %ub : %v > %ub
+        // }                       fir.if %cmp {
+        //                           fir.store %v to %loopIV
+        //                           ^%lpv_update_blk:
         //                         }
         //                         omp.yield
         //                       }
@@ -368,15 +372,37 @@ void DataSharingProcessor::insertLastPrivateCompare(mlir::Operation *op) {
 
         // Only generate the compare once in presence of multiple LastPrivate
         // clauses.
-        if (!cmpCreated) {
-          cmpOp = firOpBuilder.create<mlir::arith::CmpIOp>(
-              op->getLoc(), mlir::arith::CmpIPredicate::eq,
-              op->getRegion(0).front().getArguments()[0],
-              mlir::dyn_cast<mlir::omp::WsLoopOp>(op).getUpperBound()[0]);
-        }
-        auto ifOp =
-            firOpBuilder.create<fir::IfOp>(op->getLoc(), cmpOp, /*else*/ false);
+        if (cmpCreated)
+          continue;
+        cmpCreated = true;
+
+        mlir::Location loc = op->getLoc();
+        mlir::Operation *lastOper = op->getRegion(0).back().getTerminator();
+        firOpBuilder.setInsertionPoint(lastOper);
+
+        mlir::Value iv = op->getRegion(0).front().getArguments()[0];
+        mlir::Value ub =
+            mlir::dyn_cast<mlir::omp::WsLoopOp>(op).getUpperBound()[0];
+        mlir::Value step = mlir::dyn_cast<mlir::omp::WsLoopOp>(op).getStep()[0];
+
+        // v = iv + step
+        // cmp = step < 0 ? v < ub : v > ub
+        mlir::Value v = firOpBuilder.create<mlir::arith::AddIOp>(loc, iv, step);
+        mlir::Value zero =
+            firOpBuilder.createIntegerConstant(loc, step.getType(), 0);
+        mlir::Value negativeStep = firOpBuilder.create<mlir::arith::CmpIOp>(
+            loc, mlir::arith::CmpIPredicate::slt, step, zero);
+        mlir::Value vLT = firOpBuilder.create<mlir::arith::CmpIOp>(
+            loc, mlir::arith::CmpIPredicate::slt, v, ub);
+        mlir::Value vGT = firOpBuilder.create<mlir::arith::CmpIOp>(
+            loc, mlir::arith::CmpIPredicate::sgt, v, ub);
+        mlir::Value cmpOp = firOpBuilder.create<mlir::arith::SelectOp>(
+            loc, negativeStep, vLT, vGT);
+
+        auto ifOp = firOpBuilder.create<fir::IfOp>(loc, cmpOp, /*else*/ false);
         firOpBuilder.setInsertionPointToStart(&ifOp.getThenRegion().front());
+        assert(loopIV && "loopIV was not set");
+        firOpBuilder.create<fir::StoreOp>(op->getLoc(), v, loopIV);
         lastPrivIP = firOpBuilder.saveInsertionPoint();
       } else {
         TODO(converter.getCurrentLocation(),
@@ -2128,6 +2154,8 @@ static void createBodyOfOp(
       proc.processStep1();
       proc.processStep2(op, is_loop);
     } else {
+      if (is_loop && args.size() > 0)
+        dsp->setLoopIV(converter.getSymbolAddress(*args[0]));
       dsp->processStep2(op, is_loop);
     }
 
diff --git a/flang/test/Lower/OpenMP/FIR/lastprivate-commonblock.f90 b/flang/test/Lower/OpenMP/FIR/lastprivate-commonblock.f90
index 06f3e1ca82234ee..7d2118305fb4c49 100644
--- a/flang/test/Lower/OpenMP/FIR/lastprivate-commonblock.f90
+++ b/flang/test/Lower/OpenMP/FIR/lastprivate-commonblock.f90
@@ -19,12 +19,18 @@
 !CHECK: %[[val_c1_i32_0:.*]] = arith.constant 1 : i32
 !CHECK: omp.wsloop   for (%[[arg:.*]]) : i32 = (%[[val_c1_i32]]) to (%[[val_c100_i32]]) inclusive step (%[[val_c1_i32_0]]) {
 !CHECK: fir.store %[[arg]] to %[[val_0]] : !fir.ref<i32>
-!CHECK: %[[val_11:.*]] = arith.cmpi eq, %[[arg]], %[[val_c100_i32]] : i32
-!CHECK: fir.if %[[val_11]] {
-!CHECK: %[[val_12:.*]] = fir.load %[[val_9]] : !fir.ref<f32>
-!CHECK: fir.store %[[val_12]] to %[[val_5]] : !fir.ref<f32>
-!CHECK: %[[val_13:.*]] = fir.load %[[val_10]] : !fir.ref<f32>
-!CHECK: fir.store %[[val_13]] to %[[val_8]] : !fir.ref<f32>
+!CHECK: %[[val_11:.*]] = arith.addi %[[arg]], %[[val_c1_i32_0]] : i32
+!CHECK: %[[val_c0_i32:.*]] = arith.constant 0 : i32
+!CHECK: %[[val_12:.*]] = arith.cmpi slt, %[[val_c1_i32_0]], %[[val_c0_i32]] : i32
+!CHECK: %[[val_13:.*]] = arith.cmpi slt, %[[val_11]], %[[val_c100_i32]] : i32
+!CHECK: %[[val_14:.*]] = arith.cmpi sgt, %[[val_11]], %[[val_c100_i32]] : i32
+!CHECK: %[[val_15:.*]] = arith.select %[[val_12]], %[[val_13]], %[[val_14]] : i1
+!CHECK: fir.if %[[val_15]] {
+!CHECK: fir.store %[[val_11]] to %[[val_0]] : !fir.ref<i32>
+!CHECK: %[[val_16:.*]] = fir.load %[[val_9]] : !fir.ref<f32>
+!CHECK: fir.store %[[val_16]] to %[[val_5]] : !fir.ref<f32>
+!CHECK: %[[val_17:.*]] = fir.load %[[val_10]] : !fir.ref<f32>
+!CHECK: fir.store %[[val_17]] to %[[val_8]] : !fir.ref<f32>
 !CHECK: }
 !CHECK: omp.yield
 !CHECK: }
diff --git a/flang/test/Lower/OpenMP/FIR/parallel-lastprivate-clause-scalar.f90 b/flang/test/Lower/OpenMP/FIR/parallel-lastprivate-clause-scalar.f90
index aa95c0712fbb88a..67e24088c533867 100644
--- a/flang/test/Lower/OpenMP/FIR/parallel-lastprivate-clause-scalar.f90
+++ b/flang/test/Lower/OpenMP/FIR/parallel-lastprivate-clause-scalar.f90
@@ -1,4 +1,4 @@
-! This test checks lowering of `FIRSTPRIVATE` clause for scalar types.
+! This test checks lowering of `LASTPRIVATE` clause for scalar types.
 
 ! RUN: bbc -fopenmp -emit-fir %s -o - | FileCheck %s
 ! RUN: flang-new -fc1 -fopenmp -emit-fir %s -o - | FileCheck %s
@@ -24,8 +24,14 @@
 !CHECK-NEXT: %[[CALL_END_IO:.*]] = fir.call @_FortranAioEndIoStatement(%[[CALL_BEGIN_IO]])
 
 ! Testing last iteration check
-!CHECK-NEXT: %[[IV_CMP:.*]] = arith.cmpi eq, %[[INDX_WS]]
+!CHECK: %[[V:.*]] = arith.addi %[[INDX_WS]], %{{.*}} : i32
+!CHECK: %[[C0:.*]] = arith.constant 0 : i32
+!CHECK: %[[T1:.*]] = arith.cmpi slt, %{{.*}}, %[[C0]] : i32
+!CHECK: %[[T2:.*]] = arith.cmpi slt, %[[V]], %{{.*}} : i32
+!CHECK: %[[T3:.*]] = arith.cmpi sgt, %[[V]], %{{.*}} : i32
+!CHECK: %[[IV_CMP:.*]] = arith.select %[[T1]], %[[T2]], %[[T3]] : i1
 !CHECK: fir.if %[[IV_CMP]] {
+!CHECK: fir.store %[[V]] to %{{.*}} : !fir.ref<i32>
 
 ! Testing lastprivate val update
 !CHECK-DAG: %[[CVT:.*]] = fir.convert %[[ARG1_REF]] : (!fir.ref<!fir.char<1,5>>) -> !fir.ref<i8>
@@ -52,8 +58,14 @@ subroutine lastprivate_character(arg1)
 !CHECK: omp.wsloop for (%[[INDX_WS:.*]]) : {{.*}} {
 
 ! Testing last iteration check
-!CHECK-DAG: %[[IV_CMP:.*]] = arith.cmpi eq, %[[INDX_WS]]
-!CHECK-DAG: fir.if %[[IV_CMP]] {
+!CHECK: %[[V:.*]] = arith.addi %[[INDX_WS]], %{{.*}} : i32
+!CHECK: %[[C0:.*]] = arith.constant 0 : i32
+!CHECK: %[[T1:.*]] = arith.cmpi slt, %{{.*}}, %[[C0]] : i32
+!CHECK: %[[T2:.*]] = arith.cmpi slt, %[[V]], %{{.*}} : i32
+!CHECK: %[[T3:.*]] = arith.cmpi sgt, %[[V]], %{{.*}} : i32
+!CHECK: %[[IV_CMP:.*]] = arith.select %[[T1]], %[[T2]], %[[T3]] : i1
+!CHECK: fir.if %[[IV_CMP]] {
+!CHECK: fir.store %[[V]] to %{{.*}} : !fir.ref<i32>
 
 ! Testing lastprivate val update
 !CHECK-NEXT: %[[CLONE_LD:.*]] = fir.load %[[CLONE]] : !fir.ref<i32>
@@ -81,8 +93,14 @@ subroutine lastprivate_int(arg1)
 !CHECK: omp.wsloop for (%[[INDX_WS:.*]]) : {{.*}} {
 
 ! Testing last iteration check
-!CHECK: %[[IV_CMP1:.*]] = arith.cmpi eq, %[[INDX_WS]]
-!CHECK-NEXT: fir.if %[[IV_CMP1]] {
+!CHECK: %[[V:.*]] = arith.addi %[[INDX_WS]], %{{.*}} : i32
+!CHECK: %[[C0:.*]] = arith.constant 0 : i32
+!CHECK: %[[T1:.*]] = arith.cmpi slt, %{{.*}}, %[[C0]] : i32
+!CHECK: %[[T2:.*]] = arith.cmpi slt, %[[V]], %{{.*}} : i32
+!CHECK: %[[T3:.*]] = arith.cmpi sgt, %[[V]], %{{.*}} : i32
+!CHECK: %[[IV_CMP:.*]] = arith.select %[[T1]], %[[T2]], %[[T3]] : i1
+!CHECK: fir.if %[[IV_CMP]] {
+!CHECK: fir.store %[[V]] to %{{.*}} : !fir.ref<i32>
 ! Testing lastprivate val update
 !CHECK-DAG: %[[CLONE_LD1:.*]] = fir.load %[[CLONE1]] : !fir.ref<i32>
 !CHECK-DAG: fir.store %[[CLONE_LD1]] to %[[ARG1]] : !fir.ref<i32>
@@ -112,8 +130,14 @@ subroutine mult_lastprivate_int(arg1, arg2)
 !CHECK: omp.wsloop for (%[[INDX_WS:.*]]) : {{.*}} {
 
 !Testing last iteration check
-!CHECK: %[[IV_CMP1:.*]] = arith.cmpi eq, %[[INDX_WS]]
-!CHECK-NEXT: fir.if %[[IV_CMP1]] {
+!CHECK: %[[V:.*]] = arith.addi %[[INDX_WS]], %{{.*}} : i32
+!CHECK: %[[C0:.*]] = arith.constant 0 : i32
+!CHECK: %[[T1:.*]] = arith.cmpi slt, %{{.*}}, %[[C0]] : i32
+!CHECK: %[[T2:.*]] = arith.cmpi slt, %[[V]], %{{.*}} : i32
+!CHECK: %[[T3:.*]] = arith.cmpi sgt, %[[V]], %{{.*}} : i32
+!CHECK: %[[IV_CMP:.*]] = arith.select %[[T1]], %[[T2]], %[[T3]] : i1
+!CHECK: fir.if %[[IV_CMP]] {
+!CHECK: fir.store %[[V]] to %{{.*}} : !fir.ref<i32>
 !Testing lastprivate val update
 !CHECK-DAG: %[[CLONE_LD2:.*]] = fir.load %[[CLONE2]] : !fir.ref<i32>
 !CHECK-DAG: fir.store %[[CLONE_LD2]] to %[[ARG2]] : !fir.ref<i32>
@@ -148,8 +172,14 @@ subroutine mult_lastprivate_int2(arg1, arg2)
 !CHECK: omp.wsloop for (%[[INDX_WS:.*]]) : {{.*}} {
 
 ! Testing last iteration check
-!CHECK: %[[IV_CMP1:.*]] = arith.cmpi eq, %[[INDX_WS]]
-!CHECK-NEXT: fir.if %[[IV_CMP1]] {
+!CHECK: %[[V:.*]] = arith.addi %[[INDX_WS]], %{{.*}} : i32
+!CHECK: %[[C0:.*]] = arith.constant 0 : i32
+!CHECK: %[[T1:.*]] = arith.cmpi slt, %{{.*}}, %[[C0]] : i32
+!CHECK: %[[T2:.*]] = arith.cmpi slt, %[[V]], %{{.*}} : i32
+!CHECK: %[[T3:.*]] = arith.cmpi sgt, %[[V]], %{{.*}} : i32
+!CHECK: %[[IV_CMP:.*]] = arith.select %[[T1]], %[[T2]], %[[T3]] : i1
+!CHECK: fir.if %[[IV_CMP]] {
+!CHECK: fir.store %[[V]] to %{{.*}} : !fir.ref<i32>
 ! Testing lastprivate val update
 !CHECK-NEXT: %[[CLONE_LD:.*]] = fir.load %[[CLONE2]] : !fir.ref<i32>
 !CHECK-NEXT: fir.store %[[CLONE_LD]] to %[[ARG2]] : !fir.ref<i32>
@@ -179,8 +209,14 @@ subroutine firstpriv_lastpriv_int(arg1, arg2)
 !CHECK-NEXT: omp.barrier
 !CHECK: omp.wsloop for (%[[INDX_WS:.*]]) : {{.*}} {
 ! Testing last iteration check
-!CHECK: %[[IV_CMP1:.*]] = arith.cmpi eq, %[[INDX_WS]]
-!CHECK-NEXT: fir.if %[[IV_CMP1]] {
+!CHECK: %[[V:.*]] = arith.addi %[[INDX_WS]], %{{.*}} : i32
+!CHECK: %[[C0:.*]] = arith.constant 0 : i32
+!CHECK: %[[T1:.*]] = arith.cmpi slt, %{{.*}}, %[[C0]] : i32
+!CHECK: %[[T2:.*]] = arith.cmpi slt, %[[V]], %{{.*}} : i32
+!CHECK: %[[T3:.*]] = arith.cmpi sgt, %[[V]], %{{.*}} : i32
+!CHECK: %[[IV_CMP:.*]] = arith.select %[[T1]], %[[T2]], %[[T3]] : i1
+!CHECK: fir.if %[[IV_CMP]] {
+!CHECK: fir.store %[[V]] to %{{.*}} : !fir.ref<i32>
 ! Testing lastprivate val update
 !CHECK-NEXT: %[[CLONE_LD:.*]] = fir.load %[[CLONE1]] : !fir.ref<i32>
 !CHECK-NEXT: fir.store %[[CLONE_LD]] to %[[ARG1]] : !fir.ref<i32>
diff --git a/flang/test/Lower/OpenMP/lastprivate-commonblock.f90 b/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
index 06fa0a12107763f..a11bdee156637b0 100644
--- a/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
@@ -16,8 +16,14 @@
 !CHECK:    %[[PRIVATE_Y_REF:.*]] = fir.alloca f32 {bindc_name = "y", pinned, uniq_name = "_QFlastprivate_commonEy"}
 !CHECK:    %[[PRIVATE_Y_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_Y_REF]] {uniq_name = "_QFlastprivate_commonEy"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
 !CHECK:    omp.wsloop   for  (%[[I:.*]]) : i32 = (%{{.*}}) to (%{{.*}}) inclusive step (%{{.*}}) {
-!CHECK:      %[[LAST_ITER:.*]] = arith.cmpi eq, %[[I]], %{{.*}} : i32
+!CHECK:      %[[V:.*]] = arith.addi %[[I]], %{{.*}} : i32
+!CHECK:      %[[C0:.*]] = arith.constant 0 : i32
+!CHECK:      %[[NEG_STEP:.*]] = arith.cmpi slt, %{{.*}}, %[[C0]] : i32
+!CHECK:      %[[V_LT:.*]] = arith.cmpi slt, %[[V]], %{{.*}} : i32
+!CHECK:      %[[V_GT:.*]] = arith.cmpi sgt, %[[V]], %{{.*}} : i32
+!CHECK:      %[[LAST_ITER:.*]] = arith.select %[[NEG_STEP]], %[[V_LT]], %[[V_GT]] : i1
 !CHECK:      fir.if %[[LAST_ITER]] {
+!CHECK:        fir.store %[[V]] to %{{.*}} : !fir.ref<i32>
 !CHECK:        %[[PRIVATE_X_VAL:.*]] = fir.load %[[PRIVATE_X_DECL]]#0 : !fir.ref<f32>
 !CHECK:        hlfir.assign %[[PRIVATE_X_VAL]] to %[[X_DECL]]#0 temporary_lhs : f32, !fir.ref<f32>
 !CHECK:        %[[PRIVATE_Y_VAL:.*]] = fir.load %[[PRIVATE_Y_DECL]]#0 : !fir.ref<f32>
diff --git a/flang/test/Lower/OpenMP/lastprivate-iv.f90 b/flang/test/Lower/OpenMP/lastprivate-iv.f90
new file mode 100644
index 000000000000000..70fe500129d128f
--- /dev/null
+++ b/flang/test/Lower/OpenMP/lastprivate-iv.f90
@@ -0,0 +1,66 @@
+! Test LASTPRIVATE with iteration variable.
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+!CHECK-LABEL: func @_QPlastprivate_iv_inc
+!CHECK:    %[[I_MEM:.*]] = fir.alloca i32 {adapt.valuebyref, pinned}
+!CHECK:    %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK:    %[[I2_MEM:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlastprivate_iv_incEi"}
+!CHECK:    %[[I2:.*]]:2 = hlfir.declare %[[I2_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK:    %[[LB:.*]] = arith.constant 4 : i32
+!CHECK:    %[[UB:.*]] = arith.constant 10 : i32
+!CHECK:    %[[STEP:.*]]  = arith.constant 3 : i32
+!CHECK:    omp.wsloop for  (%[[IV:.*]]) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+!CHECK:      fir.store %[[IV]] to %[[I]]#1 : !fir.ref<i32>
+!CHECK:      %[[V:.*]] = arith.addi %[[IV]], %[[STEP]] : i32
+!CHECK:      %[[C0:.*]] = arith.constant 0 : i32
+!CHECK:      %[[STEP_NEG:.*]] = arith.cmpi slt, %[[STEP]], %[[C0]] : i32
+!CHECK:      %[[V_LT:.*]] = arith.cmpi slt, %[[V]], %[[UB]] : i32
+!CHECK:      %[[V_GT:.*]] = arith.cmpi sgt, %[[V]], %[[UB]] : i32
+!CHECK:      %[[CMP:.*]] = arith.select %[[STEP_NEG]], %[[V_LT]], %[[V_GT]] : i1
+!CHECK:      fir.if %[[CMP]] {
+!CHECK:        fir.store %[[V]] to %[[I]]#1 : !fir.ref<i32>
+!CHECK:        %[[I_VAL:.*]] = fir.load %[[I]]#0 : !fir.ref<i32>
+!CHECK:        hlfir.assign %[[I_VAL]] to %[[I2]]#0 temporary_lhs : i32, !fir.ref<i32>
+!CHECK:      }
+!CHECK:      omp.yield
+!CHECK:    }
+subroutine lastprivate_iv_inc()
+  integer :: i
+
+  !$omp do lastprivate(i)
+  do i = 4, 10, 3
+  end do
+  !$omp end do
+end subroutine
+
+!CHECK-LABEL: func @_QPlastprivate_iv_dec
+!CHECK:    %[[I_MEM:.*]] = fir.alloca i32 {adapt.valuebyref, pinned}
+!CHECK:    %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK:    %[[I2_MEM:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlastprivate_iv_decEi"}
+!CHECK:    %[[I2:.*]]:2 = hlfir.declare %[[I2_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK:    %[[LB:.*]] = arith.constant 10 : i32
+!CHECK:    %[[UB:.*]] = arith.constant 1 : i32
+!CHECK:    %[[STEP:.*]]  = arith.constant -3 : i32
+!CHECK:    omp.wsloop for  (%[[IV:.*]]) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+!CHECK:      fir.store %[[IV]] to %[[I]]#1 : !fir.ref<i32>
+!CHECK:      %[[V:.*]] = arith.addi %[[IV]], %[[STEP]] : i32
+!CHECK:      %[[C0:.*]] = arith.constant 0 : i32
+!CHECK:      %[[STEP_NEG:.*]] = arith.cmpi slt, %[[STEP]], %[[C0]] : i32
+!CHECK:      %[[V_LT:.*]] = arith.cmpi slt, %[[V]], %[[UB]] : i32
+!CHECK:      %[[V_GT:.*]] = arith.cmpi sgt, %[[V]], %[[UB]] : i32
+!CHECK:      %[[CMP:.*]] = arith.select %[[STEP_NEG]], %[[V_LT]], %[[V_GT]] : i1
+!CHECK:      fir.if %[[CMP]] {
+!CHECK:        fir.store %[[V]] to %[[I]]#1 : !fir.ref<i32>
+!CHECK:        %[[I_VAL:.*]] = fir.load %[[I]]#0 : !fir.ref<i32>
+!CHECK:        hlfir.assign %[[I_VAL]] to %[[I2]]#0 temporary_lhs : i32, !fir.ref<i32>
+!CHECK:      }
+!CHECK:      omp.yield
+!CHECK:    }
+subroutine lastprivate_iv_dec()
+  integer :: i
+
+  !$omp do lastprivate(i)
+  do i = 10, 1, -3
+  end do
+  !$omp end do
+end subroutine

NimishMishra

Thank you for the patch!

If I understand correctly, the previous implementation was generating the fir.if depending on a equality check between v and loopIV. As such, the final update to the iteration variable was being missed.

I'm ok with the approach taken here. Basically, we force the iteration variable to advance by step, and then check if it crosses the upperbound ub. Depending on whether step is positive or negative, a > or a < predicate is needed for comparison. As such, the ternary select operation seems fine.

I've just one point to raise (@kiranchandramohan can comment better). Through this approach, we have introduced a mandatory set of 1 arith.addi, 3 arith.cmpi, and 1 arith.select in every iteration of the loop. Again, not very sure about this, but can this have a bearing on performance, since loops are extensively present across benchmarks we use for validation?

Not analysed this in depth, but could the following work?

%cmp = arith.cmpi eq, %u, %iv
fir.if %cmp {
    %temp = arith.addi %arg0, %step
     fir.store %temp to %iv
}

Basically, we let the current flow (generating if on u = ub) be, and add the extra operations within the fir.if? Would it cause some problems I have missed? I expect if the loop is well formed (say both ub and step are negative), then also the equality condition should work.

NimishMishra · 2023-10-23T04:34:18Z

flang/lib/Lower/OpenMP.cpp

+        //    store       ===>     %v = arith.addi %iv, %step
+        //    omp.yield            %cmp = %step < 0 ? %v < %ub : %v > %ub
+        // }                       fir.if %cmp {
+        //                           fir.store %v to %loopIV


Nit: An additional fit.store here is redundant. We expect all last private operations to be abstracted within ^%lpv_update_blk

The fir.store here is needed. Considering the example of the previous conversation, loopIV is updated in the beginning of the loop body only, to have the same value as iv, but we need it to hold v instead (iv + step). Also, ^%lpv_update_blk copies loopIV to ia and not iv to loopIV. In the end this store gets optimized, because ^%lpv_update_blk generates a load loopIV right after it.

kiranchandramohan

Thanks @luporl for the fix. Looks fine to me.

Is the update of the original loop variable unconditionally done now? Should it be only if lastprivate(loop_variable) is present? Or is the point that the loop_variable is always private and hence the value of the original variable is undefined if it is not lastprivate and hence it is OK to update always?

I've just one point to raise (@kiranchandramohan can comment better). Through this approach, we have introduced a mandatory set of 1 arith.addi, 3 arith.cmpi, and 1 arith.select in every iteration of the loop. Again, not very sure about this, but can this have a bearing on performance, since loops are extensively present across benchmarks we use for validation?

Please refer to https://discourse.llvm.org/t/rfc-privatisation-in-openmp-dialect/3526/4. The kmpc_for_static_init calls provide a value called %.omp.is_last that can be used to check if this is the last iteration. And updates can be placed by checking this value on the last iteration in the exiting path of the loop. One way to achieve this is by having a new operation in the dialect (in-place of the fir.if and related comparisons) say, omp.last_private_loop_update, that takes in the loop_variable, step-size as arguments. This can be extracted out and put in the exit path of the loop. Other alternatives include just modelling the lastprivate variables as part of the omp.wsloop construct and appropriately lowering them with the correct conditions and updates only during lowering to LLVM IR.

Please wait for @NimishMishra.

luporl · 2023-10-23T17:45:30Z

Thank you for the review @NimishMishra!

If I understand correctly, the previous implementation was generating the fir.if depending on a equality check between v and loopIV. As such, the final update to the iteration variable was being missed.

Not exactly. Consider this code snippet:

integer :: ia
ia = 0
!$omp parallel do lastprivate(ia)
do ia = 1, 10
enddo
!$omp end parallel do

In the generated FIR, there will be 3 versions of the ia variable:

%ia = fir.alloca i32 {bindc_name = "ia", uniq_name = "_QFEia"}
omp.parallel {
    %loopIV = fir.alloca i32 {adapt.valuebyref, pinned}
    omp.wsloop for  (%iv) : i32 = (%lb) to (%ub) inclusive step (%step) {
    }
}

%ia - the original variable
%loopIV - this is the private copy of %ia
%iv - this is the iteration variable created and managed by omp.wsloop

So, the previous implementation was actually checking for equality between iv and ub. The problem is that iv goes from lb to ub, but LASTPRIVATE expects ia to be updated with iv + step, where iv's value is that of the last loop iteration. v was added by this patch and is just the result of iv + step.

I've just one point to raise (@kiranchandramohan can comment better). Through this approach, we have introduced a mandatory set of 1 arith.addi, 3 arith.cmpi, and 1 arith.select in every iteration of the loop. Again, not very sure about this, but can this have a bearing on performance, since loops are extensively present across benchmarks we use for validation?

It is true that now we need to perform more operations to find out when we are in the last loop iteration. However, these are added just when LASTPRIVATE is used and is optimized to 2 instructions when lb, ub and step are
constant. In the example above, it becomes:

%v = add i32 %iv, %step
%cmp = icmp sgt i32 %v, 10

But when all of the loop parameters are variables, then I guess it is not always possible to optimize it.

Not analysed this in depth, but could the following work?
%cmp = arith.cmpi eq, %u, %iv
fir.if %cmp {
    %temp = arith.addi %arg0, %step
     fir.store %temp to %iv
}
Basically, we let the current flow (generating if on u = ub) be, and add the extra operations within the fir.if? Would it cause some problems I have missed? I expect if the loop is well formed (say both ub and step are negative), then also the equality condition should work.

This works only when step is 1 or -1. Consider this loop:

ia = 0
!$omp do lastprivate(ia)
do ia = 1, 10, 4
enddo
!$omp end do

iv will have the following values: 1, 5 and 9. After exiting the loop, ia must be updated with 13. With this approach (and without this patch), iv will never be equal to ub, resulting in ia not being updated and remaining with value 0.

NimishMishra · 2023-10-23T17:50:06Z

Thank you for the detailed response. I'm happy with the current changes. Please go ahead with the merge.

luporl · 2023-10-23T18:17:26Z

Thanks for the review @kiranchandramohan!

Is the update of the original loop variable unconditionally done now? Should it be only if lastprivate(loop_variable) is present? Or is the point that the loop_variable is always private and hence the value of the original variable is undefined if it is not lastprivate and hence it is OK to update always?

The update of the original loop variable is still performed only if lastprivate(loop_variable) is present. But the private copy is now being updated, in the last iteration, whenever a lastprivate clause is present, even if it doesn't include the loop variable. Do you think it's better to omit the store in this case? I've noticed that this store (and the load after it, for loop variables) is usually optimized out, when llvm bitcode is generated, and only the store to the original variable remains.

I've just one point to raise (@kiranchandramohan can comment better). Through this approach, we have introduced a mandatory set of 1 arith.addi, 3 arith.cmpi, and 1 arith.select in every iteration of the loop. Again, not very sure about this, but can this have a bearing on performance, since loops are extensively present across benchmarks we use for validation?

Please refer to https://discourse.llvm.org/t/rfc-privatisation-in-openmp-dialect/3526/4. The kmpc_for_static_init calls provide a value called %.omp.is_last that can be used to check if this is the last iteration. And updates can be placed by checking this value on the last iteration in the exiting path of the loop. One way to achieve this is by having a new operation in the dialect (in-place of the fir.if and related comparisons) say, omp.last_private_loop_update, that takes in the loop_variable, step-size as arguments. This can be extracted out and put in the exit path of the loop. Other alternatives include just modelling the lastprivate variables as part of the omp.wsloop construct and appropriately lowering them with the correct conditions and updates only during lowering to LLVM IR.

If I understand correctly, this is a suggested optimization for a future patch, correct?
This would be especially useful when support for lastprivate with collapse is added. Otherwise, we would need to test the loop variables of each nested loop.

kiranchandramohan · 2023-10-23T20:58:04Z

The update of the original loop variable is still performed only if lastprivate(loop_variable) is present. But the private copy is now being updated, in the last iteration, whenever a lastprivate clause is present, even if it doesn't include the loop variable. Do you think it's better to omit the store in this case? I've noticed that this store (and the load after it, for loop variables) is usually optimized out, when llvm bitcode is generated, and only the store to the original variable remains.

OK. That is fine.

If I understand correctly, this is a suggested optimization for a future patch, correct?
This would be especially useful when support for lastprivate with collapse is added. Otherwise, we would need to test the loop variables of each nested loop.

Yes, it is something for future work.

llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Oct 20, 2023

luporl requested a review from kiranchandramohan October 20, 2023 20:31

luporl requested a review from NimishMishra October 20, 2023 20:36

luporl mentioned this pull request Oct 20, 2023

[Flang][OpenMP]Incorrect execution result of LASTPRIVATE clause specified in DO construct #64055

Closed

NimishMishra reviewed Oct 23, 2023

View reviewed changes

kiranchandramohan approved these changes Oct 23, 2023

View reviewed changes

luporl merged commit ed97932 into llvm:main Oct 24, 2023

luporl deleted the luporl-lastpriv branch October 24, 2023 12:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[flang][OpenMP] Fix LASTPRIVATE for iteration variables #69773

[flang][OpenMP] Fix LASTPRIVATE for iteration variables #69773

Uh oh!

luporl commented Oct 20, 2023

Uh oh!

llvmbot commented Oct 20, 2023 •

edited

Loading

Uh oh!

NimishMishra left a comment

Uh oh!

NimishMishra Oct 23, 2023

Uh oh!

luporl Oct 23, 2023

Uh oh!

kiranchandramohan left a comment

Uh oh!

luporl commented Oct 23, 2023

Uh oh!

NimishMishra commented Oct 23, 2023

Uh oh!

luporl commented Oct 23, 2023 •

edited

Loading

Uh oh!

kiranchandramohan commented Oct 23, 2023

Uh oh!

Uh oh!

[flang][OpenMP] Fix LASTPRIVATE for iteration variables #69773

[flang][OpenMP] Fix LASTPRIVATE for iteration variables #69773

Uh oh!

Conversation

luporl commented Oct 20, 2023

Uh oh!

llvmbot commented Oct 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NimishMishra left a comment

Choose a reason for hiding this comment

Uh oh!

NimishMishra Oct 23, 2023

Choose a reason for hiding this comment

Uh oh!

luporl Oct 23, 2023

Choose a reason for hiding this comment

Uh oh!

kiranchandramohan left a comment

Choose a reason for hiding this comment

Uh oh!

luporl commented Oct 23, 2023

Uh oh!

NimishMishra commented Oct 23, 2023

Uh oh!

luporl commented Oct 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kiranchandramohan commented Oct 23, 2023

Uh oh!

Uh oh!

llvmbot commented Oct 20, 2023 •

edited

Loading

luporl commented Oct 23, 2023 •

edited

Loading