Skip to content

[OpenMP 6.0 ]Codegen for Reduction over private variables with reduction clause #134709

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Jun 11, 2025

Conversation

chandraghale
Copy link
Contributor

@chandraghale chandraghale commented Apr 7, 2025

Codegen support for reduction over private variable with reduction clause. Section 7.6.10 in in OpenMP 6.0 spec.

  • An internal shared copy is initialized with an initializer value.
  • The shared copy is updated by combining its value with the values from the private copies created by the clause.
  • Once an encountering thread verifies that all updates are complete, its original list item is updated by merging its value with that of the shared copy and then broadcast to all threads.

Sample Test Case from OpenMP 6.0 Example

#include <assert.h>
#include <omp.h>
#define N 10

void do_red(int n, int *v, int &sum_v)
{
    sum_v = 0; // sum_v is private
    #pragma omp for reduction(original(private),+: sum_v)
    for (int i = 0; i < n; i++) 
    {
        sum_v += v[i];
    }
}

int main(void)
{
    int v[N];
    for (int i = 0; i < N; i++)
        v[i] = i;
    #pragma omp parallel num_threads(4)
    {
        int s_v; // s_v is private
        do_red(N, v, s_v);
        assert(s_v == 45);
    }
    return 0;
}

Expected Codegen:

 // A shared global/static variable is introduced for the reduction result.
 // This variable is initialized (e.g., using memset or a UDR initializer)
 // e.g., .omp.reduction.internal_private_var

 // Barrier before any thread performs combination
  call void @__kmpc_barrier(...)

 // Initialization block (executed by thread 0)
 // e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...)

  call void @__kmpc_critical(...)
    // Inside critical section:
    // Load the current value from the shared variable
    // Load the thread-local private variable's value
    // Perform the reduction operation 
    // Store the result back to the shared variable

  call void @__kmpc_end_critical(...)
  // Barrier after all threads complete their combinations

  call void @__kmpc_barrier(...)
 // Broadcast phase:
 // Load the final result from the shared variable)
 // Store the final result to the original private variable in each thread
 // Final barrier after broadcast

  call void @__kmpc_barrier(...)

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang labels Apr 7, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 7, 2025

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: CHANDRA GHALE (chandraghale)

Changes

Codegen support for reduction over private variable with reduction clause. Section 7.6.10 in in OpenMP 6.0 spec.

  • An internal shared copy is initialized with an initializer value.
  • The shared copy is updated by combining its value with the values from the private copies created by the clause.
  • Once an encountering thread verifies that all updates are complete, its original list item is updated by merging its value with that of the shared copy and then broadcast to all threads.

Patch is 24.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134709.diff

4 Files Affected:

  • (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+147)
  • (modified) clang/lib/CodeGen/CGOpenMPRuntime.h (+14)
  • (modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+9-3)
  • (added) clang/test/OpenMP/for_private_reduction_codegen.cpp (+236)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 06a652c146fb9..3424227e5da79 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -4899,6 +4899,150 @@ void CGOpenMPRuntime::emitSingleReductionCombiner(CodeGenFunction &CGF,
   }
 }
 
+void CGOpenMPRuntime::emitPrivateReduction(
+    CodeGenFunction &CGF, SourceLocation Loc, ArrayRef<const Expr *> Privates,
+    ArrayRef<const Expr *> LHSExprs, ArrayRef<const Expr *> RHSExprs,
+    ArrayRef<const Expr *> ReductionOps) {
+
+  if (LHSExprs.empty() || Privates.empty() || ReductionOps.empty())
+    return;
+
+  if (LHSExprs.size() != Privates.size() ||
+      LHSExprs.size() != ReductionOps.size())
+    return;
+
+  QualType PrivateType = Privates[0]->getType();
+  llvm::Type *LLVMType = CGF.ConvertTypeForMem(PrivateType);
+
+  BinaryOperatorKind MainBO = BO_Comma;
+  if (const auto *BinOp = dyn_cast<BinaryOperator>(ReductionOps[0])) {
+    if (const auto *RHSExpr = BinOp->getRHS()) {
+      if (const auto *BORHS =
+              dyn_cast<BinaryOperator>(RHSExpr->IgnoreParenImpCasts())) {
+        MainBO = BORHS->getOpcode();
+      }
+    }
+  }
+
+  llvm::Constant *InitVal = llvm::Constant::getNullValue(LLVMType);
+  const Expr *Private = Privates[0];
+
+  if (const auto *DRE = dyn_cast<DeclRefExpr>(Private)) {
+    if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
+      if (const Expr *Init = VD->getInit()) {
+        if (Init->isConstantInitializer(CGF.getContext(), false)) {
+          Expr::EvalResult Result;
+          if (Init->EvaluateAsRValue(Result, CGF.getContext())) {
+            APValue &InitValue = Result.Val;
+            if (InitValue.isInt()) {
+              InitVal = llvm::ConstantInt::get(LLVMType, InitValue.getInt());
+            }
+          }
+        }
+      }
+    }
+  }
+
+  // Create an internal shared variable
+  std::string SharedName = getName({"internal_private_var"});
+  llvm::GlobalVariable *SharedVar = new llvm::GlobalVariable(
+      CGM.getModule(), LLVMType, false, llvm::GlobalValue::CommonLinkage,
+      InitVal, ".omp.reduction." + SharedName, nullptr,
+      llvm::GlobalVariable::NotThreadLocal);
+
+  SharedVar->setAlignment(
+      llvm::MaybeAlign(CGF.getContext().getTypeAlign(PrivateType) / 8));
+
+  Address SharedResult(SharedVar, SharedVar->getValueType(),
+                       CGF.getContext().getTypeAlignInChars(PrivateType));
+
+  llvm::Value *ThreadId = getThreadID(CGF, Loc);
+  llvm::Value *BarrierLoc = emitUpdateLocation(CGF, Loc, OMP_ATOMIC_REDUCE);
+  llvm::Value *BarrierArgs[] = {BarrierLoc, ThreadId};
+
+  CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+                          CGM.getModule(), OMPRTL___kmpc_barrier),
+                      BarrierArgs);
+
+  llvm::BasicBlock *InitBB = CGF.createBasicBlock("init");
+  llvm::BasicBlock *InitEndBB = CGF.createBasicBlock("init.end");
+
+  llvm::Value *IsWorker = CGF.Builder.CreateICmpEQ(
+      ThreadId, llvm::ConstantInt::get(ThreadId->getType(), 0));
+  CGF.Builder.CreateCondBr(IsWorker, InitBB, InitEndBB);
+
+  CGF.EmitBlock(InitBB);
+  CGF.Builder.CreateStore(InitVal, SharedResult);
+  CGF.Builder.CreateBr(InitEndBB);
+
+  CGF.EmitBlock(InitEndBB);
+
+  CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+                          CGM.getModule(), OMPRTL___kmpc_barrier),
+                      BarrierArgs);
+
+  for (unsigned I = 0; I < ReductionOps.size(); ++I) {
+    if (I >= LHSExprs.size()) {
+      break;
+    }
+
+    const auto *BinOp = dyn_cast<BinaryOperator>(ReductionOps[I]);
+    if (!BinOp || BinOp->getOpcode() != BO_Assign)
+      continue;
+
+    const Expr *RHSExpr = BinOp->getRHS();
+    if (!RHSExpr)
+      continue;
+
+    BinaryOperatorKind BO = BO_Comma;
+    if (const auto *BORHS =
+            dyn_cast<BinaryOperator>(RHSExpr->IgnoreParenImpCasts())) {
+      BO = BORHS->getOpcode();
+    }
+
+    LValue SharedLV = CGF.MakeAddrLValue(SharedResult, PrivateType);
+    LValue LHSLV = CGF.EmitLValue(LHSExprs[I]);
+    RValue PrivateRV = CGF.EmitLoadOfLValue(LHSLV, Loc);
+    auto &&UpdateOp = [&CGF, PrivateRV, BinOp, BO](RValue OldVal) {
+      if (BO == BO_Mul) {
+        llvm::Value *OldScalar = OldVal.getScalarVal();
+        llvm::Value *PrivateScalar = PrivateRV.getScalarVal();
+        llvm::Value *Result = CGF.Builder.CreateMul(OldScalar, PrivateScalar);
+        return RValue::get(Result);
+      } else {
+        OpaqueValueExpr OVE(BinOp->getLHS()->getExprLoc(),
+                            BinOp->getLHS()->getType(),
+                            ExprValueKind::VK_PRValue);
+        CodeGenFunction::OpaqueValueMapping OldValMapping(CGF, &OVE, OldVal);
+        return CGF.EmitAnyExpr(BinOp->getRHS());
+      }
+    };
+
+    (void)CGF.EmitOMPAtomicSimpleUpdateExpr(
+        SharedLV, PrivateRV, BO, true,
+        llvm::AtomicOrdering::SequentiallyConsistent, Loc, UpdateOp);
+  }
+
+  // Final barrier
+  CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+                          CGM.getModule(), OMPRTL___kmpc_barrier),
+                      BarrierArgs);
+
+  // Broadcast final result
+  llvm::Value *FinalResult = CGF.Builder.CreateLoad(SharedResult);
+
+  // Update private variables with final result
+  for (unsigned I = 0; I < Privates.size(); ++I) {
+    LValue LHSLV = CGF.EmitLValue(LHSExprs[I]);
+    CGF.Builder.CreateStore(FinalResult, LHSLV.getAddress());
+  }
+
+  // Final synchronization
+  CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+                          CGM.getModule(), OMPRTL___kmpc_barrier),
+                      BarrierArgs);
+}
+
 void CGOpenMPRuntime::emitReduction(CodeGenFunction &CGF, SourceLocation Loc,
                                     ArrayRef<const Expr *> Privates,
                                     ArrayRef<const Expr *> LHSExprs,
@@ -5201,6 +5345,9 @@ void CGOpenMPRuntime::emitReduction(CodeGenFunction &CGF, SourceLocation Loc,
 
   CGF.EmitBranch(DefaultBB);
   CGF.EmitBlock(DefaultBB, /*IsFinished=*/true);
+  if (Options.IsPrivateVarReduction) {
+    emitPrivateReduction(CGF, Loc, Privates, LHSExprs, RHSExprs, ReductionOps);
+  }
 }
 
 /// Generates unique name for artificial threadprivate variables.
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h
index 4321712e1521d..50ba28b565b6d 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -1201,8 +1201,22 @@ class CGOpenMPRuntime {
   struct ReductionOptionsTy {
     bool WithNowait;
     bool SimpleReduction;
+    bool IsPrivateVarReduction;
     OpenMPDirectiveKind ReductionKind;
   };
+
+  /// Emits code for private variable reduction
+  /// \param Privates List of private copies for original reduction arguments.
+  /// \param LHSExprs List of LHS in \a ReductionOps reduction operations.
+  /// \param RHSExprs List of RHS in \a ReductionOps reduction operations.
+  /// \param ReductionOps List of reduction operations in form 'LHS binop RHS'
+  /// or 'operator binop(LHS, RHS)'.
+  void emitPrivateReduction(CodeGenFunction &CGF, SourceLocation Loc,
+                            ArrayRef<const Expr *> Privates,
+                            ArrayRef<const Expr *> LHSExprs,
+                            ArrayRef<const Expr *> RHSExprs,
+                            ArrayRef<const Expr *> ReductionOps);
+
   /// Emit a code for reduction clause. Next code should be emitted for
   /// reduction:
   /// \code
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index e4d1db264aac9..720a88e075ddd 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -1470,6 +1470,7 @@ void CodeGenFunction::EmitOMPReductionClauseFinal(
   llvm::SmallVector<const Expr *, 8> LHSExprs;
   llvm::SmallVector<const Expr *, 8> RHSExprs;
   llvm::SmallVector<const Expr *, 8> ReductionOps;
+  llvm::SmallVector<bool, 8> IsPrivate;
   bool HasAtLeastOneReduction = false;
   bool IsReductionWithTaskMod = false;
   for (const auto *C : D.getClausesOfKind<OMPReductionClause>()) {
@@ -1480,6 +1481,8 @@ void CodeGenFunction::EmitOMPReductionClauseFinal(
     Privates.append(C->privates().begin(), C->privates().end());
     LHSExprs.append(C->lhs_exprs().begin(), C->lhs_exprs().end());
     RHSExprs.append(C->rhs_exprs().begin(), C->rhs_exprs().end());
+    IsPrivate.append(C->private_var_reduction_flags().begin(),
+                     C->private_var_reduction_flags().end());
     ReductionOps.append(C->reduction_ops().begin(), C->reduction_ops().end());
     IsReductionWithTaskMod =
         IsReductionWithTaskMod || C->getModifier() == OMPC_REDUCTION_task;
@@ -1499,9 +1502,11 @@ void CodeGenFunction::EmitOMPReductionClauseFinal(
     bool SimpleReduction = ReductionKind == OMPD_simd;
     // Emit nowait reduction if nowait clause is present or directive is a
     // parallel directive (it always has implicit barrier).
+    bool IsPrivateVarReduction =
+        llvm::any_of(IsPrivate, [](bool IsPriv) { return IsPriv; });
     CGM.getOpenMPRuntime().emitReduction(
         *this, D.getEndLoc(), Privates, LHSExprs, RHSExprs, ReductionOps,
-        {WithNowait, SimpleReduction, ReductionKind});
+        {WithNowait, SimpleReduction, IsPrivateVarReduction, ReductionKind});
   }
 }
 
@@ -3943,7 +3948,8 @@ static void emitScanBasedDirective(
       PrivScope.Privatize();
       CGF.CGM.getOpenMPRuntime().emitReduction(
           CGF, S.getEndLoc(), Privates, LHSs, RHSs, ReductionOps,
-          {/*WithNowait=*/true, /*SimpleReduction=*/true, OMPD_unknown});
+          {/*WithNowait=*/true, /*SimpleReduction=*/true,
+           /*IsPrivateVarReduction */ false, OMPD_unknown});
     }
     llvm::Value *NextIVal =
         CGF.Builder.CreateNUWSub(IVal, llvm::ConstantInt::get(CGF.SizeTy, 1));
@@ -5747,7 +5753,7 @@ void CodeGenFunction::EmitOMPScanDirective(const OMPScanDirective &S) {
       }
       CGM.getOpenMPRuntime().emitReduction(
           *this, ParentDir.getEndLoc(), Privates, LHSs, RHSs, ReductionOps,
-          {/*WithNowait=*/true, /*SimpleReduction=*/true, OMPD_simd});
+          {/*WithNowait=*/true, /*SimpleReduction=*/true, false, OMPD_simd});
       for (unsigned I = 0, E = CopyArrayElems.size(); I < E; ++I) {
         const Expr *PrivateExpr = Privates[I];
         LValue DestLVal;
diff --git a/clang/test/OpenMP/for_private_reduction_codegen.cpp b/clang/test/OpenMP/for_private_reduction_codegen.cpp
new file mode 100644
index 0000000000000..be50991ca193e
--- /dev/null
+++ b/clang/test/OpenMP/for_private_reduction_codegen.cpp
@@ -0,0 +1,236 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --include-generated-funcs --prefix-filecheck-ir-name _ --version 5
+// RUN: %clang_cc1 -triple x86_64-unknown-unknown -fopenmp -fopenmp-version=60 -x c++ -std=c++17  -emit-llvm %s -o - | FileCheck %s
+// expected-no-diagnostics
+#define N 10
+void do_red(int n, int *v, int &sum_v)
+ {
+ 	 sum_v = 0;
+ 	#pragma omp for reduction(original(private),+: sum_v)
+ 	for (int i = 0; i < n; i++)
+	{
+ 		sum_v += v[i];
+	}
+ }
+ int main(void)
+ {
+ 	int v[N];
+	 for (int i = 0; i < N; i++)
+	 v[i] = i;
+	 #pragma omp parallel num_threads(4)
+	 {
+ 		int s_v;
+ 		do_red(N, v, s_v);
+	 }
+ 	return 0;
+ }
+//.
+// CHECK: @[[GLOB0:[0-9]+]] = private unnamed_addr constant [23 x i8] c"
+// CHECK: @[[GLOB1:[0-9]+]] = private unnamed_addr constant %struct.ident_t { i32 0, i32 514, i32 0, i32 22, ptr @[[GLOB0]] }, align 8
+// CHECK: @[[GLOB2:[0-9]+]] = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 22, ptr @[[GLOB0]] }, align 8
+// CHECK: @.gomp_critical_user_.reduction.var = common global [8 x i32] zeroinitializer, align 8
+// CHECK: @[[GLOB3:[0-9]+]] = private unnamed_addr constant %struct.ident_t { i32 0, i32 18, i32 0, i32 22, ptr @[[GLOB0]] }, align 8
+// CHECK: @.omp.reduction..internal_private_var = common global i32 0, align 4
+// CHECK: @[[GLOB4:[0-9]+]] = private unnamed_addr constant %struct.ident_t { i32 0, i32 66, i32 0, i32 22, ptr @[[GLOB0]] }, align 8
+//.
+// CHECK-LABEL: define dso_local void @_Z6do_rediPiRi(
+// CHECK-SAME: i32 noundef [[N:%.*]], ptr noundef [[V:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[SUM_V:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[N_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[V_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[SUM_V_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[TMP:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[DOTOMP_IV:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[_TMP1:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[DOTCAPTURE_EXPR_2:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[DOTOMP_LB:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[DOTOMP_UB:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[DOTOMP_IS_LAST:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[SUM_V4:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[_TMP5:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[I6:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[DOTOMP_REDUCTION_RED_LIST:%.*]] = alloca [1 x ptr], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB2]])
+// CHECK-NEXT:    store i32 [[N]], ptr [[N_ADDR]], align 4
+// CHECK-NEXT:    store ptr [[V]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[SUM_V]], ptr [[SUM_V_ADDR]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[SUM_V_ADDR]], align 8
+// CHECK-NEXT:    store i32 0, ptr [[TMP1]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[SUM_V_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[TMP2]], ptr [[TMP]], align 8
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[N_ADDR]], align 4
+// CHECK-NEXT:    store i32 [[TMP3]], ptr [[DOTCAPTURE_EXPR_]], align 4
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
+// CHECK-NEXT:    [[SUB:%.*]] = sub nsw i32 [[TMP4]], 0
+// CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[SUB]], 1
+// CHECK-NEXT:    [[SUB3:%.*]] = sub nsw i32 [[DIV]], 1
+// CHECK-NEXT:    store i32 [[SUB3]], ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK-NEXT:    store i32 0, ptr [[I]], align 4
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 0, [[TMP5]]
+// CHECK-NEXT:    br i1 [[CMP]], label %[[OMP_PRECOND_THEN:.*]], label %[[OMP_PRECOND_END:.*]]
+// CHECK:       [[OMP_PRECOND_THEN]]:
+// CHECK-NEXT:    store i32 0, ptr [[DOTOMP_LB]], align 4
+// CHECK-NEXT:    [[TMP6:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK-NEXT:    store i32 [[TMP6]], ptr [[DOTOMP_UB]], align 4
+// CHECK-NEXT:    store i32 1, ptr [[DOTOMP_STRIDE]], align 4
+// CHECK-NEXT:    store i32 0, ptr [[DOTOMP_IS_LAST]], align 4
+// CHECK-NEXT:    [[TMP7:%.*]] = load ptr, ptr [[TMP]], align 8
+// CHECK-NEXT:    store i32 0, ptr [[SUM_V4]], align 4
+// CHECK-NEXT:    store ptr [[SUM_V4]], ptr [[_TMP5]], align 8
+// CHECK-NEXT:    call void @__kmpc_for_static_init_4(ptr @[[GLOB1]], i32 [[TMP0]], i32 34, ptr [[DOTOMP_IS_LAST]], ptr [[DOTOMP_LB]], ptr [[DOTOMP_UB]], ptr [[DOTOMP_STRIDE]], i32 1, i32 1)
+// CHECK-NEXT:    [[TMP8:%.*]] = load i32, ptr [[DOTOMP_UB]], align 4
+// CHECK-NEXT:    [[TMP9:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK-NEXT:    [[CMP7:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
+// CHECK-NEXT:    br i1 [[CMP7]], label %[[COND_TRUE:.*]], label %[[COND_FALSE:.*]]
+// CHECK:       [[COND_TRUE]]:
+// CHECK-NEXT:    [[TMP10:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK-NEXT:    br label %[[COND_END:.*]]
+// CHECK:       [[COND_FALSE]]:
+// CHECK-NEXT:    [[TMP11:%.*]] = load i32, ptr [[DOTOMP_UB]], align 4
+// CHECK-NEXT:    br label %[[COND_END]]
+// CHECK:       [[COND_END]]:
+// CHECK-NEXT:    [[COND:%.*]] = phi i32 [ [[TMP10]], %[[COND_TRUE]] ], [ [[TMP11]], %[[COND_FALSE]] ]
+// CHECK-NEXT:    store i32 [[COND]], ptr [[DOTOMP_UB]], align 4
+// CHECK-NEXT:    [[TMP12:%.*]] = load i32, ptr [[DOTOMP_LB]], align 4
+// CHECK-NEXT:    store i32 [[TMP12]], ptr [[DOTOMP_IV]], align 4
+// CHECK-NEXT:    br label %[[OMP_INNER_FOR_COND:.*]]
+// CHECK:       [[OMP_INNER_FOR_COND]]:
+// CHECK-NEXT:    [[TMP13:%.*]] = load i32, ptr [[DOTOMP_IV]], align 4
+// CHECK-NEXT:    [[TMP14:%.*]] = load i32, ptr [[DOTOMP_UB]], align 4
+// CHECK-NEXT:    [[CMP8:%.*]] = icmp sle i32 [[TMP13]], [[TMP14]]
+// CHECK-NEXT:    br i1 [[CMP8]], label %[[OMP_INNER_FOR_BODY:.*]], label %[[OMP_INNER_FOR_END:.*]]
+// CHECK:       [[OMP_INNER_FOR_BODY]]:
+// CHECK-NEXT:    [[TMP15:%.*]] = load i32, ptr [[DOTOMP_IV]], align 4
+// CHECK-NEXT:    [[MUL:%.*]] = mul nsw i32 [[TMP15]], 1
+// CHECK-NEXT:    [[ADD:%.*]] = add nsw i32 0, [[MUL]]
+// CHECK-NEXT:    store i32 [[ADD]], ptr [[I6]], align 4
+// CHECK-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[V_ADDR]], align 8
+// CHECK-NEXT:    [[TMP17:%.*]] = load i32, ptr [[I6]], align 4
+// CHECK-NEXT:    [[IDXPROM:%.*]] = sext i32 [[TMP17]] to i64
+// CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP16]], i64 [[IDXPROM]]
+// CHECK-NEXT:    [[TMP18:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+// CHECK-NEXT:    [[TMP19:%.*]] = load ptr, ptr [[_TMP5]], align 8
+// CHECK-NEXT:    [[TMP20:%.*]] = load i32, ptr [[TMP19]], align 4
+// CHECK-NEXT:    [[ADD9:%.*]] = add nsw i32 [[TMP20]], [[TMP18]]
+// CHECK-NEXT:    store i32 [[ADD9]], ptr [[TMP19]], align 4
+// CHECK-NEXT:    br label %[[OMP_BODY_CONTINUE:.*]]
+// CHECK:       [[OMP_BODY_CONTINUE]]:
+// CHECK-NEXT:    br label %[[OMP_INNER_FOR_INC:.*]]
+// CHECK:       [[OMP_INNER_FOR_INC]]:
+// CHECK-NEXT:    [[TMP21:%.*]] = load i32, ptr [[DOTOMP_IV]], align 4
+// CHECK-NEXT:    [[ADD10:%.*]] = add nsw i32 [[TMP21]], 1
+// CHECK-NEXT:    store i32 [[ADD10]], ptr [[DOTOMP_IV]], align 4
+// CHECK-NEXT:    br label %[[OMP_INNER_FOR_COND]]
+// CHECK:       [[OMP_INNER_FOR_END]]:
+// CHECK-NEXT:    br label %[[OMP_LOOP_EXIT:.*]]
+// CHECK:       [[OMP_LOOP_EXIT]]:
+// CHECK-NEXT:    call void @__kmpc_for_static_fini(ptr @[[GLOB1]], i32 [[TMP0]])
+// CHECK-NEXT:    [[TMP22:%.*]] = getelementptr inbounds [1 x ptr], ptr [[DOTOMP_REDUCTION_RED_LIST]], i64 0, i64 0
+// CHECK-NEXT:    store ptr [[SUM_V4]], ptr [[TMP22]], align 8
+// CHECK-NEXT:    [[TMP23:%.*]] = call i32 @__kmpc_reduce(ptr @[[GLOB3]], i32 [[TMP0]], i32 1, i64 8, ptr [[DOTOMP_REDUCTION_RED_LIST]], ptr @_Z6do_rediPiRi.omp.reduction.reduction_func, ptr @.gomp_critical_user_.reduction.var)
+// CHECK-NEXT:    switch i32 [[TMP23]], [[DOTOMP_REDUCTION_DEFAULT:label %.*]] [
+// CHECK-NEXT:      i32 1, [[DOTOMP_REDUCTION_CASE1:label %.*]]
+// CHECK-NEXT:      i32 2, [[DOTOMP_REDUCTION_CASE2:label %.*]]
+// CHECK-NEXT:    ]
+// CHECK:       [[_OMP_REDUCTION_CASE1:.*:]]
+// CHECK-NEXT:    [[TMP24:%.*]] = load i32, ptr [[TMP7]], align 4
+// CHECK-NEXT:    [[TMP25:%.*]] = load i32, ptr [[SUM_V4]], align 4
+// CHECK-NEXT:    [[ADD11:%.*]] = add nsw i32 [[TMP24]], [[TMP25]]
+// CHECK-NEXT:    store i32 [[ADD11]], ptr [[TMP7]], align 4
+// CHECK-NEXT:    call void @__kmpc_end_reduce(ptr @[[GLOB3]], i32 [[TMP0]], ptr @.gomp_critical_user_.reduction.var)
+// CHECK-NEXT:    br [[DOTOMP_REDUCTION_DEFAULT]]
+// CHECK:       [[_OMP_REDUCTION_CASE2:.*:]]
+// CHECK-NEXT:    [[TMP26:%.*]] = load i32, ptr [[SUM_V4]], align 4
+// CHECK-NEXT:    [[TMP27:%.*]] = atomicrmw add ptr [[TMP7]], i32 [[TMP26]] monotonic, align 4
+// CHECK-NEXT:    call void @__kmpc_end_reduce(ptr @[[GLOB3]], i32 [[TMP0]], ptr @.gomp_critical_user_.reduction.var)
+// CHECK-NEXT:    br [[DOTOMP_REDUCTION_DEFAULT]]
+// CHECK:       [[_OMP_REDUCTION_DEFAULT:.*:]]
+// CHECK-NEXT:    call void @__kmpc_barrier(ptr @[[GLOB3]], i32 [[TMP0]])
+// CHECK-NEXT:    [[TMP28:%.*]] = icmp eq i32 [[TMP0]], 0
+// CHECK-NEXT:    br i1 [[TMP28]], label %[[INIT:.*]], label %[[INIT_END:.*]]
+// CHECK:       [[INIT]]:
+// CHECK-NEXT:    store i32 0, ptr @.omp.reduction..internal_private_var, align 4
+// CHECK-NEXT:    br label %[[INIT_END]]
+// CHECK:       [[INIT_END]]:
+// CHECK-NEXT:    call void @__kmpc_barrier(ptr @[[GLOB3]], i32 [[TMP0]])
+// CHECK-NEXT:    [[TMP29:%.*]] = load i32, ptr [[TMP7]], align 4
+// CHECK-NEXT:    [[T...
[truncated]

@chandraghale chandraghale changed the title Codegen for Reduction over private variables with reduction clause [OpenMP 6.0 ]Codegen for Reduction over private variables with reduction clause Apr 8, 2025
@chandraghale
Copy link
Contributor Author

Refactored code to look up the expression and handle UDR, emitting it as is. updated the lit test.

@chandraghale chandraghale requested a review from alexey-bataev May 1, 2025 11:13
@chandraghale
Copy link
Contributor Author

@alexey-bataev . Reworked on UDR. Will be helpful if you will be able to review further.

Copy link
Member

@alexey-bataev alexey-bataev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update OpenMPSupport.rst and release notes

@chandraghale chandraghale requested a review from alexey-bataev May 7, 2025 12:10
@chandraghale chandraghale requested a review from alexey-bataev May 7, 2025 16:55
@chandraghale chandraghale force-pushed the codegen_private_variable_reducn branch from 0d50b0d to 70b7e90 Compare May 20, 2025 14:46
@chandraghale chandraghale force-pushed the codegen_private_variable_reducn branch from 70b7e90 to 4e244c3 Compare May 21, 2025 15:11
@chandraghale chandraghale force-pushed the codegen_private_variable_reducn branch from 4e244c3 to 694e241 Compare May 23, 2025 16:09
Copy link
Member

@alexey-bataev alexey-bataev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG with nits

@chandraghale
Copy link
Contributor Author

@alexey-bataev Added few more changes on top, to be on accordance with spec. Shared copy is updated with values from private copies until all updates are complete, before combining into the original list item. Hope the changes are fine with you.

@chandraghale chandraghale force-pushed the codegen_private_variable_reducn branch from 3bb17c1 to e1ca648 Compare June 9, 2025 10:25
@chandraghale chandraghale merged commit afbcf95 into llvm:main Jun 11, 2025
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 11, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building clang,openmp at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/7099

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libomp :: worksharing/for/omp_for_private_reduction.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang++ -fopenmp   -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -fno-omit-frame-pointer -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test/ompt -std=c++17 /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/worksharing/for/Output/omp_for_private_reduction.cpp.tmp -lm -latomic -fopenmp-version=60  && /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/worksharing/for/Output/omp_for_private_reduction.cpp.tmp
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang++ -fopenmp -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fno-omit-frame-pointer -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test/ompt -std=c++17 /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/worksharing/for/Output/omp_for_private_reduction.cpp.tmp -lm -latomic -fopenmp-version=60
# .---command stderr------------
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:78:42: error: use of undeclared identifier 'I'
# |    78 |   double _Complex expected = 0.0 + 0.0 * I;
# |       |                                          ^
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:79:40: error: use of undeclared identifier 'I'
# |    79 |   double _Complex result = 0.0 + 0.0 * I;
# |       |                                        ^
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:84:22: error: use of undeclared identifier 'I'
# |    84 |     arr[i] = i - i * I;
# |       |                      ^
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:92:19: error: use of undeclared identifier 'creal'
# |    92 |       real_sum += creal(arr[i]);
# |       |                   ^~~~~
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:93:19: error: use of undeclared identifier 'cimag'
# |    93 |       imag_sum += cimag(arr[i]);
# |       |                   ^~~~~
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:96:36: error: use of undeclared identifier 'I'
# |    96 |     result = real_sum + imag_sum * I;
# |       |                                    ^
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:97:9: error: use of undeclared identifier 'cabs'
# |    97 |     if (cabs(result - expected) > 1e-6) {
# |       |         ^~~~
# | 7 errors generated.
# `-----------------------------
# error: command failed with exit status: 1

--

********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 11, 2025

LLVM Buildbot has detected a new failure on builder openmp-s390x-linux running on systemz-1 while building clang,openmp at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/12678

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libomp :: worksharing/for/omp_for_private_reduction.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang++ -fopenmp   -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt -std=c++17 /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/worksharing/for/Output/omp_for_private_reduction.cpp.tmp -lm -latomic -fopenmp-version=60  && /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/worksharing/for/Output/omp_for_private_reduction.cpp.tmp
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang++ -fopenmp -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt -std=c++17 /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/worksharing/for/Output/omp_for_private_reduction.cpp.tmp -lm -latomic -fopenmp-version=60
# .---command stderr------------
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:78:42: error: use of undeclared identifier 'I'
# |    78 |   double _Complex expected = 0.0 + 0.0 * I;
# |       |                                          ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:79:40: error: use of undeclared identifier 'I'
# |    79 |   double _Complex result = 0.0 + 0.0 * I;
# |       |                                        ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:84:22: error: use of undeclared identifier 'I'
# |    84 |     arr[i] = i - i * I;
# |       |                      ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:92:19: error: use of undeclared identifier 'creal'
# |    92 |       real_sum += creal(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:93:19: error: use of undeclared identifier 'cimag'
# |    93 |       imag_sum += cimag(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:96:36: error: use of undeclared identifier 'I'
# |    96 |     result = real_sum + imag_sum * I;
# |       |                                    ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:97:9: error: use of undeclared identifier 'cabs'
# |    97 |     if (cabs(result - expected) > 1e-6) {
# |       |         ^~~~
# | 7 errors generated.
# `-----------------------------
# error: command failed with exit status: 1

--

********************


chandraghale added a commit that referenced this pull request Jun 11, 2025
Error looks to be missing includes for complex number support in some
system. Removing test for now.
Relevant PR :
[PR-134709](#134709)
```
 .---command stderr------------
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:78:42: error: use of undeclared identifier 'I'
# |    78 |   double _Complex expected = 0.0 + 0.0 * I;
# |       |                                          ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:79:40: error: use of undeclared identifier 'I'
# |    79 |   double _Complex result = 0.0 + 0.0 * I;
# |       |                                        ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:84:22: error: use of undeclared identifier 'I'
# |    84 |     arr[i] = i - i * I;
# |       |                      ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:92:19: error: use of undeclared identifier 'creal'
# |    92 |       real_sum += creal(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:93:19: error: use of undeclared identifier 'cimag'
# |    93 |       imag_sum += cimag(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:96:36: error: use of undeclared identifier 'I'
# |    96 |     result = real_sum + imag_sum * I;
# |       |                                    ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:97:9: error: use of undeclared identifier 'cabs'
# |    97 |     if (cabs(result - expected) > 1e-6) {
# |       |         ^~~~
# | 7 errors generated.
```

Co-authored-by: Chandra Ghale <[email protected]>
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 11, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-bootstrap-asan running on sanitizer-buildbot2 while building clang,openmp at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/9002

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:73: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 90224 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 
FAIL: LLVM :: ExecutionEngine/JITLink/x86-64/MachO_weak_references.s (54825 of 90224)
******************** TEST 'LLVM :: ExecutionEngine/JITLink/x86-64/MachO_weak_references.s' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
rm -rf /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp && mkdir -p /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp # RUN: at line 1
+ rm -rf /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp
+ mkdir -p /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -triple=x86_64-apple-macosx10.9 -filetype=obj -o /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s # RUN: at line 2
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -triple=x86_64-apple-macosx10.9 -filetype=obj -o /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-present -abs bar=0x1 -check=/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o # RUN: at line 3
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-present -abs bar=0x1 -check=/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-absent -check=/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o # RUN: at line 4
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-absent -check=/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
292.42s: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
209.16s: Clang :: Driver/fsanitize.c
135.48s: Clang :: Driver/arm-cortex-cpus-2.c
133.76s: Clang :: Driver/arm-cortex-cpus-1.c
132.64s: LLVM :: CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
118.66s: Clang :: Preprocessor/riscv-target-features.c
111.30s: Clang :: OpenMP/target_defaultmap_codegen_01.cpp
97.03s: Clang :: OpenMP/target_update_codegen.cpp
93.36s: Clang :: Driver/x86-target-features.c
93.22s: LLVM :: CodeGen/RISCV/attributes.ll
92.10s: Clang :: CodeGen/AArch64/sve-intrinsics/acle_sve_reinterpret.c
89.75s: Clang :: Driver/linux-ld.c
79.05s: Clang :: Preprocessor/aarch64-target-features.c
78.52s: Clang :: Preprocessor/arm-target-features.c
77.94s: Clang :: Driver/clang_f_opts.c
74.10s: Clang :: Driver/cl-options.c
73.87s: Clang :: Headers/arm-neon-header.c
Step 11 (stage2/asan check) failure: stage2/asan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:73: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 90224 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 
FAIL: LLVM :: ExecutionEngine/JITLink/x86-64/MachO_weak_references.s (54825 of 90224)
******************** TEST 'LLVM :: ExecutionEngine/JITLink/x86-64/MachO_weak_references.s' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
rm -rf /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp && mkdir -p /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp # RUN: at line 1
+ rm -rf /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp
+ mkdir -p /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -triple=x86_64-apple-macosx10.9 -filetype=obj -o /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s # RUN: at line 2
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -triple=x86_64-apple-macosx10.9 -filetype=obj -o /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-present -abs bar=0x1 -check=/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o # RUN: at line 3
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-present -abs bar=0x1 -check=/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-absent -check=/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o # RUN: at line 4
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-jitlink -noexec -check-name=jitlink-check-bar-absent -check=/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/MachO_weak_references.s /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/test/ExecutionEngine/JITLink/x86-64/Output/MachO_weak_references.s.tmp/macho_weak_refs.o

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
292.42s: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
209.16s: Clang :: Driver/fsanitize.c
135.48s: Clang :: Driver/arm-cortex-cpus-2.c
133.76s: Clang :: Driver/arm-cortex-cpus-1.c
132.64s: LLVM :: CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
118.66s: Clang :: Preprocessor/riscv-target-features.c
111.30s: Clang :: OpenMP/target_defaultmap_codegen_01.cpp
97.03s: Clang :: OpenMP/target_update_codegen.cpp
93.36s: Clang :: Driver/x86-target-features.c
93.22s: LLVM :: CodeGen/RISCV/attributes.ll
92.10s: Clang :: CodeGen/AArch64/sve-intrinsics/acle_sve_reinterpret.c
89.75s: Clang :: Driver/linux-ld.c
79.05s: Clang :: Preprocessor/aarch64-target-features.c
78.52s: Clang :: Preprocessor/arm-target-features.c
77.94s: Clang :: Driver/clang_f_opts.c
74.10s: Clang :: Driver/cl-options.c
73.87s: Clang :: Headers/arm-neon-header.c

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 11, 2025
Error looks to be missing includes for complex number support in some
system. Removing test for now.
Relevant PR :
[PR-134709](llvm/llvm-project#134709)
```
 .---command stderr------------
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:78:42: error: use of undeclared identifier 'I'
# |    78 |   double _Complex expected = 0.0 + 0.0 * I;
# |       |                                          ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:79:40: error: use of undeclared identifier 'I'
# |    79 |   double _Complex result = 0.0 + 0.0 * I;
# |       |                                        ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:84:22: error: use of undeclared identifier 'I'
# |    84 |     arr[i] = i - i * I;
# |       |                      ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:92:19: error: use of undeclared identifier 'creal'
# |    92 |       real_sum += creal(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:93:19: error: use of undeclared identifier 'cimag'
# |    93 |       imag_sum += cimag(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:96:36: error: use of undeclared identifier 'I'
# |    96 |     result = real_sum + imag_sum * I;
# |       |                                    ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:97:9: error: use of undeclared identifier 'cabs'
# |    97 |     if (cabs(result - expected) > 1e-6) {
# |       |         ^~~~
# | 7 errors generated.
```

Co-authored-by: Chandra Ghale <[email protected]>
rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
…ion clause (llvm#134709)

Codegen support for reduction over private variable with reduction
clause. Section 7.6.10 in in OpenMP 6.0 spec.
- An internal shared copy is initialized with an initializer value.
- The shared copy is updated by combining its value with the values from
the private copies created by the clause.
- Once an encountering thread verifies that all updates are complete,
its original list item is updated by merging its value with that of the
shared copy and then broadcast to all threads.

Sample Test Case from OpenMP 6.0 Example 
```
#include <assert.h>
#include <omp.h>
#define N 10

void do_red(int n, int *v, int &sum_v)
{
    sum_v = 0; // sum_v is private
    #pragma omp for reduction(original(private),+: sum_v)
    for (int i = 0; i < n; i++) 
    {
        sum_v += v[i];
    }
}

int main(void)
{
    int v[N];
    for (int i = 0; i < N; i++)
        v[i] = i;
    #pragma omp parallel num_threads(4)
    {
        int s_v; // s_v is private
        do_red(N, v, s_v);
        assert(s_v == 45);
    }
    return 0;
}
```
Expected Codegen:
```
 // A shared global/static variable is introduced for the reduction result.
 // This variable is initialized (e.g., using memset or a UDR initializer)
 // e.g., .omp.reduction.internal_private_var

 // Barrier before any thread performs combination
  call void @__kmpc_barrier(...)

 // Initialization block (executed by thread 0)
 // e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...)

  call void @__kmpc_critical(...)
    // Inside critical section:
    // Load the current value from the shared variable
    // Load the thread-local private variable's value
    // Perform the reduction operation 
    // Store the result back to the shared variable

  call void @__kmpc_end_critical(...)
  // Barrier after all threads complete their combinations

  call void @__kmpc_barrier(...)
 // Broadcast phase:
 // Load the final result from the shared variable)
 // Store the final result to the original private variable in each thread
 // Final barrier after broadcast

  call void @__kmpc_barrier(...)
```

---------

Co-authored-by: Chandra Ghale <[email protected]>
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…ion clause (llvm#134709)

Codegen support for reduction over private variable with reduction
clause. Section 7.6.10 in in OpenMP 6.0 spec.
- An internal shared copy is initialized with an initializer value.
- The shared copy is updated by combining its value with the values from
the private copies created by the clause.
- Once an encountering thread verifies that all updates are complete,
its original list item is updated by merging its value with that of the
shared copy and then broadcast to all threads.

Sample Test Case from OpenMP 6.0 Example 
```
#include <assert.h>
#include <omp.h>
#define N 10

void do_red(int n, int *v, int &sum_v)
{
    sum_v = 0; // sum_v is private
    #pragma omp for reduction(original(private),+: sum_v)
    for (int i = 0; i < n; i++) 
    {
        sum_v += v[i];
    }
}

int main(void)
{
    int v[N];
    for (int i = 0; i < N; i++)
        v[i] = i;
    #pragma omp parallel num_threads(4)
    {
        int s_v; // s_v is private
        do_red(N, v, s_v);
        assert(s_v == 45);
    }
    return 0;
}
```
Expected Codegen:
```
 // A shared global/static variable is introduced for the reduction result.
 // This variable is initialized (e.g., using memset or a UDR initializer)
 // e.g., .omp.reduction.internal_private_var

 // Barrier before any thread performs combination
  call void @__kmpc_barrier(...)

 // Initialization block (executed by thread 0)
 // e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...)

  call void @__kmpc_critical(...)
    // Inside critical section:
    // Load the current value from the shared variable
    // Load the thread-local private variable's value
    // Perform the reduction operation 
    // Store the result back to the shared variable

  call void @__kmpc_end_critical(...)
  // Barrier after all threads complete their combinations

  call void @__kmpc_barrier(...)
 // Broadcast phase:
 // Load the final result from the shared variable)
 // Store the final result to the original private variable in each thread
 // Final barrier after broadcast

  call void @__kmpc_barrier(...)
```

---------

Co-authored-by: Chandra Ghale <[email protected]>
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
Error looks to be missing includes for complex number support in some
system. Removing test for now.
Relevant PR :
[PR-134709](llvm#134709)
```
 .---command stderr------------
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:78:42: error: use of undeclared identifier 'I'
# |    78 |   double _Complex expected = 0.0 + 0.0 * I;
# |       |                                          ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:79:40: error: use of undeclared identifier 'I'
# |    79 |   double _Complex result = 0.0 + 0.0 * I;
# |       |                                        ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:84:22: error: use of undeclared identifier 'I'
# |    84 |     arr[i] = i - i * I;
# |       |                      ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:92:19: error: use of undeclared identifier 'creal'
# |    92 |       real_sum += creal(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:93:19: error: use of undeclared identifier 'cimag'
# |    93 |       imag_sum += cimag(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:96:36: error: use of undeclared identifier 'I'
# |    96 |     result = real_sum + imag_sum * I;
# |       |                                    ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:97:9: error: use of undeclared identifier 'cabs'
# |    97 |     if (cabs(result - expected) > 1e-6) {
# |       |         ^~~~
# | 7 errors generated.
```

Co-authored-by: Chandra Ghale <[email protected]>
akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
…ion clause (llvm#134709)

Codegen support for reduction over private variable with reduction
clause. Section 7.6.10 in in OpenMP 6.0 spec.
- An internal shared copy is initialized with an initializer value.
- The shared copy is updated by combining its value with the values from
the private copies created by the clause.
- Once an encountering thread verifies that all updates are complete,
its original list item is updated by merging its value with that of the
shared copy and then broadcast to all threads.

Sample Test Case from OpenMP 6.0 Example 
```
#include <assert.h>
#include <omp.h>
#define N 10

void do_red(int n, int *v, int &sum_v)
{
    sum_v = 0; // sum_v is private
    #pragma omp for reduction(original(private),+: sum_v)
    for (int i = 0; i < n; i++) 
    {
        sum_v += v[i];
    }
}

int main(void)
{
    int v[N];
    for (int i = 0; i < N; i++)
        v[i] = i;
    #pragma omp parallel num_threads(4)
    {
        int s_v; // s_v is private
        do_red(N, v, s_v);
        assert(s_v == 45);
    }
    return 0;
}
```
Expected Codegen:
```
 // A shared global/static variable is introduced for the reduction result.
 // This variable is initialized (e.g., using memset or a UDR initializer)
 // e.g., .omp.reduction.internal_private_var

 // Barrier before any thread performs combination
  call void @__kmpc_barrier(...)

 // Initialization block (executed by thread 0)
 // e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...)

  call void @__kmpc_critical(...)
    // Inside critical section:
    // Load the current value from the shared variable
    // Load the thread-local private variable's value
    // Perform the reduction operation 
    // Store the result back to the shared variable

  call void @__kmpc_end_critical(...)
  // Barrier after all threads complete their combinations

  call void @__kmpc_barrier(...)
 // Broadcast phase:
 // Load the final result from the shared variable)
 // Store the final result to the original private variable in each thread
 // Final barrier after broadcast

  call void @__kmpc_barrier(...)
```

---------

Co-authored-by: Chandra Ghale <[email protected]>
akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
Error looks to be missing includes for complex number support in some
system. Removing test for now.
Relevant PR :
[PR-134709](llvm#134709)
```
 .---command stderr------------
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:78:42: error: use of undeclared identifier 'I'
# |    78 |   double _Complex expected = 0.0 + 0.0 * I;
# |       |                                          ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:79:40: error: use of undeclared identifier 'I'
# |    79 |   double _Complex result = 0.0 + 0.0 * I;
# |       |                                        ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:84:22: error: use of undeclared identifier 'I'
# |    84 |     arr[i] = i - i * I;
# |       |                      ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:92:19: error: use of undeclared identifier 'creal'
# |    92 |       real_sum += creal(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:93:19: error: use of undeclared identifier 'cimag'
# |    93 |       imag_sum += cimag(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:96:36: error: use of undeclared identifier 'I'
# |    96 |     result = real_sum + imag_sum * I;
# |       |                                    ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:97:9: error: use of undeclared identifier 'cabs'
# |    97 |     if (cabs(result - expected) > 1e-6) {
# |       |         ^~~~
# | 7 errors generated.
```

Co-authored-by: Chandra Ghale <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants