Skip to content

[VPlan] Add ReductionStartVector VPInstruction. #142290

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 9, 2025

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented May 31, 2025

Add a new VPInstruction::ReductionStartVector opcode to create the start values for wide reductions. This more accurately models the start value creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the line it also allows removing VPReductionPHIRecipe::RdxDesc.

@llvmbot
Copy link
Member

llvmbot commented May 31, 2025

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-powerpc

Author: Florian Hahn (fhahn)

Changes

Add a new VPInstruction::ReductionStartVector opcode to create the start values for wide reductions. This more accurately models the start value creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the line it also allows removing VPReductionPHIRecipe::RdxDesc.


Patch is 35.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142290.diff

14 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+34-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+4-7)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+28-48)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+16)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp (+14)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-epilogue.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/exit-branch-cost.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cond-reduction.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reduction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/if-pred-stores.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll (+12-6)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e9ace195684b3..1830ea9678e85 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7501,8 +7501,11 @@ static void fixReductionScalarResumeWhenVectorizingEpilog(
       cast<VPReductionPHIRecipe>(EpiRedResult->getOperand(0));
   const RecurrenceDescriptor &RdxDesc =
       EpiRedHeaderPhi->getRecurrenceDescriptor();
-  Value *MainResumeValue =
-      EpiRedHeaderPhi->getStartValue()->getUnderlyingValue();
+  Value *MainResumeValue;
+  if (auto *VPI = dyn_cast<VPInstruction>(EpiRedHeaderPhi->getStartValue()))
+    MainResumeValue = VPI->getOperand(0)->getUnderlyingValue();
+  else
+    MainResumeValue = EpiRedHeaderPhi->getStartValue()->getUnderlyingValue();
   if (RecurrenceDescriptor::isAnyOfRecurrenceKind(
           RdxDesc.getRecurrenceKind())) {
     auto *Cmp = cast<ICmpInst>(MainResumeValue);
@@ -8552,6 +8555,7 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(VPSingleDefRecipe *R,
       // If the PHI is used by a partial reduction, set the scale factor.
       unsigned ScaleFactor =
           getScalingForReduction(RdxDesc.getLoopExitInstr()).value_or(1);
+
       PhiRecipe = new VPReductionPHIRecipe(
           Phi, RdxDesc, *StartV, CM.isInLoopReduction(Phi),
           CM.useOrderedReductions(RdxDesc), ScaleFactor);
@@ -9439,7 +9443,7 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
       continue;
 
     const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
-    Type *PhiTy = PhiR->getOperand(0)->getLiveInIRValue()->getType();
+    Type *PhiTy = PhiR->getUnderlyingValue()->getType();
     // If tail is folded by masking, introduce selects between the phi
     // and the users outside the vector region of each reduction, at the
     // beginning of the dedicated latch block.
@@ -9569,6 +9573,27 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
       // start value.
       PhiR->setOperand(0, Plan->getOrAddLiveIn(RdxDesc.getSentinelValue()));
     }
+    RecurKind RK = RdxDesc.getRecurrenceKind();
+    if (PhiR->isOrdered() || PhiR->isInLoop() ||
+        (!RecurrenceDescriptor::isAnyOfRecurrenceKind(RK) &&
+         !RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK) &&
+         !RecurrenceDescriptor::isMinMaxRecurrenceKind(RK))) {
+      VPBuilder PHBuilder(Plan->getVectorPreheader());
+      VPValue *Iden = Plan->getOrAddLiveIn(
+          getRecurrenceIdentity(RK, PhiTy, RdxDesc.getFastMathFlags()));
+      // If the PHI is used by a partial reduction, set the scale factor.
+      unsigned ScaleFactor =
+          RecipeBuilder.getScalingForReduction(RdxDesc.getLoopExitInstr())
+              .value_or(1);
+      Type *I32Ty = IntegerType::getInt32Ty(PhiTy->getContext());
+      auto *ScalarFactorVPV =
+          Plan->getOrAddLiveIn(ConstantInt::get(I32Ty, ScaleFactor));
+      VPValue *StartV =
+          PHBuilder.createNaryOp(VPInstruction::ReductionStartVector,
+                                 {PhiR->getStartValue(), Iden, ScalarFactorVPV},
+                                 RdxDesc.getFastMathFlags());
+      PhiR->setOperand(0, StartV);
+    }
   }
   for (VPRecipeBase *R : ToDelete)
     R->eraseFromParent();
@@ -10081,6 +10106,12 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
     }
     assert(ResumeV && "Must have a resume value");
     VPValue *StartVal = Plan.getOrAddLiveIn(ResumeV);
+    if (auto *PhiR = dyn_cast<VPReductionPHIRecipe>(&R)) {
+      if (auto *VPI = dyn_cast<VPInstruction>(PhiR->getStartValue())) {
+        VPI->setOperand(0, StartVal);
+        continue;
+      }
+    }
     cast<VPHeaderPHIRecipe>(&R)->setStartValue(StartVal);
   }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 44f0b6d964a6e..c49e20518b506 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -907,6 +907,10 @@ class VPInstruction : public VPRecipeWithIRFlags,
     BranchOnCount,
     BranchOnCond,
     Broadcast,
+    /// Start vector for reductions with 3 operands: the original start value,
+    /// the identity value for the reduction and an integer indicating the
+    /// scaling factor.
+    ReductionStartVector,
     ComputeFindLastIVResult,
     ComputeReductionResult,
     // Extracts the last lane from its operand if it is a vector, or the last
@@ -2225,13 +2229,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
 
   /// Returns true, if the phi is part of an in-loop reduction.
   bool isInLoop() const { return IsInLoop; }
-
-  /// Returns true if the recipe only uses the first lane of operand \p Op.
-  bool onlyFirstLaneUsed(const VPValue *Op) const override {
-    assert(is_contained(operands(), Op) &&
-           "Op must be an operand of the recipe");
-    return Op == getStartValue();
-  }
 };
 
 /// A recipe for vectorizing a phi-node as a sequence of mask-based select
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 926490bfad7d0..7cfb07ab25e48 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -87,6 +87,8 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
                inferScalarType(R->getOperand(1)) &&
            "different types inferred for different operands");
     return IntegerType::get(Ctx, 1);
+  case VPInstruction::ReductionStartVector:
+    return inferScalarType(R->getOperand(0));
   case VPInstruction::ComputeFindLastIVResult:
   case VPInstruction::ComputeReductionResult: {
     auto *PhiR = cast<VPReductionPHIRecipe>(R->getOperand(0));
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index a4831ea7c11f7..0cdd3216288ea 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -604,6 +604,20 @@ Value *VPInstruction::generate(VPTransformState &State) {
     return Builder.CreateVectorSplat(
         State.VF, State.get(getOperand(0), /*IsScalar*/ true), "broadcast");
   }
+  case VPInstruction::ReductionStartVector: {
+    if (State.VF.isScalar())
+      return State.get(getOperand(0), true);
+    IRBuilderBase::FastMathFlagGuard FMFG(Builder);
+    Builder.setFastMathFlags(getFastMathFlags());
+    // If this start vector is scaled then it should produce a vector with fewer
+    // elements than the VF.
+    ElementCount VF = State.VF.divideCoefficientBy(
+        cast<ConstantInt>(getOperand(2)->getLiveInIRValue())->getZExtValue());
+    auto *Iden = Builder.CreateVectorSplat(VF, State.get(getOperand(1), true));
+    Constant *Zero = Builder.getInt32(0);
+    return Builder.CreateInsertElement(Iden, State.get(getOperand(0), true),
+                                       Zero);
+  }
   case VPInstruction::ComputeFindLastIVResult: {
     // FIXME: The cross-recipe dependency on VPReductionPHIRecipe is temporary
     // and will be removed by breaking up the recipe further.
@@ -892,6 +906,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
   case VPInstruction::PtrAdd:
   case VPInstruction::WideIVStep:
   case VPInstruction::StepVector:
+  case VPInstruction::ReductionStartVector:
     return false;
   default:
     return true;
@@ -922,6 +937,7 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
   case VPInstruction::CanonicalIVIncrementForPart:
   case VPInstruction::BranchOnCount:
   case VPInstruction::BranchOnCond:
+  case VPInstruction::ReductionStartVector:
     return true;
   case VPInstruction::PtrAdd:
     return Op == getOperand(0) || vputils::onlyFirstLaneUsed(this);
@@ -1023,6 +1039,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
   case VPInstruction::FirstActiveLane:
     O << "first-active-lane";
     break;
+  case VPInstruction::ReductionStartVector:
+    O << "reduction-start-vector";
+    break;
   default:
     O << Instruction::getOpcodeName(getOpcode());
   }
@@ -1613,6 +1632,7 @@ bool VPIRFlags::flagsValidForOpcode(unsigned Opcode) const {
            Opcode == Instruction::FDiv || Opcode == Instruction::FRem ||
            Opcode == Instruction::FCmp || Opcode == Instruction::Select ||
            Opcode == VPInstruction::WideIVStep ||
+           Opcode == VPInstruction::ReductionStartVector ||
            Opcode == VPInstruction::ComputeReductionResult;
   case OperationType::NonNegOp:
     return Opcode == Instruction::ZExt;
@@ -3843,17 +3863,19 @@ void VPFirstOrderRecurrencePHIRecipe::print(raw_ostream &O, const Twine &Indent,
 #endif
 
 void VPReductionPHIRecipe::execute(VPTransformState &State) {
-  // If this phi is fed by a scaled reduction then it should output a
-  // vector with fewer elements than the VF.
-  ElementCount VF = State.VF.divideCoefficientBy(VFScaleFactor);
+  // Reductions do not have to start at zero. They can start with
+  // any loop invariant values.
+  VPValue *StartVPV = getStartValue();
 
   // In order to support recurrences we need to be able to vectorize Phi nodes.
   // Phi nodes have cycles, so we need to vectorize them in two stages. This is
   // stage #1: We create a new vector PHI node with no incoming edges. We'll use
   // this value when we vectorize all of the instructions that use the PHI.
-  auto *ScalarTy = State.TypeAnalysis.inferScalarType(this);
+  BasicBlock *VectorPH =
+      State.CFG.VPBB2IRBB.at(getParent()->getCFGPredecessor(0));
   bool ScalarPHI = State.VF.isScalar() || IsInLoop;
-  Type *VecTy = ScalarPHI ? ScalarTy : VectorType::get(ScalarTy, VF);
+  Value *StartV = State.get(StartVPV, ScalarPHI);
+  Type *VecTy = StartV->getType();
 
   BasicBlock *HeaderBB = State.CFG.PrevBB;
   assert(State.CurrentParentLoop->getHeader() == HeaderBB &&
@@ -3862,49 +3884,7 @@ void VPReductionPHIRecipe::execute(VPTransformState &State) {
   Phi->insertBefore(HeaderBB->getFirstInsertionPt());
   State.set(this, Phi, IsInLoop);
 
-  BasicBlock *VectorPH =
-      State.CFG.VPBB2IRBB.at(getParent()->getCFGPredecessor(0));
-  // Create start and identity vector values for the reduction in the preheader.
-  // TODO: Introduce recipes in VPlan preheader to create initial values.
-  IRBuilderBase::InsertPointGuard IPBuilder(State.Builder);
-  State.Builder.SetInsertPoint(VectorPH->getTerminator());
-
-  // Reductions do not have to start at zero. They can start with
-  // any loop invariant values.
-  VPValue *StartVPV = getStartValue();
-  RecurKind RK = RdxDesc.getRecurrenceKind();
-  if (RecurrenceDescriptor::isMinMaxRecurrenceKind(RK) ||
-      RecurrenceDescriptor::isAnyOfRecurrenceKind(RK) ||
-      RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK)) {
-    // [I|F]FindLastIV will use a sentinel value to initialize the reduction
-    // phi or the resume value from the main vector loop when vectorizing the
-    // epilogue loop. In the exit block, ComputeReductionResult will generate
-    // checks to verify if the reduction result is the sentinel value. If the
-    // result is the sentinel value, it will be corrected back to the start
-    // value.
-    // TODO: The sentinel value is not always necessary. When the start value is
-    // a constant, and smaller than the start value of the induction variable,
-    // the start value can be directly used to initialize the reduction phi.
-    Phi->addIncoming(State.get(StartVPV, ScalarPHI), VectorPH);
-    return;
-  }
-
-  Value *Iden = getRecurrenceIdentity(RK, VecTy->getScalarType(),
-                                      RdxDesc.getFastMathFlags());
-  unsigned CurrentPart = getUnrollPart(*this);
-  Value *StartV = StartVPV->getLiveInIRValue();
-  if (!ScalarPHI) {
-    if (CurrentPart == 0) {
-      Iden = State.Builder.CreateVectorSplat(VF, Iden);
-      Constant *Zero = State.Builder.getInt32(0);
-      StartV = State.Builder.CreateInsertElement(Iden, StartV, Zero);
-    } else {
-      Iden = State.Builder.CreateVectorSplat(VF, Iden);
-    }
-  }
-
-  Value *StartVal = (CurrentPart == 0) ? StartV : Iden;
-  Phi->addIncoming(StartVal, VectorPH);
+  Phi->addIncoming(StartV, VectorPH);
 }
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 5c8849be3d23e..354dac84a0c83 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1151,6 +1151,22 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
     R.setOperand(0, Y);
     return;
   }
+
+  auto *Plan = R.getParent()->getPlan();
+  if (!Plan->isUnrolled())
+    return;
+  /// Simplify redundant ReductionStartVector recipes after unrolling.
+  VPValue *StartV;
+  if (match(&R, m_VPInstruction<VPInstruction::ReductionStartVector>(
+                    m_VPValue(StartV), m_VPValue(), m_VPValue()))) {
+    R.getVPSingleValue()->replaceUsesWithIf(
+        StartV, [&R](const VPUser &U, unsigned Idx) {
+          auto *PhiR = dyn_cast<VPReductionPHIRecipe>(&U);
+          return PhiR && R.getVPSingleValue() == PhiR->getOperand(Idx) &&
+                 PhiR->isInLoop();
+        });
+    return;
+  }
 }
 
 void VPlanTransforms::simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
index e1fb3d476c58d..3a48feb8c4ef2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -223,6 +223,20 @@ void UnrollState::unrollHeaderPHIByUF(VPHeaderPHIRecipe *R,
       Copy->addOperand(R);
       Copy->addOperand(getConstantVPV(Part));
     } else if (RdxPhi) {
+      // If the start value is a ReductionStartVector, use the identity value (second operand) for unrolled parts. If the scaling factor is > 1, create a new ReductionStartVector with the scale factor and both operands set to the identity value.
+      if (auto *VPI = dyn_cast<VPInstruction>(RdxPhi->getStartValue())) {
+        if (cast<ConstantInt>(VPI->getOperand(2)->getLiveInIRValue())
+                ->getZExtValue() == 1)
+          Copy->setOperand(0, VPI->getOperand(1));
+        else {
+          if (Part == 1) {
+            auto *C = VPI->clone();
+            C->setOperand(0, C->getOperand(1));
+            C->insertAfter(VPI);
+            addUniformForAllParts(C);
+          }
+        }
+      }
       Copy->addOperand(getConstantVPV(Part));
     } else {
       assert(isa<VPActiveLaneMaskPHIRecipe>(R) &&
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-epilogue.ll b/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-epilogue.ll
index 0e5e785a94636..c3fc91c4574f1 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-epilogue.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-epilogue.ll
@@ -161,8 +161,8 @@ define void @dotp_small_epilogue_vf(i64 %idx.neg, i8 %a) #1 {
 ; CHECK-NEXT:    [[BROADCAST_SPLATINSERT6:%.*]] = insertelement <4 x i8> poison, i8 [[A]], i64 0
 ; CHECK-NEXT:    [[BROADCAST_SPLAT7:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT6]], <4 x i8> poison, <4 x i32> zeroinitializer
 ; CHECK-NEXT:    [[IND_END:%.*]] = add i64 [[IDX_NEG]], [[N_VEC5]]
-; CHECK-NEXT:    [[TMP8:%.*]] = sext <4 x i8> [[BROADCAST_SPLAT7]] to <4 x i32>
 ; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <4 x i32> zeroinitializer, i32 [[BC_MERGE_RDX]], i32 0
+; CHECK-NEXT:    [[TMP8:%.*]] = sext <4 x i8> [[BROADCAST_SPLAT7]] to <4 x i32>
 ; CHECK-NEXT:    br label [[VEC_EPILOG_VECTOR_BODY:%.*]]
 ; CHECK:       vec.epilog.vector.body:
 ; CHECK-NEXT:    [[INDEX9:%.*]] = phi i64 [ [[IV]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT14:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/PowerPC/exit-branch-cost.ll b/llvm/test/Transforms/LoopVectorize/PowerPC/exit-branch-cost.ll
index f1947dec2ea23..b4987127a513d 100644
--- a/llvm/test/Transforms/LoopVectorize/PowerPC/exit-branch-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/PowerPC/exit-branch-cost.ll
@@ -153,10 +153,10 @@ define i1 @select_exit_cond(ptr %start, ptr %end, i64 %N) {
 ; CHECK-NEXT:    [[N_MOD_VF24:%.*]] = urem i64 [[TMP2]], 2
 ; CHECK-NEXT:    [[N_VEC25:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF24]]
 ; CHECK-NEXT:    [[TMP56:%.*]] = getelementptr i8, ptr [[START]], i64 [[N_VEC25]]
+; CHECK-NEXT:    [[TMP57:%.*]] = insertelement <2 x i64> zeroinitializer, i64 [[BC_MERGE_RDX]], i32 0
 ; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[VEC_EPILOG_RESUME_VAL]], i64 0
 ; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <2 x i64> [[DOTSPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
 ; CHECK-NEXT:    [[INDUCTION:%.*]] = add <2 x i64> [[DOTSPLAT]], <i64 0, i64 1>
-; CHECK-NEXT:    [[TMP57:%.*]] = insertelement <2 x i64> zeroinitializer, i64 [[BC_MERGE_RDX]], i32 0
 ; CHECK-NEXT:    br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
 ; CHECK:       [[VEC_EPILOG_VECTOR_BODY]]:
 ; CHECK-NEXT:    [[INDEX38:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT32:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cond-reduction.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cond-reduction.ll
index 01a7ea4ffcd05..3f17c95f7ca95 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cond-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cond-reduction.ll
@@ -277,9 +277,9 @@ define i32 @cond_add_pred(ptr %a, i64 %n, i32 %start) {
 ; IF-EVL-OUTLOOP-NEXT:    [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[N]], 1
 ; IF-EVL-OUTLOOP-NEXT:    [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
 ; IF-EVL-OUTLOOP-NEXT:    [[TMP8:%.*]] = mul i64 [[TMP7]], 4
+; IF-EVL-OUTLOOP-NEXT:    [[TMP9:%.*]] = insertelement <vscale x 4 x i32> zeroinitializer, i32 [[START]], i32 0
 ; IF-EVL-OUTLOOP-NEXT:    [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i64 0
 ; IF-EVL-OUTLOOP-NEXT:    [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
-; IF-EVL-OUTLOOP-NEXT:    [[TMP9:%.*]] = insertelement <vscale x 4 x i32> zeroinitializer, i32 [[START]], i32 0
 ; IF-EVL-OUTLOOP-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; IF-EVL-OUTLOOP:       vector.body:
 ; IF-EVL-OUTLOOP-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -581,8 +581,8 @@ define i32 @step_cond_add(ptr %a, i64 %n, i32 %start) {
 ; NO-VP-OUTLOOP-NEXT:    [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
 ; NO-VP-OUTLOOP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
 ; NO-VP-OUTLOOP-NEXT:    [[TMP10:%.*]] = mul i64 [[TMP9]], 4
-; NO-VP-OUTLOOP-NEXT:    [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.stepvector.nxv4i32()
 ; NO-VP-OUTLOOP-NEXT:    [[TMP11:%.*]] = insertelement <vscale x 4 x i32> zeroinitializer, i32 [[START]], i32 0
+; NO-VP-OUTLOOP-NEXT:    [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.stepvector.nxv4i32()
 ; NO-VP-OUTLOOP-NEXT:    [[TMP14:%.*]] = mul <vscale x 4 x i32> [[TMP12]], splat (i32 1)
 ; NO-VP-OUTLOOP-NEXT:    [[INDUCTION:%.*]] = add <vscale x 4 x i32> zeroinitializer, [[TMP14]]
 ; NO-VP-OUTLOOP-NEXT:    [[TMP16:%.*]] = trunc i64 [[TMP10]] to i32
@@ -771,8 +771,8 @@ define i32 @step_cond_add_pred(ptr %a, i64 %n, i32 %start) {
 ; NO-VP-OUTLOOP-NEXT:    [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
 ; NO-VP-OUTLOOP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
 ; NO-VP-OUTLOOP-NEXT:    [[TMP10:%.*]] = mul i64 [[TMP9]], 4
-; NO-VP-OUTLOOP-NEXT:    [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.stepvector.nxv4i32()
 ; NO-VP-OUTLOOP-NEXT:    [[TMP11:%.*]] = insertelement <vscale x 4 x i32> zeroinitializer, i32 [[START]], i32 0
+; NO-VP-OUTLOOP-NEXT:    [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.stepvector.nxv4i32()
 ; NO-VP-OUTLOOP-NEXT:    [[TMP14:%.*]] = mul <vscale x 4 x i32> [[TMP12]], splat (i32 1)
 ; NO-VP-OUTLOOP-NEXT:    [[INDUCTION:%.*]] = add <vscale x 4 x i32> zeroinitializer, [[TMP14]]
 ; NO-VP-OUTLOOP-NEXT:    [[TMP16:%.*]] = trunc i64 [[TMP10]] to i32
diff --...
[truncated]

Copy link

github-actions bot commented May 31, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@fhahn fhahn force-pushed the vplan-reductionstartvector branch from aefed54 to d7bfe5b Compare June 1, 2025 21:19
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 1, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
@fhahn fhahn force-pushed the vplan-reductionstartvector branch from d7bfe5b to 2c3c96d Compare June 4, 2025 09:46
@@ -8286,6 +8289,7 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(VPSingleDefRecipe *R,
// If the PHI is used by a partial reduction, set the scale factor.
unsigned ScaleFactor =
getScalingForReduction(RdxDesc.getLoopExitInstr()).value_or(1);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant blank

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stripped, thanks!

Comment on lines 92 to 93
case VPInstruction::ReductionStartVector:
return inferScalarType(R->getOperand(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code matches Instruction::ExtractElement and Instruction::Freeze, maybe we can combine them to avoid duplication?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move case VPInstruction::ReductionStartVector: below case Instruction::Freeze: so they can share the same return inferScalarType(R->getOperand(0)) statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved thanks!

m_VPValue(StartV), m_VPValue(), m_VPValue()))) {
Def->replaceUsesWithIf(StartV, [Def](const VPUser &U, unsigned Idx) {
auto *PhiR = dyn_cast<VPReductionPHIRecipe>(&U);
return PhiR && Def == PhiR->getOperand(Idx) && PhiR->isInLoop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why need Def == PhiR->getOperand(Idx)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's not needed in the latest version, removed thanks!

Comment on lines 230 to 241
if (auto *VPI = dyn_cast<VPInstruction>(RdxPhi->getStartValue())) {
if (cast<ConstantInt>(VPI->getOperand(2)->getLiveInIRValue())
->getZExtValue() == 1)
Copy->setOperand(0, VPI->getOperand(1));
else {
if (Part == 1) {
auto *C = VPI->clone();
C->setOperand(0, C->getOperand(1));
C->insertAfter(VPI);
addUniformForAllParts(C);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some format suggestions.

Suggested change
if (auto *VPI = dyn_cast<VPInstruction>(RdxPhi->getStartValue())) {
if (cast<ConstantInt>(VPI->getOperand(2)->getLiveInIRValue())
->getZExtValue() == 1)
Copy->setOperand(0, VPI->getOperand(1));
else {
if (Part == 1) {
auto *C = VPI->clone();
C->setOperand(0, C->getOperand(1));
C->insertAfter(VPI);
addUniformForAllParts(C);
}
}
}
if (auto *VPI = dyn_cast<VPInstruction>(RdxPhi->getStartValue()))
if (cast<ConstantInt>(VPI->getOperand(2)->getLiveInIRValue())
->getZExtValue() == 1) {
Copy->setOperand(0, VPI->getOperand(1));
} else if (Part == 1) {
auto *C = VPI->clone();
C->setOperand(0, C->getOperand(1));
C->insertAfter(VPI);
addUniformForAllParts(C);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 5, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 5, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
Copy link
Collaborator

@SamTebbs33 SamTebbs33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that ReductionStartVector will need to be handled in VPlanAnalysis::getVFScaleFactor so that its register usage isn't overestimated.

RecipeBuilder.getScalingForReduction(RdxDesc.getLoopExitInstr())
.value_or(1);
Type *I32Ty = IntegerType::getInt32Ty(PhiTy->getContext());
auto *ScalarFactorVPV =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scalar -> Scale

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

Comment on lines -3858 to -3852
// If this phi is fed by a scaled reduction then it should output a
// vector with fewer elements than the VF.
ElementCount VF = State.VF.divideCoefficientBy(VFScaleFactor);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this no longer needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, both the start value and the backedge value now are narrowed already.

@fhahn fhahn force-pushed the vplan-reductionstartvector branch from 2c3c96d to ab43958 Compare June 5, 2025 21:04
Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that ReductionStartVector will need to be handled in VPlanAnalysis::getVFScaleFactor so that its register usage isn't overestimated.

Thanks, I think for now we never try to get the scaling factor as it is produced outside the vector loop. I added an assert for now

@@ -8286,6 +8289,7 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(VPSingleDefRecipe *R,
// If the PHI is used by a partial reduction, set the scale factor.
unsigned ScaleFactor =
getScalingForReduction(RdxDesc.getLoopExitInstr()).value_or(1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stripped, thanks!

RecipeBuilder.getScalingForReduction(RdxDesc.getLoopExitInstr())
.value_or(1);
Type *I32Ty = IntegerType::getInt32Ty(PhiTy->getContext());
auto *ScalarFactorVPV =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

Comment on lines -3858 to -3852
// If this phi is fed by a scaled reduction then it should output a
// vector with fewer elements than the VF.
ElementCount VF = State.VF.divideCoefficientBy(VFScaleFactor);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, both the start value and the backedge value now are narrowed already.

m_VPValue(StartV), m_VPValue(), m_VPValue()))) {
Def->replaceUsesWithIf(StartV, [Def](const VPUser &U, unsigned Idx) {
auto *PhiR = dyn_cast<VPReductionPHIRecipe>(&U);
return PhiR && Def == PhiR->getOperand(Idx) && PhiR->isInLoop();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's not needed in the latest version, removed thanks!

Comment on lines 230 to 241
if (auto *VPI = dyn_cast<VPInstruction>(RdxPhi->getStartValue())) {
if (cast<ConstantInt>(VPI->getOperand(2)->getLiveInIRValue())
->getZExtValue() == 1)
Copy->setOperand(0, VPI->getOperand(1));
else {
if (Part == 1) {
auto *C = VPI->clone();
C->setOperand(0, C->getOperand(1));
C->insertAfter(VPI);
addUniformForAllParts(C);
}
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 5, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
Copy link
Collaborator

@SamTebbs33 SamTebbs33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, looks good to me!

fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 8, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
fhahn added 2 commits June 8, 2025 12:50
Add a new VPInstruction::ReductionStartVector opcode to create the start
values for wide reductions. This more accurately models the start value
creation in VPlan and simplifies VPReductionPHIRecipe::execute.
@fhahn fhahn force-pushed the vplan-reductionstartvector branch from ab43958 to 3c2e248 Compare June 8, 2025 11:52
Comment on lines 92 to 93
case VPInstruction::ReductionStartVector:
return inferScalarType(R->getOperand(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move case VPInstruction::ReductionStartVector: below case Instruction::Freeze: so they can share the same return inferScalarType(R->getOperand(0)) statement?

Comment on lines 400 to 403
if (auto *VPI = dyn_cast<VPInstruction>(R))
assert(
VPI->getOpcode() != VPInstruction::ReductionStartVector &&
"getting scaling factor of reduction-start-vector not implemented yet");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (auto *VPI = dyn_cast<VPInstruction>(R))
assert(
VPI->getOpcode() != VPInstruction::ReductionStartVector &&
"getting scaling factor of reduction-start-vector not implemented yet");
assert((!isa<VPInstruction>(R) ||
cast<VPInstruction>(R)->getOpcode() != VPInstruction::ReductionStartVector) &&
"getting scaling factor of reduction-start-vector not implemented yet");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated thanks

fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 9, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
@fhahn fhahn merged commit 6108d50 into llvm:main Jun 9, 2025
5 of 7 checks passed
@fhahn fhahn deleted the vplan-reductionstartvector branch June 9, 2025 19:59
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 9, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 9, 2025
Add a new VPInstruction::ReductionStartVector opcode to create the start
values for wide reductions. This more accurately models the start value
creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the
line it also allows removing VPReductionPHIRecipe::RdxDesc.

PR: llvm/llvm-project#142290
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 9, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
Add a new VPInstruction::ReductionStartVector opcode to create the start
values for wide reductions. This more accurately models the start value
creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the
line it also allows removing VPReductionPHIRecipe::RdxDesc.

PR: llvm#142290
DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025
Add a new VPInstruction::ReductionStartVector opcode to create the start
values for wide reductions. This more accurately models the start value
creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the
line it also allows removing VPReductionPHIRecipe::RdxDesc.

PR: llvm#142290
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
Add a new VPInstruction::ReductionStartVector opcode to create the start
values for wide reductions. This more accurately models the start value
creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the
line it also allows removing VPReductionPHIRecipe::RdxDesc.

PR: llvm#142290
@asmok-g
Copy link

asmok-g commented Jun 17, 2025

Heads-up: We noticed a miscompile in one target in google after this patch. Working on a repro

@asmok-g
Copy link

asmok-g commented Jun 17, 2025

The diff in IR before and after the patch is

@@ -624,7 +624,7 @@
 196:                                              ; preds = %196, %194
   %197 = phi i64 [ 0, %194 ], [ %210, %196 ]
   %198 = phi <4 x i32> [ %195, %194 ], [ %208, %196 ]
-  %199 = phi <4 x i32> [ zeroinitializer, %194 ], [ %209, %196 ]
+  %199 = phi <4 x i32> [ %195, %194 ], [ %209, %196 ]
   %200 = getelementptr inbounds nuw i8, ptr %189, i64 %197
   %201 = getelementptr inbounds nuw i8, ptr %200, i64 16
   %202 = load <16 x i8>, ptr %200, align 1, !tbaa !49

The source function is

__attribute__((optnone)) void ComputeVariances(const char* input_map, int input_width, int input_height,
                      absl::Span<const int> sum_map, int out_width,
                      int out_height, int block_size,
                      std::vector<double>* texture_map) {
  // Note that:
  //  sum((x_i - avg)^2)
  //          = sum(x_i^2) - 2*avg*sum(x_i) + avg^2*sum(1))
  //          = sum(x_i^2) - 2*block_sum/n*block_sum + block_sum^2/n^2*n
  //          = sum(x_i^2) - block_sum^2/n
  std::vector<int32_t> cur_row(out_width);
  for (int i = 0; i < out_height; ++i) {
    // Compute sum(x_i^2).
    memset(cur_row.data(), 0, out_width * sizeof(int32_t));
    for (int bi = 0; bi < block_size; ++bi) {
      const auto input_row_start = (i * block_size + bi) * input_width;
      for (int j = 0; j < out_width; ++j) {
        int acc = cur_row[j];
        for (int bj = 0; bj < block_size; ++bj) {
          const auto v = input_map[input_row_start + j * block_size + bj];
          acc += v * v;
        }
        cur_row[j] = acc;
      }
    }
    for (int j = 0; j < out_width; ++j) {
      const int sum = sum_map[i * out_width + j];
      const int sum_sqr = cur_row[j];
      (*texture_map)[i * out_width + j] =
          static_cast<double>(sum_sqr) -
          static_cast<double>(sum) * sum / (block_size * block_size);
    }
  }
}

still working on having the reduced source

fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 17, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
@asmok-g
Copy link

asmok-g commented Jun 18, 2025

@fhahn just in case you missed the previous notifications..

@fhahn
Copy link
Contributor Author

fhahn commented Jun 18, 2025

@fhahn just in case you missed the previous notifications..

I saw it, but we still need a reproducer before this is actionable and confirmed it is actually a problem with the patch

@asmok-g
Copy link

asmok-g commented Jun 19, 2025

@fhahn The repro I got is

char *ComputeVariances_input_map;
int ComputeVariances_block_size, ComputeVariances_acc;
void ComputeVariances() {
  for (;;)
    for (int bj; bj < ComputeVariances_block_size; ++bj)
      ComputeVariances_acc +=
          ComputeVariances_input_map[bj] * ComputeVariances_input_map[bj];
}

cmd: clang -O2 '--target=aarch64-grtev4-linux-gnu' '-mcpu=neoverse-n1' '-std=gnu++20' pre.ii -emit-llvm -S
diff:

@@ -54,7 +54,7 @@
 31:                                               ; preds = %31, %27
   %32 = phi i64 [ 0, %27 ], [ %45, %31 ]
   %33 = phi <4 x i32> [ %29, %27 ], [ %43, %31 ]
-  %34 = phi <4 x i32> [ zeroinitializer, %27 ], [ %44, %31 ]
+  %34 = phi <4 x i32> [ %29, %27 ], [ %44, %31 ]
   %35 = getelementptr i8, ptr %30, i64 %32
   %36 = getelementptr inbounds nuw i8, ptr %35, i64 16
   %37 = load <16 x i8>, ptr %35, align 1, !tbaa !10, !alias.scope !11

@sjoerdmeijer
Copy link
Collaborator

Came here to say the same thing: we have also bisected a correctness issue down to this change.
It occurs in SPECv8 and 706.stockfish_r if you happen to have access to the next SPEC by any chance. Haven't tried reproducing it yet, but it looks like we already have a reproducer now.

Is this worth a revert? Or will there be a quick fix?

@fhahn
Copy link
Contributor Author

fhahn commented Jun 19, 2025

Thanks for the reproducer, should be fixed soon

fhahn added a commit that referenced this pull request Jun 19, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 19, 2025
fhahn added a commit that referenced this pull request Jun 19, 2025
Update handling of ReductionStartVector in VPlanUnroll for partial
reductions. The new code makes sure all parts are properly set to the
cloned ReductionStartVector.

Fixes a mis-compile reported for
#142290.
@fhahn
Copy link
Contributor Author

fhahn commented Jun 19, 2025

Should be fixed in c4c2d77

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 19, 2025
… unrolling.

Update handling of ReductionStartVector in VPlanUnroll for partial
reductions. The new code makes sure all parts are properly set to the
cloned ReductionStartVector.

Fixes a mis-compile reported for
llvm/llvm-project#142290.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 25, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 29, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 29, 2025
Replace VPReductionPHIRecipe with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
llvm#141860
llvm#141932
llvm#142290
llvm#142291
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants