Skip to content

[VPlan] Implement cloning of VPlans. #73158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 27, 2024
Merged

[VPlan] Implement cloning of VPlans. #73158

merged 9 commits into from
Jan 27, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Nov 22, 2023

This patch implements cloning for VPlans and recipes. Cloning is used in the epilogue vectorization path, to clone the VPlan for the main vector loop. This means we won't re-use a VPlan when executing the VPlan for the epilogue vector loop, which in turn will enable us to perform optimizations based on UF & VF.

@llvmbot
Copy link
Member

llvmbot commented Nov 22, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

This patch implements cloning for VPlans and recipes. Cloning is used in the epilogue vectorization path, to clone the VPlan for the main vector loop. This means we won't re-use a VPlan when executing the VPlan for the epilogue vector loop, which in turn will enable us to perform optimizations based on UF & VF.


Patch is 20.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73158.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+121)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+171)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index df71593543051f6..469e77d72b6b998 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -10293,7 +10293,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
         EpilogueVectorizerMainLoop MainILV(L, PSE, LI, DT, TLI, TTI, AC, ORE,
                                            EPI, &LVL, &CM, BFI, PSI, Checks);
 
-        VPlan &BestMainPlan = LVP.getBestPlanFor(EPI.MainLoopVF);
+        VPlan &BestMainPlan = *LVP.getBestPlanFor(EPI.MainLoopVF).clone();
         auto ExpandedSCEVs = LVP.executePlan(EPI.MainLoopVF, EPI.MainLoopUF,
                                              BestMainPlan, MainILV, DT, true);
         ++LoopsVectorized;
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 9ffce475f022eb3..e7130a889e8b307 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -614,6 +614,18 @@ void VPBasicBlock::print(raw_ostream &O, const Twine &Indent,
 }
 #endif
 
+VPBlockBase *VPRegionBlock::clone() {
+  DenseMap<VPBlockBase *, VPBlockBase *> Old2New;
+  DenseMap<VPValue *, VPValue *> Old2NewVPValues;
+  VPBlockBase *NewEntry =
+      VPBlockUtils::cloneCFG(Entry, Old2New, Old2NewVPValues);
+  auto *NewR =
+      new VPRegionBlock(NewEntry, Old2New[Exiting], getName(), isReplicator());
+  for (VPBlockBase *Block : vp_depth_first_shallow(NewEntry))
+    Block->setParent(NewR);
+  return NewR;
+}
+
 void VPRegionBlock::dropAllReferences(VPValue *NewValue) {
   for (VPBlockBase *Block : vp_depth_first_shallow(Entry))
     // Drop all references in VPBasicBlocks and replace all uses with
@@ -964,6 +976,65 @@ void VPlan::updateDominatorTree(DominatorTree *DT, BasicBlock *LoopHeaderBB,
   assert(DT->verify(DominatorTree::VerificationLevel::Fast));
 }
 
+static void remapVPValues(VPBasicBlock *OldBB, VPBasicBlock *NewBB,
+                          DenseMap<VPValue *, VPValue *> &Old2NewVPValues,
+                          bool Full = false) {
+  for (const auto &[OldR, NewR] : zip(*OldBB, *NewBB)) {
+    for (unsigned I = 0, E = NewR.getNumOperands(); I != E; ++I) {
+      VPValue *NewOp = Old2NewVPValues.lookup(OldR.getOperand(I));
+      if (!Full)
+        continue;
+      NewR.setOperand(I, NewOp);
+    }
+    for (const auto &[OldV, NewV] :
+         zip(OldR.definedValues(), NewR.definedValues()))
+      Old2NewVPValues[OldV] = NewV;
+  }
+}
+
+VPlan *VPlan::clone() {
+  DenseMap<VPBlockBase *, VPBlockBase *> Old2New;
+  DenseMap<VPValue *, VPValue *> Old2NewVPValues;
+
+  auto *P = new VPlan();
+  SmallVector<VPValue *, 16> NewLiveIns;
+  for (VPValue *LI : VPLiveInsToFree) {
+    VPValue *NewLI = new VPValue(LI->getLiveInIRValue());
+    P->VPLiveInsToFree.push_back(NewLI);
+    Old2NewVPValues[LI] = NewLI;
+  }
+
+  Old2NewVPValues[&VectorTripCount] = &P->VectorTripCount;
+  if (BackedgeTakenCount) {
+    Old2NewVPValues[BackedgeTakenCount] = new VPValue();
+    P->BackedgeTakenCount = Old2NewVPValues[BackedgeTakenCount];
+  }
+
+  auto NewPH = cast<VPBasicBlock>(Preheader->clone());
+  remapVPValues(cast<VPBasicBlock>(Preheader), cast<VPBasicBlock>(NewPH),
+                Old2NewVPValues, /*Full*/ true);
+  VPValue *NewTC = Old2NewVPValues.lookup(TripCount);
+  if (!NewTC)
+    Old2NewVPValues[TripCount] = new VPValue(TripCount->getLiveInIRValue());
+  P->TripCount = Old2NewVPValues[TripCount];
+
+  auto *NewEntry = cast<VPBasicBlock>(VPBlockUtils::cloneCFG(
+      getEntry(), Old2New, Old2NewVPValues, /*FullRemapping*/ true));
+
+  P->Entry = NewEntry;
+  P->Preheader = NewPH;
+  NewEntry->setPlan(P);
+  NewPH->setPlan(P);
+  P->VFs = VFs;
+  P->UFs = UFs;
+  P->Name = Name;
+
+  for (const auto &[_, LO] : LiveOuts) {
+    P->addLiveOut(LO->getPhi(), Old2NewVPValues[LO->getOperand(0)]);
+  }
+  return P;
+}
+
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 
 Twine VPlanPrinter::getUID(const VPBlockBase *Block) {
@@ -1168,6 +1239,56 @@ void VPUser::printOperands(raw_ostream &O, VPSlotTracker &SlotTracker) const {
 }
 #endif
 
+VPBlockBase *VPBlockUtils::cloneCFG(
+    VPBlockBase *Entry, DenseMap<VPBlockBase *, VPBlockBase *> &Old2New,
+    DenseMap<VPValue *, VPValue *> &Old2NewVPValues, bool FullRemapping) {
+  ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
+      Entry);
+  VPBlockBase *NewEntry = nullptr;
+  for (VPBlockBase *BB : RPOT) {
+    VPBlockBase *NewBB = BB->clone();
+    if (!NewEntry)
+      NewEntry = NewBB;
+
+    for (VPBlockBase *Pred : BB->getPredecessors())
+      connectBlocks(Old2New[Pred], NewBB);
+
+    Old2New[BB] = NewBB;
+
+    if (!isa<VPBasicBlock>(BB))
+      continue;
+  }
+
+  // Update the operands of all cloned recipes starting at NewEntry. This traverses all reachable blocks. This is done in two steps, to handle cycles in PHI recipes.
+  ReversePostOrderTraversal<VPBlockDeepTraversalWrapper<VPBlockBase *>>
+      OldDeepRPOT(Entry);
+  ReversePostOrderTraversal<VPBlockDeepTraversalWrapper<VPBlockBase *>>
+      NewDeepRPOT(NewEntry);
+  // First, collect all mappings from old to new VPValues defined by cloned recipes.
+  for (const auto &[OldBB, NewBB] :
+       zip(VPBlockUtils::blocksOnly<VPBasicBlock>(OldDeepRPOT),
+           VPBlockUtils::blocksOnly<VPBasicBlock>(NewDeepRPOT))) {
+    for (const auto &[OldR, NewR] : zip(*OldBB, *NewBB))
+      for (const auto &[OldV, NewV] :
+           zip(OldR.definedValues(), NewR.definedValues()))
+        Old2NewVPValues[OldV] = NewV;
+  }
+
+  // Update all operands to use cloned VPValues.
+  for (VPBasicBlock *NewBB :
+           VPBlockUtils::blocksOnly<VPBasicBlock>(NewDeepRPOT)) {
+    for (VPRecipeBase &NewR : *NewBB)
+      for (unsigned I = 0, E = NewR.getNumOperands(); I != E; ++I) {
+        VPValue *NewOp = Old2NewVPValues.lookup(NewR.getOperand(I));
+        if (!FullRemapping)
+          continue;
+        NewR.setOperand(I, NewOp);
+      }
+  }
+
+  return NewEntry;
+}
+
 void VPInterleavedAccessInfo::visitRegion(VPRegionBlock *Region,
                                           Old2NewTy &Old2New,
                                           InterleavedAccessInfo &IAI) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index a26308a212bbd3c..83e689e5cb3a06e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -482,6 +482,8 @@ class VPBlockBase {
 
   using VPBlocksTy = SmallVectorImpl<VPBlockBase *>;
 
+  virtual VPBlockBase *clone() = 0;
+
   virtual ~VPBlockBase() = default;
 
   const std::string &getName() const { return Name; }
@@ -729,6 +731,9 @@ class VPRecipeBase : public ilist_node_with_parent<VPRecipeBase, VPBasicBlock>,
       : VPDef(SC), VPUser(Operands, VPUser::VPUserID::Recipe), DL(DL) {}
   virtual ~VPRecipeBase() = default;
 
+  /// Clone the current recipe.
+  virtual VPRecipeBase *clone() { return nullptr; }
+
   /// \return the VPBasicBlock which this VPRecipe belongs to.
   VPBasicBlock *getParent() { return Parent; }
   const VPBasicBlock *getParent() const { return Parent; }
@@ -874,6 +879,12 @@ class VPRecipeWithIRFlags : public VPRecipeBase {
     unsigned AllFlags;
   };
 
+protected:
+  void transferFlags(VPRecipeWithIRFlags &Other) {
+    OpType = Other.OpType;
+    AllFlags = Other.AllFlags;
+  }
+
 public:
   template <typename IterT>
   VPRecipeWithIRFlags(const unsigned char SC, IterT Operands, DebugLoc DL = {})
@@ -1086,6 +1097,13 @@ class VPInstruction : public VPRecipeWithIRFlags, public VPValue {
 
   VP_CLASSOF_IMPL(VPDef::VPInstructionSC)
 
+  virtual VPRecipeBase *clone() override {
+    SmallVector<VPValue *, 2> Operands(operands());
+    auto *New = new VPInstruction(Opcode, Operands, getDebugLoc(), Name);
+    New->transferFlags(*this);
+    return New;
+  }
+
   unsigned getOpcode() const { return Opcode; }
 
   /// Generate the instruction.
@@ -1166,6 +1184,12 @@ class VPWidenRecipe : public VPRecipeWithIRFlags, public VPValue {
 
   ~VPWidenRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    auto *R = new VPWidenRecipe(*getUnderlyingInstr(), operands());
+    R->transferFlags(*this);
+    return R;
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPWidenSC)
 
   /// Produce widened copies of all Ingredients.
@@ -1201,6 +1225,12 @@ class VPWidenCastRecipe : public VPRecipeBase, public VPValue {
 
   ~VPWidenCastRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPWidenCastRecipe(
+        Opcode, getOperand(0), ResultTy,
+        cast_if_present<CastInst>(getUnderlyingValue()));
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPWidenCastSC)
 
   /// Produce widened copies of the cast.
@@ -1239,6 +1269,11 @@ class VPWidenCallRecipe : public VPRecipeBase, public VPValue {
 
   ~VPWidenCallRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPWidenCallRecipe(*cast<CallInst>(getUnderlyingInstr()),
+                                 operands(), VectorIntrinsicID, Variant);
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPWidenCallSC)
 
   /// Produce a widened version of the call instruction.
@@ -1260,6 +1295,11 @@ struct VPWidenSelectRecipe : public VPRecipeBase, public VPValue {
 
   ~VPWidenSelectRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPWidenSelectRecipe(*cast<SelectInst>(getUnderlyingInstr()),
+                                   operands());
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPWidenSelectSC)
 
   /// Produce a widened version of the select instruction.
@@ -1304,6 +1344,11 @@ class VPWidenGEPRecipe : public VPRecipeWithIRFlags, public VPValue {
 
   ~VPWidenGEPRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPWidenGEPRecipe(cast<GetElementPtrInst>(getUnderlyingInstr()),
+                                operands());
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPWidenGEPSC)
 
   /// Generate the gep nodes.
@@ -1419,6 +1464,11 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
 
   ~VPWidenIntOrFpInductionRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPWidenIntOrFpInductionRecipe(IV, getStartValue(),
+                                             getStepValue(), IndDesc, Trunc);
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
 
   /// Generate the vectorized and scalarized versions of the phi node as
@@ -1489,6 +1539,12 @@ class VPWidenPointerInductionRecipe : public VPHeaderPHIRecipe {
 
   ~VPWidenPointerInductionRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPWidenPointerInductionRecipe(
+        cast<PHINode>(getUnderlyingInstr()), getOperand(0), getOperand(1),
+        IndDesc, IsScalarAfterVectorization);
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPWidenPointerInductionSC)
 
   /// Generate vector values for the pointer induction.
@@ -1522,6 +1578,13 @@ class VPWidenPHIRecipe : public VPHeaderPHIRecipe {
       addOperand(Start);
   }
 
+  VPRecipeBase *clone() override {
+    auto *Res = new VPWidenPHIRecipe(cast<PHINode>(getUnderlyingInstr()),
+                                     getOperand(0));
+    Res->IncomingBlocks = IncomingBlocks;
+    return Res;
+  }
+
   ~VPWidenPHIRecipe() override = default;
 
   VP_CLASSOF_IMPL(VPDef::VPWidenPHISC)
@@ -1561,6 +1624,11 @@ struct VPFirstOrderRecurrencePHIRecipe : public VPHeaderPHIRecipe {
     return R->getVPDefID() == VPDef::VPFirstOrderRecurrencePHISC;
   }
 
+  VPRecipeBase *clone() override {
+    return new VPFirstOrderRecurrencePHIRecipe(
+        cast<PHINode>(getUnderlyingInstr()), *getOperand(0));
+  }
+
   void execute(VPTransformState &State) override;
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
@@ -1596,6 +1664,14 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe {
 
   ~VPReductionPHIRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    auto *R =
+        new VPReductionPHIRecipe(cast<PHINode>(getUnderlyingInstr()), RdxDesc,
+                                 *getOperand(0), IsInLoop, IsOrdered);
+    R->addOperand(getBackedgeValue());
+    return R;
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPReductionPHISC)
 
   static inline bool classof(const VPHeaderPHIRecipe *R) {
@@ -1638,6 +1714,11 @@ class VPBlendRecipe : public VPRecipeBase, public VPValue {
            "of operands");
   }
 
+  VPRecipeBase *clone() override {
+    SmallVector<VPValue *> Ops(operands());
+    return new VPBlendRecipe(cast<PHINode>(getUnderlyingValue()), Ops);
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPBlendSC)
 
   /// Return the number of incoming values, taking into account that a single
@@ -1707,6 +1788,11 @@ class VPInterleaveRecipe : public VPRecipeBase {
   }
   ~VPInterleaveRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPInterleaveRecipe(IG, getAddr(), getStoredValues(), getMask(),
+                                  NeedsMaskForGaps);
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPInterleaveSC)
 
   /// Return the address accessed by this recipe.
@@ -1773,6 +1859,11 @@ class VPReductionRecipe : public VPRecipeBase, public VPValue {
 
   ~VPReductionRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPReductionRecipe(RdxDesc, getUnderlyingInstr(), getChainOp(),
+                                 getVecOp(), getCondOp());
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPReductionSC)
 
   /// Generate the reduction in the loop
@@ -1817,6 +1908,11 @@ class VPReplicateRecipe : public VPRecipeWithIRFlags, public VPValue {
 
   ~VPReplicateRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPReplicateRecipe(getUnderlyingInstr(), operands(), IsUniform,
+                                 isPredicated() ? getMask() : nullptr);
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPReplicateSC)
 
   /// Generate replicas of the desired Ingredient. Replicas will be generated
@@ -1869,6 +1965,10 @@ class VPBranchOnMaskRecipe : public VPRecipeBase {
       addOperand(BlockInMask);
   }
 
+  VPRecipeBase *clone() override {
+    return new VPBranchOnMaskRecipe(getOperand(0));
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPBranchOnMaskSC)
 
   /// Generate the extraction of the appropriate bit from the block mask and the
@@ -1916,6 +2016,10 @@ class VPPredInstPHIRecipe : public VPRecipeBase, public VPValue {
       : VPRecipeBase(VPDef::VPPredInstPHISC, PredV), VPValue(this) {}
   ~VPPredInstPHIRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPPredInstPHIRecipe(getOperand(0));
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPPredInstPHISC)
 
   /// Generates phi nodes for live-outs as needed to retain SSA form.
@@ -1979,6 +2083,16 @@ class VPWidenMemoryInstructionRecipe : public VPRecipeBase {
     setMask(Mask);
   }
 
+  VPRecipeBase *clone() override {
+    if (isStore())
+      return new VPWidenMemoryInstructionRecipe(
+          cast<StoreInst>(Ingredient), getAddr(), getStoredValue(), getMask(),
+          Consecutive, Reverse);
+
+    return new VPWidenMemoryInstructionRecipe(
+        cast<LoadInst>(Ingredient), getAddr(), getMask(), Consecutive, Reverse);
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPWidenMemoryInstructionSC)
 
   /// Return the address accessed by this recipe.
@@ -2045,6 +2159,8 @@ class VPExpandSCEVRecipe : public VPRecipeBase, public VPValue {
 
   ~VPExpandSCEVRecipe() override = default;
 
+  VPRecipeBase *clone() override { return new VPExpandSCEVRecipe(Expr, SE); }
+
   VP_CLASSOF_IMPL(VPDef::VPExpandSCEVSC)
 
   /// Generate a canonical vector induction variable of the vector loop, with
@@ -2070,6 +2186,12 @@ class VPCanonicalIVPHIRecipe : public VPHeaderPHIRecipe {
 
   ~VPCanonicalIVPHIRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    auto *R = new VPCanonicalIVPHIRecipe(getOperand(0), getDebugLoc());
+    R->addOperand(getBackedgeValue());
+    return R;
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPCanonicalIVPHISC)
 
   static inline bool classof(const VPHeaderPHIRecipe *D) {
@@ -2116,6 +2238,10 @@ class VPActiveLaneMaskPHIRecipe : public VPHeaderPHIRecipe {
 
   ~VPActiveLaneMaskPHIRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPActiveLaneMaskPHIRecipe(getOperand(0), getDebugLoc());
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPActiveLaneMaskPHISC)
 
   static inline bool classof(const VPHeaderPHIRecipe *D) {
@@ -2141,6 +2267,11 @@ class VPWidenCanonicalIVRecipe : public VPRecipeBase, public VPValue {
 
   ~VPWidenCanonicalIVRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPWidenCanonicalIVRecipe(
+        cast<VPCanonicalIVPHIRecipe>(getOperand(0)));
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPWidenCanonicalIVSC)
 
   /// Generate a canonical vector induction variable of the vector loop, with
@@ -2183,9 +2314,22 @@ class VPDerivedIVRecipe : public VPRecipeBase, public VPValue {
         VPValue(this), TruncResultTy(TruncResultTy), Kind(IndDesc.getKind()),
         FPBinOp(dyn_cast_or_null<FPMathOperator>(IndDesc.getInductionBinOp())) {
   }
+  VPDerivedIVRecipe(InductionDescriptor::InductionKind Kind,
+                    const FPMathOperator *FPBinOp, VPValue *Start,
+                    VPCanonicalIVPHIRecipe *CanonicalIV, VPValue *Step,
+                    Type *TruncResultTy)
+      : VPRecipeBase(VPDef::VPDerivedIVSC, {Start, CanonicalIV, Step}),
+        VPValue(this), TruncResultTy(TruncResultTy), Kind(Kind),
+        FPBinOp(FPBinOp) {}
 
   ~VPDerivedIVRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPDerivedIVRecipe(Kind, FPBinOp, getOperand(0),
+                                 cast<VPCanonicalIVPHIRecipe>(getOperand(1)),
+                                 getOperand(2), TruncResultTy);
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPDerivedIVSC)
 
   /// Generate the transformed value of the induction at offset StartValue (1.
@@ -2237,6 +2381,12 @@ class VPScalarIVStepsRecipe : public VPRecipeWithIRFlags, public VPValue {
 
   ~VPScalarIVStepsRecipe() override = default;
 
+  VPRecipeBase *clone() override {
+    return new VPScalarIVStepsRecipe(
+        getOperand(0), getOperand(1), InductionOpcode,
+        hasFastMathFlags() ? getFastMathFlags() : FastMathFlags());
+  }
+
   VP_CLASSOF_IMPL(VPDef::VPScalarIVStepsSC)
 
   /// Generate the scalarized versions of the phi node as needed by their users.
@@ -2281,6 +2431,13 @@ class VPBasicBlock : public VPBlockBase {
       Recipes.pop_back();
   }
 
+  VPBlockBase *clone() override {
+    auto *NewBlock = new VPBasicBlock(getName());
+    for (VPRecipeBase &R : *this)
+      NewBlock->appendRecipe(R.clone());
+    return NewBlock;
+  }
+
   /// Instruction iterators...
   using iterator = RecipeListTy::iterator;
   using const_iterator = RecipeListTy::const_iterator;
@@ -2419,6 +2576,8 @@ class VPRegionBlock : public VPBlockBase {
     }
   }
 
+  VPBlockBase *clone() override;
+
   /// Method to support type inquiry through isa, cast, and dyn_cast.
   static inline bool classof(const VPBlockBase *V) {
     return V->getVPBlockID() == VPBlockBase::VPRegionBlockSC;
@@ -2536,6 +2695,9 @@ class VPlan {
   /// been modeled in VPlan directly.
   DenseMap<const SCEV *, VPValue *> SCEVToExpansion;
 
+  /// Construct an uninitialized VPlan, should be used for cloning only.
+  VPlan() {}
+
 public:
   /// Construct a VPlan with original preheader \p Preheader, trip count \p TC
   /// and \p Entry to the plan. At the moment, \p Preheader and \p Entry need to
@@ -2720,6 +2882,8 @@ class VPlan {
   VPBasicBlock *getPreheader() { return Preheader; }
   const VPBasicBlock *getPreheader() const { return Preheader; }
 
+  VPlan *clone();
+
 private:
   /// Add to the given dominator tree the header block and every new basic block
   /// that was created between it and the latch block, inclusive.
@@ -2882,6 +3046,13 @@ class VPBlockUtils {
       return cast<BlockTy>(&Block);
     });
   }
+
+  /// Clone the CFG for all nodes reachable from \p Entry, this includes cloning the blocks and their recipes. Operands of cloned recipes will be updated to use new VPValues from \p Old2NewValues. If \p FullRemapping is set to true, then all old VPValues from outside the cloned nodes must be mapped in \p Old2NewValues.
+  static VPBlockBase *
+  cloneCFG(VPBlockBase *Entry,
+           DenseMap<VPBlockBase *, VPBlockBase *> &Old2NewBBs,
+           DenseMap<VPValue *, VPValue *> &Old2NewValues,
+        ...
[truncated]

Copy link

github-actions bot commented Nov 22, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -729,6 +731,9 @@ class VPRecipeBase : public ilist_node_with_parent<VPRecipeBase, VPBasicBlock>,
: VPDef(SC), VPUser(Operands, VPUser::VPUserID::Recipe), DL(DL) {}
virtual ~VPRecipeBase() = default;

/// Clone the current recipe.
virtual VPRecipeBase *clone() { return nullptr; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can make it pure virtual?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@@ -1086,6 +1097,13 @@ class VPInstruction : public VPRecipeWithIRFlags, public VPValue {

VP_CLASSOF_IMPL(VPDef::VPInstructionSC)

virtual VPRecipeBase *clone() override {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop virtual?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@@ -2536,6 +2695,9 @@ class VPlan {
/// been modeled in VPlan directly.
DenseMap<const SCEV *, VPValue *> SCEVToExpansion;

/// Construct an uninitialized VPlan, should be used for cloning only.
VPlan() {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPlan() {}
explicit VPlan() = default;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should be addressed, thanks! Also rebased on top of current main after landing VPlan-based truncateToMinimalBitwidthss

@@ -729,6 +731,9 @@ class VPRecipeBase : public ilist_node_with_parent<VPRecipeBase, VPBasicBlock>,
: VPDef(SC), VPUser(Operands, VPUser::VPUserID::Recipe), DL(DL) {}
virtual ~VPRecipeBase() = default;

/// Clone the current recipe.
virtual VPRecipeBase *clone() { return nullptr; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@@ -1086,6 +1097,13 @@ class VPInstruction : public VPRecipeWithIRFlags, public VPValue {

VP_CLASSOF_IMPL(VPDef::VPInstructionSC)

virtual VPRecipeBase *clone() override {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@@ -2536,6 +2695,9 @@ class VPlan {
/// been modeled in VPlan directly.
DenseMap<const SCEV *, VPValue *> SCEVToExpansion;

/// Construct an uninitialized VPlan, should be used for cloning only.
VPlan() {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

This patch implements cloning for VPlans and recipes. Cloning is used in
the epilogue vectorization path, to clone the VPlan for the main vector
loop. This means we won't re-use a VPlan when executing the VPlan for
the epilogue vector loop, which in turn will enable us to perform
optimizations based on UF & VF.
@fhahn
Copy link
Contributor Author

fhahn commented Jan 7, 2024

Rebased on top of recent changes including newly added recipes. Ping :)

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good step forward! Towards better support of epilog vectorization, unrolling by UF, and steps along VPlan's evolution. Adding various technical comments.

@@ -1594,6 +1657,13 @@ class VPWidenPHIRecipe : public VPHeaderPHIRecipe {
addOperand(Start);
}

VPRecipeBase *clone() override {
auto *Res = new VPWidenPHIRecipe(cast<PHINode>(getUnderlyingInstr()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cloning is presumably unreachable, currently, as this recipe participates in Native VPlans only which refrain from epilog vectorization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better mark it unreachable than have untested dead code, potentially misconsidered live?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to llvm_unreachable, thanks!

@@ -2262,9 +2400,22 @@ class VPDerivedIVRecipe : public VPRecipeBase, public VPValue {
VPValue(this), TruncResultTy(TruncResultTy), Kind(IndDesc.getKind()),
FPBinOp(dyn_cast_or_null<FPMathOperator>(IndDesc.getInductionBinOp())) {
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should the above constructor call the new one below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Comment on lines 2414 to 2416
return new VPDerivedIVRecipe(Kind, FPBinOp, getOperand(0),
cast<VPCanonicalIVPHIRecipe>(getOperand(1)),
getOperand(2), TruncResultTy);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return new VPDerivedIVRecipe(Kind, FPBinOp, getOperand(0),
cast<VPCanonicalIVPHIRecipe>(getOperand(1)),
getOperand(2), TruncResultTy);
return new VPDerivedIVRecipe(Kind, FPBinOp, getStartValue() ,
cast<VPCanonicalIVPHIRecipe>(getCanonicalIV()),
getStepValue(), TruncResultTy);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjusted, thanks!

for (const auto &[OldR, NewR] : zip(*OldBB, *NewBB)) {
for (unsigned I = 0, E = NewR.getNumOperands(); I != E; ++I) {
VPValue *NewOp = Old2NewVPValues.lookup(OldR.getOperand(I));
if (!Full)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check if NewOp is a live-in and !Full?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsFull is gone now

@@ -10078,7 +10078,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
EpilogueVectorizerMainLoop MainILV(L, PSE, LI, DT, TLI, TTI, AC, ORE,
EPI, &LVL, &CM, BFI, PSI, Checks);

VPlan &BestMainPlan = LVP.getBestPlanFor(EPI.MainLoopVF);
VPlan &BestMainPlan = *LVP.getBestPlanFor(EPI.MainLoopVF).clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cloning will enable LVP::executePlan() to drop the following guard:

  if (!IsEpilogueVectorization)
    VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);

as mentioned in the summary, but that is left for a follow-up patch, effectively keeping this patch an NFC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly!

NewPlan->BackedgeTakenCount = Old2NewVPValues[BackedgeTakenCount];
}

auto NewPH = cast<VPBasicBlock>(Preheader->clone());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto NewPH = cast<VPBasicBlock>(Preheader->clone());
auto NewPreHeader = cast<VPBasicBlock>(Preheader->clone());

to be more consistent with (Old)Preheader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed, thanks!

Comment on lines 1051 to 1052
for (const auto &[_, LO] : LiveOuts)
NewPlan->addLiveOut(LO->getPhi(), Old2NewVPValues[LO->getOperand(0)]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: set the live-outs earlier, leaving the above setting of NewPlan's fields to be last?

Worth documenting the stages, going top-down:

  1. live-ins (held in VPlan either individually as pointers or instances, or collectively in VPLiveInsToFree)
  2. blocks (preheader, entry and all it reaches, which are currently disconnected)
  3. live-outs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented stages and moved, thanks!

@@ -615,6 +615,18 @@ void VPBasicBlock::print(raw_ostream &O, const Twine &Indent,
}
#endif

VPBlockBase *VPRegionBlock::clone() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Region::clone() deal only with cloning its blocks, recursively, w/o remapping any VPValues, similar to VPBB::clone()? To be followed by a single deep scan over all VPBB's of VPlan to map VPValues, fully, once. Otherwise a region nested within another will undergo multiple remappings of its VPValues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks! Also removed VPBlockBase::clone and moved to static functions, as without remapping it is not really useful (also, generally full remapping isn't possible without mappings for defs outside the block/region)

NewPH->setPlan(NewPlan);
NewPlan->VFs = VFs;
NewPlan->UFs = UFs;
NewPlan->Name = Name;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some attention should be given to renaming, of all clones. Would be good to leave behind a note to resolve later, if not now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a TODO to adjust names, thanks!

NewEntry = NewBB;

for (VPBlockBase *Pred : BB->getPredecessors())
connectBlocks(Old2New[Pred], NewBB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important to also preserve order of successors. Suffice to rely on RPOT to traverse all successors in order, or worth validating at the end by traversing successors of both old and new in parallel and asserting NewSucc == Old2New[OldSucc]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added asserts checking successors/predecessors after cloning the CFG, thanks!

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest comments should be addressed, thanks!

@@ -10078,7 +10078,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
EpilogueVectorizerMainLoop MainILV(L, PSE, LI, DT, TLI, TTI, AC, ORE,
EPI, &LVL, &CM, BFI, PSI, Checks);

VPlan &BestMainPlan = LVP.getBestPlanFor(EPI.MainLoopVF);
VPlan &BestMainPlan = *LVP.getBestPlanFor(EPI.MainLoopVF).clone();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly!

@@ -615,6 +615,18 @@ void VPBasicBlock::print(raw_ostream &O, const Twine &Indent,
}
#endif

VPBlockBase *VPRegionBlock::clone() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks! Also removed VPBlockBase::clone and moved to static functions, as without remapping it is not really useful (also, generally full remapping isn't possible without mappings for defs outside the block/region)

@@ -615,6 +615,18 @@ void VPBasicBlock::print(raw_ostream &O, const Twine &Indent,
}
#endif

VPBlockBase *VPRegionBlock::clone() {
DenseMap<VPBlockBase *, VPBlockBase *> Old2New;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed, thanks!

DenseMap<VPValue *, VPValue *> Old2NewVPValues;
VPBlockBase *NewEntry =
VPBlockUtils::cloneCFG(Entry, Old2New, Old2NewVPValues);
auto *NewR =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

DenseMap<VPBlockBase *, VPBlockBase *> Old2New;
DenseMap<VPValue *, VPValue *> Old2NewVPValues;
VPBlockBase *NewEntry =
VPBlockUtils::cloneCFG(Entry, Old2New, Old2NewVPValues);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

for (VPRecipeBase &NewR : *NewBB)
for (unsigned I = 0, E = NewR.getNumOperands(); I != E; ++I) {
VPValue *NewOp = Old2NewVPValues.lookup(NewR.getOperand(I));
if (!FullRemapping)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FullRemapping is gone

@@ -1594,6 +1657,13 @@ class VPWidenPHIRecipe : public VPHeaderPHIRecipe {
addOperand(Start);
}

VPRecipeBase *clone() override {
auto *Res = new VPWidenPHIRecipe(cast<PHINode>(getUnderlyingInstr()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

@@ -2262,9 +2400,22 @@ class VPDerivedIVRecipe : public VPRecipeBase, public VPValue {
VPValue(this), TruncResultTy(TruncResultTy), Kind(IndDesc.getKind()),
FPBinOp(dyn_cast_or_null<FPMathOperator>(IndDesc.getInductionBinOp())) {
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Comment on lines 2414 to 2416
return new VPDerivedIVRecipe(Kind, FPBinOp, getOperand(0),
cast<VPCanonicalIVPHIRecipe>(getOperand(1)),
getOperand(2), TruncResultTy);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjusted, thanks!

@@ -2807,6 +2976,8 @@ class VPlan {
VPBasicBlock *getPreheader() { return Preheader; }
const VPBasicBlock *getPreheader() const { return Preheader; }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

static void cloneCFG(VPBlockBase *Entry,
DenseMap<VPBlockBase *, VPBlockBase *> &Old2NewVPBlocks);

static VPBlockBase *cloneVPB(VPBlockBase *BB) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloneVPB() seems to better fit as a virtual method VPBlockBase::clone()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, the reason for keeping it static here is that it would only be useful in combination with operand remapping, having it as member of the API may be a bit error prone.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the term clone() refers to "w/o-remapping-operands" when applied to recipes - as a virtual method rather than a static cloneR() function - so may seem natural if applied as such to their containing VPBB's and regions. The exception is VPlan::clone() which also takes care of remapping the operands. To better indicate when operands are remapped and when not, perhaps unmappedClone() and mappedClone() could be used instead, or clone() for w/o remapping, and, say, VPlan::duplicate() for w/ remapping (or vise versa).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to move block cloning to ::clone() (in line with recipes) and renamed VPlan::clone to VPlan::duplicate.

Comment on lines 617 to 618
static void cloneCFG(VPBlockBase *Entry,
DenseMap<VPBlockBase *, VPBlockBase *> &Old2NewVPBlocks);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cloning is recursive, so perhaps more accurately called cloneHCFG().
It also clones a (maximal(*)) SESE region, so could be called cloneSESE(), and return a pair of VPB's - the new entry and new exit? Could Old2NewVPBlocks map be internal to the function, rather than an in/out parameter?

(*) There are currently two callees: the top-level of VPlan consisting of the vector-loop-preheader - loop-region - middle-block, and the internal CFG of a region from its entry/header to its exit(ing)/latch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated as suggested and renamed, thanks!

DenseMap<VPBlockBase *, VPBlockBase *> Old2NewVPBlocks;
DenseMap<VPValue *, VPValue *> Old2NewVPValues;

auto *NewPlan = new VPlan();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps blocks should be clones first, then values cloned and remapped. I.e.,

  auto *NewPreheader = getPreheader().clone(); // Currently a disconnected VPBB.
  auto [NewEntry, _ ] = cloneSESE(getEntry());
  auto *NewPlan = new VPlan(NewPreheader, NewEntry);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reordered as suggested, thanks!

auto *NewPlan = new VPlan();

// Clone live-ins.
SmallVector<VPValue *, 16> NewLiveIns;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is NewLiveIns needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in the latest version, removed, thanks!

}
assert(TripCount && "trip count must be set");
if (TripCount->isLiveIn())
Old2NewVPValues[TripCount] = new VPValue(TripCount->getLiveInIRValue());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// else NewTripCount will be created and inserted into Old2NewVPValues when TripCount is cloned. In any case NewPlan->TripCount is updated below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

NewPlan->UFs = UFs;
// TODO: Adjust names.
NewPlan->Name = Name;
NewPlan->TripCount = Old2NewVPValues[TripCount];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can assert that Old2NewVPValues contains TripCount.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

@@ -1594,6 +1657,13 @@ class VPWidenPHIRecipe : public VPHeaderPHIRecipe {
addOperand(Start);
}

VPRecipeBase *clone() override {
auto *Res = new VPWidenPHIRecipe(cast<PHINode>(getUnderlyingInstr()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better mark it unreachable than have untested dead code, potentially misconsidered live?

Comment on lines 2855 to 2857
/// Construct an uninitialized VPlan, should be used for cloning only.
explicit VPlan() = default;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really needed? (See above)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!

zip(OldBB->successors(), NewBB->successors()))
assert(NewSucc == Old2NewVPBlocks[OldSucc] && "Different successors");
}
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return Old2NewVPBlocks[Entry] along with (the last) NewBB, or along with Old2NewVPBlocks[Exit] where Exit = *RPOT->end()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!


auto *VPR = cast<VPRegionBlock>(BB);
DenseMap<VPBlockBase *, VPBlockBase *> Old2NewVPBlocks;
DenseMap<VPValue *, VPValue *> Old2NewVPValues;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Old2NewVPValues needed here?

Hopefully Old2NewVPBlocks will not be needed either, doing instead something like:

  auto [NewEntry, NewExiting] = cloneSESE(VPR->getEntry());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed in the latest version, removed, thanks!

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few last minor comments.

VPBlockUtils::blocksOnly<VPBasicBlock>(NewDeepRPOT))) {
assert(OldBB->getRecipeList().size() == NewBB->getRecipeList().size() &&
"blocks must have the same number of recipes");

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: empty line intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!

Comment on lines 1083 to 1084
VPBlockBase *NewPreheader = cloneVPB(Preheader);
const auto &[NewEntry, __] = cloneSESE(getEntry());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency nit: use Preheader and Entry, or getPreheader() and getEntry().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use Entry, thanks!

@@ -982,6 +1037,92 @@ void VPlan::updateDominatorTree(DominatorTree *DT, BasicBlock *LoopHeaderBB,
assert(DT->verify(DominatorTree::VerificationLevel::Fast));
}

static void remapOperands(VPBlockBase *Entry, VPBlockBase *NewEntry,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remapOperands() be a lambda inside VPlan::clone()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it as is, as keeping it is quite long and having it separate makes it a bit easier to read IMO (also reduces one level of indent), but happy to move it preferred.

static void cloneCFG(VPBlockBase *Entry,
DenseMap<VPBlockBase *, VPBlockBase *> &Old2NewVPBlocks);

static VPBlockBase *cloneVPB(VPBlockBase *BB) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the term clone() refers to "w/o-remapping-operands" when applied to recipes - as a virtual method rather than a static cloneR() function - so may seem natural if applied as such to their containing VPBB's and regions. The exception is VPlan::clone() which also takes care of remapping the operands. To better indicate when operands are remapped and when not, perhaps unmappedClone() and mappedClone() could be used instead, or clone() for w/o remapping, and, say, VPlan::duplicate() for w/ remapping (or vise versa).

Comment on lines 1114 to 1115
NewEntry->setPlan(NewPlan);
NewPreheader->setPlan(NewPlan);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NewEntry->setPlan(NewPlan);
NewPreheader->setPlan(NewPlan);

taken care of when constructing NewPlan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!

}

VPlan *VPlan::clone() {
DenseMap<VPValue *, VPValue *> Old2NewVPValues;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can define below when starting to deal with values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thanks!

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, ship it!

@fhahn fhahn merged commit ec402a2 into llvm:main Jan 27, 2024
@fhahn fhahn deleted the vplan-clone branch January 27, 2024 13:30
@Enna1
Copy link
Contributor

Enna1 commented Jan 28, 2024

Hi @fhahn

I think this change introduce a memory leak in VPlan::duplicate(), see https://lab.llvm.org/buildbot/#/builders/168/builds/18308/steps/10/logs/stdio

Can you take a look?

@fhahn
Copy link
Contributor Author

fhahn commented Jan 28, 2024

@Enna1 thanks, should be fixed by 1b37e80. Bot should be back to green

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants