Skip to content

[VPlan] Update getBestPlan to return VF, use also for epilogue vec. #98821

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 26, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Jul 14, 2024

Update getBestPlan to return the VF alongside the best plan instead of restricting the returned plan's VFs to the best VF.

This is allows using getBestPlan to also get the best VPlan for epilogue vectorization. As the same plan may be used to vectorize both the main and epilogue loop, restricting the VF of the best plan would cause issues.

Update getBestPlan to return the VF alongside the best plan instead of
restricting the returned plan's VFs to the best VF.

This is allows using getBestPlan to also get the best VPlan for epilogue
vectorization. As the same plan may be used to vectorize both the main
and epilogue loop, restricting the VF of the best plan would cause
issues.
@llvmbot
Copy link
Member

llvmbot commented Jul 14, 2024

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Update getBestPlan to return the VF alongside the best plan instead of restricting the returned plan's VFs to the best VF.

This is allows using getBestPlan to also get the best VPlan for epilogue vectorization. As the same plan may be used to vectorize both the main and epilogue loop, restricting the VF of the best plan would cause issues.


Full diff: https://github.com/llvm/llvm-project/pull/98821.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h (+3-2)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+8-12)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index c63cf0c37f2f9..62b5d270e2f04 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -365,8 +365,9 @@ class LoopVectorizationPlanner {
   /// Return the best VPlan for \p VF.
   VPlan &getBestPlanFor(ElementCount VF) const;
 
-  /// Return the most profitable plan and fix its VF to the most profitable one.
-  VPlan &getBestPlan() const;
+  /// Return the most profitable vectorization factor together with the most
+  /// profitable plan containing that vectorization factor.
+  std::pair<ElementCount, VPlan &> getBestPlan() const;
 
   /// Generate the IR code for the vectorized loop captured in VPlan \p BestPlan
   /// according to the best selected \p VF and  \p UF.
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 7d37d67cde29c..2ea136c7ebc48 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7423,11 +7423,11 @@ InstructionCost LoopVectorizationPlanner::cost(VPlan &Plan,
   return Cost;
 }
 
-VPlan &LoopVectorizationPlanner::getBestPlan() const {
+std::pair<ElementCount, VPlan &> LoopVectorizationPlanner::getBestPlan() const {
   // If there is a single VPlan with a single VF, return it directly.
   VPlan &FirstPlan = *VPlans[0];
   if (VPlans.size() == 1 && size(FirstPlan.vectorFactors()) == 1)
-    return FirstPlan;
+    return {*FirstPlan.vectorFactors().begin(), FirstPlan};
 
   VPlan *BestPlan = &FirstPlan;
   ElementCount ScalarVF = ElementCount::getFixed(1);
@@ -7466,8 +7466,7 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const {
       }
     }
   }
-  BestPlan->setVF(BestFactor.Width);
-  return *BestPlan;
+  return {BestFactor.Width, *BestPlan};
 }
 
 VPlan &LoopVectorizationPlanner::getBestPlanFor(ElementCount VF) const {
@@ -10287,6 +10286,11 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     } else {
       // If we decided that it is *legal* to vectorize the loop, then do it.
 
+      const auto &[Width, BestPlan] = LVP.getBestPlan();
+      LLVM_DEBUG(dbgs() << "VF picked by VPlan cost model: " << Width << "\n");
+      assert(VF.Width == Width &&
+             "VPlan cost model and legacy cost model disagreed");
+
       // Consider vectorizing the epilogue too if it's profitable.
       VectorizationFactor EpilogueVF =
           LVP.selectEpilogueVectorizationFactor(VF.Width, IC);
@@ -10395,14 +10399,6 @@ bool LoopVectorizePass::processLoop(Loop *L) {
                                VF.MinProfitableTripCount, IC, &LVL, &CM, BFI,
                                PSI, Checks);
 
-        VPlan &BestPlan = LVP.getBestPlan();
-        assert(size(BestPlan.vectorFactors()) == 1 &&
-               "Plan should have a single VF");
-        ElementCount Width = *BestPlan.vectorFactors().begin();
-        LLVM_DEBUG(dbgs() << "VF picked by VPlan cost model: " << Width
-                          << "\n");
-        assert(VF.Width == Width &&
-               "VPlan cost model and legacy cost model disagreed");
         LVP.executePlan(Width, IC, BestPlan, LB, DT, false);
         ++LoopsVectorized;
 

Copy link
Member

@rengolin rengolin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a simple cleanup, LGTM, but please wait a day in case others have comments.

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some further thoughts.

ElementCount Width = VF.Width;
std::optional<VPlan *> VPlanFromVPCost;
if (!UseLegacyCostModel) {
const auto &[VPWidth, Plan] = LVP.getBestPlan();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit confusing to claim that getBestPlan() is also used for epilogue vectorization, given that Plan is soon overwritten by BestPlan, which is later overwritten by BestMainPlan (and BestEpiPlan) for epilogue vectorization. Could (some of) this redundancy be eliminated.

The current assertion checks best VF's rather than best Plans, verifying that LVP.getBestVF() equals the VF.Width of LVP.plan(), when vectorizing the main loop only; i.e., avoids checking the VF's of the main or epilog loops when doing epilog vectorization, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the temporary UseLegacyCostModel option now that we branched for 19.x, so this is now simpler again

Comment on lines 10344 to 10345
Width = VPWidth;
assert(VF.Width == Width &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Width = VPWidth;
assert(VF.Width == Width &&
assert(VF.Width == VPWidth &&

A bit confusing to have both VPWidth and VF.Width, perhaps clearer to rename the former VPlanWidth. Width is expected to be equal to both.

@@ -7496,8 +7496,7 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const {
}
}
}
BestPlan->setVF(BestFactor.Width);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it is better to avoid restricting BestPlan's VF's to BestFactor.Width alone, thereby discarding all other associated VF's, one of which may be reuseful for subsequent epilog vectorization. This was the original intent of LVP's getBestPlan(). Cloning could be employed to retain the other VF's, but may be redundant - in case they are not needed.

VPlan &getBestPlan() const;
/// Return the most profitable vectorization factor together with the most
/// profitable plan containing that vectorization factor.
std::pair<ElementCount, VPlan &> getBestPlan() const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an underlying assumption that, at-least at this stage, there is at most a single VPlan per VF.

Perhaps it would be clearer to have getBestVF() complement getPlanFor(VF) - rather than getBestPlan() and getBestPlanFor(VF), which is more aligned with LVP's plan()? This would require two calls instead of a single getBestPlan() with two return values, but getPlanFor(VF) could be a simple map retrieval.

The VPlan per scalar VF deserves perhaps to be held separately, corresponding to the original scalar loop, w/ or w/o undergoing unrolling(?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to introduce getBestVF and use it in combination with getBestPlanFor, without a map yet.

// If there is a single VPlan with a single VF, return it directly.
VPlan &FirstPlan = *VPlans[0];
if (VPlans.size() == 1 && size(FirstPlan.vectorFactors()) == 1)
return FirstPlan;
return {*FirstPlan.vectorFactors().begin(), FirstPlan};

VPlan *BestPlan = &FirstPlan;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note (independent of this patch) a slight discrepancy in recording an initial {BestPlan, BestFactor} pair as {FirstPlan, ScalarCost or Max cost}, rather than having BestFactor be a factor of BestPlan. Somewhat more consistent to set BestPlan initially to ScalarPlan instead?

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks, adding last few minor comments.

@@ -7178,12 +7177,10 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const {
VectorizationFactor CurrentFactor(VF, Cost, ScalarCost);
if (isMoreProfitable(CurrentFactor, BestFactor)) {
BestFactor = CurrentFactor;
BestPlan = &*P;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can now also drop the {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

Comment on lines 9972 to 9974
VPlan &BestPlan = LVP.getBestPlanFor(BestVF);
assert(BestVF.isScalar() &&
"VPlan cost model and legacy cost model disagreed");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPlan &BestPlan = LVP.getBestPlanFor(BestVF);
assert(BestVF.isScalar() &&
"VPlan cost model and legacy cost model disagreed");
assert(BestVF.isScalar() &&
"VPlan cost model and legacy cost model disagreed");
VPlan &BestPlan = LVP.getBestPlanFor(BestVF);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reordered, thanks!

@@ -9985,15 +9983,20 @@ bool LoopVectorizePass::processLoop(Loop *L) {
} else {
// If we decided that it is *legal* to vectorize the loop, then do it.

ElementCount BestVF = LVP.getBestVF();
VPlan &BestPlan = LVP.getBestPlanFor(BestVF);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPlan &BestPlan = LVP.getBestPlanFor(BestVF);

better set closer to where its used below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The epilogue code now also uses the fetched plan here to be duplicated for the main loop, so I retained it here, as it shared by both sides of the if

VF.MinProfitableTripCount, IC, &LVL, &CM, BFI,
PSI, Checks);
LVP.executePlan(Width, IC, BestPlan, LB, DT, false);
LVP.executePlan(BestVF, IC, BestPlan, LB, DT, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LVP.executePlan(BestVF, IC, BestPlan, LB, DT, false);
VPlan &BestPlan = LVP.getBestPlanFor(BestVF);
LVP.executePlan(BestVF, IC, BestPlan, LB, DT, false);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left at the original position as it is shared by both sides of the if

Comment on lines 365 to 366
/// Return the best VPlan for \p VF.
VPlan &getBestPlanFor(ElementCount VF) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Return the best VPlan for \p VF.
VPlan &getBestPlanFor(ElementCount VF) const;
/// Return the VPlan for \p VF.
VPlan &getPlanFor(ElementCount VF) const;

would be more accurate, independent of this patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do separately, thanks!

assert(BestPlan.hasScalarVFOnly() &&
ElementCount BestVF = LVP.getBestVF();
VPlan &BestPlan = LVP.getBestPlanFor(BestVF);
assert(BestVF.isScalar() &&
"VPlan cost model and legacy cost model disagreed");
LVP.executePlan(VF.Width, IC, BestPlan, Unroller, DT, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LVP.executePlan(VF.Width, IC, BestPlan, Unroller, DT, false);
LVP.executePlan(BestVF, IC, BestPlan, Unroller, DT, false);

following the typical pattern of having LVP.executePlan(VF, IC, Plan, ...) follow VPlan &Plan = LVP.getPlanFor(VF).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

fhahn added a commit that referenced this pull request Jul 26, 2024
Cleanup to make things consistent in preparation for
#98821.
EpilogueVectorizerMainLoop MainILV(L, PSE, LI, DT, TLI, TTI, AC, ORE,
EPI, &LVL, &CM, BFI, PSI, Checks);

assert(EPI.MainLoopVF == VF.Width && "VFs must match");
std::unique_ptr<VPlan> BestMainPlan(
LVP.getBestPlanFor(VF.Width).duplicate());
std::unique_ptr<VPlan> BestMainPlan(BestPlan.duplicate());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, BestPlan is now used here.

if (EpilogueVF.Width.isVector()) {

// The first pass vectorizes the main loop and creates a scalar epilogue
// to be vectorized by executing the plan (potentially with a different
// factor) again shortly afterwards.
EpilogueLoopVectorizationInfo EPI(VF.Width, IC, EpilogueVF.Width, 1);
EpilogueLoopVectorizationInfo EPI(BestVF, IC, EpilogueVF.Width, 1);
EpilogueVectorizerMainLoop MainILV(L, PSE, LI, DT, TLI, TTI, AC, ORE,
EPI, &LVL, &CM, BFI, PSI, Checks);

assert(EPI.MainLoopVF == VF.Width && "VFs must match");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(EPI.MainLoopVF == VF.Width && "VFs must match");
assert(EPI.MainLoopVF == BestVF && "VFs must match");

or dropped - given that this holds by construction of EPI?

Maybe worth revisiting the use of EPI throughout, separately, as it leads to an inflation of VF's, a redundant setting of EpilogueUF asserted to be 1, a confusing resetting of EPI.MainLoopVF = EPI.EpilogueVF; below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep it is not needed any longer, removed in 5a9b9ef

Maybe worth revisiting the use of EPI throughout, separately, as it leads to an inflation of VF's, a redundant setting of EpilogueUF asserted to be 1, a confusing resetting of EPI.MainLoopVF = EPI.EpilogueVF; below.

Yes, this should be untangled.

@fhahn fhahn merged commit 67a55e0 into llvm:main Jul 26, 2024
7 checks passed
@fhahn fhahn deleted the vplan-getbestplan-for-epilogue branch July 26, 2024 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants