-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[VPlan] Update getBestPlan to return VF, use also for epilogue vec. #98821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Update getBestPlan to return the VF alongside the best plan instead of restricting the returned plan's VFs to the best VF. This is allows using getBestPlan to also get the best VPlan for epilogue vectorization. As the same plan may be used to vectorize both the main and epilogue loop, restricting the VF of the best plan would cause issues.
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesUpdate getBestPlan to return the VF alongside the best plan instead of restricting the returned plan's VFs to the best VF. This is allows using getBestPlan to also get the best VPlan for epilogue vectorization. As the same plan may be used to vectorize both the main and epilogue loop, restricting the VF of the best plan would cause issues. Full diff: https://github.com/llvm/llvm-project/pull/98821.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index c63cf0c37f2f9..62b5d270e2f04 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -365,8 +365,9 @@ class LoopVectorizationPlanner {
/// Return the best VPlan for \p VF.
VPlan &getBestPlanFor(ElementCount VF) const;
- /// Return the most profitable plan and fix its VF to the most profitable one.
- VPlan &getBestPlan() const;
+ /// Return the most profitable vectorization factor together with the most
+ /// profitable plan containing that vectorization factor.
+ std::pair<ElementCount, VPlan &> getBestPlan() const;
/// Generate the IR code for the vectorized loop captured in VPlan \p BestPlan
/// according to the best selected \p VF and \p UF.
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 7d37d67cde29c..2ea136c7ebc48 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7423,11 +7423,11 @@ InstructionCost LoopVectorizationPlanner::cost(VPlan &Plan,
return Cost;
}
-VPlan &LoopVectorizationPlanner::getBestPlan() const {
+std::pair<ElementCount, VPlan &> LoopVectorizationPlanner::getBestPlan() const {
// If there is a single VPlan with a single VF, return it directly.
VPlan &FirstPlan = *VPlans[0];
if (VPlans.size() == 1 && size(FirstPlan.vectorFactors()) == 1)
- return FirstPlan;
+ return {*FirstPlan.vectorFactors().begin(), FirstPlan};
VPlan *BestPlan = &FirstPlan;
ElementCount ScalarVF = ElementCount::getFixed(1);
@@ -7466,8 +7466,7 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const {
}
}
}
- BestPlan->setVF(BestFactor.Width);
- return *BestPlan;
+ return {BestFactor.Width, *BestPlan};
}
VPlan &LoopVectorizationPlanner::getBestPlanFor(ElementCount VF) const {
@@ -10287,6 +10286,11 @@ bool LoopVectorizePass::processLoop(Loop *L) {
} else {
// If we decided that it is *legal* to vectorize the loop, then do it.
+ const auto &[Width, BestPlan] = LVP.getBestPlan();
+ LLVM_DEBUG(dbgs() << "VF picked by VPlan cost model: " << Width << "\n");
+ assert(VF.Width == Width &&
+ "VPlan cost model and legacy cost model disagreed");
+
// Consider vectorizing the epilogue too if it's profitable.
VectorizationFactor EpilogueVF =
LVP.selectEpilogueVectorizationFactor(VF.Width, IC);
@@ -10395,14 +10399,6 @@ bool LoopVectorizePass::processLoop(Loop *L) {
VF.MinProfitableTripCount, IC, &LVL, &CM, BFI,
PSI, Checks);
- VPlan &BestPlan = LVP.getBestPlan();
- assert(size(BestPlan.vectorFactors()) == 1 &&
- "Plan should have a single VF");
- ElementCount Width = *BestPlan.vectorFactors().begin();
- LLVM_DEBUG(dbgs() << "VF picked by VPlan cost model: " << Width
- << "\n");
- assert(VF.Width == Width &&
- "VPlan cost model and legacy cost model disagreed");
LVP.executePlan(Width, IC, BestPlan, LB, DT, false);
++LoopsVectorized;
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a simple cleanup, LGTM, but please wait a day in case others have comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some further thoughts.
ElementCount Width = VF.Width; | ||
std::optional<VPlan *> VPlanFromVPCost; | ||
if (!UseLegacyCostModel) { | ||
const auto &[VPWidth, Plan] = LVP.getBestPlan(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit confusing to claim that getBestPlan() is also used for epilogue vectorization, given that Plan
is soon overwritten by BestPlan
, which is later overwritten by BestMainPlan
(and BestEpiPlan
) for epilogue vectorization. Could (some of) this redundancy be eliminated.
The current assertion checks best VF's rather than best Plans, verifying that LVP.getBestVF() equals the VF.Width of LVP.plan(), when vectorizing the main loop only; i.e., avoids checking the VF's of the main or epilog loops when doing epilog vectorization, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the temporary UseLegacyCostModel option now that we branched for 19.x, so this is now simpler again
Width = VPWidth; | ||
assert(VF.Width == Width && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Width = VPWidth; | |
assert(VF.Width == Width && | |
assert(VF.Width == VPWidth && |
A bit confusing to have both VPWidth and VF.Width, perhaps clearer to rename the former VPlanWidth. Width is expected to be equal to both.
@@ -7496,8 +7496,7 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const { | |||
} | |||
} | |||
} | |||
BestPlan->setVF(BestFactor.Width); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed it is better to avoid restricting BestPlan's VF's to BestFactor.Width alone, thereby discarding all other associated VF's, one of which may be reuseful for subsequent epilog vectorization. This was the original intent of LVP's getBestPlan(). Cloning could be employed to retain the other VF's, but may be redundant - in case they are not needed.
VPlan &getBestPlan() const; | ||
/// Return the most profitable vectorization factor together with the most | ||
/// profitable plan containing that vectorization factor. | ||
std::pair<ElementCount, VPlan &> getBestPlan() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an underlying assumption that, at-least at this stage, there is at most a single VPlan per VF.
Perhaps it would be clearer to have getBestVF()
complement getPlanFor(VF)
- rather than getBestPlan()
and getBestPlanFor(VF)
, which is more aligned with LVP's plan()
? This would require two calls instead of a single getBestPlan() with two return values, but getPlanFor(VF) could be a simple map retrieval.
The VPlan per scalar VF deserves perhaps to be held separately, corresponding to the original scalar loop, w/ or w/o undergoing unrolling(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to introduce getBestVF
and use it in combination with getBestPlanFor
, without a map yet.
// If there is a single VPlan with a single VF, return it directly. | ||
VPlan &FirstPlan = *VPlans[0]; | ||
if (VPlans.size() == 1 && size(FirstPlan.vectorFactors()) == 1) | ||
return FirstPlan; | ||
return {*FirstPlan.vectorFactors().begin(), FirstPlan}; | ||
|
||
VPlan *BestPlan = &FirstPlan; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note (independent of this patch) a slight discrepancy in recording an initial {BestPlan, BestFactor} pair as {FirstPlan, ScalarCost or Max cost}, rather than having BestFactor be a factor of BestPlan. Somewhat more consistent to set BestPlan initially to ScalarPlan instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks, adding last few minor comments.
@@ -7178,12 +7177,10 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const { | |||
VectorizationFactor CurrentFactor(VF, Cost, ScalarCost); | |||
if (isMoreProfitable(CurrentFactor, BestFactor)) { | |||
BestFactor = CurrentFactor; | |||
BestPlan = &*P; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can now also drop the {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
VPlan &BestPlan = LVP.getBestPlanFor(BestVF); | ||
assert(BestVF.isScalar() && | ||
"VPlan cost model and legacy cost model disagreed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VPlan &BestPlan = LVP.getBestPlanFor(BestVF); | |
assert(BestVF.isScalar() && | |
"VPlan cost model and legacy cost model disagreed"); | |
assert(BestVF.isScalar() && | |
"VPlan cost model and legacy cost model disagreed"); | |
VPlan &BestPlan = LVP.getBestPlanFor(BestVF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reordered, thanks!
@@ -9985,15 +9983,20 @@ bool LoopVectorizePass::processLoop(Loop *L) { | |||
} else { | |||
// If we decided that it is *legal* to vectorize the loop, then do it. | |||
|
|||
ElementCount BestVF = LVP.getBestVF(); | |||
VPlan &BestPlan = LVP.getBestPlanFor(BestVF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VPlan &BestPlan = LVP.getBestPlanFor(BestVF); |
better set closer to where its used below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The epilogue code now also uses the fetched plan here to be duplicated for the main loop, so I retained it here, as it shared by both sides of the if
VF.MinProfitableTripCount, IC, &LVL, &CM, BFI, | ||
PSI, Checks); | ||
LVP.executePlan(Width, IC, BestPlan, LB, DT, false); | ||
LVP.executePlan(BestVF, IC, BestPlan, LB, DT, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LVP.executePlan(BestVF, IC, BestPlan, LB, DT, false); | |
VPlan &BestPlan = LVP.getBestPlanFor(BestVF); | |
LVP.executePlan(BestVF, IC, BestPlan, LB, DT, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left at the original position as it is shared by both sides of the if
/// Return the best VPlan for \p VF. | ||
VPlan &getBestPlanFor(ElementCount VF) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Return the best VPlan for \p VF. | |
VPlan &getBestPlanFor(ElementCount VF) const; | |
/// Return the VPlan for \p VF. | |
VPlan &getPlanFor(ElementCount VF) const; |
would be more accurate, independent of this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do separately, thanks!
assert(BestPlan.hasScalarVFOnly() && | ||
ElementCount BestVF = LVP.getBestVF(); | ||
VPlan &BestPlan = LVP.getBestPlanFor(BestVF); | ||
assert(BestVF.isScalar() && | ||
"VPlan cost model and legacy cost model disagreed"); | ||
LVP.executePlan(VF.Width, IC, BestPlan, Unroller, DT, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LVP.executePlan(VF.Width, IC, BestPlan, Unroller, DT, false); | |
LVP.executePlan(BestVF, IC, BestPlan, Unroller, DT, false); |
following the typical pattern of having LVP.executePlan(VF, IC, Plan, ...)
follow VPlan &Plan = LVP.getPlanFor(VF)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
Cleanup to make things consistent in preparation for #98821.
EpilogueVectorizerMainLoop MainILV(L, PSE, LI, DT, TLI, TTI, AC, ORE, | ||
EPI, &LVL, &CM, BFI, PSI, Checks); | ||
|
||
assert(EPI.MainLoopVF == VF.Width && "VFs must match"); | ||
std::unique_ptr<VPlan> BestMainPlan( | ||
LVP.getBestPlanFor(VF.Width).duplicate()); | ||
std::unique_ptr<VPlan> BestMainPlan(BestPlan.duplicate()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, BestPlan is now used here.
if (EpilogueVF.Width.isVector()) { | ||
|
||
// The first pass vectorizes the main loop and creates a scalar epilogue | ||
// to be vectorized by executing the plan (potentially with a different | ||
// factor) again shortly afterwards. | ||
EpilogueLoopVectorizationInfo EPI(VF.Width, IC, EpilogueVF.Width, 1); | ||
EpilogueLoopVectorizationInfo EPI(BestVF, IC, EpilogueVF.Width, 1); | ||
EpilogueVectorizerMainLoop MainILV(L, PSE, LI, DT, TLI, TTI, AC, ORE, | ||
EPI, &LVL, &CM, BFI, PSI, Checks); | ||
|
||
assert(EPI.MainLoopVF == VF.Width && "VFs must match"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(EPI.MainLoopVF == VF.Width && "VFs must match"); | |
assert(EPI.MainLoopVF == BestVF && "VFs must match"); |
or dropped - given that this holds by construction of EPI?
Maybe worth revisiting the use of EPI
throughout, separately, as it leads to an inflation of VF's, a redundant setting of EpilogueUF asserted to be 1, a confusing resetting of EPI.MainLoopVF = EPI.EpilogueVF;
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep it is not needed any longer, removed in 5a9b9ef
Maybe worth revisiting the use of EPI throughout, separately, as it leads to an inflation of VF's, a redundant setting of EpilogueUF asserted to be 1, a confusing resetting of EPI.MainLoopVF = EPI.EpilogueVF; below.
Yes, this should be untangled.
Update getBestPlan to return the VF alongside the best plan instead of restricting the returned plan's VFs to the best VF.
This is allows using getBestPlan to also get the best VPlan for epilogue vectorization. As the same plan may be used to vectorize both the main and epilogue loop, restricting the VF of the best plan would cause issues.