Skip to content

[VPlan] Update getBestPlan to return VF, use also for epilogue vec. #98821

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 26, 2024
4 changes: 2 additions & 2 deletions llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
Original file line number Diff line number Diff line change
Expand Up @@ -365,8 +365,8 @@ class LoopVectorizationPlanner {
/// Return the best VPlan for \p VF.
VPlan &getBestPlanFor(ElementCount VF) const;
Comment on lines 365 to 366
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Return the best VPlan for \p VF.
VPlan &getBestPlanFor(ElementCount VF) const;
/// Return the VPlan for \p VF.
VPlan &getPlanFor(ElementCount VF) const;

would be more accurate, independent of this patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do separately, thanks!


/// Return the most profitable plan and fix its VF to the most profitable one.
VPlan &getBestPlan() const;
/// Return the most profitable vectorization factor.
ElementCount getBestVF() const;

/// Generate the IR code for the vectorized loop captured in VPlan \p BestPlan
/// according to the best selected \p VF and \p UF.
Expand Down
43 changes: 18 additions & 25 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7162,13 +7162,12 @@ InstructionCost LoopVectorizationPlanner::cost(VPlan &Plan,
return Cost;
}

VPlan &LoopVectorizationPlanner::getBestPlan() const {
ElementCount LoopVectorizationPlanner::getBestVF() const {
// If there is a single VPlan with a single VF, return it directly.
VPlan &FirstPlan = *VPlans[0];
if (VPlans.size() == 1 && size(FirstPlan.vectorFactors()) == 1)
return FirstPlan;
return *FirstPlan.vectorFactors().begin();

VPlan *BestPlan = &FirstPlan;
ElementCount ScalarVF = ElementCount::getFixed(1);
assert(hasPlanWithVF(ScalarVF) &&
"More than a single plan/VF w/o any plan having scalar VF");
Expand Down Expand Up @@ -7199,14 +7198,11 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const {

InstructionCost Cost = cost(*P, VF);
VectorizationFactor CurrentFactor(VF, Cost, ScalarCost);
if (isMoreProfitable(CurrentFactor, BestFactor)) {
if (isMoreProfitable(CurrentFactor, BestFactor))
BestFactor = CurrentFactor;
BestPlan = &*P;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can now also drop the {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

}
}
}
BestPlan->setVF(BestFactor.Width);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it is better to avoid restricting BestPlan's VF's to BestFactor.Width alone, thereby discarding all other associated VF's, one of which may be reuseful for subsequent epilog vectorization. This was the original intent of LVP's getBestPlan(). Cloning could be employed to retain the other VF's, but may be redundant - in case they are not needed.

return *BestPlan;
return BestFactor.Width;
}

VPlan &LoopVectorizationPlanner::getBestPlanFor(ElementCount VF) const {
Expand Down Expand Up @@ -10001,10 +9997,11 @@ bool LoopVectorizePass::processLoop(Loop *L) {
InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, ORE, IC, &LVL,
&CM, BFI, PSI, Checks);

VPlan &BestPlan = LVP.getBestPlan();
assert(BestPlan.hasScalarVFOnly() &&
ElementCount BestVF = LVP.getBestVF();
assert(BestVF.isScalar() &&
"VPlan cost model and legacy cost model disagreed");
LVP.executePlan(VF.Width, IC, BestPlan, Unroller, DT, false);
VPlan &BestPlan = LVP.getBestPlanFor(BestVF);
LVP.executePlan(BestVF, IC, BestPlan, Unroller, DT, false);

ORE->emit([&]() {
return OptimizationRemark(LV_NAME, "Interleaved", L->getStartLoc(),
Expand All @@ -10015,21 +10012,25 @@ bool LoopVectorizePass::processLoop(Loop *L) {
} else {
// If we decided that it is *legal* to vectorize the loop, then do it.

ElementCount BestVF = LVP.getBestVF();
LLVM_DEBUG(dbgs() << "VF picked by VPlan cost model: " << BestVF << "\n");
assert(VF.Width == BestVF &&
"VPlan cost model and legacy cost model disagreed");
VPlan &BestPlan = LVP.getBestPlanFor(BestVF);
// Consider vectorizing the epilogue too if it's profitable.
VectorizationFactor EpilogueVF =
LVP.selectEpilogueVectorizationFactor(VF.Width, IC);
LVP.selectEpilogueVectorizationFactor(BestVF, IC);
if (EpilogueVF.Width.isVector()) {

// The first pass vectorizes the main loop and creates a scalar epilogue
// to be vectorized by executing the plan (potentially with a different
// factor) again shortly afterwards.
EpilogueLoopVectorizationInfo EPI(VF.Width, IC, EpilogueVF.Width, 1);
EpilogueLoopVectorizationInfo EPI(BestVF, IC, EpilogueVF.Width, 1);
EpilogueVectorizerMainLoop MainILV(L, PSE, LI, DT, TLI, TTI, AC, ORE,
EPI, &LVL, &CM, BFI, PSI, Checks);

assert(EPI.MainLoopVF == VF.Width && "VFs must match");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(EPI.MainLoopVF == VF.Width && "VFs must match");
assert(EPI.MainLoopVF == BestVF && "VFs must match");

or dropped - given that this holds by construction of EPI?

Maybe worth revisiting the use of EPI throughout, separately, as it leads to an inflation of VF's, a redundant setting of EpilogueUF asserted to be 1, a confusing resetting of EPI.MainLoopVF = EPI.EpilogueVF; below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep it is not needed any longer, removed in 5a9b9ef

Maybe worth revisiting the use of EPI throughout, separately, as it leads to an inflation of VF's, a redundant setting of EpilogueUF asserted to be 1, a confusing resetting of EPI.MainLoopVF = EPI.EpilogueVF; below.

Yes, this should be untangled.

std::unique_ptr<VPlan> BestMainPlan(
LVP.getBestPlanFor(VF.Width).duplicate());
std::unique_ptr<VPlan> BestMainPlan(BestPlan.duplicate());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, BestPlan is now used here.

const auto &[ExpandedSCEVs, ReductionResumeValues] = LVP.executePlan(
EPI.MainLoopVF, EPI.MainLoopUF, *BestMainPlan, MainILV, DT, true);
++LoopsVectorized;
Expand Down Expand Up @@ -10120,18 +10121,10 @@ bool LoopVectorizePass::processLoop(Loop *L) {
if (!MainILV.areSafetyChecksAdded())
DisableRuntimeUnroll = true;
} else {
VPlan &BestPlan = LVP.getBestPlan();
assert(size(BestPlan.vectorFactors()) == 1 &&
"Plan should have a single VF");
ElementCount Width = *BestPlan.vectorFactors().begin();
LLVM_DEBUG(dbgs() << "VF picked by VPlan cost model: " << Width
<< "\n");
assert(VF.Width == Width &&
"VPlan cost model and legacy cost model disagreed");
InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, Width,
InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, BestVF,
VF.MinProfitableTripCount, IC, &LVL, &CM, BFI,
PSI, Checks);
LVP.executePlan(Width, IC, BestPlan, LB, DT, false);
LVP.executePlan(BestVF, IC, BestPlan, LB, DT, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LVP.executePlan(BestVF, IC, BestPlan, LB, DT, false);
VPlan &BestPlan = LVP.getBestPlanFor(BestVF);
LVP.executePlan(BestVF, IC, BestPlan, LB, DT, false);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left at the original position as it is shared by both sides of the if

++LoopsVectorized;

// Add metadata to disable runtime unrolling a scalar loop when there
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -135,8 +135,8 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Not Interleaving.
; CHECK-NEXT: LV: Interleaving is not beneficial.
; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in <stdin>
; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop
; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4
; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop
; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1
; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' {
; CHECK-NEXT: Live-in vp<%0> = VF * UF
Expand Down Expand Up @@ -340,8 +340,8 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Not Interleaving.
; CHECK-NEXT: LV: Interleaving is not beneficial.
; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in <stdin>
; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop
; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4
; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop
; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1
; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' {
; CHECK-NEXT: Live-in vp<%0> = VF * UF
Expand Down
Loading