[LoopVectorize] Use CodeSize as the cost kind for minsize #124119

Merged: 7 commits, Feb 27, 2025
18 changes: 13 additions & 5 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -989,9 +989,10 @@ class LoopVectorizationCostModel {
InterleavedAccessInfo &IAI)
: ScalarEpilogueStatus(SEL), TheLoop(L), PSE(PSE), LI(LI), Legal(Legal),
TTI(TTI), TLI(TLI), DB(DB), AC(AC), ORE(ORE), TheFunction(F),
- Hints(Hints), InterleaveInfo(IAI), CostKind(TTI::TCK_RecipThroughput) {
+ Hints(Hints), InterleaveInfo(IAI) {
if (TTI.supportsScalableVectors() || ForceTargetSupportsScalableVectors)
initializeVScaleForTuning();
+ CostKind = F->hasMinSize() ? TTI::TCK_CodeSize : TTI::TCK_RecipThroughput;
}

/// \return An upper bound for the vectorization factors (both fixed and
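The constructor change in this hunk can be sketched in isolation. This is a minimal illustration, not the LLVM code: `CostKind`, `FakeFunction`, and `pickCostKind` are hypothetical stand-ins for `TTI::TargetCostKind`, `llvm::Function`, and the assignment in the constructor body.

```cpp
#include <cassert>

// Hypothetical stand-in for TTI::TargetCostKind (only the two kinds used here).
enum class CostKind { RecipThroughput, CodeSize };

// Hypothetical stand-in for llvm::Function; only models the minsize attribute.
struct FakeFunction {
  bool MinSize;
  explicit FakeFunction(bool MinSize) : MinSize(MinSize) {}
  bool hasMinSize() const { return MinSize; }
};

// Mirrors the new constructor body: minsize functions are costed by code
// size, everything else by reciprocal throughput as before.
CostKind pickCostKind(const FakeFunction &F) {
  return F.hasMinSize() ? CostKind::CodeSize : CostKind::RecipThroughput;
}

// Small self-check of both branches.
void checkPickCostKind() {
  assert(pickCostKind(FakeFunction(true)) == CostKind::CodeSize);
  assert(pickCostKind(FakeFunction(false)) == CostKind::RecipThroughput);
}
```

Moving the initialization out of the member initializer list and into the body is what lets the value depend on `F` in the diff above.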
@@ -3384,7 +3385,7 @@ LoopVectorizationCostModel::getDivRemSpeculationCost(Instruction *I,
// Scale the cost by the probability of executing the predicated blocks.
// This assumes the predicated block for each vector lane is equally
// likely.
- ScalarizationCost = ScalarizationCost / getReciprocalPredBlockProb();
+ ScalarizationCost = ScalarizationCost / getPredBlockCostDivisor(CostKind);
}
InstructionCost SafeDivisorCost = 0;

@@ -4300,6 +4301,13 @@ bool LoopVectorizationPlanner::isMoreProfitable(
EstimatedWidthB *= *VScale;
}

+ // When optimizing for size choose whichever is smallest, which will be the
+ // one with the smallest cost for the whole loop. On a tie pick the larger
+ // vector width, on the assumption that throughput will be greater.
+ if (CM.CostKind == TTI::TCK_CodeSize)
+ return CostA < CostB ||
+ (CostA == CostB && EstimatedWidthA > EstimatedWidthB);

// Assume vscale may be larger than 1 (or the value being tuned for),
// so that scalable vectorization is slightly favorable over fixed-width
// vectorization.
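The code-size comparison added in this hunk can be exercised on its own. The sketch below is a simplification under stated assumptions: `Candidate` and `isSmallerForCodeSize` are hypothetical names, and plain integers stand in for `InstructionCost` and the estimated widths.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified candidate: whole-loop cost plus estimated width.
// The real code compares InstructionCost values; integers are used here.
struct Candidate {
  std::uint64_t Cost;
  unsigned EstimatedWidth;
};

// Returns true if A should be preferred over B under the code-size rule:
// strictly smaller cost wins, and on equal cost the wider candidate wins.
bool isSmallerForCodeSize(const Candidate &A, const Candidate &B) {
  return A.Cost < B.Cost ||
         (A.Cost == B.Cost && A.EstimatedWidth > B.EstimatedWidth);
}

// Small self-check covering the cost and tie-break cases.
void checkIsSmallerForCodeSize() {
  assert(isSmallerForCodeSize(Candidate{10, 4}, Candidate{12, 8}));
  assert(isSmallerForCodeSize(Candidate{10, 8}, Candidate{10, 4}));
  assert(!isSmallerForCodeSize(Candidate{10, 4}, Candidate{10, 4}));
}
```

Note that the comparison is on the cost of the whole loop rather than cost per lane, so a wider factor is preferred only when it does not increase the total.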
@@ -5530,7 +5538,7 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
}

// Scale the total scalar cost by block probability.
- ScalarCost /= getReciprocalPredBlockProb();
+ ScalarCost /= getPredBlockCostDivisor(CostKind);

// Compute the discount. A non-negative discount means the vector version
// of the instruction costs more, and scalarizing would be beneficial.
@@ -5583,7 +5591,7 @@ InstructionCost LoopVectorizationCostModel::expectedCost(ElementCount VF) {
// cost by the probability of executing it. blockNeedsPredication from
// Legal is used so as to not include all blocks in tail folded loops.
if (VF.isScalar() && Legal->blockNeedsPredication(BB))
- BlockCost /= getReciprocalPredBlockProb();
+ BlockCost /= getPredBlockCostDivisor(CostKind);

Cost += BlockCost;
}
@@ -5661,7 +5669,7 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
// conditional branches, but may not be executed for each vector lane. Scale
// the cost by the probability of executing the predicated block.
if (isPredicatedInst(I)) {
- Cost /= getReciprocalPredBlockProb();
+ Cost /= getPredBlockCostDivisor(CostKind);

// Add the cost of an i1 extract and a branch
auto *VecI1Ty =
2 changes: 1 addition & 1 deletion llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -808,7 +808,7 @@ InstructionCost VPRegionBlock::cost(ElementCount VF, VPCostContext &Ctx) {
// For the scalar case, we may not always execute the original predicated
// block, Thus, scale the block's cost by the probability of executing it.
if (VF.isScalar())
- return ThenCost / getReciprocalPredBlockProb();
+ return ThenCost / getPredBlockCostDivisor(Ctx.CostKind);

return ThenCost;
}
15 changes: 11 additions & 4 deletions llvm/lib/Transforms/Vectorize/VPlanHelpers.h
@@ -48,13 +48,20 @@ Value *getRuntimeVF(IRBuilderBase &B, Type *Ty, ElementCount VF);
Value *createStepForVF(IRBuilderBase &B, Type *Ty, ElementCount VF,
int64_t Step);

- /// A helper function that returns the reciprocal of the block probability of
- /// predicated blocks. If we return X, we are assuming the predicated block
- /// will execute once for every X iterations of the loop header.
+ /// A helper function that returns how much we should divide the cost of a
+ /// predicated block by. Typically this is the reciprocal of the block
+ /// probability, i.e. if we return X we are assuming the predicated block will
+ /// execute once for every X iterations of the loop header so the block should
+ /// only contribute 1/X of its cost to the total cost calculation, but when
+ /// optimizing for code size it will just be 1 as code size costs don't depend
+ /// on execution probabilities.
///
/// TODO: We should use actual block probability here, if available. Currently,
/// we always assume predicated blocks have a 50% chance of executing.
- inline unsigned getReciprocalPredBlockProb() { return 2; }
+ inline unsigned
+ getPredBlockCostDivisor(TargetTransformInfo::TargetCostKind CostKind) {
+ return CostKind == TTI::TCK_CodeSize ? 1 : 2;
+ }

/// A range of powers-of-2 vectorization factors with fixed start and
/// adjustable end. The range includes start and excludes end, e.g.,:
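To make the effect of the new divisor concrete, here is a small standalone sketch. It assumes a hypothetical `CostKind` enum in place of `TargetTransformInfo::TargetCostKind` and plain unsigned integers in place of `InstructionCost`; `scaledBlockCost` is an illustrative helper, not part of the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for TargetTransformInfo::TargetCostKind.
enum class CostKind { RecipThroughput, CodeSize };

// Same shape as getPredBlockCostDivisor in the diff: 1 for code size,
// otherwise 2 (the assumed 50% execution probability).
unsigned predBlockCostDivisor(CostKind Kind) {
  return Kind == CostKind::CodeSize ? 1 : 2;
}

// Hypothetical helper: the cost a predicated block contributes to the total,
// as done at each call site with `Cost /= getPredBlockCostDivisor(CostKind)`.
unsigned scaledBlockCost(unsigned BlockCost, CostKind Kind) {
  return BlockCost / predBlockCostDivisor(Kind);
}

// Small self-check: throughput halves the block cost, code size keeps it.
void checkScaledBlockCost() {
  assert(scaledBlockCost(10, CostKind::RecipThroughput) == 5);
  assert(scaledBlockCost(10, CostKind::CodeSize) == 10);
}
```

Under the throughput kind the block counts for half its cost because it is assumed to run every other iteration; under the code-size kind the instructions occupy the same space however often they run, so the full cost is kept.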