Commit 1930524

[LoopVectorize] Fix cost model assert when vectorising calls (#125716)
The legacy and vplan cost models did not agree because VPWidenCallRecipe::computeCost only calculates the cost of the call instruction, whereas LoopVectorizationCostModel::setVectorizedCallDecision in some cases adds on the cost of a synthesised mask argument. However, this mask is always 'splat(i1 true)', which should be hoisted out of the loop during codegen.

In order to synchronise the two cost models I have two options:

1) Also add the cost of the splat to the vplan model, or
2) Remove the cost of the splat from the legacy model.

I chose 2) because I feel this more closely represents what the final code will look like.

There is an argument that we should take account of such broadcast costs in the preheader when deciding if it's profitable to vectorise a loop, however there isn't currently a mechanism to do this. We currently only take account of the runtime checks when assessing profitability and what the minimum trip count should be. However, I don't believe this work needs doing as part of this PR.
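The disagreement described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCosts`, `ToyCall`, `legacyCostBefore`, `legacyCostAfter` and `vplanCost` are hypothetical names invented for illustration and are not LLVM APIs, and the numeric costs are arbitrary. It mirrors the condition removed by this commit (`VecFunc && UsesMask && !MaskRequired`) and shows that dropping the mask term makes the two models return the same value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; none of these exist in LLVM.
struct ToyCosts {
  std::int64_t CallCost;      // cost of the vector call instruction itself
  std::int64_t MaskSplatCost; // cost of broadcasting an all-true i1 mask
};

struct ToyCall {
  bool HasVectorVariant; // a vector variant of the callee was found (VecFunc)
  bool UsesMask;         // that variant takes a mask argument
  bool MaskRequired;     // the original call already needed a mask
};

// Legacy model before the change: adds the splat cost when a mask has to be
// synthesised for a variant that takes one.
std::int64_t legacyCostBefore(const ToyCall &C, const ToyCosts &T) {
  std::int64_t MaskCost = 0;
  if (C.HasVectorVariant && C.UsesMask && !C.MaskRequired)
    MaskCost = T.MaskSplatCost;
  return T.CallCost + MaskCost;
}

// Legacy model after the change: the loop-invariant splat is no longer costed.
std::int64_t legacyCostAfter(const ToyCall &, const ToyCosts &T) {
  return T.CallCost;
}

// VPlan-style model: only the call instruction is costed.
std::int64_t vplanCost(const ToyCall &, const ToyCosts &T) {
  return T.CallCost;
}
```

With a call such as `ToyCall{true, true, false}` the old legacy function returns the call cost plus the splat cost while the vplan-style function returns only the call cost, which is the kind of mismatch that tripped the assert. After the change both return the call cost.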
1 parent 7aeae73 commit 1930524

2 files changed: +262 -23 lines changed

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 1 addition & 12 deletions
@@ -6354,19 +6354,8 @@ void LoopVectorizationCostModel::setVectorizedCallDecision(ElementCount VF) {
           break;
         }
 
-      // Add in the cost of synthesizing a mask if one wasn't required.
-      InstructionCost MaskCost = 0;
-      if (VecFunc && UsesMask && !MaskRequired)
-        MaskCost = TTI.getShuffleCost(
-            TargetTransformInfo::SK_Broadcast,
-            VectorType::get(IntegerType::getInt1Ty(
-                                VecFunc->getFunctionType()->getContext()),
-                            VF),
-            {}, CostKind);
-
       if (TLI && VecFunc && !CI->isNoBuiltin())
-        VectorCost =
-            TTI.getCallInstrCost(nullptr, RetTy, Tys, CostKind) + MaskCost;
+        VectorCost = TTI.getCallInstrCost(nullptr, RetTy, Tys, CostKind);
 
       // Find the cost of an intrinsic; some targets may have instructions that
       // perform the operation without needing an actual call.
