
Commit d099d24

lukel97, yuxuanchen1997 authored and committed
[RISCV] Don't cost vector arithmetic fp ops as cheaper than scalar (#99594)
I was comparing some SPEC CPU 2017 benchmarks across rva22u64 and rva22u64_v, and noticed that in a few cases rva22u64_v was considerably slower.

One of them was 519.lbm_r, which has a large loop that was being unprofitably vectorized. The loop contains an if/else that requires a large amount of predication when vectorized, but even though the loop vectorizer takes this into account, the vector cost still came out cheaper than the scalar cost.

The reason appears to be that we cost scalar floating point ops at 2 but their vector equivalents at 1 (for LMUL 1). This comes from using BasicTTIImpl for scalars, which treats floating point ops as twice as expensive as integer ops.

This patch doubles the cost of vector floating point arithmetic ops so that they are at least as expensive as their scalar counterparts. This gives a 13% speedup on 519.lbm_r at -O3 on the spacemit-x60.

Fixes #62576 (the last point there about scalar fsub/fmul)
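As a worked check of the cost arithmetic described above, the following C++ sketch encodes the LMUL 1 per-op costs quoted in the message (scalar FP op: 2, vector FP op: 1 before the patch) as compile-time constants and checks the invariant the patch restores. The constant and function names are hypothetical and chosen for illustration; they are not LLVM identifiers, and the sketch deliberately ignores how the loop vectorizer weighs these costs across lanes and predication.

```cpp
// Per-op costs at LMUL 1 as described in the commit message. These are
// illustrative constants (hypothetical names), not values read from LLVM.
constexpr int kScalarFPOpCost = 2;        // BasicTTIImpl: FP ops cost 2x integer ops
constexpr int kVectorFPOpCostBefore = 1;  // vector FP op at LMUL 1, before the patch
constexpr int kVectorFPOpCostAfter = 2 * kVectorFPOpCostBefore;  // doubled by the patch

// True when a vector FP op is costed at least as high as one scalar FP op.
constexpr bool vectorNotCheaperThanScalar(int VectorCost, int ScalarCost) {
  return VectorCost >= ScalarCost;
}

// Before the patch the invariant was violated; after the patch it holds.
static_assert(!vectorNotCheaperThanScalar(kVectorFPOpCostBefore, kScalarFPOpCost),
              "before: vector FP op looked cheaper than a scalar FP op");
static_assert(vectorNotCheaperThanScalar(kVectorFPOpCostAfter, kScalarFPOpCost),
              "after: vector FP op costs at least as much as a scalar FP op");
```

Because everything is `constexpr`, the checks run at compile time; a translation unit containing this block fails to compile if the invariant does not hold.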
1 parent 348a15d


6 files changed (+484, -347 lines)


llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

Lines changed: 8 additions & 3 deletions
```diff
@@ -1688,7 +1688,6 @@ InstructionCost RISCVTTIImpl::getArithmeticInstrCost(
     return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info, Op2Info,
                                          Args, CxtI);
 
-
   auto getConstantMatCost =
       [&](unsigned Operand, TTI::OperandValueInfo OpInfo) -> InstructionCost {
     if (OpInfo.isUniform() && TLI->canSplatOperand(Opcode, Operand))
@@ -1760,8 +1759,14 @@ InstructionCost RISCVTTIImpl::getArithmeticInstrCost(
                                            Op1Info, Op2Info,
                                            Args, CxtI);
   }
-  return ConstantMatCost +
-         LT.first * getRISCVInstructionCost(Op, LT.second, CostKind);
+
+  InstructionCost InstrCost = getRISCVInstructionCost(Op, LT.second, CostKind);
+  // We use BasicTTIImpl to calculate scalar costs, which assumes floating point
+  // ops are twice as expensive as integer ops. Do the same for vectors so
+  // scalar floating point ops aren't cheaper than their vector equivalents.
+  if (Ty->isFPOrFPVectorTy())
+    InstrCost *= 2;
+  return ConstantMatCost + LT.first * InstrCost;
 }
 
 // TODO: Deduplicate from TargetTransformInfoImplCRTPBase.
```
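To make the patched cost formula concrete, here is a standalone C++ sketch of the tail of `getArithmeticInstrCost` after this change. It uses hypothetical stand-ins (`ToyType`, `toyArithmeticCost`, `NumLegalParts`, `BaseInstrCost`) for `Ty`, `LT.first` and the result of `getRISCVInstructionCost`; it is not the real `llvm::Type` or `llvm::InstructionCost` API. It highlights one property of the change: the doubling applies to the per-part instruction cost, which is then multiplied by the legalization split count, while the constant materialization cost is added once and is not doubled.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::Type; only models isFPOrFPVectorTy().
struct ToyType {
  bool IsFPOrFPVector;
};

// Sketch of: ConstantMatCost + LT.first * InstrCost, where InstrCost is
// doubled for FP (scalar or vector) types.
//   NumLegalParts ~ LT.first (roughly, how many legal registers the type splits into)
//   BaseInstrCost ~ getRISCVInstructionCost(Op, LT.second, CostKind)
constexpr int64_t toyArithmeticCost(ToyType Ty, int64_t ConstantMatCost,
                                    int64_t NumLegalParts,
                                    int64_t BaseInstrCost) {
  int64_t InstrCost = BaseInstrCost;
  // Mirror BasicTTIImpl's scalar assumption: FP ops cost twice integer ops.
  if (Ty.IsFPOrFPVector)
    InstrCost *= 2;
  // Constant materialization cost is added once and is not doubled.
  return ConstantMatCost + NumLegalParts * InstrCost;
}
```

For example, with a `ConstantMatCost` of 3, a type split into 2 legal parts and a base instruction cost of 1, the FP version costs 3 + 2 × 2 = 7 while the integer version costs 3 + 2 × 1 = 5.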
