Commit d508308

[RISCV][CostModel] VPIntrinsics have same cost as their non-vp counterparts
On RISCV, only a few VPIntrinsics have their cost modeled by the VectorIntrinsicCostTable, and none of those entries consider LMUL. All other VPIntrinsics have no meaningful cost modeling.

This patch models the cost of a VPIntrinsic as the cost of its non-VP counterpart.

It is possible that the VP intrinsic is cheaper than the non-VP version, depending on VL. On RISCV, this may be due to two reasons (if the instruction is part of a loop):

1. A smaller VL can be used on the last iteration of the loop.
2. The VP instruction may avoid a scalar remainder loop.

I have left this as a TODO since I think this change puts us on the right path of modeling the cost of a VPInstruction, and it isn't entirely clear to me how much of a discount we should give to a known VL<VLMAX or what to do when VL is unknown at compile time.
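The two loop-related reasons can be illustrated with a small standalone sketch. It is not LLVM code: `vlSchedule` and `scalarRemainder` are hypothetical helpers, and the sketch assumes the simplified rule that each iteration uses VL = min(remaining elements, VLMAX). Real hardware is allowed to split the last two iterations differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): the VL used on each iteration when a
// loop over `n` elements is strip-mined with VL predication.
// Assumption: VL = min(remaining, vlmax) on every iteration.
std::vector<std::size_t> vlSchedule(std::size_t n, std::size_t vlmax) {
  assert(vlmax > 0 && "VLMAX must be nonzero");
  std::vector<std::size_t> schedule;
  for (std::size_t done = 0; done < n;) {
    std::size_t vl = std::min(vlmax, n - done); // last iteration may be short
    schedule.push_back(vl);
    done += vl;
  }
  return schedule;
}

// Hypothetical helper: the number of elements a fixed-width vector loop with
// vectorization factor `vf` would leave for a scalar remainder loop.
std::size_t scalarRemainder(std::size_t n, std::size_t vf) {
  assert(vf > 0 && "vectorization factor must be nonzero");
  return n % vf;
}
```

Under these assumptions, `vlSchedule(10, 4)` yields the lengths 4, 4, 2, so the whole trip count is covered by vector iterations, whereas `scalarRemainder(10, 4)` reports 2 elements that a non-predicated loop would hand to a scalar epilogue. How much of a cost discount that shorter final iteration deserves is the open question the TODO refers to.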
1 parent e13d041 commit d508308

3 files changed: +63 -36 lines changed

llvm/include/llvm/CodeGen/BasicTTIImpl.h

Lines changed: 27 additions & 0 deletions
@@ -1687,6 +1687,33 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     }
     }
 
+    // VP Intrinsics should have the same cost as their non-vp counterpart.
+    // TODO: Adjust the cost to make the vp intrinsic cheaper than its non-vp
+    // counterpart when the vector length argument is smaller than the maximum
+    // vector length.
+    if (VPIntrinsic::isVPIntrinsic(ICA.getID())) {
+      std::optional<Intrinsic::ID> FOp =
+          VPIntrinsic::getFunctionalOpcodeForVP(ICA.getID());
+      if (FOp)
+        return thisT()->getArithmeticInstrCost(*FOp, ICA.getReturnType(),
+                                               CostKind);
+
+      std::optional<Intrinsic::ID> FID =
+          VPIntrinsic::getFunctionalIntrinsicIDForVP(ICA.getID());
+      if (FID) {
+        // Non-vp version will have same Args/Tys except mask and vector length.
+        ArrayRef<const Value *> NewArgs(ICA.getArgs().begin(),
+                                        ICA.getArgs().end() - 2);
+        ArrayRef<Type *> NewTys(ICA.getArgTypes().begin(),
+                                ICA.getArgTypes().end() - 2);
+
+        IntrinsicCostAttributes NewICA(*FID, ICA.getReturnType(), NewArgs,
+                                       NewTys, ICA.getFlags(), ICA.getInst(),
+                                       ICA.getScalarizationCost());
+        return thisT()->getIntrinsicInstrCost(NewICA, CostKind);
+      }
+    }
+
     // Assume that we need to scalarize this intrinsic.
     // Compute the scalarization overhead based on Args for a vector
     // intrinsic.
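The delegation in the hunk above can be mimicked in a toy model that builds without LLVM headers. Everything here is a hypothetical stand-in: `ToyIntrinsic`, `functionalCounterpart`, `dropMaskAndEVL`, and `toyCost` are illustration-only names, and the cost numbers are made up. The sketch only shows the shape of the logic, assuming the mask and vector length are the last two operands of the VP call.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for intrinsic IDs; not LLVM's Intrinsic::ID.
enum class ToyIntrinsic { Smax, VPSmax, VPOther };

// Toy mapping from a VP intrinsic to its non-VP counterpart, if one exists.
std::optional<ToyIntrinsic> functionalCounterpart(ToyIntrinsic id) {
  if (id == ToyIntrinsic::VPSmax)
    return ToyIntrinsic::Smax;
  return std::nullopt;
}

// Drop the trailing mask and vector-length operands of a VP call.
// Assumption: they are always the last two operands.
std::vector<std::string> dropMaskAndEVL(const std::vector<std::string> &args) {
  assert(args.size() >= 2 && "expected mask and EVL operands");
  return std::vector<std::string>(args.begin(), args.end() - 2);
}

// Toy cost query: a VP intrinsic with a known counterpart is costed as that
// counterpart on the trimmed operand list; anything else falls through.
int toyCost(ToyIntrinsic id, const std::vector<std::string> &args) {
  if (std::optional<ToyIntrinsic> fid = functionalCounterpart(id))
    return toyCost(*fid, dropMaskAndEVL(args));
  if (id == ToyIntrinsic::Smax)
    return 1; // made-up cost for the non-VP operation
  return 8;   // made-up "scalarize everything" fallback cost
}
```

In this model, `toyCost(ToyIntrinsic::VPSmax, {"a", "b", "mask", "evl"})` returns the same value as `toyCost(ToyIntrinsic::Smax, {"a", "b"})`, while `ToyIntrinsic::VPOther`, which has no counterpart, still falls through to the fallback. This mirrors how the patch leaves unmapped VP intrinsics on the existing scalarization path.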

llvm/test/Analysis/CostModel/RISCV/gep.ll

Lines changed: 4 additions & 4 deletions
@@ -270,7 +270,7 @@ define void @non_foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
 ; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = getelementptr i8, ptr %base, i32 42
 ; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x4 = call <2 x i8> @llvm.masked.expandload.v2i8(ptr %4, <2 x i1> undef, <2 x i8> undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %5 = getelementptr i8, ptr %base, i32 42
-; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
+; RVI-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = getelementptr i8, ptr %base, i32 42
 ; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x6 = call <2 x i8> @llvm.experimental.vp.strided.load.v2i8.p0.i64(ptr %6, i64 undef, <2 x i1> undef, i32 undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = getelementptr i8, ptr %base, i32 42
@@ -282,7 +282,7 @@ define void @non_foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
 ; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %10 = getelementptr i8, ptr %base, i32 42
 ; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.masked.compressstore.v2i8(<2 x i8> undef, ptr %10, <2 x i1> undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %11 = getelementptr i8, ptr %base, i32 42
-; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
+; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %12 = getelementptr i8, ptr %base, i32 42
 ; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.experimental.vp.strided.store.v2i8.p0.i64(<2 x i8> undef, ptr %12, i64 undef, <2 x i1> undef, i32 undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
@@ -340,7 +340,7 @@ define void @foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
 ; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %4 = getelementptr i8, ptr %base, i32 0
 ; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x4 = call <2 x i8> @llvm.masked.expandload.v2i8(ptr %4, <2 x i1> undef, <2 x i8> undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %5 = getelementptr i8, ptr %base, i32 0
-; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
+; RVI-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %6 = getelementptr i8, ptr %base, i32 0
 ; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x6 = call <2 x i8> @llvm.experimental.vp.strided.load.v2i8.p0.i64(ptr %6, i64 undef, <2 x i1> undef, i32 undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %7 = getelementptr i8, ptr %base, i32 0
@@ -352,7 +352,7 @@ define void @foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
 ; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %10 = getelementptr i8, ptr %base, i32 0
 ; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.masked.compressstore.v2i8(<2 x i8> undef, ptr %10, <2 x i1> undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %11 = getelementptr i8, ptr %base, i32 0
-; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
+; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %12 = getelementptr i8, ptr %base, i32 0
 ; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.experimental.vp.strided.store.v2i8.p0.i64(<2 x i8> undef, ptr %12, i64 undef, <2 x i1> undef, i32 undef)
 ; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
