Commit 3d7df0a

[RISCV][CostModel] Estimate cost of Extract/InsertElement with non-constant index when vector instructions are not available (#67334)
This patch fixes the compilation time issue of the matrix-types-spec test from test-suite. Reproduction of the problem:

```
clang++ -DNDEBUG --target=riscv64-linux-gnu --sysroot=<sysroot path> --gcc-toolchain=<gcc path> -O2 -fenable-matrix <test-suite-path>/SingleSource/UnitTests/matrix-types-spec.cpp
```

On my machine, compilation takes 50.44s. In comparison, the same test with RVV (-march=rv64gcv) compiles in 3.06s, and for the x86-64 target it takes 1.71s. It turns out that the main issue is the unrolling of a loop in the multiplySpec function that has extractelements with a non-constant index:

```
for.body9.i: ; preds = %for.body9.i, %for.cond6.preheader.i
  %indvars.iv.i92 = phi i64 [ 0, %for.cond6.preheader.i ], [ %indvars.iv.next.i93, %for.body9.i ]
  %Elt.033.i = phi double [ 0.000000e+00, %for.cond6.preheader.i ], [ %80, %for.body9.i ]
  %77 = mul nuw nsw i64 %indvars.iv.i92, 25
  %78 = add nuw nsw i64 %77, %indvars.iv39.i91
  %matrixext.i = extractelement <475 x double> %62, i64 %78
  %79 = add nuw nsw i64 %indvars.iv.i92, %74
  %matrixext13.i = extractelement <209 x double> %73, i64 %79
  %80 = tail call double @llvm.fmuladd.f64(double %matrixext.i, double %matrixext13.i, double %Elt.033.i)
  %indvars.iv.next.i93 = add nuw nsw i64 %indvars.iv.i92, 1
  %exitcond.not.i94 = icmp eq i64 %indvars.iv.next.i93, 19
  br i1 %exitcond.not.i94, label %for.cond.cleanup8.i, label %for.body9.i, !llvm.loop !21
```

When RVV is supported, extractelement/insertelement with a non-constant index can be lowered quite efficiently with vslidedown/vslideup; otherwise it is lowered via stack memory operations. For extractelement, each vector element is stored on the stack and then the needed element is loaded back; for insertelement, it stores all vector elements, rewrites the required element value and then loads the vector back. Currently the cost of such an expensive operation is estimated as zero, so loop unroll processes the loop more aggressively.

The proper estimation of cost (in a way like in the X86 target) prohibits unrolling of this loop and fixes the compilation time (2.77s on my machine).
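The stack-based lowering described in the message can be pictured with a small stand-alone model. This is only a sketch of the memory traffic, not LLVM code: `MemOps`, `extractViaStack` and `insertViaStack` are hypothetical names introduced here, and a `std::vector` stands in for the stack slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counters for the scalar memory operations performed.
struct MemOps {
  std::size_t Stores = 0;
  std::size_t Loads = 0;
};

// Models extractelement with a run-time index when no vector unit exists:
// every element is spilled, then the requested one is loaded back.
inline double extractViaStack(const std::vector<double> &Vec, std::size_t Idx,
                              MemOps &Ops) {
  assert(Idx < Vec.size() && "index out of range");
  std::vector<double> Slot(Vec.size()); // stands in for the stack slot
  for (std::size_t I = 0; I < Vec.size(); ++I) {
    Slot[I] = Vec[I]; // one scalar store per element
    ++Ops.Stores;
  }
  ++Ops.Loads; // one scalar load of the selected element
  return Slot[Idx];
}

// Models insertelement: spill all elements, overwrite one, reload all.
inline std::vector<double> insertViaStack(const std::vector<double> &Vec,
                                          double Val, std::size_t Idx,
                                          MemOps &Ops) {
  assert(Idx < Vec.size() && "index out of range");
  std::vector<double> Slot(Vec.size());
  for (std::size_t I = 0; I < Vec.size(); ++I) {
    Slot[I] = Vec[I]; // spill the whole vector
    ++Ops.Stores;
  }
  Slot[Idx] = Val; // one extra store for the new element
  ++Ops.Stores;
  std::vector<double> Result(Vec.size());
  for (std::size_t I = 0; I < Vec.size(); ++I) {
    Result[I] = Slot[I]; // reload every element
    ++Ops.Loads;
  }
  return Result;
}
```

In this model an extract from an N-element vector performs N stores and 1 load, and an insert performs N + 1 stores and N loads, so the work grows linearly with the vector length rather than being free.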
1 parent be8b559 commit 3d7df0a

5 files changed: +350 -62 lines changed

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

Lines changed: 20 additions & 2 deletions
@@ -1463,8 +1463,26 @@ InstructionCost RISCVTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
   std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Val);
 
   // This type is legalized to a scalar type.
-  if (!LT.second.isVector())
-    return 0;
+  if (!LT.second.isVector()) {
+    auto *FixedVecTy = cast<FixedVectorType>(Val);
+    // If Index is a known constant, cost is zero.
+    if (Index != -1U)
+      return 0;
+    // Extract/InsertElement with non-constant index is very costly when
+    // scalarized; estimate cost of loads/stores sequence via the stack:
+    // ExtractElement cost: store vector to stack, load scalar;
+    // InsertElement cost: store vector to stack, store scalar, load vector.
+    Type *ElemTy = FixedVecTy->getElementType();
+    auto NumElems = FixedVecTy->getNumElements();
+    auto Align = DL.getPrefTypeAlign(ElemTy);
+    InstructionCost LoadCost =
+        getMemoryOpCost(Instruction::Load, ElemTy, Align, 0, CostKind);
+    InstructionCost StoreCost =
+        getMemoryOpCost(Instruction::Store, ElemTy, Align, 0, CostKind);
+    return Opcode == Instruction::ExtractElement
+               ? StoreCost * NumElems + LoadCost
+               : (StoreCost + LoadCost) * NumElems + StoreCost;
+  }
 
   // For unsupported scalable vector.
   if (LT.second.isScalableVector() && !LT.first.isValid())
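The two return expressions in this hunk can be checked by hand against the test expectations that follow. The sketch below mirrors only the arithmetic; `extractCost` and `insertCost` are hypothetical helper names, and the per-operation costs passed in are assumptions (1 for a scalar load or store that fits a register, 2 for an i64 access on RV32, which is inferred from the expected values rather than stated in the patch).

```cpp
#include <cstdint>

// Hypothetical mirror of: StoreCost * NumElems + LoadCost
constexpr std::uint64_t extractCost(std::uint64_t NumElems,
                                    std::uint64_t LoadCost,
                                    std::uint64_t StoreCost) {
  return StoreCost * NumElems + LoadCost;
}

// Hypothetical mirror of: (StoreCost + LoadCost) * NumElems + StoreCost
constexpr std::uint64_t insertCost(std::uint64_t NumElems,
                                   std::uint64_t LoadCost,
                                   std::uint64_t StoreCost) {
  return (StoreCost + LoadCost) * NumElems + StoreCost;
}

// Unit-cost memory ops reproduce the <2 x i8> and <16 x i8> expectations.
static_assert(extractCost(2, 1, 1) == 3, "<2 x i8> extract");
static_assert(extractCost(16, 1, 1) == 17, "<16 x i8> extract");

// Assumed cost of 2 per i64 access on RV32 reproduces 6 and 34.
static_assert(extractCost(2, 2, 2) == 6, "<2 x i64> extract on RV32");
static_assert(extractCost(16, 2, 2) == 34, "<16 x i64> extract on RV32");

// Under the same unit-cost assumption, the <475 x double> operand from the
// commit message would be costed at 476 per extract.
static_assert(extractCost(475, 1, 1) == 476, "<475 x double> extract");

// InsertElement additionally reloads every element and stores the new one.
static_assert(insertCost(4, 1, 1) == 9, "4-element insert, unit costs");
```

The extract cost is 1 + N under unit costs, which is the 3, 5, 9, 17 progression visible in the autogenerated checks.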
Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 3
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=riscv32 -mattr=+f,+d,+zfh < %s | FileCheck %s --check-prefixes=RV32
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=riscv64 -mattr=+f,+d,+zfh < %s | FileCheck %s --check-prefixes=RV64
+
+define void @extractelement_int(i32 %x) {
+; RV32-LABEL: 'extractelement_int'
+; RV32-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i8 = extractelement <2 x i8> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4i8 = extractelement <4 x i8> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8i8 = extractelement <8 x i8> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16i8 = extractelement <16 x i8> undef, i32 %x
+; RV32-NEXT: Cost Model: Invalid cost for instruction: %nxv16i8 = extractelement <vscale x 16 x i8> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i16 = extractelement <2 x i16> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4i16 = extractelement <4 x i16> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8i16 = extractelement <8 x i16> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16i16 = extractelement <16 x i16> undef, i32 %x
+; RV32-NEXT: Cost Model: Invalid cost for instruction: %nxv16i16 = extractelement <vscale x 16 x i16> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i32 = extractelement <2 x i32> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4i32 = extractelement <4 x i32> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8i32 = extractelement <8 x i32> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16i32 = extractelement <16 x i32> undef, i32 %x
+; RV32-NEXT: Cost Model: Invalid cost for instruction: %nxv16i32 = extractelement <vscale x 16 x i32> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v2i64 = extractelement <2 x i64> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v4i64 = extractelement <4 x i64> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %v8i64 = extractelement <8 x i64> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %v16i64 = extractelement <16 x i64> undef, i32 %x
+; RV32-NEXT: Cost Model: Invalid cost for instruction: %nxv16i64 = extractelement <vscale x 16 x i64> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV64-LABEL: 'extractelement_int'
+; RV64-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i8 = extractelement <2 x i8> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4i8 = extractelement <4 x i8> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8i8 = extractelement <8 x i8> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16i8 = extractelement <16 x i8> undef, i32 %x
+; RV64-NEXT: Cost Model: Invalid cost for instruction: %nxv16i8 = extractelement <vscale x 16 x i8> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i16 = extractelement <2 x i16> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4i16 = extractelement <4 x i16> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8i16 = extractelement <8 x i16> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16i16 = extractelement <16 x i16> undef, i32 %x
+; RV64-NEXT: Cost Model: Invalid cost for instruction: %nxv16i16 = extractelement <vscale x 16 x i16> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i32 = extractelement <2 x i32> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4i32 = extractelement <4 x i32> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8i32 = extractelement <8 x i32> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16i32 = extractelement <16 x i32> undef, i32 %x
+; RV64-NEXT: Cost Model: Invalid cost for instruction: %nxv16i32 = extractelement <vscale x 16 x i32> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i64 = extractelement <2 x i64> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4i64 = extractelement <4 x i64> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8i64 = extractelement <8 x i64> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16i64 = extractelement <16 x i64> undef, i32 %x
+; RV64-NEXT: Cost Model: Invalid cost for instruction: %nxv16i64 = extractelement <vscale x 16 x i64> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+  %v2i8 = extractelement <2 x i8> undef, i32 %x
+  %v4i8 = extractelement <4 x i8> undef, i32 %x
+  %v8i8 = extractelement <8 x i8> undef, i32 %x
+  %v16i8 = extractelement <16 x i8> undef, i32 %x
+  %nxv16i8 = extractelement <vscale x 16 x i8> undef, i32 %x
+
+  %v2i16 = extractelement <2 x i16> undef, i32 %x
+  %v4i16 = extractelement <4 x i16> undef, i32 %x
+  %v8i16 = extractelement <8 x i16> undef, i32 %x
+  %v16i16 = extractelement <16 x i16> undef, i32 %x
+  %nxv16i16 = extractelement <vscale x 16 x i16> undef, i32 %x
+
+  %v2i32 = extractelement <2 x i32> undef, i32 %x
+  %v4i32 = extractelement <4 x i32> undef, i32 %x
+  %v8i32 = extractelement <8 x i32> undef, i32 %x
+  %v16i32 = extractelement <16 x i32> undef, i32 %x
+  %nxv16i32 = extractelement <vscale x 16 x i32> undef, i32 %x
+
+  %v2i64 = extractelement <2 x i64> undef, i32 %x
+  %v4i64 = extractelement <4 x i64> undef, i32 %x
+  %v8i64 = extractelement <8 x i64> undef, i32 %x
+  %v16i64 = extractelement <16 x i64> undef, i32 %x
+  %nxv16i64 = extractelement <vscale x 16 x i64> undef, i32 %x
+
+  ret void
+}
+
+define void @extractelement_fp(i32 %x) {
+; RV32-LABEL: 'extractelement_fp'
+; RV32-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16 = extractelement <2 x half> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4f16 = extractelement <4 x half> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8f16 = extractelement <8 x half> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16f16 = extractelement <16 x half> undef, i32 %x
+; RV32-NEXT: Cost Model: Invalid cost for instruction: %nxv16f16 = extractelement <vscale x 16 x half> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32 = extractelement <2 x float> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4f32 = extractelement <4 x float> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8f32 = extractelement <8 x float> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16f32 = extractelement <16 x float> undef, i32 %x
+; RV32-NEXT: Cost Model: Invalid cost for instruction: %nxv16f32 = extractelement <vscale x 16 x float> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f64 = extractelement <2 x double> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4f64 = extractelement <4 x double> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8f64 = extractelement <8 x double> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16f64 = extractelement <16 x double> undef, i32 %x
+; RV32-NEXT: Cost Model: Invalid cost for instruction: %nxv16f64 = extractelement <vscale x 16 x double> undef, i32 %x
+; RV32-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV64-LABEL: 'extractelement_fp'
+; RV64-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16 = extractelement <2 x half> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4f16 = extractelement <4 x half> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8f16 = extractelement <8 x half> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16f16 = extractelement <16 x half> undef, i32 %x
+; RV64-NEXT: Cost Model: Invalid cost for instruction: %nxv16f16 = extractelement <vscale x 16 x half> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32 = extractelement <2 x float> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4f32 = extractelement <4 x float> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8f32 = extractelement <8 x float> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16f32 = extractelement <16 x float> undef, i32 %x
+; RV64-NEXT: Cost Model: Invalid cost for instruction: %nxv16f32 = extractelement <vscale x 16 x float> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f64 = extractelement <2 x double> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4f64 = extractelement <4 x double> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8f64 = extractelement <8 x double> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16f64 = extractelement <16 x double> undef, i32 %x
+; RV64-NEXT: Cost Model: Invalid cost for instruction: %nxv16f64 = extractelement <vscale x 16 x double> undef, i32 %x
+; RV64-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+  %v2f16 = extractelement <2 x half> undef, i32 %x
+  %v4f16 = extractelement <4 x half> undef, i32 %x
+  %v8f16 = extractelement <8 x half> undef, i32 %x
+  %v16f16 = extractelement <16 x half> undef, i32 %x
+  %nxv16f16 = extractelement <vscale x 16 x half> undef, i32 %x
+
+  %v2f32 = extractelement <2 x float> undef, i32 %x
+  %v4f32 = extractelement <4 x float> undef, i32 %x
+  %v8f32 = extractelement <8 x float> undef, i32 %x
+  %v16f32 = extractelement <16 x float> undef, i32 %x
+  %nxv16f32 = extractelement <vscale x 16 x float> undef, i32 %x
+
+  %v2f64 = extractelement <2 x double> undef, i32 %x
+  %v4f64 = extractelement <4 x double> undef, i32 %x
+  %v8f64 = extractelement <8 x double> undef, i32 %x
+  %v16f64 = extractelement <16 x double> undef, i32 %x
+  %nxv16f64 = extractelement <vscale x 16 x double> undef, i32 %x
+
+  ret void
+}
