Commit 7f26c27

[RISCV] Enable SLP by default (when vectors are available)
I propose that we go ahead and enable SLP by default. Over the last few weeks, @luke and I have been working through codegen issues seen at small VLs from a couple of SPEC workloads. We still have a ways to go to get optimal codegen, but we're at the point where having a single configuration we're all tuning against is probably the right default.

As a bit of history, I introduced this TTI hook in a310637 in August of last year to unblock enabling LoopVectorizer. At the time, we had a couple of known issues: constant materialization, address generation, and a general lack of maturity of small fixed vector codegen. By now, each of these has had significant investment. I can't say any of them are completely fixed, but we're no longer seeing instances of them every place we look.

What we're mostly seeing at this point is a long tail of codegen opportunities, many involving build vectors, shuffles, and extract patterns. I have a couple of patches up to continue iterating on those issues, but I don't think they need to be blockers for enabling SLP.

Differential Revision: https://reviews.llvm.org/D152750
1 parent 807adcf commit 7f26c27

5 files changed: 120 additions and 373 deletions

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

Lines changed: 16 additions & 9 deletions
@@ -30,9 +30,9 @@ static cl::opt<unsigned> RVVRegisterWidthLMUL(
 static cl::opt<unsigned> SLPMaxVF(
     "riscv-v-slp-max-vf",
     cl::desc(
-        "Result used for getMaximumVF query which is used exclusively by "
-        "SLP vectorizer. Defaults to 1 which disables SLP."),
-    cl::init(1), cl::Hidden);
+        "Overrides result used for getMaximumVF query which is used "
+        "exclusively by SLP vectorizer."),
+    cl::Hidden);
 
 InstructionCost RISCVTTIImpl::getLMULCost(MVT VT) {
   // TODO: Here assume reciprocal throughput is 1 for LMUL_1, it is
@@ -1744,12 +1744,19 @@ unsigned RISCVTTIImpl::getRegUsageForType(Type *Ty) {
 }
 
 unsigned RISCVTTIImpl::getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
-  // This interface is currently only used by SLP. Returning 1 (which is the
-  // default value for SLPMaxVF) disables SLP. We currently have a cost modeling
-  // problem w/ constant materialization which causes SLP to perform majorly
-  // unprofitable transformations.
-  // TODO: Figure out constant materialization cost modeling and remove.
-  return SLPMaxVF;
+  if (SLPMaxVF.getNumOccurrences())
+    return SLPMaxVF;
+
+  // Return how many elements can fit in getRegisterBitwidth. This is the
+  // same routine as used in LoopVectorizer. We should probably be
+  // accounting for whether we actually have instructions with the right
+  // lane type, but we don't have enough information to do that without
+  // some additional plumbing which hasn't been justified yet.
+  TypeSize RegWidth =
+      getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector);
+  // If no vector registers, or absurd element widths, disable
+  // vectorization by returning 1.
+  return std::max(1UL, RegWidth.getFixedValue() / ElemWidth);
 }
 
 bool RISCVTTIImpl::isLSRCostLess(const TargetTransformInfo::LSRCost &C1,
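
As a rough illustration of the new `getMaximumVF` logic, here is a minimal standalone sketch that models the same arithmetic outside of LLVM. The function name `maximumVF`, its parameters, and the use of `std::optional` to stand in for "the flag was passed on the command line" are all assumptions made for this example; they are not LLVM APIs. The sketch only shows how an explicit override wins, and how integer division clamped to 1 turns "no vector registers" or "element wider than the register" into "do not vectorize".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the TTI query in the diff; not LLVM's API.
//   regWidthBits  - fixed-width vector register size in bits (0 if no vectors)
//   elemWidthBits - element width in bits; assumed to be nonzero
//   overrideVF    - models an explicit -riscv-v-slp-max-vf=<N> occurrence
unsigned maximumVF(std::uint64_t regWidthBits, unsigned elemWidthBits,
                   std::optional<unsigned> overrideVF = std::nullopt) {
  assert(elemWidthBits != 0 && "element width must be nonzero");

  // An explicit override always takes precedence, mirroring the
  // getNumOccurrences() check in the patch.
  if (overrideVF)
    return *overrideVF;

  // Integer division gives 0 when there are no vector registers or when the
  // element is wider than the register; clamping to 1 disables vectorization.
  std::uint64_t lanes = regWidthBits / elemWidthBits;
  return static_cast<unsigned>(std::max<std::uint64_t>(1, lanes));
}
```

Under these assumptions, a 128-bit register and 32-bit elements give a maximum VF of 4, while passing an override of 1 models the previous default and disables SLP.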

llvm/test/Transforms/SLPVectorizer/RISCV/floating-point.ll

Lines changed: 24 additions & 147 deletions
@@ -18,31 +18,10 @@ define void @fp_add(ptr %dst, ptr %p, ptr %q) {
 ; DEFAULT-LABEL: define void @fp_add
 ; DEFAULT-SAME: (ptr [[DST:%.*]], ptr [[P:%.*]], ptr [[Q:%.*]]) #[[ATTR0:[0-9]+]] {
 ; DEFAULT-NEXT: entry:
-; DEFAULT-NEXT: [[E0:%.*]] = load float, ptr [[P]], align 4
-; DEFAULT-NEXT: [[PE1:%.*]] = getelementptr inbounds float, ptr [[P]], i64 1
-; DEFAULT-NEXT: [[E1:%.*]] = load float, ptr [[PE1]], align 4
-; DEFAULT-NEXT: [[PE2:%.*]] = getelementptr inbounds float, ptr [[P]], i64 2
-; DEFAULT-NEXT: [[E2:%.*]] = load float, ptr [[PE2]], align 4
-; DEFAULT-NEXT: [[PE3:%.*]] = getelementptr inbounds float, ptr [[P]], i64 3
-; DEFAULT-NEXT: [[E3:%.*]] = load float, ptr [[PE3]], align 4
-; DEFAULT-NEXT: [[F0:%.*]] = load float, ptr [[Q]], align 4
-; DEFAULT-NEXT: [[PF1:%.*]] = getelementptr inbounds float, ptr [[Q]], i64 1
-; DEFAULT-NEXT: [[F1:%.*]] = load float, ptr [[PF1]], align 4
-; DEFAULT-NEXT: [[PF2:%.*]] = getelementptr inbounds float, ptr [[Q]], i64 2
-; DEFAULT-NEXT: [[F2:%.*]] = load float, ptr [[PF2]], align 4
-; DEFAULT-NEXT: [[PF3:%.*]] = getelementptr inbounds float, ptr [[Q]], i64 3
-; DEFAULT-NEXT: [[F3:%.*]] = load float, ptr [[PF3]], align 4
-; DEFAULT-NEXT: [[A0:%.*]] = fadd float [[E0]], [[F0]]
-; DEFAULT-NEXT: [[A1:%.*]] = fadd float [[E1]], [[F1]]
-; DEFAULT-NEXT: [[A2:%.*]] = fadd float [[E2]], [[F2]]
-; DEFAULT-NEXT: [[A3:%.*]] = fadd float [[E3]], [[F3]]
-; DEFAULT-NEXT: store float [[A0]], ptr [[DST]], align 4
-; DEFAULT-NEXT: [[PA1:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 1
-; DEFAULT-NEXT: store float [[A1]], ptr [[PA1]], align 4
-; DEFAULT-NEXT: [[PA2:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 2
-; DEFAULT-NEXT: store float [[A2]], ptr [[PA2]], align 4
-; DEFAULT-NEXT: [[PA3:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 3
-; DEFAULT-NEXT: store float [[A3]], ptr [[PA3]], align 4
+; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[P]], align 4
+; DEFAULT-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr [[Q]], align 4
+; DEFAULT-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[TMP0]], [[TMP1]]
+; DEFAULT-NEXT: store <4 x float> [[TMP2]], ptr [[DST]], align 4
 ; DEFAULT-NEXT: ret void
 ;
 entry:
@@ -90,24 +69,9 @@ define void @fp_sub(ptr %dst, ptr %p) {
 ; DEFAULT-LABEL: define void @fp_sub
 ; DEFAULT-SAME: (ptr [[DST:%.*]], ptr [[P:%.*]]) #[[ATTR0]] {
 ; DEFAULT-NEXT: entry:
-; DEFAULT-NEXT: [[E0:%.*]] = load float, ptr [[P]], align 4
-; DEFAULT-NEXT: [[PE1:%.*]] = getelementptr inbounds float, ptr [[P]], i64 1
-; DEFAULT-NEXT: [[E1:%.*]] = load float, ptr [[PE1]], align 4
-; DEFAULT-NEXT: [[PE2:%.*]] = getelementptr inbounds float, ptr [[P]], i64 2
-; DEFAULT-NEXT: [[E2:%.*]] = load float, ptr [[PE2]], align 4
-; DEFAULT-NEXT: [[PE3:%.*]] = getelementptr inbounds float, ptr [[P]], i64 3
-; DEFAULT-NEXT: [[E3:%.*]] = load float, ptr [[PE3]], align 4
-; DEFAULT-NEXT: [[A0:%.*]] = fsub float [[E0]], 3.000000e+00
-; DEFAULT-NEXT: [[A1:%.*]] = fsub float [[E1]], 3.000000e+00
-; DEFAULT-NEXT: [[A2:%.*]] = fsub float [[E2]], 3.000000e+00
-; DEFAULT-NEXT: [[A3:%.*]] = fsub float [[E3]], 3.000000e+00
-; DEFAULT-NEXT: store float [[A0]], ptr [[DST]], align 4
-; DEFAULT-NEXT: [[PA1:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 1
-; DEFAULT-NEXT: store float [[A1]], ptr [[PA1]], align 4
-; DEFAULT-NEXT: [[PA2:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 2
-; DEFAULT-NEXT: store float [[A2]], ptr [[PA2]], align 4
-; DEFAULT-NEXT: [[PA3:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 3
-; DEFAULT-NEXT: store float [[A3]], ptr [[PA3]], align 4
+; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[P]], align 4
+; DEFAULT-NEXT: [[TMP1:%.*]] = fsub <4 x float> [[TMP0]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
+; DEFAULT-NEXT: store <4 x float> [[TMP1]], ptr [[DST]], align 4
 ; DEFAULT-NEXT: ret void
 ;
 entry:
@@ -148,31 +112,10 @@ define void @fp_mul(ptr %dst, ptr %p, ptr %q) {
 ; DEFAULT-LABEL: define void @fp_mul
 ; DEFAULT-SAME: (ptr [[DST:%.*]], ptr [[P:%.*]], ptr [[Q:%.*]]) #[[ATTR0]] {
 ; DEFAULT-NEXT: entry:
-; DEFAULT-NEXT: [[E0:%.*]] = load float, ptr [[P]], align 4
-; DEFAULT-NEXT: [[PE1:%.*]] = getelementptr inbounds float, ptr [[P]], i64 1
-; DEFAULT-NEXT: [[E1:%.*]] = load float, ptr [[PE1]], align 4
-; DEFAULT-NEXT: [[PE2:%.*]] = getelementptr inbounds float, ptr [[P]], i64 2
-; DEFAULT-NEXT: [[E2:%.*]] = load float, ptr [[PE2]], align 4
-; DEFAULT-NEXT: [[PE3:%.*]] = getelementptr inbounds float, ptr [[P]], i64 3
-; DEFAULT-NEXT: [[E3:%.*]] = load float, ptr [[PE3]], align 4
-; DEFAULT-NEXT: [[F0:%.*]] = load float, ptr [[Q]], align 4
-; DEFAULT-NEXT: [[PF1:%.*]] = getelementptr inbounds float, ptr [[Q]], i64 1
-; DEFAULT-NEXT: [[F1:%.*]] = load float, ptr [[PF1]], align 4
-; DEFAULT-NEXT: [[PF2:%.*]] = getelementptr inbounds float, ptr [[Q]], i64 2
-; DEFAULT-NEXT: [[F2:%.*]] = load float, ptr [[PF2]], align 4
-; DEFAULT-NEXT: [[PF3:%.*]] = getelementptr inbounds float, ptr [[Q]], i64 3
-; DEFAULT-NEXT: [[F3:%.*]] = load float, ptr [[PF3]], align 4
-; DEFAULT-NEXT: [[A0:%.*]] = fmul float [[E0]], [[F0]]
-; DEFAULT-NEXT: [[A1:%.*]] = fmul float [[E1]], [[F1]]
-; DEFAULT-NEXT: [[A2:%.*]] = fmul float [[E2]], [[F2]]
-; DEFAULT-NEXT: [[A3:%.*]] = fmul float [[E3]], [[F3]]
-; DEFAULT-NEXT: store float [[A0]], ptr [[DST]], align 4
-; DEFAULT-NEXT: [[PA1:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 1
-; DEFAULT-NEXT: store float [[A1]], ptr [[PA1]], align 4
-; DEFAULT-NEXT: [[PA2:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 2
-; DEFAULT-NEXT: store float [[A2]], ptr [[PA2]], align 4
-; DEFAULT-NEXT: [[PA3:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 3
-; DEFAULT-NEXT: store float [[A3]], ptr [[PA3]], align 4
+; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[P]], align 4
+; DEFAULT-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr [[Q]], align 4
+; DEFAULT-NEXT: [[TMP2:%.*]] = fmul <4 x float> [[TMP0]], [[TMP1]]
+; DEFAULT-NEXT: store <4 x float> [[TMP2]], ptr [[DST]], align 4
 ; DEFAULT-NEXT: ret void
 ;
 entry:
@@ -220,24 +163,9 @@ define void @fp_div(ptr %dst, ptr %p) {
 ; DEFAULT-LABEL: define void @fp_div
 ; DEFAULT-SAME: (ptr [[DST:%.*]], ptr [[P:%.*]]) #[[ATTR0]] {
 ; DEFAULT-NEXT: entry:
-; DEFAULT-NEXT: [[E0:%.*]] = load float, ptr [[P]], align 4
-; DEFAULT-NEXT: [[PE1:%.*]] = getelementptr inbounds float, ptr [[P]], i64 1
-; DEFAULT-NEXT: [[E1:%.*]] = load float, ptr [[PE1]], align 4
-; DEFAULT-NEXT: [[PE2:%.*]] = getelementptr inbounds float, ptr [[P]], i64 2
-; DEFAULT-NEXT: [[E2:%.*]] = load float, ptr [[PE2]], align 4
-; DEFAULT-NEXT: [[PE3:%.*]] = getelementptr inbounds float, ptr [[P]], i64 3
-; DEFAULT-NEXT: [[E3:%.*]] = load float, ptr [[PE3]], align 4
-; DEFAULT-NEXT: [[A0:%.*]] = fdiv float [[E0]], 1.050000e+01
-; DEFAULT-NEXT: [[A1:%.*]] = fdiv float [[E1]], 1.050000e+01
-; DEFAULT-NEXT: [[A2:%.*]] = fdiv float [[E2]], 1.050000e+01
-; DEFAULT-NEXT: [[A3:%.*]] = fdiv float [[E3]], 1.050000e+01
-; DEFAULT-NEXT: store float [[A0]], ptr [[DST]], align 4
-; DEFAULT-NEXT: [[PA1:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 1
-; DEFAULT-NEXT: store float [[A1]], ptr [[PA1]], align 4
-; DEFAULT-NEXT: [[PA2:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 2
-; DEFAULT-NEXT: store float [[A2]], ptr [[PA2]], align 4
-; DEFAULT-NEXT: [[PA3:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 3
-; DEFAULT-NEXT: store float [[A3]], ptr [[PA3]], align 4
+; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[P]], align 4
+; DEFAULT-NEXT: [[TMP1:%.*]] = fdiv <4 x float> [[TMP0]], <float 1.050000e+01, float 1.050000e+01, float 1.050000e+01, float 1.050000e+01>
+; DEFAULT-NEXT: store <4 x float> [[TMP1]], ptr [[DST]], align 4
 ; DEFAULT-NEXT: ret void
 ;
 entry:
@@ -280,31 +208,10 @@ define void @fp_max(ptr %dst, ptr %p, ptr %q) {
 ; DEFAULT-LABEL: define void @fp_max
 ; DEFAULT-SAME: (ptr [[DST:%.*]], ptr [[P:%.*]], ptr [[Q:%.*]]) #[[ATTR0]] {
 ; DEFAULT-NEXT: entry:
-; DEFAULT-NEXT: [[E0:%.*]] = load float, ptr [[P]], align 4
-; DEFAULT-NEXT: [[PE1:%.*]] = getelementptr inbounds float, ptr [[P]], i64 1
-; DEFAULT-NEXT: [[E1:%.*]] = load float, ptr [[PE1]], align 4
-; DEFAULT-NEXT: [[PE2:%.*]] = getelementptr inbounds float, ptr [[P]], i64 2
-; DEFAULT-NEXT: [[E2:%.*]] = load float, ptr [[PE2]], align 4
-; DEFAULT-NEXT: [[PE3:%.*]] = getelementptr inbounds float, ptr [[P]], i64 3
-; DEFAULT-NEXT: [[E3:%.*]] = load float, ptr [[PE3]], align 4
-; DEFAULT-NEXT: [[F0:%.*]] = load float, ptr [[Q]], align 4
-; DEFAULT-NEXT: [[PF1:%.*]] = getelementptr inbounds float, ptr [[Q]], i64 1
-; DEFAULT-NEXT: [[F1:%.*]] = load float, ptr [[PF1]], align 4
-; DEFAULT-NEXT: [[PF2:%.*]] = getelementptr inbounds float, ptr [[Q]], i64 2
-; DEFAULT-NEXT: [[F2:%.*]] = load float, ptr [[PF2]], align 4
-; DEFAULT-NEXT: [[PF3:%.*]] = getelementptr inbounds float, ptr [[Q]], i64 3
-; DEFAULT-NEXT: [[F3:%.*]] = load float, ptr [[PF3]], align 4
-; DEFAULT-NEXT: [[A0:%.*]] = tail call float @llvm.maxnum.f32(float [[E0]], float [[F0]])
-; DEFAULT-NEXT: [[A1:%.*]] = tail call float @llvm.maxnum.f32(float [[E1]], float [[F1]])
-; DEFAULT-NEXT: [[A2:%.*]] = tail call float @llvm.maxnum.f32(float [[E2]], float [[F2]])
-; DEFAULT-NEXT: [[A3:%.*]] = tail call float @llvm.maxnum.f32(float [[E3]], float [[F3]])
-; DEFAULT-NEXT: store float [[A0]], ptr [[DST]], align 4
-; DEFAULT-NEXT: [[PA1:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 1
-; DEFAULT-NEXT: store float [[A1]], ptr [[PA1]], align 4
-; DEFAULT-NEXT: [[PA2:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 2
-; DEFAULT-NEXT: store float [[A2]], ptr [[PA2]], align 4
-; DEFAULT-NEXT: [[PA3:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 3
-; DEFAULT-NEXT: store float [[A3]], ptr [[PA3]], align 4
+; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[P]], align 4
+; DEFAULT-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr [[Q]], align 4
+; DEFAULT-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP0]], <4 x float> [[TMP1]])
+; DEFAULT-NEXT: store <4 x float> [[TMP2]], ptr [[DST]], align 4
 ; DEFAULT-NEXT: ret void
 ;
 entry:
@@ -354,24 +261,9 @@ define void @fp_min(ptr %dst, ptr %p) {
 ; DEFAULT-LABEL: define void @fp_min
 ; DEFAULT-SAME: (ptr [[DST:%.*]], ptr [[P:%.*]]) #[[ATTR0]] {
 ; DEFAULT-NEXT: entry:
-; DEFAULT-NEXT: [[E0:%.*]] = load float, ptr [[P]], align 4
-; DEFAULT-NEXT: [[PE1:%.*]] = getelementptr inbounds float, ptr [[P]], i64 1
-; DEFAULT-NEXT: [[E1:%.*]] = load float, ptr [[PE1]], align 4
-; DEFAULT-NEXT: [[PE2:%.*]] = getelementptr inbounds float, ptr [[P]], i64 2
-; DEFAULT-NEXT: [[E2:%.*]] = load float, ptr [[PE2]], align 4
-; DEFAULT-NEXT: [[PE3:%.*]] = getelementptr inbounds float, ptr [[P]], i64 3
-; DEFAULT-NEXT: [[E3:%.*]] = load float, ptr [[PE3]], align 4
-; DEFAULT-NEXT: [[A0:%.*]] = tail call float @llvm.minnum.f32(float [[E0]], float 1.250000e+00)
-; DEFAULT-NEXT: [[A1:%.*]] = tail call float @llvm.minnum.f32(float [[E1]], float 1.250000e+00)
-; DEFAULT-NEXT: [[A2:%.*]] = tail call float @llvm.minnum.f32(float [[E2]], float 1.250000e+00)
-; DEFAULT-NEXT: [[A3:%.*]] = tail call float @llvm.minnum.f32(float [[E3]], float 1.250000e+00)
-; DEFAULT-NEXT: store float [[A0]], ptr [[DST]], align 4
-; DEFAULT-NEXT: [[PA1:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 1
-; DEFAULT-NEXT: store float [[A1]], ptr [[PA1]], align 4
-; DEFAULT-NEXT: [[PA2:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 2
-; DEFAULT-NEXT: store float [[A2]], ptr [[PA2]], align 4
-; DEFAULT-NEXT: [[PA3:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 3
-; DEFAULT-NEXT: store float [[A3]], ptr [[PA3]], align 4
+; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[P]], align 4
+; DEFAULT-NEXT: [[TMP1:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP0]], <4 x float> <float 1.250000e+00, float 1.250000e+00, float 1.250000e+00, float 1.250000e+00>)
+; DEFAULT-NEXT: store <4 x float> [[TMP1]], ptr [[DST]], align 4
 ; DEFAULT-NEXT: ret void
 ;
 entry:
@@ -413,24 +305,9 @@ define void @fp_convert(ptr %dst, ptr %p) {
 ; DEFAULT-LABEL: define void @fp_convert
 ; DEFAULT-SAME: (ptr [[DST:%.*]], ptr [[P:%.*]]) #[[ATTR0]] {
 ; DEFAULT-NEXT: entry:
-; DEFAULT-NEXT: [[E0:%.*]] = load float, ptr [[P]], align 4
-; DEFAULT-NEXT: [[PE1:%.*]] = getelementptr inbounds float, ptr [[P]], i64 1
-; DEFAULT-NEXT: [[E1:%.*]] = load float, ptr [[PE1]], align 4
-; DEFAULT-NEXT: [[PE2:%.*]] = getelementptr inbounds float, ptr [[P]], i64 2
-; DEFAULT-NEXT: [[E2:%.*]] = load float, ptr [[PE2]], align 4
-; DEFAULT-NEXT: [[PE3:%.*]] = getelementptr inbounds float, ptr [[P]], i64 3
-; DEFAULT-NEXT: [[E3:%.*]] = load float, ptr [[PE3]], align 4
-; DEFAULT-NEXT: [[A0:%.*]] = tail call i32 @llvm.fptosi.sat.i32.f32(float [[E0]])
-; DEFAULT-NEXT: [[A1:%.*]] = tail call i32 @llvm.fptosi.sat.i32.f32(float [[E1]])
-; DEFAULT-NEXT: [[A2:%.*]] = tail call i32 @llvm.fptosi.sat.i32.f32(float [[E2]])
-; DEFAULT-NEXT: [[A3:%.*]] = tail call i32 @llvm.fptosi.sat.i32.f32(float [[E3]])
-; DEFAULT-NEXT: store i32 [[A0]], ptr [[DST]], align 4
-; DEFAULT-NEXT: [[PA1:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 1
-; DEFAULT-NEXT: store i32 [[A1]], ptr [[PA1]], align 4
-; DEFAULT-NEXT: [[PA2:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 2
-; DEFAULT-NEXT: store i32 [[A2]], ptr [[PA2]], align 4
-; DEFAULT-NEXT: [[PA3:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 3
-; DEFAULT-NEXT: store i32 [[A3]], ptr [[PA3]], align 4
+; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[P]], align 4
+; DEFAULT-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.fptosi.sat.v4i32.v4f32(<4 x float> [[TMP0]])
+; DEFAULT-NEXT: store <4 x i32> [[TMP1]], ptr [[DST]], align 4
 ; DEFAULT-NEXT: ret void
 ;
 entry:
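
To make the test changes above easier to read, the following sketch expresses the `fp_add` pattern at the source level: four independent scalar adds on adjacent elements, and an equivalent four-lane form. It is only a conceptual model. The names `fpAddScalar`, `fpAddVector`, `Vec4`, `loadVec`, and `storeVec` are hypothetical, `std::array<float, 4>` merely stands in for the `<4 x float>` IR type, and none of this is what the compiler actually emits.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;
using Vec4 = std::array<float, kLanes>; // models <4 x float>

// Scalar shape matched by the removed CHECK lines: all loads first,
// then four independent fadds, then four stores to adjacent addresses.
void fpAddScalar(float *dst, const float *p, const float *q) {
  float e0 = p[0], e1 = p[1], e2 = p[2], e3 = p[3];
  float f0 = q[0], f1 = q[1], f2 = q[2], f3 = q[3];
  float a0 = e0 + f0, a1 = e1 + f1, a2 = e2 + f2, a3 = e3 + f3;
  dst[0] = a0;
  dst[1] = a1;
  dst[2] = a2;
  dst[3] = a3;
}

// Helpers modelling a single wide load and a single wide store.
static Vec4 loadVec(const float *src) {
  Vec4 v{};
  for (std::size_t i = 0; i < kLanes; ++i)
    v[i] = src[i];
  return v;
}

static void storeVec(float *dst, const Vec4 &v) {
  for (std::size_t i = 0; i < kLanes; ++i)
    dst[i] = v[i];
}

// Vector shape matched by the added CHECK lines: two wide loads,
// one lane-wise add, one wide store.
void fpAddVector(float *dst, const float *p, const float *q) {
  Vec4 a = loadVec(p);
  Vec4 b = loadVec(q);
  Vec4 sum{};
  for (std::size_t i = 0; i < kLanes; ++i)
    sum[i] = a[i] + b[i];
  storeVec(dst, sum);
}
```

Both functions read every input before writing any output, so they produce the same results even if `dst` overlaps `p` or `q`.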
