Commit 2b00a73

[SLP]Buildvector for alternate instructions with non-profitable gather operands.
If the operands of a potential alternate node are going to produce buildvector sequences that result in more instructions than the original code, then such instructions should not be vectorized as an alternate node; it is better to end up with a buildvector node.

Left column - experimental, Right - reference.

Metric: size..text

Program                                                                         size..text
                                                                                results      results0     diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test                413680.00    416272.00    0.6%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test        12351788.00  12354844.00  0.0%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test            664901.00    664949.00    0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test             664901.00    664949.00    0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test          1171371.00   1171355.00   -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test                   1036396.00   1036284.00   -0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   111280.00    111248.00    -0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test        1392113.00   1391361.00   -0.1%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test       1392113.00   1391361.00   -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  281676.00    281452.00    -0.1%
test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test              3025.00      3019.00      -0.2%
test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test          6351.00      6335.00      -0.3%

Metric: SLP.NumVectorInstructions

Program                                                                         SLP.NumVectorInstructions
                                                                                results      results0     diff
test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test              15.00        16.00        6.7%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test             1703.00      1707.00      0.2%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test            1703.00      1707.00      0.2%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test        26241.00     26239.00     -0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test          11761.00     11754.00     -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  824.00       822.00       -0.2%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test       5668.00      5654.00      -0.2%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test        5668.00      5654.00      -0.2%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test               792.00       790.00       -0.3%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test              792.00       790.00       -0.3%
test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test                 1389.00      1384.00      -0.4%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test                   596.00       590.00       -1.0%
test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test          6.00         5.00         -16.7%

Metric: exec_time

Program                                                                         exec_time
                                                                                results      results0     diff
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test        99.14        100.00       0.9%

Other changes are not significant (less than 0.1% with exec_time under 5 secs).

SingleSource/Benchmarks/Adobe-C++/loop_unroll - some small patterns remain scalar, smaller code.

External/SPEC/CFP2017rate/526.blender_r/526.blender_r - many small changes, some extra stores get vectorized.

External/SPEC/CINT2017speed/625.x264_s/625.x264_s
External/SPEC/CINT2017rate/525.x264_r/525.x264_r
x264 has one change in a loop body, in function ssim_end4: some code remains scalar, resulting in smaller code size.

External/SPEC/CFP2017rate/511.povray_r/511.povray_r - some extra code gets vectorized, looks like some other patterns were matched.

MultiSource/Benchmarks/7zip/7zip-benchmark - extra stores were vectorized (looks like the graphs become profitable).

MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg - small changes in vectorized code (some small parts remain scalar).

External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s
Many changes, caused by the fact that the code of one function becomes smaller (onvertLCHabToRGB) and this function gets inlined after that.

MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc - some small changes here and there, some extra code is vectorized, some remains scalar (2 x vectors).

MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes - emits 2 scalars + 2 insertelems instead of insert, broadcast, alt code (3 instructions, total 5 insts).

MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig - small graph becomes profitable and gets vectorized.

External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r
External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s
Some small graph becomes profitable and gets vectorized.

MultiSource/Benchmarks/FreeBench/pifft/pifft - no changes in final code.

Reviewers: RKSimon, dtcxzyw

Reviewed By: RKSimon

Pull Request: llvm#84978
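For readers unfamiliar with alternate nodes, the following is a minimal standalone sketch of what such a node computes, assuming an fsub/fadd pair like the one checked in reorder_with_external_users.ll. The names `Opcode`, `Vec`, `altOpcodeMask`, `vecSub`, `vecAdd`, `blend`, and `altSubAdd` are hypothetical and are not LLVM APIs; the sketch only illustrates why the patch counts three vector instructions (main, alt, shuffle) for an alternate node.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names are LLVM APIs.
enum class Opcode { FSub, FAdd };
using Vec = std::vector<double>;

// Marks the lanes that use the alternate opcode, similar in spirit to the
// OpcodeMask built in the patch.
std::vector<bool> altOpcodeMask(const std::vector<Opcode> &Lanes, Opcode Alt) {
  std::vector<bool> Mask(Lanes.size(), false);
  for (std::size_t I = 0; I < Lanes.size(); ++I)
    if (Lanes[I] == Alt)
      Mask[I] = true;
  return Mask;
}

// Vector instruction 1: the main opcode applied to every lane.
Vec vecSub(const Vec &A, const Vec &B) {
  assert(A.size() == B.size());
  Vec R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = A[I] - B[I];
  return R;
}

// Vector instruction 2: the alternate opcode applied to every lane.
Vec vecAdd(const Vec &A, const Vec &B) {
  assert(A.size() == B.size());
  Vec R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = A[I] + B[I];
  return R;
}

// Vector instruction 3: the shuffle that takes each lane from Main or Alt.
Vec blend(const Vec &Main, const Vec &Alt, const std::vector<bool> &Mask) {
  assert(Main.size() == Alt.size() && Main.size() == Mask.size());
  Vec R(Main.size());
  for (std::size_t I = 0; I < Main.size(); ++I)
    R[I] = Mask[I] ? Alt[I] : Main[I];
  return R;
}

// The whole alternate node: both opcodes run on all lanes, then one blend.
Vec altSubAdd(const Vec &A, const Vec &B, const std::vector<Opcode> &Lanes) {
  return blend(vecSub(A, B), vecAdd(A, B), altOpcodeMask(Lanes, Opcode::FAdd));
}
```

When the operand vectors `A` and `B` are cheap to form (a splat, constants, or already vectorized values), these three instructions replace the scalar operations. When each operand vector has to be assembled lane by lane, the insertelement sequences come on top of the three instructions, which is the situation the commit message describes as non-profitable.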
1 parent 81cdd35 commit 2b00a73

8 files changed, +192 -77 lines changed

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Lines changed: 128 additions & 0 deletions
@@ -2995,6 +2995,15 @@ class BoUpSLP {
     return ScalarToTreeEntry.lookup(V);
   }

+  /// Check that the operand node of alternate node does not generate
+  /// buildvector sequence. If it is, then probably not worth it to build
+  /// alternate shuffle, if number of buildvector operands + alternate
+  /// instruction > than the number of buildvector instructions.
+  /// \param S the instructions state of the analyzed values.
+  /// \param VL list of the instructions with alternate opcodes.
+  bool areAltOperandsProfitable(const InstructionsState &S,
+                                ArrayRef<Value *> VL) const;
+
   /// Checks if the specified list of the instructions/values can be vectorized
   /// and fills required data before actual scheduling of the instructions.
   TreeEntry::EntryState getScalarsVectorizationState(
@@ -5777,6 +5786,117 @@ static bool isAlternateInstruction(const Instruction *I,
                                    const Instruction *AltOp,
                                    const TargetLibraryInfo &TLI);

+bool BoUpSLP::areAltOperandsProfitable(const InstructionsState &S,
+                                       ArrayRef<Value *> VL) const {
+  unsigned Opcode0 = S.getOpcode();
+  unsigned Opcode1 = S.getAltOpcode();
+  // The opcode mask selects between the two opcodes.
+  SmallBitVector OpcodeMask(VL.size(), false);
+  for (unsigned Lane : seq<unsigned>(0, VL.size()))
+    if (cast<Instruction>(VL[Lane])->getOpcode() == Opcode1)
+      OpcodeMask.set(Lane);
+  // If this pattern is supported by the target then consider it profitable.
+  if (TTI->isLegalAltInstr(FixedVectorType::get(S.MainOp->getType(), VL.size()),
+                           Opcode0, Opcode1, OpcodeMask))
+    return true;
+  SmallVector<ValueList> Operands;
+  for (unsigned I : seq<unsigned>(0, S.MainOp->getNumOperands())) {
+    Operands.emplace_back();
+    // Prepare the operand vector.
+    for (Value *V : VL)
+      Operands.back().push_back(cast<Instruction>(V)->getOperand(I));
+  }
+  if (Operands.size() == 2) {
+    // Try find best operands candidates.
+    for (unsigned I : seq<unsigned>(0, VL.size() - 1)) {
+      SmallVector<std::pair<Value *, Value *>> Candidates(3);
+      Candidates[0] = std::make_pair(Operands[0][I], Operands[0][I + 1]);
+      Candidates[1] = std::make_pair(Operands[0][I], Operands[1][I + 1]);
+      Candidates[2] = std::make_pair(Operands[1][I], Operands[0][I + 1]);
+      std::optional<int> Res = findBestRootPair(Candidates);
+      switch (Res.value_or(0)) {
+      case 0:
+        break;
+      case 1:
+        std::swap(Operands[0][I + 1], Operands[1][I + 1]);
+        break;
+      case 2:
+        std::swap(Operands[0][I], Operands[1][I]);
+        break;
+      default:
+        llvm_unreachable("Unexpected index.");
+      }
+    }
+  }
+  DenseSet<unsigned> UniqueOpcodes;
+  constexpr unsigned NumAltInsts = 3; // main + alt + shuffle.
+  unsigned NonInstCnt = 0;
+  // Estimate number of instructions, required for the vectorized node and for
+  // the buildvector node.
+  unsigned UndefCnt = 0;
+  // Count the number of extra shuffles, required for vector nodes.
+  unsigned ExtraShuffleInsts = 0;
+  // Check that operands do not contain same values and create either perfect
+  // diamond match or shuffled match.
+  if (Operands.size() == 2) {
+    // Do not count same operands twice.
+    if (Operands.front() == Operands.back()) {
+      Operands.erase(Operands.begin());
+    } else if (!allConstant(Operands.front()) &&
+               all_of(Operands.front(), [&](Value *V) {
+                 return is_contained(Operands.back(), V);
+               })) {
+      Operands.erase(Operands.begin());
+      ++ExtraShuffleInsts;
+    }
+  }
+  const Loop *L = LI->getLoopFor(S.MainOp->getParent());
+  // Vectorize node, if:
+  // 1. at least single operand is constant or splat.
+  // 2. Operands have many loop invariants (the instructions are not loop
+  // invariants).
+  // 3. At least single unique operands is supposed to vectorized.
+  return none_of(Operands,
+                 [&](ArrayRef<Value *> Op) {
+                   if (allConstant(Op) ||
+                       (!isSplat(Op) && allSameBlock(Op) && allSameType(Op) &&
+                        getSameOpcode(Op, *TLI).MainOp))
+                     return false;
+                   DenseMap<Value *, unsigned> Uniques;
+                   for (Value *V : Op) {
+                     if (isa<Constant, ExtractElementInst>(V) ||
+                         getTreeEntry(V) || (L && L->isLoopInvariant(V))) {
+                       if (isa<UndefValue>(V))
+                         ++UndefCnt;
+                       continue;
+                     }
+                     auto Res = Uniques.try_emplace(V, 0);
+                     // Found first duplicate - need to add shuffle.
+                     if (!Res.second && Res.first->second == 1)
+                       ++ExtraShuffleInsts;
+                     ++Res.first->getSecond();
+                     if (auto *I = dyn_cast<Instruction>(V))
+                       UniqueOpcodes.insert(I->getOpcode());
+                     else if (Res.second)
+                       ++NonInstCnt;
+                   }
+                   return none_of(Uniques, [&](const auto &P) {
+                     return P.first->hasNUsesOrMore(P.second + 1) &&
+                            none_of(P.first->users(), [&](User *U) {
+                              return getTreeEntry(U) || Uniques.contains(U);
+                            });
+                   });
+                 }) ||
+         // Do not vectorize node, if estimated number of vector instructions is
+         // more than estimated number of buildvector instructions. Number of
+         // vector operands is number of vector instructions + number of vector
+         // instructions for operands (buildvectors). Number of buildvector
+         // instructions is just number_of_operands * number_of_scalars.
+         (UndefCnt < (VL.size() - 1) * S.MainOp->getNumOperands() &&
+          (UniqueOpcodes.size() + NonInstCnt + ExtraShuffleInsts +
+           NumAltInsts) < S.MainOp->getNumOperands() * VL.size());
+}
+
 BoUpSLP::TreeEntry::EntryState BoUpSLP::getScalarsVectorizationState(
     InstructionsState &S, ArrayRef<Value *> VL, bool IsScatterVectorizeUserTE,
     OrdersType &CurrentOrder, SmallVectorImpl<Value *> &PointerOps) const {
@@ -6074,6 +6194,14 @@ BoUpSLP::TreeEntry::EntryState BoUpSLP::getScalarsVectorizationState(
       LLVM_DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");
       return TreeEntry::NeedToGather;
     }
+    if (!areAltOperandsProfitable(S, VL)) {
+      LLVM_DEBUG(
+          dbgs()
+          << "SLP: ShuffleVector not vectorized, operands are buildvector and "
+             "the whole alt sequence is not profitable.\n");
+      return TreeEntry::NeedToGather;
+    }
+
     return TreeEntry::Vectorize;
   }
   default:
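
The final comparison in areAltOperandsProfitable can be restated outside LLVM as a small function. This is a simplified sketch, not the patch itself: the `AltNodeCounts` struct and the `altBeatsBuildVector` name are hypothetical, and the counters that the patch accumulates while walking the operands are passed in as plain inputs here. In the patch this comparison is only reached when the preceding none_of check over the operands returns false.

```cpp
#include <cassert>

// Hypothetical container for the counters that the patch accumulates while
// scanning operand scalars; here they are simply supplied by the caller.
struct AltNodeCounts {
  unsigned UniqueOpcodes;     // distinct opcodes among counted operand scalars
  unsigned NonInstCnt;        // unique operand scalars that are not instructions
  unsigned ExtraShuffleInsts; // extra shuffles needed for duplicated scalars
  unsigned UndefCnt;          // undef operand scalars
};

// Returns true when the estimated vector instruction count is strictly below
// the estimated buildvector count (number_of_operands * number_of_scalars).
bool altBeatsBuildVector(const AltNodeCounts &C, unsigned NumOperands,
                         unsigned NumLanes) {
  assert(NumLanes >= 2 && "an alternate node needs at least two lanes");
  constexpr unsigned NumAltInsts = 3; // main + alt + shuffle
  unsigned VectorEstimate =
      C.UniqueOpcodes + C.NonInstCnt + C.ExtraShuffleInsts + NumAltInsts;
  unsigned BuildVectorEstimate = NumOperands * NumLanes;
  return C.UndefCnt < (NumLanes - 1) * NumOperands &&
         VectorEstimate < BuildVectorEstimate;
}
```

For a two-lane binary operation the buildvector estimate is 2 * 2 = 4, so a single counted operand opcode already gives 1 + 3 = 4, which is not strictly smaller, and the node is gathered. With four lanes the estimate rises to 8, which leaves room for several operand instructions or extra shuffles before the alternate node is rejected.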

llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll

Lines changed: 20 additions & 20 deletions
@@ -104,16 +104,16 @@ define void @dist_vec(ptr nocapture noundef readonly %pA, ptr nocapture noundef
 ; CHECK-NEXT: [[AND95:%.*]] = and i32 [[B_0278]], 1
 ; CHECK-NEXT: [[SHR96]] = lshr i32 [[A_0279]], 1
 ; CHECK-NEXT: [[SHR97]] = lshr i32 [[B_0278]], 1
-; CHECK-NEXT: [[TMP22:%.*]] = insertelement <2 x i32> poison, i32 [[AND94]], i32 0
-; CHECK-NEXT: [[TMP23:%.*]] = shufflevector <2 x i32> [[TMP22]], <2 x i32> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP24:%.*]] = icmp eq <2 x i32> [[TMP23]], zeroinitializer
-; CHECK-NEXT: [[TMP25:%.*]] = icmp ne <2 x i32> [[TMP23]], zeroinitializer
-; CHECK-NEXT: [[TMP26:%.*]] = shufflevector <2 x i1> [[TMP24]], <2 x i1> [[TMP25]], <4 x i32> <i32 0, i32 3, i32 0, i32 3>
-; CHECK-NEXT: [[TMP27:%.*]] = insertelement <2 x i32> poison, i32 [[AND95]], i32 0
-; CHECK-NEXT: [[TMP28:%.*]] = shufflevector <2 x i32> [[TMP27]], <2 x i32> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP29:%.*]] = icmp ne <2 x i32> [[TMP28]], zeroinitializer
-; CHECK-NEXT: [[TMP30:%.*]] = icmp eq <2 x i32> [[TMP28]], zeroinitializer
-; CHECK-NEXT: [[TMP31:%.*]] = shufflevector <2 x i1> [[TMP29]], <2 x i1> [[TMP30]], <4 x i32> <i32 0, i32 3, i32 3, i32 0>
+; CHECK-NEXT: [[TOBOOL:%.*]] = icmp ne i32 [[AND94]], 0
+; CHECK-NEXT: [[TOBOOL98:%.*]] = icmp ne i32 [[AND95]], 0
+; CHECK-NEXT: [[TOBOOL100:%.*]] = icmp eq i32 [[AND94]], 0
+; CHECK-NEXT: [[TOBOOL103:%.*]] = icmp eq i32 [[AND95]], 0
+; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x i1> poison, i1 [[TOBOOL100]], i32 0
+; CHECK-NEXT: [[TMP23:%.*]] = insertelement <4 x i1> [[TMP22]], i1 [[TOBOOL]], i32 1
+; CHECK-NEXT: [[TMP26:%.*]] = shufflevector <4 x i1> [[TMP23]], <4 x i1> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
+; CHECK-NEXT: [[TMP25:%.*]] = insertelement <4 x i1> poison, i1 [[TOBOOL98]], i32 0
+; CHECK-NEXT: [[TMP27:%.*]] = insertelement <4 x i1> [[TMP25]], i1 [[TOBOOL103]], i32 1
+; CHECK-NEXT: [[TMP31:%.*]] = shufflevector <4 x i1> [[TMP27]], <4 x i1> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 0>
 ; CHECK-NEXT: [[TMP32:%.*]] = select <4 x i1> [[TMP26]], <4 x i1> [[TMP31]], <4 x i1> zeroinitializer
 ; CHECK-NEXT: [[TMP33:%.*]] = zext <4 x i1> [[TMP32]] to <4 x i32>
 ; CHECK-NEXT: [[TMP34]] = add <4 x i32> [[TMP21]], [[TMP33]]
@@ -149,16 +149,16 @@ define void @dist_vec(ptr nocapture noundef readonly %pA, ptr nocapture noundef
 ; CHECK-NEXT: [[AND134:%.*]] = and i32 [[B_1300]], 1
 ; CHECK-NEXT: [[SHR135]] = lshr i32 [[A_1301]], 1
 ; CHECK-NEXT: [[SHR136]] = lshr i32 [[B_1300]], 1
-; CHECK-NEXT: [[TMP39:%.*]] = insertelement <2 x i32> poison, i32 [[AND133]], i32 0
-; CHECK-NEXT: [[TMP40:%.*]] = shufflevector <2 x i32> [[TMP39]], <2 x i32> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP41:%.*]] = icmp eq <2 x i32> [[TMP40]], zeroinitializer
-; CHECK-NEXT: [[TMP42:%.*]] = icmp ne <2 x i32> [[TMP40]], zeroinitializer
-; CHECK-NEXT: [[TMP43:%.*]] = shufflevector <2 x i1> [[TMP41]], <2 x i1> [[TMP42]], <4 x i32> <i32 0, i32 3, i32 0, i32 3>
-; CHECK-NEXT: [[TMP44:%.*]] = insertelement <2 x i32> poison, i32 [[AND134]], i32 0
-; CHECK-NEXT: [[TMP45:%.*]] = shufflevector <2 x i32> [[TMP44]], <2 x i32> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP46:%.*]] = icmp ne <2 x i32> [[TMP45]], zeroinitializer
-; CHECK-NEXT: [[TMP47:%.*]] = icmp eq <2 x i32> [[TMP45]], zeroinitializer
-; CHECK-NEXT: [[TMP48:%.*]] = shufflevector <2 x i1> [[TMP46]], <2 x i1> [[TMP47]], <4 x i32> <i32 0, i32 3, i32 3, i32 0>
+; CHECK-NEXT: [[TOBOOL137:%.*]] = icmp ne i32 [[AND133]], 0
+; CHECK-NEXT: [[TOBOOL139:%.*]] = icmp ne i32 [[AND134]], 0
+; CHECK-NEXT: [[TOBOOL144:%.*]] = icmp eq i32 [[AND133]], 0
+; CHECK-NEXT: [[TOBOOL147:%.*]] = icmp eq i32 [[AND134]], 0
+; CHECK-NEXT: [[TMP40:%.*]] = insertelement <4 x i1> poison, i1 [[TOBOOL144]], i32 0
+; CHECK-NEXT: [[TMP41:%.*]] = insertelement <4 x i1> [[TMP40]], i1 [[TOBOOL137]], i32 1
+; CHECK-NEXT: [[TMP43:%.*]] = shufflevector <4 x i1> [[TMP41]], <4 x i1> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
+; CHECK-NEXT: [[TMP42:%.*]] = insertelement <4 x i1> poison, i1 [[TOBOOL139]], i32 0
+; CHECK-NEXT: [[TMP39:%.*]] = insertelement <4 x i1> [[TMP42]], i1 [[TOBOOL147]], i32 1
+; CHECK-NEXT: [[TMP48:%.*]] = shufflevector <4 x i1> [[TMP39]], <4 x i1> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 0>
 ; CHECK-NEXT: [[TMP49:%.*]] = select <4 x i1> [[TMP43]], <4 x i1> [[TMP48]], <4 x i1> zeroinitializer
 ; CHECK-NEXT: [[TMP50:%.*]] = zext <4 x i1> [[TMP49]] to <4 x i32>
 ; CHECK-NEXT: [[TMP51]] = add <4 x i32> [[TMP38]], [[TMP50]]

llvm/test/Transforms/SLPVectorizer/X86/ext-int-reduced-not-operand.ll

Lines changed: 2 additions & 7 deletions
@@ -9,13 +9,8 @@ define i64 @wombat() {
 ; CHECK-NEXT: br label [[BB2]]
 ; CHECK: bb2:
 ; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 0, [[BB:%.*]] ], [ 0, [[BB1:%.*]] ]
-; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[PHI]], i32 0
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[TMP0]], <2 x i32> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = trunc <2 x i32> [[TMP1]] to <2 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i1> [[TMP2]], i32 0
-; CHECK-NEXT: [[TMP4:%.*]] = zext i1 [[TMP3]] to i64
-; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP2]], i32 1
-; CHECK-NEXT: [[TMP6:%.*]] = zext i1 [[TMP5]] to i64
+; CHECK-NEXT: [[TMP4:%.*]] = zext i32 [[PHI]] to i64
+; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[PHI]] to i64
 ; CHECK-NEXT: [[OR:%.*]] = or i64 [[TMP4]], [[TMP6]]
 ; CHECK-NEXT: ret i64 [[OR]]
 ;

llvm/test/Transforms/SLPVectorizer/X86/gather-move-out-of-loop.ll

Lines changed: 4 additions & 6 deletions
@@ -4,14 +4,12 @@
 define void @test(i16 %0) {
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT: for.body92.preheader:
-; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i16> <i16 0, i16 poison>, i16 [[TMP0:%.*]], i32 1
-; CHECK-NEXT: [[TMP2:%.*]] = sext <2 x i16> [[TMP1]] to <2 x i32>
-; CHECK-NEXT: [[TMP3:%.*]] = zext <2 x i16> [[TMP1]] to <2 x i32>
-; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> <i32 0, i32 3>
-; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <4 x i32> <i32 0, i32 poison, i32 1, i32 poison>
-; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> <i32 poison, i32 0, i32 poison, i32 0>, <4 x i32> [[TMP5]], <4 x i32> <i32 4, i32 1, i32 6, i32 3>
 ; CHECK-NEXT: br label [[FOR_BODY92:%.*]]
 ; CHECK: for.body92:
+; CHECK-NEXT: [[CONV177_I:%.*]] = sext i16 0 to i32
+; CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[TMP0:%.*]] to i32
+; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> <i32 poison, i32 0, i32 poison, i32 0>, i32 [[CONV177_I]], i32 0
+; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[TMP1]], i32 2
 ; CHECK-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> zeroinitializer, [[TMP6]]
 ; CHECK-NEXT: store <4 x i32> [[TMP7]], ptr undef, align 8
 ; CHECK-NEXT: br label [[FOR_BODY92]]

llvm/test/Transforms/SLPVectorizer/X86/gathered-delayed-nodes-with-reused-user.ll

Lines changed: 8 additions & 10 deletions
@@ -6,21 +6,19 @@ define i64 @foo() {
 ; CHECK-NEXT: bb:
 ; CHECK-NEXT: br label [[BB3:%.*]]
 ; CHECK: bb1:
-; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i64> [ [[TMP5:%.*]], [[BB3]] ]
+; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[ADD:%.*]], [[BB3]] ]
+; CHECK-NEXT: [[PHI2:%.*]] = phi i64 [ [[TMP9:%.*]], [[BB3]] ]
 ; CHECK-NEXT: ret i64 0
 ; CHECK: bb3:
 ; CHECK-NEXT: [[PHI5:%.*]] = phi i64 [ 0, [[BB:%.*]] ], [ 0, [[BB3]] ]
 ; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i64> [ zeroinitializer, [[BB]] ], [ [[TMP7:%.*]], [[BB3]] ]
-; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> <i64 poison, i64 0>, i64 [[PHI5]], i32 0
-; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP4:%.*]] = or <2 x i64> [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP5]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP4]], <2 x i32> <i32 0, i32 3>
-; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> <i64 poison, i64 0>, <2 x i32> <i32 0, i32 3>
-; CHECK-NEXT: [[TMP7]] = add <2 x i64> [[TMP6]], [[TMP2]]
-; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i64> [[TMP7]], i32 1
-; CHECK-NEXT: [[GETELEMENTPTR:%.*]] = getelementptr i64, ptr addrspace(1) null, i64 [[TMP8]]
-; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i64> [[TMP5]], i32 1
+; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i64> [[TMP1]], i32 0
+; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1
+; CHECK-NEXT: [[ADD]] = add i64 [[TMP3]], [[TMP2]]
+; CHECK-NEXT: [[GETELEMENTPTR:%.*]] = getelementptr i64, ptr addrspace(1) null, i64 0
+; CHECK-NEXT: [[TMP9]] = or i64 [[PHI5]], 0
 ; CHECK-NEXT: [[ICMP:%.*]] = icmp ult i64 [[TMP9]], 0
+; CHECK-NEXT: [[TMP7]] = insertelement <2 x i64> <i64 poison, i64 0>, i64 [[ADD]], i32 0
 ; CHECK-NEXT: br i1 false, label [[BB3]], label [[BB1:%.*]]
 ;
 bb:

llvm/test/Transforms/SLPVectorizer/X86/non-scheduled-inst-reused-as-last-inst.ll

Lines changed: 6 additions & 6 deletions
@@ -4,22 +4,22 @@
 define void @foo() {
 ; CHECK-LABEL: define void @foo() {
 ; CHECK-NEXT: bb:
-; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 poison, i32 0>, i32 0, i32 0
 ; CHECK-NEXT: br label [[BB1:%.*]]
 ; CHECK: bb1:
 ; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ zeroinitializer, [[BB:%.*]] ], [ [[TMP6:%.*]], [[BB4:%.*]] ]
-; CHECK-NEXT: [[TMP2:%.*]] = shl <2 x i32> [[TMP1]], [[TMP0]]
-; CHECK-NEXT: [[TMP3:%.*]] = or <2 x i32> [[TMP1]], [[TMP0]]
-; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> <i32 0, i32 3>
-; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> [[TMP1]], <2 x i32> <i32 0, i32 3>
+; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
+; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[TMP2]], 0
+; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[SHL]], i32 0
 ; CHECK-NEXT: [[TMP6]] = or <2 x i32> [[TMP5]], zeroinitializer
 ; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i32> [[TMP6]], i32 0
 ; CHECK-NEXT: [[CALL:%.*]] = call i64 null(i32 [[TMP7]])
 ; CHECK-NEXT: br label [[BB4]]
 ; CHECK: bb4:
+; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i32> [[TMP6]], i32 1
 ; CHECK-NEXT: br i1 false, label [[BB5:%.*]], label [[BB1]]
 ; CHECK: bb5:
-; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x i32> [ [[TMP4]], [[BB4]] ]
+; CHECK-NEXT: [[PHI6:%.*]] = phi i32 [ [[SHL]], [[BB4]] ]
+; CHECK-NEXT: [[PHI7:%.*]] = phi i32 [ [[TMP8]], [[BB4]] ]
 ; CHECK-NEXT: ret void
 ;
 bb:

llvm/test/Transforms/SLPVectorizer/X86/reorder_with_external_users.ll

Lines changed: 8 additions & 8 deletions
@@ -112,10 +112,10 @@ define void @addsub_and_external_users(ptr %A, ptr %ptr) {
 ; CHECK-NEXT: bb1:
 ; CHECK-NEXT: [[LD:%.*]] = load double, ptr undef, align 8
 ; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[LD]], i32 0
-; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP1:%.*]] = fsub <2 x double> [[SHUFFLE]], <double 1.100000e+00, double 1.200000e+00>
-; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[SHUFFLE]], <double 1.100000e+00, double 1.200000e+00>
-; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> [[TMP2]], <2 x i32> <i32 0, i32 3>
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> [[TMP1]], <double 1.100000e+00, double 1.200000e+00>
+; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP1]], <double 1.100000e+00, double 1.200000e+00>
+; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP6]], <2 x i32> <i32 0, i32 3>
 ; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], <double 2.100000e+00, double 2.200000e+00>
 ; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], <double 3.100000e+00, double 3.200000e+00>
 ; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
@@ -159,10 +159,10 @@ define void @subadd_and_external_users(ptr %A, ptr %ptr) {
 ; CHECK-NEXT: bb1:
 ; CHECK-NEXT: [[LD:%.*]] = load double, ptr undef, align 8
 ; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[LD]], i32 0
-; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> [[SHUFFLE]], <double 1.200000e+00, double 1.100000e+00>
-; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> [[SHUFFLE]], <double 1.200000e+00, double 1.100000e+00>
-; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> [[TMP2]], <2 x i32> <i32 2, i32 1>
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> [[TMP1]], <double 1.200000e+00, double 1.100000e+00>
+; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP1]], <double 1.200000e+00, double 1.100000e+00>
+; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP6]], <2 x i32> <i32 0, i32 3>
 ; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], <double 2.200000e+00, double 2.100000e+00>
 ; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], <double 3.200000e+00, double 3.100000e+00>
 ; CHECK-NEXT: store <2 x double> [[TMP5]], ptr [[A:%.*]], align 8
