Commit 1833d41

[SLP]Vectorize gathered loads
Final gather/buildvector nodes may contain scalar loads that are not vectorized (because they are part of the gather nodes) but that could form full vector loads if combined. This patch walks over all gather nodes, collecting and sorting the gathered scalar loads, and then tries to build vector loads from them; the resulting vectors are later reshuffled between the gather nodes.

This makes it possible to add support for segmented loads later (an AoS-to-SoA kind of load for RISC-V RVV) and may help with removing alternate-opcode support. Currently, alternate nodes may depend on each other because of consecutive loads between their operands, so alternate vectorization cannot simply be removed. This approach may allow most of that code to be removed, since loads can then be vectorized between lanes.

Metric: size..text, AVX512

Program  size..text: results  results0  diff
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  238381.00  250669.00  5.2%
test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test  25753.00  26329.00  2.2%
test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-psadbw.test  3028.00  3092.00  2.1%
test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test  4243.00  4275.00  0.8%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  649765.00  653877.00  0.6%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  649765.00  653877.00  0.6%
test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test  4199.00  4222.00  0.5%
test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test  12933.00  12997.00  0.5%
test-suite :: SingleSource/Benchmarks/Misc/flops.test  8282.00  8314.00  0.4%
test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-unpack_msasm.test  10065.00  10097.00  0.3%
test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test  5160.00  5176.00  0.3%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  12472220.00  12509612.00  0.3%
test-suite :: MultiSource/Benchmarks/Prolangs-C++/city/city.test  6908.00  6924.00  0.2%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  202830.00  203278.00  0.2%
test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test  9133.00  9149.00  0.2%
test-suite :: MultiSource/Benchmarks/Olden/power/power.test  6792.00  6803.00  0.2%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1395585.00  1397473.00  0.1%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1395585.00  1397473.00  0.1%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test  97662.00  97758.00  0.1%
test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  595179.00  595739.00  0.1%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test  70603.00  70667.00  0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail.test  19877.00  19893.00  0.1%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test  90231.00  90279.00  0.1%
test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test  33738.00  33754.00  0.0%
test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test  13262.00  13268.00  0.0%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1139964.00  1140460.00  0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test  849507.00  849875.00  0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1158379.00  1158859.00  0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test  38724.00  38740.00  0.0%
test-suite :: External/SPEC/CFP2006/470.lbm/470.lbm.test  15180.00  15186.00  0.0%
test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test  15484.00  15490.00  0.0%
test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test  167391.00  167455.00  0.0%
test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test  137448.00  137496.00  0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2030254.00  2030766.00  0.0%
test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test  302870.00  302934.00  0.0%
test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test  303126.00  303190.00  0.0%
test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test  241107.00  241155.00  0.0%
test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test  162974.00  163006.00  0.0%
test-suite :: MultiSource/Applications/siod/siod.test  167168.00  167200.00  0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1048796.00  1048988.00  0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test  201623.00  201655.00  0.0%
test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  501734.00  501798.00  0.0%
test-suite :: MultiSource/Applications/ClamAV/clamscan.test  580888.00  580952.00  0.0%
test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test  168319.00  168335.00  0.0%
test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test  226022.00  226038.00  0.0%
test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test  118011.00  118015.00  0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test  550589.00  550605.00  0.0%
test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test  3072477.00  3072541.00  0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test  2385563.00  2385579.00  0.0%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  389171.00  389155.00  -0.0%
test-suite :: MultiSource/Applications/lua/lua.test  234764.00  234748.00  -0.0%
test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test  227694.00  227678.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test  119819.00  119807.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test  117995.00  117983.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test  123610.00  123594.00  -0.0%
test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test  81414.00  81398.00  -0.0%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  782040.00  781880.00  -0.0%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  9597420.00  9595292.00  -0.0%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  9597420.00  9595292.00  -0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test  911832.00  911608.00  -0.0%
test-suite :: MultiSource/Applications/oggenc/oggenc.test  192507.00  192459.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test  122843.00  122811.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test  122292.00  122260.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  777363.00  777155.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test  123265.00  123205.00  -0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  315534.00  315358.00  -0.1%
test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test  128163.00  128083.00  -0.1%
test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test  6562.00  6555.00  -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test  23428.00  23396.00  -0.1%
test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test  22749.00  22717.00  -0.1%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  39549.00  39485.00  -0.2%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  39546.00  39482.00  -0.2%
test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test  57214.00  57118.00  -0.2%
test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test  413668.00  412804.00  -0.2%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1044047.00  1041487.00  -0.2%
test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test  12414.00  12382.00  -0.3%
test-suite :: MultiSource/Benchmarks/Prolangs-C/gnugo/gnugo.test  31161.00  30969.00  -0.6%
test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test  224726.00  223254.00  -0.7%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test  93512.00  92824.00  -0.7%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  281151.00  278463.00  -1.0%
test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test  2820.00  2788.00  -1.1%
test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test  156819.00  154739.00  -1.3%
test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test  11560.00  11160.00  -3.5%
test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test  6734.00  6382.00  -5.2%

ASCI_Purple/SMG2000 - extra vector code
VPlanNativePath/outer-loop-vect - extra vectorization, better vector code
AVX512BWVL/Vector-AVX512BWVL-psadbw - better vector code
Rodinia/hotspot - small variations
CINT2017speed/625.x264_s, CINT2017rate/525.x264_r - extra vector code, better vectorization
BenchmarkGame/n-body - better vector code
AVX512BWVL/Vector-AVX512BWVL-unpack_msasm - small variations
Misc/flops - extra vector code
AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - small variations
Misc-C++/Large - better vector code
CFP2017rate/526.blender_r - extra vector code
Prolangs-C++/city - extra vector code
MiBench/consumer-lame - extra vector code
CoyoteBench/fftbench - extra vector code
Olden/power - better vector code
CFP2017rate/538.imagick_r, CFP2017speed/638.imagick_s - extra vector code
CINT2017rate/531.deepsjeng_r - extra vector code
CFP2006/447.dealII - small variations
DOE-ProxyApps-C/miniAMR - small variations
Prolangs-C/unix-smail - small variations
DOE-ProxyApps-C++/PENNANT - small variations
CINT2006/473.astar - small variations
CFP2006/453.povray - small variations
JM/lencod - extra vector code
CFP2017rate/511.povray_r - small variations
DOE-ProxyApps-C/CoMD - small variations
CFP2006/470.lbm - extra vector code
CFP2017speed/619.lbm_s, CFP2017rate/519.lbm_r - extra vector code
CINT2006/456.hmmer - extra code vectorized
TSVC/ControlFlow-dbl - extra vector code
CFP2017rate/510.parest_r - better vector code
LCALS/SubsetALambdaLoops - extra code vectorized
LCALS/SubsetARawLoops - extra code vectorized
CFP2006/444.namd - extra code vectorized
CFP2006/482.sphinx3 - better vector code
Applications/siod - better vector code
Benchmarks/7zip - better vector code
DOE-ProxyApps-C++/CLAMR - extra code vectorized
Applications/sqlite3 - extra code vectorized
Applications/ClamAV - smaller vector code
MallocBench/gs - small variations
MicroBenchmarks/ImageProcessing - small variations
TSVC/StatementReordering-flt - extra code vectorized
CINT2006/471.omnetpp - small variations
CINT2006/403.gcc - extra code vectorized
CINT2006/483.xalancbmk - extra code vectorized
JM/ldecod - small variations
Applications/lua - extra code vectorized
mafft/pairlocalalign - small variations
TSVC/NodeSplitting-flt - extra code vectorized
TSVC/Recurrences-flt - extra code vectorized
TSVC/InductionVariable-flt - extra code vectorized
FreeBench/pifft - small variations
CINT2006/464.h264ref - extra code vectorized
CINT2017speed/602.gcc_s, CINT2017rate/502.gcc_r - some extra code vectorized, extra code inlined
CINT2006/445.gobmk - small variations
Applications/oggenc - small variations
TSVC/LoopRestructuring-flt - extra code vectorized
TSVC/CrossingThresholds-flt - extra code vectorized
CFP2017rate/508.namd_r - small variations
TSVC/ControlFlow-flt - extra code vectorized
mediabench/g721 - small variations
Prolangs-C/compiler - small variations
FreeBench/fourinarow - better vector code
MiBench/telecomm-gsm - small variation in vector code
mediabench/gsm - same
Prolangs-C/bison - small variations
Adobe-C++/loop_unroll - extra code vectorized
Benchmarks/tramp3d-v4 - extra code gets inlined, small changes in vector code
McCat/18-imp - variations in vector code
Prolangs-C/gnugo - variations in vector code
MallocBench/espresso - extra code vectorized
DOE-ProxyApps-C++/miniFE - small variations in vector code
Prolangs-C/TimberWolfMC - extra code vectorized, small changes in previously vectorized code
Olden/tsp - small changes in vector code
CFP2006/433.milc - extra code gets inlined, vectorized 2 x stores to 4 x stores
MiBench/security-blowfish - extra code vectorized
McCat/08-main - better vector code

Metric: size..text, RISCV, sifive-p670

Program  size..text: results  results0  diff
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test  63580.00  64020.00  0.7%
test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test  21388.00  21406.00  0.1%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  296992.00  297088.00  0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  968112.00  968208.00  0.0%
test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test  45160.00  45164.00  0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  2635902.00  2635854.00  -0.0%
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  2635902.00  2635854.00  -0.0%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  7568730.00  7568578.00  -0.0%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  7568730.00  7568578.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test  49764.00  49762.00  -0.0%
test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  449132.00  449108.00  -0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test  695932.00  695892.00  -0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  508820.00  508788.00  -0.0%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  508820.00  508788.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  9594152.00  9593336.00  -0.0%
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  166522.00  166490.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  722252.00  722092.00  -0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test  27554.00  27546.00  -0.0%
test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test  10900.00  10896.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test  46754.00  46732.00  -0.0%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  631570.00  631226.00  -0.1%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  850698.00  850218.00  -0.1%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  24816.00  24800.00  -0.1%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  24814.00  24798.00  -0.1%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  1599946.00  1598394.00  -0.1%
test-suite :: MultiSource/Applications/hbd/hbd.test  27236.00  27204.00  -0.1%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  293848.00  293480.00  -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test  20160.00  20048.00  -0.6%
test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test  182088.00  181040.00  -0.6%
test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test  4788.00  4748.00  -0.8%

DOE-ProxyApps-C++/miniFE - extra vector code
MiBench/automotive-susan - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/511.povray_r - slightly better vector code
TSVC/StatementReordering-dbl - small variations
CINT2017rate/523.xalancbmk_r, CINT2017speed/623.xalancbmk_s - extra vector code
CINT2017rate/502.gcc_r, CINT2017speed/602.gcc_s - extra vector code
TSVC/CrossingThresholds-flt - small variations
Applications/sqlite3 - extra vector code
JM/lencod - extra vector code, small variations
CINT2017rate/525.x264_r, CINT2017speed/625.x264_s - small variations
CFP2017rate/526.blender_r - extra vector code, small variations
DOE-ProxyApps-C/miniGMG - small variations
Vectorizer/VPlanNativePath/outer-loop-vect - small variations
TSVC/CrossingThresholds-dbl - small variations
Benchmarks/tramp3d-v4 - small variations
Benchmarks/7zip - extra vector code
MiBench/telecomm-gsm - small variations
mediabench/gsm/toast - small variations
CFP2017rate/510.parest_r - extra vector code
Applications/hbd - extra vector code
JM/ldecod - better vector code
Prolangs-C/compiler - extra vector code
MallocBench/espresso - extra vector code
mediabench/g721/g721encode - extra vectorization

Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: #107461
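The collect-and-sort step described in the commit message can be illustrated with a small standalone sketch. It is not the code from SLPVectorizer.cpp: `ScalarLoad`, `VectorLoadCandidate` and `collectCandidates` are hypothetical names, pointers are modeled as a base id plus an element offset, and cost modeling, aliasing and the final reshuffling between gather nodes are left out. It only shows how scalar loads spread over several gather nodes might be sorted and grouped into consecutive runs that could each become one vector load.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scalar load found inside a gather node.
struct ScalarLoad {
  int base;          // id of the underlying base pointer
  std::int64_t off;  // element offset from that base
  int node;          // index of the gather node that uses this load
};

// A run of consecutive loads that could be replaced by one vector load.
struct VectorLoadCandidate {
  int base;
  std::int64_t firstOff;
  std::vector<int> users;  // gather node of each lane, in lane order
};

// Sort all gathered loads by (base, offset), then split them into maximal
// runs whose offsets increase by exactly one element.
std::vector<VectorLoadCandidate>
collectCandidates(std::vector<ScalarLoad> loads, std::size_t minLanes = 2) {
  assert(minLanes >= 2 && "a vector load needs at least two lanes");
  std::sort(loads.begin(), loads.end(),
            [](const ScalarLoad &a, const ScalarLoad &b) {
              return a.base != b.base ? a.base < b.base : a.off < b.off;
            });
  std::vector<VectorLoadCandidate> result;
  std::size_t i = 0;
  while (i < loads.size()) {
    std::size_t j = i + 1;
    while (j < loads.size() && loads[j].base == loads[i].base &&
           loads[j].off == loads[j - 1].off + 1)
      ++j;
    if (j - i >= minLanes) {
      VectorLoadCandidate c{loads[i].base, loads[i].off, {}};
      for (std::size_t k = i; k < j; ++k)
        c.users.push_back(loads[k].node);
      result.push_back(std::move(c));
    }
    i = j;  // runs shorter than minLanes stay scalar
  }
  return result;
}
```

In this sketch the `users` list records which gather node needs each lane, which is the information a later shuffle step would need to route lanes of the combined load back to the nodes that consumed the original scalars.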
1 parent e588fd9 commit 1833d41

21 files changed: +1286 -835 lines changed

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Lines changed: 753 additions & 118 deletions
Large diffs are not rendered by default.

llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll

Lines changed: 4 additions & 6 deletions
@@ -23,14 +23,12 @@ define void @s116_modified(ptr %a) {
 ; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP1]], align 4
 ; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[GEP3]], align 4
 ; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> poison, float [[LD0]], i32 0
-; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 poison>
+; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
 ; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 poison, i32 poison>
 ; CHECK-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.vector.insert.v4f32.v2f32(<4 x float> [[TMP5]], <2 x float> [[TMP2]], i64 2)
-; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
-; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> [[TMP7]], <4 x i32> <i32 0, i32 poison, i32 2, i32 4>
-; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> poison, <4 x i32> <i32 0, i32 0, i32 2, i32 3>
-; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <4 x float> [[TMP6]], [[TMP9]]
-; CHECK-NEXT: store <4 x float> [[TMP10]], ptr [[A]], align 4
+; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> [[TMP2]], <4 x i32> <i32 0, i32 0, i32 1, i32 2>
+; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <4 x float> [[TMP6]], [[TMP7]]
+; CHECK-NEXT: store <4 x float> [[TMP8]], ptr [[A]], align 4
 ; CHECK-NEXT: ret void
 ;
 %gep1 = getelementptr inbounds float, ptr %a, i64 1
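The tsc-s116.ll change above replaces a chain of three shuffles (old TMP7, TMP8, TMP9, fed by the old TMP4) with a single two-source shuffle. A minimal sketch, assuming a simplified model of `shufflevector` in which mask indices select from the concatenation of the two inputs and a `poison` index yields a poison lane, can be used to check that both sequences produce the same second operand for the `fmul`. The helper names `shuffle`, `poisonVec`, `oldSequence` and `newSequence` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Lane = std::optional<float>;  // std::nullopt models a poison lane
using Vec = std::vector<Lane>;
constexpr int Poison = -1;          // models a `poison` mask element

// Simplified model of `shufflevector a, b, mask`: each mask index selects
// a lane from the concatenation of a and b (which have equal length).
Vec shuffle(const Vec &a, const Vec &b, const std::vector<int> &mask) {
  assert(a.size() == b.size());
  Vec out;
  for (int m : mask) {
    if (m == Poison) {
      out.push_back(std::nullopt);
      continue;
    }
    std::size_t i = static_cast<std::size_t>(m);
    assert(i < a.size() + b.size());
    out.push_back(i < a.size() ? a[i] : b[i - a.size()]);
  }
  return out;
}

// An all-poison vector of n lanes.
Vec poisonVec(std::size_t n) { return Vec(n, std::nullopt); }

// Removed CHECK lines: old TMP4, TMP7, TMP8 and TMP9.
Vec oldSequence(const Vec &t1, const Vec &t2) {
  Vec t4 = shuffle(t1, poisonVec(2), {0, 1, 1, Poison});
  Vec t7 = shuffle(t2, poisonVec(2), {0, 1, Poison, Poison});
  Vec t8 = shuffle(t4, t7, {0, Poison, 2, 4});
  return shuffle(t8, poisonVec(4), {0, 0, 2, 3});
}

// Added CHECK line: new TMP7, one shuffle over both loaded vectors.
Vec newSequence(const Vec &t1, const Vec &t2) {
  return shuffle(t1, t2, {0, 0, 1, 2});
}
```

With `t1 = {a0, a1}` and `t2 = {b0, b1}`, both functions yield `{a0, a0, a1, b0}` under this model, which matches the intent of the updated check lines: the same lanes, built from the two vector loads with fewer instructions.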

llvm/test/Transforms/SLPVectorizer/AArch64/vec3-calls.ll

Lines changed: 21 additions & 11 deletions
@@ -1,17 +1,27 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -passes=slp-vectorizer -slp-vectorize-non-power-of-2 -mtriple=arm64-apple-ios -S %s | FileCheck --check-prefixes=CHECK %s
-; RUN: opt -passes=slp-vectorizer -slp-vectorize-non-power-of-2=false -mtriple=arm64-apple-ios -S %s | FileCheck --check-prefixes=CHECK %s
+; RUN: opt -passes=slp-vectorizer -slp-vectorize-non-power-of-2 -mtriple=arm64-apple-ios -S %s | FileCheck --check-prefixes=CHECK,NON-POWER-OF-2 %s
+; RUN: opt -passes=slp-vectorizer -slp-vectorize-non-power-of-2=false -mtriple=arm64-apple-ios -S %s | FileCheck --check-prefixes=CHECK,POWER-OF-2 %s

 define void @vec3_vectorize_call(ptr %Colour, float %0) {
-; CHECK-LABEL: @vec3_vectorize_call(
-; CHECK-NEXT: entry:
-; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[COLOUR:%.*]], align 4
-; CHECK-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.fmuladd.v2f32(<2 x float> [[TMP1]], <2 x float> zeroinitializer, <2 x float> zeroinitializer)
-; CHECK-NEXT: store <2 x float> [[TMP2]], ptr [[COLOUR]], align 4
-; CHECK-NEXT: [[ARRAYIDX99_I1:%.*]] = getelementptr float, ptr [[COLOUR]], i64 2
-; CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0:%.*]], float 0.000000e+00, float 0.000000e+00)
-; CHECK-NEXT: store float [[TMP3]], ptr [[ARRAYIDX99_I1]], align 4
-; CHECK-NEXT: ret void
+; NON-POWER-OF-2-LABEL: @vec3_vectorize_call(
+; NON-POWER-OF-2-NEXT: entry:
+; NON-POWER-OF-2-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[COLOUR:%.*]], align 4
+; NON-POWER-OF-2-NEXT: [[TMP2:%.*]] = insertelement <3 x float> poison, float [[TMP0:%.*]], i32 2
+; NON-POWER-OF-2-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <3 x i32> <i32 0, i32 1, i32 poison>
+; NON-POWER-OF-2-NEXT: [[TMP4:%.*]] = shufflevector <3 x float> [[TMP2]], <3 x float> [[TMP3]], <3 x i32> <i32 3, i32 4, i32 2>
+; NON-POWER-OF-2-NEXT: [[TMP5:%.*]] = call <3 x float> @llvm.fmuladd.v3f32(<3 x float> [[TMP4]], <3 x float> zeroinitializer, <3 x float> zeroinitializer)
+; NON-POWER-OF-2-NEXT: store <3 x float> [[TMP5]], ptr [[COLOUR]], align 4
+; NON-POWER-OF-2-NEXT: ret void
+;
+; POWER-OF-2-LABEL: @vec3_vectorize_call(
+; POWER-OF-2-NEXT: entry:
+; POWER-OF-2-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[COLOUR:%.*]], align 4
+; POWER-OF-2-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.fmuladd.v2f32(<2 x float> [[TMP1]], <2 x float> zeroinitializer, <2 x float> zeroinitializer)
+; POWER-OF-2-NEXT: store <2 x float> [[TMP2]], ptr [[COLOUR]], align 4
+; POWER-OF-2-NEXT: [[ARRAYIDX99_I1:%.*]] = getelementptr float, ptr [[COLOUR]], i64 2
+; POWER-OF-2-NEXT: [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0:%.*]], float 0.000000e+00, float 0.000000e+00)
+; POWER-OF-2-NEXT: store float [[TMP3]], ptr [[ARRAYIDX99_I1]], align 4
+; POWER-OF-2-NEXT: ret void
 ;
 entry:
 %1 = load float, ptr %Colour, align 4

llvm/test/Transforms/SLPVectorizer/AArch64/vectorizable-selects-uniform-cmps.ll

Lines changed: 9 additions & 19 deletions
@@ -245,34 +245,24 @@ define void @select_uniform_ugt_16xi8(ptr %ptr, i8 %x) {
 ; CHECK-NEXT: [[L_8:%.*]] = load i8, ptr [[GEP_8]], align 1
 ; CHECK-NEXT: [[CMP_8:%.*]] = icmp ugt i8 [[L_8]], -1
 ; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i8 9
-; CHECK-NEXT: [[L_9:%.*]] = load i8, ptr [[GEP_9]], align 1
-; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i8 10
-; CHECK-NEXT: [[L_10:%.*]] = load i8, ptr [[GEP_10]], align 1
 ; CHECK-NEXT: [[GEP_11:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i8 11
 ; CHECK-NEXT: [[L_11:%.*]] = load i8, ptr [[GEP_11]], align 1
 ; CHECK-NEXT: [[GEP_12:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i8 12
 ; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i8>, ptr [[PTR]], align 1
 ; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i8> [[TMP0]], i32 0
 ; CHECK-NEXT: [[S_8:%.*]] = select i1 [[CMP_8]], i8 [[TMP1]], i8 [[X:%.*]]
-; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i8>, ptr [[GEP_12]], align 1
-; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x i8> [[TMP0]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> [[TMP3]], i8 [[L_9]], i32 9
-; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i8> [[TMP4]], i8 [[L_10]], i32 10
+; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i8>, ptr [[GEP_9]], align 1
+; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i8>, ptr [[GEP_12]], align 1
+; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i8> [[TMP2]], <2 x i8> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <8 x i8> [[TMP0]], <8 x i8> [[TMP4]], <16 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 0, i32 8, i32 9, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
 ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i8> [[TMP5]], i8 [[L_11]], i32 11
 ; CHECK-NEXT: [[TMP7:%.*]] = call <16 x i8> @llvm.vector.insert.v16i8.v8i8(<16 x i8> [[TMP6]], <8 x i8> [[TMP0]], i64 0)
-; CHECK-NEXT: [[TMP8:%.*]] = call <16 x i8> @llvm.vector.insert.v16i8.v4i8(<16 x i8> [[TMP7]], <4 x i8> [[TMP2]], i64 12)
+; CHECK-NEXT: [[TMP8:%.*]] = call <16 x i8> @llvm.vector.insert.v16i8.v4i8(<16 x i8> [[TMP7]], <4 x i8> [[TMP3]], i64 12)
 ; CHECK-NEXT: [[TMP9:%.*]] = icmp ugt <16 x i8> [[TMP8]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
-; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i8> [[TMP2]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i8> [[TMP2]], <4 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x i8> [[TMP0]], <8 x i8> [[TMP11]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 8, i32 9, i32 10, i32 11>
-; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x i8> [[TMP12]], i8 [[L_9]], i32 9
-; CHECK-NEXT: [[TMP14:%.*]] = insertelement <16 x i8> [[TMP13]], i8 [[L_10]], i32 10
-; CHECK-NEXT: [[TMP15:%.*]] = insertelement <16 x i8> [[TMP14]], i8 [[L_11]], i32 11
-; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <16 x i8> [[TMP15]], <16 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
-; CHECK-NEXT: [[TMP17:%.*]] = insertelement <16 x i8> poison, i8 [[X]], i32 0
-; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <16 x i8> [[TMP17]], <16 x i8> poison, <16 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP19:%.*]] = select <16 x i1> [[TMP9]], <16 x i8> [[TMP16]], <16 x i8> [[TMP18]]
-; CHECK-NEXT: store <16 x i8> [[TMP19]], ptr [[PTR]], align 2
+; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x i8> poison, i8 [[X]], i32 0
+; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <16 x i8> [[TMP10]], <16 x i8> poison, <16 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP12:%.*]] = select <16 x i1> [[TMP9]], <16 x i8> [[TMP8]], <16 x i8> [[TMP11]]
+; CHECK-NEXT: store <16 x i8> [[TMP12]], ptr [[PTR]], align 2
 ; CHECK-NEXT: ret void
 ;
 entry:

llvm/test/Transforms/SLPVectorizer/X86/crash_dequeue.ll

Lines changed: 4 additions & 8 deletions
@@ -10,10 +10,8 @@ define void @_ZSt6uniqueISt15_Deque_iteratorIdRdPdEET_S4_S4_(ptr %__first, ptr n
 ; CHECK-LABEL: @_ZSt6uniqueISt15_Deque_iteratorIdRdPdEET_S4_S4_(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[__FIRST:%.*]], align 8
-; CHECK-NEXT: [[_M_FIRST3_I_I:%.*]] = getelementptr inbounds %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731", ptr [[__FIRST]], i64 0, i32 1
-; CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[__LAST:%.*]], align 8
-; CHECK-NEXT: [[_M_FIRST3_I_I83:%.*]] = getelementptr inbounds %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731", ptr [[__LAST]], i64 0, i32 1
-; CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[_M_FIRST3_I_I83]], align 8
+; CHECK-NEXT: [[TMP1:%.*]] = load <2 x ptr>, ptr [[__LAST:%.*]], align 8
+; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x ptr> [[TMP1]], ptr [[TMP0]], i32 0
 ; CHECK-NEXT: br i1 undef, label [[_ZST13ADJACENT_FINDIST15_DEQUE_ITERATORIDRDPDEET_S4_S4__EXIT:%.*]], label [[WHILE_COND_I_PREHEADER:%.*]]
 ; CHECK: while.cond.i.preheader:
 ; CHECK-NEXT: br label [[WHILE_COND_I:%.*]]
@@ -22,10 +20,8 @@ define void @_ZSt6uniqueISt15_Deque_iteratorIdRdPdEET_S4_S4_(ptr %__first, ptr n
 ; CHECK: while.body.i:
 ; CHECK-NEXT: br i1 undef, label [[_ZST13ADJACENT_FINDIST15_DEQUE_ITERATORIDRDPDEET_S4_S4__EXIT]], label [[WHILE_COND_I]]
 ; CHECK: _ZSt13adjacent_findISt15_Deque_iteratorIdRdPdEET_S4_S4_.exit:
-; CHECK-NEXT: [[TMP3:%.*]] = phi ptr [ [[TMP2]], [[ENTRY:%.*]] ], [ [[TMP2]], [[WHILE_COND_I]] ], [ undef, [[WHILE_BODY_I]] ]
-; CHECK-NEXT: [[TMP4:%.*]] = phi ptr [ [[TMP0]], [[ENTRY]] ], [ [[TMP1]], [[WHILE_COND_I]] ], [ undef, [[WHILE_BODY_I]] ]
-; CHECK-NEXT: store ptr [[TMP4]], ptr [[__FIRST]], align 8
-; CHECK-NEXT: store ptr [[TMP3]], ptr [[_M_FIRST3_I_I]], align 8
+; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x ptr> [ [[TMP2]], [[ENTRY:%.*]] ], [ [[TMP1]], [[WHILE_COND_I]] ], [ undef, [[WHILE_BODY_I]] ]
+; CHECK-NEXT: store <2 x ptr> [[TMP3]], ptr [[__FIRST]], align 8
 ; CHECK-NEXT: br i1 undef, label [[IF_THEN_I55:%.*]], label [[WHILE_COND:%.*]]
 ; CHECK: if.then.i55:
 ; CHECK-NEXT: br label [[WHILE_COND]]

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

Lines changed: 12 additions & 15 deletions
@@ -837,21 +837,18 @@ define i32 @maxi8_mutiple_uses(i32) {
 ; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]
 ; THRESH-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], i32 [[TMP3]], i32 [[TMP4]]
 ; THRESH-NEXT: [[TMP7:%.*]] = load <4 x i32>, ptr getelementptr inbounds ([32 x i32], ptr @arr, i64 0, i64 2), align 8
-; THRESH-NEXT: [[TMP8:%.*]] = load i32, ptr getelementptr inbounds ([32 x i32], ptr @arr, i64 0, i64 6), align 8
-; THRESH-NEXT: [[TMP9:%.*]] = load i32, ptr getelementptr inbounds ([32 x i32], ptr @arr, i64 0, i64 7), align 4
-; THRESH-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP7]])
-; THRESH-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> poison, i32 [[TMP10]], i32 0
-; THRESH-NEXT: [[TMP12:%.*]] = insertelement <2 x i32> [[TMP11]], i32 [[TMP9]], i32 1
-; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> poison, i32 [[TMP8]], i32 0
-; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP6]], i32 1
-; THRESH-NEXT: [[TMP15:%.*]] = icmp sgt <2 x i32> [[TMP12]], [[TMP14]]
-; THRESH-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP15]], <2 x i32> [[TMP12]], <2 x i32> [[TMP14]]
-; THRESH-NEXT: [[TMP17:%.*]] = extractelement <2 x i32> [[TMP16]], i32 0
-; THRESH-NEXT: [[TMP18:%.*]] = extractelement <2 x i32> [[TMP16]], i32 1
-; THRESH-NEXT: [[OP_RDX4:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
-; THRESH-NEXT: [[OP_RDX5:%.*]] = select i1 [[OP_RDX4]], i32 [[TMP17]], i32 [[TMP18]]
-; THRESH-NEXT: [[TMP19:%.*]] = select i1 [[TMP5]], i32 3, i32 4
-; THRESH-NEXT: store i32 [[TMP19]], ptr @var, align 8
+; THRESH-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP7]])
+; THRESH-NEXT: [[TMP9:%.*]] = load <2 x i32>, ptr getelementptr inbounds ([32 x i32], ptr @arr, i64 0, i64 6), align 8
+; THRESH-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 [[TMP8]], i32 0
+; THRESH-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> [[TMP9]], i32 [[TMP6]], i32 1
+; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt <2 x i32> [[TMP10]], [[TMP11]]
+; THRESH-NEXT: [[TMP13:%.*]] = select <2 x i1> [[TMP12]], <2 x i32> [[TMP10]], <2 x i32> [[TMP11]]
+; THRESH-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP13]], i32 0
+; THRESH-NEXT: [[TMP15:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1
+; THRESH-NEXT: [[OP_RDX4:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
+; THRESH-NEXT: [[OP_RDX5:%.*]] = select i1 [[OP_RDX4]], i32 [[TMP14]], i32 [[TMP15]]
+; THRESH-NEXT: [[TMP16:%.*]] = select i1 [[TMP5]], i32 3, i32 4
+; THRESH-NEXT: store i32 [[TMP16]], ptr @var, align 8
 ; THRESH-NEXT: ret i32 [[OP_RDX5]]
 ;
 %2 = load i32, ptr @arr, align 16

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

Lines changed: 7 additions & 14 deletions
@@ -100,21 +100,14 @@ define <4 x float> @PR16739_byref_alt(ptr nocapture readonly dereferenceable(16)

 define <4 x float> @PR16739_byval(ptr nocapture readonly dereferenceable(16) %x) {
 ; CHECK-LABEL: @PR16739_byval(
-; CHECK-NEXT: [[T1:%.*]] = load i64, ptr [[X:%.*]], align 16
-; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, ptr [[X]], i64 0, i64 2
-; CHECK-NEXT: [[T4:%.*]] = load i64, ptr [[T2]], align 8
-; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32
-; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float
-; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> poison, float [[T6]], i32 0
+; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, ptr [[X:%.*]], align 16
+; CHECK-NEXT: [[T1:%.*]] = load i64, ptr [[X]], align 16
 ; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32
-; CHECK-NEXT: [[T9:%.*]] = trunc i64 [[T8]] to i32
-; CHECK-NEXT: [[T10:%.*]] = bitcast i32 [[T9]] to float
-; CHECK-NEXT: [[T11:%.*]] = insertelement <4 x float> [[T7]], float [[T10]], i32 1
-; CHECK-NEXT: [[T12:%.*]] = trunc i64 [[T4]] to i32
-; CHECK-NEXT: [[T13:%.*]] = bitcast i32 [[T12]] to float
-; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T11]], float [[T13]], i32 2
-; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[T13]], i32 3
-; CHECK-NEXT: ret <4 x float> [[T15]]
+; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> poison, <4 x i32> <i32 0, i32 poison, i32 1, i32 1>
+; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i64> [[TMP2]], i64 [[T8]], i32 1
+; CHECK-NEXT: [[TMP4:%.*]] = trunc <4 x i64> [[TMP3]] to <4 x i32>
+; CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x i32> [[TMP4]] to <4 x float>
+; CHECK-NEXT: ret <4 x float> [[TMP5]]
 ;
 %t1 = load i64, ptr %x, align 16
 %t2 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 2

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

Lines changed: 7 additions & 14 deletions
@@ -100,21 +100,14 @@ define <4 x float> @PR16739_byref_alt(ptr nocapture readonly dereferenceable(16)

 define <4 x float> @PR16739_byval(ptr nocapture readonly dereferenceable(16) %x) {
 ; CHECK-LABEL: @PR16739_byval(
-; CHECK-NEXT: [[T1:%.*]] = load i64, ptr [[X:%.*]], align 16
-; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, ptr [[X]], i64 0, i64 2
-; CHECK-NEXT: [[T4:%.*]] = load i64, ptr [[T2]], align 8
-; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32
-; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float
-; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> undef, float [[T6]], i32 0
+; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, ptr [[X:%.*]], align 16
+; CHECK-NEXT: [[T1:%.*]] = load i64, ptr [[X]], align 16
 ; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32
-; CHECK-NEXT: [[T9:%.*]] = trunc i64 [[T8]] to i32
-; CHECK-NEXT: [[T10:%.*]] = bitcast i32 [[T9]] to float
-; CHECK-NEXT: [[T11:%.*]] = insertelement <4 x float> [[T7]], float [[T10]], i32 1
-; CHECK-NEXT: [[T12:%.*]] = trunc i64 [[T4]] to i32
-; CHECK-NEXT: [[T13:%.*]] = bitcast i32 [[T12]] to float
-; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T11]], float [[T13]], i32 2
-; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[T13]], i32 3
-; CHECK-NEXT: ret <4 x float> [[T15]]
+; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> poison, <4 x i32> <i32 0, i32 poison, i32 1, i32 1>
+; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i64> [[TMP2]], i64 [[T8]], i32 1
+; CHECK-NEXT: [[TMP4:%.*]] = trunc <4 x i64> [[TMP3]] to <4 x i32>
+; CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x i32> [[TMP4]] to <4 x float>
+; CHECK-NEXT: ret <4 x float> [[TMP5]]
 ;
 %t1 = load i64, ptr %x, align 16
 %t2 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 2
