[RISCV] Rematerialize vmv.v.x #107993

Merged
merged 2 commits into llvm:main on Sep 11, 2024

Conversation

@lukel97 (Contributor) commented Sep 10, 2024

Even though vmv.v.x has a non-constant scalar operand, we can still rematerialize it because we have split register allocation between vectors and scalars.
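
To make this concrete, here is an illustrative before/after sketch (hand-written for this summary, not output from the patch): because the scalar source sits in a GPR, which vector register allocation never disturbs, the splat can simply be recomputed instead of travelling through the stack.

    # Without rematerialization: the splat is spilled around a
    # high-pressure region and reloaded later.
    vmv.v.x v8, a0
    vs8r.v  v8, (a1)            # spill
    ...
    vl8r.v  v8, (a1)            # reload

    # With rematerialization: as long as a0 is still live, the splat is
    # recomputed on the spot, avoiding the stack traffic entirely.
    vmv.v.x v8, a0
    ...
    vmv.v.x v8, a0              # rematerialized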

Program            regalloc.NumSpills                regalloc.NumReloads                regalloc.NumReMaterialization
                   lhs                rhs      diff  lhs                 rhs      diff  lhs                           rhs      diff
          657.xz_s   289.00             292.00  1.0%   505.00              484.00 -4.2%   613.00                        612.00 -0.2%
          557.xz_r   289.00             292.00  1.0%   505.00              484.00 -4.2%   613.00                        612.00 -0.2%
         505.mcf_r   141.00             141.00  0.0%   372.00              372.00  0.0%   123.00                        123.00  0.0%
       641.leela_s   356.00             356.00  0.0%   525.00              525.00  0.0%   801.00                        801.00  0.0%
        625.x264_s  1886.00            1886.00  0.0%  4561.00             4561.00  0.0%  2108.00                       2108.00  0.0%
   623.xalancbmk_s  1548.00            1548.00  0.0%  2466.00             2466.00  0.0% 13983.00                      13983.00  0.0%
     620.omnetpp_s   946.00             946.00  0.0%  1485.00             1485.00  0.0%  8413.00                       8413.00  0.0%
         605.mcf_s   141.00             141.00  0.0%   372.00              372.00  0.0%   123.00                        123.00  0.0%
       541.leela_r   356.00             356.00  0.0%   525.00              525.00  0.0%   801.00                        801.00  0.0%
        525.x264_r  1886.00            1886.00  0.0%  4561.00             4561.00  0.0%  2108.00                       2108.00  0.0%
      510.parest_r 42740.00           42740.00  0.0% 82400.00            82400.00  0.0% 65165.00                      65165.00  0.0%
     520.omnetpp_r   946.00             946.00  0.0%  1485.00             1485.00  0.0%  8413.00                       8413.00  0.0%
        508.namd_r  6598.00            6598.00  0.0% 15509.00            15509.00  0.0%  3164.00                       3164.00  0.0%
         644.nab_s   753.00             753.00  0.0%  1183.00             1183.00  0.0%  1559.00                       1559.00  0.0%
         619.lbm_s    68.00              68.00  0.0%    70.00               70.00  0.0%    20.00                         20.00  0.0%
         544.nab_r   753.00             753.00  0.0%  1183.00             1183.00  0.0%  1559.00                       1559.00  0.0%
         519.lbm_r    73.00              73.00  0.0%    75.00               75.00  0.0%    18.00                         18.00  0.0%
      511.povray_r  1937.00            1937.00  0.0%  3629.00             3629.00  0.0%  4914.00                       4914.00  0.0%
   523.xalancbmk_r  1548.00            1548.00  0.0%  2466.00             2466.00  0.0% 13983.00                      13983.00  0.0%
         502.gcc_r 12450.00           12446.00 -0.0% 27328.00            27312.00 -0.1% 50527.00                      50533.00  0.0%
         602.gcc_s 12450.00           12446.00 -0.0% 27328.00            27312.00 -0.1% 50527.00                      50533.00  0.0%
   500.perlbench_r  4178.00            4175.00 -0.1%  9162.00             9061.00 -1.1% 10223.00                      10392.00  1.7%
   600.perlbench_s  4178.00            4175.00 -0.1%  9162.00             9061.00 -1.1% 10223.00                      10392.00  1.7%
     526.blender_r 13105.00           13081.00 -0.2% 26478.00            26438.00 -0.2% 65188.00                      65230.00  0.1%
     638.imagick_s  4181.00            4157.00 -0.6% 11342.00            11316.00 -0.2% 10884.00                      10938.00  0.5%
     538.imagick_r  4181.00            4157.00 -0.6% 11342.00            11316.00 -0.2% 10884.00                      10938.00  0.5%
   531.deepsjeng_r   353.00             345.00 -2.3%   682.00              674.00 -1.2%   530.00                        538.00  1.5%
   631.deepsjeng_s   353.00             345.00 -2.3%   682.00              674.00 -1.2%   530.00                        538.00  1.5%
Geomean difference                             -0.1%                              -0.5%                                         0.3%

The slight increase in spills in the xz benchmarks comes from scalar spills (presumably due to more uses of the scalar operand affecting the spill weights); we still manage to remove some vector spills there too.

InlineSpiller checks that the scalar operand is live at the point where the rematerialization occurs, so this won't extend any scalar live ranges. However, it also means we may not be able to rematerialize in some cases, as shown in the @vmv.v.x_needs_extended test.
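
For intuition, here is a hypothetical reduction of that kind of case (a hand-written sketch, not the actual @vmv.v.x_needs_extended test, whose contents are truncated from the diff below). The scalar %x has no uses after the splat, so its live range ends there, and rematerializing the resulting vmv.v.x at a later reload point would require extending it:

    ; Hypothetical example: %x's last use is the splat itself, so the
    ; vmv.v.x produced for %splat can't be rematerialized later on
    ; without extending %x's scalar live range.
    define <vscale x 8 x i64> @sketch(i64 %x, ptr %p) {
      %head = insertelement <vscale x 8 x i64> poison, i64 %x, i32 0
      %splat = shufflevector <vscale x 8 x i64> %head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
      ; ... enough vector pressure here that %splat gets spilled ...
      %v = load <vscale x 8 x i64>, ptr %p
      %sum = add <vscale x 8 x i64> %splat, %v
      ret <vscale x 8 x i64> %sum
    }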

It might be worthwhile teaching InlineSpiller to extend scalar live ranges in a future patch. I experimented with this locally and it reduced spills on 531.deepsjeng_r by a further 3%.
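
For the curious, a rough pseudo-C++ sketch of what that follow-up might do. allUsesAvailableAt and extendToIndices are existing LiveRangeEdit/LiveIntervals interfaces, but the surrounding logic and the isCheapToExtend helper are hypothetical, not the actual InlineSpiller code:

    // Illustrative only: rather than refusing to rematerialize when an
    // operand isn't live at the use point, extend that operand's range.
    if (!Edit.allUsesAvailableAt(&OrigMI, OrigIdx, UseIdx)) {
      for (const MachineOperand &MO : OrigMI.uses()) {
        if (!MO.isReg() || !MO.getReg().isVirtual())
          continue;
        // Hypothetical profitability check: only extend cheap scalar
        // ranges, so we don't pessimize scalar register pressure.
        if (isCheapToExtend(MO.getReg(), UseIdx))
          LIS.extendToIndices(LIS.getInterval(MO.getReg()), {UseIdx});
      }
    }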

@llvmbot (Member) commented Sep 10, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes


Patch is 35.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107993.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td (+1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll (+66-91)
  • (modified) llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll (+91-105)
  • (modified) llvm/test/CodeGen/RISCV/rvv/remat.ll (+141)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 325a50c9f48a1c..2bb9df4ead0e9c 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -169,6 +169,7 @@ Register RISCVInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
 bool RISCVInstrInfo::isReallyTriviallyReMaterializable(
     const MachineInstr &MI) const {
   switch (RISCV::getRVVMCOpcode(MI.getOpcode())) {
+  case RISCV::VMV_V_X:
   case RISCV::VMV_V_I:
   case RISCV::VID_V:
     if (MI.getOperand(1).isUndef() &&
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
index e11f176bfe6041..c6cecb7d07182f 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
@@ -2475,6 +2475,7 @@ multiclass VPseudoUnaryVMV_V_X_I {
         def "_V_" # mx : VPseudoUnaryNoMask<m.vrclass, m.vrclass>,
                          SchedUnary<"WriteVIMovV", "ReadVIMovV", mx,
                                     forcePassthruRead=true>;
+        let isReMaterializable = 1 in
         def "_X_" # mx : VPseudoUnaryNoMask<m.vrclass, GPR>,
                          SchedUnary<"WriteVIMovX", "ReadVIMovX", mx,
                                     forcePassthruRead=true>;
diff --git a/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll b/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
index 01aac122d5957d..7031f93edc2c3e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
@@ -2022,14 +2022,9 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    mul a1, a1, a2
 ; RV32-NEXT:    sub sp, sp, a1
 ; RV32-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x38, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 56 * vlenb
-; RV32-NEXT:    vmv1r.v v24, v0
+; RV32-NEXT:    vmv1r.v v7, v0
 ; RV32-NEXT:    csrr a1, vlenb
-; RV32-NEXT:    slli a1, a1, 5
-; RV32-NEXT:    add a1, sp, a1
-; RV32-NEXT:    addi a1, a1, 16
-; RV32-NEXT:    vs8r.v v16, (a1) # Unknown-size Folded Spill
-; RV32-NEXT:    csrr a1, vlenb
-; RV32-NEXT:    li a2, 48
+; RV32-NEXT:    li a2, 40
 ; RV32-NEXT:    mul a1, a1, a2
 ; RV32-NEXT:    add a1, sp, a1
 ; RV32-NEXT:    addi a1, a1, 16
@@ -2045,7 +2040,7 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
 ; RV32-NEXT:    vsrl.vi v8, v16, 1, v0.t
 ; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    li a4, 40
+; RV32-NEXT:    li a4, 48
 ; RV32-NEXT:    mul a3, a3, a4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
@@ -2053,67 +2048,53 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    lui a3, 349525
 ; RV32-NEXT:    addi a3, a3, 1365
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT:    vmv.v.x v16, a3
-; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    li a4, 24
-; RV32-NEXT:    mul a3, a3, a4
-; RV32-NEXT:    add a3, sp, a3
-; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vs8r.v v16, (a3) # Unknown-size Folded Spill
+; RV32-NEXT:    vmv.v.x v8, a3
 ; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    li a4, 40
-; RV32-NEXT:    mul a3, a3, a4
+; RV32-NEXT:    slli a3, a3, 5
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vl8r.v v8, (a3) # Unknown-size Folded Reload
-; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vand.vv v8, v8, v16, v0.t
+; RV32-NEXT:    vs8r.v v8, (a3) # Unknown-size Folded Spill
 ; RV32-NEXT:    csrr a3, vlenb
 ; RV32-NEXT:    slli a3, a3, 5
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vl8r.v v16, (a3) # Unknown-size Folded Reload
-; RV32-NEXT:    vsub.vv v8, v16, v8, v0.t
+; RV32-NEXT:    vl8r.v v8, (a3) # Unknown-size Folded Reload
 ; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    slli a3, a3, 5
+; RV32-NEXT:    li a4, 48
+; RV32-NEXT:    mul a3, a3, a4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vs8r.v v8, (a3) # Unknown-size Folded Spill
+; RV32-NEXT:    vl8r.v v24, (a3) # Unknown-size Folded Reload
+; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
+; RV32-NEXT:    vand.vv v8, v24, v8, v0.t
+; RV32-NEXT:    vsub.vv v16, v16, v8, v0.t
 ; RV32-NEXT:    lui a3, 209715
 ; RV32-NEXT:    addi a3, a3, 819
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT:    vmv.v.x v16, a3
-; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    slli a3, a3, 5
-; RV32-NEXT:    add a3, sp, a3
-; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vl8r.v v8, (a3) # Unknown-size Folded Reload
-; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vand.vv v8, v8, v16, v0.t
+; RV32-NEXT:    vmv.v.x v8, a3
 ; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    slli a3, a3, 4
+; RV32-NEXT:    li a4, 48
+; RV32-NEXT:    mul a3, a3, a4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
 ; RV32-NEXT:    vs8r.v v8, (a3) # Unknown-size Folded Spill
 ; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    slli a3, a3, 5
+; RV32-NEXT:    li a4, 48
+; RV32-NEXT:    mul a3, a3, a4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
 ; RV32-NEXT:    vl8r.v v8, (a3) # Unknown-size Folded Reload
-; RV32-NEXT:    vsrl.vi v8, v8, 2, v0.t
+; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
+; RV32-NEXT:    vand.vv v8, v16, v8, v0.t
+; RV32-NEXT:    vsrl.vi v16, v16, 2, v0.t
 ; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    li a4, 40
+; RV32-NEXT:    li a4, 48
 ; RV32-NEXT:    mul a3, a3, a4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vs8r.v v16, (a3) # Unknown-size Folded Spill
-; RV32-NEXT:    vand.vv v8, v8, v16, v0.t
-; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    slli a3, a3, 4
-; RV32-NEXT:    add a3, sp, a3
-; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vl8r.v v16, (a3) # Unknown-size Folded Reload
-; RV32-NEXT:    vadd.vv v8, v16, v8, v0.t
+; RV32-NEXT:    vl8r.v v24, (a3) # Unknown-size Folded Reload
+; RV32-NEXT:    vand.vv v16, v16, v24, v0.t
+; RV32-NEXT:    vadd.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    vsrl.vi v16, v8, 4, v0.t
 ; RV32-NEXT:    vadd.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    lui a3, 61681
@@ -2121,25 +2102,26 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
 ; RV32-NEXT:    vmv.v.x v16, a3
 ; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    slli a3, a3, 5
+; RV32-NEXT:    li a4, 24
+; RV32-NEXT:    mul a3, a3, a4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
 ; RV32-NEXT:    vs8r.v v16, (a3) # Unknown-size Folded Spill
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vand.vv v16, v8, v16, v0.t
+; RV32-NEXT:    vand.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    lui a3, 4112
 ; RV32-NEXT:    addi a3, a3, 257
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT:    vmv.v.x v8, a3
+; RV32-NEXT:    vmv.v.x v16, a3
 ; RV32-NEXT:    csrr a3, vlenb
 ; RV32-NEXT:    slli a3, a3, 4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vs8r.v v8, (a3) # Unknown-size Folded Spill
+; RV32-NEXT:    vs8r.v v16, (a3) # Unknown-size Folded Spill
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vmul.vv v16, v16, v8, v0.t
+; RV32-NEXT:    vmul.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    li a2, 56
-; RV32-NEXT:    vsrl.vx v8, v16, a2, v0.t
+; RV32-NEXT:    vsrl.vx v8, v8, a2, v0.t
 ; RV32-NEXT:    csrr a3, vlenb
 ; RV32-NEXT:    slli a3, a3, 3
 ; RV32-NEXT:    add a3, sp, a3
@@ -2149,8 +2131,8 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:  # %bb.1:
 ; RV32-NEXT:    mv a0, a1
 ; RV32-NEXT:  .LBB46_2:
-; RV32-NEXT:    vmv1r.v v0, v24
-; RV32-NEXT:    li a3, 48
+; RV32-NEXT:    vmv1r.v v0, v7
+; RV32-NEXT:    li a3, 40
 ; RV32-NEXT:    mul a1, a1, a3
 ; RV32-NEXT:    add a1, sp, a1
 ; RV32-NEXT:    addi a1, a1, 16
@@ -2160,71 +2142,64 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    addi a0, sp, 16
 ; RV32-NEXT:    vs8r.v v8, (a0) # Unknown-size Folded Spill
 ; RV32-NEXT:    csrr a0, vlenb
-; RV32-NEXT:    li a1, 24
-; RV32-NEXT:    mul a0, a0, a1
-; RV32-NEXT:    add a0, sp, a0
-; RV32-NEXT:    addi a0, a0, 16
-; RV32-NEXT:    vl8r.v v16, (a0) # Unknown-size Folded Reload
-; RV32-NEXT:    addi a0, sp, 16
-; RV32-NEXT:    vl8r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-NEXT:    vand.vv v16, v8, v16, v0.t
-; RV32-NEXT:    csrr a0, vlenb
-; RV32-NEXT:    li a1, 48
-; RV32-NEXT:    mul a0, a0, a1
+; RV32-NEXT:    slli a0, a0, 5
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
 ; RV32-NEXT:    vl8r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-NEXT:    vsub.vv v8, v8, v16, v0.t
+; RV32-NEXT:    addi a0, sp, 16
+; RV32-NEXT:    vl8r.v v16, (a0) # Unknown-size Folded Reload
+; RV32-NEXT:    vand.vv v8, v16, v8, v0.t
 ; RV32-NEXT:    csrr a0, vlenb
-; RV32-NEXT:    li a1, 48
+; RV32-NEXT:    li a1, 40
 ; RV32-NEXT:    mul a0, a0, a1
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
-; RV32-NEXT:    vs8r.v v8, (a0) # Unknown-size Folded Spill
+; RV32-NEXT:    vl8r.v v16, (a0) # Unknown-size Folded Reload
+; RV32-NEXT:    vsub.vv v8, v16, v8, v0.t
 ; RV32-NEXT:    csrr a0, vlenb
 ; RV32-NEXT:    li a1, 40
 ; RV32-NEXT:    mul a0, a0, a1
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
-; RV32-NEXT:    vl8r.v v16, (a0) # Unknown-size Folded Reload
+; RV32-NEXT:    vs8r.v v8, (a0) # Unknown-size Folded Spill
 ; RV32-NEXT:    csrr a0, vlenb
 ; RV32-NEXT:    li a1, 48
 ; RV32-NEXT:    mul a0, a0, a1
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
 ; RV32-NEXT:    vl8r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-NEXT:    vand.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    csrr a0, vlenb
-; RV32-NEXT:    li a1, 24
+; RV32-NEXT:    li a1, 40
 ; RV32-NEXT:    mul a0, a0, a1
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
-; RV32-NEXT:    vs8r.v v8, (a0) # Unknown-size Folded Spill
+; RV32-NEXT:    vl8r.v v16, (a0) # Unknown-size Folded Reload
+; RV32-NEXT:    vand.vv v16, v16, v8, v0.t
 ; RV32-NEXT:    csrr a0, vlenb
-; RV32-NEXT:    li a1, 48
-; RV32-NEXT:    mul a0, a0, a1
+; RV32-NEXT:    slli a0, a0, 5
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
-; RV32-NEXT:    vl8r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-NEXT:    vsrl.vi v16, v8, 2, v0.t
+; RV32-NEXT:    vs8r.v v16, (a0) # Unknown-size Folded Spill
+; RV32-NEXT:    vmv8r.v v16, v8
 ; RV32-NEXT:    csrr a0, vlenb
 ; RV32-NEXT:    li a1, 40
 ; RV32-NEXT:    mul a0, a0, a1
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
 ; RV32-NEXT:    vl8r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-NEXT:    vand.vv v16, v16, v8, v0.t
+; RV32-NEXT:    vsrl.vi v8, v8, 2, v0.t
+; RV32-NEXT:    vand.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    csrr a0, vlenb
-; RV32-NEXT:    li a1, 24
-; RV32-NEXT:    mul a0, a0, a1
+; RV32-NEXT:    slli a0, a0, 5
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
-; RV32-NEXT:    vl8r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-NEXT:    vadd.vv v8, v8, v16, v0.t
+; RV32-NEXT:    vl8r.v v16, (a0) # Unknown-size Folded Reload
+; RV32-NEXT:    vadd.vv v8, v16, v8, v0.t
 ; RV32-NEXT:    vsrl.vi v16, v8, 4, v0.t
 ; RV32-NEXT:    vadd.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    csrr a0, vlenb
-; RV32-NEXT:    slli a0, a0, 5
+; RV32-NEXT:    li a1, 24
+; RV32-NEXT:    mul a0, a0, a1
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
 ; RV32-NEXT:    vl8r.v v16, (a0) # Unknown-size Folded Reload
@@ -2386,23 +2361,23 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64_unmasked(<vscale x 16 x i64> %va,
 ; RV32-NEXT:    vs8r.v v0, (a3) # Unknown-size Folded Spill
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
 ; RV32-NEXT:    vand.vv v24, v24, v0
-; RV32-NEXT:    vsub.vv v24, v16, v24
+; RV32-NEXT:    vsub.vv v16, v16, v24
 ; RV32-NEXT:    lui a3, 209715
 ; RV32-NEXT:    addi a3, a3, 819
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
 ; RV32-NEXT:    vmv.v.x v0, a3
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vand.vv v16, v24, v0
-; RV32-NEXT:    vsrl.vi v24, v24, 2
+; RV32-NEXT:    vand.vv v24, v16, v0
+; RV32-NEXT:    vsrl.vi v16, v16, 2
 ; RV32-NEXT:    csrr a3, vlenb
 ; RV32-NEXT:    slli a3, a3, 4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
 ; RV32-NEXT:    vs8r.v v0, (a3) # Unknown-size Folded Spill
-; RV32-NEXT:    vand.vv v24, v24, v0
-; RV32-NEXT:    vadd.vv v24, v16, v24
-; RV32-NEXT:    vsrl.vi v16, v24, 4
+; RV32-NEXT:    vand.vv v16, v16, v0
 ; RV32-NEXT:    vadd.vv v16, v24, v16
+; RV32-NEXT:    vsrl.vi v24, v16, 4
+; RV32-NEXT:    vadd.vv v16, v16, v24
 ; RV32-NEXT:    lui a3, 61681
 ; RV32-NEXT:    addi a3, a3, -241
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
@@ -2437,16 +2412,16 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64_unmasked(<vscale x 16 x i64> %va,
 ; RV32-NEXT:    addi a0, a0, 16
 ; RV32-NEXT:    vl8r.v v0, (a0) # Unknown-size Folded Reload
 ; RV32-NEXT:    vand.vv v24, v24, v0
-; RV32-NEXT:    vsub.vv v24, v8, v24
+; RV32-NEXT:    vsub.vv v8, v8, v24
 ; RV32-NEXT:    csrr a0, vlenb
 ; RV32-NEXT:    slli a0, a0, 4
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
 ; RV32-NEXT:    vl8r.v v0, (a0) # Unknown-size Folded Reload
-; RV32-NEXT:    vand.vv v8, v24, v0
-; RV32-NEXT:    vsrl.vi v24, v24, 2
-; RV32-NEXT:    vand.vv v24, v24, v0
-; RV32-NEXT:    vadd.vv v8, v8, v24
+; RV32-NEXT:    vand.vv v24, v8, v0
+; RV32-NEXT:    vsrl.vi v8, v8, 2
+; RV32-NEXT:    vand.vv v8, v8, v0
+; RV32-NEXT:    vadd.vv v8, v24, v8
 ; RV32-NEXT:    vsrl.vi v24, v8, 4
 ; RV32-NEXT:    vadd.vv v8, v8, v24
 ; RV32-NEXT:    csrr a0, vlenb
diff --git a/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll b/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll
index 0ef0a431dabc43..d36240e493e41d 100644
--- a/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll
@@ -2266,7 +2266,7 @@ define <vscale x 16 x i64> @vp_cttz_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    vnot.v v16, v16, v0.t
 ; RV32-NEXT:    vand.vv v8, v16, v8, v0.t
 ; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    li a5, 40
+; RV32-NEXT:    li a5, 24
 ; RV32-NEXT:    mul a4, a4, a5
 ; RV32-NEXT:    add a4, sp, a4
 ; RV32-NEXT:    addi a4, a4, 16
@@ -2283,12 +2283,18 @@ define <vscale x 16 x i64> @vp_cttz_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    vsetvli a5, zero, e32, m8, ta, ma
 ; RV32-NEXT:    vmv.v.x v8, a4
 ; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    li a5, 24
+; RV32-NEXT:    li a5, 40
 ; RV32-NEXT:    mul a4, a4, a5
 ; RV32-NEXT:    add a4, sp, a4
 ; RV32-NEXT:    addi a4, a4, 16
 ; RV32-NEXT:    vs8r.v v8, (a4) # Unknown-size Folded Spill
 ; RV32-NEXT:    csrr a4, vlenb
+; RV32-NEXT:    li a5, 40
+; RV32-NEXT:    mul a4, a4, a5
+; RV32-NEXT:    add a4, sp, a4
+; RV32-NEXT:    addi a4, a4, 16
+; RV32-NEXT:    vl8r.v v8, (a4) # Unknown-size Folded Reload
+; RV32-NEXT:    csrr a4, vlenb
 ; RV32-NEXT:    li a5, 48
 ; RV32-NEXT:    mul a4, a4, a5
 ; RV32-NEXT:    add a4, sp, a4
@@ -2297,55 +2303,51 @@ define <vscale x 16 x i64> @vp_cttz_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    vsetvli zero, a3, e64, m8, ta, ma
 ; RV32-NEXT:    vand.vv v8, v16, v8, v0.t
 ; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    li a5, 40
+; RV32-NEXT:    li a5, 24
 ; RV32-NEXT:    mul a4, a4, a5
 ; RV32-NEXT:    add a4, sp, a4
 ; RV32-NEXT:    addi a4, a4, 16
 ; RV32-NEXT:    vl8r.v v16, (a4) # Unknown-size Folded Reload
-; RV32-NEXT:    vsub.vv v8, v16, v8, v0.t
+; RV32-NEXT:    vsub.vv v16, v16, v8, v0.t
+; RV32-NEXT:    lui a4, 209715
+; RV32-NEXT:    addi a4, a4, 819
+; RV32-NEXT:    vsetvli a5, zero, e32, m8, ta, ma
+; RV32-NEXT:    vmv.v.x v8, a4
 ; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    li a5, 40
+; RV32-NEXT:    li a5, 48
 ; RV32-NEXT:    mul a4, a4, a5
 ; RV32-NEXT:    add a4, sp, a4
 ; RV32-NEXT:    addi a4, a4, 16
 ; RV32-NEXT:    vs8r.v v8, (a4) # Unknown-size Folded Spill
-; RV32-NEXT:    lui a4, 209715
-; RV32-NEXT:    addi a4, a4, 819
-; RV32-NEXT:    vsetvli a5, zero, e32, m8, ta, ma
-; RV32-NEXT:    vmv.v.x v16, a4
 ; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    li a5, 40
+; RV32-NEXT:    li a5, 48
 ; RV32-NEXT:    mul a4, a4, a5
 ; RV32-NEXT:    add a4, sp, a4
 ; RV32-NEXT:    addi a4, a4, 16
 ; RV32-NEXT:    vl8r.v v8, (a4) # Unknown-size Folded Reload
 ; RV32-NEXT:    vsetvli zero, a3, e64, m8, ta, ma
-; RV32-NEXT:    vand.vv v8, v8, v16, v0.t
+; RV32-NEXT:    vand.vv v8, v16, v8, v0.t
 ; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    slli a4, a4, 4
+; RV32-NEXT:    li a5, 24
+; RV32-NEXT:    mul a4, a4, a5
 ; RV32-NEXT:    add a4, sp, a4
 ; RV32-NEXT:    addi a4, a4, 16
 ; RV32-NEXT:    vs8r.v v8, (a4) # Unknown-size Folded Spill
+; RV32-NEXT:    vsrl.vi v16, v16, 2, v0.t
 ; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    li a5, 40
+; RV32-NEXT:    li a5, 48
 ; RV32-NEXT:    mul a4, a4, a5
 ; RV32-NEXT:    add a4, sp, a4
 ; RV32-NEXT:    addi a4, a4, 16
 ; RV32-NEXT:    vl8r.v v8, (a4) # Unknown-size Folded Reload
-; RV32-NEXT:    vsrl.vi v8, v8, 2, v0.t
+; RV32-NEXT:    vand.vv v16, v16, v8, v0.t
 ; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    li a5, 48
+; RV32-NEXT:    li a5, 24
 ; RV32-NEXT:    mul a4, a4, a5
 ; RV32-NEXT:    add a4, sp, a4
 ; RV32-NEXT:    addi a4, a4, 16
-; RV32-NEXT:    vs8r.v v16, (a4) # Unknown-size Folded Spill
-; RV32-NEXT:    vand.vv v8, v8, v16, v0.t
-; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    slli a4, a4, 4
-; RV32-NEXT:    add a4, sp, a4
-; RV32-NEXT:    addi a4, a4, 16
-; RV32-NEXT:    vl8r.v v16, (a4) # Unknown-size Folded Reload
-; RV32-NEXT:    vadd.vv v8, v16, v8, v0.t
+; RV32-NEXT:    vl8r.v v8, (a4) # Unknown-size Folded Reload
+; RV32-NEXT:    vadd.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    vsrl.vi v16, v8, 4, v0.t
 ; RV32-NEXT:    vadd.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    lui a4, 61681
@@ -2353,26 +2355,30 @@ define <vscale x 16 x i64> @vp_cttz_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    vsetvli a5, zero, e32, m8, ta, ma
 ; RV32-NEXT:    vmv.v.x v16, a4
 ; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    slli a4, a4, 4
+; RV32-NEXT:    li a5, 24
+; RV32-NEXT:    mul a4, a4, a5
 ; RV32-NEXT:    add a4, sp, a4
 ; RV32-NEXT:    addi a4, a4, 16
 ; RV32-NEXT:    vs8r.v v16, (a4) # Unknown-size Folded Spill
 ; RV32-NEXT:    vsetvli zero, a3, e64, m8, ta, ma
-; RV32-NEXT:    vand.vv v16, v8, v16, v0.t
+; RV32-NEXT:    vand.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    lui a4, 4112
 ; RV32-NEXT:    addi a4, a4, 257
 ; RV32-NEXT:    vsetvli a5, zero, e32, m8, ta, ma
-; RV32-NEXT:    vmv.v.x v8, a4
+; RV32-NEXT:    vmv.v.x v16, a4
 ; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    slli a4, a4, 3
+; RV32-NEXT:    slli a4, a4, 4
 ; RV32-NEXT:    add a4, sp, a4
 ; RV32-NEXT:    addi a4, a4, 16
-; RV32-NEXT:    vs8r.v v8, (a4) # Unknown-size Folded Spill
+; RV32-NEXT:    vs8r.v v16, (a4) # Unknown-size Folded Spill
 ; RV32-NEXT:    vsetvli zero, a3, e64, m8, ta, ma
-; RV32-NEXT:    vmul.vv v16, v16, v8, v0.t
+; RV32-NEXT:    vmul.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    li a3, 56
-; RV32-NEXT:    vsrl.vx v8, v16, a3, v0.t
-; RV32-NEXT:    addi a4, sp, 16
+; RV32-NEXT:    vsrl.vx v8, v8, a3, v0.t
+; RV32-NEXT:    csrr a4, vlenb
+; RV32-NEXT:    slli a4, a4, 3
+; RV32-NEXT:    add a4, sp, a4
+; RV32-NEXT:    addi a4, a4, 16
 ; RV32-NEXT:    vs8r.v v8, (a4) # Unknown-size Folded Spill
 ; RV32-NEXT:    bltu a0, a1, .LBB46_2
 ; RV32-NEXT:  # %bb.1:
@@ -2382,40 +2388,32 @@ define <vscale x 16 x i64> @vp_cttz_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    slli a1, a1, 5
 ; RV32-NEXT:    add a1, sp, a1
 ; RV32-NEXT:    addi a1, a1, 16
-; RV32-NEXT:    vl8r.v v8, (a1) # Unknown-size Folded Reload
+; RV32-NEXT:    vl8r.v v16, (a1) # Unknown-size Folded Reload
 ; RV32-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
-; RV32-NEXT:    vsub.vx v16, v8, a2, v0.t
-; RV32-NEXT:    vnot.v v8, v8, v0.t
-; RV32-NEXT:    vand.vv v8, v8, v16, v0.t
-; RV32-NEXT:    csrr a0, vlenb
-; RV32-NEXT:    slli a0, a0, 5
-; RV32-NEXT:    add a0, sp, a0
-; RV32-NEXT:    addi a0, a0, 16
+; RV32-NEXT:    vsub.vx v8, v16, a2, v0.t
+; RV32-NEXT:    vnot.v v16, v16, v0.t
+; RV32-NEXT:    vand.vv v8, v16, v8, v0.t
+; RV32-NEXT:    addi a0, sp, 16
 ; RV32-NEXT:    vs8r.v v8, (a0) # Unknow...
[truncated]

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 10, 2024
This is the same principle as vmv.v.x in llvm#107993, but for floats.

    Program            regalloc.NumSpills                regalloc.NumReloads                regalloc.NumRemats
                       lhs                rhs      diff  lhs                 rhs      diff  lhs                rhs      diff
             519.lbm_r    73.00              73.00  0.0%    75.00               75.00  0.0%     1.00               1.00  0.0%
             544.nab_r   753.00             753.00  0.0%  1183.00             1183.00  0.0%   318.00             318.00  0.0%
             619.lbm_s    68.00              68.00  0.0%    70.00               70.00  0.0%     1.00               1.00  0.0%
             644.nab_s   753.00             753.00  0.0%  1183.00             1183.00  0.0%   318.00             318.00  0.0%
            508.namd_r  6598.00            6597.00 -0.0% 15509.00            15503.00 -0.0%  2387.00            2393.00  0.3%
         526.blender_r 13105.00           13084.00 -0.2% 26478.00            26443.00 -0.1% 18991.00           18996.00  0.0%
          510.parest_r 42740.00           42665.00 -0.2% 82400.00            82309.00 -0.1%  5612.00            5648.00  0.6%
          511.povray_r  1937.00            1929.00 -0.4%  3629.00             3620.00 -0.2%   517.00             525.00  1.5%
         538.imagick_r  4181.00            4150.00 -0.7% 11342.00            11125.00 -1.9%  3366.00            3366.00  0.0%
         638.imagick_s  4181.00            4150.00 -0.7% 11342.00            11125.00 -1.9%  3366.00            3366.00  0.0%
    Geomean difference                             -0.2%                              -0.4%                              0.2%
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 10, 2024
Continuing with llvm#107993 and llvm#108007, this handles the last of the main rematerializable vector instructions.

    Program            regalloc.NumSpills                regalloc.NumReloads                regalloc.NumRemats
                       lhs                rhs      diff  lhs                 rhs      diff  lhs                rhs      diff
            508.namd_r  6598.00            6598.00  0.0% 15509.00            15509.00  0.0%  2387.00            2387.00  0.0%
             505.mcf_r   141.00             141.00  0.0%   372.00              372.00  0.0%    36.00              36.00  0.0%
           641.leela_s   356.00             356.00  0.0%   525.00              525.00  0.0%   117.00             117.00  0.0%
       631.deepsjeng_s   353.00             353.00  0.0%   682.00              682.00  0.0%   124.00             124.00  0.0%
       623.xalancbmk_s  1548.00            1548.00  0.0%  2466.00             2466.00  0.0%   620.00             620.00  0.0%
         620.omnetpp_s   946.00             946.00  0.0%  1485.00             1485.00  0.0%  1178.00            1178.00  0.0%
             605.mcf_s   141.00             141.00  0.0%   372.00              372.00  0.0%    36.00              36.00  0.0%
              557.xz_r   289.00             289.00  0.0%   505.00              505.00  0.0%   172.00             172.00  0.0%
           541.leela_r   356.00             356.00  0.0%   525.00              525.00  0.0%   117.00             117.00  0.0%
       531.deepsjeng_r   353.00             353.00  0.0%   682.00              682.00  0.0%   124.00             124.00  0.0%
         520.omnetpp_r   946.00             946.00  0.0%  1485.00             1485.00  0.0%  1178.00            1178.00  0.0%
       523.xalancbmk_r  1548.00            1548.00  0.0%  2466.00             2466.00  0.0%   620.00             620.00  0.0%
             619.lbm_s    68.00              68.00  0.0%    70.00               70.00  0.0%     1.00               1.00  0.0%
             519.lbm_r    73.00              73.00  0.0%    75.00               75.00  0.0%     1.00               1.00  0.0%
              657.xz_s   289.00             289.00  0.0%   505.00              505.00  0.0%   172.00             172.00  0.0%
          511.povray_r  1937.00            1936.00 -0.1%  3629.00             3628.00 -0.0%   517.00             518.00  0.2%
             502.gcc_r 12450.00           12442.00 -0.1% 27328.00            27317.00 -0.0%  9409.00            9409.00  0.0%
             602.gcc_s 12450.00           12442.00 -0.1% 27328.00            27317.00 -0.0%  9409.00            9409.00  0.0%
         638.imagick_s  4181.00            4178.00 -0.1% 11342.00            11338.00 -0.0%  3366.00            3368.00  0.1%
         538.imagick_r  4181.00            4178.00 -0.1% 11342.00            11338.00 -0.0%  3366.00            3368.00  0.1%
       500.perlbench_r  4178.00            4175.00 -0.1%  9162.00             9159.00 -0.0%  2410.00            2410.00  0.0%
       600.perlbench_s  4178.00            4175.00 -0.1%  9162.00             9159.00 -0.0%  2410.00            2410.00  0.0%
            525.x264_r  1886.00            1884.00 -0.1%  4561.00             4559.00 -0.0%   471.00             471.00  0.0%
            625.x264_s  1886.00            1884.00 -0.1%  4561.00             4559.00 -0.0%   471.00             471.00  0.0%
          510.parest_r 42740.00           42689.00 -0.1% 82400.00            82252.00 -0.2%  5612.00            5620.00  0.1%
             644.nab_s   753.00             752.00 -0.1%  1183.00             1182.00 -0.1%   318.00             318.00  0.0%
             544.nab_r   753.00             752.00 -0.1%  1183.00             1182.00 -0.1%   318.00             318.00  0.0%
         526.blender_r 13105.00           13084.00 -0.2% 26478.00            26442.00 -0.1% 18991.00           18989.00 -0.0%
Geomean difference                             -0.0%                              -0.0%                              0.0%

There's an extra spill in one of the test cases, but it's likely noise from the spill weights and isn't an issue in practice.
@4vtomat (Member) commented Sep 10, 2024

Thanks for the patch, it looks pretty good!
One thing I'm thinking about: could you also add a precommit test for remat.ll, so that I can see the difference between the original version and the rematerialized one?

@lukel97 (Contributor, Author) commented Sep 10, 2024

Thanks for the patch, it looks pretty good! One thing I'm thinking about: could you also add a precommit test for remat.ll, so that I can see the difference between the original version and the rematerialized one?

It should be precommitted in the PR itself, so you can see the diff if you go to files changed > changes from all commits.

I'm not sure what the convention is nowadays after the move to GitHub. Is precommitting directly to main preferred?

@4vtomat (Member) commented Sep 10, 2024

Thanks for the patch, it looks pretty good! One thing I'm thinking about: could you also add a precommit test for remat.ll, so that I can see the difference between the original version and the rematerialized one?

It should be precommitted in the PR itself, so you can see the diff if you go to files changed > changes from all commits.

Oh, I didn't notice it was already there, thanks!

I'm not sure what the convention is nowadays after the move to GitHub. Is precommitting directly to main preferred?

I'm not sure either, but in my opinion it's good to keep the precommit test for people who are not familiar with this part of the code, such as me lol~

@4vtomat (Member) left a comment

LGTM, thanks!

@topperc (Collaborator) left a comment

LGTM

lukel97 merged commit 77fc8da into llvm:main on Sep 11, 2024
10 checks passed
lukel97 added a commit that referenced this pull request Sep 11, 2024
This is the same principle as vmv.v.x in #107993, but for floats.
lukel97 added a commit that referenced this pull request Sep 11, 2024
Continuing with #107993 and #108007, this handles the last of the main
rematerializable vector instructions.

There's an extra spill in one of the test cases, but it's likely noise
from the spill weights and isn't an issue in practice.