
[RISCV][VLOPT] Allow users that are passthrus if tail elements aren't demanded #124066


Merged

lukel97 merged 2 commits into llvm:main from the vloptimizer/passthru-users branch on Jan 30, 2025

Conversation

lukel97
Contributor

@lukel97 lukel97 commented Jan 23, 2025

The motivation for this is to allow reducing the VL when a user is a ternary pseudo, where the third operand is tied and also acts as a passthru.

When checking the users of an instruction, we currently bail if the instruction is used as a passthru operand, because all of its elements past VL will be used for the user's tail.

We can allow passthru users if we know the tail of their result isn't used, which we will have computed beforehand after #124530.

It's worth noting that this is all independent of the tail policy, because tail agnostic still ends up using the passthru.

I've checked that SPEC CPU 2017 + llvm-test-suite pass with this (on qemu with rvv_ta_all_1s=true)

Fixes #123760
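
For reference, the ternary case in question looks like this in IR, adapted from the optimize_ternary_use test added by this PR. The zero-extend feeds the tied third operand of the resulting vmadd.vv, and since the store only demands %vl elements, the vzext.vf2 no longer needs to run at VLMAX:

  define void @optimize_ternary_use(<vscale x 4 x i16> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c, ptr %p, iXLen %vl) {
    %1 = zext <vscale x 4 x i16> %a to <vscale x 4 x i32>   ; vzext.vf2, previously kept at VLMAX
    %2 = mul <vscale x 4 x i32> %b, %1
    %3 = add <vscale x 4 x i32> %2, %c                       ; mul+add select to vmadd.vv with %1 tied
    call void @llvm.riscv.vse(<vscale x 4 x i32> %3, ptr %p, iXLen %vl)   ; only %vl elements are demanded
    ret void
  }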

@llvmbot
Member

llvmbot commented Jan 23, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

The motivation for this is to allow reducing the VL when a user is a ternary pseudo, where the third operand is tied and also acts as a passthru.

When checking the users of an instruction, we currently bail if the instruction is used as a passthru operand, because all of its elements past VL will be used for the user's tail.

We can allow passthru users if we know the tail of their result isn't used, and we can reuse checkUsers to check this.

It's worth noting that this is all independent of the tail policy, because tail agnostic still ends up using the passthru.

I've checked that SPEC CPU 2017 + llvm-test-suite pass with this (on qemu with rvv_ta_all_1s=true)

Fixes #123760


Full diff: https://github.com/llvm/llvm-project/pull/124066.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp (+14-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll (+2-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt.ll (+23)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt.mir (+61)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll (+12-24)
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 5f2d4e0585a0b3..4c6ba26ba6e193 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -53,7 +53,7 @@ class RISCVVLOptimizer : public MachineFunctionPass {
   std::optional<MachineOperand> getMinimumVLForUser(MachineOperand &UserOp);
   /// Returns the largest common VL MachineOperand that may be used to optimize
   /// MI. Returns std::nullopt if it failed to find a suitable VL.
-  std::optional<MachineOperand> checkUsers(MachineInstr &MI);
+  std::optional<MachineOperand> checkUsers(const MachineInstr &MI);
   bool tryReduceVL(MachineInstr &MI);
   bool isCandidate(const MachineInstr &MI) const;
 };
@@ -1221,7 +1221,8 @@ RISCVVLOptimizer::getMinimumVLForUser(MachineOperand &UserOp) {
   return VLOp;
 }
 
-std::optional<MachineOperand> RISCVVLOptimizer::checkUsers(MachineInstr &MI) {
+std::optional<MachineOperand>
+RISCVVLOptimizer::checkUsers(const MachineInstr &MI) {
   // FIXME: Avoid visiting each user for each time we visit something on the
   // worklist, combined with an extra visit from the outer loop. Restructure
   // along lines of an instcombine style worklist which integrates the outer
@@ -1235,16 +1236,21 @@ std::optional<MachineOperand> RISCVVLOptimizer::checkUsers(MachineInstr &MI) {
       return std::nullopt;
     }
 
-    // Tied operands might pass through.
-    if (UserOp.isTied()) {
-      LLVM_DEBUG(dbgs() << "    Abort because user used as tied operand\n");
-      return std::nullopt;
-    }
-
     auto VLOp = getMinimumVLForUser(UserOp);
     if (!VLOp)
       return std::nullopt;
 
+    // If the user is a passthru, we will need to preserve it if its tail is
+    // demanded.
+    if (UserOp.isTied()) {
+      auto DemandedVL = checkUsers(UserMI);
+      if (!DemandedVL || !RISCV::isVLKnownLE(*DemandedVL, *VLOp)) {
+        LLVM_DEBUG(dbgs() << "    Abort because user is passthru in "
+                             "instruction with demanded tail\n");
+        return std::nullopt;
+      }
+    }
+
     // Use the largest VL among all the users. If we cannot determine this
     // statically, then we cannot optimize the VL.
     if (!CommonVL || RISCV::isVLKnownLE(*CommonVL, *VLOp)) {
diff --git a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
index d329979857a6b9..403cc0eb9dce1d 100644
--- a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
@@ -894,9 +894,10 @@ define void @test_dag_loop() {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli a0, zero, e8, m4, ta, ma
 ; CHECK-NEXT:    vmclr.m v0
+; CHECK-NEXT:    vsetivli zero, 0, e8, m4, ta, ma
 ; CHECK-NEXT:    vmv.v.i v8, 0
 ; CHECK-NEXT:    vmv.v.i v12, 0
-; CHECK-NEXT:    vsetivli zero, 0, e8, m4, tu, mu
+; CHECK-NEXT:    vsetvli zero, zero, e8, m4, tu, mu
 ; CHECK-NEXT:    vssubu.vx v12, v8, zero, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e8, m4, ta, ma
 ; CHECK-NEXT:    vmseq.vv v0, v12, v8
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll b/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
index 1cc30f077feb4a..8d44eabafcecfc 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
@@ -150,3 +150,26 @@ define <vscale x 4 x i32> @dont_optimize_tied_def(<vscale x 4 x i32> %a, <vscale
   ret <vscale x 4 x i32> %2
 }
 
+define void @optimize_ternary_use(<vscale x 4 x i16> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c, ptr %p, iXLen %vl) {
+; NOVLOPT-LABEL: optimize_ternary_use:
+; NOVLOPT:       # %bb.0:
+; NOVLOPT-NEXT:    vsetvli a2, zero, e32, m2, ta, ma
+; NOVLOPT-NEXT:    vzext.vf2 v14, v8
+; NOVLOPT-NEXT:    vsetvli zero, a1, e32, m2, ta, ma
+; NOVLOPT-NEXT:    vmadd.vv v14, v10, v12
+; NOVLOPT-NEXT:    vse32.v v14, (a0)
+; NOVLOPT-NEXT:    ret
+;
+; VLOPT-LABEL: optimize_ternary_use:
+; VLOPT:       # %bb.0:
+; VLOPT-NEXT:    vsetvli zero, a1, e32, m2, ta, ma
+; VLOPT-NEXT:    vzext.vf2 v14, v8
+; VLOPT-NEXT:    vmadd.vv v14, v10, v12
+; VLOPT-NEXT:    vse32.v v14, (a0)
+; VLOPT-NEXT:    ret
+  %1 = zext <vscale x 4 x i16> %a to <vscale x 4 x i32>
+  %2 = mul <vscale x 4 x i32> %b, %1
+  %3 = add <vscale x 4 x i32> %2, %c
+  call void @llvm.riscv.vse(<vscale x 4 x i32> %3, ptr %p, iXLen %vl)
+  ret void
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt.mir b/llvm/test/CodeGen/RISCV/rvv/vl-opt.mir
index 027eb8ca3c17f0..5e6af334f02f85 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt.mir
@@ -149,3 +149,64 @@ body: |
     ; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_WV_M1_TIED $noreg, %x, 1, 3 /* e8 */, 0 /* tu, mu */
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
     %y:vrm2 = PseudoVWADD_WV_M1_TIED $noreg, %x, 1, 3 /* e8 */, 0 /* tu, mu */
+...
+---
+# Can reduce %x even though %y uses it as a passthru, because %y's inactive elements aren't demanded
+name: passthru_not_demanded
+body: |
+  bb.0:
+    ; CHECK-LABEL: name: passthru_not_demanded
+    ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 %x, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %z:vr = PseudoVADD_VV_M1 $noreg, %y, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
+    %y:vr = PseudoVADD_VV_M1 %x, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    %z:vr = PseudoVADD_VV_M1 $noreg, %y, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+...
+---
+# Can't reduce %x because %y uses it as a passthru, and %y's inactive elements are demanded by %z
+name: passthru_demanded
+body: |
+  bb.0:
+    ; CHECK-LABEL: name: passthru_demanded
+    ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 %x, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %z:vr = PseudoVADD_VV_M1 $noreg, %y, $noreg, 2, 3 /* e8 */, 0 /* tu, mu */
+    %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
+    %y:vr = PseudoVADD_VV_M1 %x, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    %z:vr = PseudoVADD_VV_M1 $noreg, %y, $noreg, 2, 3 /* e8 */, 0 /* tu, mu */
+...
+---
+# Can reduce %x even though %y uses it as a passthru, because %y's inactive elements aren't demanded
+name: passthru_not_demanded_passthru_chain
+body: |
+  bb.0:
+    ; CHECK-LABEL: name: passthru_not_demanded_passthru_chain
+    ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 %x, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %z:vr = PseudoVADD_VV_M1 %y, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %a:vr = PseudoVADD_VV_M1 %z, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %b:vr = PseudoVADD_VV_M1 $noreg, %a, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
+    %y:vr = PseudoVADD_VV_M1 %x, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    %z:vr = PseudoVADD_VV_M1 %y, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    %a:vr = PseudoVADD_VV_M1 %z, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    %b:vr = PseudoVADD_VV_M1 $noreg, %a, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+...
+---
+# Can't reduce %x because %y uses it as a passthru, and %y's inactive elements are ultimately demanded in %b
+name: passthru_demanded_passthru_chain
+body: |
+  bb.0:
+    ; CHECK-LABEL: name: passthru_demanded_passthru_chain
+    ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 %x, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %z:vr = PseudoVADD_VV_M1 %y, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %a:vr = PseudoVADD_VV_M1 %z, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: %b:vr = PseudoVADD_VV_M1 $noreg, %a, $noreg, 2, 3 /* e8 */, 0 /* tu, mu */
+    %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
+    %y:vr = PseudoVADD_VV_M1 %x, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    %z:vr = PseudoVADD_VV_M1 %y, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    %a:vr = PseudoVADD_VV_M1 %z, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    %b:vr = PseudoVADD_VV_M1 $noreg, %a, $noreg, 2, 3 /* e8 */, 0 /* tu, mu */
+...
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll
index eb74b238c01b95..a29af3d5b54b0f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll
@@ -1638,9 +1638,8 @@ define <vscale x 1 x i64> @vmadd_vx_nxv1i64(<vscale x 1 x i64> %a, i64 %b, <vsca
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m1, ta, ma
-; RV32-NEXT:    vlse64.v v10, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m1, ta, ma
+; RV32-NEXT:    vlse64.v v10, (a0), zero
 ; RV32-NEXT:    vmadd.vv v10, v8, v9
 ; RV32-NEXT:    vsetvli zero, zero, e64, m1, tu, ma
 ; RV32-NEXT:    vmerge.vvm v8, v8, v10, v0
@@ -1669,9 +1668,8 @@ define <vscale x 1 x i64> @vmadd_vx_nxv1i64_unmasked(<vscale x 1 x i64> %a, i64
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m1, ta, ma
-; RV32-NEXT:    vlse64.v v10, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m1, ta, ma
+; RV32-NEXT:    vlse64.v v10, (a0), zero
 ; RV32-NEXT:    vmadd.vv v10, v8, v9
 ; RV32-NEXT:    vsetvli zero, zero, e64, m1, tu, ma
 ; RV32-NEXT:    vmv.v.v v8, v10
@@ -1713,9 +1711,8 @@ define <vscale x 1 x i64> @vmadd_vx_nxv1i64_ta(<vscale x 1 x i64> %a, i64 %b, <v
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m1, ta, ma
-; RV32-NEXT:    vlse64.v v10, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m1, ta, ma
+; RV32-NEXT:    vlse64.v v10, (a0), zero
 ; RV32-NEXT:    vmadd.vv v10, v8, v9
 ; RV32-NEXT:    vmerge.vvm v8, v8, v10, v0
 ; RV32-NEXT:    addi sp, sp, 16
@@ -1776,9 +1773,8 @@ define <vscale x 2 x i64> @vmadd_vx_nxv2i64(<vscale x 2 x i64> %a, i64 %b, <vsca
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m2, ta, ma
-; RV32-NEXT:    vlse64.v v12, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m2, ta, ma
+; RV32-NEXT:    vlse64.v v12, (a0), zero
 ; RV32-NEXT:    vmadd.vv v12, v8, v10
 ; RV32-NEXT:    vsetvli zero, zero, e64, m2, tu, ma
 ; RV32-NEXT:    vmerge.vvm v8, v8, v12, v0
@@ -1807,9 +1803,8 @@ define <vscale x 2 x i64> @vmadd_vx_nxv2i64_unmasked(<vscale x 2 x i64> %a, i64
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m2, ta, ma
-; RV32-NEXT:    vlse64.v v12, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m2, ta, ma
+; RV32-NEXT:    vlse64.v v12, (a0), zero
 ; RV32-NEXT:    vmadd.vv v12, v8, v10
 ; RV32-NEXT:    vsetvli zero, zero, e64, m2, tu, ma
 ; RV32-NEXT:    vmv.v.v v8, v12
@@ -1851,9 +1846,8 @@ define <vscale x 2 x i64> @vmadd_vx_nxv2i64_ta(<vscale x 2 x i64> %a, i64 %b, <v
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m2, ta, ma
-; RV32-NEXT:    vlse64.v v12, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m2, ta, ma
+; RV32-NEXT:    vlse64.v v12, (a0), zero
 ; RV32-NEXT:    vmadd.vv v12, v8, v10
 ; RV32-NEXT:    vmerge.vvm v8, v8, v12, v0
 ; RV32-NEXT:    addi sp, sp, 16
@@ -1914,9 +1908,8 @@ define <vscale x 4 x i64> @vmadd_vx_nxv4i64(<vscale x 4 x i64> %a, i64 %b, <vsca
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m4, ta, ma
-; RV32-NEXT:    vlse64.v v16, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m4, ta, ma
+; RV32-NEXT:    vlse64.v v16, (a0), zero
 ; RV32-NEXT:    vmadd.vv v16, v8, v12
 ; RV32-NEXT:    vsetvli zero, zero, e64, m4, tu, ma
 ; RV32-NEXT:    vmerge.vvm v8, v8, v16, v0
@@ -1945,9 +1938,8 @@ define <vscale x 4 x i64> @vmadd_vx_nxv4i64_unmasked(<vscale x 4 x i64> %a, i64
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m4, ta, ma
-; RV32-NEXT:    vlse64.v v16, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m4, ta, ma
+; RV32-NEXT:    vlse64.v v16, (a0), zero
 ; RV32-NEXT:    vmadd.vv v16, v8, v12
 ; RV32-NEXT:    vsetvli zero, zero, e64, m4, tu, ma
 ; RV32-NEXT:    vmv.v.v v8, v16
@@ -1989,9 +1981,8 @@ define <vscale x 4 x i64> @vmadd_vx_nxv4i64_ta(<vscale x 4 x i64> %a, i64 %b, <v
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m4, ta, ma
-; RV32-NEXT:    vlse64.v v16, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m4, ta, ma
+; RV32-NEXT:    vlse64.v v16, (a0), zero
 ; RV32-NEXT:    vmadd.vv v16, v8, v12
 ; RV32-NEXT:    vmerge.vvm v8, v8, v16, v0
 ; RV32-NEXT:    addi sp, sp, 16
@@ -2054,9 +2045,8 @@ define <vscale x 8 x i64> @vmadd_vx_nxv8i64(<vscale x 8 x i64> %a, i64 %b, <vsca
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m8, ta, ma
-; RV32-NEXT:    vlse64.v v24, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
+; RV32-NEXT:    vlse64.v v24, (a0), zero
 ; RV32-NEXT:    vmadd.vv v24, v8, v16
 ; RV32-NEXT:    vsetvli zero, zero, e64, m8, tu, ma
 ; RV32-NEXT:    vmerge.vvm v8, v8, v24, v0
@@ -2085,9 +2075,8 @@ define <vscale x 8 x i64> @vmadd_vx_nxv8i64_unmasked(<vscale x 8 x i64> %a, i64
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m8, ta, ma
-; RV32-NEXT:    vlse64.v v24, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
+; RV32-NEXT:    vlse64.v v24, (a0), zero
 ; RV32-NEXT:    vmadd.vv v24, v8, v16
 ; RV32-NEXT:    vsetvli zero, zero, e64, m8, tu, ma
 ; RV32-NEXT:    vmv.v.v v8, v24
@@ -2130,9 +2119,8 @@ define <vscale x 8 x i64> @vmadd_vx_nxv8i64_ta(<vscale x 8 x i64> %a, i64 %b, <v
 ; RV32-NEXT:    sw a0, 8(sp)
 ; RV32-NEXT:    sw a1, 12(sp)
 ; RV32-NEXT:    addi a0, sp, 8
-; RV32-NEXT:    vsetvli a1, zero, e64, m8, ta, ma
-; RV32-NEXT:    vlse64.v v24, (a0), zero
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
+; RV32-NEXT:    vlse64.v v24, (a0), zero
 ; RV32-NEXT:    vmadd.vv v24, v8, v16
 ; RV32-NEXT:    vmerge.vvm v8, v8, v24, v0
 ; RV32-NEXT:    addi sp, sp, 16

@@ -53,7 +53,7 @@ class RISCVVLOptimizer : public MachineFunctionPass {
   std::optional<MachineOperand> getMinimumVLForUser(MachineOperand &UserOp);
   /// Returns the largest common VL MachineOperand that may be used to optimize
   /// MI. Returns std::nullopt if it failed to find a suitable VL.
-  std::optional<MachineOperand> checkUsers(MachineInstr &MI);
+  std::optional<MachineOperand> checkUsers(const MachineInstr &MI);
@lukel97
Contributor Author


Needed to tweak the signature here since UserMI is const

Contributor

@michaelmaitland michaelmaitland left a comment


LGTM. Nice find!

@michaelmaitland
Contributor

Do we get any impact if we remove MI.getNumDefs() != 1 in isCandidate? Not sure if that would be related here.

Collaborator

@preames preames left a comment


Hold on, this is now an unbounded recursion. This can cause both stack explosion, and more likely, significant compile time regressions.

I think you can achieve the same thing by adding a transform which uses the result of checkUsers to clear a pass thru to NoRegister. (That is, convert an instruction from TA/TU to tail undefined.) When we visit that instruction, it would propagate up.

@lukel97
Contributor Author

lukel97 commented Jan 23, 2025

Hold on, this is now an unbounded recursion. This can cause both stack explosion, and more likely, significant compile time regressions.

Yeah, agreed the recursion is less than ideal, but it's only unbounded in the length of chains of passthrus feeding other passthrus, which doesn't seem to be common in practice.

I think you can achieve the same thing by adding a transform which uses the result of checkUsers to clear a pass thru to NoRegister. (That is, convert an instruction from TA/TU to tail undefined.) When we visit that instruction, it would propagate up.

Unfortunately there's no way to represent a tail undefined ternary pseudo, because the passthru is also the third operand. I wrote a bit more of a description of this in #123760.

I tried adding new _INACTIVE_UNDEF pseudos which propagate the information at isel that the tail is undef, but the duplication ended up causing a bunch of pain in other places, e.g. we need to handle new opcodes for commuting.

Given that we're checking the demanded VL of the users downwards, and we iterate upwards over each block, it might be possible to memoize the result of checkUsers? We would need to invalidate it whenever we reduce a VL though. I'll have a think about this tonight anyway to see if there's another approach.
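
(For context, a rough sketch of that precomputation idea, roughly the shape #124530 later takes with a DenseMap of demanded VLs and a single forward pass. The structure and names like runOnBlock are illustrative only, not the exact in-tree code.)

  bool RISCVVLOptimizer::runOnBlock(MachineBasicBlock &MBB) {
    // Sketch only. Phase 1: walk the block bottom-up so each instruction's
    // users have already been processed, and cache the largest VL its users
    // demand instead of recomputing it on every visit.
    DenseMap<const MachineInstr *, MachineOperand> DemandedVLs;
    for (MachineInstr &MI : reverse(MBB))
      if (isCandidate(MI))
        if (std::optional<MachineOperand> VL = checkUsers(MI))
          DemandedVLs.insert({&MI, *VL});

    // Phase 2: a single forward pass that shrinks a candidate's VL operand
    // when the cached demanded VL is known to be <= its current VL.
    bool Changed = false;
    for (MachineInstr &MI : MBB)
      if (DemandedVLs.count(&MI))
        Changed |= tryReduceVL(MI);
    return Changed;
  }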

@lukel97
Contributor Author

lukel97 commented Jan 23, 2025

Do we get any impact if we remove MI.getNumDefs() != 1 in isCandidate? Not sure if that would be related here.

Off the top of my head, vsadd etc. have extra implicit defs, but I'm not sure if it's safe to change their VL. Fault-first loads have an extra explicit def, but I don't think they're safe to reduce either

@michaelmaitland
Contributor

michaelmaitland commented Jan 23, 2025

Do we get any impact if we remove MI.getNumDefs() != 1 in isCandidate? Not sure if that would be related here.

Off the top of my head, vsadd etc. have extra implicit defs, but I'm not sure if it's safe to change their VL. Fault-first loads have an extra explicit def, but I don't think they're safe to reduce either

Fault-first loads have an extra explicit def, but I don't think they're safe to reduce either

I had played around with a patch in the past that changed this line to MI.getNumExplicitDefs() != 1 to avoid the fault-first loads. vsadd and the other instructions that write vxsat are not part of isSupportedInstr. I think we could probably remove this line; it was placed here to guard against the passthru. I think this would give you more optimization for passthru lanes that are not read.
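
(For concreteness, roughly the variant being described here, as a sketch assuming it replaces the existing MI.getNumDefs() != 1 bail-out in RISCVVLOptimizer::isCandidate:)

  // Sketch: count only explicit defs. Instructions whose extra def is implicit
  // (e.g. vsadd's vxsat) would no longer be rejected here, while
  // fault-only-first loads, which have a second explicit def for the updated
  // VL, still would be.
  if (MI.getNumExplicitDefs() != 1)
    return false;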

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Jan 27, 2025
This replaces the worklist by instead computing what VL is demanded by each instruction's users first.

checkUsers essentially already did this, so it's been renamed to computeDemandedVL.

The demanded VLs are stored in a DenseMap, and then we can just do a single forward pass of tryReduceVL where we check if a candidate's demanded VL is less than its VLOp.

This means the pass should now be linear in complexity, and allows us to relax the restriction on tied operands more easily, as in llvm#124066.

Note that in order to avoid std::optional inside the DenseMap, I've removed the std::optionals and replaced them with VLMAX or 0 constant operands.
@lukel97
Contributor Author

lukel97 commented Jan 27, 2025

Given that we're checking the demanded VL of the users downwards, and we iterate upwards over each block, it might be possible to memoize the result of checkUsers? We would need to invalidate it whenever we reduce a VL though. I'll have a think about this tonight anyway to see if there's another approach.

The solution I ended up with is in #124530. It turns out we don't need to invalidate anything because the demanded VL never changes: We can compute the minimum possible VL for each instruction up front without needing to mutate anything.

I'll rebase this PR on top of it once it's landed, as well as #124549 which was split off to help explain some diffs.

lukel97 added a commit that referenced this pull request Jan 29, 2025
This replaces the worklist by instead computing what VL is demanded by
each instruction's users first, which is done via checkUsers.

The demanded VLs are stored in a DenseMap, and then we can just do a
single forward pass of tryReduceVL where we check if a candidate's
demanded VL is less than its VLOp.

This means the pass should now be linear in complexity, and allows us to
relax the restriction on tied operands more easily, as in #124066.
@lukel97 lukel97 force-pushed the vloptimizer/passthru-users branch from c6cfbe7 to f77d01d on January 29, 2025 05:09
@lukel97
Contributor Author

lukel97 commented Jan 29, 2025

I've rebased this on top of #124530, so now there's no recursion

Collaborator

@preames preames left a comment


LGTM

Only point I'm still a bit hesitant on is whether we have some instruction with "weird" passthru usage (i.e. read after VL, etc.). I can't think of any, and spot-checking existing code doesn't seem to reveal any special casing, so I think we're okay here. Both passthrus for TU/MU and FMA-style extra sources should be fine with the code as written.

I guess one other point is that I'm surprised this has so little impact on the tests. Apparently we really don't use anything other than tail undef in practice, do we?

@lukel97
Contributor Author

lukel97 commented Jan 30, 2025

Only point I'm still a bit hesitant on is if we have some instruction with "weird" passthru usage (i.e. read after VL, etc..).

vslidedowns and vrgathers can read past VL, but we guard against them already via mayReadPastVL.

Apparently we really don't use anything other than tail undef in practice do we?

It shows up in a lot more places if we stop emitting non-trapping VP intrinsics with EVL tail folding.

We also probably never see tail undef ternary pseudos, because that would mean one of the operands would have to be undef too.

@lukel97 lukel97 merged commit eb7e199 into llvm:main Jan 30, 2025
8 checks passed
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Feb 10, 2025
After llvm#124066 we started allowing users that are passthrus. However, for widening/narrowing instructions we were returning the wrong operand info for passthru operands, since it originally assumed the operand would never be a passthru. This fixes it by handling it in IsMODef.
lukel97 added a commit that referenced this pull request Feb 10, 2025
…#126504)

After #124066 we started allowing users that are passthrus. However, for
widening/narrowing instructions we were returning the wrong operand info
for passthru operands, since it originally assumed the operand would
never be a passthru. This fixes it by handling it in IsMODef.

Successfully merging this pull request may close these issues.

Handling tied operands in ternary pseudos in RISCVVLOptimizer
4 participants