[RISCV] isLoadFromStackSlot and isStoreToStackSlot for vector spill/fill #132296

preames · 2025-03-20T22:44:16Z

This is an adapted version of arsenm's #120524.

The intention of the change is to enable dead stack slot copy elimination in StackSlotColoring for vector loads and stores. For test coverage, see stack-slot-coloring.mir; otherwise, this has little impact on in-tree tests.
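The optimization this patch enables can be modelled roughly as below. The sketch assumes a simplified view in which every instruction has already been summarized by isLoadFromStackSlot / isStoreToStackSlot as a (register, frame index, size) triple. The types and the `removeDeadSpillPairs` helper are hypothetical, and LLVM's StackSlotColoring pass has its own, more complete set of conditions. In this model, a reload of a register from a spill slot immediately followed by a spill of that same register back to the same slot, with matching access sizes, is a no-op, so both instructions are dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::TypeSize: MinBytes is multiplied by vscale
// when Scalable is true.
struct AccessSize {
  uint64_t MinBytes;
  bool Scalable;
};

inline bool sameSize(AccessSize A, AccessSize B) {
  return A.MinBytes == B.MinBytes && A.Scalable == B.Scalable;
}

enum class Kind { ReloadFromSlot, SpillToSlot, Other };

// Hypothetical instruction summary: what isLoadFromStackSlot /
// isStoreToStackSlot would report (register, frame index, access size).
struct Inst {
  Kind K;
  unsigned Reg;
  int FrameIndex;  // -1 when the instruction does not touch a stack slot
  AccessSize Size;
};

// Drops each adjacent pair "reload Reg from slot FI; spill Reg to slot FI"
// when both report the same register, slot and size, and FI is a spill slot.
// Precondition: IsSpillSlot has an entry for every non-negative frame index.
std::vector<Inst> removeDeadSpillPairs(const std::vector<Inst> &In,
                                       const std::vector<bool> &IsSpillSlot) {
  std::vector<Inst> Out;
  for (std::size_t I = 0; I < In.size(); ++I) {
    if (I + 1 < In.size()) {
      const Inst &Ld = In[I];
      const Inst &St = In[I + 1];
      bool Candidate = Ld.K == Kind::ReloadFromSlot &&
                       St.K == Kind::SpillToSlot && Ld.FrameIndex >= 0 &&
                       Ld.FrameIndex == St.FrameIndex && Ld.Reg == St.Reg;
      if (Candidate) {
        auto FI = static_cast<std::size_t>(Ld.FrameIndex);
        assert(FI < IsSpillSlot.size() && "unknown frame index");
        // A size mismatch (e.g. an LMUL=1 reload feeding an LMUL=2 spill)
        // means the store writes bytes the load did not read: keep both.
        if (IsSpillSlot[FI] && sameSize(Ld.Size, St.Size)) {
          ++I;  // skip the store as well as the load
          continue;
        }
      }
    }
    Out.push_back(In[I]);
  }
  return Out;
}
```

This mirrors the stack-slot-coloring.mir diff, where a `VL1RE8_V` reload of `$v31` from `%stack.1` immediately followed by a `VS1R_V` spill of `$v31` to `%stack.1` is the pair that disappears.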

This change has a different and smaller set of test diffs than @arsenm's patch because I'm using scalable sizes for the LMULs, not a single sentinel value. His patch allowed vector load/store pairs of different widths to be deleted; mine does not. There's also simply been a lot of churn in regalloc behavior on these particular tests recently, so that may explain some of the diff as well.
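To make the difference concrete, the sketch below contrasts the two size schemes. It assumes `RVVBitsPerBlock` is 64 (mirroring `RISCV::RVVBitsPerBlock`) and uses a hypothetical `AccessSize` struct in place of `llvm::TypeSize`. The scalable scheme follows the formula in the diff, `(RVVBitsPerBlock / 8) * NF` bytes scaled by vscale; the sentinel scheme is a hypothetical illustration of reporting one fixed value for every vector spill/fill, and the particular value is arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::TypeSize: KnownMinBytes is multiplied by
// vscale at run time when Scalable is true.
struct AccessSize {
  uint64_t KnownMinBytes;
  bool Scalable;
};

inline bool operator==(AccessSize A, AccessSize B) {
  return A.KnownMinBytes == B.KnownMinBytes && A.Scalable == B.Scalable;
}

// Assumption: mirrors RISCV::RVVBitsPerBlock, i.e. one vscale unit is 64 bits.
constexpr unsigned RVVBitsPerBlock = 64;

// Scalable scheme: (RVVBitsPerBlock / 8) * LMUL bytes, scaled by vscale, so
// each register-group width reports a distinct size.
inline AccessSize scalableWholeRegSize(unsigned LMUL) {
  assert((LMUL == 1 || LMUL == 2 || LMUL == 4 || LMUL == 8) &&
         "whole-register accesses move 1, 2, 4 or 8 registers");
  return {uint64_t{RVVBitsPerBlock / 8} * LMUL, true};
}

// Hypothetical sentinel scheme: every vector spill/fill reports the same
// size regardless of LMUL (the value itself is arbitrary here).
inline AccessSize sentinelWholeRegSize(unsigned /*LMUL*/) {
  return {~uint64_t{0}, false};
}
```

Under the sentinel scheme an LMUL=1 reload and an LMUL=2 spill compare equal, so a size check cannot tell them apart; under the scalable scheme they do not. In LLVM's RISC-V backend vscale corresponds to VLEN / RVVBitsPerBlock, so the LMUL=1 size of vscale times 8 bytes is exactly vlenb, consistent with the stack offsets in the expandload.ll diff being multiples of vlenb (16 times and 24 times vlenb).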

@arsenm


llvmbot · 2025-03-20T22:44:49Z

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)


Full diff: https://github.com/llvm/llvm-project/pull/132296.diff

3 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+54-29)
(modified) llvm/test/CodeGen/RISCV/rvv/expandload.ll (+5-5)
(modified) llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir (-4)

diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index bd31c842312ea..3eb3666b01870 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -99,6 +99,37 @@ Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
   return isLoadFromStackSlot(MI, FrameIndex, Dummy);
 }
 
+static std::optional<unsigned> getNFForRVVWholeLoadStore(unsigned Opcode) {
+  switch (Opcode) {
+  default:
+    return std::nullopt;
+  case RISCV::VS1R_V:
+  case RISCV::VL1RE8_V:
+  case RISCV::VL1RE16_V:
+  case RISCV::VL1RE32_V:
+  case RISCV::VL1RE64_V:
+    return 1;
+  case RISCV::VS2R_V:
+  case RISCV::VL2RE8_V:
+  case RISCV::VL2RE16_V:
+  case RISCV::VL2RE32_V:
+  case RISCV::VL2RE64_V:
+    return 2;
+  case RISCV::VS4R_V:
+  case RISCV::VL4RE8_V:
+  case RISCV::VL4RE16_V:
+  case RISCV::VL4RE32_V:
+  case RISCV::VL4RE64_V:
+    return 4;
+  case RISCV::VS8R_V:
+  case RISCV::VL8RE8_V:
+  case RISCV::VL8RE16_V:
+  case RISCV::VL8RE32_V:
+  case RISCV::VL8RE64_V:
+    return 8;
+  }
+}
+
 Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
                                              int &FrameIndex,
                                              TypeSize &MemBytes) const {
@@ -125,6 +156,17 @@ Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
   case RISCV::FLD:
     MemBytes = TypeSize::getFixed(8);
     break;
+  case RISCV::VL1RE8_V:
+  case RISCV::VL2RE8_V:
+  case RISCV::VL4RE8_V:
+  case RISCV::VL8RE8_V:
+    if (!MI.getOperand(1).isFI())
+      return Register();
+    FrameIndex = MI.getOperand(1).getIndex();
+    unsigned BytesPerBlock = RISCV::RVVBitsPerBlock / 8;
+    unsigned NF = *getNFForRVVWholeLoadStore(MI.getOpcode());
+    MemBytes = TypeSize::getScalable(BytesPerBlock * NF);
+    return MI.getOperand(0).getReg();
   }
 
   if (MI.getOperand(1).isFI() && MI.getOperand(2).isImm() &&
@@ -165,6 +207,17 @@ Register RISCVInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
   case RISCV::FSD:
     MemBytes = TypeSize::getFixed(8);
     break;
+  case RISCV::VS1R_V:
+  case RISCV::VS2R_V:
+  case RISCV::VS4R_V:
+  case RISCV::VS8R_V:
+    if (!MI.getOperand(1).isFI())
+      return Register();
+    FrameIndex = MI.getOperand(1).getIndex();
+    unsigned BytesPerBlock = RISCV::RVVBitsPerBlock / 8;
+    unsigned NF = *getNFForRVVWholeLoadStore(MI.getOpcode());
+    MemBytes = TypeSize::getScalable(BytesPerBlock * NF);
+    return MI.getOperand(0).getReg();
   }
 
   if (MI.getOperand(1).isFI() && MI.getOperand(2).isImm() &&
@@ -4071,40 +4124,12 @@ bool RISCV::isZEXT_B(const MachineInstr &MI) {
          MI.getOperand(2).isImm() && MI.getOperand(2).getImm() == 255;
 }
 
-static bool isRVVWholeLoadStore(unsigned Opcode) {
-  switch (Opcode) {
-  default:
-    return false;
-  case RISCV::VS1R_V:
-  case RISCV::VS2R_V:
-  case RISCV::VS4R_V:
-  case RISCV::VS8R_V:
-  case RISCV::VL1RE8_V:
-  case RISCV::VL2RE8_V:
-  case RISCV::VL4RE8_V:
-  case RISCV::VL8RE8_V:
-  case RISCV::VL1RE16_V:
-  case RISCV::VL2RE16_V:
-  case RISCV::VL4RE16_V:
-  case RISCV::VL8RE16_V:
-  case RISCV::VL1RE32_V:
-  case RISCV::VL2RE32_V:
-  case RISCV::VL4RE32_V:
-  case RISCV::VL8RE32_V:
-  case RISCV::VL1RE64_V:
-  case RISCV::VL2RE64_V:
-  case RISCV::VL4RE64_V:
-  case RISCV::VL8RE64_V:
-    return true;
-  }
-}
-
 bool RISCV::isRVVSpill(const MachineInstr &MI) {
   // RVV lacks any support for immediate addressing for stack addresses, so be
   // conservative.
   unsigned Opcode = MI.getOpcode();
   if (!RISCVVPseudosTable::getPseudoInfo(Opcode) &&
-      !isRVVWholeLoadStore(Opcode) && !isRVVSpillForZvlsseg(Opcode))
+      !getNFForRVVWholeLoadStore(Opcode) && !isRVVSpillForZvlsseg(Opcode))
     return false;
   return true;
 }
diff --git a/llvm/test/CodeGen/RISCV/rvv/expandload.ll b/llvm/test/CodeGen/RISCV/rvv/expandload.ll
index 25706bdec55c3..145b5794ce64f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/expandload.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/expandload.ll
@@ -273,16 +273,16 @@ define <256 x i8> @test_expandload_v256i8(ptr %base, <256 x i1> %mask, <256 x i8
 ; CHECK-RV32-NEXT:    vsetvli zero, a2, e8, m8, ta, mu
 ; CHECK-RV32-NEXT:    viota.m v24, v0
 ; CHECK-RV32-NEXT:    csrr a0, vlenb
-; CHECK-RV32-NEXT:    li a1, 24
-; CHECK-RV32-NEXT:    mul a0, a0, a1
+; CHECK-RV32-NEXT:    slli a0, a0, 4
 ; CHECK-RV32-NEXT:    add a0, sp, a0
 ; CHECK-RV32-NEXT:    addi a0, a0, 16
-; CHECK-RV32-NEXT:    vl8r.v v8, (a0) # Unknown-size Folded Reload
+; CHECK-RV32-NEXT:    vl8r.v v16, (a0) # Unknown-size Folded Reload
 ; CHECK-RV32-NEXT:    csrr a0, vlenb
-; CHECK-RV32-NEXT:    slli a0, a0, 4
+; CHECK-RV32-NEXT:    li a1, 24
+; CHECK-RV32-NEXT:    mul a0, a0, a1
 ; CHECK-RV32-NEXT:    add a0, sp, a0
 ; CHECK-RV32-NEXT:    addi a0, a0, 16
-; CHECK-RV32-NEXT:    vl8r.v v16, (a0) # Unknown-size Folded Reload
+; CHECK-RV32-NEXT:    vl8r.v v8, (a0) # Unknown-size Folded Reload
 ; CHECK-RV32-NEXT:    vrgather.vv v8, v16, v24, v0.t
 ; CHECK-RV32-NEXT:    csrr a0, vlenb
 ; CHECK-RV32-NEXT:    li a1, 24
diff --git a/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir b/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir
index 6cf6307322643..bfb9b31de5be0 100644
--- a/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir
@@ -51,8 +51,6 @@ body:             |
     ; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.1 :: (store unknown-size into %stack.1, align 8)
     ; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.0 :: (volatile load unknown-size, align 1)
     ; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.0 :: (volatile store unknown-size, align 1)
-    ; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)
-    ; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.1 :: (store unknown-size into %stack.1, align 8)
     ; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.0 :: (volatile load unknown-size, align 1)
     ; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.0 :: (volatile store unknown-size, align 1)
     ; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)
@@ -214,8 +212,6 @@ body:             |
     ; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
     ; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.0 :: (volatile load unknown-size, align 1)
     ; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.0 :: (volatile store unknown-size, align 1)
-    ; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)
-    ; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
     ; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.0 :: (volatile load unknown-size, align 1)
     ; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.0 :: (volatile store unknown-size, align 1)
     ; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)

topperc · 2025-03-21T02:21:04Z

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp

@@ -99,6 +99,37 @@ Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
  return isLoadFromStackSlot(MI, FrameIndex, Dummy);
 }

+static std::optional<unsigned> getNFForRVVWholeLoadStore(unsigned Opcode) {


NF -> LMUL? NF is normally used for segment load/store.

wangpc-pp

I think it is because, for whole register load/store, the nf field encodes how many vector registers to load and store. But I agree LMUL is a better software term here.

preames

This was taken from the spec naming, but happy to switch it over. I'd sort of been hoping someone would tell me we already had a utility function for this that I'd missed. :)
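As a sketch of the encoding discussed in this thread: in the RVV 1.0 layout, nf occupies bits 31:29 of a vector load/store and holds NFIELDS - 1, and the whole-register forms only permit 1, 2, 4 or 8 registers. The hypothetical helper `wholeRegCountFromNF` below recovers that register count (the LMUL the patch needs) from a raw instruction word. It does not check the opcode, mop or lumop fields, so it assumes the word is already known to be a whole-register load or store.

```cpp
#include <cstdint>
#include <optional>

// Returns how many vector registers a whole-register load/store moves,
// decoded from the nf field (bits 31:29) of a raw 32-bit instruction word.
// Assumes the caller already knows the word is a whole-register load/store;
// the opcode, mop and lumop fields are not validated here.
constexpr std::optional<unsigned> wholeRegCountFromNF(uint32_t Insn) {
  unsigned NF = (Insn >> 29) & 0x7;  // nf holds NFIELDS - 1
  unsigned NFields = NF + 1;
  switch (NFields) {
  case 1:
  case 2:
  case 4:
  case 8:
    return NFields;  // the only counts allowed for whole-register forms
  default:
    return std::nullopt;  // reserved encoding for whole-register forms
  }
}
```

For example, setting nf to 7 in an otherwise valid `vl1re8.v` encoding yields `vl8re8.v`, and the helper reports 8. A software-facing name built around LMUL, as suggested in the review, describes the result better than the raw field name.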

topperc

LGTM

preames requested review from arsenm, lukel97, topperc and wangpc-pp March 20, 2025 22:44

llvmbot added the backend:RISC-V label Mar 20, 2025

topperc reviewed Mar 21, 2025


Address review comment

95dd62c

topperc approved these changes Mar 21, 2025


preames merged commit 5f94992 into llvm:main Mar 21, 2025
11 checks passed

preames deleted the pr-riscv-spill-fill-size-scalable branch March 21, 2025 17:04

preames mentioned this pull request Mar 21, 2025

RISCV: Implement isLoadFromStackSlot/isStoreToStackSlot for rvv #120524

Closed
