Skip to content

[RISCV] isLoadFromStackSlot and isStoreToStackSlot for vector spill/fill #132296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 21, 2025

Conversation

preames
Copy link
Collaborator

@preames preames commented Mar 20, 2025

This is an adapted version of arsenm's #120524.

The intention of the change is to enable dead stack slot copy elimination in StackSlotColoring for vector loads and stores. In terms of testing, see stack-slot-coloring.mir. This has little impact on in tree tests otherwise.

This change has a different and smaller set of test diffs then then @arsenm's patch because I'm using scalable sizes for the LMULs, not a single signal value. His patch allowed vector load/store pairs of different width to be deleted, mine does not. There's also simply been a lot of churn in regalloc behavior on these particular tests recently, so that may explain some of the diff as well.

This is an adapted version of arsenm's llvm#120524.

The intention of the change is to enable dead stack slot copy elimination
in StackSlotColoring for vector loads and stores.  In terms of testing,
see stack-slot-coloring.mir.  This has little impact on in tree tests
otherwise.

This change has a different and smaller set of test diffs then then
@arsenm's patch because I'm using scalable sizes for the LMULs, not
a single signal value.  His patch allowed vector load/store pairs
of different width to be deleted, mine does not.  There's also
simply been a lot of churn in regalloc behavior on these particular
tests recently, so that may explain some of the diff as well.
@llvmbot
Copy link
Member

llvmbot commented Mar 20, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

This is an adapted version of arsenm's #120524.

The intention of the change is to enable dead stack slot copy elimination in StackSlotColoring for vector loads and stores. In terms of testing, see stack-slot-coloring.mir. This has little impact on in tree tests otherwise.

This change has a different and smaller set of test diffs then then @arsenm's patch because I'm using scalable sizes for the LMULs, not a single signal value. His patch allowed vector load/store pairs of different width to be deleted, mine does not. There's also simply been a lot of churn in regalloc behavior on these particular tests recently, so that may explain some of the diff as well.


Full diff: https://github.com/llvm/llvm-project/pull/132296.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+54-29)
  • (modified) llvm/test/CodeGen/RISCV/rvv/expandload.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir (-4)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index bd31c842312ea..3eb3666b01870 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -99,6 +99,37 @@ Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
   return isLoadFromStackSlot(MI, FrameIndex, Dummy);
 }
 
+static std::optional<unsigned> getNFForRVVWholeLoadStore(unsigned Opcode) {
+  switch (Opcode) {
+  default:
+    return std::nullopt;
+  case RISCV::VS1R_V:
+  case RISCV::VL1RE8_V:
+  case RISCV::VL1RE16_V:
+  case RISCV::VL1RE32_V:
+  case RISCV::VL1RE64_V:
+    return 1;
+  case RISCV::VS2R_V:
+  case RISCV::VL2RE8_V:
+  case RISCV::VL2RE16_V:
+  case RISCV::VL2RE32_V:
+  case RISCV::VL2RE64_V:
+    return 2;
+  case RISCV::VS4R_V:
+  case RISCV::VL4RE8_V:
+  case RISCV::VL4RE16_V:
+  case RISCV::VL4RE32_V:
+  case RISCV::VL4RE64_V:
+    return 4;
+  case RISCV::VS8R_V:
+  case RISCV::VL8RE8_V:
+  case RISCV::VL8RE16_V:
+  case RISCV::VL8RE32_V:
+  case RISCV::VL8RE64_V:
+    return 8;
+  }
+}
+
 Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
                                              int &FrameIndex,
                                              TypeSize &MemBytes) const {
@@ -125,6 +156,17 @@ Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
   case RISCV::FLD:
     MemBytes = TypeSize::getFixed(8);
     break;
+  case RISCV::VL1RE8_V:
+  case RISCV::VL2RE8_V:
+  case RISCV::VL4RE8_V:
+  case RISCV::VL8RE8_V:
+    if (!MI.getOperand(1).isFI())
+      return Register();
+    FrameIndex = MI.getOperand(1).getIndex();
+    unsigned BytesPerBlock = RISCV::RVVBitsPerBlock / 8;
+    unsigned NF = *getNFForRVVWholeLoadStore(MI.getOpcode());
+    MemBytes = TypeSize::getScalable(BytesPerBlock * NF);
+    return MI.getOperand(0).getReg();
   }
 
   if (MI.getOperand(1).isFI() && MI.getOperand(2).isImm() &&
@@ -165,6 +207,17 @@ Register RISCVInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
   case RISCV::FSD:
     MemBytes = TypeSize::getFixed(8);
     break;
+  case RISCV::VS1R_V:
+  case RISCV::VS2R_V:
+  case RISCV::VS4R_V:
+  case RISCV::VS8R_V:
+    if (!MI.getOperand(1).isFI())
+      return Register();
+    FrameIndex = MI.getOperand(1).getIndex();
+    unsigned BytesPerBlock = RISCV::RVVBitsPerBlock / 8;
+    unsigned NF = *getNFForRVVWholeLoadStore(MI.getOpcode());
+    MemBytes = TypeSize::getScalable(BytesPerBlock * NF);
+    return MI.getOperand(0).getReg();
   }
 
   if (MI.getOperand(1).isFI() && MI.getOperand(2).isImm() &&
@@ -4071,40 +4124,12 @@ bool RISCV::isZEXT_B(const MachineInstr &MI) {
          MI.getOperand(2).isImm() && MI.getOperand(2).getImm() == 255;
 }
 
-static bool isRVVWholeLoadStore(unsigned Opcode) {
-  switch (Opcode) {
-  default:
-    return false;
-  case RISCV::VS1R_V:
-  case RISCV::VS2R_V:
-  case RISCV::VS4R_V:
-  case RISCV::VS8R_V:
-  case RISCV::VL1RE8_V:
-  case RISCV::VL2RE8_V:
-  case RISCV::VL4RE8_V:
-  case RISCV::VL8RE8_V:
-  case RISCV::VL1RE16_V:
-  case RISCV::VL2RE16_V:
-  case RISCV::VL4RE16_V:
-  case RISCV::VL8RE16_V:
-  case RISCV::VL1RE32_V:
-  case RISCV::VL2RE32_V:
-  case RISCV::VL4RE32_V:
-  case RISCV::VL8RE32_V:
-  case RISCV::VL1RE64_V:
-  case RISCV::VL2RE64_V:
-  case RISCV::VL4RE64_V:
-  case RISCV::VL8RE64_V:
-    return true;
-  }
-}
-
 bool RISCV::isRVVSpill(const MachineInstr &MI) {
   // RVV lacks any support for immediate addressing for stack addresses, so be
   // conservative.
   unsigned Opcode = MI.getOpcode();
   if (!RISCVVPseudosTable::getPseudoInfo(Opcode) &&
-      !isRVVWholeLoadStore(Opcode) && !isRVVSpillForZvlsseg(Opcode))
+      !getNFForRVVWholeLoadStore(Opcode) && !isRVVSpillForZvlsseg(Opcode))
     return false;
   return true;
 }
diff --git a/llvm/test/CodeGen/RISCV/rvv/expandload.ll b/llvm/test/CodeGen/RISCV/rvv/expandload.ll
index 25706bdec55c3..145b5794ce64f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/expandload.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/expandload.ll
@@ -273,16 +273,16 @@ define <256 x i8> @test_expandload_v256i8(ptr %base, <256 x i1> %mask, <256 x i8
 ; CHECK-RV32-NEXT:    vsetvli zero, a2, e8, m8, ta, mu
 ; CHECK-RV32-NEXT:    viota.m v24, v0
 ; CHECK-RV32-NEXT:    csrr a0, vlenb
-; CHECK-RV32-NEXT:    li a1, 24
-; CHECK-RV32-NEXT:    mul a0, a0, a1
+; CHECK-RV32-NEXT:    slli a0, a0, 4
 ; CHECK-RV32-NEXT:    add a0, sp, a0
 ; CHECK-RV32-NEXT:    addi a0, a0, 16
-; CHECK-RV32-NEXT:    vl8r.v v8, (a0) # Unknown-size Folded Reload
+; CHECK-RV32-NEXT:    vl8r.v v16, (a0) # Unknown-size Folded Reload
 ; CHECK-RV32-NEXT:    csrr a0, vlenb
-; CHECK-RV32-NEXT:    slli a0, a0, 4
+; CHECK-RV32-NEXT:    li a1, 24
+; CHECK-RV32-NEXT:    mul a0, a0, a1
 ; CHECK-RV32-NEXT:    add a0, sp, a0
 ; CHECK-RV32-NEXT:    addi a0, a0, 16
-; CHECK-RV32-NEXT:    vl8r.v v16, (a0) # Unknown-size Folded Reload
+; CHECK-RV32-NEXT:    vl8r.v v8, (a0) # Unknown-size Folded Reload
 ; CHECK-RV32-NEXT:    vrgather.vv v8, v16, v24, v0.t
 ; CHECK-RV32-NEXT:    csrr a0, vlenb
 ; CHECK-RV32-NEXT:    li a1, 24
diff --git a/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir b/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir
index 6cf6307322643..bfb9b31de5be0 100644
--- a/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir
@@ -51,8 +51,6 @@ body:             |
     ; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.1 :: (store unknown-size into %stack.1, align 8)
     ; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.0 :: (volatile load unknown-size, align 1)
     ; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.0 :: (volatile store unknown-size, align 1)
-    ; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)
-    ; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.1 :: (store unknown-size into %stack.1, align 8)
     ; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.0 :: (volatile load unknown-size, align 1)
     ; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.0 :: (volatile store unknown-size, align 1)
     ; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)
@@ -214,8 +212,6 @@ body:             |
     ; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
     ; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.0 :: (volatile load unknown-size, align 1)
     ; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.0 :: (volatile store unknown-size, align 1)
-    ; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)
-    ; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
     ; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.0 :: (volatile load unknown-size, align 1)
     ; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.0 :: (volatile store unknown-size, align 1)
     ; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)

@@ -99,6 +99,37 @@ Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
return isLoadFromStackSlot(MI, FrameIndex, Dummy);
}

static std::optional<unsigned> getNFForRVVWholeLoadStore(unsigned Opcode) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NF -> LMUL? NF is normally used for segment load/store.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is because, for whole register load/store, the nf field encodes how many vector registers to load and store. But I agree LMUL is a better software term here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was taken from the spec naming, but happy to switch it over. I'd been sort hoping someone would tell me we already had a utility function for this I'd missed. :)

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@preames preames merged commit 5f94992 into llvm:main Mar 21, 2025
11 checks passed
@preames preames deleted the pr-riscv-spill-fill-size-scalable branch March 21, 2025 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants