[RISCV] isLoadFromStackSlot and isStoreToStackSlot for vector spill/fill #132296
This is an adapted version of arsenm's llvm#120524.
The intention of the change is to enable dead stack slot copy elimination in StackSlotColoring for vector loads and stores. For testing, see stack-slot-coloring.mir. Otherwise, this has little impact on in-tree tests.
This change has a different and smaller set of test diffs than @arsenm's patch because I'm using scalable sizes for the LMULs, not a single signal value. His patch allowed vector load/store pairs of different widths to be deleted; mine does not. There has also been a lot of churn in regalloc behavior on these particular tests recently, which may explain some of the diff as well.
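To illustrate why scalable sizes keep mismatched-width pairs from being deleted, here is a minimal standalone sketch. It does not use LLVM headers: `SlotSize`, `wholeRegSpillSize`, and `isRemovableCopyPair` are hypothetical names standing in for `TypeSize`, the `MemBytes` computation in this patch, and the size check StackSlotColoring is assumed to perform on a reload/store pair. The constant 64 is assumed to match `RISCV::RVVBitsPerBlock`.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::TypeSize: a known minimum byte count plus a
// flag saying whether the real size is that minimum multiplied by vscale.
struct SlotSize {
  std::uint64_t MinBytes;
  bool Scalable;
};

constexpr bool operator==(SlotSize A, SlotSize B) {
  return A.MinBytes == B.MinBytes && A.Scalable == B.Scalable;
}

// Assumed to mirror RISCV::RVVBitsPerBlock.
constexpr unsigned RVVBitsPerBlock = 64;

// Size reported for a whole-register load/store of NumRegs vector registers:
// scalable, with a known minimum of (RVVBitsPerBlock / 8) bytes per register.
constexpr SlotSize wholeRegSpillSize(unsigned NumRegs) {
  return {static_cast<std::uint64_t>(RVVBitsPerBlock / 8) * NumRegs, true};
}

// Hypothetical predicate: a reload followed by a store back to the same slot
// is treated as a dead copy only when both sides report the same size.
constexpr bool isRemovableCopyPair(SlotSize LoadSize, SlotSize StoreSize) {
  return LoadSize == StoreSize;
}

// Same width on both sides: the pair is a candidate for removal.
static_assert(isRemovableCopyPair(wholeRegSpillSize(1), wholeRegSpillSize(1)),
              "matching widths compare equal");
// A one-register reload paired with a two-register store is left alone.
static_assert(!isRemovableCopyPair(wholeRegSpillSize(1), wholeRegSpillSize(2)),
              "different widths compare unequal");
```

With a single shared sentinel size for every width, the second check would compare equal, which matches the behavior difference described above.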
@llvm/pr-subscribers-backend-risc-v
Author: Philip Reames (preames)
Full diff: https://github.com/llvm/llvm-project/pull/132296.diff
3 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index bd31c842312ea..3eb3666b01870 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -99,6 +99,37 @@ Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
return isLoadFromStackSlot(MI, FrameIndex, Dummy);
}
+static std::optional<unsigned> getNFForRVVWholeLoadStore(unsigned Opcode) {
+ switch (Opcode) {
+ default:
+ return std::nullopt;
+ case RISCV::VS1R_V:
+ case RISCV::VL1RE8_V:
+ case RISCV::VL1RE16_V:
+ case RISCV::VL1RE32_V:
+ case RISCV::VL1RE64_V:
+ return 1;
+ case RISCV::VS2R_V:
+ case RISCV::VL2RE8_V:
+ case RISCV::VL2RE16_V:
+ case RISCV::VL2RE32_V:
+ case RISCV::VL2RE64_V:
+ return 2;
+ case RISCV::VS4R_V:
+ case RISCV::VL4RE8_V:
+ case RISCV::VL4RE16_V:
+ case RISCV::VL4RE32_V:
+ case RISCV::VL4RE64_V:
+ return 4;
+ case RISCV::VS8R_V:
+ case RISCV::VL8RE8_V:
+ case RISCV::VL8RE16_V:
+ case RISCV::VL8RE32_V:
+ case RISCV::VL8RE64_V:
+ return 8;
+ }
+}
+
Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
int &FrameIndex,
TypeSize &MemBytes) const {
@@ -125,6 +156,17 @@ Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
case RISCV::FLD:
MemBytes = TypeSize::getFixed(8);
break;
+ case RISCV::VL1RE8_V:
+ case RISCV::VL2RE8_V:
+ case RISCV::VL4RE8_V:
+ case RISCV::VL8RE8_V:
+ if (!MI.getOperand(1).isFI())
+ return Register();
+ FrameIndex = MI.getOperand(1).getIndex();
+ unsigned BytesPerBlock = RISCV::RVVBitsPerBlock / 8;
+ unsigned NF = *getNFForRVVWholeLoadStore(MI.getOpcode());
+ MemBytes = TypeSize::getScalable(BytesPerBlock * NF);
+ return MI.getOperand(0).getReg();
}
if (MI.getOperand(1).isFI() && MI.getOperand(2).isImm() &&
@@ -165,6 +207,17 @@ Register RISCVInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
case RISCV::FSD:
MemBytes = TypeSize::getFixed(8);
break;
+ case RISCV::VS1R_V:
+ case RISCV::VS2R_V:
+ case RISCV::VS4R_V:
+ case RISCV::VS8R_V:
+ if (!MI.getOperand(1).isFI())
+ return Register();
+ FrameIndex = MI.getOperand(1).getIndex();
+ unsigned BytesPerBlock = RISCV::RVVBitsPerBlock / 8;
+ unsigned NF = *getNFForRVVWholeLoadStore(MI.getOpcode());
+ MemBytes = TypeSize::getScalable(BytesPerBlock * NF);
+ return MI.getOperand(0).getReg();
}
if (MI.getOperand(1).isFI() && MI.getOperand(2).isImm() &&
@@ -4071,40 +4124,12 @@ bool RISCV::isZEXT_B(const MachineInstr &MI) {
MI.getOperand(2).isImm() && MI.getOperand(2).getImm() == 255;
}
-static bool isRVVWholeLoadStore(unsigned Opcode) {
- switch (Opcode) {
- default:
- return false;
- case RISCV::VS1R_V:
- case RISCV::VS2R_V:
- case RISCV::VS4R_V:
- case RISCV::VS8R_V:
- case RISCV::VL1RE8_V:
- case RISCV::VL2RE8_V:
- case RISCV::VL4RE8_V:
- case RISCV::VL8RE8_V:
- case RISCV::VL1RE16_V:
- case RISCV::VL2RE16_V:
- case RISCV::VL4RE16_V:
- case RISCV::VL8RE16_V:
- case RISCV::VL1RE32_V:
- case RISCV::VL2RE32_V:
- case RISCV::VL4RE32_V:
- case RISCV::VL8RE32_V:
- case RISCV::VL1RE64_V:
- case RISCV::VL2RE64_V:
- case RISCV::VL4RE64_V:
- case RISCV::VL8RE64_V:
- return true;
- }
-}
-
bool RISCV::isRVVSpill(const MachineInstr &MI) {
// RVV lacks any support for immediate addressing for stack addresses, so be
// conservative.
unsigned Opcode = MI.getOpcode();
if (!RISCVVPseudosTable::getPseudoInfo(Opcode) &&
- !isRVVWholeLoadStore(Opcode) && !isRVVSpillForZvlsseg(Opcode))
+ !getNFForRVVWholeLoadStore(Opcode) && !isRVVSpillForZvlsseg(Opcode))
return false;
return true;
}
diff --git a/llvm/test/CodeGen/RISCV/rvv/expandload.ll b/llvm/test/CodeGen/RISCV/rvv/expandload.ll
index 25706bdec55c3..145b5794ce64f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/expandload.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/expandload.ll
@@ -273,16 +273,16 @@ define <256 x i8> @test_expandload_v256i8(ptr %base, <256 x i1> %mask, <256 x i8
; CHECK-RV32-NEXT: vsetvli zero, a2, e8, m8, ta, mu
; CHECK-RV32-NEXT: viota.m v24, v0
; CHECK-RV32-NEXT: csrr a0, vlenb
-; CHECK-RV32-NEXT: li a1, 24
-; CHECK-RV32-NEXT: mul a0, a0, a1
+; CHECK-RV32-NEXT: slli a0, a0, 4
; CHECK-RV32-NEXT: add a0, sp, a0
; CHECK-RV32-NEXT: addi a0, a0, 16
-; CHECK-RV32-NEXT: vl8r.v v8, (a0) # Unknown-size Folded Reload
+; CHECK-RV32-NEXT: vl8r.v v16, (a0) # Unknown-size Folded Reload
; CHECK-RV32-NEXT: csrr a0, vlenb
-; CHECK-RV32-NEXT: slli a0, a0, 4
+; CHECK-RV32-NEXT: li a1, 24
+; CHECK-RV32-NEXT: mul a0, a0, a1
; CHECK-RV32-NEXT: add a0, sp, a0
; CHECK-RV32-NEXT: addi a0, a0, 16
-; CHECK-RV32-NEXT: vl8r.v v16, (a0) # Unknown-size Folded Reload
+; CHECK-RV32-NEXT: vl8r.v v8, (a0) # Unknown-size Folded Reload
; CHECK-RV32-NEXT: vrgather.vv v8, v16, v24, v0.t
; CHECK-RV32-NEXT: csrr a0, vlenb
; CHECK-RV32-NEXT: li a1, 24
diff --git a/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir b/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir
index 6cf6307322643..bfb9b31de5be0 100644
--- a/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/stack-slot-coloring.mir
@@ -51,8 +51,6 @@ body: |
; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.1 :: (store unknown-size into %stack.1, align 8)
; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.0 :: (volatile load unknown-size, align 1)
; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.0 :: (volatile store unknown-size, align 1)
- ; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)
- ; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.1 :: (store unknown-size into %stack.1, align 8)
; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.0 :: (volatile load unknown-size, align 1)
; CHECK-NEXT: VS1R_V killed renamable $v31, %stack.0 :: (volatile store unknown-size, align 1)
; CHECK-NEXT: renamable $v31 = VL1RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)
@@ -214,8 +212,6 @@ body: |
; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.0 :: (volatile load unknown-size, align 1)
; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.0 :: (volatile store unknown-size, align 1)
- ; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)
- ; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.0 :: (volatile load unknown-size, align 1)
; CHECK-NEXT: VS2R_V killed renamable $v30m2, %stack.0 :: (volatile store unknown-size, align 1)
; CHECK-NEXT: renamable $v30m2 = VL2RE8_V %stack.1 :: (load unknown-size from %stack.1, align 8)
@@ -99,6 +99,37 @@ Register RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
return isLoadFromStackSlot(MI, FrameIndex, Dummy);
}
static std::optional<unsigned> getNFForRVVWholeLoadStore(unsigned Opcode) {
NF -> LMUL? NF is normally used for segment load/store.
I think it is because, for whole register load/store, the nf field encodes how many vector registers to load and store. But I agree LMUL is a better software term here.
This was taken from the spec naming, but happy to switch it over. I'd been sort of hoping someone would tell me we already had a utility function for this that I'd missed. :)
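For readers following the naming discussion, a small standalone sketch of how the nf field maps to a register count for whole-register loads and stores. It assumes the RVV 1.0 encoding, where nf occupies bits 31:29 and holds the register count minus one; `nfField` and `wholeRegCount` are hypothetical helpers, not functions from this patch or from LLVM.

```cpp
#include <cstdint>

// Extract the 3-bit nf field (assumed to be bits 31:29) from a 32-bit RVV
// load/store instruction word.
constexpr unsigned nfField(std::uint32_t Insn) { return (Insn >> 29) & 0x7u; }

// Register count for a whole-register load/store: nf + 1. Only 1, 2, 4, and 8
// are valid counts; this sketch returns 0 for the remaining encodings.
constexpr unsigned wholeRegCount(std::uint32_t Insn) {
  // (n & (n - 1)) == 0 tests that n = nf + 1 is a power of two.
  return ((nfField(Insn) + 1u) & nfField(Insn)) == 0u ? nfField(Insn) + 1u
                                                      : 0u;
}

// Illustrative words with only the nf bits set; every other field is zero.
static_assert(wholeRegCount(0u << 29) == 1, "nf=0 means one register");
static_assert(wholeRegCount(1u << 29) == 2, "nf=1 means two registers");
static_assert(wholeRegCount(3u << 29) == 4, "nf=3 means four registers");
static_assert(wholeRegCount(7u << 29) == 8, "nf=7 means eight registers");
static_assert(wholeRegCount(2u << 29) == 0, "three registers is not valid");
```

The patch itself switches on opcodes rather than decoding instruction bits, so this only shows where the NF name in the spec comes from.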
LGTM