[RISCV] Fold PseudoVMV_V_V with undef passthru, handling policy #106943
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

If a PseudoVMV_V_V's passthru is undef then we don't need the Src to have the same passthru, nor do we need to check its VL. The tail policy in these tests is still tu; #105788 should fix this separately.

Full diff: https://github.com/llvm/llvm-project/pull/106943.diff

3 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
index 412fd790061a37..c885b6bc2e50d4 100644
--- a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
@@ -503,30 +503,32 @@ bool RISCVVectorPeephole::foldVMV_V_V(MachineInstr &MI) {
if (getSEWLMULRatio(MI) != getSEWLMULRatio(*Src))
return false;
- // Src needs to have the same passthru as VMV_V_V
- MachineOperand &SrcPassthru = Src->getOperand(1);
- if (SrcPassthru.getReg() != RISCV::NoRegister &&
- SrcPassthru.getReg() != Passthru.getReg())
- return false;
+ if (Passthru.getReg() != RISCV::NoRegister) {
+ // Src needs to have the same passthru as VMV_V_V
+ MachineOperand &SrcPassthru = Src->getOperand(1);
+ if (SrcPassthru.getReg() != RISCV::NoRegister &&
+ SrcPassthru.getReg() != Passthru.getReg())
+ return false;
- // Src VL will have already been reduced if legal (see tryToReduceVL),
- // so we don't need to handle a smaller source VL here. However, the
- // user's VL may be larger
- MachineOperand &SrcVL = Src->getOperand(RISCVII::getVLOpNum(Src->getDesc()));
- if (!isVLKnownLE(SrcVL, MI.getOperand(3)))
- return false;
+ // Src VL will have already been reduced if legal (see tryToReduceVL),
+ // so we don't need to handle a smaller source VL here. However, the
+ // user's VL may be larger
+ MachineOperand &SrcVL = Src->getOperand(RISCVII::getVLOpNum(Src->getDesc()));
+ if (!isVLKnownLE(SrcVL, MI.getOperand(3)))
+ return false;
- // If the new passthru doesn't dominate Src, try to move Src so it does.
- if (!ensureDominates(Passthru, *Src))
- return false;
+ // If the new passthru doesn't dominate Src, try to move Src so it does.
+ if (!ensureDominates(Passthru, *Src))
+ return false;
- if (SrcPassthru.getReg() != Passthru.getReg()) {
- SrcPassthru.setReg(Passthru.getReg());
- // If Src is masked then its passthru needs to be in VRNoV0.
- if (Passthru.getReg() != RISCV::NoRegister)
- MRI->constrainRegClass(Passthru.getReg(),
- TII->getRegClass(Src->getDesc(), 1, TRI,
- *Src->getParent()->getParent()));
+ if (SrcPassthru.getReg() != Passthru.getReg()) {
+ SrcPassthru.setReg(Passthru.getReg());
+ // If Src is masked then its passthru needs to be in VRNoV0.
+ if (Passthru.getReg() != RISCV::NoRegister)
+ MRI->constrainRegClass(Passthru.getReg(),
+ TII->getRegClass(Src->getDesc(), 1, TRI,
+ *Src->getParent()->getParent()));
+ }
}
// Use a conservative tu,mu policy, RISCVInsertVSETVLI will relax it if
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
index 3952e48c5c28fc..252ddb267a5ad2 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
@@ -194,3 +194,14 @@ define <vscale x 2 x i32> @unfoldable_mismatched_sew(<vscale x 2 x i32> %passthr
%b = call <vscale x 2 x i32> @llvm.riscv.vmv.v.v.nxv2i32(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %a.bitcast, iXLen %avl)
ret <vscale x 2 x i32> %b
}
+
+define <vscale x 1 x i64> @undef_passthru(<vscale x 1 x i64> %passthru, <vscale x 1 x i64> %x, <vscale x 1 x i64> %y, iXLen %avl) {
+; CHECK-LABEL: undef_passthru:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, tu, ma
+; CHECK-NEXT: vadd.vv v8, v9, v10
+; CHECK-NEXT: ret
+ %a = call <vscale x 1 x i64> @llvm.riscv.vadd.nxv1i64.nxv1i64(<vscale x 1 x i64> %passthru, <vscale x 1 x i64> %x, <vscale x 1 x i64> %y, iXLen %avl)
+ %b = call <vscale x 1 x i64> @llvm.riscv.vmv.v.v.nxv1i64(<vscale x 1 x i64> undef, <vscale x 1 x i64> %a, iXLen %avl)
+ ret <vscale x 1 x i64> %b
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.mir b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.mir
index b2526c6df6939e..3672424438234e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.mir
@@ -18,3 +18,17 @@ body: |
%y:gpr = ADDI $x0, 1
%z:vr = PseudoVMV_V_V_M1 %passthru, %x, 4, 5 /* e32 */, 0 /* tu, mu */
...
+---
+name: undef_passthru
+body: |
+ bb.0:
+ liveins: $v8
+ ; CHECK-LABEL: name: undef_passthru
+ ; CHECK: liveins: $v8
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %passthru:vr = COPY $v8
+ ; CHECK-NEXT: %x:vr = PseudoVADD_VV_M1 %passthru, $noreg, $noreg, 4, 5 /* e32 */, 0 /* tu, mu */
+ %passthru:vr = COPY $v8
+ %x:vr = PseudoVADD_VV_M1 %passthru, $noreg, $noreg, 4, 5 /* e32 */, 0 /* tu, mu */
+ %y:vr = PseudoVMV_V_V_M1 $noreg, %x, 4, 5 /* e32 */, 1 /* ta, mu */
+...
✅ With the latest revision this PR passed the C/C++ code formatter.
Looks functionally correct, but would be much more cleanly expressed as an early return. This is really just a different transform entirely.
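For illustration, a minimal sketch of the early-return shape being suggested, reusing the names already in foldVMV_V_V (Passthru is MI's operand 1, the copied source is operand 2); this is a hypothetical restructuring, not the committed code:

```cpp
// Hypothetical early return at the top of foldVMV_V_V: with an undef
// passthru, MI's tail is undefined, so MI is just a copy of its source
// and none of the Src-related checks below are needed.
if (Passthru.getReg() == RISCV::NoRegister) {
  MRI->replaceRegWith(MI.getOperand(0).getReg(), MI.getOperand(2).getReg());
  MI.eraseFromParent();
  return true;
}
```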
; CHECK-NEXT: vsetvli zero, a0, e64, m1, tu, ma
; CHECK-NEXT: vadd.vv v8, v9, v10
; CHECK-NEXT: ret
%a = call <vscale x 1 x i64> @llvm.riscv.vadd.nxv1i64.nxv1i64(<vscale x 1 x i64> %passthru, <vscale x 1 x i64> %x, <vscale x 1 x i64> %y, iXLen %avl)
Does this actually hold? Consider:
- SrcVL = 2, so elements 0, 1 come from either x or y, but 2 onwards come from passthru
- MIVL = 4, so elements 0, 1 come from either x or y, elements 2, 3 come from passthru, and elements 4+ are undefined
Oh, I see it. You're not changing passthru or VL on Src; you're basically just dropping MI.
I think this might be cleaner if expressed as an early continue here. This is arguably a different combine entirely. You're basically just using properties of the vmv.v.v itself to prove it can be elided entirely. Actually, now that I say that, yeah, this really doesn't depend on Src at all, does it? I don't think you even need the single-use condition.
Yup, that's exactly what I did in #106840. I added a new peephole that doesn't check Src, but I closed it because I thought it was too similar to foldVMV_V_V.
However, I've since realised that if the vmv.v.v passthru is undef, then we can still relax Src's policy to be tail agnostic if vmv.v.v's tail subsumes it. E.g. if we have
vsetvli zero, a0, e64, m1, tu, ma
vadd.vv v8, v9, v10
vsetvli zero, zero, e64, m1, ta, ma
vmv.v.v v8, v8
We should actually be able to fold it into:
vsetvli zero, a0, e64, m1, ta, ma
vadd.vv v8, v9, v10
Hence why I moved it back into foldVMV_V_V, so it could take advantage of #105788.
But I completely agree with you here that this could be much cleaner. I'm going to try again to do this as a separate peephole, but explicitly relaxing Src's policy if possible.
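As a rough sketch of the relaxation described above, assuming Src is a vector pseudo and using the RISCVII policy helpers from RISCVBaseInfo.h (the exact guards in the final patch may differ):

```cpp
// Sketch: the vmv.v.v's passthru is undef, so whatever Src leaves in
// its tail is dead once the copy is folded away. If Src carries a
// policy operand, its tail policy can be relaxed to agnostic.
if (RISCVII::hasVecPolicyOp(Src->getDesc().TSFlags)) {
  MachineOperand &SrcPolicy =
      Src->getOperand(RISCVII::getVecPolicyOpNum(Src->getDesc()));
  SrcPolicy.setImm(SrcPolicy.getImm() | RISCVII::TAIL_AGNOSTIC);
}
```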
I've reworked this PR and updated the description to do this as a separate peephole, as I think it's easier to understand without having to think about foldVMV_V_V.
LGTM w/minor changes.
LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/2963

Here is the relevant piece of the build log for the reference:
If a vmv.v.v has an undef passthru then we can just replace it with its input operand, since the tail is completely undefined.
This is a reattempt of #106840, but also checks to see if the input was a pseudo where we can relax its tail policy to undef.
This also means we don't need to check for undef passthrus in foldVMV_V_V anymore because they will be handled by foldUndefPassthruVMV_V_V.
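Putting the two pieces together, a condensed sketch of what a foldUndefPassthruVMV_V_V-style peephole could look like; the names follow the description above, but the checks (in particular how the VL relationship between the two instructions is guarded) are approximations rather than the committed implementation:

```cpp
// Sketch of the separate peephole: a PseudoVMV_V_V with an undef
// passthru is a pure copy, so fold it into its source and, where
// possible, relax the source's tail policy to agnostic.
bool RISCVVectorPeephole::foldUndefPassthruVMV_V_V(MachineInstr &MI) {
  if (RISCV::getRVVMCOpcode(MI.getOpcode()) != RISCV::VMV_V_V)
    return false;
  if (MI.getOperand(1).getReg() != RISCV::NoRegister)
    return false; // Only the undef-passthru case is handled here.

  // If this copy is the only user of the source and the source is a
  // vector pseudo with a policy operand, its tail is now known to be
  // undef, so mark it tail agnostic. (The real patch also reasons
  // about the VLs of the two instructions.)
  MachineInstr *Src = MRI->getVRegDef(MI.getOperand(2).getReg());
  if (Src && MRI->hasOneUse(MI.getOperand(2).getReg()) &&
      RISCVII::hasVecPolicyOp(Src->getDesc().TSFlags)) {
    MachineOperand &Policy =
        Src->getOperand(RISCVII::getVecPolicyOpNum(Src->getDesc()));
    Policy.setImm(Policy.getImm() | RISCVII::TAIL_AGNOSTIC);
  }

  MRI->replaceRegWith(MI.getOperand(0).getReg(), MI.getOperand(2).getReg());
  MI.eraseFromParent();
  return true;
}
```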