-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[MachineCombiner] Don't ignore PHI depths #82025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-x86 Author: Jonas Paulsson (JonPsson1) ChangesThe depths of the Root and the NewRoot are to be compared in MachineCombiner::improvesCriticalPathLen(), and while the call to BlockTrace.getInstrCycles(*Root) includes the Depth of a PHI, for some reason PHI nodes have been ignored in getOperandDef(). This seems wrong - according to the comment there there should be no depth info for PHIs, but the trace debug shows:
This means that Root currenty has that depth included in it, but NewRoot ignores it. My guess is that MachineTraceMetrics has been improved and this was missed in MachineCombiner. This patch simply removes that check for PHIs so that they are also counted. @LuoYuanke @asi-sc @preames @uweigand Full diff: https://github.com/llvm/llvm-project/pull/82025.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineCombiner.cpp b/llvm/lib/CodeGen/MachineCombiner.cpp
index c65937935ed820..a4c87a7678bd8d 100644
--- a/llvm/lib/CodeGen/MachineCombiner.cpp
+++ b/llvm/lib/CodeGen/MachineCombiner.cpp
@@ -155,9 +155,6 @@ MachineCombiner::getOperandDef(const MachineOperand &MO) {
// We need a virtual register definition.
if (MO.isReg() && MO.getReg().isVirtual())
DefInstr = MRI->getUniqueVRegDef(MO.getReg());
- // PHI's have no depth etc.
- if (DefInstr && DefInstr->isPHI())
- DefInstr = nullptr;
return DefInstr;
}
diff --git a/llvm/test/CodeGen/RISCV/umulo-128-legalisation-lowering.ll b/llvm/test/CodeGen/RISCV/umulo-128-legalisation-lowering.ll
index 8c0d97afe6c21d..f1ae3200175636 100644
--- a/llvm/test/CodeGen/RISCV/umulo-128-legalisation-lowering.ll
+++ b/llvm/test/CodeGen/RISCV/umulo-128-legalisation-lowering.ll
@@ -89,8 +89,8 @@ define { i128, i8 } @muloti_test(i128 %l, i128 %r) #0 {
; RISCV32-NEXT: snez a3, a3
; RISCV32-NEXT: and a3, a3, a7
; RISCV32-NEXT: or a2, a3, a2
-; RISCV32-NEXT: or a3, t2, t3
-; RISCV32-NEXT: or a2, a2, a3
+; RISCV32-NEXT: or a2, a2, t2
+; RISCV32-NEXT: or a2, a2, t3
; RISCV32-NEXT: mul a3, a5, a4
; RISCV32-NEXT: andi a2, a2, 1
; RISCV32-NEXT: sw a3, 0(a0)
diff --git a/llvm/test/CodeGen/X86/bitcast-and-setcc-256.ll b/llvm/test/CodeGen/X86/bitcast-and-setcc-256.ll
index 34ef23db345755..234c7a0a500d30 100644
--- a/llvm/test/CodeGen/X86/bitcast-and-setcc-256.ll
+++ b/llvm/test/CodeGen/X86/bitcast-and-setcc-256.ll
@@ -553,8 +553,8 @@ define i8 @v8i32_or_select(<8 x i32> %a0, <8 x i32> %a1, <8 x i32> %a2, <8 x i32
; AVX1-NEXT: vpcmpeqd %xmm5, %xmm4, %xmm4
; AVX1-NEXT: vpcmpeqd %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vinsertf128 $1, %xmm4, %ymm0, %ymm0
-; AVX1-NEXT: vorps %ymm2, %ymm3, %ymm1
-; AVX1-NEXT: vorps %ymm0, %ymm1, %ymm0
+; AVX1-NEXT: vorps %ymm0, %ymm3, %ymm0
+; AVX1-NEXT: vorps %ymm2, %ymm0, %ymm0
; AVX1-NEXT: vmovmskps %ymm0, %eax
; AVX1-NEXT: # kill: def $al killed $al killed $eax
; AVX1-NEXT: vzeroupper
@@ -571,8 +571,8 @@ define i8 @v8i32_or_select(<8 x i32> %a0, <8 x i32> %a1, <8 x i32> %a2, <8 x i32
; AVX2-NEXT: vpcmpeqd %ymm2, %ymm0, %ymm2
; AVX2-NEXT: .LBB7_3:
; AVX2-NEXT: vpcmpeqd %ymm1, %ymm0, %ymm0
-; AVX2-NEXT: vpor %ymm2, %ymm3, %ymm1
-; AVX2-NEXT: vpor %ymm0, %ymm1, %ymm0
+; AVX2-NEXT: vpor %ymm0, %ymm3, %ymm0
+; AVX2-NEXT: vpor %ymm2, %ymm0, %ymm0
; AVX2-NEXT: vmovmskps %ymm0, %eax
; AVX2-NEXT: # kill: def $al killed $al killed $eax
; AVX2-NEXT: vzeroupper
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (although I don't know this code that well) - any other comments?
I guess I will commit this then if there is no other comment. Thanks for review. |
That sounds completely wrong.
What about loops? Have you had a chance to check how PHIs will be calculated for them? |
That's a good question.. :) I did an experiment with this and found that the PHI with a long-latency incoming edge (fdiv) had a big depth value if outside of any loop, but 0 depth if I put it in a loop. So it seems that inside a loop this works as expected with basically 0 depth for any phi in the header, while outside of a loop the phi should have the incoming depth (for both Root and NewRoot with this patch). I think that is what this code does:
IIUC, this makes sure that any PHI in the loop header specifically does not carry a trace from the predecessor - preheader or latch included, which is what I observed per above. |
@JonPsson1 thanks for your research. LGTM |
The depths of the Root and the NewRoot are to be compared in MachineCombiner::improvesCriticalPathLen(), and while the call to BlockTrace.getInstrCycles(*Root) includes the Depth of a PHI, for some reason PHI nodes have been ignored in getOperandDef(). This seems wrong - according to the comment there there should be no depth info for PHIs, but the trace debug shows:
This means that Root currenty has that depth included in it, but NewRoot ignores it. My guess is that MachineTraceMetrics has been improved and this was missed in MachineCombiner. This patch simply removes that check for PHIs so that they are also counted.
@LuoYuanke @asi-sc @preames @uweigand