Skip to content

[MachineCombiner] Don't ignore PHI depths #82025

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 14, 2024

Conversation

JonPsson1
Copy link
Contributor

The depths of the Root and the NewRoot are to be compared in MachineCombiner::improvesCriticalPathLen(), and while the call to BlockTrace.getInstrCycles(*Root) includes the Depth of a PHI, for some reason PHI nodes have been ignored in getOperandDef(). This seems wrong - according to the comment there there should be no depth info for PHIs, but the trace debug shows:

Depths for %bb.2:
     48 Instructions
15      13      %52:gpr = PHI %30:gpr, %bb.0, %51:gpr, %bb.1
15      7       %53:gpr = SLTU %40:gpr, %39:gpr
...

This means that Root currenty has that depth included in it, but NewRoot ignores it. My guess is that MachineTraceMetrics has been improved and this was missed in MachineCombiner. This patch simply removes that check for PHIs so that they are also counted.

@LuoYuanke @asi-sc @preames @uweigand

@llvmbot
Copy link
Member

llvmbot commented Feb 16, 2024

@llvm/pr-subscribers-backend-x86

Author: Jonas Paulsson (JonPsson1)

Changes

The depths of the Root and the NewRoot are to be compared in MachineCombiner::improvesCriticalPathLen(), and while the call to BlockTrace.getInstrCycles(*Root) includes the Depth of a PHI, for some reason PHI nodes have been ignored in getOperandDef(). This seems wrong - according to the comment there there should be no depth info for PHIs, but the trace debug shows:

Depths for %bb.2:
     48 Instructions
15      13      %52:gpr = PHI %30:gpr, %bb.0, %51:gpr, %bb.1
15      7       %53:gpr = SLTU %40:gpr, %39:gpr
...

This means that Root currenty has that depth included in it, but NewRoot ignores it. My guess is that MachineTraceMetrics has been improved and this was missed in MachineCombiner. This patch simply removes that check for PHIs so that they are also counted.

@LuoYuanke @asi-sc @preames @uweigand


Full diff: https://github.com/llvm/llvm-project/pull/82025.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineCombiner.cpp (-3)
  • (modified) llvm/test/CodeGen/RISCV/umulo-128-legalisation-lowering.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/bitcast-and-setcc-256.ll (+4-4)
diff --git a/llvm/lib/CodeGen/MachineCombiner.cpp b/llvm/lib/CodeGen/MachineCombiner.cpp
index c65937935ed820..a4c87a7678bd8d 100644
--- a/llvm/lib/CodeGen/MachineCombiner.cpp
+++ b/llvm/lib/CodeGen/MachineCombiner.cpp
@@ -155,9 +155,6 @@ MachineCombiner::getOperandDef(const MachineOperand &MO) {
   // We need a virtual register definition.
   if (MO.isReg() && MO.getReg().isVirtual())
     DefInstr = MRI->getUniqueVRegDef(MO.getReg());
-  // PHI's have no depth etc.
-  if (DefInstr && DefInstr->isPHI())
-    DefInstr = nullptr;
   return DefInstr;
 }
 
diff --git a/llvm/test/CodeGen/RISCV/umulo-128-legalisation-lowering.ll b/llvm/test/CodeGen/RISCV/umulo-128-legalisation-lowering.ll
index 8c0d97afe6c21d..f1ae3200175636 100644
--- a/llvm/test/CodeGen/RISCV/umulo-128-legalisation-lowering.ll
+++ b/llvm/test/CodeGen/RISCV/umulo-128-legalisation-lowering.ll
@@ -89,8 +89,8 @@ define { i128, i8 } @muloti_test(i128 %l, i128 %r) #0 {
 ; RISCV32-NEXT:    snez a3, a3
 ; RISCV32-NEXT:    and a3, a3, a7
 ; RISCV32-NEXT:    or a2, a3, a2
-; RISCV32-NEXT:    or a3, t2, t3
-; RISCV32-NEXT:    or a2, a2, a3
+; RISCV32-NEXT:    or a2, a2, t2
+; RISCV32-NEXT:    or a2, a2, t3
 ; RISCV32-NEXT:    mul a3, a5, a4
 ; RISCV32-NEXT:    andi a2, a2, 1
 ; RISCV32-NEXT:    sw a3, 0(a0)
diff --git a/llvm/test/CodeGen/X86/bitcast-and-setcc-256.ll b/llvm/test/CodeGen/X86/bitcast-and-setcc-256.ll
index 34ef23db345755..234c7a0a500d30 100644
--- a/llvm/test/CodeGen/X86/bitcast-and-setcc-256.ll
+++ b/llvm/test/CodeGen/X86/bitcast-and-setcc-256.ll
@@ -553,8 +553,8 @@ define i8 @v8i32_or_select(<8 x i32> %a0, <8 x i32> %a1, <8 x i32> %a2, <8 x i32
 ; AVX1-NEXT:    vpcmpeqd %xmm5, %xmm4, %xmm4
 ; AVX1-NEXT:    vpcmpeqd %xmm1, %xmm0, %xmm0
 ; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
-; AVX1-NEXT:    vorps %ymm2, %ymm3, %ymm1
-; AVX1-NEXT:    vorps %ymm0, %ymm1, %ymm0
+; AVX1-NEXT:    vorps %ymm0, %ymm3, %ymm0
+; AVX1-NEXT:    vorps %ymm2, %ymm0, %ymm0
 ; AVX1-NEXT:    vmovmskps %ymm0, %eax
 ; AVX1-NEXT:    # kill: def $al killed $al killed $eax
 ; AVX1-NEXT:    vzeroupper
@@ -571,8 +571,8 @@ define i8 @v8i32_or_select(<8 x i32> %a0, <8 x i32> %a1, <8 x i32> %a2, <8 x i32
 ; AVX2-NEXT:    vpcmpeqd %ymm2, %ymm0, %ymm2
 ; AVX2-NEXT:  .LBB7_3:
 ; AVX2-NEXT:    vpcmpeqd %ymm1, %ymm0, %ymm0
-; AVX2-NEXT:    vpor %ymm2, %ymm3, %ymm1
-; AVX2-NEXT:    vpor %ymm0, %ymm1, %ymm0
+; AVX2-NEXT:    vpor %ymm0, %ymm3, %ymm0
+; AVX2-NEXT:    vpor %ymm2, %ymm0, %ymm0
 ; AVX2-NEXT:    vmovmskps %ymm0, %eax
 ; AVX2-NEXT:    # kill: def $al killed $al killed $eax
 ; AVX2-NEXT:    vzeroupper

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (although I don't know this code that well) - any other comments?

@JonPsson1
Copy link
Contributor Author

LGTM (although I don't know this code that well) - any other comments?

I guess I will commit this then if there is no other comment. Thanks for review.

@LuoYuanke @asi-sc @preames @uweigand

@asi-sc
Copy link
Contributor

asi-sc commented Mar 11, 2024

This means that Root currenty has that depth included in it, but NewRoot ignores it

That sounds completely wrong.

This patch simply removes that check for PHIs so that they are also counted.

What about loops? Have you had a chance to check how PHIs will be calculated for them?

@JonPsson1
Copy link
Contributor Author

JonPsson1 commented Mar 12, 2024

This means that Root currenty has that depth included in it, but NewRoot ignores it

That sounds completely wrong.

This patch simply removes that check for PHIs so that they are also counted.

What about loops? Have you had a chance to check how PHIs will be calculated for them?

That's a good question.. :) I did an experiment with this and found that the PHI with a long-latency incoming edge (fdiv) had a big depth value if outside of any loop, but 0 depth if I put it in a loop. So it seems that inside a loop this works as expected with basically 0 depth for any phi in the header, while outside of a loop the phi should have the incoming depth (for both Root and NewRoot with this patch).

I think that is what this code does:

MinInstrCountEnsemble::pickTracePred(const MachineBasicBlock *MBB) {
  if (MBB->pred_empty())
    return nullptr;
  const MachineLoop *CurLoop = getLoopFor(MBB);
  // Don't leave loops, and never follow back-edges.
  if (CurLoop && MBB == CurLoop->getHeader())
    return nullptr;
...

and

MinInstrCountEnsemble::pickTraceSucc(const MachineBasicBlock *MBB) {
  if (MBB->succ_empty())
    return nullptr;
  const MachineLoop *CurLoop = getLoopFor(MBB);
  const MachineBasicBlock *Best = nullptr;
  unsigned BestHeight = 0;
  for (const MachineBasicBlock *Succ : MBB->successors()) {
    // Don't consider back-edges.
    if (CurLoop && Succ == CurLoop->getHeader())
      continue;
...

IIUC, this makes sure that any PHI in the loop header specifically does not carry a trace from the predecessor - preheader or latch included, which is what I observed per above.

@asi-sc

@asi-sc
Copy link
Contributor

asi-sc commented Mar 14, 2024

@JonPsson1 thanks for your research.

LGTM

@JonPsson1 JonPsson1 merged commit 6588ac3 into llvm:main Mar 14, 2024
@JonPsson1 JonPsson1 deleted the MachineCombPHIs branch March 14, 2024 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants