Skip to content

[CodeGen] Use first EHLabel as a stop gate for live range shrinking #114195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 2, 2024

Conversation

MuellerMP
Copy link
Contributor

This fixes issue #114194

The issue happens during the LiveRangeShrink pass, which runs early, before phi elimination. LandingPads, which are lowered to EHLabels, need to be the first non phi instruction in an EHPad. In case of a phi node being in front of the EHLabel and a use being after the EHLabel, we hoist the use in front of the label.

This results in a portion of the landingpad missing due to being hoisted in front of the label.

@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2024

@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-backend-x86

Author: Mirko (MuellerMP)

Changes

This fixes issue #114194

The issue happens during the LiveRangeShrink pass, which runs early, before phi elimination. LandingPads, which are lowered to EHLabels, need to be the first non phi instruction in an EHPad. In case of a phi node being in front of the EHLabel and a use being after the EHLabel, we hoist the use in front of the label.

This results in a portion of the landingpad missing due to being hoisted in front of the label.


Full diff: https://github.com/llvm/llvm-project/pull/114195.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/LiveRangeShrink.cpp (+17-2)
  • (added) llvm/test/CodeGen/X86/lrshrink-ehpad-phis.ll (+112)
diff --git a/llvm/lib/CodeGen/LiveRangeShrink.cpp b/llvm/lib/CodeGen/LiveRangeShrink.cpp
index 3e3e3e51bfe9c6..884b2a9ed34e51 100644
--- a/llvm/lib/CodeGen/LiveRangeShrink.cpp
+++ b/llvm/lib/CodeGen/LiveRangeShrink.cpp
@@ -74,16 +74,26 @@ using InstOrderMap = DenseMap<MachineInstr *, unsigned>;
 /// M[A] > M[B] guarantees that A is dominated by B.
 /// If \p New is not in \p M, return \p Old. Otherwise if \p Old is null, return
 /// \p New.
+/// If \p New is part of an EHPad and is not dominated by \p EHBarrier, return
+/// \p Old, because the start of the landingpad is required to be the first
+/// non-PHI instruction.
 static MachineInstr *FindDominatedInstruction(MachineInstr &New,
                                               MachineInstr *Old,
+                                              MachineInstr *EHBarrier,
                                               const InstOrderMap &M) {
   auto NewIter = M.find(&New);
   if (NewIter == M.end())
     return Old;
+  unsigned OrderNew = NewIter->second;
+  if (EHBarrier) {
+    unsigned OrderBarrier = M.find(EHBarrier)->second;
+    if (OrderBarrier > OrderNew) {
+      return Old;
+    }
+  }
   if (Old == nullptr)
     return &New;
   unsigned OrderOld = M.find(Old)->second;
-  unsigned OrderNew = NewIter->second;
   if (OrderOld != OrderNew)
     return OrderOld < OrderNew ? &New : Old;
   // OrderOld == OrderNew, we need to iterate down from Old to see if it
@@ -125,14 +135,19 @@ bool LiveRangeShrink::runOnMachineFunction(MachineFunction &MF) {
     if (MBB.empty())
       continue;
     bool SawStore = false;
+    bool IsEHPad = MBB.isEHPad();
+    MachineInstr *EHBarrier = nullptr;
     BuildInstOrderMap(MBB.begin(), IOM);
     UseMap.clear();
 
     for (MachineBasicBlock::iterator Next = MBB.begin(); Next != MBB.end();) {
       MachineInstr &MI = *Next;
       ++Next;
+
       if (MI.isPHI() || MI.isDebugOrPseudoInstr())
         continue;
+      if (IsEHPad && !EHBarrier && MI.isEHLabel())
+        EHBarrier = &MI;
       if (MI.mayStore())
         SawStore = true;
 
@@ -201,7 +216,7 @@ bool LiveRangeShrink::runOnMachineFunction(MachineFunction &MF) {
           MachineInstr &DefInstr = *MRI.def_instr_begin(Reg);
           if (!TII.isCopyInstr(DefInstr))
             NumEligibleUse++;
-          Insert = FindDominatedInstruction(DefInstr, Insert, IOM);
+          Insert = FindDominatedInstruction(DefInstr, Insert, EHBarrier, IOM);
         } else {
           Insert = nullptr;
           break;
diff --git a/llvm/test/CodeGen/X86/lrshrink-ehpad-phis.ll b/llvm/test/CodeGen/X86/lrshrink-ehpad-phis.ll
new file mode 100755
index 00000000000000..b9afb996973548
--- /dev/null
+++ b/llvm/test/CodeGen/X86/lrshrink-ehpad-phis.ll
@@ -0,0 +1,112 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple x86_64-unknown-linux-gnu %s -o - | FileCheck %s
+
+declare i32 @__gxx_personality_v0(...)
+declare void @maythrow()
+declare void @cleanup(i32)
+
+@external_bool = external global i1
+@externalA = external global i32
+@externalB = external global i32
+@externalC = external global i32
+@externalD = external global i32
+
+define void @test() personality ptr @__gxx_personality_v0  {
+; CHECK-LABEL: test:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    pushq %r15
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    pushq %r14
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    pushq %r13
+; CHECK-NEXT:    .cfi_def_cfa_offset 40
+; CHECK-NEXT:    pushq %r12
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
+; CHECK-NEXT:    pushq %rbx
+; CHECK-NEXT:    .cfi_def_cfa_offset 56
+; CHECK-NEXT:    pushq %rax
+; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:    .cfi_offset %rbx, -56
+; CHECK-NEXT:    .cfi_offset %r12, -48
+; CHECK-NEXT:    .cfi_offset %r13, -40
+; CHECK-NEXT:    .cfi_offset %r14, -32
+; CHECK-NEXT:    .cfi_offset %r15, -24
+; CHECK-NEXT:    .cfi_offset %rbp, -16
+; CHECK-NEXT:    movq external_bool@GOTPCREL(%rip), %rax
+; CHECK-NEXT:    cmpb $1, (%rax)
+; CHECK-NEXT:    jne .LBB0_3
+; CHECK-NEXT:  # %bb.1: # %branchA
+; CHECK-NEXT:    movq externalA@GOTPCREL(%rip), %rax
+; CHECK-NEXT:    movl (%rax), %eax
+; CHECK-NEXT:    movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT:    movq externalC@GOTPCREL(%rip), %rax
+; CHECK-NEXT:    movl (%rax), %eax
+; CHECK-NEXT:    movl %eax, (%rsp) # 4-byte Spill
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:  .Ltmp2:
+; CHECK-NEXT:    callq maythrow@PLT
+; CHECK-NEXT:  .Ltmp3:
+; CHECK-NEXT:    jmp .LBB0_4
+; CHECK-NEXT:  .LBB0_3: # %branchB
+; CHECK-NEXT:    movq externalB@GOTPCREL(%rip), %rax
+; CHECK-NEXT:    movl (%rax), %eax
+; CHECK-NEXT:    movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT:    movq externalD@GOTPCREL(%rip), %rax
+; CHECK-NEXT:    movl (%rax), %eax
+; CHECK-NEXT:    movl %eax, (%rsp) # 4-byte Spill
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:  .Ltmp0:
+; CHECK-NEXT:    callq maythrow@PLT
+; CHECK-NEXT:  .Ltmp1:
+; CHECK-NEXT:  .LBB0_4: # %end
+; CHECK-NEXT:    addq $8, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 56
+; CHECK-NEXT:    popq %rbx
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
+; CHECK-NEXT:    popq %r12
+; CHECK-NEXT:    .cfi_def_cfa_offset 40
+; CHECK-NEXT:    popq %r13
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    popq %r14
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    popq %r15
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    retq
+; CHECK-NEXT:  .LBB0_2: # %lpad
+; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:  .Ltmp4:
+; CHECK-NEXT:    movq %rax, %rbx
+; CHECK-NEXT:    movl {{[-0-9]+}}(%r{{[sb]}}p), %edi # 4-byte Reload
+; CHECK-NEXT:    addl (%rsp), %edi # 4-byte Folded Reload
+; CHECK-NEXT:    callq cleanup@PLT
+; CHECK-NEXT:    movq %rbx, %rdi
+; CHECK-NEXT:    callq _Unwind_Resume@PLT
+  %1 = load i1, ptr @external_bool
+  br i1 %1, label %branchA, label %branchB
+branchA:
+  %valueA = load i32, ptr @externalA
+  %valueC = load i32, ptr @externalC
+  call void asm sideeffect "", "~{rbp},~{rsi},~{rdi},~{rcx},~{rbx},~{r12},~{r13},~{r14},~{r15},~{flags}"()
+  invoke void @maythrow() to label %end unwind label %lpad
+branchB:
+  %valueB = load i32, ptr @externalB
+  %valueD = load i32, ptr @externalD
+  call void asm sideeffect "", "~{rbp},~{rsi},~{rdi},~{rcx},~{rbx},~{r12},~{r13},~{r14},~{r15},~{flags}"()
+  invoke void @maythrow() to label %end unwind label %lpad
+lpad:
+  %phiValue = phi i32 [%valueA, %branchA], [%valueB, %branchB]
+  %phiValue2 = phi i32 [%valueC, %branchA], [%valueD, %branchB]
+  %lp = landingpad { ptr, i32 }
+          cleanup
+  %3 = add i32 %phiValue2, %phiValue
+  call void @cleanup(i32 %3)
+  resume { ptr, i32 } %lp
+end:
+  ret void
+}

@MuellerMP
Copy link
Contributor Author

An alternative solution could be eliminating phis before lrshrink, which would lower the phis into copies after the label.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the context for this pass? I didn't know it existed, and I see it's only added by x86. Why does it exist? This is just an extra machine scheduler?

Comment on lines 147 to 149
if (MI.isPHI() || MI.isDebugOrPseudoInstr())
continue;
if (IsEHPad && !EHBarrier && MI.isEHLabel())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be using SkipPHIsLabelsAndDebug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it could have been using it if we did not want to check for labels but in my fix i use the first ehlabel to determine the start of the landingpad.

@MuellerMP
Copy link
Contributor Author

MuellerMP commented Oct 31, 2024

What is the context for this pass? I didn't know it existed, and I see it's only added by x86. Why does it exist? This is just an extra machine scheduler?

That's a good question, because the machine scheduler actually doesn't run into this problem due to running after phi elimination.
So I read the original MR and this pass seems to relieve register pressure in some special cases:

  1. when reassociation reorders ir instructions in a bad way
  2. when user input introduces live-range issues

It also seemd to require insight of the whole basicblock, which is why it couldn't be solved with exisiting generic schdeduling.

I wonder if these assumptions still hold in the current version of misched and resassociate.
Since I'm not to familiar with either of them I propose to fix the current version of the pass for now and leave the analysis regarding the redundancy of the pass for followups.

Copy link

github-actions bot commented Oct 31, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@MuellerMP
Copy link
Contributor Author

Updated the patch to deal with rebuilds of the InstructionOrderMap (triggered an assertion during my tests).
I now use the same method the original author used for code motion barriers related to instruction with side effects:
Rebuilding the InstructionOrderMap so illegal code motion insertion points can't be referenced with FindDominatedInstruction.

@MuellerMP MuellerMP force-pushed the main branch 2 times, most recently from 97cff7a to 737d209 Compare October 31, 2024 14:56
// barrier for code motion. IOM is rebuild from the next instruction
// to prevent later instructions from being moved before this MI.
SawEHLabel = true;
BuildInstOrderMap(Next, IOM);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This amounts to recomputing a machineinstr->index map twice. Why not just use SkipPHIsLabelsAndDebug up above ni the original map computation, and re-use the iterator at the start of the loop here

Copy link
Contributor Author

@MuellerMP MuellerMP Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes - I think I get what you were hinting at now =)
So i cant SkipPHIsLabelsAndDebug in every case since we will miss the optimization in the original test which relies on moving uses of phis to their defs.
But we should be able to utilize the skip in the EHPad case since the assumption of the landingpad ehlabel being the first non phi should still hold.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure why you can't do this in the !MBBPad case

@MuellerMP
Copy link
Contributor Author

MuellerMP commented Oct 31, 2024

Not really sure why you can't do this in the !MBBPad case

well not for the map at least -> otherwise we are missing the phis in the map and thus we can't hoist their uses

hoisting uses is only illegal if there is an EHLabel inbetween def and use

This fixes issue llvm#114194

The issue happens during the LiveRangeShrink pass, which runs early, before phi elimination. LandingPads, which are lowered to EHLabels, need to be the first non phi instruction in an EHPad. In case of a phi node being in front of the EHLabel and a use being after the EHLabel, we hoist the use in front of the label.

This results in a portion of the landingpad missing due to being hoisted in front of the label.
@MuellerMP
Copy link
Contributor Author

Updated the patch again:

  • Added a comment explaining the initial SkipPHIsLabelsAndDebug call for EHPads
  • Added a iterator-equals-end check before the initial BuildInstOrderMap for EHPads due to BuildInstOrderMap derefing the iterator
  • Now using SkipPHIsLabelsAndDebug infront and inside the loop thus eliminating the if continue

Did further testing with this patch and it seems to achieve the desired result.
I hope this clarifies my fix and also incorporates the adjustments regarding the Skip func in a meaningful way.

@arsenm would you be ok with merging this for me since i don't have write access?

@MuellerMP MuellerMP requested a review from arsenm November 1, 2024 23:11
@arsenm arsenm merged commit 4d3c427 into llvm:main Nov 2, 2024
6 of 8 checks passed
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
…lvm#114195)

This fixes issue llvm#114194

The issue happens during the `LiveRangeShrink` pass, which runs early,
before phi elimination. LandingPads, which are lowered to EHLabels, need
to be the first non phi instruction in an EHPad. In case of a phi node
being in front of the EHLabel and a use being after the EHLabel, we
hoist the use in front of the label.

This results in a portion of the landingpad missing due to being hoisted
in front of the label.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
…lvm#114195)

This fixes issue llvm#114194

The issue happens during the `LiveRangeShrink` pass, which runs early,
before phi elimination. LandingPads, which are lowered to EHLabels, need
to be the first non phi instruction in an EHPad. In case of a phi node
being in front of the EHLabel and a use being after the EHLabel, we
hoist the use in front of the label.

This results in a portion of the landingpad missing due to being hoisted
in front of the label.
@nathanchance
Copy link
Member

I am noticing a significant compile time regression (perhaps infinite loop?) with this change with certain compile options, which I discovered when building the Linux kernel for 32-bit x86. cvise spits out the trivial C reproducer:

enum { RATE_6M_PLCP } il4965_tx_status_reply_tx_agg_2;
int il4965_tx_status_reply_tx_i;
short il4965_tx_status_reply_tx_start_idx,
    il4965_tx_status_reply_tx_frame_status_0_0;
int il4965_tx_status_reply_tx() {
  int sh, start = il4965_tx_status_reply_tx_start_idx;
  long long bitmap;
  for (; il4965_tx_status_reply_tx_i; il4965_tx_status_reply_tx_i++) {
    if (il4965_tx_status_reply_tx_frame_status_0_0)
      continue;
    if (sh > 4) {
      sh = start;
      bitmap = sh = 0;
    }
    bitmap |= 1ULL << sh;
  }
  if (bitmap)
    il4965_tx_status_reply_tx_agg_2 = 1;
  return 0;
}

Immediately prior to this change:

Benchmark 1: clang --target=i686-linux-gnu -march=geode -O2 -g -c 4965-mac.i
  Time (mean ± σ):      14.9 ms ±   0.3 ms    [User: 8.9 ms, System: 5.9 ms]
  Range (min … max):    14.5 ms …  17.0 ms    142 runs

At this change:

$ clang --target=i686-linux-gnu -march=geode -O2 -g -c 4965-mac.i
<hangs>

MuellerMP added a commit to wibu-systems/llvm-project that referenced this pull request Nov 7, 2024
This fixes the infinite loop discovered in llvm#114195.
Since we skip debug instructions at the start of the loop we do not
need to skip them again at the end of the loop.
@MuellerMP
Copy link
Contributor Author

@nathanchance ty for finding and reducing this issue!
can repro and have a fix ready.

MuellerMP added a commit to wibu-systems/llvm-project that referenced this pull request Nov 8, 2024
This fixes the infinite loop discovered in llvm#114195.
Since we skip debug instructions at the start of the loop we do not
need to skip them again at the end of the loop.
MuellerMP added a commit to wibu-systems/llvm-project that referenced this pull request Nov 8, 2024
This fixes the infinite loop discovered in llvm#114195.
Since we skip debug instructions at the start of the loop we do not
need to skip them again at the end of the loop.
arsenm pushed a commit that referenced this pull request Nov 9, 2024
This fixes the infinite loop discovered in #114195. 
Since we skip debug instructions at the start of the loop we do not need
to skip them again at the end of the loop.
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
This fixes the infinite loop discovered in llvm#114195. 
Since we skip debug instructions at the start of the loop we do not need
to skip them again at the end of the loop.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants