-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[CodeGen] Use first EHLabel as a stop gate for live range shrinking #114195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-regalloc @llvm/pr-subscribers-backend-x86 Author: Mirko (MuellerMP) ChangesThis fixes issue #114194 The issue happens during the This results in a portion of the landingpad missing due to being hoisted in front of the label. Full diff: https://github.com/llvm/llvm-project/pull/114195.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/LiveRangeShrink.cpp b/llvm/lib/CodeGen/LiveRangeShrink.cpp
index 3e3e3e51bfe9c6..884b2a9ed34e51 100644
--- a/llvm/lib/CodeGen/LiveRangeShrink.cpp
+++ b/llvm/lib/CodeGen/LiveRangeShrink.cpp
@@ -74,16 +74,26 @@ using InstOrderMap = DenseMap<MachineInstr *, unsigned>;
/// M[A] > M[B] guarantees that A is dominated by B.
/// If \p New is not in \p M, return \p Old. Otherwise if \p Old is null, return
/// \p New.
+/// If \p New is part of an EHPad and is not dominated by \p EHBarrier, return
+/// \p Old, because the start of the landingpad is required to be the first
+/// non-PHI instruction.
static MachineInstr *FindDominatedInstruction(MachineInstr &New,
MachineInstr *Old,
+ MachineInstr *EHBarrier,
const InstOrderMap &M) {
auto NewIter = M.find(&New);
if (NewIter == M.end())
return Old;
+ unsigned OrderNew = NewIter->second;
+ if (EHBarrier) {
+ unsigned OrderBarrier = M.find(EHBarrier)->second;
+ if (OrderBarrier > OrderNew) {
+ return Old;
+ }
+ }
if (Old == nullptr)
return &New;
unsigned OrderOld = M.find(Old)->second;
- unsigned OrderNew = NewIter->second;
if (OrderOld != OrderNew)
return OrderOld < OrderNew ? &New : Old;
// OrderOld == OrderNew, we need to iterate down from Old to see if it
@@ -125,14 +135,19 @@ bool LiveRangeShrink::runOnMachineFunction(MachineFunction &MF) {
if (MBB.empty())
continue;
bool SawStore = false;
+ bool IsEHPad = MBB.isEHPad();
+ MachineInstr *EHBarrier = nullptr;
BuildInstOrderMap(MBB.begin(), IOM);
UseMap.clear();
for (MachineBasicBlock::iterator Next = MBB.begin(); Next != MBB.end();) {
MachineInstr &MI = *Next;
++Next;
+
if (MI.isPHI() || MI.isDebugOrPseudoInstr())
continue;
+ if (IsEHPad && !EHBarrier && MI.isEHLabel())
+ EHBarrier = &MI;
if (MI.mayStore())
SawStore = true;
@@ -201,7 +216,7 @@ bool LiveRangeShrink::runOnMachineFunction(MachineFunction &MF) {
MachineInstr &DefInstr = *MRI.def_instr_begin(Reg);
if (!TII.isCopyInstr(DefInstr))
NumEligibleUse++;
- Insert = FindDominatedInstruction(DefInstr, Insert, IOM);
+ Insert = FindDominatedInstruction(DefInstr, Insert, EHBarrier, IOM);
} else {
Insert = nullptr;
break;
diff --git a/llvm/test/CodeGen/X86/lrshrink-ehpad-phis.ll b/llvm/test/CodeGen/X86/lrshrink-ehpad-phis.ll
new file mode 100755
index 00000000000000..b9afb996973548
--- /dev/null
+++ b/llvm/test/CodeGen/X86/lrshrink-ehpad-phis.ll
@@ -0,0 +1,112 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple x86_64-unknown-linux-gnu %s -o - | FileCheck %s
+
+declare i32 @__gxx_personality_v0(...)
+declare void @maythrow()
+declare void @cleanup(i32)
+
+@external_bool = external global i1
+@externalA = external global i32
+@externalB = external global i32
+@externalC = external global i32
+@externalD = external global i32
+
+define void @test() personality ptr @__gxx_personality_v0 {
+; CHECK-LABEL: test:
+; CHECK: # %bb.0:
+; CHECK-NEXT: pushq %rbp
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: pushq %r15
+; CHECK-NEXT: .cfi_def_cfa_offset 24
+; CHECK-NEXT: pushq %r14
+; CHECK-NEXT: .cfi_def_cfa_offset 32
+; CHECK-NEXT: pushq %r13
+; CHECK-NEXT: .cfi_def_cfa_offset 40
+; CHECK-NEXT: pushq %r12
+; CHECK-NEXT: .cfi_def_cfa_offset 48
+; CHECK-NEXT: pushq %rbx
+; CHECK-NEXT: .cfi_def_cfa_offset 56
+; CHECK-NEXT: pushq %rax
+; CHECK-NEXT: .cfi_def_cfa_offset 64
+; CHECK-NEXT: .cfi_offset %rbx, -56
+; CHECK-NEXT: .cfi_offset %r12, -48
+; CHECK-NEXT: .cfi_offset %r13, -40
+; CHECK-NEXT: .cfi_offset %r14, -32
+; CHECK-NEXT: .cfi_offset %r15, -24
+; CHECK-NEXT: .cfi_offset %rbp, -16
+; CHECK-NEXT: movq external_bool@GOTPCREL(%rip), %rax
+; CHECK-NEXT: cmpb $1, (%rax)
+; CHECK-NEXT: jne .LBB0_3
+; CHECK-NEXT: # %bb.1: # %branchA
+; CHECK-NEXT: movq externalA@GOTPCREL(%rip), %rax
+; CHECK-NEXT: movl (%rax), %eax
+; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: movq externalC@GOTPCREL(%rip), %rax
+; CHECK-NEXT: movl (%rax), %eax
+; CHECK-NEXT: movl %eax, (%rsp) # 4-byte Spill
+; CHECK-NEXT: #APP
+; CHECK-NEXT: #NO_APP
+; CHECK-NEXT: .Ltmp2:
+; CHECK-NEXT: callq maythrow@PLT
+; CHECK-NEXT: .Ltmp3:
+; CHECK-NEXT: jmp .LBB0_4
+; CHECK-NEXT: .LBB0_3: # %branchB
+; CHECK-NEXT: movq externalB@GOTPCREL(%rip), %rax
+; CHECK-NEXT: movl (%rax), %eax
+; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: movq externalD@GOTPCREL(%rip), %rax
+; CHECK-NEXT: movl (%rax), %eax
+; CHECK-NEXT: movl %eax, (%rsp) # 4-byte Spill
+; CHECK-NEXT: #APP
+; CHECK-NEXT: #NO_APP
+; CHECK-NEXT: .Ltmp0:
+; CHECK-NEXT: callq maythrow@PLT
+; CHECK-NEXT: .Ltmp1:
+; CHECK-NEXT: .LBB0_4: # %end
+; CHECK-NEXT: addq $8, %rsp
+; CHECK-NEXT: .cfi_def_cfa_offset 56
+; CHECK-NEXT: popq %rbx
+; CHECK-NEXT: .cfi_def_cfa_offset 48
+; CHECK-NEXT: popq %r12
+; CHECK-NEXT: .cfi_def_cfa_offset 40
+; CHECK-NEXT: popq %r13
+; CHECK-NEXT: .cfi_def_cfa_offset 32
+; CHECK-NEXT: popq %r14
+; CHECK-NEXT: .cfi_def_cfa_offset 24
+; CHECK-NEXT: popq %r15
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: popq %rbp
+; CHECK-NEXT: .cfi_def_cfa_offset 8
+; CHECK-NEXT: retq
+; CHECK-NEXT: .LBB0_2: # %lpad
+; CHECK-NEXT: .cfi_def_cfa_offset 64
+; CHECK-NEXT: .Ltmp4:
+; CHECK-NEXT: movq %rax, %rbx
+; CHECK-NEXT: movl {{[-0-9]+}}(%r{{[sb]}}p), %edi # 4-byte Reload
+; CHECK-NEXT: addl (%rsp), %edi # 4-byte Folded Reload
+; CHECK-NEXT: callq cleanup@PLT
+; CHECK-NEXT: movq %rbx, %rdi
+; CHECK-NEXT: callq _Unwind_Resume@PLT
+ %1 = load i1, ptr @external_bool
+ br i1 %1, label %branchA, label %branchB
+branchA:
+ %valueA = load i32, ptr @externalA
+ %valueC = load i32, ptr @externalC
+ call void asm sideeffect "", "~{rbp},~{rsi},~{rdi},~{rcx},~{rbx},~{r12},~{r13},~{r14},~{r15},~{flags}"()
+ invoke void @maythrow() to label %end unwind label %lpad
+branchB:
+ %valueB = load i32, ptr @externalB
+ %valueD = load i32, ptr @externalD
+ call void asm sideeffect "", "~{rbp},~{rsi},~{rdi},~{rcx},~{rbx},~{r12},~{r13},~{r14},~{r15},~{flags}"()
+ invoke void @maythrow() to label %end unwind label %lpad
+lpad:
+ %phiValue = phi i32 [%valueA, %branchA], [%valueB, %branchB]
+ %phiValue2 = phi i32 [%valueC, %branchA], [%valueD, %branchB]
+ %lp = landingpad { ptr, i32 }
+ cleanup
+ %3 = add i32 %phiValue2, %phiValue
+ call void @cleanup(i32 %3)
+ resume { ptr, i32 } %lp
+end:
+ ret void
+}
|
An alternative solution could be eliminating phis before lrshrink, which would lower the phis into copies after the label. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the context for this pass? I didn't know it existed, and I see it's only added by x86. Why does it exist? This is just an extra machine scheduler?
llvm/lib/CodeGen/LiveRangeShrink.cpp
Outdated
if (MI.isPHI() || MI.isDebugOrPseudoInstr()) | ||
continue; | ||
if (IsEHPad && !EHBarrier && MI.isEHLabel()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be using SkipPHIsLabelsAndDebug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it could have been using it if we did not want to check for labels but in my fix i use the first ehlabel to determine the start of the landingpad.
That's a good question, because the machine scheduler actually doesn't run into this problem due to running after phi elimination.
It also seemd to require insight of the whole basicblock, which is why it couldn't be solved with exisiting generic schdeduling. I wonder if these assumptions still hold in the current version of misched and resassociate. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Updated the patch to deal with rebuilds of the InstructionOrderMap (triggered an assertion during my tests). |
97cff7a
to
737d209
Compare
llvm/lib/CodeGen/LiveRangeShrink.cpp
Outdated
// barrier for code motion. IOM is rebuild from the next instruction | ||
// to prevent later instructions from being moved before this MI. | ||
SawEHLabel = true; | ||
BuildInstOrderMap(Next, IOM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This amounts to recomputing a machineinstr->index map twice. Why not just use SkipPHIsLabelsAndDebug up above ni the original map computation, and re-use the iterator at the start of the loop here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes - I think I get what you were hinting at now =)
So i cant SkipPHIsLabelsAndDebug in every case since we will miss the optimization in the original test which relies on moving uses of phis to their defs.
But we should be able to utilize the skip in the EHPad case since the assumption of the landingpad ehlabel being the first non phi should still hold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure why you can't do this in the !MBBPad case
well not for the map at least -> otherwise we are missing the phis in the map and thus we can't hoist their uses hoisting uses is only illegal if there is an EHLabel inbetween def and use |
This fixes issue llvm#114194 The issue happens during the LiveRangeShrink pass, which runs early, before phi elimination. LandingPads, which are lowered to EHLabels, need to be the first non phi instruction in an EHPad. In case of a phi node being in front of the EHLabel and a use being after the EHLabel, we hoist the use in front of the label. This results in a portion of the landingpad missing due to being hoisted in front of the label.
Updated the patch again:
Did further testing with this patch and it seems to achieve the desired result. @arsenm would you be ok with merging this for me since i don't have write access? |
…lvm#114195) This fixes issue llvm#114194 The issue happens during the `LiveRangeShrink` pass, which runs early, before phi elimination. LandingPads, which are lowered to EHLabels, need to be the first non phi instruction in an EHPad. In case of a phi node being in front of the EHLabel and a use being after the EHLabel, we hoist the use in front of the label. This results in a portion of the landingpad missing due to being hoisted in front of the label.
…lvm#114195) This fixes issue llvm#114194 The issue happens during the `LiveRangeShrink` pass, which runs early, before phi elimination. LandingPads, which are lowered to EHLabels, need to be the first non phi instruction in an EHPad. In case of a phi node being in front of the EHLabel and a use being after the EHLabel, we hoist the use in front of the label. This results in a portion of the landingpad missing due to being hoisted in front of the label.
I am noticing a significant compile time regression (perhaps infinite loop?) with this change with certain compile options, which I discovered when building the Linux kernel for 32-bit x86.
Immediately prior to this change:
At this change:
|
This fixes the infinite loop discovered in llvm#114195. Since we skip debug instructions at the start of the loop we do not need to skip them again at the end of the loop.
@nathanchance ty for finding and reducing this issue! |
This fixes the infinite loop discovered in llvm#114195. Since we skip debug instructions at the start of the loop we do not need to skip them again at the end of the loop.
This fixes the infinite loop discovered in llvm#114195. Since we skip debug instructions at the start of the loop we do not need to skip them again at the end of the loop.
This fixes the infinite loop discovered in #114195. Since we skip debug instructions at the start of the loop we do not need to skip them again at the end of the loop.
This fixes the infinite loop discovered in llvm#114195. Since we skip debug instructions at the start of the loop we do not need to skip them again at the end of the loop.
This fixes issue #114194
The issue happens during the
LiveRangeShrink
pass, which runs early, before phi elimination. LandingPads, which are lowered to EHLabels, need to be the first non phi instruction in an EHPad. In case of a phi node being in front of the EHLabel and a use being after the EHLabel, we hoist the use in front of the label.This results in a portion of the landingpad missing due to being hoisted in front of the label.