Skip to content

[X86] Recognize POP/ADD/SUB modifying rsp in getSPAdjust. #114265

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 14, 2024

Conversation

daniel-zabawa
Copy link
Contributor

@daniel-zabawa daniel-zabawa commented Oct 30, 2024

This code assumed only PUSHes would appear in call sequences. However, if calls require frame-pointer/base-pointer spills, only the PUSH operations inserted by spillFPBP will be recognized, and the adjustments to frame object offsets in prologepilog will be incorrect.

This change correctly reports the SP adjustment for POP and ADD/SUB to rsp, and an assertion for unrecognized instructions that modify rsp.

This code assumed only PUSHes would appear in call sequences. However,
if calls require frame-pointer/base-pointer spills, only the PUSH
operations inserted by spillFPBP will be recognized, and the adjustments
to frame object offsets in prologepilog will be incorrect.

This change correctly reports the SP adjustment for POP and ADD/SUB to
rsp, and an assertion for unrecognized instructions that modify rsp.

There is no unit test provided as the issue can only be reliably
reproduced in MIR (to force a spill/restore outside and inside a call
sequence, respectively), but MIR cannot serialize the FPClobberedByCall
field that triggers spillFPBP to run.
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2024

@llvm/pr-subscribers-backend-x86

Author: Daniel Zabawa (daniel-zabawa)

Changes

This code assumed only PUSHes would appear in call sequences. However, if calls require frame-pointer/base-pointer spills, only the PUSH operations inserted by spillFPBP will be recognized, and the adjustments to frame object offsets in prologepilog will be incorrect.

This change correctly reports the SP adjustment for POP and ADD/SUB to rsp, and an assertion for unrecognized instructions that modify rsp.

There is no unit test provided as the issue can only be reliably reproduced in MIR (to force a spill/restore outside and inside a call sequence, respectively), but MIR cannot serialize the FPClobberedByCall field that triggers spillFPBP to run.


Full diff: https://github.com/llvm/llvm-project/pull/114265.diff

1 Files Affected:

  • (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+29-2)
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 38ea1f35be2b9a..e2285417a155e4 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -452,10 +452,13 @@ int X86InstrInfo::getSPAdjust(const MachineInstr &MI) const {
     return -(I->getOperand(1).getImm());
   }
 
-  // Currently handle only PUSHes we can reasonably expect to see
-  // in call sequences
+  // Handle other opcodes we reasonably expect to see in call
+  // sequences. Note this may include spill/restore of FP/BP.
   switch (MI.getOpcode()) {
   default:
+    assert(!(MI.modifiesRegister(X86::RSP, &RI) ||
+             MI.getDesc().hasImplicitDefOfPhysReg(X86::RSP)) &&
+           "Unhandled opcode in getSPAdjust");
     return 0;
   case X86::PUSH32r:
   case X86::PUSH32rmm:
@@ -467,6 +470,30 @@ int X86InstrInfo::getSPAdjust(const MachineInstr &MI) const {
   case X86::PUSH64rmr:
   case X86::PUSH64i32:
     return 8;
+  case X86::POP32r:
+  case X86::POP32rmm:
+  case X86::POP32rmr:
+    return -4;
+  case X86::POP64r:
+  case X86::POP64rmm:
+  case X86::POP64rmr:
+    return -8;
+  // FIXME: (implement and) use isAddImmediate in the
+  // default case instead of the following ADD/SUB cases.
+  case X86::ADD32ri:
+  case X86::ADD32ri8:
+  case X86::ADD64ri32:
+    if (MI.getOperand(0).getReg() == X86::RSP &&
+        MI.getOperand(1).getReg() == X86::RSP)
+      return -MI.getOperand(2).getImm();
+    return 0;
+  case X86::SUB32ri:
+  case X86::SUB32ri8:
+  case X86::SUB64ri32:
+    if (MI.getOperand(0).getReg() == X86::RSP &&
+        MI.getOperand(1).getReg() == X86::RSP)
+      return MI.getOperand(2).getImm();
+    return 0;
   }
 }
 

@e-kud e-kud requested review from RKSimon and phoebewang October 30, 2024 20:36
@e-kud
Copy link
Contributor

e-kud commented Oct 30, 2024

@daniel-zabawa any tests?

@daniel-zabawa
Copy link
Contributor Author

@daniel-zabawa any tests?

I am open to suggestions on how to cover this. The potential bug in prolog/epilog is contingent on having a frame object referenced in a call sequence, following a previous call sequence which requires the frame pointer to be spilled.

I don't know any reliable way to create the code exposing the issue without MIR, but MIR also does not support serializing the attributes (FPClobberedByCall) required for the FP spill/restore to be generated.

@e-kud
Copy link
Contributor

e-kud commented Oct 30, 2024

@daniel-zabawa any tests?

I am open to suggestions on how to cover this. The potential bug in prolog/epilog is contingent on having a frame object referenced in a call sequence, following a previous call sequence which requires the frame pointer to be spilled.

I don't know any reliable way to create the code exposing the issue without MIR, but MIR also does not support serializing the attributes (FPClobberedByCall) required for the FP spill/restore to be generated.

Can we add this attribute to llvm/lib/Target/X86/X86MachineFunctionInfo.h similarly we have it for amxProgModel and check with mir tests? It consumes X86MachineFunctionInfo that has getFPClobberedByCall.

@daniel-zabawa
Copy link
Contributor Author

@daniel-zabawa any tests?

I am open to suggestions on how to cover this. The potential bug in prolog/epilog is contingent on having a frame object referenced in a call sequence, following a previous call sequence which requires the frame pointer to be spilled.
I don't know any reliable way to create the code exposing the issue without MIR, but MIR also does not support serializing the attributes (FPClobberedByCall) required for the FP spill/restore to be generated.

Can we add this attribute to llvm/lib/Target/X86/X86MachineFunctionInfo.h similarly we have it for amxProgModel and check with mir tests? It consumes X86MachineFunctionInfo that has getFPClobberedByCall.

Thanks - I did not notice that it would be simple to add that into serialization. The test itself is still taking some time as it has to satisfy rather particular conditions for the FP to be spilled around a call and for that adjustment to be counted in prologepilog.

@daniel-zabawa
Copy link
Contributor Author

pinging @RKSimon @phoebewang for review

I have still not resolved the linux failure, but I believe it's unrelated. The only failed tests are in lldb-dap:

�_bk;t=1730412997573 FAIL: LLDB (/var/lib/buildkite-agent/builds/linux-56-59b8f5d88-9dkps-1/llvm-project/github-pull-requests/build/bin/clang-x86_64) :: test_repl_evaluate_expressions (TestDAP_evaluate.TestDAP_evaluate)
�_bk;t=1730412997573 FAIL: LLDB (/var/lib/buildkite-agent/builds/linux-56-59b8f5d88-9dkps-1/llvm-project/github-pull-requests/build/bin/clang-x86_64) :: test_repl_evaluate_expressions (TestDAP_evaluate.TestDAP_evaluate)

but these are also reported as unresolved in the summary.

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@daniel-zabawa daniel-zabawa force-pushed the review/x86-spadjust-pops branch from fa874a0 to 32e26ed Compare November 13, 2024 16:30
@daniel-zabawa
Copy link
Contributor Author

force-push to respin the tests.

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - but please be vigilant as I wonder if other obscure instructions may be used as well and could fire that assert in an external test case.

@e-kud e-kud merged commit 6fb7cdf into llvm:main Nov 14, 2024
8 checks passed
Copy link

@daniel-zabawa Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@bjope
Copy link
Collaborator

bjope commented Nov 18, 2024

We have seen some suspected miscompiles after this patch, and I've tried to reduce it a bit. What happens is that we get different code depending on if there is debug info or not (Which seems a bit weird given the code changes here? But maybe it exposes some older problem or I just haven't analyzed this patch enough?).

Below is my reduced test case. It is a mir test foo.mir like this:

--- |
  ; ModuleID = '<stdin>'
  source_filename = "reduced.ll"
  target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-linux-gnu"

  ; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
  declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #0

  declare fastcc i16 @check_bi(i16, i256, i256) local_unnamed_addr

  define noundef i16 @test_one_15(i127 %loadedv3.i88, ptr nocapture writeonly %retval.i85, i256 %storedv322) local_unnamed_addr !dbg !4 {
  entry:
    %call216 = tail call fastcc i16 @check_bi(i16 0, i256 0, i256 -365375409332725729550921208179070754913983135743)
    store volatile i128 0, ptr null, align 4294967296
    %call276 = tail call fastcc i16 @check_bi(i16 0, i256 0, i256 0)
    tail call void @llvm.lifetime.start.p0(i64 0, ptr null), !dbg !7
    store i127 %loadedv3.i88, ptr %retval.i85, align 16
      #dbg_value(i256 0, !12, !DIExpression(), !14)
    %call337 = tail call fastcc i16 @check_bi(i16 0, i256 %storedv322, i256 0)
    %call392 = tail call fastcc i16 @check_bi(i16 0, i256 0, i256 -5575186299632655785383929568162090376495103)
    ret i16 0
  }

  attributes #0 = { mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }

  !llvm.dbg.cu = !{!0}
  !llvm.module.flags = !{!3}

  !0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 20.0.0git.prerel (FlexC)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, retainedTypes: !2, splitDebugInlining: false, nameTableKind: None)
  !1 = !DIFile(filename: "foo.c", directory: "")
  !2 = !{}
  !3 = !{i32 2, !"Debug Info Version", i32 3}
  !4 = distinct !DISubprogram(name: "test_one_15", scope: !5, file: !5, line: 612, type: !6, scopeLine: 612, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !2)
  !5 = !DIFile(filename: "foo.h", directory: "")
  !6 = distinct !DISubroutineType(types: !2)
  !7 = !DILocation(line: 103, column: 116, scope: !8, inlinedAt: !10)
  !8 = distinct !DISubprogram(name: "idbi", scope: !5, file: !5, line: 103, type: !9, scopeLine: 103, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !2)
  !9 = distinct !DISubroutineType(types: !2)
  !10 = distinct !DILocation(line: 612, column: 202, scope: !11)
  !11 = distinct !DILexicalBlock(scope: !4, file: !5, line: 612, column: 112)
  !12 = !DILocalVariable(name: "res", scope: !11, file: !5, line: 612, type: !13)
  !13 = !DIBasicType(name: "_BitInt", size: 256, encoding: DW_ATE_signed)
  !14 = !DILocation(line: 0, scope: !11)

...
---
name:            test_one_15
alignment:       16
exposesReturnsTwice: false
legalized:       false
regBankSelected: false
selected:        false
failedISel:      false
tracksRegLiveness: true
hasWinCFI:       false
noPhis:          true
isSSA:           false
noVRegs:         true
hasFakeUses:     false
callsEHReturn:   false
callsUnwindInit: false
hasEHCatchret:   false
hasEHScopes:     false
hasEHFunclets:   false
isOutlined:      false
debugInstrRef:   true
failsVerification: false
tracksDebugUserValues: true
registers:       []
liveins:
  - { reg: '$rdi', virtual-reg: '' }
  - { reg: '$rsi', virtual-reg: '' }
  - { reg: '$rdx', virtual-reg: '' }
  - { reg: '$rcx', virtual-reg: '' }
  - { reg: '$r8', virtual-reg: '' }
  - { reg: '$r9', virtual-reg: '' }
frameInfo:
  isFrameAddressTaken: false
  isReturnAddressTaken: false
  hasStackMap:     false
  hasPatchPoint:   false
  stackSize:       0
  offsetAdjustment: 0
  maxAlignment:    16
  adjustsStack:    true
  hasCalls:        true
  stackProtector:  ''
  functionContext: ''
  maxCallFrameSize: 4294967295
  cvBytesOfCalleeSavedRegisters: 0
  hasOpaqueSPAdjustment: false
  hasVAStart:      false
  hasMustTailInVarArgFunc: false
  hasTailCall:     false
  isCalleeSavedInfoValid: false
  localFrameSize:  0
  savePoint:       ''
  restorePoint:    ''
fixedStack:
  - { id: 0, type: default, offset: 0, size: 8, alignment: 16, stack-id: default,
      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
stack:           []
entry_values:    []
debugValueSubstitutions: []
constants:       []
machineFunctionInfo:
  amxProgModel:    None
  FPClobberedByCall: false
  hasPushSequences: true
body:             |
  bb.0.entry:
    liveins: $rcx, $rdi, $rdx, $rsi, $r8, $r9

    renamable $rbx = COPY $r9
    renamable $r14 = COPY $r8
    renamable $r15 = COPY $rcx
    renamable $r13 = COPY $rdx
    renamable $r12 = COPY $rsi
    renamable $rbp = COPY $rdi
    ADJCALLSTACKDOWN64 32, 0, 32, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
    $edi = MOV32r0 implicit-def dead $eflags
    dead $esi = MOV32r0 implicit-def dead $eflags, implicit-def $rsi
    dead $edx = MOV32r0 implicit-def dead $eflags, implicit-def $rdx
    dead $ecx = MOV32r0 implicit-def dead $eflags, implicit-def $rcx
    dead $r8d = MOV32r0 implicit-def dead $eflags, implicit-def $r8
    PUSH64i32 -1, implicit-def $rsp, implicit $rsp :: (store (s64) into stack + 24)
    CFI_INSTRUCTION adjust_cfa_offset 8
    PUSH64i32 -1073741824, implicit-def $rsp, implicit $rsp :: (store (s64) into stack + 16)
    CFI_INSTRUCTION adjust_cfa_offset 8
    PUSH64i32 0, implicit-def $rsp, implicit $rsp :: (store (s64) into stack + 8)
    CFI_INSTRUCTION adjust_cfa_offset 8
    PUSH64i32 1, implicit-def $rsp, implicit $rsp :: (store (s64) into stack)
    CFI_INSTRUCTION adjust_cfa_offset 8
    CALL64pcrel32 target-flags(x86-plt) @check_bi, csr_64, implicit $rsp, implicit $ssp, implicit $edi, implicit $rsi, implicit $rdx, implicit $rcx, implicit $r8, implicit-def $rsp, implicit-def $ssp, implicit-def dead $ax
    ADJCALLSTACKUP64 32, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
    MOV64mi32 $noreg, 1, $noreg, 8, $noreg, 0 :: (volatile store (s64) into `ptr null` + 8, basealign 4294967296)
    MOV64mi32 $noreg, 1, $noreg, 0, $noreg, 0 :: (volatile store (s64) into `ptr null`, align 4294967296)
    ADJCALLSTACKDOWN64 32, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
    DBG_VALUE i256 0, $noreg, !12, !DIExpression(), debug-location !14
    renamable $xmm0 = V_SET0
    MOVUPSmr $rsp, 1, $noreg, 16, $noreg, renamable $xmm0 :: (store (s128), align 8)
    MOVUPSmr $rsp, 1, $noreg, 0, $noreg, killed renamable $xmm0 :: (store (s128), align 8)
    $edi = MOV32r0 implicit-def dead $eflags
    dead $esi = MOV32r0 implicit-def dead $eflags, implicit-def $rsi
    dead $edx = MOV32r0 implicit-def dead $eflags, implicit-def $rdx
    dead $ecx = MOV32r0 implicit-def dead $eflags, implicit-def $rcx
    dead $r8d = MOV32r0 implicit-def dead $eflags, implicit-def $r8
    CALL64pcrel32 target-flags(x86-plt) @check_bi, csr_64, implicit $rsp, implicit $ssp, implicit $edi, implicit $rsi, implicit $rdx, implicit $rcx, implicit $r8, implicit-def $rsp, implicit-def $ssp, implicit-def dead $ax
    ADJCALLSTACKUP64 32, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
    renamable $rax = MOV64ri 9223372036854775807
    renamable $rax = AND64rr killed renamable $rax, killed renamable $r12, implicit-def dead $eflags
    MOV64mr renamable $r13, 1, $noreg, 8, $noreg, killed renamable $rax :: (store (s64) into %ir.retval.i85 + 8, basealign 16)
    MOV64mr killed renamable $r13, 1, $noreg, 0, $noreg, killed renamable $rbp :: (store (s64) into %ir.retval.i85, align 16)
    ADJCALLSTACKDOWN64 32, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
    renamable $xmm0 = V_SET0
    MOVUPSmr $rsp, 1, $noreg, 16, $noreg, renamable $xmm0 :: (store (s128), align 8)
    MOVUPSmr $rsp, 1, $noreg, 0, $noreg, killed renamable $xmm0 :: (store (s128), align 8)
    $edi = MOV32r0 implicit-def dead $eflags
    $rsi = COPY killed renamable $r15
    $rdx = COPY killed renamable $r14
    $rcx = COPY killed renamable $rbx
    renamable $r8 = MOV64rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.0, align 16)
    CALL64pcrel32 target-flags(x86-plt) @check_bi, csr_64, implicit $rsp, implicit $ssp, implicit $edi, implicit $rsi, implicit $rdx, implicit $rcx, implicit $r8, implicit-def $rsp, implicit-def $ssp, implicit-def dead $ax
    ADJCALLSTACKUP64 32, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
    ADJCALLSTACKDOWN64 32, 0, 32, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
    $edi = MOV32r0 implicit-def dead $eflags
    dead $esi = MOV32r0 implicit-def dead $eflags, implicit-def $rsi
    dead $edx = MOV32r0 implicit-def dead $eflags, implicit-def $rdx
    dead $ecx = MOV32r0 implicit-def dead $eflags, implicit-def $rcx
    dead $r8d = MOV32r0 implicit-def dead $eflags, implicit-def $r8
    PUSH64i32 -1, implicit-def $rsp, implicit $rsp :: (store (s64) into stack + 24)
    CFI_INSTRUCTION adjust_cfa_offset 8
    PUSH64i32 -16384, implicit-def $rsp, implicit $rsp :: (store (s64) into stack + 16)
    CFI_INSTRUCTION adjust_cfa_offset 8
    PUSH64i32 0, implicit-def $rsp, implicit $rsp :: (store (s64) into stack + 8)
    CFI_INSTRUCTION adjust_cfa_offset 8
    PUSH64i32 1, implicit-def $rsp, implicit $rsp :: (store (s64) into stack)
    CFI_INSTRUCTION adjust_cfa_offset 8
    CALL64pcrel32 target-flags(x86-plt) @check_bi, csr_64, implicit $rsp, implicit $ssp, implicit $edi, implicit $rsi, implicit $rdx, implicit $rcx, implicit $r8, implicit-def $rsp, implicit-def $ssp, implicit-def dead $ax
    ADJCALLSTACKUP64 32, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
    dead $eax = MOV32r0 implicit-def dead $eflags, implicit-def $ax
    RET 0, $ax

...

If I compile it using llc foo.mir -run-pass prologepilog -o foo1.mir and then comment out the line with DBG_VALUE and run llc foo.mir -run-pass prologepilog -o foo2.mir, then those foo1.mir and foo2.mir files will diff like this:

178a179
>     DBG_VALUE i256 0, $noreg, !12, !DIExpression(), debug-location !14
205c206
<     renamable $r8 = MOV64rm $rsp, 1, $noreg, 96, $noreg :: (load (s64) from %fixed-stack.6, align 16)
---
>     renamable $r8 = MOV64rm $rsp, 1, $noreg, 128, $noreg :: (load (s64) from %fixed-stack.6, align 16)

@bjope
Copy link
Collaborator

bjope commented Nov 19, 2024

Hi @daniel-zabawa

I've reduced the test program even further:

--- |
  ; ModuleID = '<stdin>'
  source_filename = "reduced.ll"
  target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-linux-gnu"

  define noundef i16 @test_one_15() !dbg !0 {
  entry:
    ret i16 0
  }

  !llvm.dbg.cu = !{!0}
  !llvm.module.flags = !{!3}

  !0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 20.0.0git.prerel (FlexC)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, retainedTypes: !{}, splitDebugInlining: false, nameTableKind: None)
  !1 = !DIFile(filename: "foo.c", directory: "")
  !3 = !{i32 2, !"Debug Info Version", i32 3}
  !10 = distinct !DILocation(line: 612, column: 202, scope: !11)
  !11 = distinct !DILexicalBlock(scope: !1, file: !1, line: 612, column: 112)
  !12 = !DILocalVariable(name: "res", scope: !11, file: !1, line: 612, type: !13)
  !13 = !DIBasicType(name: "_BitInt", size: 256, encoding: DW_ATE_signed)
  !14 = !DILocation(line: 0, scope: !11)

...
---
name:            test_one_15
alignment:       16
tracksRegLiveness: true
noPhis:          true
isSSA:           false
noVRegs:         true
debugInstrRef:   true
failsVerification: false
tracksDebugUserValues: true
registers:       []
frameInfo:
  isFrameAddressTaken: false
  isReturnAddressTaken: false
  hasStackMap:     false
  hasPatchPoint:   false
  stackSize:       0
  offsetAdjustment: 0
  maxAlignment:    16
  adjustsStack:    true
  hasCalls:        true
  stackProtector:  ''
  functionContext: ''
  maxCallFrameSize: 4294967295
  cvBytesOfCalleeSavedRegisters: 0
  hasOpaqueSPAdjustment: false
  hasVAStart:      false
  hasMustTailInVarArgFunc: false
  hasTailCall:     false
  isCalleeSavedInfoValid: false
  localFrameSize:  0
  savePoint:       ''
  restorePoint:    ''
fixedStack:
  - { id: 0, type: default, offset: 0, size: 8, alignment: 16, stack-id: default,
      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
machineFunctionInfo:
  hasPushSequences: true
body:             |
  bb.0.entry:
    liveins: $rcx, $rdi, $rdx, $rsi, $r8, $r9

    ADJCALLSTACKDOWN64 32, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
    DBG_VALUE i256 0, $noreg, !12, !DIExpression(), !14
    renamable $r8 = MOV64rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.0, align 16)
    ADJCALLSTACKUP64 32, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
    RET 0
...

I suspect the problem somehow is related to the presence of "hasPushSequences: true".

@bjope
Copy link
Collaborator

bjope commented Nov 20, 2024

Ping!

Would be nice to know if you need more information (I could probably make a runnable C-level reproducer, but if the already provided reproducers are good enough I rather not spend time on that)?

Considering that we've seen lots of failing tests (runtime failures) starting to happen with this patch, maybe it should be reverted until someone has analyzed these problems?

@phoebewang, @RKSimon and @e-kud : I can't even see that @daniel-zabawa has a registered email address. So unsure if pinging here actually will lead to anything? I guess we shouldn't accept patches from people that aren't following up on problems found when landing patches?

phoebewang added a commit that referenced this pull request Nov 20, 2024
@daniel-zabawa
Copy link
Contributor Author

@bjope - my apologies as I had missed the recent email notifications. The current reproducer is fine.

The only change related to hasPushSequences was to add the serialization as there was no other reliable way to recreate the situation in MIR. That may well have exposed something below. Thanks @phoebewang for reverting.

@daniel-zabawa
Copy link
Contributor Author

I believe I've found the issue. When X86FrameLowering is handling the initial ADJCALLSTACKDOWN, the usual procedure is to insert the SP adjustment (SUB $rsp) at the ADJCALLSTACKDOWN site, with the returned iterator following the inserted SUB, as it's simply supplying the SP adjustment we already counted.

But, the insertion point actually skips debug instructions forward, and this is not accounted for in the result. So if debug instructions are present, the SUB inserted to replace the ADJCALLSTACKDOWN is visited again in the replaceFrameIndices loop, and is counted again towards the SP adjustment. This didn't happen before because we only recognized PUSH.

This is unfortunate as the prologepilog code apparently assumes the code replacing ADJCALLSTACKDOWN will not be visited in the loop. The original change to skip debug instructions when placing the replacement code traces back to https://bugs.llvm.org/show_bug.cgi?id=31319.

The commit only mentions that debug instructions after the erased (replaced?) call-stack adjustment could affect codegen. Both of the test cases from the bug report produce the same results with -g with or without this change. The lits added (CodeGen/X86/frame-lowering-debug-intrinsic.ll, CodeGen/X86/frame-lowering-debug-intrinsic-2.ll) pass with or without this change.

I am at a loss why the SUB should appear after debug instructions when the ADJCALLSTACKDOWN originally preceded them. I'll have to dig a bit to figure out a case that justifies this.

@phoebewang
Copy link
Contributor

@phoebewang, @RKSimon and @e-kud : I can't even see that @daniel-zabawa has a registered email address. So unsure if pinging here actually will lead to anything? I guess we shouldn't accept patches from people that aren't following up on problems found when landing patches?

@bjope Do you mean an email address shown in the GitHub? I think it's hidden by default. You can find the email address from git log.

Thanks @phoebewang for reverting.

I actually didn't revert it. I clicked the button, but decided to write an email to you and let you decide. So I just gave up creating a PR. I have reverted it now.

@dyung
Copy link
Collaborator

dyung commented Nov 21, 2024

Just a heads up that one of our internal tests started failing which I bisected back to this commit. I am working on a reproducer and like the case @bjope mentioned, the problem only occurs when the program is compiled with debug information. In our case, the program seems to go into an infinite loop after your change.

@bjope
Copy link
Collaborator

bjope commented Nov 21, 2024

@phoebewang, @RKSimon and @e-kud : I can't even see that @daniel-zabawa has a registered email address. So unsure if pinging here actually will lead to anything? I guess we shouldn't accept patches from people that aren't following up on problems found when landing patches?

@bjope Do you mean an email address shown in the GitHub? I think it's hidden by default. You can find the email address from git log.

Well, I kind of thought that I would see it in the email notification I got myself when I had used @ to cc someone (i.e. that I would see who else got that email). But I guess it is completely hidden then.

And I was probably a bit too harsh when talking about "people that aren't following up on problems found when landing patches". We can't expect everyone to be on high alert on any email notification X days after having submitted something (this wasn't even detected by the buildbots afaik). And Daniel did end up responding here. Just frustrating when you have a problem and don't even know if the submitter has gone fishing for the next couple of weeks (and you can't even see who the notifications are sent to).

@phoebewang
Copy link
Contributor

And I was probably a bit too harsh when talking about "people that aren't following up on problems found when landing patches". We can't expect everyone to be on high alert on any email notification X days after having submitted something (this wasn't even detected by the buildbots afaik). And Daniel did end up responding here. Just frustrating when you have a problem and don't even know if the submitter has gone fishing for the next couple of weeks (and you can't even see who the notifications are sent to).

We should be more generous to new contributors, they will learn it sooner or later. And you don't necessarily wait for author's response before reverting it.

@daniel-zabawa
Copy link
Contributor Author

@dyung please update here when you have a reproducer, even if it's not compact. That sounds like a different symptom than the existing issues.

@dyung
Copy link
Collaborator

dyung commented Nov 25, 2024

@dyung please update here when you have a reproducer, even if it's not compact. That sounds like a different symptom than the existing issues.

Sorry for the delay, I'm having trouble reducing it without completely breaking the test so I'll see if I can clean it up a bit and post it.

@dyung
Copy link
Collaborator

dyung commented Jan 16, 2025

@daniel-zabawa sorry for the delay, but here is a reduced C++ program which demonstrates the issue we were seeing on our internal tests.

extern "C" {
void printf(char *);
void snprintf(char *, long...);
void atexit(void());
}
void a() { printf("{PASS}\n{END}\n\x1a\n"); }
__attribute__((constructor)) void b() { atexit(a); }
void c(char *, int, char *fmt...) {
  __builtin_va_list args;
  __builtin_va_start(args, fmt);
}
char d[1];
long e;
int main() {
  struct f {
    f() {
      struct m {
        m() {
          for (long g = -10; g; ++g)
            for (long h = -10; h != 10; ++h)
              struct i {
                long j;
                i(long initial) {
                  char *k;
                  long l = initial, actual = __sync_lock_test_and_set_8(&l, j);
                  k = "";
                  snprintf(d, sizeof(d),
                           "%%s %%s0 expected %s/%s but got %s/%s", 0, 0, 0, 0,
                           0, "");
                  c("", 0, d, 0, "__sync_lock_test_and_set_8", initial, j,
                    actual, l);
                }
              } i(g);
          struct i {
            long j;
            i(long initial) {
              long actual = __sync_lock_test_and_set_8(&e, j);
              c("", 0, "", j, 0, initial, actual, initial);
            }
          } i(0);
        }
      } m;
    }
  } f;
}

Compile with clang -O2 -g -mavx repro.cpp and run the resulting binary. A compiler built from the commit preceeding yours (a8a1e90) generates a binary that finishes quickly when run. When using a compiler built from your change (6fb7cdf), the resulting binary gets stuck in an infinite loop. Like @bjope's example, debug info generation is required to hit the problem. Hope this helps.

@daniel-zabawa
Copy link
Contributor Author

@dyung - no worries, and I'm able to reproduce the issue. I've confirmed it's the same issue as the other failure: there is a debug instruction following the ADJCALLSTACKDOWN, and when frame lowering is eliminating said ADJCALLSTACKDOWN, the inserted code is placed after the debug instruction, but the iterator it returns is pointing to the debug instruction.

The prolog/epilog code is written such that it tracks the SP adjustment of ADJCALLSTACKDOWN based on the pseudo, and frame-lowering is expected to insert code to affect that adjustment at the same location. When the fixup code is inserted after one or more debug instructions, prolog/epilog will visit it again. This worked before by simply not recognizing that certain instructions are SP adjustments.

As I mentioned before, I have not been able to reproduce whatever issue was fixed by the frame-lowering change to insert frame pseudo replacement code after debug instructions. If anything, it may have been due to some interaction with other frame-lowering code that is trying to parse the frame setup to do some simplification, but I cannot tell what that code is.

Ideally I'd like to avoid the situation where pseudo replacement code is reordered with debug instructions, but without knowing the root issue at the time it was changed, it will require thorough testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants