Skip to content

[KeyInstr] Fully support mixed key/non-key inlining modes #144103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 30, 2025

Conversation

OCHyams
Copy link
Contributor

@OCHyams OCHyams commented Jun 13, 2025

Patch 3/4 adding bitcode support, though the final patch doesn't depend on this one. Ignore the 1st commit in this PR (I'll rebase once the parent commit is merged).

See summary in #144104 which this patch depends on. There's a cost to doing the right thing here, these numbers are for non-key-instructions builds:

https://llvm-compile-time-tracker.com/compare.php?from=bbdb9a446a87de0e3fc45a4016ceec41edb5c51b&to=25537e1b9cd4a50dd1b8575b5ad3ba250b7c0c12&stat=instructions%3Au


stage1-O3  stage1-ReleaseThinLTO  stage1-ReleaseLTO-g  stage1-O0-g  stage1-aarch64-O3  stage1-aarch64-O0-g  stage2-O3  stage2-O0-g  stage2-clang
61255M (+0.00%)  77455M (-0.01%)  89419M (+0.03%)  18868M (+0.16%)  68697M (-0.01%)  23070M (+0.10%)  53401M (-0.00%)  16533M (+0.16%)  34128875M (-0.00%)

@llvmbot
Copy link
Member

llvmbot commented Jun 13, 2025

@llvm/pr-subscribers-debuginfo

Author: Orlando Cazalet-Hyams (OCHyams)

Changes

Patch 3/4 adding bitcode support, though the final patch doesn't depend on this one. Ignore the 1st commit in this PR (I'll rebase once the parent commit is merged).

See <TODO: PR link>. There's a cost to doing the right thing here, these numbers are for non-key-instructions builds:

https://llvm-compile-time-tracker.com/compare.php?from=bbdb9a446a87de0e3fc45a4016ceec41edb5c51b&amp;to=25537e1b9cd4a50dd1b8575b5ad3ba250b7c0c12&amp;stat=instructions%3Au


stage1-O3  stage1-ReleaseThinLTO  stage1-ReleaseLTO-g  stage1-O0-g  stage1-aarch64-O3  stage1-aarch64-O0-g  stage2-O3  stage2-O0-g  stage2-clang
61255M (+0.00%)  77455M (-0.01%)  89419M (+0.03%)  18868M (+0.16%)  68697M (-0.01%)  23070M (+0.10%)  53401M (-0.00%)  16533M (+0.16%)  34128875M (-0.00%)

Full diff: https://github.com/llvm/llvm-project/pull/144103.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp (+18-6)
  • (added) llvm/test/DebugInfo/KeyInstructions/X86/dwarf-inline-modes.mir (+92)
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index 0edfca78b0886..44fa223a35635 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -169,8 +169,10 @@ static cl::opt<DwarfDebug::MinimizeAddrInV5> MinimizeAddrInV5Option(
                           "Stuff")),
     cl::init(DwarfDebug::MinimizeAddrInV5::Default));
 
-static cl::opt<bool> KeyInstructionsAreStmts("dwarf-use-key-instructions",
-                                             cl::Hidden, cl::init(false));
+/// Set to false to ignore Key Instructions metadata.
+static cl::opt<bool> KeyInstructionsAreStmts(
+    "dwarf-use-key-instructions", cl::Hidden, cl::init(true),
+    cl::desc("Set to false to ignore Key Instructions metadata"));
 
 static constexpr unsigned ULEB128PadSize = 4;
 
@@ -2077,8 +2079,16 @@ void DwarfDebug::beginInstruction(const MachineInstr *MI) {
   unsigned LastAsmLine =
       Asm->OutStreamer->getContext().getCurrentDwarfLoc().getLine();
 
+  // Not-Key-Instructions functions inlined into Key Instructions functions
+  // should use default is_stmt handling. Key instructions functions
+  // inlined into not-key-instructions functions should use Key Instructions
+  // is_stmt handling.
+  bool ScopeUsesKeyInstructions =
+      KeyInstructionsAreStmts && DL &&
+      DL->getScope()->getSubprogram()->getKeyInstructionsEnabled();
+
   bool IsKey = false;
-  if (KeyInstructionsAreStmts && DL && DL.getLine())
+  if (ScopeUsesKeyInstructions && DL && DL.getLine())
     IsKey = KeyInstructions.contains(MI);
 
   if (!DL && MI == PrologEndLoc) {
@@ -2158,7 +2168,7 @@ void DwarfDebug::beginInstruction(const MachineInstr *MI) {
     PrologEndLoc = nullptr;
   }
 
-  if (KeyInstructionsAreStmts) {
+  if (ScopeUsesKeyInstructions) {
     if (IsKey)
       Flags |= DWARF2_FLAG_IS_STMT;
   } else {
@@ -2651,10 +2661,12 @@ void DwarfDebug::beginFunctionImpl(const MachineFunction *MF) {
   PrologEndLoc = emitInitialLocDirective(
       *MF, Asm->OutStreamer->getContext().getDwarfCompileUnitID());
 
+  // Run both `findForceIsStmtInstrs` and `computeKeyInstructions` because
+  // not-key-instructions functions may be inlined into key-instructions
+  // functions and vice versa.
   if (KeyInstructionsAreStmts)
     computeKeyInstructions(MF);
-  else
-    findForceIsStmtInstrs(MF);
+  findForceIsStmtInstrs(MF);
 }
 
 unsigned
diff --git a/llvm/test/DebugInfo/KeyInstructions/X86/dwarf-inline-modes.mir b/llvm/test/DebugInfo/KeyInstructions/X86/dwarf-inline-modes.mir
new file mode 100644
index 0000000000000..39fb4126332e5
--- /dev/null
+++ b/llvm/test/DebugInfo/KeyInstructions/X86/dwarf-inline-modes.mir
@@ -0,0 +1,92 @@
+# RUN: llc %s --start-after=livedebugvalues --dwarf-use-key-instructions --filetype=obj -o - \
+# RUN: | llvm-objdump -d - --no-show-raw-insn \
+# RUN: | FileCheck %s --check-prefix=OBJ
+
+# RUN: llc %s --start-after=livedebugvalues --dwarf-use-key-instructions --filetype=obj -o - \
+# RUN: | llvm-dwarfdump - --debug-line \
+# RUN: | FileCheck %s --check-prefix=DBG
+
+
+
+--- |
+  target triple = "x86_64-unknown-linux-gnu"
+
+  define hidden noundef i32 @key() local_unnamed_addr !dbg !5 {
+  entry:
+    ret i32 0
+  }
+
+  define hidden noundef i32 @not_key() local_unnamed_addr !dbg !9 {
+  entry:
+    ret i32 0
+  }
+
+  !llvm.dbg.cu = !{!0}
+  !llvm.module.flags = !{!2, !3}
+  !llvm.ident = !{!4}
+
+  !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_17, file: !1, producer: "clang version 21.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, splitDebugInlining: false, nameTableKind: None)
+  !1 = !DIFile(filename: "test.cpp", directory: "/")
+  !2 = !{i32 7, !"Dwarf Version", i32 5}
+  !3 = !{i32 2, !"Debug Info Version", i32 3}
+  !4 = !{!"clang version 21.0.0"}
+  !5 = distinct !DISubprogram(name: "key", scope: !1, file: !1, line: 1, type: !6, scopeLine: 1, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, keyInstructions: true)
+  !6 = !DISubroutineType(types: !7)
+  !7 = !{}
+  !9 = distinct !DISubprogram(name: "not_key", scope: !1, file: !1, line: 1, type: !6, scopeLine: 1, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, keyInstructions: false)
+  !10 = distinct !DILocation(line: 5, scope: !5)
+  !11 = distinct !DILocation(line: 9, scope: !9)
+...
+---
+name:            key
+alignment:       16
+body:             |
+  bb.0.entry:
+
+    ; OBJ: 0000000000000000 <key>:
+    ; OBJ-NEXT:  0: movl $0x1, %eax
+    ; OBJ-NEXT:  5: movl $0x2, %eax
+    ; OBJ-NEXT:  a: movl $0x3, %eax
+    ; OBJ-NEXT:  f: movl $0x4, %eax
+    ; OBJ-NEXT: 14: movl $0x5, %eax
+    ; OBJ-NEXT: 19: retq
+    ;
+    ; DBG:      Address            Line   Column File   ISA Discriminator OpIndex Flags
+    ; DBG-NEXT: ------------------ ------ ------ ------ --- ------------- ------- -------------
+    ; DBG-NEXT: 0x0000000000000000      1      0      0   0             0       0  is_stmt prologue_end
+    ; DBG-NEXT: 0x0000000000000005      2      0      0   0             0       0  is_stmt
+    ; DBG-NEXT: 0x000000000000000a      2      0      0   0             0       0
+    ; DBG-NEXT: 0x000000000000000f      3      0      0   0             0       0  is_stmt
+    ; DBG-NEXT: 0x0000000000000014      3      0      0   0             0       0
+    ;
+    $eax = MOV32ri 1,  debug-location !DILocation(line: 1, scope: !5)                            ; is_stmt (prologue_end)
+    $eax = MOV32ri 2,  debug-location !DILocation(line: 2, scope: !5, atomGroup: 1, atomRank: 1) ; is_stmt (key)
+    $eax = MOV32ri 3,  debug-location !DILocation(line: 2, scope: !9, inlinedAt: !10)
+    $eax = MOV32ri 4,  debug-location !DILocation(line: 3, scope: !9, inlinedAt: !10)            ; is_stmt (not_key)
+    $eax = MOV32ri 5,  debug-location !DILocation(line: 3, scope: !5, atomGroup: 1, atomRank: 2) ; is_stmt (key)
+    RET64 $eax,        debug-location !DILocation(line: 3, scope: !5, atomGroup: 1, atomRank: 1)
+...
+---
+name:            not_key
+alignment:       16
+body:             |
+  bb.0.entry:
+
+    ; OBJ: 0000000000000020 <not_key>:
+    ; OBJ-NEXT: 20: movl $0x1, %eax
+    ; OBJ-NEXT: 25: movl $0x2, %eax
+    ; OBJ-NEXT: 2a: movl $0x3, %eax
+    ; OBJ-NEXT: 2f: retq
+    ;
+    ;           Address            Line   Column File   ISA Discriminator OpIndex Flags
+    ;           ------------------ ------ ------ ------ --- ------------- ------- -------------
+    ; DBG-NEXT: 0x0000000000000020      1      0      0   0             0       0  is_stmt prologue_end
+    ; DBG-NEXT: 0x0000000000000025      2      0      0   0             0       0
+    ; DBG-NEXT: 0x000000000000002a      3      0      0   0             0       0  is_stmt
+    ; DBG-NEXT: 0x000000000000002f      3      0      0   0             0       0
+    ;
+    $eax = MOV32ri 1,  debug-location !DILocation(line: 1, scope: !9)                                            ; is_stmt (prologue_end)
+    $eax = MOV32ri 2,  debug-location !DILocation(line: 2, scope: !5, inlinedAt: !11, atomGroup: 1, atomRank: 2)
+    $eax = MOV32ri 3,  debug-location !DILocation(line: 3, scope: !5, inlinedAt: !11, atomGroup: 1, atomRank: 1) ; is_stmt (key)
+    RET64 $eax,        debug-location !DILocation(line: 3, scope: !9)
+...

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not feasible to check the optimisation mode here, avoiding the performance hit to O0? While there may be frontend/LTO reasons why key-and-non-key instructions could be mixed, one of the major reasons why KIs is needed is due to instruction-schedulers shuffling instructions around, which doesn't happen at O0. We could reasonably say "Key instruction is off at O0 anyway, we're not honouring it if you managed to convince something to be inlined into an optnone function".

If that's not alright, IMO it's OK to once again let O0 pay the price -- it's increasing from a low base anyway. Plus, the MC-layer optimisations we're getting up shortly gives it a 1.2% speedup.

@OCHyams
Copy link
Contributor Author

OCHyams commented Jun 24, 2025

ty

Is it not feasible to check the optimisation mode here, avoiding the performance hit to O0? While there may be frontend/LTO reasons why key-and-non-key instructions could be mixed, one of the major reasons why KIs is needed is due to instruction-schedulers shuffling instructions around, which doesn't happen at O0. We could reasonably say "Key instruction is off at O0 anyway, we're not honouring it if you managed to convince something to be inlined into an optnone function".

If that's not alright, IMO it's OK to once again let O0 pay the price -- it's increasing from a low base anyway. Plus, the MC-layer optimisations we're getting up shortly gives it a 1.2% speedup.

For posterity - chatting offline - this is referring to an MC layer patch that's already landed - compile-time-tracker-link.

Checking the optimisation level could work, but if the costs are within the realm of acceptable I prefer to avoid the additional complexity and edge case (the edge case that exists before this patch only regresses to something close to existing behaviour, rather than something worse). So, I'd lean towards having O0 paying the price unless anyone feels very strongly that the cost isn't acceptable (and hopefully our improvements to -g performance are considered there).

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM then.

@OCHyams OCHyams merged commit af82e14 into llvm:main Jun 30, 2025
6 of 7 checks passed
@OCHyams OCHyams deleted the ki-sp-flag-3 branch July 1, 2025 12:24
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
Patch 3/4 adding bitcode support, though the final patch doesn't depend on this
one.

Prior to this patch, a Key Instructions function inlined into a
Not-Key-Instructions function fell back to Not-Key-Instructions handling.

In order to fully support inlining mixed modes we need to run
`computeKeyInstructions` (in case there's a Key Instructions scope) and
`findForceIsStmtInstrs` (in case there's a Not-Key-Instructions scope) on all
functions. This has a slight performance cost for all configurations - see PR
for details.
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
Patch 3/4 adding bitcode support, though the final patch doesn't depend on this
one.

Prior to this patch, a Key Instructions function inlined into a
Not-Key-Instructions function fell back to Not-Key-Instructions handling.

In order to fully support inlining mixed modes we need to run
`computeKeyInstructions` (in case there's a Key Instructions scope) and
`findForceIsStmtInstrs` (in case there's a Not-Key-Instructions scope) on all
functions. This has a slight performance cost for all configurations - see PR
for details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants