Skip to content

[KeyInstr] Add bitcode support #144102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

OCHyams
Copy link
Contributor

@OCHyams OCHyams commented Jun 13, 2025

Patch 4/4 adding bitcode support

@OCHyams OCHyams requested review from jmorse and SLTozer June 13, 2025 15:54
@llvmbot
Copy link
Member

llvmbot commented Jun 13, 2025

@llvm/pr-subscribers-debuginfo

Author: Orlando Cazalet-Hyams (OCHyams)

Changes

Patch 4/4 adding bitcode support


Full diff: https://github.com/llvm/llvm-project/pull/144102.diff

5 Files Affected:

  • (modified) llvm/lib/Bitcode/Reader/BitcodeReader.cpp (+5-2)
  • (modified) llvm/lib/Bitcode/Reader/MetadataLoader.cpp (+14-6)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+13-7)
  • (added) llvm/test/DebugInfo/KeyInstructions/Generic/link-two-modes.ll (+67)
  • (added) llvm/test/DebugInfo/KeyInstructions/Generic/roundtrip.ll (+48)
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index fde934fbb3cf1..5f24aac2eb975 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -5082,7 +5082,9 @@ Error BitcodeReader::parseFunctionBody(Function *F) {
 
       unsigned Line = Record[0], Col = Record[1];
       unsigned ScopeID = Record[2], IAID = Record[3];
-      bool isImplicitCode = Record.size() == 5 && Record[4];
+      bool isImplicitCode = Record.size() >= 5 && Record[4];
+      uint64_t AtomGroup = Record.size() == 7 ? Record[5] : 0;
+      uint8_t AtomRank = Record.size() == 7 ? Record[6] : 0;
 
       MDNode *Scope = nullptr, *IA = nullptr;
       if (ScopeID) {
@@ -5097,8 +5099,9 @@ Error BitcodeReader::parseFunctionBody(Function *F) {
         if (!IA)
           return error("Invalid record");
       }
+
       LastLoc = DILocation::get(Scope->getContext(), Line, Col, Scope, IA,
-                                isImplicitCode);
+                                isImplicitCode, AtomGroup, AtomRank);
       I->setDebugLoc(LastLoc);
       I = nullptr;
       continue;
diff --git a/llvm/lib/Bitcode/Reader/MetadataLoader.cpp b/llvm/lib/Bitcode/Reader/MetadataLoader.cpp
index 1cd1797c1092d..538a93b168410 100644
--- a/llvm/lib/Bitcode/Reader/MetadataLoader.cpp
+++ b/llvm/lib/Bitcode/Reader/MetadataLoader.cpp
@@ -1416,7 +1416,8 @@ Error MetadataLoader::MetadataLoaderImpl::parseOneMetadata(
     break;
   }
   case bitc::METADATA_LOCATION: {
-    if (Record.size() != 5 && Record.size() != 6)
+    // 5: inlinedAt, 6: isImplicit, 8: Key Instructions fields.
+    if (Record.size() != 5 && Record.size() != 6 && Record.size() != 8)
       return error("Invalid record");
 
     IsDistinct = Record[0];
@@ -1424,10 +1425,12 @@ Error MetadataLoader::MetadataLoaderImpl::parseOneMetadata(
     unsigned Column = Record[2];
     Metadata *Scope = getMD(Record[3]);
     Metadata *InlinedAt = getMDOrNull(Record[4]);
-    bool ImplicitCode = Record.size() == 6 && Record[5];
+    bool ImplicitCode = Record.size() >= 6 && Record[5];
+    uint64_t AtomGroup = Record.size() == 8 ? Record[6] : 0;
+    uint8_t AtomRank = Record.size() == 8 ? Record[7] : 0;
     MetadataList.assignValue(
         GET_OR_DISTINCT(DILocation, (Context, Line, Column, Scope, InlinedAt,
-                                     ImplicitCode)),
+                                     ImplicitCode, AtomGroup, AtomRank)),
         NextMetadataNo);
     NextMetadataNo++;
     break;
@@ -1857,7 +1860,7 @@ Error MetadataLoader::MetadataLoaderImpl::parseOneMetadata(
     break;
   }
   case bitc::METADATA_SUBPROGRAM: {
-    if (Record.size() < 18 || Record.size() > 21)
+    if (Record.size() < 18 || Record.size() > 22)
       return error("Invalid record");
 
     bool HasSPFlags = Record[0] & 4;
@@ -1909,6 +1912,9 @@ Error MetadataLoader::MetadataLoaderImpl::parseOneMetadata(
     bool HasTargetFuncName = false;
     unsigned OffsetA = 0;
     unsigned OffsetB = 0;
+    // Key instructions won't be enabled in old-format bitcode, so only
+    // check it if HasSPFlags is true.
+    bool UsesKeyInstructions = false;
     if (!HasSPFlags) {
       OffsetA = 2;
       OffsetB = 2;
@@ -1921,7 +1927,9 @@ Error MetadataLoader::MetadataLoaderImpl::parseOneMetadata(
     } else {
       HasAnnotations = Record.size() >= 19;
       HasTargetFuncName = Record.size() >= 20;
+      UsesKeyInstructions = Record.size() >= 21 ? Record[20] : 0;
     }
+
     Metadata *CUorFn = getMDOrNull(Record[12 + OffsetB]);
     DISubprogram *SP = GET_OR_DISTINCT(
         DISubprogram,
@@ -1947,8 +1955,8 @@ Error MetadataLoader::MetadataLoaderImpl::parseOneMetadata(
          HasAnnotations ? getMDOrNull(Record[18 + OffsetB])
                         : nullptr, // annotations
          HasTargetFuncName ? getMDString(Record[19 + OffsetB])
-                           : nullptr // targetFuncName
-         ));
+                           : nullptr, // targetFuncName
+         UsesKeyInstructions));
     MetadataList.assignValue(SP, NextMetadataNo);
     NextMetadataNo++;
 
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index 628b939af19ce..5afcc283c315a 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -1799,12 +1799,14 @@ unsigned ModuleBitcodeWriter::createDILocationAbbrev() {
   // location (it's never more expensive than building an array size 1).
   auto Abbv = std::make_shared<BitCodeAbbrev>();
   Abbv->Add(BitCodeAbbrevOp(bitc::METADATA_LOCATION));
-  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1));
-  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6));
-  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
-  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6));
-  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6));
-  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1));
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // isDistinct
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6));   // line
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));   // column
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6));   // scope
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6));   // inlinedAt
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // isImplicitCode
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6));   // atomGroup
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 3)); // atomRank
   return Stream.EmitAbbrev(std::move(Abbv));
 }
 
@@ -1820,7 +1822,8 @@ void ModuleBitcodeWriter::writeDILocation(const DILocation *N,
   Record.push_back(VE.getMetadataID(N->getScope()));
   Record.push_back(VE.getMetadataOrNullID(N->getInlinedAt()));
   Record.push_back(N->isImplicitCode());
-
+  Record.push_back(N->getAtomGroup());
+  Record.push_back(N->getAtomRank());
   Stream.EmitRecord(bitc::METADATA_LOCATION, Record, Abbrev);
   Record.clear();
 }
@@ -2141,6 +2144,7 @@ void ModuleBitcodeWriter::writeDISubprogram(const DISubprogram *N,
   Record.push_back(VE.getMetadataOrNullID(N->getThrownTypes().get()));
   Record.push_back(VE.getMetadataOrNullID(N->getAnnotations().get()));
   Record.push_back(VE.getMetadataOrNullID(N->getRawTargetFuncName()));
+  Record.push_back(N->getKeyInstructionsEnabled());
 
   Stream.EmitRecord(bitc::METADATA_SUBPROGRAM, Record, Abbrev);
   Record.clear();
@@ -3717,6 +3721,8 @@ void ModuleBitcodeWriter::writeFunction(
           Vals.push_back(VE.getMetadataOrNullID(DL->getScope()));
           Vals.push_back(VE.getMetadataOrNullID(DL->getInlinedAt()));
           Vals.push_back(DL->isImplicitCode());
+          Vals.push_back(DL->getAtomGroup());
+          Vals.push_back(DL->getAtomRank());
           Stream.EmitRecord(bitc::FUNC_CODE_DEBUG_LOC, Vals);
           Vals.clear();
           LastDL = DL;
diff --git a/llvm/test/DebugInfo/KeyInstructions/Generic/link-two-modes.ll b/llvm/test/DebugInfo/KeyInstructions/Generic/link-two-modes.ll
new file mode 100644
index 0000000000000..104d75d8648b6
--- /dev/null
+++ b/llvm/test/DebugInfo/KeyInstructions/Generic/link-two-modes.ll
@@ -0,0 +1,67 @@
+; RUN: split-file %s %t
+; RUN: llvm-as %t/key-instr-enabled.ll -o %t/key-instr-enabled.bc
+; RUN: llvm-as %t/key-instr-disabled.ll -o %t/key-instr-disabled.bc
+; RUN: llvm-link %t/key-instr-enabled.bc %t/key-instr-disabled.bc -o - | llvm-dis | FileCheck %s
+
+;; Check the Key Instructions metadata is preserved correctly when linking a
+;; modules with Key Instructions enabled/disabled.
+
+;; Key Instructions enabled.
+; CHECK: void @f() !dbg [[f:!.*]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT:   ret void, !dbg [[enabled:!.*]]
+; CHECK-NEXT: }
+
+;; Key Instructions disabled.
+; CHECK: void @g() !dbg [[g:!.*]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT:   ret void, !dbg [[disabled:!.*]]
+; CHECK-NEXT: }
+
+; CHECK: !llvm.dbg.cu = !{!0, !2}
+; CHECK-NEXT: !llvm.module.flags = !{!4}
+
+; CHECK: [[file1:!.*]] = !DIFile(filename: "key-instr-enabled.cpp", directory: "/")
+; CHECK: [[file2:!.*]] = !DIFile(filename: "key-instr-disabled.cpp", directory: "/")
+; CHECK: [[f]] = distinct !DISubprogram(name: "f", scope: [[file1]]{{.*}},  keyInstructions: true)
+; CHECK: [[enabled]] = !DILocation(line: 1, column: 11, scope: [[f]], atomGroup: 1, atomRank: 1)
+; CHECK: [[g]] = distinct !DISubprogram(name: "g", scope: [[file2]]
+; CHECK-NOT:                            keyInstructions
+; CHECK-SAME:                           )
+; CHECK: [[disabled]] = !DILocation(line: 1, column: 11, scope: [[g]])
+
+;--- key-instr-enabled.ll
+define dso_local void @f() !dbg !10 {
+entry:
+  ret void, !dbg !13
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!3}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang version 21.0.0git", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "key-instr-enabled.cpp", directory: "/")
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!9 = !{!"clang version 21.0.0git"}
+!10 = distinct !DISubprogram(name: "f", scope: !1, file: !1, line: 1, type: !11, scopeLine: 1, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, keyInstructions: true)
+!11 = !DISubroutineType(types: !12)
+!12 = !{null}
+!13 = !DILocation(line: 1, column: 11, scope: !10, atomGroup: 1, atomRank: 1)
+
+;--- key-instr-disabled.ll
+define dso_local void @g() !dbg !10 {
+entry:
+  ret void, !dbg !13
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!3}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang version 21.0.0git", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "key-instr-disabled.cpp", directory: "/")
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!9 = !{!"clang version 21.0.0git"}
+!10 = distinct !DISubprogram(name: "g", scope: !1, file: !1, line: 1, type: !11, scopeLine: 1, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, keyInstructions: false)
+!11 = !DISubroutineType(types: !12)
+!12 = !{null}
+!13 = !DILocation(line: 1, column: 11, scope: !10)
diff --git a/llvm/test/DebugInfo/KeyInstructions/Generic/roundtrip.ll b/llvm/test/DebugInfo/KeyInstructions/Generic/roundtrip.ll
new file mode 100644
index 0000000000000..79a06e51b264f
--- /dev/null
+++ b/llvm/test/DebugInfo/KeyInstructions/Generic/roundtrip.ll
@@ -0,0 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt %s -o - -S  | llvm-as - | llvm-dis - | FileCheck %s
+
+; Key Instructions enabled.
+define dso_local void @f() !dbg !10 {
+; CHECK-LABEL: define dso_local void @f(
+; CHECK-SAME: ) !dbg [[DBG3:![0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    ret void, !dbg [[DBG6:![0-9]+]]
+;
+entry:
+  ret void, !dbg !13
+}
+
+; Key Instructions disabled.
+define dso_local void @g() !dbg !14 {
+; CHECK-LABEL: define dso_local void @g(
+; CHECK-SAME: ) !dbg [[DBG7:![0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    ret void, !dbg [[DBG8:![0-9]+]]
+;
+entry:
+  ret void, !dbg !15
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!3}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang version 21.0.0git", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "key-instr.cpp", directory: "/")
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!9 = !{!"clang version 21.0.0git"}
+!10 = distinct !DISubprogram(name: "f", scope: !1, file: !1, line: 1, type: !11, scopeLine: 1, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, keyInstructions: true)
+!11 = !DISubroutineType(types: !12)
+!12 = !{null}
+!13 = !DILocation(line: 1, scope: !10, atomGroup: 1, atomRank: 1)
+!14 = distinct !DISubprogram(name: "g", scope: !1, file: !1, line: 2, type: !11, scopeLine: 2, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, keyInstructions: false)
+!15 = !DILocation(line: 2, scope: !14)
+;.
+; CHECK: [[META0:![0-9]+]] = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: [[META1:![0-9]+]], producer: "{{.*}}clang version {{.*}}", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+; CHECK: [[META1]] = !DIFile(filename: "{{.*}}key-instr.cpp", directory: {{.*}})
+; CHECK: [[DBG3]] = distinct !DISubprogram(name: "f", scope: [[META1]], file: [[META1]], line: 1, type: [[META4:![0-9]+]], scopeLine: 1, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: [[META0]], keyInstructions: true)
+; CHECK: [[META4]] = !DISubroutineType(types: [[META5:![0-9]+]])
+; CHECK: [[META5]] = !{null}
+; CHECK: [[DBG6]] = !DILocation(line: 1, scope: [[DBG3]], atomGroup: 1, atomRank: 1)
+; CHECK: [[DBG7]] = distinct !DISubprogram(name: "g", scope: [[META1]], file: [[META1]], line: 2, type: [[META4]], scopeLine: 2, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: [[META0]])
+; CHECK: [[DBG8]] = !DILocation(line: 2, scope: [[DBG7]])
+;.

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good; do you have any size comparisons for bitcode files as a result of this? All the DILocations are bigger (regardless of KI mode), but on the other hand the debug-loc-again optimisation will avoid paying for that for every instruction.

@@ -0,0 +1,48 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt %s -o - -S | llvm-as - | llvm-dis - | FileCheck %s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial opt run un-necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't remember if there was a reason I did that... I will update this when I come to rebase it (depending on the outcome of the abbrev changes in PR 146497)

@OCHyams
Copy link
Contributor Author

OCHyams commented Jun 24, 2025

Looks good; do you have any size comparisons for bitcode files as a result of this? All the DILocations are bigger (regardless of KI mode), but on the other hand the debug-loc-again optimisation will avoid paying for that for every instruction.

Build with support enabled and the feature on by default: compile time tracker link.

but on the other hand the debug-loc-again optimisation will avoid paying for that for every instruction.

Yes and no. It's less likely that you find identical debug-locs, considering both group and rank. I considered whether it might be reasonable to change the debug-loc-again to ignore KeyInstructions info in the comparison and encode it. I didn't end up pursuing the idea, but I can look into it if you feel it's needed.

@jmorse
Copy link
Member

jmorse commented Jun 24, 2025

Hmmm -- that's a noteworthy cost (in some scenarios up to 5%) on bitcode size (with -g). While we can rely on the position of "this is delivering a superior debugging experience" because there's strictly more information in the object-file / bitcode, it's worth going the extra mile to see if we can cut the file size increase down a bit.

IIRC we talked in the past about the numeric distribution of atom-groups and how they're keyed on {Scope,InlinedAt}, thus each independent function can have its own number-sequence, but IIRC we also thought about making them globally unique-ish. Or something. The point being that with VBR-encoded atom-groups we pay a cost for high-numbers that perhaps we could avoid by being smarter?

@OCHyams
Copy link
Contributor Author

OCHyams commented Jun 24, 2025

Ok, I'll do some investigating

OCHyams added a commit to OCHyams/llvm-project that referenced this pull request Jul 1, 2025
DILocations that are not attached to instructions are encoded using
METADATA_LOCATION records which have an abbrev. DILocations attached to
instructions are interleaved with instruction records as FUNC_CODE_DEBUG_LOC
records, which do not have an abbrev (and FUNC_CODE_DEBUG_LOC_AGAIN which have
no operands).

Add a new FUNCTION_BLOCK abbrev FUNCTION_DEBUG_LOC_ABBREV for
FUNC_CODE_DEBUG_LOC records.

This reduces the bc file size by up to 7% in CTMark, with many between 2-4%
smaller.

[per-file file size
compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=75cf826849713c00829cdf657e330e24c1a2fd03&to=1e268ebd0a581016660d9d7e942495c1be041f7d&stat=size-file&details=on)
(go to stage1-ReleaseLTO-g).

This optimisation is motivated by llvm#144102, which adds the new Key Instructions
fields to bitcode records. The combined patches still overall look to be a
slight improvement over the base.
@OCHyams
Copy link
Contributor Author

OCHyams commented Jul 1, 2025

Hmmm -- that's a noteworthy cost (in some scenarios up to 5%) on bitcode size (with -g). While we can rely on the position of "this is delivering a superior debugging experience" because there's strictly more information in the object-file / bitcode, it's worth going the extra mile to see if we can cut the file size increase down a bit.

IIRC we talked in the past about the numeric distribution of atom-groups and how they're keyed on {Scope,InlinedAt}, thus each independent function can have its own number-sequence, but IIRC we also thought about making them globally unique-ish. Or something. The point being that with VBR-encoded atom-groups we pay a cost for high-numbers that perhaps we could avoid by being smarter?

Doing anything "smart" looks tricky, but I found an easy win in #146497 which seems to cover the cost of the extra fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants