[DebugInfo][DWARF] Emit Per-Function Line Table Offsets and End Sequences #110192

Merged on Nov 14, 2024 (4 commits)

alx32
Contributor

@alx32 alx32 commented Sep 27, 2024

Summary

This patch adds a hidden LLVM option, -emit-func-debug-line-table-offsets (passed through Clang as -mllvm -emit-func-debug-line-table-offsets), that emits per-function line table offsets and end sequences in DWARF debug information. With it, tools and debuggers can attribute line number information to the correct function even when several functions are merged into the same address range by optimizations such as Identical Code Folding (ICF) in the linker.

Background
RFC: New DWARF Attribute for Symbolication of Merged Functions

Previous similar PR: #93137. It was very similar to this one, but at the time the assembler could not emit labels within the line table. That support was added in PR #99710, and this PR builds on part of it.

Currently, Clang emits line information in the .debug_line section without linking line entries to the DW_TAG_subprogram DIEs they originate from. When post-compilation optimizations merge functions, their address ranges overlap and the line information becomes ambiguous.

For example, when functions are merged by ICF in LLD, multiple functions may end up sharing the same address range. Without explicit linkage between functions and their line entries, tools cannot accurately attribute line information to the correct function, adversely affecting debugging and call stack resolution.
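
To make the ambiguity concrete, here is a minimal sketch of an address-only lookup over a toy line table. LineRow, rowsAtAddress, tableAfterICF, the function names foo and bar, and all addresses and line numbers are made up for illustration; none of them are LLVM or LLD APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy model of one decoded line-table row; not an LLVM type.
struct LineRow {
  std::uint64_t Address; // code address the row describes
  unsigned Line;         // source line
  std::string Function;  // ground truth that a real line table does not record
};

// Address-only lookup: collect every row whose address equals Address.
// Without a link from a subprogram DIE to its rows, this is all a consumer has.
std::vector<LineRow> rowsAtAddress(const std::vector<LineRow> &Table,
                                   std::uint64_t Address) {
  std::vector<LineRow> Matches;
  for (const LineRow &Row : Table)
    if (Row.Address == Address)
      Matches.push_back(Row);
  return Matches;
}

// Hypothetical table after folding: bar was identical to foo, so the linker
// kept one copy of the code and both sets of rows now describe 0x1000.
std::vector<LineRow> tableAfterICF() {
  return {
      LineRow{0x1000, 2, "foo"},
      LineRow{0x1000, 12, "bar"},
  };
}

// Usage: one address yields rows from two different functions.
void demoAmbiguity() {
  std::vector<LineRow> Matches = rowsAtAddress(tableAfterICF(), 0x1000);
  assert(Matches.size() == 2 && "one address, two candidate functions");
}
```

The lookup has no basis for preferring line 2 over line 12, which is the gap the per-function offset is meant to close.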

Implementation Details
To address the above issue, the patch makes the following key changes:

DW_AT_LLVM_stmt_sequence Attribute: Adds a new LLVM-specific attribute, DW_AT_LLVM_stmt_sequence (0x3e0b), to each DW_TAG_subprogram DIE that has line entries. Its value is a section offset (DW_FORM_sec_offset) into the line table, pointing at the start of the function's line entries.

End-of-Sequence Markers: Emits an explicit DW_LNE_end_sequence after each function's line entries in the line table. This marks the end of the line information for that function, ensuring that line entries are correctly delimited.

Assembler and Streamer Modifications: Modifies the MCStreamer and related classes to support emitting the necessary labels and tracking the current function's line entries. A new flag GenerateFuncLineTableOffsets is added to control this behavior.

Compiler Option: Introduces the -mllvm -emit-func-debug-line-table-offsets option to enable this functionality. It is off by default, so users opt in as needed.
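
As a rough illustration of how a consumer might use the changes listed above, the sketch below restricts the address lookup to the one sequence named by a function's DW_AT_LLVM_stmt_sequence value. This is a toy model under stated assumptions, not the patch's code or any LLVM API: LineTable, lineFor, and foldedTable are hypothetical, the offsets and addresses are made up, and rows are matched by exact address rather than by range.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy model of one decoded line-table row; not an LLVM type.
struct LineRow {
  std::uint64_t Address;
  unsigned Line;
};

// All rows of one sequence, i.e. everything up to a DW_LNE_end_sequence.
using Sequence = std::vector<LineRow>;

// Toy stand-in for .debug_line: sequences keyed by their section offset.
using LineTable = std::map<std::uint64_t, Sequence>;

// Search only the sequence that starts at StmtSequenceOffset (the value a
// subprogram DIE would carry). Returns 0, the DWARF convention for "no source
// line", when the sequence or the address is not found.
unsigned lineFor(const LineTable &Table, std::uint64_t StmtSequenceOffset,
                 std::uint64_t Address) {
  auto It = Table.find(StmtSequenceOffset);
  if (It == Table.end())
    return 0;
  for (const LineRow &Row : It->second)
    if (Row.Address == Address)
      return Row.Line;
  return 0;
}

// Hypothetical folded program: two functions share address 0x1000, but each
// has its own sequence because an end_sequence follows every function.
LineTable foldedTable() {
  LineTable Table;
  Table[0x28] = Sequence{LineRow{0x1000, 2}, LineRow{0x1004, 3}};
  Table[0x33] = Sequence{LineRow{0x1000, 12}, LineRow{0x1004, 13}};
  return Table;
}

// Usage: the same address resolves to a different line per function.
void demoResolution() {
  LineTable Table = foldedTable();
  assert(lineFor(Table, 0x28, 0x1000) == 2);
  assert(lineFor(Table, 0x33, 0x1000) == 12);
}
```

The per-function end sequence is what makes this work: without it, rows of consecutive functions would sit in one sequence and an offset alone would not delimit where a function's rows stop.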

@alx32 alx32 requested review from dwblaikie and pogo59 September 27, 2024 00:43
@alx32 alx32 marked this pull request as ready for review September 27, 2024 00:44
@llvmbot
Member

llvmbot commented Sep 27, 2024

@llvm/pr-subscribers-mc

Author: None (alx32)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/110192.diff

7 Files Affected:

  • (modified) llvm/include/llvm/BinaryFormat/Dwarf.def (+1)
  • (modified) llvm/include/llvm/MC/MCDwarf.h (+4-1)
  • (modified) llvm/include/llvm/MC/MCStreamer.h (+27)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp (+8)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp (+29-1)
  • (modified) llvm/lib/MC/MCDwarf.cpp (+18-4)
  • (added) llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll (+82)
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.def b/llvm/include/llvm/BinaryFormat/Dwarf.def
index d55947fc5103ac..b1fa81a2fc6abd 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.def
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.def
@@ -617,6 +617,7 @@ HANDLE_DW_AT(0x3e07, LLVM_apinotes, 0, APPLE)
 HANDLE_DW_AT(0x3e08, LLVM_ptrauth_isa_pointer, 0, LLVM)
 HANDLE_DW_AT(0x3e09, LLVM_ptrauth_authenticates_null_values, 0, LLVM)
 HANDLE_DW_AT(0x3e0a, LLVM_ptrauth_authentication_mode, 0, LLVM)
+HANDLE_DW_AT(0x3e0b, LLVM_stmt_sequence, 0, LLVM)
 
 // Apple extensions.
 
diff --git a/llvm/include/llvm/MC/MCDwarf.h b/llvm/include/llvm/MC/MCDwarf.h
index bea79545d1ab96..e7e1bef1ad2d72 100644
--- a/llvm/include/llvm/MC/MCDwarf.h
+++ b/llvm/include/llvm/MC/MCDwarf.h
@@ -123,6 +123,9 @@ class MCDwarfLoc {
   friend class MCContext;
   friend class MCDwarfLineEntry;
 
+  // DwarfDebug::endFunctionImpl needs to construct MCDwarfLoc(IsEndOfFunction)
+  friend class DwarfDebug;
+
   MCDwarfLoc(unsigned fileNum, unsigned line, unsigned column, unsigned flags,
              unsigned isa, unsigned discriminator)
       : FileNum(fileNum), Line(line), Column(column), Flags(flags), Isa(isa),
@@ -239,7 +242,7 @@ class MCLineSection {
 
   // Add an end entry by cloning the last entry, if exists, for the section
   // the given EndLabel belongs to. The label is replaced by the given EndLabel.
-  void addEndEntry(MCSymbol *EndLabel);
+  void addEndEntry(MCSymbol *EndLabel, bool generatingFuncLineTableOffsets);
 
   using MCDwarfLineEntryCollection = std::vector<MCDwarfLineEntry>;
   using iterator = MCDwarfLineEntryCollection::iterator;
diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index 707aecc5dc578e..d6d5970917401d 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -251,6 +251,15 @@ class MCStreamer {
   /// discussion for future inclusion.
   bool AllowAutoPadding = false;
 
+  // Flag specifying whether functions will have an offset into the line table
+  // where the line data for that function starts
+  bool GenerateFuncLineTableOffsets = false;
+
+  // Symbol that tracks the stream symbol for first line of the current function
+  // being generated. This symbol can be used to reference where the line
+  // entries for the function start in the generated line table.
+  MCSymbol *CurrentFuncFirstLineStreamSym;
+
 protected:
   MCFragment *CurFrag = nullptr;
 
@@ -313,6 +322,24 @@ class MCStreamer {
   void setAllowAutoPadding(bool v) { AllowAutoPadding = v; }
   bool getAllowAutoPadding() const { return AllowAutoPadding; }
 
+  void setGenerateFuncLineTableOffsets(bool v) {
+    GenerateFuncLineTableOffsets = v;
+  }
+  bool getGenerateFuncLineTableOffsets() const {
+    return GenerateFuncLineTableOffsets;
+  }
+
+  // Use the below functions to track the symbol that points to the current
+  // function's line info in the output stream.
+  void beginFunction() { CurrentFuncFirstLineStreamSym = nullptr; }
+  void emittedLineStreamSym(MCSymbol *StreamSym) {
+    if (!CurrentFuncFirstLineStreamSym)
+      CurrentFuncFirstLineStreamSym = StreamSym;
+  }
+  MCSymbol *getCurrentFuncFirstLineStreamSym() {
+    return CurrentFuncFirstLineStreamSym;
+  }
+
   /// When emitting an object file, create and emit a real label. When emitting
   /// textual assembly, this should do nothing to avoid polluting our output.
   virtual MCSymbol *emitCFILabel();
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
index 0a1ff189bedbc4..c62075cf77c45a 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
@@ -527,6 +527,14 @@ DIE &DwarfCompileUnit::updateSubprogramScopeDIE(const DISubprogram *SP) {
           *DD->getCurrentFunction()))
     addFlag(*SPDie, dwarf::DW_AT_APPLE_omit_frame_ptr);
 
+  if (Asm->OutStreamer->getGenerateFuncLineTableOffsets() &&
+      Asm->OutStreamer->getCurrentFuncFirstLineStreamSym()) {
+    addSectionLabel(
+        *SPDie, dwarf::DW_AT_LLVM_stmt_sequence,
+        Asm->OutStreamer->getCurrentFuncFirstLineStreamSym(),
+        Asm->getObjFileLowering().getDwarfLineSection()->getBeginSymbol());
+  }
+
   // Only include DW_AT_frame_base in full debug info
   if (!includeMinimalInlineScopes()) {
     const TargetFrameLowering *TFI = Asm->MF->getSubtarget().getFrameLowering();
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index e9649f9ff81658..bd6d5e0ea7a363 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -170,6 +170,12 @@ static cl::opt<DwarfDebug::MinimizeAddrInV5> MinimizeAddrInV5Option(
                           "Stuff")),
     cl::init(DwarfDebug::MinimizeAddrInV5::Default));
 
+static cl::opt<bool> EmitFuncLineTableOffsetsOption(
+    "emit-func-debug-line-table-offsets", cl::Hidden,
+    cl::desc("Include line table offset in function's debug info and emit end "
+             "sequence after each function's line data."),
+    cl::init(false));
+
 static constexpr unsigned ULEB128PadSize = 4;
 
 void DebugLocDwarfExpression::emitOp(uint8_t Op, const char *Comment) {
@@ -443,6 +449,8 @@ DwarfDebug::DwarfDebug(AsmPrinter *A)
   Asm->OutStreamer->getContext().setDwarfVersion(DwarfVersion);
   Asm->OutStreamer->getContext().setDwarfFormat(Dwarf64 ? dwarf::DWARF64
                                                         : dwarf::DWARF32);
+  Asm->OutStreamer->setGenerateFuncLineTableOffsets(
+      EmitFuncLineTableOffsetsOption);
 }
 
 // Define out of line so we don't have to include DwarfUnit.h in DwarfDebug.h.
@@ -2221,6 +2229,10 @@ void DwarfDebug::beginFunctionImpl(const MachineFunction *MF) {
   if (SP->getUnit()->getEmissionKind() == DICompileUnit::NoDebug)
     return;
 
+  // Notify the streamer that we are beginning a function - this will reset the
+  // label pointing to the currently generated function's first line entry
+  Asm->OutStreamer->beginFunction();
+
   DwarfCompileUnit &CU = getOrCreateDwarfCompileUnit(SP->getUnit());
 
   Asm->OutStreamer->getContext().setDwarfCompileUnitID(
@@ -2249,7 +2261,8 @@ void DwarfDebug::terminateLineTable(const DwarfCompileUnit *CU) {
       getDwarfCompileUnitIDForLineTable(*CU));
   // Add the last range label for the given CU.
   LineTable.getMCLineSections().addEndEntry(
-      const_cast<MCSymbol *>(CURanges.back().End));
+      const_cast<MCSymbol *>(CURanges.back().End),
+      EmitFuncLineTableOffsetsOption);
 }
 
 void DwarfDebug::skippedNonDebugFunction() {
@@ -2342,6 +2355,21 @@ void DwarfDebug::endFunctionImpl(const MachineFunction *MF) {
   // Construct call site entries.
   constructCallSiteEntryDIEs(*SP, TheCU, ScopeDIE, *MF);
 
+  // If we're emitting line table offsets, we also need to emit an end label
+  // after all function's line entries
+  if (EmitFuncLineTableOffsetsOption) {
+    MCSymbol *LineSym = Asm->OutStreamer->getContext().createTempSymbol();
+    Asm->OutStreamer->emitLabel(LineSym);
+    MCDwarfLoc DwarfLoc(
+        1, 1, 0, DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0, 0, 0);
+    MCDwarfLineEntry LineEntry(LineSym, DwarfLoc);
+    Asm->OutStreamer->getContext()
+        .getMCDwarfLineTable(
+            Asm->OutStreamer->getContext().getDwarfCompileUnitID())
+        .getMCLineSections()
+        .addLineEntry(LineEntry, Asm->OutStreamer->getCurrentSectionOnly());
+  }
+
   // Clear debug info
   // Ownership of DbgVariables is a bit subtle - ScopeVariables owns all the
   // DbgVariables except those that are also in AbstractVariables (since they
diff --git a/llvm/lib/MC/MCDwarf.cpp b/llvm/lib/MC/MCDwarf.cpp
index 8ff097f29aebd1..34a9541bbbcc3a 100644
--- a/llvm/lib/MC/MCDwarf.cpp
+++ b/llvm/lib/MC/MCDwarf.cpp
@@ -104,8 +104,17 @@ void MCDwarfLineEntry::make(MCStreamer *MCOS, MCSection *Section) {
   // Get the current .loc info saved in the context.
   const MCDwarfLoc &DwarfLoc = MCOS->getContext().getCurrentDwarfLoc();
 
+  MCSymbol *LineStreamLabel = nullptr;
+  // If functions need offsets into the generated line table, then we need to
+  // create a label referencing where the line was generated in the output
+  // stream
+  if (MCOS->getGenerateFuncLineTableOffsets()) {
+    LineStreamLabel = MCOS->getContext().createTempSymbol();
+    MCOS->emittedLineStreamSym(LineStreamLabel);
+  }
+
   // Create a (local) line entry with the symbol and the current .loc info.
-  MCDwarfLineEntry LineEntry(LineSym, DwarfLoc);
+  MCDwarfLineEntry LineEntry(LineSym, DwarfLoc, LineStreamLabel);
 
   // clear DwarfLocSeen saying the current .loc info is now used.
   MCOS->getContext().clearDwarfLocSeen();
@@ -145,7 +154,8 @@ makeStartPlusIntExpr(MCContext &Ctx, const MCSymbol &Start, int IntVal) {
   return Res;
 }
 
-void MCLineSection::addEndEntry(MCSymbol *EndLabel) {
+void MCLineSection::addEndEntry(MCSymbol *EndLabel,
+                                bool generatingFuncLineTableOffsets) {
   auto *Sec = &EndLabel->getSection();
   // The line table may be empty, which we should skip adding an end entry.
   // There are two cases:
@@ -158,8 +168,12 @@ void MCLineSection::addEndEntry(MCSymbol *EndLabel) {
   if (I != MCLineDivisions.end()) {
     auto &Entries = I->second;
     auto EndEntry = Entries.back();
-    EndEntry.setEndLabel(EndLabel);
-    Entries.push_back(EndEntry);
+    // If generatingFuncLineTableOffsets is set, then we already generated an
+    // end label at the end of the last function, so skip generating another one
+    if (!generatingFuncLineTableOffsets) {
+      EndEntry.setEndLabel(EndLabel);
+      Entries.push_back(EndEntry);
+    }
   }
 }
 
diff --git a/llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll b/llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll
new file mode 100644
index 00000000000000..ef8b0c817cfb67
--- /dev/null
+++ b/llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll
@@ -0,0 +1,82 @@
+; RUN: llc -mtriple=i686-w64-mingw32 -o %t -filetype=obj %s
+; RUN: llvm-dwarfdump -v -all %t | FileCheck %s -check-prefix=NO_STMT_SEQ
+
+; RUN: llc -mtriple=i686-w64-mingw32 -o %t -filetype=obj %s -emit-func-debug-line-table-offsets
+; RUN: llvm-dwarfdump -v -all %t | FileCheck %s -check-prefix=STMT_SEQ
+
+; NO_STMT_SEQ-NOT:      DW_AT_LLVM_stmt_sequence
+
+; STMT_SEQ:   [[[ABBREV_CODE:[0-9]+]]] DW_TAG_subprogram
+; STMT_SEQ:  	       DW_AT_LLVM_stmt_sequence    DW_FORM_sec_offset
+; STMT_SEQ:   DW_TAG_subprogram [[[ABBREV_CODE]]]
+; STMT_SEQ:       DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset]	(0x00000028)
+; STMT_SEQ:   DW_AT_name {{.*}}func01
+; STMT_SEQ:   DW_TAG_subprogram [[[ABBREV_CODE]]]
+; STMT_SEQ:       DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset]	(0x00000033)
+; STMT_SEQ:   DW_AT_name {{.*}}main
+
+;; Check that the line table starts at 0x00000028 (first function)
+; STMT_SEQ:            Address            Line   Column File   ISA Discriminator OpIndex Flags
+; STMT_SEQ-NEXT:       ------------------ ------ ------ ------ --- ------------- ------- -------------
+; STMT_SEQ-NEXT:  0x00000028: 00 DW_LNE_set_address (0x00000006)
+
+;; Check that we have an 'end_sequence' just before the next function (0x00000033)
+; STMT_SEQ:            0x0000000000000006      1      0      1   0             0       0  is_stmt end_sequence
+; STMT_SEQ-NEXT: 0x00000033: 00 DW_LNE_set_address (0x00000027)
+
+;; Check that the end of the line table still has an 'end_sequence'
+; STMT_SEQ       0x00000049: 00 DW_LNE_end_sequence
+; STMT_SEQ-NEXT        0x0000000000000027      6      3      1   0             0       0  end_sequence
+
+
+; generated from:
+; clang -g -S -emit-llvm test.c -o test.ll
+; ======= test.c ======
+; int func01() {
+;   return 1;
+; }
+; int main() {
+;   return 0;
+; }
+; =====================
+
+
+; ModuleID = 'test.c'
+source_filename = "test.c"
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+target triple = "arm64-apple-macosx14.0.0"
+
+; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
+define i32 @func01() #0 !dbg !9 {
+  ret i32 1, !dbg !13
+}
+
+; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
+define i32 @main() #0 !dbg !14 {
+  %1 = alloca i32, align 4
+  store i32 0, ptr %1, align 4
+  ret i32 0, !dbg !15
+}
+
+attributes #0 = { noinline nounwind optnone ssp uwtable(sync) "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+crc,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+sha3,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8a,+zcm,+zcz" }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6, !7}
+!llvm.ident = !{!8}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "Homebrew clang version 17.0.6", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: Apple, sysroot: "/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk", sdk: "MacOSX14.sdk")
+!1 = !DIFile(filename: "test.c", directory: "/tmp/clang_test")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 8, !"PIC Level", i32 2}
+!6 = !{i32 7, !"uwtable", i32 1}
+!7 = !{i32 7, !"frame-pointer", i32 1}
+!8 = !{!"Homebrew clang version 17.0.6"}
+!9 = distinct !DISubprogram(name: "func01", scope: !1, file: !1, line: 1, type: !10, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !0)
+!10 = !DISubroutineType(types: !11)
+!11 = !{!12}
+!12 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!13 = !DILocation(line: 2, column: 3, scope: !9)
+!14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 5, type: !10, scopeLine: 5, spFlags: DISPFlagDefinition, unit: !0)
+!15 = !DILocation(line: 6, column: 3, scope: !14)
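
The three MCStreamer helpers in the diff above (beginFunction, emittedLineStreamSym, getCurrentFuncFirstLineStreamSym) amount to a "first symbol wins" tracker that is reset per function. A minimal standalone sketch of that behavior follows; Symbol and FirstLineTracker are hypothetical stand-ins rather than LLVM classes, and unlike the member in the diff, the sketch initializes its pointer to nullptr so that reading it before the first beginFunction() call is well-defined.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for MCSymbol; only identity matters here.
struct Symbol {
  std::string Name;
};

// Toy tracker with the same three operations as the MCStreamer additions.
class FirstLineTracker {
  const Symbol *First = nullptr; // first line symbol of the current function

public:
  // Called when a new function starts: forget the previous function's symbol.
  void beginFunction() { First = nullptr; }

  // Called for every emitted line entry: only the first one is remembered.
  void emittedLineStreamSym(const Symbol *Sym) {
    if (!First)
      First = Sym;
  }

  // Symbol marking where the current function's line entries begin, or
  // nullptr if no line entry has been emitted for it yet.
  const Symbol *getCurrentFuncFirstLineStreamSym() const { return First; }
};

// Usage: later line entries in the same function do not move the symbol.
void demoTracker() {
  Symbol A{"Ltmp0"}, B{"Ltmp1"};
  FirstLineTracker Tracker;
  Tracker.beginFunction();
  Tracker.emittedLineStreamSym(&A);
  Tracker.emittedLineStreamSym(&B);
  assert(Tracker.getCurrentFuncFirstLineStreamSym() == &A);
}
```

A function that emits no line entries leaves the tracker at nullptr, which matches the null check guarding the DW_AT_LLVM_stmt_sequence emission in DwarfCompileUnit.cpp.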

@llvmbot
Member

llvmbot commented Sep 27, 2024

@llvm/pr-subscribers-debuginfo

Author: None (alx32)

+;; Check that the line table starts at 0x00000028 (first function)
+; STMT_SEQ:            Address            Line   Column File   ISA Discriminator OpIndex Flags
+; STMT_SEQ-NEXT:       ------------------ ------ ------ ------ --- ------------- ------- -------------
+; STMT_SEQ-NEXT:  0x00000028: 00 DW_LNE_set_address (0x00000006)
+
+;; Check that we have an 'end_sequence' just before the next function (0x00000033)
+; STMT_SEQ:            0x0000000000000006      1      0      1   0             0       0  is_stmt end_sequence
+; STMT_SEQ-NEXT: 0x00000033: 00 DW_LNE_set_address (0x00000027)
+
+;; Check that the end of the line table still has an 'end_sequence'
+; STMT_SEQ       0x00000049: 00 DW_LNE_end_sequence
+; STMT_SEQ-NEXT        0x0000000000000027      6      3      1   0             0       0  end_sequence
+
+
+; generated from:
+; clang -g -S -emit-llvm test.c -o test.ll
+; ======= test.c ======
+; int func01() {
+;   return 1;
+; }
+; int main() {
+;   return 0;
+; }
+; =====================
+
+
+; ModuleID = 'test.c'
+source_filename = "test.c"
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+target triple = "arm64-apple-macosx14.0.0"
+
+; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
+define i32 @func01() #0 !dbg !9 {
+  ret i32 1, !dbg !13
+}
+
+; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
+define i32 @main() #0 !dbg !14 {
+  %1 = alloca i32, align 4
+  store i32 0, ptr %1, align 4
+  ret i32 0, !dbg !15
+}
+
+attributes #0 = { noinline nounwind optnone ssp uwtable(sync) "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+crc,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+sha3,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8a,+zcm,+zcz" }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6, !7}
+!llvm.ident = !{!8}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "Homebrew clang version 17.0.6", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: Apple, sysroot: "/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk", sdk: "MacOSX14.sdk")
+!1 = !DIFile(filename: "test.c", directory: "/tmp/clang_test")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 8, !"PIC Level", i32 2}
+!6 = !{i32 7, !"uwtable", i32 1}
+!7 = !{i32 7, !"frame-pointer", i32 1}
+!8 = !{!"Homebrew clang version 17.0.6"}
+!9 = distinct !DISubprogram(name: "func01", scope: !1, file: !1, line: 1, type: !10, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !0)
+!10 = !DISubroutineType(types: !11)
+!11 = !{!12}
+!12 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!13 = !DILocation(line: 2, column: 3, scope: !9)
+!14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 5, type: !10, scopeLine: 5, spFlags: DISPFlagDefinition, unit: !0)
+!15 = !DILocation(line: 6, column: 3, scope: !14)

@llvmbot
Member

llvmbot commented Sep 27, 2024

@llvm/pr-subscribers-llvm-binary-utilities

Author: None (alx32)

Changes

Summary

This patch introduces a new compiler option -mllvm -emit-func-debug-line-table-offsets that enables the emission of per-function line table offsets and end sequences in DWARF debug information. This enhancement allows tools and debuggers to accurately attribute line number information to their corresponding functions, even in scenarios where functions are merged or share the same address space due to optimizations like Identical Code Folding (ICF) in the linker.

Background
RFC: New DWARF Attribute for Symbolication of Merged Functions

Previous similar PR: #93137 – This PR was very similar to the current one but at the time, the assembler had no support for emitting labels within the line table. That support was added in PR #99710 - and in this PR we use some of the support added in the assembler PR.

In the current implementation, Clang generates line information in the debug_line section without directly associating line entries with their originating DW_TAG_subprogram DIEs. This can lead to issues when post-compilation optimizations merge functions, resulting in overlapping address ranges and ambiguous line information.

For example, when functions are merged by ICF in LLD, multiple functions may end up sharing the same address range. Without explicit linkage between functions and their line entries, tools cannot accurately attribute line information to the correct function, adversely affecting debugging and call stack resolution.

Implementation Details
To address the above issue, the patch makes the following key changes:

DW_AT_LLVM_stmt_sequence Attribute: Introduces a new LLVM-specific attribute DW_AT_LLVM_stmt_sequence to each DW_TAG_subprogram DIE. This attribute holds a label pointing to the offset in the line table where the function's line entries begin.

End-of-Sequence Markers: Emits an explicit DW_LNE_end_sequence after each function's line entries in the line table. This marks the end of the line information for that function, ensuring that line entries are correctly delimited.

Assembler and Streamer Modifications: Modifies the MCStreamer and related classes to support emitting the necessary labels and tracking the current function's line entries. A new flag GenerateFuncLineTableOffsets is added to control this behavior.

Compiler Option: Introduces the -mllvm -emit-func-debug-line-table-offsets option to enable this functionality, allowing users to opt-in as needed.
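
As a rough illustration of how a consumer might use the attribute described above, here is a minimal, self-contained C++ sketch. It is a toy model, not LLVM's API: `ToyRow`, `ToySequence`, `ToyLineTable`, `ToySubprogram`, `lineFor`, and `makeFoldedTable` are hypothetical names, and the addresses are made up. The sequence offsets 0x28 and 0x33 are borrowed from the test expectations in this PR. The point is only that two functions folded to the same address can still be told apart when each DIE points at its own sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model (assumption): one decoded row of a line table.
struct ToyRow {
  uint64_t Address;
  int Line;
};

// Toy model (assumption): one sequence, closed by an end_sequence row.
struct ToySequence {
  std::vector<ToyRow> Rows; // sorted by address
  uint64_t EndAddress;      // address carried by the end_sequence row
};

// Sequences keyed by their byte offset inside the line table section.
using ToyLineTable = std::map<uint64_t, ToySequence>;

// Toy stand-in for a DW_TAG_subprogram DIE carrying DW_AT_LLVM_stmt_sequence.
struct ToySubprogram {
  std::string Name;
  uint64_t StmtSequence; // offset of this function's sequence
};

// Look up the line for Addr using only the sequence Fn points at.
// Returns -1 when the sequence is missing or does not cover Addr.
int lineFor(const ToyLineTable &LT, const ToySubprogram &Fn, uint64_t Addr) {
  auto It = LT.find(Fn.StmtSequence);
  if (It == LT.end() || Addr >= It->second.EndAddress)
    return -1;
  int Line = -1;
  for (const ToyRow &R : It->second.Rows) {
    if (R.Address > Addr)
      break;
    Line = R.Line; // last row at or before Addr wins
  }
  return Line;
}

// Two functions folded onto the same address range, each with its own sequence.
ToyLineTable makeFoldedTable() {
  ToyLineTable LT;
  LT[0x28] = ToySequence{{ToyRow{0x1000, 2}}, 0x1008};
  LT[0x33] = ToySequence{{ToyRow{0x1000, 6}}, 0x1008};
  return LT;
}
```

With this model, a lookup for address 0x1000 returns line 2 through the DIE whose offset is 0x28 and line 6 through the DIE whose offset is 0x33, even though both functions share the address.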


Full diff: https://github.com/llvm/llvm-project/pull/110192.diff

7 Files Affected:

  • (modified) llvm/include/llvm/BinaryFormat/Dwarf.def (+1)
  • (modified) llvm/include/llvm/MC/MCDwarf.h (+4-1)
  • (modified) llvm/include/llvm/MC/MCStreamer.h (+27)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp (+8)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp (+29-1)
  • (modified) llvm/lib/MC/MCDwarf.cpp (+18-4)
  • (added) llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll (+82)
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.def b/llvm/include/llvm/BinaryFormat/Dwarf.def
index d55947fc5103ac..b1fa81a2fc6abd 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.def
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.def
@@ -617,6 +617,7 @@ HANDLE_DW_AT(0x3e07, LLVM_apinotes, 0, APPLE)
 HANDLE_DW_AT(0x3e08, LLVM_ptrauth_isa_pointer, 0, LLVM)
 HANDLE_DW_AT(0x3e09, LLVM_ptrauth_authenticates_null_values, 0, LLVM)
 HANDLE_DW_AT(0x3e0a, LLVM_ptrauth_authentication_mode, 0, LLVM)
+HANDLE_DW_AT(0x3e0b, LLVM_stmt_sequence, 0, LLVM)
 
 // Apple extensions.
 
diff --git a/llvm/include/llvm/MC/MCDwarf.h b/llvm/include/llvm/MC/MCDwarf.h
index bea79545d1ab96..e7e1bef1ad2d72 100644
--- a/llvm/include/llvm/MC/MCDwarf.h
+++ b/llvm/include/llvm/MC/MCDwarf.h
@@ -123,6 +123,9 @@ class MCDwarfLoc {
   friend class MCContext;
   friend class MCDwarfLineEntry;
 
+  // DwarfDebug::endFunctionImpl needs to construct MCDwarfLoc(IsEndOfFunction)
+  friend class DwarfDebug;
+
   MCDwarfLoc(unsigned fileNum, unsigned line, unsigned column, unsigned flags,
              unsigned isa, unsigned discriminator)
       : FileNum(fileNum), Line(line), Column(column), Flags(flags), Isa(isa),
@@ -239,7 +242,7 @@ class MCLineSection {
 
   // Add an end entry by cloning the last entry, if exists, for the section
   // the given EndLabel belongs to. The label is replaced by the given EndLabel.
-  void addEndEntry(MCSymbol *EndLabel);
+  void addEndEntry(MCSymbol *EndLabel, bool generatingFuncLineTableOffsets);
 
   using MCDwarfLineEntryCollection = std::vector<MCDwarfLineEntry>;
   using iterator = MCDwarfLineEntryCollection::iterator;
diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index 707aecc5dc578e..d6d5970917401d 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -251,6 +251,15 @@ class MCStreamer {
   /// discussion for future inclusion.
   bool AllowAutoPadding = false;
 
+  // Flag specifying weather functions will have an offset into the line table
+  // where the line data for that function starts
+  bool GenerateFuncLineTableOffsets = false;
+
+  // Symbol that tracks the stream symbol for first line of the current function
+  // being generated. This symbol can be used to reference where the line
+  // entries for the function start in the generated line table.
+  MCSymbol *CurrentFuncFirstLineStreamSym;
+
 protected:
   MCFragment *CurFrag = nullptr;
 
@@ -313,6 +322,24 @@ class MCStreamer {
   void setAllowAutoPadding(bool v) { AllowAutoPadding = v; }
   bool getAllowAutoPadding() const { return AllowAutoPadding; }
 
+  void setGenerateFuncLineTableOffsets(bool v) {
+    GenerateFuncLineTableOffsets = v;
+  }
+  bool getGenerateFuncLineTableOffsets() const {
+    return GenerateFuncLineTableOffsets;
+  }
+
+  // Use the below functions to track the symbol that points to the current
+  // function's line info in the output stream.
+  void beginFunction() { CurrentFuncFirstLineStreamSym = nullptr; }
+  void emittedLineStreamSym(MCSymbol *StreamSym) {
+    if (!CurrentFuncFirstLineStreamSym)
+      CurrentFuncFirstLineStreamSym = StreamSym;
+  }
+  MCSymbol *getCurrentFuncFirstLineStreamSym() {
+    return CurrentFuncFirstLineStreamSym;
+  }
+
   /// When emitting an object file, create and emit a real label. When emitting
   /// textual assembly, this should do nothing to avoid polluting our output.
   virtual MCSymbol *emitCFILabel();
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
index 0a1ff189bedbc4..c62075cf77c45a 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
@@ -527,6 +527,14 @@ DIE &DwarfCompileUnit::updateSubprogramScopeDIE(const DISubprogram *SP) {
           *DD->getCurrentFunction()))
     addFlag(*SPDie, dwarf::DW_AT_APPLE_omit_frame_ptr);
 
+  if (Asm->OutStreamer->getGenerateFuncLineTableOffsets() &&
+      Asm->OutStreamer->getCurrentFuncFirstLineStreamSym()) {
+    addSectionLabel(
+        *SPDie, dwarf::DW_AT_LLVM_stmt_sequence,
+        Asm->OutStreamer->getCurrentFuncFirstLineStreamSym(),
+        Asm->getObjFileLowering().getDwarfLineSection()->getBeginSymbol());
+  }
+
   // Only include DW_AT_frame_base in full debug info
   if (!includeMinimalInlineScopes()) {
     const TargetFrameLowering *TFI = Asm->MF->getSubtarget().getFrameLowering();
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index e9649f9ff81658..bd6d5e0ea7a363 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -170,6 +170,12 @@ static cl::opt<DwarfDebug::MinimizeAddrInV5> MinimizeAddrInV5Option(
                           "Stuff")),
     cl::init(DwarfDebug::MinimizeAddrInV5::Default));
 
+static cl::opt<bool> EmitFuncLineTableOffsetsOption(
+    "emit-func-debug-line-table-offsets", cl::Hidden,
+    cl::desc("Include line table offset in function's debug info and emit end "
+             "sequence after each function's line data."),
+    cl::init(false));
+
 static constexpr unsigned ULEB128PadSize = 4;
 
 void DebugLocDwarfExpression::emitOp(uint8_t Op, const char *Comment) {
@@ -443,6 +449,8 @@ DwarfDebug::DwarfDebug(AsmPrinter *A)
   Asm->OutStreamer->getContext().setDwarfVersion(DwarfVersion);
   Asm->OutStreamer->getContext().setDwarfFormat(Dwarf64 ? dwarf::DWARF64
                                                         : dwarf::DWARF32);
+  Asm->OutStreamer->setGenerateFuncLineTableOffsets(
+      EmitFuncLineTableOffsetsOption);
 }
 
 // Define out of line so we don't have to include DwarfUnit.h in DwarfDebug.h.
@@ -2221,6 +2229,10 @@ void DwarfDebug::beginFunctionImpl(const MachineFunction *MF) {
   if (SP->getUnit()->getEmissionKind() == DICompileUnit::NoDebug)
     return;
 
+  // Notify the streamer that we are beginning a function - this will reset the
+  // label pointing to the currently generated function's first line entry
+  Asm->OutStreamer->beginFunction();
+
   DwarfCompileUnit &CU = getOrCreateDwarfCompileUnit(SP->getUnit());
 
   Asm->OutStreamer->getContext().setDwarfCompileUnitID(
@@ -2249,7 +2261,8 @@ void DwarfDebug::terminateLineTable(const DwarfCompileUnit *CU) {
       getDwarfCompileUnitIDForLineTable(*CU));
   // Add the last range label for the given CU.
   LineTable.getMCLineSections().addEndEntry(
-      const_cast<MCSymbol *>(CURanges.back().End));
+      const_cast<MCSymbol *>(CURanges.back().End),
+      EmitFuncLineTableOffsetsOption);
 }
 
 void DwarfDebug::skippedNonDebugFunction() {
@@ -2342,6 +2355,21 @@ void DwarfDebug::endFunctionImpl(const MachineFunction *MF) {
   // Construct call site entries.
   constructCallSiteEntryDIEs(*SP, TheCU, ScopeDIE, *MF);
 
+  // If we're emitting line table offsets, we also need to emit an end label
+  // after all function's line entries
+  if (EmitFuncLineTableOffsetsOption) {
+    MCSymbol *LineSym = Asm->OutStreamer->getContext().createTempSymbol();
+    Asm->OutStreamer->emitLabel(LineSym);
+    MCDwarfLoc DwarfLoc(
+        1, 1, 0, DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0, 0, 0);
+    MCDwarfLineEntry LineEntry(LineSym, DwarfLoc);
+    Asm->OutStreamer->getContext()
+        .getMCDwarfLineTable(
+            Asm->OutStreamer->getContext().getDwarfCompileUnitID())
+        .getMCLineSections()
+        .addLineEntry(LineEntry, Asm->OutStreamer->getCurrentSectionOnly());
+  }
+
   // Clear debug info
   // Ownership of DbgVariables is a bit subtle - ScopeVariables owns all the
   // DbgVariables except those that are also in AbstractVariables (since they
diff --git a/llvm/lib/MC/MCDwarf.cpp b/llvm/lib/MC/MCDwarf.cpp
index 8ff097f29aebd1..34a9541bbbcc3a 100644
--- a/llvm/lib/MC/MCDwarf.cpp
+++ b/llvm/lib/MC/MCDwarf.cpp
@@ -104,8 +104,17 @@ void MCDwarfLineEntry::make(MCStreamer *MCOS, MCSection *Section) {
   // Get the current .loc info saved in the context.
   const MCDwarfLoc &DwarfLoc = MCOS->getContext().getCurrentDwarfLoc();
 
+  MCSymbol *LineStreamLabel = nullptr;
+  // If functions need offsets into the generated line table, then we need to
+  // create a label referencing where the line was generated in the output
+  // stream
+  if (MCOS->getGenerateFuncLineTableOffsets()) {
+    LineStreamLabel = MCOS->getContext().createTempSymbol();
+    MCOS->emittedLineStreamSym(LineStreamLabel);
+  }
+
   // Create a (local) line entry with the symbol and the current .loc info.
-  MCDwarfLineEntry LineEntry(LineSym, DwarfLoc);
+  MCDwarfLineEntry LineEntry(LineSym, DwarfLoc, LineStreamLabel);
 
   // clear DwarfLocSeen saying the current .loc info is now used.
   MCOS->getContext().clearDwarfLocSeen();
@@ -145,7 +154,8 @@ makeStartPlusIntExpr(MCContext &Ctx, const MCSymbol &Start, int IntVal) {
   return Res;
 }
 
-void MCLineSection::addEndEntry(MCSymbol *EndLabel) {
+void MCLineSection::addEndEntry(MCSymbol *EndLabel,
+                                bool generatingFuncLineTableOffsets) {
   auto *Sec = &EndLabel->getSection();
   // The line table may be empty, which we should skip adding an end entry.
   // There are two cases:
@@ -158,8 +168,12 @@ void MCLineSection::addEndEntry(MCSymbol *EndLabel) {
   if (I != MCLineDivisions.end()) {
     auto &Entries = I->second;
     auto EndEntry = Entries.back();
-    EndEntry.setEndLabel(EndLabel);
-    Entries.push_back(EndEntry);
+    // If generatingFuncLineTableOffsets is set, then we already generated an
+    // end label at the end of the last function, so skip generating another one
+    if (!generatingFuncLineTableOffsets) {
+      EndEntry.setEndLabel(EndLabel);
+      Entries.push_back(EndEntry);
+    }
   }
 }
 
diff --git a/llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll b/llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll
new file mode 100644
index 00000000000000..ef8b0c817cfb67
--- /dev/null
+++ b/llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll
@@ -0,0 +1,82 @@
+; RUN: llc -mtriple=i686-w64-mingw32 -o %t -filetype=obj %s
+; RUN: llvm-dwarfdump -v -all %t | FileCheck %s -check-prefix=NO_STMT_SEQ
+
+; RUN: llc -mtriple=i686-w64-mingw32 -o %t -filetype=obj %s -emit-func-debug-line-table-offsets
+; RUN: llvm-dwarfdump -v -all %t | FileCheck %s -check-prefix=STMT_SEQ
+
+; NO_STMT_SEQ-NOT:      DW_AT_LLVM_stmt_sequence
+
+; STMT_SEQ:   [[[ABBREV_CODE:[0-9]+]]] DW_TAG_subprogram
+; STMT_SEQ:  	       DW_AT_LLVM_stmt_sequence    DW_FORM_sec_offset
+; STMT_SEQ:   DW_TAG_subprogram [[[ABBREV_CODE]]]
+; STMT_SEQ:       DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset]	(0x00000028)
+; STMT_SEQ:   DW_AT_name {{.*}}func01
+; STMT_SEQ:   DW_TAG_subprogram [[[ABBREV_CODE]]]
+; STMT_SEQ:       DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset]	(0x00000033)
+; STMT_SEQ:   DW_AT_name {{.*}}main
+
+;; Check that the line table starts at 0x00000028 (first function)
+; STMT_SEQ:            Address            Line   Column File   ISA Discriminator OpIndex Flags
+; STMT_SEQ-NEXT:       ------------------ ------ ------ ------ --- ------------- ------- -------------
+; STMT_SEQ-NEXT:  0x00000028: 00 DW_LNE_set_address (0x00000006)
+
+;; Check that we have an 'end_sequence' just before the next function (0x00000033)
+; STMT_SEQ:            0x0000000000000006      1      0      1   0             0       0  is_stmt end_sequence
+; STMT_SEQ-NEXT: 0x00000033: 00 DW_LNE_set_address (0x00000027)
+
+;; Check that the end of the line table still has an 'end_sequence'
+; STMT_SEQ       0x00000049: 00 DW_LNE_end_sequence
+; STMT_SEQ-NEXT        0x0000000000000027      6      3      1   0             0       0  end_sequence
+
+
+; generated from:
+; clang -g -S -emit-llvm test.c -o test.ll
+; ======= test.c ======
+; int func01() {
+;   return 1;
+; }
+; int main() {
+;   return 0;
+; }
+; =====================
+
+
+; ModuleID = 'test.c'
+source_filename = "test.c"
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+target triple = "arm64-apple-macosx14.0.0"
+
+; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
+define i32 @func01() #0 !dbg !9 {
+  ret i32 1, !dbg !13
+}
+
+; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
+define i32 @main() #0 !dbg !14 {
+  %1 = alloca i32, align 4
+  store i32 0, ptr %1, align 4
+  ret i32 0, !dbg !15
+}
+
+attributes #0 = { noinline nounwind optnone ssp uwtable(sync) "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+crc,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+sha3,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8a,+zcm,+zcz" }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6, !7}
+!llvm.ident = !{!8}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "Homebrew clang version 17.0.6", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: Apple, sysroot: "/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk", sdk: "MacOSX14.sdk")
+!1 = !DIFile(filename: "test.c", directory: "/tmp/clang_test")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 8, !"PIC Level", i32 2}
+!6 = !{i32 7, !"uwtable", i32 1}
+!7 = !{i32 7, !"frame-pointer", i32 1}
+!8 = !{!"Homebrew clang version 17.0.6"}
+!9 = distinct !DISubprogram(name: "func01", scope: !1, file: !1, line: 1, type: !10, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !0)
+!10 = !DISubroutineType(types: !11)
+!11 = !{!12}
+!12 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!13 = !DILocation(line: 2, column: 3, scope: !9)
+!14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 5, type: !10, scopeLine: 5, spFlags: DISPFlagDefinition, unit: !0)
+!15 = !DILocation(line: 6, column: 3, scope: !14)

Comment on lines 126 to 127
// DwarfDebug::endFunctionImpl needs to construct MCDwarfLoc(IsEndOfFunction)
friend class DwarfDebug;
Collaborator

That feels like a bit of a layering and encapsulation break & hopefully can be avoided?

What happens if DwarfDebug doesn't do this? Doesn't MCDwarf correctly implicitly terminate the sequence when a new label is requested? Isn't that enough?

Contributor Author

I tried to keep this implementation as close as possible to Apple's very similar implementation of -fcas-friendly-debug-info in their fork of llvm-project (see MCDwarf.h in Apple's llvm-project fork). This is both to stick to their proposed pattern and to make it easier to merge the changes later on.

Contributor

Hi @dwblaikie, this was done so that DwarfDebug can create and add a new MCDwarfLineEntry at the end of DwarfDebug::endFunctionImpl. Do you have a better way to add a DW_LNE_end_sequence line entry from DwarfDebug?

Collaborator

If DwarfDebug needs some functionality in MCDwarf, it should probably be exposed to any MCDwarf client rather than uniquely to DwarfDebug.

But I don't know that DW_LNE_end_sequence placement should be chosen by DwarfDebug - it should go at the end of any chunk of .text that can be sliced-and-diced by the linker. That means it should probably be at the end of every function on MachO when using subsections-via-symbols (though I guess it isn't currently, because the DWARF only gets rewritten by dsymutil, which is DWARF-aware), or on ELF with -ffunction-sections.

In general I'd expect DwarfDebug to not actually request where line table sequences start and end - they're a function of the object format about where content can be treated as contiguous or not.

Except for this patch, which needs a label that starts a sequence even if it isn't in what would be an isolated chunk (though I have my doubts about that - if the function isn't at the start of an isolated chunk of .text, does it need one of these labels? Could it use the label of the start of the isolated chunk it's part of? Could it share that location somehow with all functions in that chunk?)

Comment on lines 530 to 535
  if (Asm->OutStreamer->getGenerateFuncLineTableOffsets() &&
      Asm->OutStreamer->getCurrentFuncFirstLineStreamSym()) {
    addSectionLabel(
        *SPDie, dwarf::DW_AT_LLVM_stmt_sequence,
        Asm->OutStreamer->getCurrentFuncFirstLineStreamSym(),
        Asm->getObjFileLowering().getDwarfLineSection()->getBeginSymbol());
  }
Collaborator

Rather than wiring up the attribute into the streamer, then querying it out here in DwarfCompileUnit, then going back into the streamer with labels - could the streamer "do the right thing" when a label is requested, and otherwise do the old/usual thing?

It doesn't seem like MCStreamer should "know" about function labels; it should know about line table labels, and you could in theory request them anywhere you want to be able to jump into parsing the line table without having to parse additional context (i.e., in beginFunctionImpl? Though maybe that's after the prologue, in which case maybe there's some callback before the prologue, or we should add one).

Contributor Author

That sounds like a better design and I think it may work. For the current implementation, same as above, I was trying to follow Apple's existing design. See here for example.

Before taking the more straightforward approach, I just want to confirm that we don't need to weigh Apple's existing design here and should aim for the best design for the current feature only.

Collaborator

Ah, fair enough - prior art and all.

Could you rope someone in from that work to join this discussion, then?

Contributor Author

@rastogishubham / @adrian-prantl - In this PR I'm doing something similar to -fcas-friendly-debug-info in the Apple branch (and also a bit extra on top). Do you think this is the way to go for future merges, consistency, etc.?

Contributor Author

@bulbazord / @kastiglione - We're trying to do something similar to -fcas-friendly-debug-info in the Apple branch. Could you have a look and say whether you're OK with this approach, for consistency with the Apple branch?

Collaborator

Sorry, I missed this when you pinged me originally.
There is no good reason for -fcas-friendly-debug-info to live downstream so we should try to unify both implementations.
@rastogishubham Can you take a look at both patches and see if we could rebase -fcas-friendly-debug-info on top of this?

Contributor

Thanks for the ping. Let me take a look; I also need to familiarize myself with the -fcas-friendly-debug-info code again.

Collaborator

This feedback still seems relevant, to me, to the current version of the code. It's still strange to me that MCStreamer needs to be told about this up front via setGenerateFuncLineTableOffsets. In assembly that isn't necessary: the asm directive is used wherever needed, and MCStreamer behaves as needed. We didn't need a directive at the start of the file to announce that line table offset directives might be used later in the file, so why would we need that at the API level? We should just query for line table offsets when we need them, and MCStreamer should provide them at that point, without any prior warning.
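
To make the "ask when you need it" idea concrete, here is a small C++ sketch under stated assumptions. It does not use or mirror the real MCStreamer interface: `ToyRecord`, `ToyLineTableWriter`, `addRow`, and `requestSequenceStart` are hypothetical names, and positions are counted in records rather than bytes. The writer has no up-front mode flag; requesting a start position closes any open sequence and hands back the position where a reader could begin decoding with no prior state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy record (assumption): either a normal row or an end_sequence marker.
struct ToyRecord {
  enum Kind { RowKind, EndSequenceKind } K;
  uint64_t Address;
  unsigned Line;
};

// Hypothetical writer: note there is no "generate offsets" flag to set first.
class ToyLineTableWriter {
  std::vector<ToyRecord> Records;
  bool InSequence = false;
  unsigned LastLine = 0;

public:
  // Append an ordinary row to the current sequence.
  void addRow(uint64_t Address, unsigned Line) {
    Records.push_back({ToyRecord::RowKind, Address, Line});
    InSequence = true;
    LastLine = Line;
  }

  // Request a position where decoding can start from a clean state.
  // If a sequence is open, it is terminated at EndAddress first.
  // The return value is a record index standing in for a section offset.
  std::size_t requestSequenceStart(uint64_t EndAddress) {
    if (InSequence) {
      Records.push_back({ToyRecord::EndSequenceKind, EndAddress, LastLine});
      InSequence = false;
    }
    return Records.size();
  }

  const std::vector<ToyRecord> &records() const { return Records; }
};
```

A caller that never requests a start position gets a single unbroken run of rows, which is the sense in which the old behaviour is preserved in this toy. Repeated requests with no rows in between return the same position and add no extra end markers.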

Contributor Author

Good call. The new implementation seems much more straightforward. LMK what you think.

I also tested it on a more complex example and the result seems correct.

@@ -158,8 +168,12 @@ void MCLineSection::addEndEntry(MCSymbol *EndLabel) {
   if (I != MCLineDivisions.end()) {
     auto &Entries = I->second;
     auto EndEntry = Entries.back();
-    EndEntry.setEndLabel(EndLabel);
-    Entries.push_back(EndEntry);
+    // If generatingFuncLineTableOffsets is set, then we already generated an
Contributor

@rastogishubham Oct 21, 2024

Can you please move this comment up to be the third case? So change line 161 to say

// There are three cases:

and then add this comment as the third case?

Like: https://github.com/swiftlang/llvm-project/pull/6718/files#diff-702eecfe61bae2fe5a8a0464141c844242f6429e8f7d546bf6f69b1d633c2a32R156

@rastogishubham
Contributor

@alx32 there is a major bug with this patch and I am not sure why it is happening.

If we take this test file:

// /tmp/a.cpp
int foo() {
    return 1;
}

int bar() {
    return 2;
}

int baz() {
    return 3;
}

compile it regularly:

clang -c -g /tmp/a.cpp -o /tmp/a.o

then use dwarfdump on it with the verbose mode (-v):
dwarfdump --debug-line /tmp/a.o -v

We will see:

            Address            Line   Column File   ISA Discriminator OpIndex Flags
            ------------------ ------ ------ ------ --- ------------- ------- -------------
0x00000043: 04 DW_LNS_set_file (0)
0x00000045: 05 DW_LNS_set_column (5)
0x00000047: 0a DW_LNS_set_prologue_end
0x00000048: 00 DW_LNE_set_address (0x0000000000000000)
0x00000053: 13 address += 0,  line += 1,  op-index += 0
            0x0000000000000000      2      5      0   0             0       0  is_stmt prologue_end
0x00000054: 0a DW_LNS_set_prologue_end
0x00000055: 86 address += 8,  line += 4,  op-index += 0
            0x0000000000000008      6      5      0   0             0       0  is_stmt prologue_end
0x00000056: 0a DW_LNS_set_prologue_end
0x00000057: 86 address += 8,  line += 4,  op-index += 0
            0x0000000000000010     10      5      0   0             0       0  is_stmt prologue_end
0x00000058: 02 DW_LNS_advance_pc (addr += 8, op-index += 0)
0x0000005a: 00 DW_LNE_end_sequence
            0x0000000000000018     10      5      0   0             0       0  is_stmt end_sequence

However, when I apply your patch and compile with:
clang -mllvm -emit-func-debug-line-table-offsets -c -g /tmp/a.cpp -o /tmp/a.o

then use dwarfdump:
dwarfdump --debug-line /tmp/a.o -v

I see:

            Address            Line   Column File   ISA Discriminator OpIndex Flags
           ------------------ ------ ------ ------ --- ------------- ------- -------------
0x00000043: 00 DW_LNE_set_address (0x0000000000000008)
0x0000004e: 01 DW_LNS_copy
           0x0000000000000008      1      0      1   0             0       0  is_stmt
0x0000004f: 00 DW_LNE_end_sequence
           0x0000000000000008      1      0      1   0             0       0  is_stmt end_sequence
0x00000052: 00 DW_LNE_set_address (0x0000000000000010)
0x0000005d: 01 DW_LNS_copy
           0x0000000000000010      1      0      1   0             0       0  is_stmt
0x0000005e: 00 DW_LNE_end_sequence
           0x0000000000000010      1      0      1   0             0       0  is_stmt end_sequence
0x00000061: 00 DW_LNE_set_address (0x0000000000000018)
0x0000006c: 01 DW_LNS_copy
           0x0000000000000018      1      0      1   0             0       0  is_stmt
0x0000006d: 00 DW_LNE_end_sequence
           0x0000000000000018      1      0      1   0             0       0  is_stmt end_sequence

Notice that there are no line table opcodes that advance the file and line; every row stays at line 1, column 0, file 1 for the duration of the line table. That is very wrong.
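
For readers following the first dump: the special opcodes 0x13 and 0x86 can be decoded by hand with the standard DWARF special-opcode formula. The sketch below assumes header values of line_base = -5, line_range = 14, opcode_base = 13 and a minimum instruction length of 1. These values are not shown in the dump above and are an assumption, although they do reproduce the increments dwarfdump prints. `LineHeaderParams`, `SpecialOpcodeEffect` and `decodeSpecialOpcode` are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Line-program header parameters. ASSUMPTION: these defaults are not taken
// from the dump; the real values live in the .debug_line header.
struct LineHeaderParams {
  int8_t LineBase = -5;
  uint8_t LineRange = 14;
  uint8_t OpcodeBase = 13;
  uint8_t MinInstLength = 1;
};

// Address and line increments produced by one special opcode.
struct SpecialOpcodeEffect {
  uint64_t AddressAdvance;
  int64_t LineAdvance;
};

// Decode a DWARF special opcode:
//   adjusted      = opcode - opcode_base
//   addr advance  = (adjusted / line_range) * min_inst_length
//   line advance  = line_base + (adjusted % line_range)
inline SpecialOpcodeEffect decodeSpecialOpcode(uint8_t Opcode,
                                               const LineHeaderParams &P = {}) {
  assert(Opcode >= P.OpcodeBase && "not a special opcode");
  uint8_t Adjusted = Opcode - P.OpcodeBase;
  uint64_t OperationAdvance = Adjusted / P.LineRange;
  return {OperationAdvance * P.MinInstLength,
          P.LineBase + static_cast<int64_t>(Adjusted % P.LineRange)};
}
```

Under those assumptions, `decodeSpecialOpcode(0x13)` gives address += 0, line += 1 and `decodeSpecialOpcode(0x86)` gives address += 8, line += 4, matching the rows in the first dump. The second dump contains no such opcodes, only DW_LNS_copy, which is why every row keeps the initial state of the line-number state machine.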

@alx32
Contributor Author

alx32 commented Oct 22, 2024

@alx32 there is a major bug with this patch and I am not sure why it is happening.

Thanks - Will have a look !

@alx32
Contributor Author

alx32 commented Oct 25, 2024

About the issues found by @rastogishubham above - these need some context to explain:

Timeline:

  • Originally this feature was published as PR#93985 (now closed, very similar to this PR).
  • PR#93985 was designed to be as rebase-friendly as possible with the implementation of -fcas-friendly-debug-info within the Apple branch.
  • In PR#93985 it was requested that this feature also be supported in the MC layer.
  • MC layer support was added in PR#99710
  • I then published this PR with few changes from the original, now-closed PR#93985, wanting to keep compatibility with the -fcas-friendly-debug-info feature.
  • The issue is that I didn't fully account for how the MC layer changes interacted with the original PR#93985, leading to the issues that @rastogishubham pointed out.

The specific issue is that in the MC layer change, if an MCDwarfLineEntry has a valid LineStreamLabel then it will only be used for generating the line label, and the rest of the information in the MCDwarfLineEntry will be ignored - see code.
This behavior is basically required in the MC layer as .loc_label is a separate instruction.

This behavior led to the issue @rastogishubham pointed out where valid MCDwarfLineEntry entries were being ignored because they had a LineStreamLabel attached.

The reason for the above context is to bring up another issue - after the MC change, the behavior is that if a LineStreamLabel is specified then the current line sequence will also be terminated. If we want to use LineStreamLabel in the current change, then this behavior will be inherited, which conflicts with how the "end current sequence" behavior is implemented for the -fcas-friendly-debug-info feature.

So it looks like the options here are:

  1. Diverge this feature from the implementation of -fcas-friendly-debug-info - similar to this comment. This means that -fcas-friendly-debug-info can't really share the functionality of this feature directly, as currently implemented in the Apple branch, but there should not be problems merging them together.
  2. Add a MCDwarfLineEntry::LineStreamLabel_NoEndSequence or similar field and basically bypass the MC changes and go back to an implementation similar to the original PR#93985 where there will be some overlap with -fcas-friendly-debug-info. This will basically lead to duplicate logic / fields in MCDwarfLineEntry.

I think Nr.2 above would not be ideal - but still wanted to present it as an option.
I'll update this PR to an implementation of Nr.1 above so we can see what that would look like.
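
The MC-layer behavior described above can be illustrated with a small model. This is only a sketch under stated assumptions: `LineEntry` and `emitEntries` are hypothetical stand-ins and not LLVM's `MCDwarfLineEntry` or its emission code, and the textual trace is invented for illustration. It shows the two effects mentioned in this comment: an entry carrying a label contributes only the label (its address and line are dropped), and the label also terminates the currently open sequence.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a line entry; NOT LLVM's MCDwarfLineEntry.
struct LineEntry {
  uint64_t Address;
  unsigned Line;
  std::string LineStreamLabel; // empty means "no label attached"
};

// Produce a textual trace of what a line-table writer with the described
// behavior would emit for the given entries.
inline std::vector<std::string>
emitEntries(const std::vector<LineEntry> &Entries) {
  std::vector<std::string> Out;
  bool InSequence = false;
  for (const LineEntry &E : Entries) {
    if (!E.LineStreamLabel.empty()) {
      // A labelled entry ends the open sequence and defines the label.
      // Its Address and Line are ignored, which is the source of the bug
      // when a real row is attached to the same entry.
      if (InSequence) {
        Out.push_back("end_sequence");
        InSequence = false;
      }
      Out.push_back("label " + E.LineStreamLabel);
      continue;
    }
    Out.push_back("row line=" + std::to_string(E.Line));
    InSequence = true;
  }
  if (InSequence)
    Out.push_back("end_sequence");
  return Out;
}
```

In this model, attaching a label to an entry that also carries line 2 silently loses the row for line 2, which is the shape of the problem reported above. Option Nr.1 sidesteps it by emitting the label as its own entry.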

@alx32
Contributor Author

alx32 commented Oct 25, 2024

The latest change is possibility Nr.1 presented in the above comment - similar to David's proposal here where we simply insert a line label and have that terminate the current sequence.

@rastogishubham - let me know how you think we should proceed here: either of the above two options, or some other way.

PS - There is still lots of back/forth through MCStreamer (also mentioned by David here) mainly because:

  • There is no beginFunctionImpl callback that we would need to remove the back/forth.
  • The back/forth through the MCStreamer is how Apple's branch went about implementing the -fcas-friendly-debug-info feature.

@rastogishubham
Contributor

rastogishubham commented Oct 28, 2024

@alx32 thanks for the detailed update. As long as the patch doesn't break the -fcas-friendly-debug-info functionality, I am okay with how it is implemented. Might I suggest adding a new test to make sure that the line table is correct when the option is used?

This should merge fine with what we have in the Apple branch.

@rastogishubham
Contributor

@alx32 can you please add a test that checks to make sure the line table is correct like in #110192 (comment)

If that is added, I can approve the patch, thanks!

@alx32
Contributor Author

alx32 commented Nov 5, 2024

@alx32 can you please add a test that checks to make sure the line table is correct

@rastogishubham - sorry, I was out last week. What about the latest test change? Do you think we need more coverage?

EDIT: Failures are infra related - not an actual issue: checking out commit "01add905319491fc2ddbb51076d22a64078c8701": exit status 1

Contributor

@rastogishubham rastogishubham left a comment

LGTM!

@alx32 alx32 force-pushed the 02_stmt_seq_clang branch from 01add90 to ab07e40 Compare November 11, 2024 17:24
@alx32
Contributor Author

alx32 commented Nov 11, 2024

Updated (and rebased) to address merge conflict.
@dwblaikie - any suggestions before merging this?

Collaborator

@dwblaikie dwblaikie left a comment

Sure, reckon this is worth a go.

@alx32 alx32 merged commit f407dff into llvm:main Nov 14, 2024
8 checks passed
alx32 added a commit that referenced this pull request Feb 6, 2025
…e lookups (#123391)

**Summary**
Add support for filtering line table entries based on
`DW_AT_LLVM_stmt_sequence` attribute when looking up address ranges.
This ensures that line entries are correctly attributed to their
corresponding functions, even when multiple functions share the same
address range due to optimizations.

**Background**
In #110192 we added support to
clang to generate the `DW_AT_LLVM_stmt_sequence` attribute for
`DW_TAG_subprogram`'s. Corresponding RFC: [New DWARF Attribute for
Symbolication of Merged
Functions](https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434)

The `DW_AT_LLVM_stmt_sequence` attribute allows accurate attribution of
line number information to their corresponding functions, even in
scenarios where functions are merged or share the same address space due
to optimizations like Identical Code Folding (ICF) in the linker.

**Implementation Details**
The patch modifies `DWARFDebugLine::lookupAddressRange` to accept an
optional DWARFDie parameter. When provided, the function checks if the
`DIE` has a `DW_AT_LLVM_stmt_sequence` attribute. This attribute
contains an offset into the line table that marks where the line entries
for this DIE's function begin.

If the attribute is present, the function filters the results to only
include line entries from the sequence that starts at the specified
offset. This ensures that even when multiple functions share the same
address range, we return only the line entries that actually belong to
the function represented by the DIE.

The implementation:
- Adds an optional DWARFDie parameter to lookupAddressRange
- Extracts the `DW_AT_LLVM_stmt_sequence` offset if present
- Modifies the address range lookup logic to filter sequences based on
their offset
- Returns only line entries from the matching sequence
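
The filtering described in this commit message can be sketched with a simplified model. `Row`, `Sequence` and this free-standing `lookupAddressRange` are illustrative types and names chosen for the example; they are not the `DWARFDebugLine` API, and the real function's signature and return type are not reproduced here. The sketch assumes C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Simplified stand-ins for a parsed line table; NOT the LLVM types.
struct Row {
  uint64_t Address;
  unsigned Line;
};

struct Sequence {
  uint64_t StmtSeqOffset; // offset of this sequence within .debug_line
  uint64_t LowPC;         // sequence covers [LowPC, HighPC)
  uint64_t HighPC;
  std::vector<Row> Rows;
};

// Collect rows whose address lies in [Addr, Addr + Size). When StmtSeq is
// provided (the value of a DIE's DW_AT_LLVM_stmt_sequence attribute), only
// the sequence starting at that offset is considered.
inline std::vector<Row>
lookupAddressRange(const std::vector<Sequence> &Seqs, uint64_t Addr,
                   uint64_t Size,
                   std::optional<uint64_t> StmtSeq = std::nullopt) {
  std::vector<Row> Result;
  for (const Sequence &S : Seqs) {
    if (StmtSeq && S.StmtSeqOffset != *StmtSeq)
      continue; // belongs to a different function's sequence
    if (S.HighPC <= Addr || S.LowPC >= Addr + Size)
      continue; // no overlap with the requested range
    for (const Row &R : S.Rows)
      if (R.Address >= Addr && R.Address < Addr + Size)
        Result.push_back(R);
  }
  return Result;
}
```

With two sequences that cover the same address range (as after ICF), a lookup without the offset returns rows from both, while a lookup with one function's offset returns only that function's rows.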
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 7, 2025
…n line table lookups (#123391)

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
…e lookups (llvm#123391)

@alx32 alx32 requested a review from clayborg February 26, 2025 22:54
alx32 added a commit that referenced this pull request Mar 13, 2025
…es (#128953)

**Summary:**  
This update adds handling for `DW_AT_LLVM_stmt_sequence` attributes in
the DWARF linker. These attributes point to rows in the line table,
which gets rewritten during linking. Since the row positions change, the
offsets in these attributes need to be updated to match the new layout
in the output `.debug_line` section. The changes add new data structures
and tweak existing functions to track and fix these attributes.

**Background**
In #110192 we added support to
clang to generate the `DW_AT_LLVM_stmt_sequence` attribute for
`DW_TAG_subprogram`'s. Corresponding RFC: [New DWARF Attribute for
Symbolication of Merged
Functions](https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434).
This attribute holds a label pointing to the offset in the line table
where the function's line entries begin.

**Implementation details:**  
Here’s what’s changed in the code:  
- **New Tracking in `CompileUnit`:** A `StmtSeqListAttributes` vector is
added to the `CompileUnit` class. It stores the locations where
`DW_AT_LLVM_stmt_sequence` attributes need to be patched, recorded when
cloning DIEs (debug info entries).
- **Updated `emitLineTableForUnit` Function:** This function now has an
optional `RowOffsets` parameter. It collects the byte offsets of each
row in the output line table. We only need to use this functionality if
`DW_AT_LLVM_stmt_sequence` attributes are present in the unit.
- **Row Tracking with `TrackedRow`:** A `TrackedRow` struct keeps track
of each input row’s original index and whether it starts a sequence in
the output table. This links old rows to their new positions in the
rewritten line table. Several implementations were considered and
prototyped here, but so far this has proven the simplest and cleanest
approach.
- **Patching Step:** After the line table is written, the linker uses
the data in the `TrackedRow` objects and the `RowOffsets` array to update the
`DW_AT_LLVM_stmt_sequence` attributes with the correct offsets.
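
The patching step can be pictured with the following sketch. It is a simplified model, not the DWARF linker's code: the `TrackedRow` fields are guessed from the description above, `StmtSeqAttr` and `patchStmtSeqAttrs` are hypothetical, and the sketch assumes each attribute has already been resolved to the input row index where its sequence starts (how the real linker does that is not described here).

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Modeled on the description above; field names are assumptions.
struct TrackedRow {
  std::size_t OriginalRowIndex; // index of the row in the input line table
  bool IsStartSeqInOutput;      // does this row begin a sequence in the output?
};

// Hypothetical record of one DW_AT_LLVM_stmt_sequence attribute to patch.
struct StmtSeqAttr {
  std::size_t InputRowIndex; // input row where the function's sequence starts
  uint64_t Offset;           // attribute value, rewritten in place
};

// OutRows and RowOffsets are parallel: RowOffsets[I] is the byte offset of
// output row I in the rewritten .debug_line. Returns the number of
// attributes that were patched.
inline std::size_t patchStmtSeqAttrs(const std::vector<TrackedRow> &OutRows,
                                     const std::vector<uint64_t> &RowOffsets,
                                     std::vector<StmtSeqAttr> &Attrs) {
  std::unordered_map<std::size_t, uint64_t> NewOffset;
  for (std::size_t I = 0; I < OutRows.size() && I < RowOffsets.size(); ++I)
    if (OutRows[I].IsStartSeqInOutput)
      NewOffset[OutRows[I].OriginalRowIndex] = RowOffsets[I];

  std::size_t Patched = 0;
  for (StmtSeqAttr &A : Attrs) {
    auto It = NewOffset.find(A.InputRowIndex);
    if (It == NewOffset.end())
      continue; // sequence not present in the output; leave value untouched
    A.Offset = It->second;
    ++Patched;
  }
  return Patched;
}
```

Keying the map on the original row index is what lets the output rows be reordered freely while each attribute still finds the new position of its sequence start.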
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 13, 2025
…n line tables (#128953)

frederik-h pushed a commit to frederik-h/llvm-project that referenced this pull request Mar 18, 2025
…es (llvm#128953)

Contributor

@nocchijiang nocchijiang left a comment

Hi @alx32, I've ported the patch to the Swift compiler and tested it on a Swift codebase and found it fails to compile a Swift module. It is a bit challenging to create a minimal reproduction, but please let me know if you need one.

@@ -2223,6 +2223,9 @@ void DwarfDebug::beginFunctionImpl(const MachineFunction *MF) {
return;

DwarfCompileUnit &CU = getOrCreateDwarfCompileUnit(SP->getUnit());
FunctionLineTableLabel = CU.emitFuncLineTableOffsets()
Contributor

Should this call be placed after setDwarfCompileUnitID()? If the compile unit ID is changed, it seems that FunctionLineTableLabel would no longer be in the same line table as the debug lines added later.

Contributor

I also suggest adding an assertion in MCDwarfLineEntry::setEndLabel to ensure that this->LineStreamLabel == nullptr. I found a case where LineStreamLabel was emitted twice, which triggered a complaint from MCStreamer::emitLabel; that is how I caught the potential bug here.

Contributor Author

I found a case where LineStreamLabel was emitted twice

Is this happening in the current upstreamed version of the patch or the Swift implementation?

Contributor

I manually ported your patches, as well as some other necessary ones like b468ed4, back to a downstream Swift compiler repo.

Contributor

After further investigation, I realized that this issue isn't strictly specific to Swift. Instead, it's related to scenarios where multiple CUs are emitted in one compiler invocation. In my case, the affected Swift module is configured with -enable-single-module-llvm-emission.

The issue can be reproduced using tip-of-tree LLVM compiling llvm-test-suite with a slightly modified ReleaseLTO-g.cmake configuration (in order to enable -emit-func-debug-line-table-offsets for LTO objects).

set(CMAKE_EXE_LINKER_FLAGS_RELEASE "-Wl,-mllvm,-emit-func-debug-line-table-offsets"  CACHE STRING "")

Contributor Author

@alx32 alx32 May 30, 2025

Thanks for the repro. I have managed to repro it locally. I will look into it.

Contributor Author

Minimal repro via:

llc -O3 -mcpu=x86-64 -emit-func-debug-line-table-offsets -filetype=obj  debug-line-lto-bug.ll

Where debug-line-lto-bug.ll is https://gist.github.com/alx32/5d5db8dc9818e17e59e47c4f70d91ad1

Contributor Author

Fix: #142253

@alx32
Contributor Author

alx32 commented May 20, 2025

@nocchijiang Thanks for the report.

and found it fails to compile a Swift module

Do you mean within a larger codebase it fails to compile one particular Swift module? Or that any Swift module will not compile after the port? I will probably be adding Swift support in the future, but it's not planned for the immediate timeline. Do you have plans currently to upstream the Swift support?

@nocchijiang
Contributor

@alx32

Do you mean within a larger codebase it fails to compile one particular Swift module? Or that any Swift module will not compile after the port?

It fails on one particular Swift module.

Do you have plans currently to upstream the Swift support?

I am working on creating a minimal reproduction of the issue. Ideally I would make it an LLVM regression test, which would allow me to submit the proposed fix as mentioned in the review comment.

@alx32
Contributor Author

alx32 commented May 22, 2025

@nocchijiang - Without a repro I won't quite be able to come up with an adequate fix here.

Also note that if your downstream branch has -fcas-friendly-debug-info (same as the Apple branch) then there will be some functional conflicts - I know that when this patch was ported to the Apple branch, some work-arounds were needed to make it compatible with -fcas-friendly-debug-info.
