Skip to content

[DebugInfo] Add flag to enable function-level debug line attribution #93985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

alx32
Copy link
Contributor

@alx32 alx32 commented May 31, 2024

This patch introduces a new flag, -mllvm -emit-func-debug-line-table-offsets, which enables the generation of additional debug information to facilitate the attribution of debug line data to their corresponding functions (DW_TAG_subprogram).

Context:
Currently, Clang generates function line information in the debug_line section without directly linking the line entries to their originating functions. This becomes problematic when post-compile tools, such as the Identical Code Folding (ICF) in the LLD linker, merge multiple functions together, resulting in overlapping address ranges and conflicting line information.

To address this issue, this patch adds a new attribute, DW_AT_META_stmt_sequence, to each DW_TAG_subprogram, which represents an offset into the line table where the line data for the subprogram begins. Additionally, a DW_LNE_end_sequence is added to the line table to mark the end of the line information for a particular subprogram. These changes enable the correct attribution of line entries to their originating functions, even when functions share the same address space after optimization.

NOTE: While this change is functional, it requires additional tooling support to be fully utilized, particularly for symbol resolution in cases where multiple functions share the same address space. This will be addressed in future changes.

RFC: https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434

Previous changes: #93137 - Added an LLD option to generate debug information for ICF'ed functions.

@alx32 alx32 force-pushed the 02_stmt_seq_clang branch from cf53e62 to 9276fc5 Compare May 31, 2024 18:31
@alx32 alx32 force-pushed the 02_stmt_seq_clang branch from 9276fc5 to 3402f2a Compare May 31, 2024 18:35
@alx32 alx32 marked this pull request as ready for review June 3, 2024 14:17
@llvmbot
Copy link
Member

llvmbot commented Jun 3, 2024

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-llvm-binary-utilities

Author: None (alx32)

Changes

This patch introduces a new Clang flag, -mllvm -emit-func-debug-line-table-offsets, which enables the generation of additional debug information to facilitate the attribution of debug line data to their corresponding functions (DW_TAG_subprogram).

Context:
Currently, Clang generates function line information in the debug_line section without directly linking the line entries to their originating functions. This becomes problematic when post-compile tools, such as the Identical Code Folding (ICF) in the LLD linker, merge multiple functions together, resulting in overlapping address ranges and conflicting line information.

To address this issue, this patch adds a new attribute, DW_AT_META_stmt_sequence, to each DW_TAG_subprogram, which represents an offset into the line table where the line data for the subprogram begins. Additionally, a DW_LNE_end_sequence is added to the line table to mark the end of the line information for a particular subprogram. These changes enable the correct attribution of line entries to their originating functions, even when functions share the same address space after optimization.

NOTE: While this change is functional, it requires additional tooling support to be fully utilized, particularly for symbol resolution in cases where multiple functions share the same address space. This will be addressed in future changes.

Previous changes: #93137 - Added an LLD option to generate debug information for ICF'ed functions.


Full diff: https://github.com/llvm/llvm-project/pull/93985.diff

8 Files Affected:

  • (modified) llvm/include/llvm/BinaryFormat/Dwarf.def (+2)
  • (modified) llvm/include/llvm/BinaryFormat/Dwarf.h (+1)
  • (modified) llvm/include/llvm/MC/MCDwarf.h (+18-3)
  • (modified) llvm/include/llvm/MC/MCStreamer.h (+27)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp (+8)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp (+29-1)
  • (modified) llvm/lib/MC/MCDwarf.cpp (+23-5)
  • (added) llvm/test/DebugInfo/X86/DW_AT_META_stmt_seq_sec_offset.ll (+68)
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.def b/llvm/include/llvm/BinaryFormat/Dwarf.def
index adcf24eb83b03..6124babc40927 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.def
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.def
@@ -578,6 +578,8 @@ HANDLE_DW_AT(0x2904, GO_runtime_type, 0, GO)
 
 HANDLE_DW_AT(0x3210, UPC_threads_scaled, 0, UPC)
 
+HANDLE_DW_AT(0x3600, META_stmt_sequence, 0, META)
+
 HANDLE_DW_AT(0x393e, IBM_wsa_addr, 0, IBM)
 HANDLE_DW_AT(0x393f, IBM_home_location, 0, IBM)
 HANDLE_DW_AT(0x3940, IBM_alt_srcview, 0, IBM)
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.h b/llvm/include/llvm/BinaryFormat/Dwarf.h
index 74c4d6ff3a716..3e42dc240fcc0 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.h
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.h
@@ -84,6 +84,7 @@ enum LLVMConstants : uint32_t {
   DWARF_VENDOR_PGI,
   DWARF_VENDOR_SUN,
   DWARF_VENDOR_UPC,
+  DWARF_VENDOR_META,
   ///\}
 };
 
diff --git a/llvm/include/llvm/MC/MCDwarf.h b/llvm/include/llvm/MC/MCDwarf.h
index 18056c5fdf816..b6f87dc8b6c8a 100644
--- a/llvm/include/llvm/MC/MCDwarf.h
+++ b/llvm/include/llvm/MC/MCDwarf.h
@@ -122,6 +122,8 @@ class MCDwarfLoc {
 private: // MCContext manages these
   friend class MCContext;
   friend class MCDwarfLineEntry;
+  // DwarfDebug::endFunctionImpl needs to construct MCDwarfLoc(IsEndOfFunction)
+  friend class DwarfDebug;
 
   MCDwarfLoc(unsigned fileNum, unsigned line, unsigned column, unsigned flags,
              unsigned isa, unsigned discriminator)
@@ -194,14 +196,27 @@ class MCDwarfLineEntry : public MCDwarfLoc {
 
 public:
   // Constructor to create an MCDwarfLineEntry given a symbol and the dwarf loc.
-  MCDwarfLineEntry(MCSymbol *label, const MCDwarfLoc loc)
-      : MCDwarfLoc(loc), Label(label) {}
+  MCDwarfLineEntry(MCSymbol *label, const MCDwarfLoc loc,
+                   bool isEndOfFunction = false,
+                   MCSymbol *streamLabel = nullptr)
+      : MCDwarfLoc(loc), Label(label), IsEndOfFunction(isEndOfFunction),
+        StreamLabel(streamLabel) {}
 
   MCSymbol *getLabel() const { return Label; }
 
   // This indicates the line entry is synthesized for an end entry.
   bool IsEndEntry = false;
 
+  // This indicates that the current line entry denotes the end of a function,
+  // it is used to emit a DW_LNE_end_sequnece to reset the state machine
+  // registers.
+  bool IsEndOfFunction;
+
+  // Optional symbol to be emitted just before the line is written into the
+  // output stream. It can be used to reference the position of the start of
+  // this line's data in the output stream.
+  MCSymbol *StreamLabel;
+
   // Override the label with the given EndLabel.
   void setEndLabel(MCSymbol *EndLabel) {
     Label = EndLabel;
@@ -227,7 +242,7 @@ class MCLineSection {
 
   // Add an end entry by cloning the last entry, if exists, for the section
   // the given EndLabel belongs to. The label is replaced by the given EndLabel.
-  void addEndEntry(MCSymbol *EndLabel);
+  void addEndEntry(MCSymbol *EndLabel, bool generatingFuncLineTableOffsets);
 
   using MCDwarfLineEntryCollection = std::vector<MCDwarfLineEntry>;
   using iterator = MCDwarfLineEntryCollection::iterator;
diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index b7468cf70a664..559acabee4082 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -254,6 +254,15 @@ class MCStreamer {
   /// discussion for future inclusion.
   bool AllowAutoPadding = false;
 
+  // Flag specyfing weather functions will have an offset into the line table
+  // where the line data for that function starts
+  bool GenerateFuncLineTableOffsets = false;
+
+  // Symbol that tracks the stream symbol for first line of the current function
+  // being generated. This symbol can be used to reference where the line
+  // entries for the function start in the generated line table.
+  MCSymbol *CurrentFuncFirstLineStreamSym;
+
 protected:
   MCStreamer(MCContext &Ctx);
 
@@ -310,6 +319,24 @@ class MCStreamer {
   void setAllowAutoPadding(bool v) { AllowAutoPadding = v; }
   bool getAllowAutoPadding() const { return AllowAutoPadding; }
 
+  void setGenerateFuncLineTableOffsets(bool v) {
+    GenerateFuncLineTableOffsets = v;
+  }
+  bool getGenerateFuncLineTableOffsets() const {
+    return GenerateFuncLineTableOffsets;
+  }
+
+  // Use the below functions to track the symbol that points to the current
+  // function's line info in the output stream.
+  void beginFunction() { CurrentFuncFirstLineStreamSym = nullptr; }
+  void emittedLineStreamSym(MCSymbol *StreamSym) {
+    if (!CurrentFuncFirstLineStreamSym)
+      CurrentFuncFirstLineStreamSym = StreamSym;
+  }
+  MCSymbol *getCurrentFuncFirstLineStreamSym() {
+    return CurrentFuncFirstLineStreamSym;
+  }
+
   /// When emitting an object file, create and emit a real label. When emitting
   /// textual assembly, this should do nothing to avoid polluting our output.
   virtual MCSymbol *emitCFILabel();
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
index c1e7f01f0eba5..53a9805cfd503 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
@@ -526,6 +526,14 @@ DIE &DwarfCompileUnit::updateSubprogramScopeDIE(const DISubprogram *SP) {
           *DD->getCurrentFunction()))
     addFlag(*SPDie, dwarf::DW_AT_APPLE_omit_frame_ptr);
 
+  if (Asm->OutStreamer->getGenerateFuncLineTableOffsets() &&
+      Asm->OutStreamer->getCurrentFuncFirstLineStreamSym()) {
+    addSectionLabel(
+        *SPDie, dwarf::DW_AT_META_stmt_sequence,
+        Asm->OutStreamer->getCurrentFuncFirstLineStreamSym(),
+        Asm->getObjFileLowering().getDwarfLineSection()->getBeginSymbol());
+  }
+
   // Only include DW_AT_frame_base in full debug info
   if (!includeMinimalInlineScopes()) {
     const TargetFrameLowering *TFI = Asm->MF->getSubtarget().getFrameLowering();
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index b9c02aed848cc..6b987792c3db3 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -170,6 +170,12 @@ static cl::opt<DwarfDebug::MinimizeAddrInV5> MinimizeAddrInV5Option(
                           "Stuff")),
     cl::init(DwarfDebug::MinimizeAddrInV5::Default));
 
+static cl::opt<bool> EmitFuncLineTableOffsetsOption(
+    "emit-func-debug-line-table-offsets", cl::Hidden,
+    cl::desc("Include line table offset in function's debug info and emit end "
+             "sequence after each function's line data."),
+    cl::init(false));
+
 static constexpr unsigned ULEB128PadSize = 4;
 
 void DebugLocDwarfExpression::emitOp(uint8_t Op, const char *Comment) {
@@ -440,6 +446,8 @@ DwarfDebug::DwarfDebug(AsmPrinter *A)
   Asm->OutStreamer->getContext().setDwarfVersion(DwarfVersion);
   Asm->OutStreamer->getContext().setDwarfFormat(Dwarf64 ? dwarf::DWARF64
                                                         : dwarf::DWARF32);
+  Asm->OutStreamer->setGenerateFuncLineTableOffsets(
+      EmitFuncLineTableOffsetsOption);
 }
 
 // Define out of line so we don't have to include DwarfUnit.h in DwarfDebug.h.
@@ -2222,6 +2230,10 @@ void DwarfDebug::beginFunctionImpl(const MachineFunction *MF) {
   if (SP->getUnit()->getEmissionKind() == DICompileUnit::NoDebug)
     return;
 
+  // Notify the streamer that we are beginning a function - this will reset the
+  // label pointing to the currently generated function's first line entry
+  Asm->OutStreamer->beginFunction();
+
   DwarfCompileUnit &CU = getOrCreateDwarfCompileUnit(SP->getUnit());
 
   Asm->OutStreamer->getContext().setDwarfCompileUnitID(
@@ -2250,7 +2262,8 @@ void DwarfDebug::terminateLineTable(const DwarfCompileUnit *CU) {
       getDwarfCompileUnitIDForLineTable(*CU));
   // Add the last range label for the given CU.
   LineTable.getMCLineSections().addEndEntry(
-      const_cast<MCSymbol *>(CURanges.back().End));
+      const_cast<MCSymbol *>(CURanges.back().End),
+      EmitFuncLineTableOffsetsOption);
 }
 
 void DwarfDebug::skippedNonDebugFunction() {
@@ -2343,6 +2356,21 @@ void DwarfDebug::endFunctionImpl(const MachineFunction *MF) {
   // Construct call site entries.
   constructCallSiteEntryDIEs(*SP, TheCU, ScopeDIE, *MF);
 
+  // If we're emitting line table offsets, we also need to emit an end label
+  // after all function's line entries
+  if (EmitFuncLineTableOffsetsOption) {
+    MCSymbol *LineSym = Asm->OutStreamer->getContext().createTempSymbol();
+    Asm->OutStreamer->emitLabel(LineSym);
+    MCDwarfLoc DwarfLoc(
+        1, 1, 0, DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0, 0, 0);
+    MCDwarfLineEntry LineEntry(LineSym, DwarfLoc, /*IsEndOfFunction*/ true);
+    Asm->OutStreamer->getContext()
+        .getMCDwarfLineTable(
+            Asm->OutStreamer->getContext().getDwarfCompileUnitID())
+        .getMCLineSections()
+        .addLineEntry(LineEntry, Asm->OutStreamer->getCurrentSectionOnly());
+  }
+
   // Clear debug info
   // Ownership of DbgVariables is a bit subtle - ScopeVariables owns all the
   // DbgVariables except those that are also in AbstractVariables (since they
diff --git a/llvm/lib/MC/MCDwarf.cpp b/llvm/lib/MC/MCDwarf.cpp
index aba4071e6b910..aeec79766e3d6 100644
--- a/llvm/lib/MC/MCDwarf.cpp
+++ b/llvm/lib/MC/MCDwarf.cpp
@@ -103,8 +103,18 @@ void MCDwarfLineEntry::make(MCStreamer *MCOS, MCSection *Section) {
   // Get the current .loc info saved in the context.
   const MCDwarfLoc &DwarfLoc = MCOS->getContext().getCurrentDwarfLoc();
 
+  MCSymbol *StreamLabel = nullptr;
+  // If functions need offsets into the generated line table, then we need to
+  // create a label referencing where the line was generated in the output
+  // stream
+  if (MCOS->getGenerateFuncLineTableOffsets()) {
+    StreamLabel = MCOS->getContext().createTempSymbol();
+    MCOS->emittedLineStreamSym(StreamLabel);
+  }
+
   // Create a (local) line entry with the symbol and the current .loc info.
-  MCDwarfLineEntry LineEntry(LineSym, DwarfLoc);
+  MCDwarfLineEntry LineEntry(LineSym, DwarfLoc, /*isEndOfFunction=*/false,
+                             StreamLabel);
 
   // clear DwarfLocSeen saying the current .loc info is now used.
   MCOS->getContext().clearDwarfLocSeen();
@@ -144,7 +154,8 @@ makeStartPlusIntExpr(MCContext &Ctx, const MCSymbol &Start, int IntVal) {
   return Res;
 }
 
-void MCLineSection::addEndEntry(MCSymbol *EndLabel) {
+void MCLineSection::addEndEntry(MCSymbol *EndLabel,
+                                bool generatingFuncLineTableOffsets) {
   auto *Sec = &EndLabel->getSection();
   // The line table may be empty, which we should skip adding an end entry.
   // There are two cases:
@@ -157,8 +168,12 @@ void MCLineSection::addEndEntry(MCSymbol *EndLabel) {
   if (I != MCLineDivisions.end()) {
     auto &Entries = I->second;
     auto EndEntry = Entries.back();
-    EndEntry.setEndLabel(EndLabel);
-    Entries.push_back(EndEntry);
+    // If generatingFuncLineTableOffsets is set, then we already generated an
+    // end label at the end of the last function, so skip generating another one
+    if (!generatingFuncLineTableOffsets) {
+      EndEntry.setEndLabel(EndLabel);
+      Entries.push_back(EndEntry);
+    }
   }
 }
 
@@ -187,8 +202,11 @@ void MCDwarfLineTable::emitOne(
   bool EndEntryEmitted = false;
   for (const MCDwarfLineEntry &LineEntry : LineEntries) {
     MCSymbol *Label = LineEntry.getLabel();
+    if (LineEntry.StreamLabel && MCOS->getGenerateFuncLineTableOffsets()) {
+      MCOS->emitLabel(LineEntry.StreamLabel);
+    }
     const MCAsmInfo *asmInfo = MCOS->getContext().getAsmInfo();
-    if (LineEntry.IsEndEntry) {
+    if (LineEntry.IsEndEntry || LineEntry.IsEndOfFunction) {
       MCOS->emitDwarfAdvanceLineAddr(INT64_MAX, LastLabel, Label,
                                      asmInfo->getCodePointerSize());
       init();
diff --git a/llvm/test/DebugInfo/X86/DW_AT_META_stmt_seq_sec_offset.ll b/llvm/test/DebugInfo/X86/DW_AT_META_stmt_seq_sec_offset.ll
new file mode 100644
index 0000000000000..50e0856b30327
--- /dev/null
+++ b/llvm/test/DebugInfo/X86/DW_AT_META_stmt_seq_sec_offset.ll
@@ -0,0 +1,68 @@
+; RUN: llc -mtriple=i686-w64-mingw32 -o %t -filetype=obj %s
+; RUN: llvm-dwarfdump -v -all %t | FileCheck %s -check-prefix=NO_STMT_SEQ
+
+; RUN: llc -mtriple=i686-w64-mingw32 -o %t -filetype=obj %s -emit-func-debug-line-table-offsets
+; RUN: llvm-dwarfdump -v -all %t | FileCheck %s -check-prefix=STMT_SEQ
+
+; NO_STMT_SEQ-NOT:      DW_AT_META_stmt_sequence
+
+; STMT_SEQ:   [2] DW_TAG_subprogram
+; STMT_SEQ:  	       DW_AT_META_stmt_sequence    DW_FORM_sec_offset
+; STMT_SEQ:   DW_TAG_subprogram [2]
+; STMT_SEQ:       DW_AT_META_stmt_sequence [DW_FORM_sec_offset]	(0x00000028)
+; STMT_SEQ:   DW_AT_name {{.*}}func01
+; STMT_SEQ:   DW_TAG_subprogram [2]
+; STMT_SEQ:       DW_AT_META_stmt_sequence [DW_FORM_sec_offset]	(0x00000038)
+; STMT_SEQ:   DW_AT_name {{.*}}main
+
+; generated from:
+; clang -g -S -emit-llvm test.c -o test.ll
+; ======= test.c ======
+; int func01() {
+;   return 1;
+; }
+; int main() {
+;   return 0;
+; }
+; =====================
+
+
+; ModuleID = 'test.c'
+source_filename = "test.c"
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+target triple = "arm64-apple-macosx14.0.0"
+
+; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
+define i32 @func01() #0 !dbg !9 {
+  ret i32 1, !dbg !13
+}
+
+; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
+define i32 @main() #0 !dbg !14 {
+  %1 = alloca i32, align 4
+  store i32 0, ptr %1, align 4
+  ret i32 0, !dbg !15
+}
+
+attributes #0 = { noinline nounwind optnone ssp uwtable(sync) "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+crc,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+sha3,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8a,+zcm,+zcz" }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6, !7}
+!llvm.ident = !{!8}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "Homebrew clang version 17.0.6", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: Apple, sysroot: "/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk", sdk: "MacOSX14.sdk")
+!1 = !DIFile(filename: "test.c", directory: "/tmp/clang_test")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 8, !"PIC Level", i32 2}
+!6 = !{i32 7, !"uwtable", i32 1}
+!7 = !{i32 7, !"frame-pointer", i32 1}
+!8 = !{!"Homebrew clang version 17.0.6"}
+!9 = distinct !DISubprogram(name: "func01", scope: !1, file: !1, line: 1, type: !10, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !0)
+!10 = !DISubroutineType(types: !11)
+!11 = !{!12}
+!12 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!13 = !DILocation(line: 2, column: 3, scope: !9)
+!14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 5, type: !10, scopeLine: 5, spFlags: DISPFlagDefinition, unit: !0)
+!15 = !DILocation(line: 6, column: 3, scope: !14)

@alx32 alx32 requested review from adrian-prantl and dwblaikie June 3, 2024 14:18
@dwblaikie
Copy link
Collaborator

Anyone want to debate whether this should be "DW_AT_META" or "DW_AT_LLVM"? (mostly I think we've used LLVM for things that are usable within the LLVM ecosystem, which it seems the intent is that this might be - llvm-gsym, etc?) I guess the Apple accelerator tables were APPLE but perhaps in part because they were only usable on with the MachO debug info distribution mechanism?

@dwblaikie
Copy link
Collaborator

Might want/need to start with this patch split up - adding the assembly first, then using it for LLVM code generation.

Which, I think, will mean needing to define some assembly directives to use this functionality in the assembler/assembly code (which, at a very cursory glance of this patch, aren't provided yet)?

While it's true we do emit some DWARF when using the integrated assembler that we cannot emit via assembly (the ability to use multiple separate line tables, I think, in LTO - only done when going direct to machine code via the integrated assembler, and when going via assembly we get one monolithic line table) - it's not an ideal state and this feature's a bit more intrusive than whether or not it's one line table or smaller separate line tables (those are at least roughly semantically equivalent - this new proposed feature isn't just a difference in encoding, but in functionality/content)

@alx32
Copy link
Contributor Author

alx32 commented Jun 5, 2024

adding the assembly first, then using it for LLVM code generation.
Which, I think, will mean needing to define some assembly directives

Thank you for the input! I am not very familiar with this part of the codebase. Could you provide some pointers to similar existing functionality or tests that would be relevant to what the first part of this change should be?

The most similar feature to the new DW_AT_META_stmt_sequence is DW_AT_stmt_list. In the assembly tests, DW_AT_stmt_list appears represented like this: abbr_offset.s:38

	.byte 16                      # DW_AT_stmt_list
	...
	.long .Lline_table_start0     # DW_AT_stmt_list

It seems this representation should also work for DW_AT_META_stmt_sequence - without additional changes.
Am I missing something, or is there another aspect I should be looking into?

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@dwblaikie
Copy link
Collaborator

adding the assembly first, then using it for LLVM code generation.
Which, I think, will mean needing to define some assembly directives

Thank you for the input! I am not very familiar with this part of the codebase. Could you provide some pointers to similar existing functionality or tests that would be relevant to what the first part of this change should be?

The most similar feature to the new DW_AT_META_stmt_sequence is DW_AT_stmt_list. In the assembly tests, DW_AT_stmt_list appears represented like this: abbr_offset.s:38

	.byte 16                      # DW_AT_stmt_list
	...
	.long .Lline_table_start0     # DW_AT_stmt_list

It seems this representation should also work for DW_AT_META_stmt_sequence - without additional changes. Am I missing something, or is there another aspect I should be looking into?

So this works for the line table because the stmt_list refers to the very start of the line table - so the assembly code refers to that label, then later on it emits a label:

        .section        .debug_line,"",@progbits
.Lline_table_start0:

Then the assembler emits the line table during assembly.

So it's only possible to refer to the very start of the line table in this way - to refer into the line table at particular locations, it would require assembler features.

@pogo59
Copy link
Collaborator

pogo59 commented Jun 5, 2024

Currently, Clang generates function line information in the debug_line section without directly linking the line entries to their originating functions.

If you compile with -ffunction-sections, then there won't be any sequences in the line table that cross function boundaries. There will always be a DW_LNE_set_address opcode that points back to the start of the function. This pointer is in the opposite direction to what you are looking for, so it requires a bit more analysis to associate sequences with subprograms, but the information is there. Is that sufficient?

@alx32
Copy link
Contributor Author

alx32 commented Jun 6, 2024

This feature is primarily designed for MachO, so the following explanation is specific to that context:

@pogo59

There will always be a DW_LNE_set_address opcode that points back to the start of the function.

You're right that this method works well when we're dealing with a single object file. For instance, in foo.o, the line entries for the foo function will correctly point to the foo instructions based on the address, and the same goes for bar.o with the bar function.

However, the challenge arises when identical functions like foo and bar are merged during the linking process with Identical Code Folding (ICF). After linking foo.o and bar.o into a single binary (out.bin) and creating a debug symbol file (out.dSYM), there will be only one merged function in out.bin representing both foo and bar. In the out.dSYM, the line entries for both foo and bar will point to this single merged function. At this point, it becomes impossible to determine which line entries originally belonged to foo and which to bar.

Is there currently a way to attribute the line information in out.dSYM to either foo or bar ?

I think one possibility would be to leave the object files as-is and have dsymutil add additional attributes to out.dSYM - but I don't think this is an approach we take for anything currently. I did look into this a bit but it gets complicated really fast - I don't think the DWARF linker was designed with this in mind.

@alx32
Copy link
Contributor Author

alx32 commented Jun 6, 2024

@dwblaikie

So it's only possible to refer to the very start of the line table in this way

Thanks for clarifying. Alternatively, would it be possible to specify it as a numeric offset (at least for the purpose of this change) ? i.e:

 .long  0x41   # DW_AT_META_stmt_sequence

@dwblaikie
Copy link
Collaborator

I don't think it's possible to know that constant without essentially modeling everything the assembler would do (to know how large various encodings are, etc) - which is probably not portable (exactly how an assembler encodes the line table may vary between implementations, maybe between uses of a single implementation (might be able to enable different kinds of relaxation etc))

@pogo59
Copy link
Collaborator

pogo59 commented Jun 6, 2024

Without knowing the larger context in which you want to consume the per-function line table pointer, it's harder to know what to suggest. The analysis I mentioned earlier would have to be done in the linker prior to the ICF pass. Now it sounds like you are wanting to leave breadcrumbs for a debugger or other post-link consumer. I'm unfamiliar with .dsym files so I'm not sure what part they play. Is there an RFC or other higher-level description for how all the pieces fit together?

Assuming the new attribute is a reasonable way to go, here's a thought about how to make it work. Have the compiler create a symbol when it wants to emit DW_AT_META_stmt_sequence, and record that symbol somewhere (attached to some per-function data structure that the assembler will be looking at). Then when the assembler gets around to emitting the line table for the function, it can notice the symbol and define it at that point.

Note that if you want the assembler to emit end_sequence at the end of each function's part of the line table, you must begin each sequence with set_address. That is, the line-table generation must behave as if -ffunction-sections was in effect, even if it isn't. That probably wouldn't be too hard to arrange, but you'd need a new option or flag to pass down to the assembler.

@pogo59
Copy link
Collaborator

pogo59 commented Jun 6, 2024

Anyone want to debate whether this should be "DW_AT_META" or "DW_AT_LLVM"?

That kind of depends on the intended consumer, what it's actually used for, etc. If this is prototyping a way to solve the ICF confusion within debuggers, to be consumed by LLDB, with an eye toward making an actual DWARF proposal, then the LLVM vendor seems appropriate. If this is for some secret tooling within Meta, then META would be most appropriate.

@pogo59
Copy link
Collaborator

pogo59 commented Jun 6, 2024

This patch introduces a new Clang flag, -mllvm -emit-func-debug-line-table-offsets

Nit: Technically that is not a Clang flag. A "Clang flag" would be a driver or cc1 option, not -mllvm.

@alx32
Copy link
Contributor Author

alx32 commented Jun 7, 2024

Here is the RFC with more information: https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434

I will look into the the assembly attribute generation support.

@alx32 alx32 changed the title [DebugInfo] Add clang flag to enable function-level debug line attribution [DebugInfo] Add flag to enable function-level debug line attribution Jun 7, 2024
@alx32 alx32 closed this Jul 11, 2024
@alx32 alx32 deleted the 02_stmt_seq_clang branch July 11, 2024 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants