[DwarfDump] Add new set of line-table-related statistics to llvm-dwarfdump #93289

SLTozer · 2024-05-24T10:17:44Z

This patch adds a new set of statistics to llvm-dwarfdump that provide additional information about .debug_line regarding the number of bytes covered by the line table (and how many of those are covered by line 0 entries), and the number of entries within the table and how many of those are is_stmt, unique, or unique and non-line-0 (where "uniqueness" is based on file, line, and column only).

Collectively these give a little more insight into the state of debug line information, rather than variables (as most of the dwarfdump statistics are currently oriented towards). I've added all of the stats that were useful to some degree, but I think the most generally useful stat is "unique line entries", since it gives the most straightforward indication of regressions, i.e. when the number goes down it means that fewer source lines are reachable in the program.

I've not worked hard to make this performant, under the assumption that llvm-dwarfdump is generally running offline and does not run into serious performance constraints; if this is not the case, then there may be some suitable improvements in the uniquing and file index behaviour. The names of the statistics are straightforward and descriptive, but if there is a particular style/pattern they ought to match then I'm open to changing them as well.

This patch adds a new set of statistics to llvm-dwarfdump that provide additional information about .debug_line regarding the number of bytes covered by the line table (and how many of those are covered by line 0 entries), and the number of entries within the table and how many of those are is_stmt, unique, or unique and non-line-0 (where "uniqueness" is based on file, line, and column only). Collectively these give a little more insight into the state of debug line information, rather than variables (as most of the dwarfdump statistics are currently oriented towards). I've added all of the stats that were useful to some degree, but I think the most generally useful stat is "unique line entries", since it gives the most straightforward indication of regressions, i.e. when the number goes down it means that fewer source lines are reachable in the program.

llvmbot · 2024-05-24T10:18:19Z

@llvm/pr-subscribers-debuginfo

Author: Stephen Tozer (SLTozer)

Changes

This patch adds a new set of statistics to llvm-dwarfdump that provide additional information about .debug_line regarding the number of bytes covered by the line table (and how many of those are covered by line 0 entries), and the number of entries within the table and how many of those are is_stmt, unique, or unique and non-line-0 (where "uniqueness" is based on file, line, and column only).

Collectively these give a little more insight into the state of debug line information, rather than variables (as most of the dwarfdump statistics are currently oriented towards). I've added all of the stats that were useful to some degree, but I think the most generally useful stat is "unique line entries", since it gives the most straightforward indication of regressions, i.e. when the number goes down it means that fewer source lines are reachable in the program.

I've not worked hard to make this performant, under the assumption that llvm-dwarfdump is generally running offline and does not run into serious performance constraints; if this is not the case, then there may be some suitable improvements in the uniquing and file index behaviour. The names of the statistics are straightforward and descriptive, but if there is a particular style/pattern they ought to match then I'm open to changing them as well.

Full diff: https://github.com/llvm/llvm-project/pull/93289.diff

2 Files Affected:

(modified) llvm/test/tools/llvm-dwarfdump/X86/locstats.ll (+6)
(modified) llvm/tools/llvm-dwarfdump/Statistics.cpp (+78)

diff --git a/llvm/test/tools/llvm-dwarfdump/X86/locstats.ll b/llvm/test/tools/llvm-dwarfdump/X86/locstats.ll
index f850119acb000..415f092dc7da7 100644
--- a/llvm/test/tools/llvm-dwarfdump/X86/locstats.ll
+++ b/llvm/test/tools/llvm-dwarfdump/X86/locstats.ll
@@ -89,6 +89,12 @@
 ; CHECK-NEXT: "#local vars - entry values with [80%,90%) of parent scope covered by DW_AT_location": 1,
 ; CHECK-NEXT: "#local vars - entry values with [90%,100%) of parent scope covered by DW_AT_location": 0,
 ; CHECK-NEXT: "#local vars - entry values with 100% of parent scope covered by DW_AT_location": 1
+; CHECK-NEXT: "#bytes with line information": 51,
+; CHECK-NEXT: "#bytes with line-0 locations": 3,
+; CHECK-NEXT: "#line entries": 7,
+; CHECK-NEXT: "#line entries marked is_stmt": 5,
+; CHECK-NEXT: "#line entries (unique)": 6,
+; CHECK-NEXT: "#line entries (unique non-0)": 5
 
 ; The source code of the test case:
 ; extern void fn3(int *);
diff --git a/llvm/tools/llvm-dwarfdump/Statistics.cpp b/llvm/tools/llvm-dwarfdump/Statistics.cpp
index 96841c3c387bd..28e597ba8894f 100644
--- a/llvm/tools/llvm-dwarfdump/Statistics.cpp
+++ b/llvm/tools/llvm-dwarfdump/Statistics.cpp
@@ -8,6 +8,7 @@
 
 #include "llvm-dwarfdump.h"
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/DenseSet.h"
 #include "llvm/ADT/StringSet.h"
 #include "llvm/DebugInfo/DWARF/DWARFContext.h"
 #include "llvm/DebugInfo/DWARF/DWARFDebugLoc.h"
@@ -188,6 +189,16 @@ struct LocationStats {
   /// Total number of local variables processed.
   SaturatingUINT64 NumVar = 0;
 };
+
+/// Holds accumulated debug line statistics across all CUs.
+struct LineStats {
+  SaturatingUINT64 NumBytes = 0;
+  SaturatingUINT64 NumLineZeroBytes = 0;
+  SaturatingUINT64 NumEntries = 0;
+  SaturatingUINT64 NumIsStmtEntries = 0;
+  SaturatingUINT64 NumUniqueEntries = 0;
+  SaturatingUINT64 NumUniqueNonZeroEntries = 0;
+};
 } // namespace
 
 /// Collect debug location statistics for one DIE.
@@ -848,6 +859,7 @@ bool dwarfdump::collectStatsForObjectFile(ObjectFile &Obj, DWARFContext &DICtx,
   StringRef FormatName = Obj.getFileFormatName();
   GlobalStats GlobalStats;
   LocationStats LocStats;
+  LineStats LnStats;
   StringMap<PerFunctionStats> Statistics;
   // This variable holds variable information for functions with
   // abstract_origin globally, across all CUs.
@@ -856,6 +868,12 @@ bool dwarfdump::collectStatsForObjectFile(ObjectFile &Obj, DWARFContext &DICtx,
   // abstract_origin.
   FunctionDIECUTyMap AbstractOriginFnCUs;
   CrossCUReferencingDIELocationTy CrossCUReferencesToBeResolved;
+  // Line, Col, File
+  using LineTuple = std::tuple<uint32_t, uint16_t, uint16_t>;
+  SmallVector<std::string> Files;
+  DenseSet<LineTuple> UniqueLines;
+  DenseSet<LineTuple> UniqueNonZeroLines;
+
   for (const auto &CU : static_cast<DWARFContext *>(&DICtx)->compile_units()) {
     if (DWARFDie CUDie = CU->getNonSkeletonUnitDIE(false)) {
       // This variable holds variable information for functions with
@@ -882,8 +900,58 @@ bool dwarfdump::collectStatsForObjectFile(ObjectFile &Obj, DWARFContext &DICtx,
         CrossCUReferencesToBeResolved.push_back(
             DIELocation(CUDie.getDwarfUnit(), CrossCUReferencingDIEOffset));
     }
+    if (const auto *LineTable = DICtx.getLineTableForUnit(CU.get())) {
+      auto LastFileIdxOpt = LineTable->getLastValidFileIndex();
+      // Each CU has its own file index; in order to track unique line entries
+      // across CUs, we therefore need to map each CU file index to a global
+      // file index, which we store here.
+      DenseMap<uint64_t, uint16_t> CUFileMapping;
+      if (LastFileIdxOpt) {
+        std::string File;
+        for (uint64_t FileIdx = 0; FileIdx <= *LastFileIdxOpt; ++FileIdx) {
+          if (LineTable->getFileNameByIndex(
+                  FileIdx, CU->getCompilationDir(),
+                  DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath,
+                  File)) {
+            auto ExistingFile = llvm::find(Files, File);
+            if (ExistingFile != Files.end()) {
+              CUFileMapping[FileIdx] =
+                  std::distance(Files.begin(), ExistingFile);
+            } else {
+              CUFileMapping[FileIdx] = Files.size();
+              Files.push_back(File);
+            }
+          }
+        }
+      }
+      for (auto Seq : LineTable->Sequences) {
+        LnStats.NumBytes += Seq.HighPC - Seq.LowPC;
+        // Ignore the `end_sequence` entry, since it's not interesting for us.
+        LnStats.NumEntries += Seq.LastRowIndex - Seq.FirstRowIndex - 1;
+        for (size_t RowIdx = Seq.FirstRowIndex; RowIdx < Seq.LastRowIndex - 1;
+             ++RowIdx) {
+          auto Entry = LineTable->Rows[RowIdx];
+          if (Entry.IsStmt)
+            LnStats.NumIsStmtEntries += 1;
+          assert(CUFileMapping.contains(Entry.File) &&
+                 "Should have been collected earlier!");
+          uint16_t MappedFile = CUFileMapping[Entry.File];
+          UniqueLines.insert({Entry.Line, Entry.Column, MappedFile});
+          if (Entry.Line != 0) {
+            UniqueNonZeroLines.insert({Entry.Line, Entry.Column, MappedFile});
+          } else {
+            auto EntryStartAddress = Entry.Address.Address;
+            auto EntryEndAddress = LineTable->Rows[RowIdx + 1].Address.Address;
+            LnStats.NumLineZeroBytes += EntryEndAddress - EntryStartAddress;
+          }
+        }
+      }
+    }
   }
 
+  LnStats.NumUniqueEntries = UniqueLines.size();
+  LnStats.NumUniqueNonZeroEntries = UniqueNonZeroLines.size();
+
   /// Resolve CrossCU references.
   collectZeroLocCovForVarsWithCrossCUReferencingAbstractOrigin(
       LocStats, AbstractOriginFnCUs, GlobalAbstractOriginFnInfo,
@@ -1043,6 +1111,16 @@ bool dwarfdump::collectStatsForObjectFile(ObjectFile &Obj, DWARFContext &DICtx,
   printLocationStats(J, "#local vars", LocStats.LocalVarLocStats);
   printLocationStats(J, "#local vars - entry values",
                      LocStats.LocalVarNonEntryValLocStats);
+
+  // Print line statistics for the object file.
+  printDatum(J, "#bytes with line information", LnStats.NumBytes.Value);
+  printDatum(J, "#bytes with line-0 locations", LnStats.NumLineZeroBytes.Value);
+  printDatum(J, "#line entries", LnStats.NumEntries.Value);
+  printDatum(J, "#line entries marked is_stmt", LnStats.NumIsStmtEntries.Value);
+  printDatum(J, "#line entries (unique)", LnStats.NumUniqueEntries.Value);
+  printDatum(J, "#line entries (unique non-0)",
+             LnStats.NumUniqueNonZeroEntries.Value);
+
   J.objectEnd();
   OS << '\n';
   LLVM_DEBUG(

jryans

Looks good to me, seems quite useful to have! 😄

jryans · 2024-05-24T11:53:59Z

Since this adds new stats, I suppose the schema on LNT may need to be updated... Perhaps @adrian-prantl can help with that once this merges.

djolertrk

LGTM (with some minor comments).

Thanks!

djolertrk · 2024-05-24T12:44:09Z

llvm/tools/llvm-dwarfdump/Statistics.cpp

+          }
+        }
+      }
+      for (auto Seq : LineTable->Sequences) {


const auto& instead?

djolertrk · 2024-05-24T12:44:59Z

llvm/tools/llvm-dwarfdump/Statistics.cpp

+      // across CUs, we therefore need to map each CU file index to a global
+      // file index, which we store here.
+      DenseMap<uint64_t, uint16_t> CUFileMapping;
+      if (LastFileIdxOpt) {


an early exit here is better

Good point - LastFileIdxOpt doesn't technically need to be valid for the rest of the logic, but a line table without files wouldn't have any sequences either. That being said this part isn't a loop so can't be early-exited; I can move this and the LineTable variable out of the if block and use them together as a precondition though.

djolertrk · 2024-05-24T12:45:39Z

llvm/tools/llvm-dwarfdump/Statistics.cpp

@@ -856,6 +868,12 @@ bool dwarfdump::collectStatsForObjectFile(ObjectFile &Obj, DWARFContext &DICtx,
  // abstract_origin.
  FunctionDIECUTyMap AbstractOriginFnCUs;
  CrossCUReferencingDIELocationTy CrossCUReferencesToBeResolved;
+  // Line, Col, File


Improve the comment please

SLTozer · 2024-05-24T14:10:33Z

Since this adds new stats, I suppose the schema on LNT may need to be updated... Perhaps @adrian-prantl can help with that once this merges.

My assumption at the moment would be that this is passed through as JSON, s.t. new fields will be ignored by default - so it shouldn't cause any problems at least for the existing stats, though it would be good to get these added!

Without getting ahead of things too much (I'll have a discourse post soon going more into the topic), it looks as though the unique lines metric has been degrading in clang over time, so it'd be good to get some continuous measurement.

jryans · 2024-05-24T14:18:41Z

My assumption at the moment would be that this is passed through as JSON, s.t. new fields will be ignored by default - so it shouldn't cause any problems at least for the existing stats, though it would be good to get these added!

Ah yeah, I don't think it will cause problems either, I was more just thinking about ensuring the new fields are tracked over time going forward. But yeah, IIUC, there's no rush to do anything immediately after merging this because of the JSON structure as you mentioned.

Without getting ahead of things too much (I'll have a discourse post soon going more into the topic), it looks as though the unique lines metric has been degrading in clang over time, so it'd be good to get some continuous measurement.

Ah, good to know. Would be great to have more people watching those stats as they evolve... I might be one such person soon, we'll see. 😄

OCHyams · 2024-05-24T14:23:52Z

Do we need any more targeted tests for the stats (for example, exercising the cu-unique-file-id code you added)? llvm/test/tools/llvm-dwarfdump/X86/locstats.ll provides some coverage, maybe that's enough to flag up future changes causing issues. I don't have a strong opinion on this but thought it should be brought up. wdyt?

SLTozer · 2024-05-24T14:36:16Z

Do we need any more targeted tests for the stats (for example, exercising the cu-unique-file-id code you added)?

Could add a second test that uses multiple compile units - not sure what the best way to handle that is, but I'm sure there'll be some existing lit examples I can base it on.

adrian-prantl

Since this is looking at .debug_line — have you tested this with both DWARF 4 and DWARF 5 inputs?

felipepiovezan · 2024-05-24T16:43:43Z

llvm/tools/llvm-dwarfdump/Statistics.cpp

+      // across CUs, we therefore need to map each CU file index to a global
+      // file index, which we store here.
+      DenseMap<uint64_t, uint16_t> CUFileMapping;
+      std::string File;


The scope of this variable can be restricted to being inside the loop, right now this is giving the impression it is needed across loop iterations or outside the loop

felipepiovezan · 2024-05-24T16:51:04Z

llvm/tools/llvm-dwarfdump/Statistics.cpp

+      // file index, which we store here.
+      DenseMap<uint64_t, uint16_t> CUFileMapping;
+      std::string File;
+      for (uint64_t FileIdx = 0; FileIdx <= *LastFileIdxOpt; ++FileIdx) {


Aren't dwarf 4 indices 1-based and 0-based in dwarf 5?

They are - but the method getFileNameByIndex will return std::nullopt if you fetch index 0 for DWARFv4 and earlier, and we have the correct final index for the for-loop from getLastValidFileIndex, so in this case this loop still does the right thing.

SLTozer · 2024-05-24T17:35:22Z

Since this is looking at .debug_line — have you tested this with both DWARF 4 and DWARF 5 inputs?

I've updated the locstats.ll to test for DWARF versions 2-5; only DWARF version 1 results in a failure, which happens while emitting the object file (presumably because some of the DIExpressions are unsupported).

pogo59 · 2024-05-24T17:56:51Z

only DWARF version 1 results in a failure

In practice, there is no DWARF version 1.

OCHyams · 2024-05-28T15:59:44Z

llvm/tools/llvm-dwarfdump/Statistics.cpp

+  printDatum(J, "#bytes with line information", LnStats.NumBytes.Value);
+  printDatum(J, "#bytes with line-0 locations", LnStats.NumLineZeroBytes.Value);
+  printDatum(J, "#line entries", LnStats.NumEntries.Value);
+  printDatum(J, "#line entries marked is_stmt", LnStats.NumIsStmtEntries.Value);


Suggested change

printDatum(J, "#line entries marked is_stmt", LnStats.NumIsStmtEntries.Value);

printDatum(J, "#line entries (is_stmt)", LnStats.NumIsStmtEntries.Value);

More in keeping with lines below?

SLTozer · 2024-06-07T08:31:46Z

I think this is about ready to merge; only thing I couldn't do was get a proper test for tracking uniqueness across multiple CUs. I can create an IR file with multiple DICompileUnits, but that doesn't mean multiple separate compile units in the output object. I think I can't make such a test without either just committing the test object directly, or linking. Is test coverage a blocker for the patch? I don't want to commit without coverage for some functionality, but I'm not sure what the preferred approach for this is.

OCHyams · 2024-06-11T13:43:46Z

I think this is about ready to merge; only thing I couldn't do was get a proper test for tracking uniqueness across multiple CUs.

It might be possible using yaml2obj? Setting up yaml2obj tests for DWARF can be pretty fiddly though. I don't want to block this patch if others are broadly happy though

SLTozer · 2024-06-11T13:47:34Z

It might be possible using yaml2obj? Setting up yaml2obj tests for DWARF can be pretty fiddly though. I don't want to block this patch if others are broadly happy though

I looked at yaml2obj, couldn't figure out a way to get it to do what I need - it seems like it doesn't support debug lines, and if it does I haven't found out how to make it do so.

…fdump (llvm#93289) This patch adds a new set of statistics to llvm-dwarfdump that provide additional information about .debug_line regarding the number of bytes covered by the line table (and how many of those are covered by line 0 entries), and the number of entries within the table and how many of those are is_stmt, unique, or unique and non-line-0 (where "uniqueness" is based on file, line, and column only). Collectively these give a little more insight into the state of debug line information, rather than variables (as most of the dwarfdump statistics are currently oriented towards). I've added all of the stats that were useful to some degree, but I think the most generally useful stat is "unique line entries", since it gives the most straightforward indication of regressions, i.e. when the number goes down it means that fewer source lines are reachable in the program.

SLTozer added the tools:llvm-dwarfdump label May 24, 2024

SLTozer requested review from OCHyams, dwblaikie and djtodoro May 24, 2024 10:17

SLTozer self-assigned this May 24, 2024

llvmbot added the debuginfo label May 24, 2024

jryans approved these changes May 24, 2024

View reviewed changes

djolertrk approved these changes May 24, 2024

View reviewed changes

Address review comments

e884612

adrian-prantl approved these changes May 24, 2024

View reviewed changes

felipepiovezan reviewed May 24, 2024

View reviewed changes

Address further review comments

f36d004

OCHyams reviewed May 28, 2024

View reviewed changes

Change name of unique is_stmt lines stat

bfb20ed

SLTozer merged commit 451f3fc into llvm:main Jun 13, 2024
5 of 7 checks passed

	printDatum(J, "#line entries marked is_stmt", LnStats.NumIsStmtEntries.Value);
	printDatum(J, "#line entries (is_stmt)", LnStats.NumIsStmtEntries.Value);

[DwarfDump] Add new set of line-table-related statistics to llvm-dwarfdump #93289

[DwarfDump] Add new set of line-table-related statistics to llvm-dwarfdump #93289

Uh oh!

Conversation

SLTozer commented May 24, 2024

Uh oh!

llvmbot commented May 24, 2024

Uh oh!

jryans left a comment

Choose a reason for hiding this comment

Uh oh!

jryans commented May 24, 2024

Uh oh!

djolertrk left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SLTozer commented May 24, 2024

Uh oh!

jryans commented May 24, 2024

Uh oh!

OCHyams commented May 24, 2024

Uh oh!

SLTozer commented May 24, 2024

Uh oh!

adrian-prantl left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SLTozer commented May 24, 2024

Uh oh!

pogo59 commented May 24, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SLTozer commented Jun 7, 2024

Uh oh!

OCHyams commented Jun 11, 2024

Uh oh!

SLTozer commented Jun 11, 2024

Uh oh!

Uh oh!

Uh oh!