Skip to content

Symbolize line zero as if no source info is available #124846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 6, 2025

Conversation

dwblaikie
Copy link
Collaborator

Since line zero means "no line information", when symbolizing a location
(an address or an inline frame associated with the address) that has a
line zero location, we shouldn't include other irrelevant data (like
filename) in the result.

Since line zero means "no line information", when symbolizing a location
(an address or an inline frame associated with the address) that has a
line zero location, we shouldn't include other irrelevant data (like
filename) in the result.
@llvmbot
Copy link
Member

llvmbot commented Jan 28, 2025

@llvm/pr-subscribers-llvm-binary-utilities

Author: David Blaikie (dwblaikie)

Changes

Since line zero means "no line information", when symbolizing a location
(an address or an inline frame associated with the address) that has a
line zero location, we shouldn't include other irrelevant data (like
filename) in the result.


Full diff: https://github.com/llvm/llvm-project/pull/124846.diff

4 Files Affected:

  • (modified) llvm/lib/DebugInfo/DWARF/DWARFContext.cpp (+1-1)
  • (modified) llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp (+2-1)
  • (modified) llvm/test/tools/llvm-symbolizer/skip-line-zero.s (+2-2)
  • (modified) llvm/test/tools/llvm-symbolizer/sym-verbose.test (+2-3)
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
index 99e1642ff23ad5..15eac10688f18f 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
@@ -1869,7 +1869,7 @@ DWARFContext::getInliningInfoForAddress(object::SectionedAddress Address,
           LineTable->getFileLineInfoForAddress(
               {Address.Address, Address.SectionIndex}, Spec.ApproximateLine,
               CU->getCompilationDir(), Spec.FLIKind, Frame);
-      } else {
+      } else if (CallLine != 0) {
         // Otherwise, use call file, call line and call column from
         // previous DIE in inlined chain.
         if (LineTable)
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
index 939a5163d55abc..a30ae163fa5a98 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
@@ -1508,7 +1508,8 @@ bool DWARFDebugLine::LineTable::getFileLineInfoForAddress(
     return false;
   // Take file number and line/column from the row.
   const auto &Row = Rows[RowIndex];
-  if (!getFileNameByIndex(Row.File, CompDir, Kind, Result.FileName))
+  if (Row.Line == 0 ||
+      !getFileNameByIndex(Row.File, CompDir, Kind, Result.FileName))
     return false;
   Result.Line = Row.Line;
   Result.Column = Row.Column;
diff --git a/llvm/test/tools/llvm-symbolizer/skip-line-zero.s b/llvm/test/tools/llvm-symbolizer/skip-line-zero.s
index e9fbea558e0eb8..74dfb5cdc1aaea 100644
--- a/llvm/test/tools/llvm-symbolizer/skip-line-zero.s
+++ b/llvm/test/tools/llvm-symbolizer/skip-line-zero.s
@@ -20,13 +20,13 @@
 ## Check that without '--skip-line-zero', line zero is displayed for a line-table entry which has no source correspondence.
 # RUN: llvm-symbolizer --obj=%t.o -f=none 0x16d4 | FileCheck --strict-whitespace --match-full-lines --check-prefix=DISABLE %s
 
-# DISABLE:main.c:0:0
+# DISABLE:??:0:0
 
 ## Check that the '--skip-line-zero' does not cross sequence boundaries.
 ## If it fails to find in the current sequence then line zero is returned for the queried address.
 # RUN: llvm-symbolizer --obj=%t.o -f=none --skip-line-zero 0x16c0 | FileCheck --strict-whitespace --match-full-lines --check-prefix=FAIL-ACROSS-SEQ %s
 
-# FAIL-ACROSS-SEQ:main.c:0:0
+# FAIL-ACROSS-SEQ:??:0:0
 
 ## Check that with '--skip-line-zero', the last non-zero line in the current sequence is displayed.
 # RUN: llvm-symbolizer --obj=%t.o -f=none --skip-line-zero 0x1717 | FileCheck --strict-whitespace --match-full-lines --check-prefix=WITHIN-SEQ %s
diff --git a/llvm/test/tools/llvm-symbolizer/sym-verbose.test b/llvm/test/tools/llvm-symbolizer/sym-verbose.test
index 831fd6c7f05071..224c317f558a1a 100644
--- a/llvm/test/tools/llvm-symbolizer/sym-verbose.test
+++ b/llvm/test/tools/llvm-symbolizer/sym-verbose.test
@@ -50,13 +50,12 @@ CHECK-NEXT: Column: 0
 
 CHECK: 0x4005ad
 CHECK-NEXT: foo
-CHECK-NEXT: Filename: /tmp{{[\\/]}}discrim.c
+CHECK-NEXT: Filename: ??
 CHECK-NEXT: Function start filename: /tmp{{[\\/]}}discrim.c
 CHECK-NEXT: Function start line: 4
 CHECK-NEXT: Function start address: 0x400590
 CHECK-NEXT: Line: 0
-CHECK-NEXT: Column: 30
-CHECK-NEXT: Discriminator: 4
+CHECK-NEXT: Column: 0
 CHECK-NEXT: main
 CHECK-NEXT: Filename: /tmp{{[\\/]}}discrim.c
 CHECK-NEXT: Function start filename: /tmp{{[\\/]}}discrim.c

@llvmbot
Copy link
Member

llvmbot commented Jan 28, 2025

@llvm/pr-subscribers-debuginfo

Author: David Blaikie (dwblaikie)

Changes

Since line zero means "no line information", when symbolizing a location
(an address or an inline frame associated with the address) that has a
line zero location, we shouldn't include other irrelevant data (like
filename) in the result.


Full diff: https://github.com/llvm/llvm-project/pull/124846.diff

4 Files Affected:

  • (modified) llvm/lib/DebugInfo/DWARF/DWARFContext.cpp (+1-1)
  • (modified) llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp (+2-1)
  • (modified) llvm/test/tools/llvm-symbolizer/skip-line-zero.s (+2-2)
  • (modified) llvm/test/tools/llvm-symbolizer/sym-verbose.test (+2-3)
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
index 99e1642ff23ad5..15eac10688f18f 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
@@ -1869,7 +1869,7 @@ DWARFContext::getInliningInfoForAddress(object::SectionedAddress Address,
           LineTable->getFileLineInfoForAddress(
               {Address.Address, Address.SectionIndex}, Spec.ApproximateLine,
               CU->getCompilationDir(), Spec.FLIKind, Frame);
-      } else {
+      } else if (CallLine != 0) {
         // Otherwise, use call file, call line and call column from
         // previous DIE in inlined chain.
         if (LineTable)
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
index 939a5163d55abc..a30ae163fa5a98 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
@@ -1508,7 +1508,8 @@ bool DWARFDebugLine::LineTable::getFileLineInfoForAddress(
     return false;
   // Take file number and line/column from the row.
   const auto &Row = Rows[RowIndex];
-  if (!getFileNameByIndex(Row.File, CompDir, Kind, Result.FileName))
+  if (Row.Line == 0 ||
+      !getFileNameByIndex(Row.File, CompDir, Kind, Result.FileName))
     return false;
   Result.Line = Row.Line;
   Result.Column = Row.Column;
diff --git a/llvm/test/tools/llvm-symbolizer/skip-line-zero.s b/llvm/test/tools/llvm-symbolizer/skip-line-zero.s
index e9fbea558e0eb8..74dfb5cdc1aaea 100644
--- a/llvm/test/tools/llvm-symbolizer/skip-line-zero.s
+++ b/llvm/test/tools/llvm-symbolizer/skip-line-zero.s
@@ -20,13 +20,13 @@
 ## Check that without '--skip-line-zero', line zero is displayed for a line-table entry which has no source correspondence.
 # RUN: llvm-symbolizer --obj=%t.o -f=none 0x16d4 | FileCheck --strict-whitespace --match-full-lines --check-prefix=DISABLE %s
 
-# DISABLE:main.c:0:0
+# DISABLE:??:0:0
 
 ## Check that the '--skip-line-zero' does not cross sequence boundaries.
 ## If it fails to find in the current sequence then line zero is returned for the queried address.
 # RUN: llvm-symbolizer --obj=%t.o -f=none --skip-line-zero 0x16c0 | FileCheck --strict-whitespace --match-full-lines --check-prefix=FAIL-ACROSS-SEQ %s
 
-# FAIL-ACROSS-SEQ:main.c:0:0
+# FAIL-ACROSS-SEQ:??:0:0
 
 ## Check that with '--skip-line-zero', the last non-zero line in the current sequence is displayed.
 # RUN: llvm-symbolizer --obj=%t.o -f=none --skip-line-zero 0x1717 | FileCheck --strict-whitespace --match-full-lines --check-prefix=WITHIN-SEQ %s
diff --git a/llvm/test/tools/llvm-symbolizer/sym-verbose.test b/llvm/test/tools/llvm-symbolizer/sym-verbose.test
index 831fd6c7f05071..224c317f558a1a 100644
--- a/llvm/test/tools/llvm-symbolizer/sym-verbose.test
+++ b/llvm/test/tools/llvm-symbolizer/sym-verbose.test
@@ -50,13 +50,12 @@ CHECK-NEXT: Column: 0
 
 CHECK: 0x4005ad
 CHECK-NEXT: foo
-CHECK-NEXT: Filename: /tmp{{[\\/]}}discrim.c
+CHECK-NEXT: Filename: ??
 CHECK-NEXT: Function start filename: /tmp{{[\\/]}}discrim.c
 CHECK-NEXT: Function start line: 4
 CHECK-NEXT: Function start address: 0x400590
 CHECK-NEXT: Line: 0
-CHECK-NEXT: Column: 30
-CHECK-NEXT: Discriminator: 4
+CHECK-NEXT: Column: 0
 CHECK-NEXT: main
 CHECK-NEXT: Filename: /tmp{{[\\/]}}discrim.c
 CHECK-NEXT: Function start filename: /tmp{{[\\/]}}discrim.c

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me, but probably worth getting one or two others to confirm they're happy with this idea too.

@rnk
Copy link
Collaborator

rnk commented Feb 5, 2025

Can you link to the code or documentation that explains this DWARF line table size encoding optimization? My reading of DILocation::getMergedLocation is that we intend to create DILocations with column zero when we choose to use line zero, but maybe there is a logic bug in there that I can't see yet.

@pogo59
Copy link
Collaborator

pogo59 commented Feb 5, 2025

Not sure if this is what you're asking for, but DWARF has compact encoding for line-table entries that advance the line number. That is, things like "line=line+1" are very compact. The idea is that line numbers tend to go up by relatively small amounts, so it's worth making that not take up a lot of space.
There is no such compaction for column numbers, you just set the column number as a literal. They bounce around too much for any worthwhile compaction tactic.

I remember doing some work to make sure that going to line 0 and back did NOT affect/reset the column number. Setting the column number to zero, and then putting it back, seemed like a waste of space.

Line 0 means "no source" regardless of file or column or anything else. I think what David is doing here is reasonable for symbolizing. The other source-coordinate parameters are meaningless when line = 0.

I haven't looked at what getMergedLocation is doing, but again you should look at it from the perspective of "line=0 means the rest is meaningless."

@compnerd
Copy link
Member

compnerd commented Feb 5, 2025

Agreed, what David is doing here seems correct. Lines are 1-based in DWARF which is why line 0 is treated as a special marker to indicate no line information.

@rnk
Copy link
Collaborator

rnk commented Feb 5, 2025

I agree with everything you said, but where is this actually implemented in the LLVM source code? I looked hard at the MC DWARF line table encoding, but I couldn't find where this is implemented.

@dwblaikie
Copy link
Collaborator Author

Can you link to the code or documentation that explains this DWARF line table size encoding optimization? My reading of DILocation::getMergedLocation is that we intend to create DILocations with column zero when we choose to use line zero, but maybe there is a logic bug in there that I can't see yet.

The DWARF spec wording for this in DWARFv5 (https://dwarfstd.org/doc/DWARF5.pdf) is page 151, table 6.3 which says "The compiler may emit the value 0 in cases where an instruction cannot be attributed to any source line." (by "line" read that as "entry" - it doesn't mean "oh, there's no line, but maybe there's a column", the column is a modifier on the line - so if there's no line, the column is meaningless (& if there's zero line, the file doesn't matter, etc))

Admittedly, the DWARF spec isn't /entirely/ clear that the file doesn't matter if line is zero - you could argue that the file is meaningful even if the line is zero and means the instruction came from "somewhere in this file" - but that generally hasn't been the interpretation the LLVM project has taken. (we could file a DWARF issue to get this clarified)

So I wrote a small test case with DILocations, and didn't observe the behavior I was expecting - LLVM lowering did respect the varying column on the instructions.

That's because we rarely use line zero on an instruction, I think - we mostly remove the location from the instruction entirely (and that usually causes the location of a previous instruction to "flow" on to the unlocated instruction, when th eline table is created).

But a place where we then do produce a zero location - is when a flow-on location would be problematic. Especially at the start of a basic block, where a previous block's location would flow across the boundary which would be bad (sample accuracy, human user - it'd show we are in a block we aren't in, might not be reached/reachable/hot, etc)

Ah, here we go:

// See if we have a reason to emit a line-0 record now.
// Reasons to emit a line-0 record include:
// - User asked for it (UnknownLocations).
// - Instruction has a label, so it's referenced from somewhere else,
// possibly debug information; we want it to have a source location.
// - Instruction is at the top of a block; we don't want to inherit the
// location from the physically previous (maybe unrelated) block.
if (UnknownLocations == Enable || PrevLabel ||
(PrevInstBB && PrevInstBB != MI->getParent())) {
// Preserve the file and column numbers, if we can, to save space in
// the encoded line table.
// Do not update PrevInstLoc, it remembers the last non-0 line.
const MDNode *Scope = nullptr;
unsigned Column = 0;
if (PrevInstLoc) {
Scope = PrevInstLoc.getScope();
Column = PrevInstLoc.getCol();
}
recordSourceLine(/*Line=*/0, Column, Scope, /*Flags=*/0);
}

@dwblaikie
Copy link
Collaborator Author

We could probably sink this handling down into recordSourceLine so it does kick in for explicitly zero DILocation'd instructions too (these happen for things like call instructions that get merged)

@rnk
Copy link
Collaborator

rnk commented Feb 6, 2025

Thanks for the pointer!

@pogo59
Copy link
Collaborator

pogo59 commented Feb 6, 2025

Admittedly, the DWARF spec isn't /entirely/ clear that the file doesn't matter if line is zero - you could argue that the file is meaningful even if the line is zero and means the instruction came from "somewhere in this file" - but that generally hasn't been the interpretation the LLVM project has taken. (we could file a DWARF issue to get this clarified)

Not sure I've seen anyone suggest the "somewhere in this file" interpretation before. I find it hard to imagine "from this file but no particular line" as being usefully distinct from "no source line whatsoever." (And DWARF (at least as of v5) has no way to specify "no file" in any case.)

@dwblaikie dwblaikie merged commit c32cd57 into llvm:main Feb 6, 2025
11 checks passed
@dwblaikie dwblaikie deleted the symbolizer_line_zero_filename branch February 6, 2025 18:45
jplehr added a commit to jplehr/llvm-project that referenced this pull request Feb 7, 2025
In 124846 the symbolizer was changed to ignore 0-column entries, which
lead to a slightly different representation in the stack traces. This
patch addresses these differences.

Not sure if the difference in kernel_trap.c is also a result of this
change or not.
Can be tracked separate from this, after the bots are back to green.
jplehr added a commit that referenced this pull request Feb 7, 2025
In #124846 the symbolizer was changed to ignore 0-column entries, which
lead to a slightly different representation in the stack traces. This
patch addresses these differences.

Not sure if the difference in kernel_trap.c is also a result of this
change or not.
Can be tracked separate from this, after the bots are back to green.
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Since line zero means "no line information", when symbolizing a location
(an address or an inline frame associated with the address) that has a
line zero location, we shouldn't include other irrelevant data (like
filename) in the result.
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
In llvm#124846 the symbolizer was changed to ignore 0-column entries, which
lead to a slightly different representation in the stack traces. This
patch addresses these differences.

Not sure if the difference in kernel_trap.c is also a result of this
change or not.
Can be tracked separate from this, after the bots are back to green.
ZequanWu added a commit that referenced this pull request Feb 24, 2025
This commit creates an inconsistency on `__sanitizer_symbolize_pc` API. Before this change, this API always uses the filename from debug info when the line number is 0. After this change, the filename becomes invalid when line number is 0. The sanitizer might fall back to use base filename from symbol table. So, this API may return either `??:0` or `{filename from symbol table}:0` depending on if the symbol table has the filename for it. Make sure this inconsistency is resolved before relanding the commit.
ZequanWu added a commit that referenced this pull request Feb 24, 2025
ZequanWu added a commit that referenced this pull request Feb 24, 2025
The dependency commit was reverted at 23aca2f. Reverting this as well.
ZequanWu added a commit that referenced this pull request Mar 25, 2025
…for the given address is available. (#128619)

To reland #124846, we need to
make symbolizer consistent with the case when line number is 0. Always
using filename and line from debug info even if the line number is 0
sounds like the reasonable path to go.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 25, 2025
…debug info for the given address is available. (#128619)

To reland llvm/llvm-project#124846, we need to
make symbolizer consistent with the case when line number is 0. Always
using filename and line from debug info even if the line number is 0
sounds like the reasonable path to go.
ZequanWu added a commit to ZequanWu/llvm-project that referenced this pull request Mar 31, 2025
…24846)"

This land commits 23aca2f and 1b15a89. llvm#128619 makes symbolizer to always use debug info when available so we can reland this chagnge.
ZequanWu added a commit that referenced this pull request Mar 31, 2025
…" (#133798)

This land commits 23aca2f and
1b15a89.
#128619 makes symbolizer to
always use debug info when available so we can reland this chagnge.
ZequanWu added a commit that referenced this pull request Apr 10, 2025
…#124846)" (#133798)"

This reverts commit 3483740 because #128619 doesn't handle the case when we have an empty frame from `getInliningInfoForAddress` because line num is 0 which makes it non-differentiable from missing debug info. So, we end up using the base filename from symtab again. Reverting for now until that issus is solved.
AllinLeeYL pushed a commit to AllinLeeYL/llvm-project that referenced this pull request Apr 10, 2025
…llvm#124846)" (llvm#133798)"

This reverts commit 3483740 because llvm#128619 doesn't handle the case when we have an empty frame from `getInliningInfoForAddress` because line num is 0 which makes it non-differentiable from missing debug info. So, we end up using the base filename from symtab again. Reverting for now until that issus is solved.
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…llvm#124846)" (llvm#133798)"

This reverts commit 3483740 because llvm#128619 doesn't handle the case when we have an empty frame from `getInliningInfoForAddress` because line num is 0 which makes it non-differentiable from missing debug info. So, we end up using the base filename from symtab again. Reverting for now until that issus is solved.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants