-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[DWARF] Emit line 0 source locations for BB padding nops #99496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This patch makes LLVM emit line 0 source locations for nops emitted as basic block padding. Co-authored-by: Orlando Cazalet-Hyams <[email protected]>
@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-backend-x86 Author: Nabeel Omer (omern1) ChangesThis patch makes LLVM emit line 0 source locations for nops emitted as basic block padding. CC @OCHyams Full diff: https://github.com/llvm/llvm-project/pull/99496.diff 7 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/DebugHandlerBase.h b/llvm/include/llvm/CodeGen/DebugHandlerBase.h
index 36a844e7087fa..30e56269b529d 100644
--- a/llvm/include/llvm/CodeGen/DebugHandlerBase.h
+++ b/llvm/include/llvm/CodeGen/DebugHandlerBase.h
@@ -137,6 +137,8 @@ class DebugHandlerBase {
void beginBasicBlockSection(const MachineBasicBlock &MBB);
void endBasicBlockSection(const MachineBasicBlock &MBB);
+ virtual void beginCodeAlignment(const MachineBasicBlock& MBB) {}
+
/// Return Label preceding the instruction.
MCSymbol *getLabelBeforeInsn(const MachineInstr *MI);
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index 1f59ec545b4f7..78cb5df4bd02c 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -3967,7 +3967,11 @@ void AsmPrinter::emitBasicBlockStart(const MachineBasicBlock &MBB) {
// Emit an alignment directive for this block, if needed.
const Align Alignment = MBB.getAlignment();
if (Alignment != Align(1))
+ {
+ for (auto &Handler : DebugHandlers)
+ Handler->beginCodeAlignment(MBB);
emitAlignment(Alignment, nullptr, MBB.getMaxBytesForAlignment());
+ }
// If the block has its address taken, emit any labels that were used to
// reference the block. It is possible that there is more than one label
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index 80cd5ec501f25..36e53702930f8 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -3668,3 +3668,19 @@ bool DwarfDebug::alwaysUseRanges(const DwarfCompileUnit &CU) const {
return true;
return false;
}
+
+void DwarfDebug::beginCodeAlignment(const MachineBasicBlock& MBB) {
+ auto *SP = MBB.getParent()->getFunction().getSubprogram();
+
+ bool NoDebug =
+ !SP || SP->getUnit()->getEmissionKind() == DICompileUnit::NoDebug;
+
+ if (NoDebug)
+ return;
+
+ auto PrevLoc = Asm->OutStreamer->getContext().getCurrentDwarfLoc();
+ Asm->OutStreamer->emitDwarfLocDirective(PrevLoc.getFileNum(), 0,
+ 0, 0, 0, 0, StringRef());
+ MCDwarfLineEntry::make(Asm->OutStreamer.get(),
+ Asm->OutStreamer->getCurrentSectionOnly());
+}
\ No newline at end of file
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
index 452485b632c45..eb474c0366d82 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
@@ -727,6 +727,9 @@ class DwarfDebug : public DebugHandlerBase {
/// Process beginning of an instruction.
void beginInstruction(const MachineInstr *MI) override;
+ /// Process beginning of code alignment.
+ void beginCodeAlignment(const MachineBasicBlock& MBB) override;
+
/// Perform an MD5 checksum of \p Identifier and return the lower 64 bits.
static uint64_t makeTypeSignature(StringRef Identifier);
diff --git a/llvm/test/CodeGen/X86/fsafdo_test1.ll b/llvm/test/CodeGen/X86/fsafdo_test1.ll
index 61c0f59aba6f8..e80a7f2f354f2 100644
--- a/llvm/test/CodeGen/X86/fsafdo_test1.ll
+++ b/llvm/test/CodeGen/X86/fsafdo_test1.ll
@@ -4,9 +4,9 @@
; Check that fs-afdo discriminators are generated.
; V01: .loc 1 7 3 is_stmt 0 discriminator 2 # foo.c:7:3
; V01: .loc 1 9 5 is_stmt 1 discriminator 2 # foo.c:9:5
-; V0: .loc 1 9 5 is_stmt 0 discriminator 11266 # foo.c:9:5
+; V0: .loc 1 9 5 discriminator 11266 # foo.c:9:5
; V0: .loc 1 7 3 is_stmt 1 discriminator 11266 # foo.c:7:3
-; V1: .loc 1 9 5 is_stmt 0 discriminator 514 # foo.c:9:5
+; V1: .loc 1 9 5 discriminator 514 # foo.c:9:5
; V1: .loc 1 7 3 is_stmt 1 discriminator 258 # foo.c:7:3
; Check that variable __llvm_fs_discriminator__ is generated.
; V01: .type __llvm_fs_discriminator__,@object # @__llvm_fs_discriminator__
diff --git a/llvm/test/CodeGen/X86/fsafdo_test4.ll b/llvm/test/CodeGen/X86/fsafdo_test4.ll
index 6a22ea9822412..aa05a0c4cd3ef 100644
--- a/llvm/test/CodeGen/X86/fsafdo_test4.ll
+++ b/llvm/test/CodeGen/X86/fsafdo_test4.ll
@@ -1,11 +1,16 @@
-; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=false < %s | FileCheck %s
-; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=true < %s | FileCheck %s
+; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=false < %s | FileCheck --implicit-check-not=.loc %s
+; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=true < %s | FileCheck --implicit-check-not=.loc %s
;
; Check that fs-afdo discriminators are NOT generated, as debugInfoForProfiling is false (not set).
+; CHECK: .loc 1 7 15 prologue_end discriminator 2 # foo.c:7:15
; CHECK: .loc 1 7 3 is_stmt 0 discriminator 2 # foo.c:7:3
+; CHECK: .loc 1 0 3 # foo.c:0:3
; CHECK: .loc 1 9 5 is_stmt 1 discriminator 2 # foo.c:9:5
-; CHECK-NOT: .loc 1 9 5 is_stmt 0 discriminator
-; CHECK-NOT: .loc 1 7 3 is_stmt 1 discriminator
+; CHECK: .loc 1 0 0 is_stmt 0 # :0:0
+; CHECK: .loc 1 9 5 discriminator 2 # foo.c:9:5
+; CHECK: .loc 1 0 0 # :0:0
+; CHECK: .loc 1 7 3 is_stmt 1 discriminator 2 # foo.c:7:3
+; CHECK: .loc 1 14 3 # foo.c:14:3
; Check that variable __llvm_fs_discriminator__ is NOT generated.
; CHECK-NOT: __llvm_fs_discriminator__:
diff --git a/llvm/test/DebugInfo/X86/loop-align-debug.ll b/llvm/test/DebugInfo/X86/loop-align-debug.ll
new file mode 100644
index 0000000000000..a0302d08faa0c
--- /dev/null
+++ b/llvm/test/DebugInfo/X86/loop-align-debug.ll
@@ -0,0 +1,55 @@
+; RUN: llc %s --filetype=obj -o %t
+; RUN: llvm-objdump -d %t | FileCheck %s --check-prefixes=OBJ
+; RUN: llvm-dwarfdump --debug-line %t | FileCheck %s --check-prefixes=DBG
+; RUN: llc %s -o - | FileCheck %s --check-prefixes=ASM
+
+; OBJ: 1:{{.*}}nop
+
+;; Address Line Column File ISA Discriminator OpIndex Flags
+; DBG: 0x0000000000000000 3 0 0 0 0 0 is_stmt
+; DBG: 0x0000000000000001 0 0 0 0 0 0
+; DBG: 0x0000000000000010 5 0 0 0 0 0 is_stmt prologue_end
+; DBG: 0x0000000000000017 5 0 0 0 0 0 is_stmt end_sequence
+
+; ASM: .loc 0 0 0 is_stmt 0
+; ASM-NEXT: .L{{.*}}:
+; ASM-NEXT: .p2align 4, 0x90
+
+;; $ cat test.cpp
+;; void g();
+;; void f() {
+;; [[clang::code_align(16)]]
+;; while (1) { g(); }
+;; }
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define dso_local void @f() local_unnamed_addr !dbg !9 {
+entry:
+ br label %while.body, !dbg !12
+
+while.body: ; preds = %entry, %while.body
+ tail call void @g(), !dbg !12
+ br label %while.body, !dbg !12, !llvm.loop !13
+}
+
+declare !dbg !16 void @g() local_unnamed_addr
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3}
+!llvm.ident = !{!8}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang version 19.0.0git", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "test.cpp", directory: "/")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!8 = !{!"clang version 19.0.0git"}
+!9 = distinct !DISubprogram(name: "f", scope: !1, file: !1, line: 3, type: !10, scopeLine: 3, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
+!10 = !DISubroutineType(types: !11)
+!11 = !{}
+!12 = !DILocation(line: 5, scope: !9)
+!13 = distinct !{!13, !12, !12, !14, !15}
+!14 = !{!"llvm.loop.mustprogress"}
+!15 = !{!"llvm.loop.align", i32 16}
+!16 = !DISubprogram(name: "g", scope: !1, file: !1, line: 2, type: !10, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized)
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
cc @dwblaikie, for some context, we noticed this as it manifested in our debugger "step-over"-ing to an out of scope line number (a loop alignment nop inherited the source location from an inlined instruction before it, but isn't included in the inlined function's scope range). GDB and LLDB seem to avoid running into that issue while stepping over but it can still be observed while stepping at an instruction level. Either way, a code alignment nop feels unambiguously like an instruction that should be line-zeroed to me. wdyt? |
; CHECK: .loc 1 9 5 is_stmt 1 discriminator 2 # foo.c:9:5 | ||
; CHECK-NOT: .loc 1 9 5 is_stmt 0 discriminator | ||
; CHECK-NOT: .loc 1 7 3 is_stmt 1 discriminator | ||
; CHECK: .loc 1 0 0 is_stmt 0 # :0:0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting the column to zero is sub-optimal, it requires extra line-table opcodes to set it to zero and then put it back. Please preserve the column number.
; V0: .loc 1 7 3 is_stmt 1 discriminator 11266 # foo.c:7:3 | ||
; V1: .loc 1 9 5 is_stmt 0 discriminator 514 # foo.c:9:5 | ||
; V1: .loc 1 9 5 discriminator 514 # foo.c:9:5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does your patch cause this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC .loc
directives only show state changes, and we emit a .loc 1 0 0 is_stmt 0
before that so the assembly printer omits the is_stmt
there. I've also verified that the line table is the same with and without this patch (apart from the new line 0 entries).
auto PrevLoc = Asm->OutStreamer->getContext().getCurrentDwarfLoc(); | ||
Asm->OutStreamer->emitDwarfLocDirective( | ||
PrevLoc.getFileNum(), 0, PrevLoc.getColumn(), 0, 0, 0, StringRef()); | ||
MCDwarfLineEntry::make(Asm->OutStreamer.get(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this call should be unnecessary? It's not used anywhere else in CodeGen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For instructions, usually the line information is recorded by AsmPrinter::emitFunctionBody() -> DwarfDebug::beginInstruction() -> recordSourceLine() -> MCStreamer::emitDwarfLocDirective()
and then MCDwarfLineEntry::make
is called by MCObjectStreamer::emitInstructionImpl()
.
In the case of nops we aren't going through the normal flow of emitting instructions so we want the line entry for our line information to be emitted immediately otherwise it'll be emitted at the beginning of the next instruction (the end of the nops).
This is also what MCObjectStreamer::emitInstructionImpl()
does. The line entry is emitted first so that the label points to the beginning of the instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so because in this case we're looking at a .align directive rather than a real instruction, it won't be passing through emitInstructionImpl()
so we need to call MCDwarfLineEntry::make()
directly. Sounds reasonable.
return; | ||
|
||
auto PrevLoc = Asm->OutStreamer->getContext().getCurrentDwarfLoc(); | ||
Asm->OutStreamer->emitDwarfLocDirective( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually we call recordSourceLine(), but in this case we're assuming we're at the top of a basic block and (maybe) emitting nops; so, zeroing out flags and discriminator as well as line seems like a reasonable simplification.
What happens if the block is already aligned? No nops will be emitted, and presumably you'd end up with a .loc for a line 0 immediately followed by a .loc for the correct line. The line table will encode that correctly, but I'm suspicious that some consumers will decide the instruction actually has line 0 (because that's what it will find first for that address). Please test that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right. We control whether or not we call beginCodeAlignment
in this if
block but if other users call beginCodeAlignment
unconditionally they would end up with a line 0 .loc
immediately followed by the correct .loc
.
The function is named beginCodeAlignment
so one may argue you should only call it when emitting alignment nops but on the other hand we can easily move the if
block into beginCodeAlignment
so I'll do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it now skips the line-0 for a block that has 1-byte alignment, but not if the block happens to be aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean.
0000000000000000 <f>:
; ...
3: 7e 2b jle 0x30 <f+0x30>
5: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
f: 90 nop
10: e8 00 00 00 00 callq 0x15 <f+0x15>
15: e8 00 00 00 00 callq 0x1a <f+0x1a>
1a: e8 00 00 00 00 callq 0x1f <f+0x1f>
1f: e8 00 00 00 00 callq 0x24 <f+0x24>
24: e8 00 00 00 00 callq 0x29 <f+0x29>
29: e8 00 00 00 00 callq 0x2e <f+0x2e>
2e: eb e0 jmp 0x10 <f+0x10>
30: e8 00 00 00 00 callq 0x35 <f+0x35> ; BB starting here is a loop but has no nops
35: eb f9 jmp 0x30 <f+0x30>
We will end up with a line 0 record for address 0x30 immediately followed by a record with the correct line number.
The MCAlignFragment
type only allows for the possibility of nops being emitted, it does not guarantee that they'll be emitted. The addresses aren't known till after the assembler computes layout which happens much later than here. I can't think of a way to handle this case as things currently stand though there may be other debug-info related mechanisms to deal with this that I'm not aware of, what do you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm yes at the AsmPrinter level we don't know whether we'll need any nops. That suggests you want to figure this out down in MC when you do know whether you will be emitting any nops. Or, possibly, when encoding the line table, look to see whether there's a line-0 immediately followed by a non-0 at the same offset, and suppress the line-0. Offhand I don't know where those kinds of things happen, other than "somewhere in MC," sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That suggests you want to figure this out down in MC when you do know whether you will be emitting any nops.
That's going to be difficult because the line table has already been encoded by that time. In theory one could evaluate the line table to find the MCDwarfLineAddrFragment
associated with the address that the nops are being emitted at but that doesn't seem practical.
Or, possibly, when encoding the line table, look to see whether there's a line-0 immediately followed by a non-0 at the same offset, and suppress the line-0.
Dwarf line entries are encoded in a MCDwarfLineAddrFragment
which contains an expression and fixups to compute the delta between the current symbol and the last symbol encoded with fixups and then eventually end up as a link-time relocation so the absolute address isn't known until that time.
This seems like a design problem in the MC layer, there isn't a way to correlate a fragment with it's DWARF-related counterpart. This is probably partly due to the fact that fragments don't normally need to change once they've been created. I don't think this is solvable within the scope of this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dwarf line entries are encoded in a MCDwarfLineAddrFragment which contains an expression and fixups to compute the delta between the current symbol and the last symbol encoded with fixups and then eventually end up as a link-time relocation so the absolute address isn't known until that time.
That doesn't sound right. There should be only one link-time relocation per sequence (usually, one per function). Line-table entries are encoded with a line-adjustment and an instruction-offset delta from the previous entry. It should be feasible to detect a zero instruction delta.
That said, it is a broader problem as I've seen these zero deltas in other situations as well. This patch makes it more prevalent but isn't introducing a new problem. At the AsmPrinter level, you have to assume that there could be nops introduced, and attaching line-0 to those nops is one reasonable solution.
Have you considered instead using the .loc of the first instruction of the new block? That is, emit the new block's .loc before emitting the .align directive. This is something that occurred to me just now, I haven't looked at the code to see whether that's feasible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, never mind. The line-table heuristic will put is_stmt on the nop instead of at the top of the block, and that will probably cause other problems. Go ahead with the line-0 idea.
emitAlignment(Alignment, nullptr, MBB.getMaxBytesForAlignment()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this isn't going to change, don't keep the braces.
return; | ||
|
||
auto PrevLoc = Asm->OutStreamer->getContext().getCurrentDwarfLoc(); | ||
Asm->OutStreamer->emitDwarfLocDirective( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it now skips the line-0 for a block that has 1-byte alignment, but not if the block happens to be aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still two points to answer: The call to MCDwarfLineEntry::make
, and the braces on the if where you undid a code change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This patch makes LLVM emit line 0 source locations for nops emitted as basic block padding.
CC @OCHyams