Skip to content

Fix performance bug in buildLocationList #109343

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 31, 2024
Merged

Conversation

tmsri
Copy link
Member

@tmsri tmsri commented Sep 19, 2024

In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.

This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.

When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.

For example:

.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv

The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".

It is alright to skip this as we already check the section corresponding
to the debugloc entry.

Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.

In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.

This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.

When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.

For example:

.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv

The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".

It is alright to skip this as we already check the section corresponding
to the debugloc entry.

Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.
@llvmbot
Copy link
Member

llvmbot commented Sep 19, 2024

@llvm/pr-subscribers-debuginfo

Author: Sriraman Tallam (tmsri)

Changes

In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.

This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.

When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.

For example:

.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv

The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".

It is alright to skip this as we already check the section corresponding
to the debugloc entry.

Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.


Full diff: https://github.com/llvm/llvm-project/pull/109343.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (+8-5)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp (+20-18)
  • (added) llvm/test/DebugInfo/X86/basic-block-sections-debug-loclist-6.ll (+92)
  • (modified) llvm/test/DebugInfo/X86/basic-block-sections_1.ll (+3-3)
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index db7adfd3b21e5f..4c5afaa5132755 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -1737,6 +1737,12 @@ void AsmPrinter::emitFunctionBody() {
   bool IsEHa = MMI->getModule()->getModuleFlag("eh-asynch");
 
   bool CanDoExtraAnalysis = ORE->allowExtraAnalysis(DEBUG_TYPE);
+  // Create a slot for the entry basic block section so that the section
+  // order is preserved when iterating over MBBSectionRanges.
+  if (!MF->empty())
+    MBBSectionRanges[MF->front().getSectionID()] =
+        MBBSectionRange{CurrentFnBegin, nullptr};
+
   for (auto &MBB : *MF) {
     // Print a label for the basic block.
     emitBasicBlockStart(MBB);
@@ -2000,11 +2006,8 @@ void AsmPrinter::emitFunctionBody() {
   }
   for (auto &Handler : Handlers)
     Handler->markFunctionEnd();
-
-  assert(!MBBSectionRanges.contains(MF->front().getSectionID()) &&
-         "Overwrite section range");
-  MBBSectionRanges[MF->front().getSectionID()] =
-      MBBSectionRange{CurrentFnBegin, CurrentFnEnd};
+  // Update the end label of the entry block's section.
+  MBBSectionRanges[MF->front().getSectionID()].EndLabel = CurrentFnEnd;
 
   // Print out jump tables referenced by the function.
   emitJumpTableInfo();
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index e9649f9ff81658..890deb319ea73e 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -34,6 +34,7 @@
 #include "llvm/DebugInfo/DWARF/DWARFDataExtractor.h"
 #include "llvm/DebugInfo/DWARF/DWARFExpression.h"
 #include "llvm/IR/Constants.h"
+#include "llvm/IR/DebugInfoMetadata.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/GlobalVariable.h"
 #include "llvm/IR/Module.h"
@@ -1772,18 +1773,14 @@ bool DwarfDebug::buildLocationList(SmallVectorImpl<DebugLocEntry> &DebugLoc,
     // span each individual section in the range from StartLabel to EndLabel.
     if (Asm->MF->hasBBSections() && StartLabel == Asm->getFunctionBegin() &&
         !Instr->getParent()->sameSection(&Asm->MF->front())) {
-      const MCSymbol *BeginSectionLabel = StartLabel;
-
-      for (const MachineBasicBlock &MBB : *Asm->MF) {
-        if (MBB.isBeginSection() && &MBB != &Asm->MF->front())
-          BeginSectionLabel = MBB.getSymbol();
-
-        if (MBB.sameSection(Instr->getParent())) {
-          DebugLoc.emplace_back(BeginSectionLabel, EndLabel, Values);
+      for (const auto &[MBBSectionId, MBBSectionRange] :
+           Asm->MBBSectionRanges) {
+        if (Instr->getParent()->getSectionID() == MBBSectionId) {
+          DebugLoc.emplace_back(MBBSectionRange.BeginLabel, EndLabel, Values);
           break;
         }
-        if (MBB.isEndSection())
-          DebugLoc.emplace_back(BeginSectionLabel, MBB.getEndSymbol(), Values);
+        DebugLoc.emplace_back(MBBSectionRange.BeginLabel,
+                              MBBSectionRange.EndLabel, Values);
       }
     } else {
       DebugLoc.emplace_back(StartLabel, EndLabel, Values);
@@ -1824,22 +1821,27 @@ bool DwarfDebug::buildLocationList(SmallVectorImpl<DebugLocEntry> &DebugLoc,
     RangeMBB = &Asm->MF->front();
   else
     RangeMBB = Entries.begin()->getInstr()->getParent();
+  auto RangeIt = Asm->MBBSectionRanges.find(RangeMBB->getSectionID());
+  assert(RangeIt != Asm->MBBSectionRanges.end() &&
+         "Range MBB not found in MBBSectionRanges!");
   auto *CurEntry = DebugLoc.begin();
   auto *NextEntry = std::next(CurEntry);
+  auto NextRangeIt = std::next(RangeIt);
   while (NextEntry != DebugLoc.end()) {
-    // Get the last machine basic block of this section.
-    while (!RangeMBB->isEndSection())
-      RangeMBB = RangeMBB->getNextNode();
-    if (!RangeMBB->getNextNode())
+    if (NextRangeIt == Asm->MBBSectionRanges.end())
       return false;
     // CurEntry should end the current section and NextEntry should start
     // the next section and the Values must match for these two ranges to be
-    // merged.
-    if (CurEntry->getEndSym() != RangeMBB->getEndSymbol() ||
-        NextEntry->getBeginSym() != RangeMBB->getNextNode()->getSymbol() ||
+    // merged.  Do not match the section label end if it is the entry block
+    // section.  This is because the end label for the Debug Loc and the
+    // Function end label could be different.
+    if ((RangeIt->second.EndLabel != Asm->getFunctionEnd() &&
+         CurEntry->getEndSym() != RangeIt->second.EndLabel) ||
+        NextEntry->getBeginSym() != NextRangeIt->second.BeginLabel ||
         CurEntry->getValues() != NextEntry->getValues())
       return false;
-    RangeMBB = RangeMBB->getNextNode();
+    RangeIt = NextRangeIt;
+    NextRangeIt = std::next(RangeIt);
     CurEntry = NextEntry;
     NextEntry = std::next(CurEntry);
   }
diff --git a/llvm/test/DebugInfo/X86/basic-block-sections-debug-loclist-6.ll b/llvm/test/DebugInfo/X86/basic-block-sections-debug-loclist-6.ll
new file mode 100644
index 00000000000000..8c8eef68b2ec0c
--- /dev/null
+++ b/llvm/test/DebugInfo/X86/basic-block-sections-debug-loclist-6.ll
@@ -0,0 +1,92 @@
+; RUN: llc %s -mtriple=x86_64-unknown-linux-gnu --dwarf-version=4 --basic-block-sections=none -filetype=obj -o - | llvm-dwarfdump - | FileCheck %s
+; RUN: llc %s -mtriple=x86_64-unknown-linux-gnu --dwarf-version=4 --basic-block-sections=all -filetype=obj -o - | llvm-dwarfdump - | FileCheck --check-prefix=SECTIONS %s
+; RUN: llc %s -mtriple=x86_64-unknown-linux-gnu --dwarf-version=5 --basic-block-sections=none -filetype=obj -o - | llvm-dwarfdump - | FileCheck %s
+; RUN: llc %s -mtriple=x86_64-unknown-linux-gnu --dwarf-version=5 --basic-block-sections=all -filetype=obj -o - | llvm-dwarfdump - | FileCheck --check-prefix=SECTIONS %s
+
+; CHECK:      DW_TAG_variable
+; CHECK-NEXT: DW_AT_location
+; CHECK-NEXT: [0x{{[0-9a-f]+}}, 0x{{[0-9a-f]+}}): DW_OP_consts +7, DW_OP_stack_value
+; CHECK-NEXT: [0x{{[0-9a-f]+}}, 0x{{[0-9a-f]+}}): DW_OP_consts +8, DW_OP_stack_value
+; CHECK-NEXT: DW_AT_name	("i")
+
+; SECTIONS:      DW_TAG_variable
+; SECTIONS-NEXT: DW_AT_location
+; SECTIONS-NEXT: [0x{{[0-9a-f]+}}, 0x{{[0-9a-f]+}}): DW_OP_consts +7, DW_OP_stack_value
+; SECTIONS-NEXT: [0x{{[0-9a-f]+}}, 0x{{[0-9a-f]+}}): DW_OP_consts +8, DW_OP_stack_value
+; SECTIONS-NEXT: [0x{{[0-9a-f]+}}, 0x{{[0-9a-f]+}}): DW_OP_consts +8, DW_OP_stack_value
+; SECTIONS-NEXT: [0x{{[0-9a-f]+}}, 0x{{[0-9a-f]+}}): DW_OP_consts +8, DW_OP_stack_value
+; SECTIONS-NEXT: DW_AT_name	("i")
+
+; Source to generate the IR below:
+; void f1();
+; extern bool b;
+; void test() {
+;     // i is not a const throughout the whole scope and should
+;     // not use DW_AT_const_value
+;     int i = 7;
+;     f1();
+;     i = 8;
+;     if (b)
+;       f1();
+; }
+; $ clang++ -S loclist_section.cc -O2 -g  -emit-llvm
+
+@b = external local_unnamed_addr global i8, align 1
+
+; Function Attrs: mustprogress uwtable
+define dso_local void @_Z4testv() local_unnamed_addr #0 !dbg !10 {
+entry:
+    #dbg_value(i32 7, !14, !DIExpression(), !16)
+  tail call void @_Z2f1v(), !dbg !17
+    #dbg_value(i32 8, !14, !DIExpression(), !16)
+  %0 = load i8, ptr @b, align 1, !dbg !18, !tbaa !20, !range !24, !noundef !25
+  %loadedv = trunc nuw i8 %0 to i1, !dbg !18
+  br i1 %loadedv, label %if.then, label %if.end, !dbg !26
+
+if.then:                                          ; preds = %entry
+  tail call void @_Z2f1v(), !dbg !27
+  br label %if.end, !dbg !27
+
+if.end:                                           ; preds = %if.then, %entry
+  ret void, !dbg !28
+}
+
+declare !dbg !29 void @_Z2f1v() local_unnamed_addr #1
+
+attributes #0 = { mustprogress uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #1 = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6, !7, !8}
+!llvm.ident = !{!9}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang version 20.0.0git ([email protected]:)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "loclist_section.cc", directory: "Examples/debug_loc", checksumkind: CSK_MD5, checksum: "67769a94389681c8a6da481e2f358abb")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 8, !"PIC Level", i32 2}
+!6 = !{i32 7, !"PIE Level", i32 2}
+!7 = !{i32 7, !"uwtable", i32 2}
+!8 = !{i32 7, !"debug-info-assignment-tracking", i1 true}
+!9 = !{!"clang version 20.0.0git ([email protected]:.../llvm-project.git 7c3256280a78b0505ae4d43985c4d3239451a151)"}
+!10 = distinct !DISubprogram(name: "test", linkageName: "_Z4testv", scope: !1, file: !1, line: 3, type: !11, scopeLine: 3, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !13)
+!11 = !DISubroutineType(types: !12)
+!12 = !{null}
+!13 = !{!14}
+!14 = !DILocalVariable(name: "i", scope: !10, file: !1, line: 6, type: !15)
+!15 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!16 = !DILocation(line: 0, scope: !10)
+!17 = !DILocation(line: 7, column: 5, scope: !10)
+!18 = !DILocation(line: 9, column: 9, scope: !19)
+!19 = distinct !DILexicalBlock(scope: !10, file: !1, line: 9, column: 9)
+!20 = !{!21, !21, i64 0}
+!21 = !{!"bool", !22, i64 0}
+!22 = !{!"omnipotent char", !23, i64 0}
+!23 = !{!"Simple C++ TBAA"}
+!24 = !{i8 0, i8 2}
+!25 = !{}
+!26 = !DILocation(line: 9, column: 9, scope: !10)
+!27 = !DILocation(line: 10, column: 7, scope: !19)
+!28 = !DILocation(line: 11, column: 1, scope: !10)
+!29 = !DISubprogram(name: "f1", linkageName: "_Z2f1v", scope: !1, file: !1, line: 1, type: !11, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized)
diff --git a/llvm/test/DebugInfo/X86/basic-block-sections_1.ll b/llvm/test/DebugInfo/X86/basic-block-sections_1.ll
index 12b60c4dc321bd..c90d715142ec8b 100644
--- a/llvm/test/DebugInfo/X86/basic-block-sections_1.ll
+++ b/llvm/test/DebugInfo/X86/basic-block-sections_1.ll
@@ -16,10 +16,10 @@
 ; NO-SECTIONS: DW_AT_high_pc [DW_FORM_data4] ({{.*}})
 ; BB-SECTIONS: DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
 ; BB-SECTIONS-NEXT: DW_AT_ranges [DW_FORM_sec_offset]
+; BB-SECTIONS-NEXT: [{{.*}}) ".text.hot._Z3fooi"
 ; BB-SECTIONS-NEXT: [{{.*}}) ".text.hot._Z3fooi._Z3fooi.__part.1"
 ; BB-SECTIONS-NEXT: [{{.*}}) ".text.hot._Z3fooi._Z3fooi.__part.2"
 ; BB-SECTIONS-NEXT: [{{.*}}) ".text.hot._Z3fooi._Z3fooi.__part.3"
-; BB-SECTIONS-NEXT: [{{.*}}) ".text.hot._Z3fooi"
 ; BB-SECTIONS-ASM: _Z3fooi:
 ; BB-SECTIONS-ASM: .Ltmp{{[0-9]+}}:
 ; BB-SECTIONS-ASM-NEXT: .loc 1 2 9 prologue_end
@@ -36,14 +36,14 @@
 ; BB-SECTIONS-ASM: .size	_Z3fooi.__part.3, .LBB_END0_{{[0-9]+}}-_Z3fooi.__part.3
 ; BB-SECTIONS-ASM: .Lfunc_end0:
 ; BB-SECTIONS-ASM: .Ldebug_ranges0:
+; BB-SECTIONS-ASM-NEXT:	.quad	.Lfunc_begin0
+; BB-SECTIONS-ASM-NEXT:	.quad	.Lfunc_end0
 ; BB-SECTIONS-ASM-NEXT:	.quad	_Z3fooi.__part.1
 ; BB-SECTIONS-ASM-NEXT:	.quad	.LBB_END0_{{[0-9]+}}
 ; BB-SECTIONS-ASM-NEXT:	.quad	_Z3fooi.__part.2
 ; BB-SECTIONS-ASM-NEXT:	.quad	.LBB_END0_{{[0-9]+}}
 ; BB-SECTIONS-ASM-NEXT:	.quad	_Z3fooi.__part.3
 ; BB-SECTIONS-ASM-NEXT:	.quad	.LBB_END0_{{[0-9]+}}
-; BB-SECTIONS-ASM-NEXT:	.quad	.Lfunc_begin0
-; BB-SECTIONS-ASM-NEXT:	.quad	.Lfunc_end0
 ; BB-SECTIONS-ASM-NEXT:	.quad	0
 ; BB-SECTIONS-ASM-NEXT:	.quad	0
 ; BB-SECTIONS-LINE-TABLE:      0x0000000000000000 1 0 1 0 0 0 is_stmt

@tmsri
Copy link
Member Author

tmsri commented Sep 19, 2024

Continuation from PR 108886 I added a test case to address jmorse's recent comment.

Test to check that a variable declared within a scope that has basic block
sections still produces DW_AT_const_value.
@tmsri
Copy link
Member Author

tmsri commented Sep 23, 2024

I added one more test where the variable is only valid within a lexical scope that is part of the main section and another basic block section.

@tmsri
Copy link
Member Author

tmsri commented Sep 25, 2024

Ping. I believe one concern with this patch was about a variable that is only valid within a lexical scope which is defined in the entry block's section. I added a test to cover that case. Is there any other concern?

@amharc
Copy link
Contributor

amharc commented Sep 26, 2024

I think the logic in this PR is correct: the comment for buildLocationList says "If the resulting list has only one entry that is valid throughout variable's scope return true" and this seems to be the case with this PR too. I'm not a debug info expert though.

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eyeballing this further, I was under the impression that this code was evaluating the validity of issuing a single-location in general, but I see now that's achieved by the earlier call to validThroughout. I see now that this is more about verifying the basic-block-section ranges are suitable to merge once the variable is identified as having a single-location, such as a constant value, is that correct?

I think my difficulty is in seeing any situation where the code from line 8130 onwards should return false -- if validThroughout has returned true, then we can use a single-location that implicitly covers every source-location that's in scope without the need for a location list surely? (I imagine it's tricky to identify the instruction-ranges that are in scope with basic block sections, but that's not a matter for buildLocationList).

Stepping back even further, I'm sort of questioning why this code is here without having a real handle on it; for this PR itself the change looks correct, with one inline question.

Comment on lines +1835 to +1840
// merged. Do not match the section label end if it is the entry block
// section. This is because the end label for the Debug Loc and the
// Function end label could be different.
if ((RangeIt->second.EndLabel != Asm->getFunctionEnd() &&
CurEntry->getEndSym() != RangeIt->second.EndLabel) ||
NextEntry->getBeginSym() != NextRangeIt->second.BeginLabel ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stupid question perhaps but: the comment talks about not matching stuff in the entry block, but the new comparison is against getFunctionEnd() which I anticipate is the exit block. Shouldn't that be fetching the MCLabel for the end-of-the-entry block instead?

(There's some overlap between the current "Entry" and the "entry" block, so I might have confused myself)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, MBBSectionRanges stores the ranges for each section. For the entry section, the end label is the Function End Label, Asm->getFunctionEnd(). The entry section is actually the original function section.

Now, take a look at this assembly:

	.section	.text._Z4testv,"ax",@progbits
	.globl	_Z4testv                        # -- Begin function _Z4testv
	.type	_Z4testv,@function
_Z4testv:                               # @_Z4testv
.Lfunc_begin0:
	.loc	0 3 0                           # ex4.cc:3:0
# %bb.0:                                # %entry
	pushq	%rax
.Ltmp0:
	#DEBUG_VALUE: test:i <- 7
	.loc	0 7 5 prologue_end              # ex4.cc:7:5
	callq	_Z2f1v@PLT
        ...
.LBB_END0_0:
	.cfi_endproc
	.section	.text._Z4testv,"ax",@progbits,unique,1
_Z4testv.__part.1:                      # %if.then
	.section	.text._Z4testv,"ax",@progbits
.Lfunc_end0:

The entry block section uses label ".Lfunc_end0" which is Asm->getFunctionEnd(). There is an additional label that is created immediately after the end of the entry section ".LBB_END0_0" which is what the debug loc uses, like an alias. In this case and this case alone, these two labels won't match even though semantically, they are the same. So, I exclude that match.

CurEntry and entry block are different things. Sorry for the use of the overloaded entry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool -- awkward terminology, but I can see what's going on.

@jmorse
Copy link
Member

jmorse commented Sep 30, 2024

This continued to bug me a bit, so I put some traps on the return false paths at the end of buildLocationList with this PR and didn't get any test failures when running ninja check-llvm. Sorry to ask for more, but a test that covers that path would really help my understanding.

jmorse@7b78ea8 <--- based on
11a5026 with the addition of some x86 traps, which should cause tests covering those paths to fail.

@tmsri
Copy link
Member Author

tmsri commented Oct 8, 2024

Eyeballing this further, I was under the impression that this code was evaluating the validity of issuing a single-location in general, but I see now that's achieved by the earlier call to validThroughout. I see now that this is more about verifying the basic-block-section ranges are suitable to merge once the variable is identified as having a single-location, such as a constant value, is that correct?

Yes, this is correct. Even if it is not merged, it will produce something that is valid.

@tmsri
Copy link
Member Author

tmsri commented Oct 8, 2024

This continued to bug me a bit, so I put some traps on the return false paths at the end of buildLocationList with this PR and didn't get any test failures when running ninja check-llvm. Sorry to ask for more, but a test that covers that path would really help my understanding.

jmorse@7b78ea8 <--- based on 11a5026 with the addition of some x86 traps, which should cause tests covering those paths to fail.

Ah nice!, let me see if I can make a test for this.

These test cases check loclist merging in buildLocationList with
basic block sections. 8.ll does not create any basic block sections
and the merging must quit early because there is no constant value.
9.11 creates sections but cannot be merged as i is not a constant value.
@tmsri
Copy link
Member Author

tmsri commented Oct 30, 2024

I added two more test cases which check the traps you wanted to test for the code snippet below. The first one will check the first return as there aren't any basic block sections. The second test will check the second "return false".

while (NextEntry != DebugLoc.end()) {
    if (NextRangeIt == Asm->MBBSectionRanges.end())
      return false;
    // CurEntry should end the current section and NextEntry should start
    // the next section and the Values must match for these two ranges to be
    // merged.  Do not match the section label end if it is the entry block
    // section.  This is because the end label for the Debug Loc and the
    // Function end label could be different.
    if ((RangeIt->second.EndLabel != Asm->getFunctionEnd() &&
         CurEntry->getEndSym() != RangeIt->second.EndLabel) ||
        NextEntry->getBeginSym() != NextRangeIt->second.BeginLabel ||
        CurEntry->getValues() != NextEntry->getValues())
      return false;

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- many thanks for the extra test coverage, I think that clears up my remaining questions about what's going on.

Comment on lines +1835 to +1840
// merged. Do not match the section label end if it is the entry block
// section. This is because the end label for the Debug Loc and the
// Function end label could be different.
if ((RangeIt->second.EndLabel != Asm->getFunctionEnd() &&
CurEntry->getEndSym() != RangeIt->second.EndLabel) ||
NextEntry->getBeginSym() != NextRangeIt->second.BeginLabel ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool -- awkward terminology, but I can see what's going on.

@tmsri tmsri merged commit c7ef002 into llvm:main Oct 31, 2024
8 checks passed
@tmsri tmsri deleted the buildloclist_perf_bug branch October 31, 2024 16:00
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.

This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.

When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.

For example:

.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv

The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".

It is alright to skip this as we already check the section corresponding
to the debugloc entry.

Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.

This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.

When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.

For example:

.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv

The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".

It is alright to skip this as we already check the section corresponding
to the debugloc entry.

Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants