Skip to content

[lldb] Fixing edge cases in "source list" #126526

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 21, 2025
Merged

[lldb] Fixing edge cases in "source list" #126526

merged 5 commits into from
Feb 21, 2025

Conversation

labath
Copy link
Collaborator

@labath labath commented Feb 10, 2025

While looking at how to make Function::GetEndLineSourceInfo (which is only used in "command source") work with discontinuous functions, I realized there are other corner cases that this function doesn't handle.

The code assumed that the last line entry in the function will also correspond to the last source line. This is probably true for unoptimized code, but I don't think we can rely on the optimizer to preserve this property. What's worse, the code didn't check that the last line entry belonged to the same file as the first one, so if this line entry was the result of inlining, we could end up using a line from a completely different file.

To fix this, I change the algorithm to iterate over all line entries in the function (which belong to the same file) and find the max line number out of those. This way we can naturally handle the discontinuous case as well.

This implementation is going to be slower than the previous one, but I don't think that matters, because:

  • this command is only used rarely, and interactively
  • we have plenty of other code which iterates through the line table

I added some basic tests for the function operation. I don't claim the tests to be comprehensive, or that the function handles all edge cases, but test framework created here could be used for testing other fixes/edge cases as well.

@labath labath requested a review from jimingham February 10, 2025 14:44
@labath labath requested a review from JDevlieghere as a code owner February 10, 2025 14:44
@llvmbot llvmbot added the lldb label Feb 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 10, 2025

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Changes

While looking at how to make Function::GetEndLineSourceInfo (which is only used in "command source") work with discontinuous functions, I realized there are other corner cases that this function doesn't handle.

The code assumed that the last line entry in the function will also correspond to the last source line. This is probably true for unoptimized code, but I don't think we can rely on the optimizer to preserve this property. What's worse, the code didn't check that the last line entry belonged to the same file as the first one, so if this line entry was the result of inlining, we could end up using a line from a completely different file.

To fix this, I change the algorithm to iterate over all line entries in the function (which belong to the same file) and find the max line number out of those. This way we can naturally handle the discontinuous case as well.

This implementations is going to be slower than the previous one, but I don't think that matters, because:

  • this command is only used rarely, and interactively
  • we have plenty of other code which iterates through the line table

I added some basic tests for the function operation. I don't claim the tests to be comprehensive, or that the function handles all edge cases, but test framework created here could be used for testing other fixes/edge cases as well.


Full diff: https://github.com/llvm/llvm-project/pull/126526.diff

4 Files Affected:

  • (modified) lldb/include/lldb/Symbol/Function.h (+5-10)
  • (modified) lldb/source/Commands/CommandObjectSource.cpp (+6-8)
  • (modified) lldb/source/Symbol/Function.cpp (+27-16)
  • (added) lldb/test/Shell/Commands/command-source-list.s (+265)
diff --git a/lldb/include/lldb/Symbol/Function.h b/lldb/include/lldb/Symbol/Function.h
index f3b956139f3c5b..ee3a8304fc5b37 100644
--- a/lldb/include/lldb/Symbol/Function.h
+++ b/lldb/include/lldb/Symbol/Function.h
@@ -15,6 +15,7 @@
 #include "lldb/Expression/DWARFExpressionList.h"
 #include "lldb/Symbol/Block.h"
 #include "lldb/Utility/UserID.h"
+#include "lldb/lldb-forward.h"
 #include "llvm/ADT/ArrayRef.h"
 
 #include <mutex>
@@ -460,6 +461,7 @@ class Function : public UserID, public SymbolContextScope {
   }
 
   lldb::LanguageType GetLanguage() const;
+
   /// Find the file and line number of the source location of the start of the
   /// function.  This will use the declaration if present and fall back on the
   /// line table if that fails.  So there may NOT be a line table entry for
@@ -473,16 +475,9 @@ class Function : public UserID, public SymbolContextScope {
   void GetStartLineSourceInfo(lldb::SupportFileSP &source_file_sp,
                               uint32_t &line_no);
 
-  /// Find the file and line number of the source location of the end of the
-  /// function.
-  ///
-  ///
-  /// \param[out] source_file
-  ///     The source file.
-  ///
-  /// \param[out] line_no
-  ///     The line number.
-  void GetEndLineSourceInfo(FileSpec &source_file, uint32_t &line_no);
+  using SourceRange = Range<uint32_t, uint32_t>;
+  /// Find the file and line number range of the function.
+  llvm::Expected<std::pair<lldb::SupportFileSP, SourceRange>> GetSourceInfo();
 
   /// Get the outgoing call edges from this function, sorted by their return
   /// PC addresses (in increasing order).
diff --git a/lldb/source/Commands/CommandObjectSource.cpp b/lldb/source/Commands/CommandObjectSource.cpp
index 936783216f6ff5..81bf1769808bad 100644
--- a/lldb/source/Commands/CommandObjectSource.cpp
+++ b/lldb/source/Commands/CommandObjectSource.cpp
@@ -784,14 +784,12 @@ class CommandObjectSourceList : public CommandObjectParsed {
 
       if (sc.block == nullptr) {
         // Not an inlined function
-        sc.function->GetStartLineSourceInfo(start_file, start_line);
-        if (start_line == 0) {
-          result.AppendErrorWithFormat("Could not find line information for "
-                                       "start of function: \"%s\".\n",
-                                       source_info.function.GetCString());
-          return 0;
-        }
-        sc.function->GetEndLineSourceInfo(end_file, end_line);
+        auto expected_info = sc.function->GetSourceInfo();
+        if (!expected_info)
+          result.AppendError(llvm::toString(expected_info.takeError()));
+        start_file = expected_info->first;
+        start_line = expected_info->second.GetRangeBase();
+        end_line = expected_info->second.GetRangeEnd();
       } else {
         // We have an inlined function
         start_file = source_info.line_entry.file_sp;
diff --git a/lldb/source/Symbol/Function.cpp b/lldb/source/Symbol/Function.cpp
index 11a43a9172ea67..2594fecb27ceb2 100644
--- a/lldb/source/Symbol/Function.cpp
+++ b/lldb/source/Symbol/Function.cpp
@@ -320,25 +320,36 @@ void Function::GetStartLineSourceInfo(SupportFileSP &source_file_sp,
   }
 }
 
-void Function::GetEndLineSourceInfo(FileSpec &source_file, uint32_t &line_no) {
-  line_no = 0;
-  source_file.Clear();
-
-  // The -1 is kind of cheesy, but I want to get the last line entry for the
-  // given function, not the first entry of the next.
-  Address scratch_addr(GetAddressRange().GetBaseAddress());
-  scratch_addr.SetOffset(scratch_addr.GetOffset() +
-                         GetAddressRange().GetByteSize() - 1);
-
+llvm::Expected<std::pair<SupportFileSP, Function::SourceRange>>
+Function::GetSourceInfo() {
+  SupportFileSP source_file_sp;
+  uint32_t start_line;
+  GetStartLineSourceInfo(source_file_sp, start_line);
   LineTable *line_table = m_comp_unit->GetLineTable();
-  if (line_table == nullptr)
-    return;
+  if (start_line == 0 || !line_table) {
+    return llvm::createStringError(llvm::formatv(
+        "Could not find line information for function \"{0}\".", GetName()));
+  }
 
-  LineEntry line_entry;
-  if (line_table->FindLineEntryByAddress(scratch_addr, line_entry, nullptr)) {
-    line_no = line_entry.line;
-    source_file = line_entry.GetFile();
+  uint32_t end_line = start_line;
+  for (const AddressRange &range : GetAddressRanges()) {
+    LineEntry entry;
+    uint32_t idx;
+    if (!line_table->FindLineEntryByAddress(range.GetBaseAddress(), entry,
+                                            &idx))
+      continue;
+
+    addr_t end_addr = range.GetBaseAddress().GetFileAddress() + range.GetByteSize();
+    while (line_table->GetLineEntryAtIndex(idx++, entry) &&
+           entry.range.GetBaseAddress().GetFileAddress() < end_addr) {
+      // Ignore entries belonging to inlined functions or #included files.
+      if (source_file_sp->Equal(*entry.file_sp,
+                                SupportFile::eEqualFileSpecAndChecksumIfSet))
+        end_line = std::max(end_line, entry.line);
+    }
   }
+  return std::make_pair(std::move(source_file_sp),
+                        SourceRange(start_line, end_line - start_line));
 }
 
 llvm::ArrayRef<std::unique_ptr<CallEdge>> Function::GetCallEdges() {
diff --git a/lldb/test/Shell/Commands/command-source-list.s b/lldb/test/Shell/Commands/command-source-list.s
new file mode 100644
index 00000000000000..3dbc0a196f16a6
--- /dev/null
+++ b/lldb/test/Shell/Commands/command-source-list.s
@@ -0,0 +1,265 @@
+# REQUIRES: x86
+
+# RUN: split-file %s %t
+# RUN: llvm-mc --triple=x86_64-pc-linux -filetype=obj %t/a.s -o %t/a.o
+# RUN: %lldb %t/a.o -o "settings set target.source-map . %t" -s %t/commands -o exit | FileCheck %s
+
+#--- commands
+# CASE 0: function at the start of the file
+source list -n func0
+# CHECK-LABEL: source list -n func0
+# CHECK-NEXT:  File: file0.c
+# CHECK-NEXT:     1    content of file0.c:1
+# CHECK-NEXT:     2    content of file0.c:2
+# CHECK-NEXT:     3    content of file0.c:3
+# CHECK-NEXT:     4    content of file0.c:4
+# CHECK-NEXT:     5    content of file0.c:5
+# CHECK-NEXT:     6    content of file0.c:6
+# CHECK-NEXT:     7    content of file0.c:7
+# CHECK-NEXT:     8    content of file0.c:8
+# CHECK-NEXT:     9    content of file0.c:9
+# CHECK-NEXT:     10   content of file0.c:10
+
+# CASE 1: function in the middle of the file
+source list -n func1
+# CHECK-NEXT: source list -n func1
+# CHECK-NEXT:  File: file0.c
+# CHECK-NEXT:     5    content of file0.c:5
+# CHECK-NEXT:     6    content of file0.c:6
+# CHECK-NEXT:     7    content of file0.c:7
+# CHECK-NEXT:     8    content of file0.c:8
+# CHECK-NEXT:     9    content of file0.c:9
+# CHECK-NEXT:     10   content of file0.c:10
+# CHECK-NEXT:     11   content of file0.c:11
+# CHECK-NEXT:     12   content of file0.c:12
+# CHECK-NEXT:     13   content of file0.c:13
+# CHECK-NEXT:     14   content of file0.c:14
+# CHECK-NEXT:     15   content of file0.c:15
+# CHECK-NEXT:     16   content of file0.c:16
+# CHECK-NEXT:     17   content of file0.c:17
+
+# CASE 2: function at the end of the file
+source list -n func2
+# CHECK-NEXT: source list -n func2
+# CHECK-NEXT:  File: file0.c
+# CHECK-NEXT:     20   content of file0.c:20
+# CHECK-NEXT:     21   content of file0.c:21
+# CHECK-NEXT:     22   content of file0.c:22
+# CHECK-NEXT:     23   content of file0.c:23
+# CHECK-NEXT:     24   content of file0.c:24
+# CHECK-NEXT:     25   content of file0.c:25
+# CHECK-NEXT:     26   content of file0.c:26
+# CHECK-NEXT:     27   content of file0.c:27
+# CHECK-NEXT:     28   content of file0.c:28
+# CHECK-NEXT:     29   content of file0.c:29
+# CHECK-NEXT:     30   content of file0.c:30
+
+# CASE 3: function ends in a different file
+source list -n func3
+# CHECK-NEXT: source list -n func3
+# CHECK-NEXT:  File: file0.c
+# CHECK-NEXT:     1    content of file0.c:1
+# CHECK-NEXT:     2    content of file0.c:2
+# CHECK-NEXT:     3    content of file0.c:3
+# CHECK-NEXT:     4    content of file0.c:4
+# CHECK-NEXT:     5    content of file0.c:5
+# CHECK-NEXT:     6    content of file0.c:6
+# CHECK-NEXT:     7    content of file0.c:7
+# CHECK-NEXT:     8    content of file0.c:8
+# CHECK-NEXT:     9    content of file0.c:9
+# CHECK-NEXT:     10   content of file0.c:10
+
+# CASE 4: discontinuous function
+source list -n func4
+# CHECK-NEXT: source list -n func4
+# CHECK-NEXT:  File: file0.c
+# CHECK-NEXT:     1    content of file0.c:1
+# CHECK-NEXT:     2    content of file0.c:2
+# CHECK-NEXT:     3    content of file0.c:3
+# CHECK-NEXT:     4    content of file0.c:4
+# CHECK-NEXT:     5    content of file0.c:5
+# CHECK-NEXT:     6    content of file0.c:6
+# CHECK-NEXT:     7    content of file0.c:7
+# CHECK-NEXT:     8    content of file0.c:8
+# CHECK-NEXT:     9    content of file0.c:9
+# CHECK-NEXT:     10   content of file0.c:10
+
+
+#--- a.s
+        .file   0 "." "file0.c"
+        .file   1 "." "file1.c"
+        .text
+func0:
+        .loc    0 1
+        nop
+        .loc    0 5
+        nop
+.Lfunc0_end:
+
+func1:
+        .loc    0 10
+        nop
+        .loc    0 12
+        nop
+.Lfunc1_end:
+
+func2:
+        .loc    0 25
+        nop
+        .loc    0 30
+        nop
+.Lfunc2_end:
+
+func3:
+        .loc    0 1
+        nop
+        .loc    0 5
+        nop
+        .loc    1 5
+        nop
+.Lfunc3_end:
+
+func4.__part.1:
+        .loc    0 1
+        nop
+.Lfunc4.__part.1_end:
+
+.Lpadding:
+        nop
+
+func4:
+        .loc    0 5
+        nop
+.Lfunc4_end:
+
+.Ltext_end:
+
+        .section        .debug_abbrev,"",@progbits
+        .byte   1                               # Abbreviation Code
+        .byte   17                              # DW_TAG_compile_unit
+        .byte   1                               # DW_CHILDREN_yes
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   37                              # DW_AT_producer
+        .byte   8                               # DW_FORM_string
+        .byte   19                              # DW_AT_language
+        .byte   5                               # DW_FORM_data2
+        .byte   17                              # DW_AT_low_pc
+        .byte   1                               # DW_FORM_addr
+        .byte   18                              # DW_AT_high_pc
+        .byte   1                               # DW_FORM_addr
+        .byte   16                              # DW_AT_stmt_list
+        .byte   23                              # DW_FORM_sec_offset
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   2                               # Abbreviation Code
+        .byte   46                              # DW_TAG_subprogram
+        .byte   0                               # DW_CHILDREN_no
+        .byte   17                              # DW_AT_low_pc
+        .byte   1                               # DW_FORM_addr
+        .byte   18                              # DW_AT_high_pc
+        .byte   1                               # DW_FORM_addr
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   3                               # Abbreviation Code
+        .byte   46                              # DW_TAG_subprogram
+        .byte   0                               # DW_CHILDREN_no
+        .byte   85                              # DW_AT_ranges
+        .byte   23                              # DW_FORM_sec_offset
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   0                               # EOM(3)
+        .section        .debug_info,"",@progbits
+.Lcu_begin0:
+        .long   .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
+.Ldebug_info_start0:
+        .short  5                               # DWARF version number
+        .byte   1                               # DWARF Unit Type
+        .byte   8                               # Address Size (in bytes)
+        .long   .debug_abbrev                   # Offset Into Abbrev. Section
+        .byte   1                               # Abbrev DW_TAG_compile_unit
+        .asciz  "file0.c"                       # DW_AT_producer
+        .asciz  "Hand-written DWARF"            # DW_AT_producer
+        .short  29                              # DW_AT_language
+        .quad   .text                           # DW_AT_low_pc
+        .quad   .Ltext_end                      # DW_AT_high_pc
+        .long   .Lline_table_start0             # DW_AT_stmt_list
+        .rept 4
+        .byte   2                               # Abbrev DW_TAG_subprogram
+        .quad   func\+                          # DW_AT_low_pc
+        .quad   .Lfunc\+_end                    # DW_AT_high_pc
+        .asciz  "func\+"                        # DW_AT_name
+        .endr
+        .byte   3                               # Abbrev DW_TAG_subprogram
+        .long   .Ldebug_ranges0
+        .asciz  "func4"                         # DW_AT_name
+        .byte   0                               # End Of Children Mark
+.Ldebug_info_end0:
+
+        .section        .debug_rnglists,"",@progbits
+        .long   .Ldebug_list_header_end0-.Ldebug_list_header_start0 # Length
+.Ldebug_list_header_start0:
+        .short  5                               # Version
+        .byte   8                               # Address size
+        .byte   0                               # Segment selector size
+        .long   2                               # Offset entry count
+.Lrnglists_table_base0:
+        .long   .Ldebug_ranges0-.Lrnglists_table_base0
+.Ldebug_ranges0:
+        .byte   6                               # DW_RLE_start_end
+        .quad   func4
+        .quad   .Lfunc4_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   func4.__part.1
+        .quad   .Lfunc4.__part.1_end
+        .byte   0                               # DW_RLE_end_of_list
+.Ldebug_list_header_end0:
+        .section        .debug_line,"",@progbits
+.Lline_table_start0:
+
+#--- file0.c
+content of file0.c:1
+content of file0.c:2
+content of file0.c:3
+content of file0.c:4
+content of file0.c:5
+content of file0.c:6
+content of file0.c:7
+content of file0.c:8
+content of file0.c:9
+content of file0.c:10
+content of file0.c:11
+content of file0.c:12
+content of file0.c:13
+content of file0.c:14
+content of file0.c:15
+content of file0.c:16
+content of file0.c:17
+content of file0.c:18
+content of file0.c:19
+content of file0.c:20
+content of file0.c:21
+content of file0.c:22
+content of file0.c:23
+content of file0.c:24
+content of file0.c:25
+content of file0.c:26
+content of file0.c:27
+content of file0.c:28
+content of file0.c:29
+content of file0.c:30
+#--- file1.c
+content of file1.c:1
+content of file1.c:2
+content of file1.c:3
+content of file1.c:4
+content of file1.c:5
+content of file1.c:6
+content of file1.c:7
+content of file1.c:8
+content of file1.c:9
+content of file1.c:10

While looking at how to make Function::GetEndLineSourceInfo (which is
only used in "command source") work with discontinuous functions, I
realized there are other corner cases that this function doesn't handle.

The code assumed that the last line entry in the function will also
correspond to the last source line. This is probably true for
unoptimized code, but I don't think we can rely on the optimizer to
preserve this property. What's worse, the code didn't check that the
last line entry belonged to the same file as the first one, so if this
line entry was the result of inlining, we could end up using a line from
a completely different file.

To fix this, I change the algorithm to iterate over all line entries in
the function (which belong to the same file) and find the max line
number out of those. This way we can naturally handle the discontinuous
case as well.

This implementations is going to be slower than the previous one, but I
don't think that matters, because:
- this command is only used rarely, and interactively
- we have plenty of other code which iterates through the line table

I added some basic tests for the function operation. I don't claim the
tests to be comprehensive, or that the function handles all edge cases,
but test framework created here could be used for testing other
fixes/edge cases as well.
Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Seems like a more-than-fair tradeoff between correctness and performance.

Comment on lines 788 to 789
if (!expected_info)
result.AppendError(llvm::toString(expected_info.takeError()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this supposed to retain the return 0;? Now you're dereferencing expected_info even if it contains an error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably also means we're missing a test for when this fails?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes on both counts. I'll add a test.

LineTable *line_table = m_comp_unit->GetLineTable();
if (line_table == nullptr)
return;
if (start_line == 0 || !line_table) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the assumption here that the first source line of a function can never legitimately be a compiler-generated line that we assign 0? Is that guaranteed even in optimized code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is (and was) the assumption, but that definitely isn't guaranteed. The reason this works most of the time is because the GetStartLineSourceInfo will preferentially pick the line number out of the debug info section (DW_AT_decl_file), so the line table will only be consulted if that is missing (which probably only happens in this test case).

That is one of the corner cases I did not want to go into in this patch, though I can if you think I should.

@jimingham
Copy link
Collaborator

jimingham commented Feb 10, 2025

"command source" is another lldb command which this PR is not about... This is about "source list" and/or "source info". Not important but the title was pretty confusing...

@labath labath changed the title [lldb] Fixing edge cases in "command source" [lldb] Fixing edge cases in "source list" Feb 11, 2025
@labath
Copy link
Collaborator Author

labath commented Feb 11, 2025

"command source" is another lldb command which this PR is not about... This is about "source list" and/or "source info". Not important but the title was pretty confusing...

Yeah, sorry. I myself got confused because I was looking at another test case in this directory with a confusing name. I've fixed the title.

Copy link

github-actions bot commented Feb 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@labath labath merged commit 71af48f into llvm:main Feb 21, 2025
7 checks passed
@labath labath deleted the source branch February 21, 2025 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants