Skip to content

Fix line table lookups in line tables with multiple lines with the sa… #70879

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 6, 2023

Conversation

clayborg
Copy link
Collaborator

@clayborg clayborg commented Nov 1, 2023

Fix line table lookups in line tables with multiple lines with the same address.

Compilers emit line tables that have multiple line table entries with the same address. When doing lookups, we always need to use the last line entry if a lookup address matches the address of one or more line entries. This is because the size of an address range for a line uses the next line entry to figure out how big the current line entry is. If the next line entry has the same address, that means the current line entry has a size of zero, so no bytes correspond to the line entry.

This patch ensures that lookups will always pick the last matching line entry when the lookup address matches more than one line entry.

…me address.

Compilers emit line tables that have multiple line table entries with the same address. When doing lookups, we always need to use the last line entry if a lookup address matches the address of one or more line entries. This is because the size of an address range for a line uses the next line entry to figure out how big the current line entry is. If the next line entry has the same address, that means the current line entry has a size of zero, so no bytes correspond to the line entry.

This patch ensures that lookups will always pick the last matching line entry when the lookup address matches more than one line entry.
@llvmbot
Copy link
Member

llvmbot commented Nov 1, 2023

@llvm/pr-subscribers-debuginfo

Author: Greg Clayton (clayborg)

Changes

Fix line table lookups in line tables with multiple lines with the same address.

Compilers emit line tables that have multiple line table entries with the same address. When doing lookups, we always need to use the last line entry if a lookup address matches the address of one or more line entries. This is because the size of an address range for a line uses the next line entry to figure out how big the current line entry is. If the next line entry has the same address, that means the current line entry has a size of zero, so no bytes correspond to the line entry.

This patch ensures that lookups will always pick the last matching line entry when the lookup address matches more than one line entry.


Full diff: https://github.com/llvm/llvm-project/pull/70879.diff

2 Files Affected:

  • (modified) llvm/lib/DebugInfo/GSYM/LineTable.cpp (-5)
  • (modified) llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp (+169)
diff --git a/llvm/lib/DebugInfo/GSYM/LineTable.cpp b/llvm/lib/DebugInfo/GSYM/LineTable.cpp
index a49a3ba9bf2ad1d..666d9f15f1b43d6 100644
--- a/llvm/lib/DebugInfo/GSYM/LineTable.cpp
+++ b/llvm/lib/DebugInfo/GSYM/LineTable.cpp
@@ -270,11 +270,6 @@ Expected<LineEntry> LineTable::lookup(DataExtractor &Data, uint64_t BaseAddr, ui
     if (Addr < Row.Addr)
       return false; // Stop parsing, result contains the line table row!
     Result = Row;
-    if (Addr == Row.Addr) {
-      // Stop parsing, this is the row we are looking for since the address
-      // matches.
-      return false;
-    }
     return true; // Keep parsing till we find the right row.
   });
   if (Err)
diff --git a/llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp b/llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp
index ad81a2fcd16441a..499910447e37adb 100644
--- a/llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp
+++ b/llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp
@@ -4157,3 +4157,172 @@ TEST(GSYMTest, TestEmptyLinkageName) {
   // Make sure we don't see spurious errors in the output:
   EXPECT_TRUE(errors.find("error:") == std::string::npos);
 }
+
+TEST(GSYMTest, TestLineTablesWithEmptyRanges) {
+  // Test that lookups find the right line table entry when there are multiple
+  // line entries with the same address. When we have multiple line table
+  // entries with the same address, we need to pick the last one in the line
+  // table. We do this because a line entry's start address in the defined by
+  // the line table entry's address and the size is determined by the
+  // subtracting the next line table's address. If the current line table
+  // entry's address is the same as the next one, then there is no code
+  // assiciated with the current line table entry and it should be ignored.
+  //
+  // 0x0000000b: DW_TAG_compile_unit
+  //               DW_AT_name        ("/tmp/main.cpp")
+  //               DW_AT_language    (DW_LANG_C)
+  //               DW_AT_stmt_list   (0x00000000)
+  //
+  // 0x00000015:   DW_TAG_subprogram
+  //                 DW_AT_name      ("foo")
+  //                 DW_AT_low_pc    (0x0000000000001000)
+  //                 DW_AT_high_pc   (0x0000000000001050)
+  //
+  // 0x0000002a:   NULL
+  //
+  // The line table has a duplicate entry at 0x1010:
+  //
+  // Address            Line   Column File   ISA Discriminator Flags
+  // ------------------ ------ ------ ------ --- ------------- -------------
+  // 0x0000000000001000     10      0      1   0             0  is_stmt
+  // 0x0000000000001010     11      0      1   0             0  is_stmt
+  // 0x0000000000001010     12      0      1   0             0  is_stmt
+  // 0x0000000000001050     13      0      1   0             0  is_stmt end_sequence
+
+  StringRef yamldata = R"(
+  debug_str:
+    - ''
+    - '/tmp/main.cpp'
+    - foo
+  debug_abbrev:
+    - ID:              0
+      Table:
+        - Code:            0x1
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_yes
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strp
+            - Attribute:       DW_AT_language
+              Form:            DW_FORM_udata
+            - Attribute:       DW_AT_stmt_list
+              Form:            DW_FORM_sec_offset
+        - Code:            0x2
+          Tag:             DW_TAG_subprogram
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strp
+            - Attribute:       DW_AT_low_pc
+              Form:            DW_FORM_addr
+            - Attribute:       DW_AT_high_pc
+              Form:            DW_FORM_addr
+  debug_info:
+    - Length:          0x27
+      Version:         4
+      AbbrevTableID:   0
+      AbbrOffset:      0x0
+      AddrSize:        8
+      Entries:
+        - AbbrCode:        0x1
+          Values:
+            - Value:           0x1
+            - Value:           0x2
+            - Value:           0x0
+        - AbbrCode:        0x2
+          Values:
+            - Value:           0xF
+            - Value:           0x1000
+            - Value:           0x1050
+        - AbbrCode:        0x0
+  debug_line:
+    - Length:          71
+      Version:         2
+      PrologueLength:  36
+      MinInstLength:   1
+      DefaultIsStmt:   1
+      LineBase:        251
+      LineRange:       14
+      OpcodeBase:      13
+      StandardOpcodeLengths: [ 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 ]
+      IncludeDirs:
+        - '/tmp'
+      Files:
+        - Name:            main.cpp
+          DirIdx:          1
+          ModTime:         0
+          Length:          0
+      Opcodes:
+        - Opcode:          DW_LNS_extended_op
+          ExtLen:          9
+          SubOpcode:       DW_LNE_set_address
+          Data:            4096
+        - Opcode:          DW_LNS_advance_line
+          SData:           9
+          Data:            0
+        - Opcode:          DW_LNS_copy
+          Data:            0
+        - Opcode:          DW_LNS_advance_pc
+          Data:            16
+        - Opcode:          DW_LNS_advance_line
+          SData:           1
+          Data:            0
+        - Opcode:          DW_LNS_copy
+          Data:            0
+        - Opcode:          DW_LNS_advance_line
+          SData:           1
+          Data:            0
+        - Opcode:          DW_LNS_copy
+          Data:            0
+        - Opcode:          DW_LNS_advance_pc
+          Data:            64
+        - Opcode:          DW_LNS_advance_line
+          SData:           1
+          Data:            0
+        - Opcode:          DW_LNS_extended_op
+          ExtLen:          1
+          SubOpcode:       DW_LNE_end_sequence
+          Data:            0
+  )";
+  auto ErrOrSections = DWARFYAML::emitDebugSections(yamldata);
+  ASSERT_THAT_EXPECTED(ErrOrSections, Succeeded());
+  std::unique_ptr<DWARFContext> DwarfContext =
+      DWARFContext::create(*ErrOrSections, 8);
+  ASSERT_TRUE(DwarfContext.get() != nullptr);
+  std::string errors;
+  raw_string_ostream OS(errors);
+  GsymCreator GC;
+  DwarfTransformer DT(*DwarfContext, GC);
+  const uint32_t ThreadCount = 1;
+  ASSERT_THAT_ERROR(DT.convert(ThreadCount, &OS), Succeeded());
+  ASSERT_THAT_ERROR(GC.finalize(OS), Succeeded());
+  OS.flush();
+  SmallString<512> Str;
+  raw_svector_ostream OutStrm(Str);
+  const auto ByteOrder = llvm::endianness::native;
+  FileWriter FW(OutStrm, ByteOrder);
+  ASSERT_THAT_ERROR(GC.encode(FW), Succeeded());
+  Expected<GsymReader> GR = GsymReader::copyBuffer(OutStrm.str());
+  ASSERT_THAT_EXPECTED(GR, Succeeded());
+  // There should be one function in our GSYM.
+  EXPECT_EQ(GR->getNumAddresses(), 1u);
+  // Verify "foo" is present and has a line table and no inline info.
+  auto ExpFI = GR->getFunctionInfo(0x1000);
+  ASSERT_THAT_EXPECTED(ExpFI, Succeeded());
+  ASSERT_EQ(ExpFI->Range, AddressRange(0x1000, 0x1050));
+  EXPECT_TRUE(ExpFI->OptLineTable.has_value());
+  EXPECT_FALSE(ExpFI->Inline.has_value());
+  StringRef FuncName = GR->getString(ExpFI->Name);
+  EXPECT_EQ(FuncName, "foo");
+
+  // Make sure we don't see spurious errors in the output:
+  EXPECT_TRUE(errors.find("error:") == std::string::npos);
+
+  // Make sure that when we lookup address 0x1010, that we get the entry that
+  // matches line 12, the second line entry that also has the address of
+  // 0x1010.
+  auto LR = GR->lookup(0x1010);
+  ASSERT_THAT_EXPECTED(LR, Succeeded());
+  EXPECT_THAT(LR->Locations,
+    testing::ElementsAre(SourceLocation{"foo", "/tmp", "main.cpp", 12, 16}));
+}

Copy link

github-actions bot commented Nov 1, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 4fbbb7ad7c1c96f0fa795e2b1528b93f604e2303 3a089a24df8d1a25e7707e6e867d174e533f7fe3 -- llvm/lib/DebugInfo/GSYM/LineTable.cpp llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp
View the diff from clang-format here.
diff --git a/llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp b/llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp
index cfc27ba80c31..13f3fefabf26 100644
--- a/llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp
+++ b/llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp
@@ -4325,4 +4325,5 @@ TEST(GSYMTest, TestLineTablesWithEmptyRanges) {
   ASSERT_THAT_EXPECTED(LR, Succeeded());
   SourceLocation src_loc = {"foo", "/tmp", "main.cpp", 12, 16};
   EXPECT_THAT(LR->Locations, testing::ElementsAre(src_loc));
-git}
+  git
+}

Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this looks right to me (& at some point it'd be nice to fix LLVM to never produce these zero-length entries, they just take up extra space in the line table for no benefit)

Copy link
Contributor

@ayermolo ayermolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget formatting. :)

…me address.

Compilers emit line tables that have multiple line table entries with the same address. When doing lookups, we always need to use the last line entry if a lookup address matches the address of one or more line entries. This is because the size of an address range for a line uses the next line entry to figure out how big the current line entry is. If the next line entry has the same address, that means the current line entry has a size of zero, so no bytes correspond to the line entry.

This patch ensures that lookups will always pick the last matching line entry when the lookup address matches more than one line entry.
@clayborg
Copy link
Collaborator Author

clayborg commented Nov 1, 2023

Don't forget formatting. :)

Fixed!

@clayborg clayborg merged commit 27033cc into llvm:main Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants