Skip to content

[DWARFVerifier] Fix debug_str_offsets DWARF version detection #81303

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

felipepiovezan
Copy link
Contributor

@felipepiovezan felipepiovezan commented Feb 9, 2024

The DWARF 5 debug_str_offsets section starts with a header, which must be skipped in order to access the underlying strps.

However, the verifier supports some pre-standardization version of this section (with the same section name), which does not have a header. In this case, the offsets start on the first byte of the section. More in 1 and 2 about this legacy section.

How does The DWARF verifier figure out which version to use? It manually reads the first header in debug_info and uses that. This is wrong when multiple debug_str_offset sections have been linked together, in particular it is wrong in the following two cases:

  1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked with a standard DWARF 5 object file.
  2. A non-standard DWARF 4 object file (i.e. containing the header-less debug_str_offsets section) linked with a standard DWARF 5 object file.

Based on discussions in #81210, the legacy version is only possible with dwo files, and dwo files cannot mix the legacy version with the dwarf 5 version. As such, we change the verifier to only check the debug_info header in the case of dwo files. If it sees a dwarf 4 version, it handles it the legacy way.

Note: the modified test was technically testing an unsupported combination of dwarf version + non-dwo sections. To see why, simply note that the test contained no debug_info.dwo sections, so the call to DWARFObject::forEachInfoDWOSections was doing nothing. We were finding the error through the "standard version", which shouldn't happen.

The DWARF 5 debug_str_offsets section starts with a header, which must be
skipped in order to access the underlying `strp`s.

However, the verifier supports some pre-standardization version of this section
(with the same section name), which does not have a header. In this case, the
offsets start on the first byte of the section, although it's not clear where
this is documented.

How does The DWARF verifier figure out which version to use? It manually reads
the **first** header in debug_info and uses that. This is wrong when multiple
debug_str_offset sections have been linked together, in particular it is wrong
in the following two cases:

1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked with a
standard DWARF 5 object file.
2. A non-standard DWARF 4 object file (i.e. containing the header-less
debug_str_offsets section) linked with a standard DWARF 5 object file.

Based on discussions in llvm#81210,
the legacy version is only possible with dwo files, and dwo files cannot mix the
legacy version with the dwarf 5 version. As such, we change the verifier to only
check the debug_info header in the case of dwo files. If it sees a dwarf 4
version, it handles it the legacy way.

Note: the modified test _never_ worked to being with. It was emitting the error
message by accident, because it treated the "legacy" dwarf 4 section as a dwarf
5 section. To see why, simply note that the test contained no `debug_info.dwo`
sections, so the call to DWARFObject::forEachInfoDWOSections was doing nothing.
@felipepiovezan
Copy link
Contributor Author

@dwblaikie please see the two last paragraphs of the commit message, as they are new.
Sadly the test we had originally never worked as intended :(

@llvmbot
Copy link
Member

llvmbot commented Feb 9, 2024

@llvm/pr-subscribers-debuginfo

Author: Felipe de Azevedo Piovezan (felipepiovezan)

Changes

The DWARF 5 debug_str_offsets section starts with a header, which must be skipped in order to access the underlying strps.

However, the verifier supports some pre-standardization version of this section (with the same section name), which does not have a header. In this case, the offsets start on the first byte of the section, although it's not clear where this is documented.

How does The DWARF verifier figure out which version to use? It manually reads the first header in debug_info and uses that. This is wrong when multiple debug_str_offset sections have been linked together, in particular it is wrong in the following two cases:

  1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked with a standard DWARF 5 object file.
  2. A non-standard DWARF 4 object file (i.e. containing the header-less debug_str_offsets section) linked with a standard DWARF 5 object file.

Based on discussions in #81210, the legacy version is only possible with dwo files, and dwo files cannot mix the legacy version with the dwarf 5 version. As such, we change the verifier to only check the debug_info header in the case of dwo files. If it sees a dwarf 4 version, it handles it the legacy way.

Note: the modified test never worked to being with. It was emitting the error message by accident, because it treated the "legacy" dwarf 4 section as a dwarf 5 section. To see why, simply note that the test contained no debug_info.dwo sections, so the call to DWARFObject::forEachInfoDWOSections was doing nothing.


Full diff: https://github.com/llvm/llvm-project/pull/81303.diff

4 Files Affected:

  • (modified) llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h (+3-3)
  • (modified) llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp (+25-21)
  • (added) llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml (+57)
  • (modified) llvm/test/tools/llvm-dwarfdump/X86/verify_invalid_str_offsets.yaml (+9-8)
diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
index ea73664b1e46ca..c2365a4c7cf647 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
@@ -360,9 +360,9 @@ class DWARFVerifier {
   ///
   /// \returns true if the .debug_line verifies successfully, false otherwise.
   bool handleDebugStrOffsets();
-  bool verifyDebugStrOffsets(
-      StringRef SectionName, const DWARFSection &Section, StringRef StrData,
-      void (DWARFObject::*)(function_ref<void(const DWARFSection &)>) const);
+  bool verifyDebugStrOffsets(std::optional<dwarf::DwarfFormat> LegacyFormat,
+                             StringRef SectionName, const DWARFSection &Section,
+                             StringRef StrData);
 
   /// Emits any aggregate information collected, depending on the dump options
   void summarize();
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
index 2124ff835c5727..f56ac9889a296e 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
@@ -1881,33 +1881,37 @@ bool DWARFVerifier::handleDebugStrOffsets() {
   OS << "Verifying .debug_str_offsets...\n";
   const DWARFObject &DObj = DCtx.getDWARFObj();
   bool Success = true;
+
+  // dwo sections may contain the legacy debug_str_offsets format (and they
+  // can't be mixed with dwarf 5's format). This section format contains no
+  // header.
+  // As such, check the version from debug_info and, if we are in the legacy
+  // mode (Dwarf <= 4), extract Dwarf32/Dwarf64.
+  std::optional<DwarfFormat> DwoLegacyDwarf4Format;
+  DObj.forEachInfoDWOSections([&](const DWARFSection &S) {
+    if (DwoLegacyDwarf4Format)
+      return;
+    DWARFDataExtractor DebugInfoData(DObj, S, DCtx.isLittleEndian(), 0);
+    uint64_t Offset = 0;
+    DwarfFormat InfoFormat = DebugInfoData.getInitialLength(&Offset).second;
+    if (uint16_t InfoVersion = DebugInfoData.getU16(&Offset); InfoVersion <= 4)
+      DwoLegacyDwarf4Format = InfoFormat;
+  });
+
   Success &= verifyDebugStrOffsets(
-      ".debug_str_offsets.dwo", DObj.getStrOffsetsDWOSection(),
-      DObj.getStrDWOSection(), &DWARFObject::forEachInfoDWOSections);
+      DwoLegacyDwarf4Format, ".debug_str_offsets.dwo",
+      DObj.getStrOffsetsDWOSection(), DObj.getStrDWOSection());
   Success &= verifyDebugStrOffsets(
-      ".debug_str_offsets", DObj.getStrOffsetsSection(), DObj.getStrSection(),
-      &DWARFObject::forEachInfoSections);
+      /*LegacyFormat=*/std::nullopt, ".debug_str_offsets",
+      DObj.getStrOffsetsSection(), DObj.getStrSection());
   return Success;
 }
 
-bool DWARFVerifier::verifyDebugStrOffsets(
-    StringRef SectionName, const DWARFSection &Section, StringRef StrData,
-    void (DWARFObject::*VisitInfoSections)(
-        function_ref<void(const DWARFSection &)>) const) {
+bool DWARFVerifier::verifyDebugStrOffsets(std::optional<DwarfFormat> LegacyFormat,
+    StringRef SectionName, const DWARFSection &Section, StringRef StrData) {
   const DWARFObject &DObj = DCtx.getDWARFObj();
-  uint16_t InfoVersion = 0;
-  DwarfFormat InfoFormat = DwarfFormat::DWARF32;
-  (DObj.*VisitInfoSections)([&](const DWARFSection &S) {
-    if (InfoVersion)
-      return;
-    DWARFDataExtractor DebugInfoData(DObj, S, DCtx.isLittleEndian(), 0);
-    uint64_t Offset = 0;
-    InfoFormat = DebugInfoData.getInitialLength(&Offset).second;
-    InfoVersion = DebugInfoData.getU16(&Offset);
-  });
 
   DWARFDataExtractor DA(DObj, Section, DCtx.isLittleEndian(), 0);
-
   DataExtractor::Cursor C(0);
   uint64_t NextUnit = 0;
   bool Success = true;
@@ -1915,8 +1919,8 @@ bool DWARFVerifier::verifyDebugStrOffsets(
     DwarfFormat Format;
     uint64_t Length;
     uint64_t StartOffset = C.tell();
-    if (InfoVersion == 4) {
-      Format = InfoFormat;
+    if (LegacyFormat) {
+      Format = *LegacyFormat;
       Length = DA.getData().size();
       NextUnit = C.tell() + Length;
     } else {
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml b/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml
new file mode 100644
index 00000000000000..d10460896171d6
--- /dev/null
+++ b/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml
@@ -0,0 +1,57 @@
+# RUN: yaml2obj %s -o %t.o
+# RUN: llvm-dwarfdump -debug-str-offsets -verify %t.o | FileCheck %s
+
+# CHECK: Verifying .debug_str_offsets...
+# CHECK: No errors
+
+# Check that when mixing standard DWARF 4 debug information with standard DWARF
+# 5 debug information, the verifier correctly interprets the debug_str_offsets
+# section as a standards-conforming DWARF 5 section.
+
+--- !ELF
+FileHeader:
+  Class: ELFCLASS64
+  Data:  ELFDATA2LSB
+  Type:  ET_EXEC
+DWARF:
+  debug_str:
+    - 'cu1'
+    - 'cu2'
+  debug_str_offsets:
+    - Offsets:
+        - 0x0
+  debug_abbrev:
+    - Table:
+        - Code:            0x1
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strp
+        - Code:            0x2
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strx1
+            - Attribute:       DW_AT_str_offsets_base
+              Form:            DW_FORM_sec_offset
+  debug_info:
+    - Version:         4
+      AbbrevTableID:   0
+      AbbrOffset:      0x0
+      AddrSize:        8
+      Entries:
+        - AbbrCode:        0x1
+          Values:
+            - Value:           0x4
+    - Version:         5
+      UnitType:        DW_UT_compile
+      AbbrOffset:      0x0
+      AddrSize:        8
+      AbbrevTableID:   0
+      Entries:
+        - AbbrCode:        0x2
+          Values:
+            - Value:           0x0
+            - Value:           0x8 # str offsets base
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/verify_invalid_str_offsets.yaml b/llvm/test/tools/llvm-dwarfdump/X86/verify_invalid_str_offsets.yaml
index 37f376352a19a4..1bdc640acae6dc 100644
--- a/llvm/test/tools/llvm-dwarfdump/X86/verify_invalid_str_offsets.yaml
+++ b/llvm/test/tools/llvm-dwarfdump/X86/verify_invalid_str_offsets.yaml
@@ -13,7 +13,7 @@
 # CHECK-NEXT: error: .debug_str_offsets: contribution 0x29: length exceeds available space (contribution offset (0x29) + length field space (0x4) + length (0x5000000) == 0x500002D > section size 0x30)
 # Errors detected.
 
-# V4: error: .debug_str_offsets: contribution 0x0: index 0x2: invalid string offset *0x8 == 0x2, is neither zero nor immediately following a null character
+# V4: error: .debug_str_offsets.dwo: contribution 0x0: index 0x2: invalid string offset *0x8 == 0x2, is neither zero nor immediately following a null character
 
 
 #--- v4.yaml
@@ -23,16 +23,17 @@ FileHeader:
   Data:  ELFDATA2LSB
   Type:  ET_EXEC
 DWARF:
-  debug_str:
-    - 'foo'
-    - 'bar'
-  debug_info:
-    - Version: 4
-      AddrSize: 4
 Sections:
-  - Name: '.debug_str_offsets'
+  - Name: '.debug_info.dwo'
+    Type: SHT_PROGBITS
+    Content: "0700000004000000000004"
+  - Name: '.debug_str_offsets.dwo'
     Type: SHT_PROGBITS
     Content: "000000000400000002000000"
+  - Name: 'debug_str.dwo'
+    Type: SHT_PROGBITS
+    Content: "666F6F0062617200"
+
 
 #--- v5.yaml
 --- !ELF

Copy link

github-actions bot commented Feb 9, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@dwblaikie
Copy link
Collaborator

However, the verifier supports some pre-standardization version of this section (with the same section name), which does not have a header. In this case, the offsets start on the first byte of the section, although it's not clear where this is documented.

https://gcc.gnu.org/wiki/DebugFission and https://gcc.gnu.org/wiki/DebugFissionDWP is where this pre-standard stuff was documented

@felipepiovezan
Copy link
Contributor Author

However, the verifier supports some pre-standardization version of this section (with the same section name), which does not have a header. In this case, the offsets start on the first byte of the section, although it's not clear where this is documented.

https://gcc.gnu.org/wiki/DebugFission and https://gcc.gnu.org/wiki/DebugFissionDWP is where this pre-standard stuff was documented

Thank you for the pointers! I will update the commit message. Is the patch itself ok?

@dwblaikie
Copy link
Collaborator

dwblaikie commented Feb 10, 2024

Note: the modified test never worked to being with. It was emitting the error message by accident, because it treated the "legacy" dwarf 4 section as a dwarf 5 section. To see why, simply note that the test contained no debug_info.dwo sections, so the call to DWARFObject::forEachInfoDWOSections was doing nothing.

Not sure I'm following this - the test looks like it did work to test the functionality as it was before this patch (where str_offsets and str_offsets.dwo were treated the same - incorrectly for str_offsets non-dwo, as we've discussed - but for what was implemented, it was tested (it didn't need to test the dwo case specifically, because the implemented behavior wasn't unique to str_offsets.dwo - it was generic to any str offsets))

I'm not sure I understand why the test is changed to use the raw byte encoding for the section, rather than the yaml encoding which is a bit more readable? Does the yaml encoding not support dwo sections? Or some other reason to phrase it that way?

@felipepiovezan
Copy link
Contributor Author

felipepiovezan commented Feb 10, 2024

Does the yaml encoding not support dwo sections?

Sadly, this is the reason! The encodings are tiny enough that I think they are still legible though

  • the test looks like it did work to test the functionality as it was before this patch (where str_offsets and str_offsets.dwo were treated the same

Sorry, I think I used too strong of a wording there, I will rephrase it in the commit message. This comment on the other review made me be a bit too literal on "dwo is not tested":

The legacy DWARFv4 Split DWARF mode str offsets format can only appear in .dwo files

In this sense, the test was finding an error through an unintended path, and the dwo path of the code was never exercised.

@dwblaikie
Copy link
Collaborator

Cool cool, just wanted to make sure we were seeing the same things - thanks for the patch and explanations!

@felipepiovezan
Copy link
Contributor Author

Updated PR message with links for legacy section and re-wording of why the test was changed

@felipepiovezan felipepiovezan merged commit 20948df into llvm:main Feb 12, 2024
@felipepiovezan felipepiovezan deleted the felipe/dwarf-verifier-debug-str-offsets-v2 branch February 12, 2024 17:32
felipepiovezan added a commit to felipepiovezan/llvm-project that referenced this pull request Feb 12, 2024
…1303)

The DWARF 5 debug_str_offsets section starts with a header, which must
be skipped in order to access the underlying `strp`s.

However, the verifier supports some pre-standardization version of this
section (with the same section name), which does not have a header. In
this case, the offsets start on the first byte of the section. More in
[1] and [2] about this legacy section.

How does The DWARF verifier figure out which version to use? It manually
reads the **first** header in debug_info and uses that. This is wrong
when multiple debug_str_offset sections have been linked together, in
particular it is wrong in the following two cases:

1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked
with a standard DWARF 5 object file.
2. A non-standard DWARF 4 object file (i.e. containing the header-less
debug_str_offsets section) linked with a standard DWARF 5 object file.

Based on discussions in llvm#81210,
the legacy version is only possible with dwo files, and dwo files cannot
mix the legacy version with the dwarf 5 version. As such, we change the
verifier to only check the debug_info header in the case of dwo files.
If it sees a dwarf 4 version, it handles it the legacy way.

Note: the modified test was technically testing an unsupported
combination of dwarf version + non-dwo sections. To see why, simply note
that the test contained no `debug_info.dwo` sections, so the call to
DWARFObject::forEachInfoDWOSections was doing nothing. We were finding
the error through the "standard version", which shouldn't happen.

[1]: https://gcc.gnu.org/wiki/DebugFission
[2]: https://gcc.gnu.org/wiki/DebugFissionDWP

(cherry picked from commit 20948df)
felipepiovezan added a commit to swiftlang/llvm-project that referenced this pull request Feb 13, 2024
…verifier-debug-str-offsets-v2

[CherryPick][DWARFVerifier] Fi x debug_str_offsets DWARF version detection (llvm#81303)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants