Skip to content

[lldb] Add more ways to find the .dwp file. #81067

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 21, 2024
Merged

Conversation

clayborg
Copy link
Collaborator

@clayborg clayborg commented Feb 8, 2024

When using split DWARF we can run into many different ways to store debug info:

  • lldb loads <exe> which contains skeleton DWARF and needs to find <exe>.dwp
  • lldb loads <exe> which is stripped but has .gnu_debuglink pointing to <exe>.debug with skeleton DWARF and needs to find <exe>.dwp
  • lldb loads <exe> which is stripped but has .gnu_debuglink pointing to <exe>.debug with skeleton DWARF and needs to find <exe>.debug.dwp
  • lldb loads <exe>.debug and needs to find <exe>.dwp

Previously we only handled the first two cases. This patch adds support for the latter two.

When using split DWARF we can run into many different ways to store debug info:
- lldb loads "<exe>" which contains skeleton DWARF and needs to find "<exe>.dwp"
- lldb loads "<exe>" which is stripped but has .gnu_debuglink pointing to "<exe>.debug" with skeleton DWARF and needs to find "<exe>.dwp"
- lldb loads "<exe>" which is stripped but has .gnu_debuglink pointing to "<exe>.debug" with skeleton DWARF and needs to find "<exe>.debug.dwp"
- lldb loads "<exe>.debug" and needs to find "<exe>.dwp

Previously we only handled the first two cases. This patch adds support for the latter two.
@llvmbot
Copy link
Member

llvmbot commented Feb 8, 2024

@llvm/pr-subscribers-lldb

Author: Greg Clayton (clayborg)

Changes

When using split DWARF we can run into many different ways to store debug info:

  • lldb loads "<exe>" which contains skeleton DWARF and needs to find "<exe>.dwp"
  • lldb loads "<exe>" which is stripped but has .gnu_debuglink pointing to "<exe>.debug" with skeleton DWARF and needs to find "<exe>.dwp"
  • lldb loads "<exe>" which is stripped but has .gnu_debuglink pointing to "<exe>.debug" with skeleton DWARF and needs to find "<exe>.debug.dwp"
  • lldb loads "<exe>.debug" and needs to find "<exe>.dwp

Previously we only handled the first two cases. This patch adds support for the latter two.


Full diff: https://github.com/llvm/llvm-project/pull/81067.diff

3 Files Affected:

  • (modified) lldb/include/lldb/Utility/FileSpecList.h (+4)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp (+44-17)
  • (modified) lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp (+30)
diff --git a/lldb/include/lldb/Utility/FileSpecList.h b/lldb/include/lldb/Utility/FileSpecList.h
index 49edc667ddd5b..6eb3bb9971f13 100644
--- a/lldb/include/lldb/Utility/FileSpecList.h
+++ b/lldb/include/lldb/Utility/FileSpecList.h
@@ -238,6 +238,10 @@ class FileSpecList {
   const_iterator begin() const { return m_files.begin(); }
   const_iterator end() const { return m_files.end(); }
 
+  llvm::iterator_range<const_iterator> files() const {
+    return llvm::make_range(begin(), end());
+  }
+
 protected:
   collection m_files; ///< A collection of FileSpec objects.
 };
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
index 781f5c5a43677..487961fa7437f 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
@@ -4349,26 +4349,53 @@ SymbolFileDWARFDebugMap *SymbolFileDWARF::GetDebugMapSymfile() {
 
 const std::shared_ptr<SymbolFileDWARFDwo> &SymbolFileDWARF::GetDwpSymbolFile() {
   llvm::call_once(m_dwp_symfile_once_flag, [this]() {
+    // Create a list of files to try and append .dwp to
+    FileSpecList symfiles;
+    // Append the module's object file path.
+    const FileSpec module_fspec = m_objfile_sp->GetModule()->GetFileSpec();
+    symfiles.Append(module_fspec);
+    // Append the object file for this SymbolFile only if it is different from
+    // the module's file path. Our main module could be "a.out", our symbol file
+    // could be "a.debug" and our ".dwp" file might be "a.debug.dwp" instead of
+    // "a.out.dwp".
+    const FileSpec symfile_fspec(m_objfile_sp->GetFileSpec());
+    if (symfile_fspec != module_fspec) {
+      symfiles.Append(symfile_fspec);
+    } else {
+      // If we don't have a separate debug info file, then try stripping the
+      // extension. We main module could be "a.debug" and the .dwp file could be
+      // "a.dwp" instead of "a.debug.dwp".
+      ConstString filename_no_ext =
+          module_fspec.GetFileNameStrippingExtension();
+      if (filename_no_ext != module_fspec.GetFilename()) {
+        FileSpec module_spec_no_ext(module_fspec);
+        module_spec_no_ext.SetFilename(filename_no_ext);
+        symfiles.Append(module_spec_no_ext);
+      }
+    }
+
+    FileSpecList search_paths = Target::GetDefaultDebugFileSearchPaths();
     ModuleSpec module_spec;
     module_spec.GetFileSpec() = m_objfile_sp->GetFileSpec();
-    module_spec.GetSymbolFileSpec() =
-        FileSpec(m_objfile_sp->GetModule()->GetFileSpec().GetPath() + ".dwp");
-
     module_spec.GetUUID() = m_objfile_sp->GetUUID();
-    FileSpecList search_paths = Target::GetDefaultDebugFileSearchPaths();
-    FileSpec dwp_filespec =
-        PluginManager::LocateExecutableSymbolFile(module_spec, search_paths);
-    if (FileSystem::Instance().Exists(dwp_filespec)) {
-      DataBufferSP dwp_file_data_sp;
-      lldb::offset_t dwp_file_data_offset = 0;
-      ObjectFileSP dwp_obj_file = ObjectFile::FindPlugin(
-          GetObjectFile()->GetModule(), &dwp_filespec, 0,
-          FileSystem::Instance().GetByteSize(dwp_filespec), dwp_file_data_sp,
-          dwp_file_data_offset);
-      if (!dwp_obj_file)
-        return;
-      m_dwp_symfile = std::make_shared<SymbolFileDWARFDwo>(
-          *this, dwp_obj_file, DIERef::k_file_index_mask);
+    for (const auto &symfile : symfiles.files()) {
+      module_spec.GetSymbolFileSpec() =
+          FileSpec(symfile.GetPath() + ".dwp", symfile.GetPathStyle());
+      FileSpec dwp_filespec =
+          PluginManager::LocateExecutableSymbolFile(module_spec, search_paths);
+      if (FileSystem::Instance().Exists(dwp_filespec)) {
+        DataBufferSP dwp_file_data_sp;
+        lldb::offset_t dwp_file_data_offset = 0;
+        ObjectFileSP dwp_obj_file = ObjectFile::FindPlugin(
+            GetObjectFile()->GetModule(), &dwp_filespec, 0,
+            FileSystem::Instance().GetByteSize(dwp_filespec), dwp_file_data_sp,
+            dwp_file_data_offset);
+        if (!dwp_obj_file)
+          return;
+        m_dwp_symfile = std::make_shared<SymbolFileDWARFDwo>(
+            *this, dwp_obj_file, DIERef::k_file_index_mask);
+        break;
+      }
     }
   });
   return m_dwp_symfile;
diff --git a/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp b/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp
index a47209931c384..408bdcb3fbd99 100644
--- a/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp
+++ b/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp
@@ -1,5 +1,6 @@
 // REQUIRES: lld
 
+// Now test with DWARF5
 // RUN: %clang -target x86_64-pc-linux -gsplit-dwarf -gdwarf-5 -c %s -o %t.dwarf5.o
 // RUN: ld.lld %t.dwarf5.o -o %t.dwarf5
 // RUN: llvm-dwp %t.dwarf5.dwo -o %t.dwarf5.dwp
@@ -34,6 +35,19 @@
 // RUN:   -o "statistics dump" \
 // RUN:   %t.dwarf5 -b | FileCheck %s -check-prefix=CACHED
 
+// Make sure that if we load the "%t.dwarf5.debug" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf5.dwp"
+// RUN: %lldb %t.dwarf5.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
+// Make sure that if we load the "%t.dwarf5" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf5.debug.dwp"
+// RUN: mv %t.dwarf5.dwp %t.dwarf5.debug.dwp
+// RUN: %lldb %t.dwarf5.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
+// Make sure that if we load the "%t.dwarf5.debug" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf5.debug.dwp"
+// RUN: %lldb %t.dwarf5.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
 // Now test with DWARF4
 // RUN: %clang -target x86_64-pc-linux -gsplit-dwarf -gdwarf-4 -c %s -o %t.dwarf4.o
 // RUN: ld.lld %t.dwarf4.o -o %t.dwarf4
@@ -69,6 +83,19 @@
 // RUN:   -o "statistics dump" \
 // RUN:   %t.dwarf4 -b | FileCheck %s -check-prefix=CACHED
 
+// Make sure that if we load the "%t.dwarf4.debug" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf4.dwp"
+// RUN: %lldb %t.dwarf4.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
+// Make sure that if we load the "%t.dwarf4" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf4.debug.dwp"
+// RUN: mv %t.dwarf4.dwp %t.dwarf4.debug.dwp
+// RUN: %lldb %t.dwarf4.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
+// Make sure that if we load the "%t.dwarf4.debug" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf4.debug.dwp"
+// RUN: %lldb %t.dwarf4.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
 // CHECK: (A) a = (x = 47)
 
 // CACHE: script lldb.target.modules[0].FindTypes('::A').GetTypeAtIndex(0)
@@ -83,6 +110,9 @@
 // CACHED-NEXT: }
 // CACHED: "totalDebugInfoIndexLoadedFromCache": 1
 
+// Make sure debug information was loaded by verifying that the
+// DEBUG: Breakpoint 1: where = dwp-separate-debug-file.cpp.tmp.dwarf{{[45]}}.debug`main + {{[0-9]+}} at dwp-separate-debug-file.cpp:{{[0-9]+}}:{{[0-9]+}}, address = {{0x[0-9a-fA-F]+}}
+
 struct A {
   int x = 47;
 };

@ayermolo
Copy link
Contributor

ayermolo commented Feb 8, 2024

Will this now work with .dwp files not having UUID?

@kevinfrei
Copy link
Contributor

Will this now work with .dwp files not having UUID?

The lack of UUID is kinda why this is so important. The connection is strictly based on the filename. This just expands the variety of filenames that can be supported. One thing that's helpful is that the .gnu_debuglink can be a relative/absolute path, so it enables putting the files in a different location than right beside the binary, which is definitely an improvement.

@clayborg
Copy link
Collaborator Author

clayborg commented Feb 8, 2024

Will this now work with .dwp files not having UUID?

No. If binairies have UUIDs (GNU build IDs), they need to match right now. That is larger fix that involves adding a "enum UUIDFlavor" to the UUIDs so we can ensure we aren't comparing two different things.

What Alexander is talking about is if we have a GNU build ID in <exe>, the <exe>.debug file will have the same UUID, but llvm-dwp currently doesn't copy the GNU build ID over into the .dwp file. This causes LLDB to not allow the .dwp file to be loaded. The problem is if the .dwp file doesn't have a UUID, it will make one up by calculating a CRC of the file itself, and then we will compare a GNU build ID from <exe> to the CRC calculated by the .dwp file and they won't match.

@dwblaikie do you know how accurate the DWO ID is? Can we avoid relying on matching up the UUID on the .dwp file and solely rely on allowing it to be loaded and rely on the DWO IDs matching between the skeleton unit and the .dwo unit? If so, there is an easy fix I can make to this patch to solve that problem.

@ayermolo
Copy link
Contributor

ayermolo commented Feb 8, 2024

Will this now work with .dwp files not having UUID?

No. If binairies have UUIDs (GNU build IDs), they need to match right now. That is larger fix that involves adding a "enum UUIDFlavor" to the UUIDs so we can ensure we aren't comparing two different things.

What Alexander is talking about is if we have a GNU build ID in <exe>, the <exe>.debug file will have the same UUID, but llvm-dwp currently doesn't copy the GNU build ID over into the .dwp file. This causes LLDB to not allow the .dwp file to be loaded. The problem is if the .dwp file doesn't have a UUID, it will make one up by calculating a CRC of the file itself, and then we will compare a GNU build ID from <exe> to the CRC calculated by the .dwp file and they won't match.

@dwblaikie do you know how accurate the DWO ID is? Can we avoid relying on matching up the UUID on the .dwp file and solely rely on allowing it to be loaded and rely on the DWO IDs matching between the skeleton unit and the .dwo unit? If so, there is an easy fix I can make to this patch to solve that problem.

Not sure I follow. For .dwo files path is described in Skeleton CU: DW_AT_comp_dir/DW_AT_dwo_name. The DWP file can have multiple CUs with different DWO IDs.

@dwblaikie
Copy link
Collaborator

FWIW, I think we should be opinionated (& consistent with whatever gdb does, if it has some precedent here - if it is less opinionated, then maybe we have to be accepting) when it comes to whether x.debug goes with x.dwp or x.debug.dwp - we shouldn't support both unless there's some prior precedent that's unavoidable to support. It'd be better for everyone if there was one option we encourage people to follow. (so I think we shouldn't support (2) and (3) if we can help it - we should pick one)

I'm not sure I understand the "lldb loads .debug and needs to find .dwp" case. the .debug file usually only has the debug info, not the executable code, so you can't load it directly, right? (see the documentation in https://sourceware.org/gdb/current/onlinedocs/gdb.html/Separate-Debug-Files.html - objcopy --only-keep-debug foo foo.debug).

@clayborg
Copy link
Collaborator Author

clayborg commented Feb 9, 2024

Will this now work with .dwp files not having UUID?

No. If binairies have UUIDs (GNU build IDs), they need to match right now. That is larger fix that involves adding a "enum UUIDFlavor" to the UUIDs so we can ensure we aren't comparing two different things.
What Alexander is talking about is if we have a GNU build ID in <exe>, the <exe>.debug file will have the same UUID, but llvm-dwp currently doesn't copy the GNU build ID over into the .dwp file. This causes LLDB to not allow the .dwp file to be loaded. The problem is if the .dwp file doesn't have a UUID, it will make one up by calculating a CRC of the file itself, and then we will compare a GNU build ID from <exe> to the CRC calculated by the .dwp file and they won't match.
@dwblaikie do you know how accurate the DWO ID is? Can we avoid relying on matching up the UUID on the .dwp file and solely rely on allowing it to be loaded and rely on the DWO IDs matching between the skeleton unit and the .dwo unit? If so, there is an easy fix I can make to this patch to solve that problem.

Not sure I follow. For .dwo files path is described in Skeleton CU: DW_AT_comp_dir/DW_AT_dwo_name. The DWP file can have multiple CUs with different DWO IDs.

Exactly. The question is, if a have one a.out.dwp file that is out of date, and one of the .dwo files was recompiled into the a.out binary, but the a.out.dwp file wasn't recreated and if it was actually allowed to be loaded without comparing the GNU build ID, will the DWO ID be unique if the skeleton unit has a DW_AT_dwo_name with "foo.o" that has a DWO ID of 0x123 and if we lookup up "foo.o" in the old .dwp file, will the DWO ID mismatch be enough for us to ensure we don't load the .dwo info. If the answer is yes, which implied the DWO ID is some sort of checksum or signature that always changes when a specific file is rebuilt, then it is ok to not match a UUID on a .dwp file. Seeing as no .dwp files actually have a GNU build ID right now unless people manually add it, it seems like we should allow loading any .dwp regardless of the lack of a GNU build ID and deal with the mismatches from the DWO ID. If the DWO ID is just a hash of the file path or something that isn't guaranteed to be unique with each new build, then we need the UUID in the .dwp file.

@dwblaikie
Copy link
Collaborator

dwblaikie commented Feb 9, 2024

If the DWO ID is just a hash of the file path or something that isn't guaranteed to be unique with each new build, then we need the UUID in the .dwp file.

Nah, the DWO ID, as per spec, is a semantic hash of the DWARF contents. It should change, generally, if any part of the DWARF changes.

/// This is based on the type signature computation given in section 7.27 of the
/// DWARF4 standard. It is an md5 hash of the flattened description of the DIE
/// with the inclusion of the full CU and all top level CU entities.
// TODO: Initialize the type chain at 0 instead of 1 for CU signatures.
uint64_t DIEHash::computeCUSignature(StringRef DWOName, const DIE &Die) {

@clayborg
Copy link
Collaborator Author

clayborg commented Feb 9, 2024

FWIW, I think we should be opinionated (& consistent with whatever gdb does, if it has some precedent here - if it is less opinionated, then maybe we have to be accepting) when it comes to whether x.debug goes with x.dwp or x.debug.dwp - we shouldn't support both unless there's some prior precedent that's unavoidable to support. It'd be better for everyone if there was one option we encourage people to follow. (so I think we shouldn't support (2) and (3) if we can help it - we should pick one)

There currently is no enforcement on any of this from the compiler or linker driver side and everyone must do the stripping of the executable and creating the .dwp file manually. Since there is no actual enforcement, it seems like we should support both IMHO. More on this below...

I'm not sure I understand the "lldb loads .debug and needs to find .dwp" case. the .debug file usually only has the debug info, not the executable code, so you can't load it directly, right? (see the documentation in https://sourceware.org/gdb/current/onlinedocs/gdb.html/Separate-Debug-Files.html - objcopy --only-keep-debug foo foo.debug).

If we have a build system that creates:

  • a.out which is stripped and shipped for distribution
  • a.out.debug which contains debug info only with the skeleton units, address table, line tables, accelerator tables
  • a.out.dwp or a.out.debug.dwp which contains the .dwo files for a.out.debug

We might request to download debug info for this build using a GNU build ID or some other build number for say a symbolication server which wants to use LLDB. If we get this debug info, we might only download a.out.debug and the associated .dwp file (either a.out.dwp or a.out.debug.dwp). It would be a shame to require that we download the a.out binary as well just so that we can load this into a debugger and get the .dwp file to load automatically.

With this change people can load a.out.debug into LLDB and peruse the debug information and perform symbolication without having to download the original a.out, a file which is not required for symbolication. If we downloaded a.out.debug + a.out.dwp, unless a user knows to rename a.out.dwp to a.out.debug.dwp, or create a symlink, then they won't get any info from the .dwo files in the .dwp file.

If we don't fix this, then people will get confused and not know how to fix things and get full debug info to get loaded. No one besides debugger, compiler, and linker engineers and very few other people actually know what split DWARF does and or means and if they can't get things to work, they give up or waste time figuring out the magic things that need to happen to make us able to load the right debug info. So I prefer to allow things to work more seamlessly without trying to enforce something that is left up to users to do in the first place. I am someone that often responds to peoples posts when I am on call and I need to help people figure out how to load their debug symbols correrctl.

With split DWARF it is even worse because if they get the debug info file with the skeleton compile units but the .dwp file isn't named consistent with GDB conventions, then they can set file and line breakpoints and hit them, but as soon as they stop there, they get no variable information because the .dwo file doesn't get loaded. They don't know what a .dwo or a .dwp file is or why it is needed. We get a few posts a week with people trying to use split DWARF symbols and not being able to get things to work.

So IMHO there is no harm in making all scenarios work without having to require some naming convention that isn't actually enforced by any compiler or linker driver, and when things don't work, we put the pressure on the user to find some wiki somewhere that says "here is how you were supposed to do it" and figure out how to fix the issue themselves or by seeking help from us.

Fixed:
- added tests for missing .dwp files for both DWARF4 and DWARF5
- added a tests where we create a file with a GNU build ID where the stripped executable and debug file have UUIDs and the .dwp doesn't and make sure it loads the .dwp file.
- fix a dwarf4 and dwarf5 test that was testing the wrong binary
- fixed comments
- don't use break + return
Copy link

github-actions bot commented Feb 9, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@clayborg
Copy link
Collaborator Author

clayborg commented Feb 9, 2024

Will this now work with .dwp files not having UUID?

I actually added a test for this and it works just fine if the main executable (a.out) has a UUID and the debug info file (a.out.debug) also has the UUID, but the .dwp file doesn't, we are able to load the .dwp file just fine.

@clayborg clayborg changed the title Add more ways to find the .dwp file. [lldb] Add more ways to find the .dwp file. Feb 9, 2024
@dwblaikie
Copy link
Collaborator

FWIW, I think we should be opinionated (& consistent with whatever gdb does, if it has some precedent here - if it is less opinionated, then maybe we have to be accepting) when it comes to whether x.debug goes with x.dwp or x.debug.dwp - we shouldn't support both unless there's some prior precedent that's unavoidable to support. It'd be better for everyone if there was one option we encourage people to follow. (so I think we shouldn't support (2) and (3) if we can help it - we should pick one)

There currently is no enforcement on any of this from the compiler or linker driver side and everyone must do the stripping of the executable and creating the .dwp file manually. Since there is no actual enforcement, it seems like we should support both IMHO. More on this below...

I'm not sure I understand the "lldb loads .debug and needs to find .dwp" case. the .debug file usually only has the debug info, not the executable code, so you can't load it directly, right? (see the documentation in https://sourceware.org/gdb/current/onlinedocs/gdb.html/Separate-Debug-Files.html - objcopy --only-keep-debug foo foo.debug).

If we have a build system that creates:

  • a.out which is stripped and shipped for distribution
  • a.out.debug which contains debug info only with the skeleton units, address table, line tables, accelerator tables
  • a.out.dwp or a.out.debug.dwp which contains the .dwo files for a.out.debug

We might request to download debug info for this build using a GNU build ID or some other build number for say a symbolication server which wants to use LLDB. If we get this debug info, we might only download a.out.debug and the associated .dwp file (either a.out.dwp or a.out.debug.dwp). It would be a shame to require that we download the a.out binary as well just so that we can load this into a debugger and get the .dwp file to load automatically.

Ah, OK, yeah I can see how for symbolizing that could be done with just the .debug and .dwp files, without the original executable. Thanks for explaining. (I wouldn't've expected someone to use a /debugger/ in this case, rather than llvm-symbolizer, addr2line, etc - but don't object to a debugger supporting that functionality)

If we don't fix this, then people will get confused and not know how to fix things and get full debug info to get loaded. No one besides debugger, compiler, and linker engineers and very few other people actually know what split DWARF does and or means and if they can't get things to work, they give up or waste time figuring out the magic things that need to happen to make us able to load the right debug info. So I prefer to allow things to work more seamlessly without trying to enforce something that is left up to users to do in the first place. I am someone that often responds to peoples posts when I am on call and I need to help people figure out how to load their debug symbols correrctl.

My concern with supporting a wider variety of naming conventions is the risk of fragmentation and more user confusion - if there's multiple naming conventions, there's a risk that some tools end up supporting some of them but not others, and users are confused why their files work with some tools and not others.

I think we'd be doing users a favor, in terms of focussing the ecosystem on fewer permutations, by being opinionated on how these things should be named.

We are going to be somewhat opinionated regardless (we are using /some/ naming convention rules to derive .dwp/.debug names from other names - so we're only haggling over how many different naming conventions we should support, not whether or not we should limit users - they'll always be limited to some set of rules we pick).

With split DWARF it is even worse because if they get the debug info file with the skeleton compile units but the .dwp file isn't named consistent with GDB conventions, then they can set file and line breakpoints and hit them, but as soon as they stop there, they get no variable information because the .dwo file doesn't get loaded. They don't know what a .dwo or a .dwp file is or why it is needed. We get a few posts a week with people trying to use split DWARF symbols and not being able to get things to work.

Yeah, it's certainly a cost that comes with the benefits of Split DWARF - they should get suitable warnings/error messages, and perhaps those should link to an article describing what they're talking about if necessary.

So IMHO there is no harm in making all scenarios work without having to require some naming convention that isn't actually enforced by any compiler or linker driver, and when things don't work, we put the pressure on the user to find some wiki somewhere that says "here is how you were supposed to do it" and figure out how to fix the issue themselves or by seeking help from us.

"all" scenarios can't work (because we have to be able to derive one name from another) It'll always require some naming convention be met - we're only haggling over how many naming conventions we should support. And I think the harm in making more naming conventions work, is the risk that users end up with a variet of naming conventions in use, and other tools that need to handle these cases might not learn all those conventions - and so there will be varying/confusing support across a variety of tools.

I think we do users a favour by limiting the ecosystem to fewer conventions.

And in this case, I think that rule would be , .debug, and .dwp

@clayborg
Copy link
Collaborator Author

clayborg commented Feb 10, 2024

I am fine with telling people what to do and giving them a golden path to what is easiest for our debuggers. And I will suggest to everyone that they use .debug and .dwp, but if we want to only support this, this leaves the downloading of the .debug file requiring a rename from .dwp to .debug.dwp in order for it to work for people. So then everything in this patch should be supported to allow loading the .debug file with a .dwp like we will encourage people to do.

It would also be nice if we do have a single .debug file that has debug info only, it would be nice to allow it and the .dwp file to be combined into a single file. There is no reason for them to be separate anymore once we have a.out stripped, it would be nice to only require a.out.debug which contains the .dwp sections inside of it already instead of requiring people to have two files needed for debug info.

@clayborg
Copy link
Collaborator Author

Any chance we can get this in?

m_dwp_symfile = std::make_shared<SymbolFileDWARFDwo>(
*this, dwp_obj_file, DIERef::k_file_index_mask);
for (const auto &symfile : symfiles.files()) {
module_spec.GetSymbolFileSpec() =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a logging for each probing entries.

One thing I noticed that Microsoft debugger (like windbg) does well is they provide an option to show debug artifacts probing paths for both local and symbol servers. This provides an important self servicing capability for end users - even they do not know the debug info file locating algorithms, they can use the probing paths to figure out and rename to match the expected probing.

GetObjectFile()->GetModule(), &dwp_filespec, 0,
FileSystem::Instance().GetByteSize(dwp_filespec), dwp_file_data_sp,
dwp_file_data_offset);
if (dwp_obj_file) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar, adding a logging saying - found a matching dwp file and use it.

@dwblaikie
Copy link
Collaborator

I am fine with telling people what to do and giving them a golden path to what is easiest for our debuggers. And I will suggest to everyone that they use .debug and .dwp, but if we want to only support this, this leaves the downloading of the .debug file requiring a rename from .dwp to .debug.dwp in order for it to work for people. So then everything in this patch should be supported to allow loading the .debug file with a .dwp like we will encourage people to do.

Not sure I follow - one of the scenarios mentioned in this patch is

"lldb loads which is stripped but has .gnu_debuglink pointing to .debug with skeleton DWARF and needs to find .debug.dwp"

I don't think we should support that, for instance - we should load <exe>.dwp in that case.

It would also be nice if we do have a single .debug file that has debug info only, it would be nice to allow it and the .dwp file to be combined into a single file. There is no reason for them to be separate anymore once we have a.out stripped, it would be nice to only require a.out.debug which contains the .dwp sections inside of it already instead of requiring people to have two files needed for debug info.

Maybe? I figure once you've got to download one file, two isn't a substantial imposition... - it'd be a bit weird having a DWP file and a .debug file mashed up together, but can't see any reason it wouldn't work - with the logic of "check if this program has a cu_index in it, if so, treat it as a dwp, otherwise look for .dwp, otherwise look for the dwos".

@clayborg
Copy link
Collaborator Author

I am fine with telling people what to do and giving them a golden path to what is easiest for our debuggers. And I will suggest to everyone that they use .debug and .dwp, but if we want to only support this, this leaves the downloading of the .debug file requiring a rename from .dwp to .debug.dwp in order for it to work for people. So then everything in this patch should be supported to allow loading the .debug file with a .dwp like we will encourage people to do.

Not sure I follow - one of the scenarios mentioned in this patch is

"lldb loads which is stripped but has .gnu_debuglink pointing to .debug with skeleton DWARF and needs to find .debug.dwp"

I don't think we should support that, for instance - we should load <exe>.dwp in that case.

If the client strips the debug info first into "a.out.debug" and then runs llvm-dwp, they will end up with a "a.out.debug.dwp". We have clients that are doing this already and we want to support them. The compiler and linker drivers are staying out of this and we expect people to do this on their own, so this is what we end up with when there is no enforcement. I am not sure why this is such a sticking point. Lets make the debugger work for people.

It would also be nice if we do have a single .debug file that has debug info only, it would be nice to allow it and the .dwp file to be combined into a single file. There is no reason for them to be separate anymore once we have a.out stripped, it would be nice to only require a.out.debug which contains the .dwp sections inside of it already instead of requiring people to have two files needed for debug info.

Maybe? I figure once you've got to download one file, two isn't a substantial imposition... - it'd be a bit weird having a DWP file and a .debug file mashed up together, but can't see any reason it wouldn't work - with the logic of "check if this program has a cu_index in it, if so, treat it as a dwp, otherwise look for .dwp, otherwise look for the dwos".

Just more things that can go wrong for clients as they try to use split DWARF. But this fix isn't about that, that was just a tangent.

Logging can be enabled with "log enable dwarf split". Also added logging to some tests.
@dwblaikie
Copy link
Collaborator

I am fine with telling people what to do and giving them a golden path to what is easiest for our debuggers. And I will suggest to everyone that they use .debug and .dwp, but if we want to only support this, this leaves the downloading of the .debug file requiring a rename from .dwp to .debug.dwp in order for it to work for people. So then everything in this patch should be supported to allow loading the .debug file with a .dwp like we will encourage people to do.

Not sure I follow - one of the scenarios mentioned in this patch is
"lldb loads which is stripped but has .gnu_debuglink pointing to .debug with skeleton DWARF and needs to find .debug.dwp"
I don't think we should support that, for instance - we should load <exe>.dwp in that case.

If the client strips the debug info first into "a.out.debug" and then runs llvm-dwp, they will end up with a "a.out.debug.dwp". We have clients that are doing this already and we want to support them.

OK, could we fix llvm-dwp to match the behavior, then? If the file has a .debug extension, strip that and add the .dwp extension.

The compiler and linker drivers are staying out of this and we expect people to do this on their own, so this is what we end up with when there is no enforcement.

They aren't doing it on their own though - they're using llvm-dwp and its defaults (they're passing it a .debug file and getting a .debug.dwp file - it's the defaults you/we are worried about, and how to make other tools work well with those defaults). We can change those defaults if they don't work well/don't create a consistent environment.

I am not sure why this is such a sticking point. Lets make the debugger work for people.

As I explained above - my concern is that supporting a wider variety of ways these files can be named/arranged means more variants that need to be supported across a variety of tooling (symbolizers and debuggers - not just LLVM's but binutils, etc too).

But that's my 2c - if LLDB owners prefer this direction, so be it. Wouldn't mind hearing some other people's perspectives on the issues around limiting variation here.

@clayborg
Copy link
Collaborator Author

I am fine with telling people what to do and giving them a golden path to what is easiest for our debuggers. And I will suggest to everyone that they use .debug and .dwp, but if we want to only support this, this leaves the downloading of the .debug file requiring a rename from .dwp to .debug.dwp in order for it to work for people. So then everything in this patch should be supported to allow loading the .debug file with a .dwp like we will encourage people to do.

Not sure I follow - one of the scenarios mentioned in this patch is

"lldb loads which is stripped but has .gnu_debuglink pointing to .debug with skeleton DWARF and needs to find .debug.dwp"

I don't think we should support that, for instance - we should load <exe>.dwp in that case.

We currently have this happening here at Facebook all over the place. We have tools that convert DWARF to GSYM and they don't need the stripped executable at all, they

It would also be nice if we do have a single .debug file that has debug info only, it would be nice to allow it and the .dwp file to be combined into a single file. There is no reason for them to be separate anymore once we have a.out stripped, it would be nice to only require a.out.debug which contains the .dwp sections inside of it already instead of requiring people to have two files needed for debug info.

Maybe? I figure once you've got to download one file, two isn't a substantial imposition... - it'd be a bit weird having a DWP file and a .debug file mashed up together, but can't see any reason it wouldn't work - with the logic of "check if this program has a cu_index in it, if so, treat it as a dwp, otherwise look for .dwp, otherwise look for the dwos".

I am fine with telling people what to do and giving them a golden path to what is easiest for our debuggers. And I will suggest to everyone that they use .debug and .dwp, but if we want to only support this, this leaves the downloading of the .debug file requiring a rename from .dwp to .debug.dwp in order for it to work for people. So then everything in this patch should be supported to allow loading the .debug file with a .dwp like we will encourage people to do.

Not sure I follow - one of the scenarios mentioned in this patch is
"lldb loads which is stripped but has .gnu_debuglink pointing to .debug with skeleton DWARF and needs to find .debug.dwp"
I don't think we should support that, for instance - we should load <exe>.dwp in that case.

If the client strips the debug info first into "a.out.debug" and then runs llvm-dwp, they will end up with a "a.out.debug.dwp". We have clients that are doing this already and we want to support them.

OK, could we fix llvm-dwp to match the behavior, then? If the file has a .debug extension, strip that and add the .dwp extension.

Here people are not using ".debug", but are using ".debuginfo"... Again, nothing is enforced and people are left to use llvm-objcopy + llvm-dwp how ever they want. Getting a solution that does everything might be nice. Any thoughts on modifying llvm-dwp to be able to do all of this and provide some path for people where it can either just create a .dwp file for a given executable or it can create a .debug file, strip the original file, and create a .dwp file?

The compiler and linker drivers are staying out of this and we expect people to do this on their own, so this is what we end up with when there is no enforcement.

They aren't doing it on their own though - they're using llvm-dwp and its defaults (they're passing it a .debug file and getting a .debug.dwp file - it's the defaults you/we are worried about, and how to make other tools work well with those defaults). We can change those defaults if they don't work well/don't create a consistent environment.

I am not sure why this is such a sticking point. Lets make the debugger work for people.

As I explained above - my concern is that supporting a wider variety of ways these files can be named/arranged means more variants that need to be supported across a variety of tooling (symbolizers and debuggers - not just LLVM's but binutils, etc too).

But that's my 2c - if LLDB owners prefer this direction, so be it. Wouldn't mind hearing some other people's perspectives on the issues around limiting variation here.

I am happy to hear any other opinions as well. I tend to want to make my life easier and ease the support burden I run into everyday where people that know nothing about split DWARF are trying to use it and failing and require tech support to make it work for them. I am happy to suggest a path to follow, in fact I am going to write up the best practices on a DWARF group here at work that I can point poeple to.

@dwblaikie
Copy link
Collaborator

If the client strips the debug info first into "a.out.debug" and then runs llvm-dwp, they will end up with a "a.out.debug.dwp". We have clients that are doing this already and we want to support them.

OK, could we fix llvm-dwp to match the behavior, then? If the file has a .debug extension, strip that and add the .dwp extension.

Here people are not using ".debug", but are using ".debuginfo"...

They could only use that for symbolizing, yeah? They wouldn't be able to debug their binary, because a debugger wouldn't know that, given the stripped binary, they need to append .debuginfo to find the debug info, right? (but I think debuggers do currently have the ability to add ".debug" to find the debug info) - or maybe it doesn't? (I guess reading https://sourceware.org/gdb/current/onlinedocs/gdb.html/Separate-Debug-Files.html - it's never a simple mapping from binary to binary.debug - either it uses debuglink, in which case the filename is encoded in the debuglink and could be anything, or it's buildID, in which case it is binary.debug, but in a buildID-named directory)

Again, nothing is enforced and people are left to use llvm-objcopy + llvm-dwp how ever they want. Getting a solution that does everything might be nice. Any thoughts on modifying llvm-dwp to be able to do all of this and provide some path for people where it can either just create a .dwp file for a given executable or it can create a .debug file, strip the original file, and create a .dwp file?

I don't know that llvm-dwp needs to do everything, that's a bit against the grain of *nix tool design. But certainly llvm-objcopy could be more ergonomic (like a one-shot, that both strips the debug info from the binary, and produces the keep-only-debug in binary.debug - an even smaller feature would be, like llvm-dwp has a default -o of binary.dwp, llvm-objcopy could have a default output file when using --only-keep-debug of binary.debug, and llvm-dwp could grow a special case for if the file ends in .debug, strip that before adding .dwp (maybe more generally, strip any .word file extension before adding .dwp though that might be overly aggressive on *nix systems where users don't expect .suffix to be special cased at all)

So I guess that'd be the smallest two features I'd suggest starting with

  1. llvm-objcopy (& ideally, if someone wanetd to write the patch, binutils objcopy) defaulting to binary.debug as the output file when using --only-keep-debug
  2. llvm-dwp stripping the .debug suffix, if it appears, before adding .dwp (this one's a bit tricky/I could imagine some disagreement, because someone mightve just named their real binary "blah.debug" but I'm not sure there's much else to do in that case if we're going to support doing things with debug info with the --only-keep-debug file, and with the original unstripped binary, and with the stripped binary+--only-keep-debug`, etc... have to have some basename to derive everything from)

A wrapper script or program (maybe it could be built into llvm-objcopy, but I'm a little hesitant there, but would be open to other folks opinion on it for sure) that ingests the binary and produces the 3 products (stripped binary, binary.debug, binary.dwp) seems easy enough to provide.

The compiler and linker drivers are staying out of this and we expect people to do this on their own, so this is what we end up with when there is no enforcement.

They aren't doing it on their own though - they're using llvm-dwp and its defaults (they're passing it a .debug file and getting a .debug.dwp file - it's the defaults you/we are worried about, and how to make other tools work well with those defaults). We can change those defaults if they don't work well/don't create a consistent environment.

I am not sure why this is such a sticking point. Lets make the debugger work for people.

As I explained above - my concern is that supporting a wider variety of ways these files can be named/arranged means more variants that need to be supported across a variety of tooling (symbolizers and debuggers - not just LLVM's but binutils, etc too).
But that's my 2c - if LLDB owners prefer this direction, so be it. Wouldn't mind hearing some other people's perspectives on the issues around limiting variation here.

I am happy to hear any other opinions as well. I tend to want to make my life easier and ease the support burden I run into everyday where people that know nothing about split DWARF are trying to use it and failing and require tech support to make it work for them. I am happy to suggest a path to follow, in fact I am going to write up the best practices on a DWARF group here at work that I can point poeple to.

Yeah, I do want people to not have problems here/for things to work - I worry that making too many different things work is good short term (yay, many users are happy without having to change anything), but harmful long term when there's a wider variety of ways people do things and then new tools have to learn all those ways (& probably don't learn them all in one go - so we end up with different tools implementing subsets of all possible lookup rules, which is more confusing/problematic for users).

@ayermolo
Copy link
Contributor

Just my 2 cents as a "random dude who works on DWARF". The interoperability of various gnu extensions and DWARF spec is not well defined. Which leads to situations like this.
If binary is A then DWP is A.dwp no direct link between binary and dwp file.
If binary has gnulink than it has a section that points to an elf file with rest of the debuginfo. Most people name it as binary.debuglink.
If binary has split dwarf and gnulink.... not well defined I think.

It would be great if tools took an option away from the user that "create" debug information or at least restricted so that all of this was standardized, but currently they are not. Even if we change them there are still legacy builds that is a free for all.
I think we are already at a "people are confused" point, and DWARF consumer (LLDB) being more permissive doesn't really contribute to it. Because people who use LLDB by and large are not the people who are responsible for build systems that produce debug information. Users of LLDB just want things to work, and have no clue what split-dwarf is or gnulink. They download a coredump launch lldb and then wonder why it is not working.

@clayborg
Copy link
Collaborator Author

Is there a website or something that details how to correctly save symbols for split DWARF? Is there an existing tool people use? If the answer is no, I would like to support all variations for now. I am happy to emit a warning with a URL for best practices when it comes to emitting and archiving split DWARF if a user is doing something we don't want them to. But if we have no enforcements anywhere and no website that details how to do it correctly, just saying "well you should know to do it the right way but we don't provide any documentation on how to do i t right and our debuggers or symbolizers won't load your info" doesn't make for a great user experience.

The only contentious think in this patch seems to be allowing <exe>.debug + <exe>.debug.dwp.

@jeffreytan81
Copy link
Contributor

I believe the concern regarding standardizing the methods for locating debug information files is valid. Based on my experience, the only reliable approach to achieving this is either documenting it in a well-agreed spec, document, or website, or enforcing all symbolication tools to utilize a shared implementation. Microsoft employs the latter approach by mandating that all tools (such as Windbg, Microsoft Visual Studio debugger, Sysinternals tools, UMDH, etc.) share/use msdia.dll for symbolication.

Additionally, as I mentioned in previous inline comments, the most useful feature for these symbolication tools is detailed symbol search paths logging/callbacks, which enable users to self-discover the implementation rules. In my opinion, improving in this direction is much more useful than enforcing consistent implementations across all tools, and it is also cheaper/easier to achieve. If a user can figure out through logging where their dwp/debuginfo/dSYM files are being searched by the debugger/tools, they are happy to self-fix the issue without concern for the implementation/consistency.

Since there is currently no standard/official documentation yet, I will accept the patch to unblock the customers. Feel free to continue the discussion and improve the experience for users.

@clayborg clayborg merged commit 5a45d32 into llvm:main Feb 21, 2024
@clayborg clayborg deleted the debug-dwp branch February 21, 2024 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants