-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lldb] Add more ways to find the .dwp file. #81067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When using split DWARF we can run into many different ways to store debug info: - lldb loads "<exe>" which contains skeleton DWARF and needs to find "<exe>.dwp" - lldb loads "<exe>" which is stripped but has .gnu_debuglink pointing to "<exe>.debug" with skeleton DWARF and needs to find "<exe>.dwp" - lldb loads "<exe>" which is stripped but has .gnu_debuglink pointing to "<exe>.debug" with skeleton DWARF and needs to find "<exe>.debug.dwp" - lldb loads "<exe>.debug" and needs to find "<exe>.dwp Previously we only handled the first two cases. This patch adds support for the latter two.
@llvm/pr-subscribers-lldb Author: Greg Clayton (clayborg) ChangesWhen using split DWARF we can run into many different ways to store debug info:
Previously we only handled the first two cases. This patch adds support for the latter two. Full diff: https://github.com/llvm/llvm-project/pull/81067.diff 3 Files Affected:
diff --git a/lldb/include/lldb/Utility/FileSpecList.h b/lldb/include/lldb/Utility/FileSpecList.h
index 49edc667ddd5b..6eb3bb9971f13 100644
--- a/lldb/include/lldb/Utility/FileSpecList.h
+++ b/lldb/include/lldb/Utility/FileSpecList.h
@@ -238,6 +238,10 @@ class FileSpecList {
const_iterator begin() const { return m_files.begin(); }
const_iterator end() const { return m_files.end(); }
+ llvm::iterator_range<const_iterator> files() const {
+ return llvm::make_range(begin(), end());
+ }
+
protected:
collection m_files; ///< A collection of FileSpec objects.
};
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
index 781f5c5a43677..487961fa7437f 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
@@ -4349,26 +4349,53 @@ SymbolFileDWARFDebugMap *SymbolFileDWARF::GetDebugMapSymfile() {
const std::shared_ptr<SymbolFileDWARFDwo> &SymbolFileDWARF::GetDwpSymbolFile() {
llvm::call_once(m_dwp_symfile_once_flag, [this]() {
+ // Create a list of files to try and append .dwp to
+ FileSpecList symfiles;
+ // Append the module's object file path.
+ const FileSpec module_fspec = m_objfile_sp->GetModule()->GetFileSpec();
+ symfiles.Append(module_fspec);
+ // Append the object file for this SymbolFile only if it is different from
+ // the module's file path. Our main module could be "a.out", our symbol file
+ // could be "a.debug" and our ".dwp" file might be "a.debug.dwp" instead of
+ // "a.out.dwp".
+ const FileSpec symfile_fspec(m_objfile_sp->GetFileSpec());
+ if (symfile_fspec != module_fspec) {
+ symfiles.Append(symfile_fspec);
+ } else {
+ // If we don't have a separate debug info file, then try stripping the
+ // extension. We main module could be "a.debug" and the .dwp file could be
+ // "a.dwp" instead of "a.debug.dwp".
+ ConstString filename_no_ext =
+ module_fspec.GetFileNameStrippingExtension();
+ if (filename_no_ext != module_fspec.GetFilename()) {
+ FileSpec module_spec_no_ext(module_fspec);
+ module_spec_no_ext.SetFilename(filename_no_ext);
+ symfiles.Append(module_spec_no_ext);
+ }
+ }
+
+ FileSpecList search_paths = Target::GetDefaultDebugFileSearchPaths();
ModuleSpec module_spec;
module_spec.GetFileSpec() = m_objfile_sp->GetFileSpec();
- module_spec.GetSymbolFileSpec() =
- FileSpec(m_objfile_sp->GetModule()->GetFileSpec().GetPath() + ".dwp");
-
module_spec.GetUUID() = m_objfile_sp->GetUUID();
- FileSpecList search_paths = Target::GetDefaultDebugFileSearchPaths();
- FileSpec dwp_filespec =
- PluginManager::LocateExecutableSymbolFile(module_spec, search_paths);
- if (FileSystem::Instance().Exists(dwp_filespec)) {
- DataBufferSP dwp_file_data_sp;
- lldb::offset_t dwp_file_data_offset = 0;
- ObjectFileSP dwp_obj_file = ObjectFile::FindPlugin(
- GetObjectFile()->GetModule(), &dwp_filespec, 0,
- FileSystem::Instance().GetByteSize(dwp_filespec), dwp_file_data_sp,
- dwp_file_data_offset);
- if (!dwp_obj_file)
- return;
- m_dwp_symfile = std::make_shared<SymbolFileDWARFDwo>(
- *this, dwp_obj_file, DIERef::k_file_index_mask);
+ for (const auto &symfile : symfiles.files()) {
+ module_spec.GetSymbolFileSpec() =
+ FileSpec(symfile.GetPath() + ".dwp", symfile.GetPathStyle());
+ FileSpec dwp_filespec =
+ PluginManager::LocateExecutableSymbolFile(module_spec, search_paths);
+ if (FileSystem::Instance().Exists(dwp_filespec)) {
+ DataBufferSP dwp_file_data_sp;
+ lldb::offset_t dwp_file_data_offset = 0;
+ ObjectFileSP dwp_obj_file = ObjectFile::FindPlugin(
+ GetObjectFile()->GetModule(), &dwp_filespec, 0,
+ FileSystem::Instance().GetByteSize(dwp_filespec), dwp_file_data_sp,
+ dwp_file_data_offset);
+ if (!dwp_obj_file)
+ return;
+ m_dwp_symfile = std::make_shared<SymbolFileDWARFDwo>(
+ *this, dwp_obj_file, DIERef::k_file_index_mask);
+ break;
+ }
}
});
return m_dwp_symfile;
diff --git a/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp b/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp
index a47209931c384..408bdcb3fbd99 100644
--- a/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp
+++ b/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp
@@ -1,5 +1,6 @@
// REQUIRES: lld
+// Now test with DWARF5
// RUN: %clang -target x86_64-pc-linux -gsplit-dwarf -gdwarf-5 -c %s -o %t.dwarf5.o
// RUN: ld.lld %t.dwarf5.o -o %t.dwarf5
// RUN: llvm-dwp %t.dwarf5.dwo -o %t.dwarf5.dwp
@@ -34,6 +35,19 @@
// RUN: -o "statistics dump" \
// RUN: %t.dwarf5 -b | FileCheck %s -check-prefix=CACHED
+// Make sure that if we load the "%t.dwarf5.debug" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf5.dwp"
+// RUN: %lldb %t.dwarf5.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
+// Make sure that if we load the "%t.dwarf5" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf5.debug.dwp"
+// RUN: mv %t.dwarf5.dwp %t.dwarf5.debug.dwp
+// RUN: %lldb %t.dwarf5.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
+// Make sure that if we load the "%t.dwarf5.debug" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf5.debug.dwp"
+// RUN: %lldb %t.dwarf5.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
// Now test with DWARF4
// RUN: %clang -target x86_64-pc-linux -gsplit-dwarf -gdwarf-4 -c %s -o %t.dwarf4.o
// RUN: ld.lld %t.dwarf4.o -o %t.dwarf4
@@ -69,6 +83,19 @@
// RUN: -o "statistics dump" \
// RUN: %t.dwarf4 -b | FileCheck %s -check-prefix=CACHED
+// Make sure that if we load the "%t.dwarf4.debug" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf4.dwp"
+// RUN: %lldb %t.dwarf4.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
+// Make sure that if we load the "%t.dwarf4" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf4.debug.dwp"
+// RUN: mv %t.dwarf4.dwp %t.dwarf4.debug.dwp
+// RUN: %lldb %t.dwarf4.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
+// Make sure that if we load the "%t.dwarf4.debug" file, that we can find and
+// load the .dwo file from the .dwp when it is "%t.dwarf4.debug.dwp"
+// RUN: %lldb %t.dwarf4.debug -o "b main" -b | FileCheck %s -check-prefix=DEBUG
+
// CHECK: (A) a = (x = 47)
// CACHE: script lldb.target.modules[0].FindTypes('::A').GetTypeAtIndex(0)
@@ -83,6 +110,9 @@
// CACHED-NEXT: }
// CACHED: "totalDebugInfoIndexLoadedFromCache": 1
+// Make sure debug information was loaded by verifying that the
+// DEBUG: Breakpoint 1: where = dwp-separate-debug-file.cpp.tmp.dwarf{{[45]}}.debug`main + {{[0-9]+}} at dwp-separate-debug-file.cpp:{{[0-9]+}}:{{[0-9]+}}, address = {{0x[0-9a-fA-F]+}}
+
struct A {
int x = 47;
};
|
Will this now work with .dwp files not having UUID? |
The lack of UUID is kinda why this is so important. The connection is strictly based on the filename. This just expands the variety of filenames that can be supported. One thing that's helpful is that the .gnu_debuglink can be a relative/absolute path, so it enables putting the files in a different location than right beside the binary, which is definitely an improvement. |
No. If binairies have UUIDs (GNU build IDs), they need to match right now. That is larger fix that involves adding a "enum UUIDFlavor" to the UUIDs so we can ensure we aren't comparing two different things. What Alexander is talking about is if we have a GNU build ID in @dwblaikie do you know how accurate the DWO ID is? Can we avoid relying on matching up the UUID on the .dwp file and solely rely on allowing it to be loaded and rely on the DWO IDs matching between the skeleton unit and the .dwo unit? If so, there is an easy fix I can make to this patch to solve that problem. |
Not sure I follow. For .dwo files path is described in Skeleton CU: DW_AT_comp_dir/DW_AT_dwo_name. The DWP file can have multiple CUs with different DWO IDs. |
FWIW, I think we should be opinionated (& consistent with whatever gdb does, if it has some precedent here - if it is less opinionated, then maybe we have to be accepting) when it comes to whether x.debug goes with x.dwp or x.debug.dwp - we shouldn't support both unless there's some prior precedent that's unavoidable to support. It'd be better for everyone if there was one option we encourage people to follow. (so I think we shouldn't support (2) and (3) if we can help it - we should pick one) I'm not sure I understand the "lldb loads .debug and needs to find .dwp" case. the .debug file usually only has the debug info, not the executable code, so you can't load it directly, right? (see the documentation in https://sourceware.org/gdb/current/onlinedocs/gdb.html/Separate-Debug-Files.html - |
Exactly. The question is, if a have one |
Nah, the DWO ID, as per spec, is a semantic hash of the DWARF contents. It should change, generally, if any part of the DWARF changes. llvm-project/llvm/lib/CodeGen/AsmPrinter/DIEHash.cpp Lines 395 to 399 in 1d4fc38
|
There currently is no enforcement on any of this from the compiler or linker driver side and everyone must do the stripping of the executable and creating the .dwp file manually. Since there is no actual enforcement, it seems like we should support both IMHO. More on this below...
If we have a build system that creates:
We might request to download debug info for this build using a GNU build ID or some other build number for say a symbolication server which wants to use LLDB. If we get this debug info, we might only download With this change people can load If we don't fix this, then people will get confused and not know how to fix things and get full debug info to get loaded. No one besides debugger, compiler, and linker engineers and very few other people actually know what split DWARF does and or means and if they can't get things to work, they give up or waste time figuring out the magic things that need to happen to make us able to load the right debug info. So I prefer to allow things to work more seamlessly without trying to enforce something that is left up to users to do in the first place. I am someone that often responds to peoples posts when I am on call and I need to help people figure out how to load their debug symbols correrctl. With split DWARF it is even worse because if they get the debug info file with the skeleton compile units but the .dwp file isn't named consistent with GDB conventions, then they can set file and line breakpoints and hit them, but as soon as they stop there, they get no variable information because the .dwo file doesn't get loaded. They don't know what a .dwo or a .dwp file is or why it is needed. We get a few posts a week with people trying to use split DWARF symbols and not being able to get things to work. So IMHO there is no harm in making all scenarios work without having to require some naming convention that isn't actually enforced by any compiler or linker driver, and when things don't work, we put the pressure on the user to find some wiki somewhere that says "here is how you were supposed to do it" and figure out how to fix the issue themselves or by seeking help from us. |
Fixed: - added tests for missing .dwp files for both DWARF4 and DWARF5 - added a tests where we create a file with a GNU build ID where the stripped executable and debug file have UUIDs and the .dwp doesn't and make sure it loads the .dwp file. - fix a dwarf4 and dwarf5 test that was testing the wrong binary - fixed comments - don't use break + return
✅ With the latest revision this PR passed the C/C++ code formatter. |
I actually added a test for this and it works just fine if the main executable ( |
Ah, OK, yeah I can see how for symbolizing that could be done with just the .debug and .dwp files, without the original executable. Thanks for explaining. (I wouldn't've expected someone to use a /debugger/ in this case, rather than llvm-symbolizer, addr2line, etc - but don't object to a debugger supporting that functionality)
My concern with supporting a wider variety of naming conventions is the risk of fragmentation and more user confusion - if there's multiple naming conventions, there's a risk that some tools end up supporting some of them but not others, and users are confused why their files work with some tools and not others. I think we'd be doing users a favor, in terms of focussing the ecosystem on fewer permutations, by being opinionated on how these things should be named. We are going to be somewhat opinionated regardless (we are using /some/ naming convention rules to derive .dwp/.debug names from other names - so we're only haggling over how many different naming conventions we should support, not whether or not we should limit users - they'll always be limited to some set of rules we pick).
Yeah, it's certainly a cost that comes with the benefits of Split DWARF - they should get suitable warnings/error messages, and perhaps those should link to an article describing what they're talking about if necessary.
"all" scenarios can't work (because we have to be able to derive one name from another) It'll always require some naming convention be met - we're only haggling over how many naming conventions we should support. And I think the harm in making more naming conventions work, is the risk that users end up with a variet of naming conventions in use, and other tools that need to handle these cases might not learn all those conventions - and so there will be varying/confusing support across a variety of tools. I think we do users a favour by limiting the ecosystem to fewer conventions. And in this case, I think that rule would be , .debug, and .dwp |
I am fine with telling people what to do and giving them a golden path to what is easiest for our debuggers. And I will suggest to everyone that they use It would also be nice if we do have a single |
Any chance we can get this in? |
m_dwp_symfile = std::make_shared<SymbolFileDWARFDwo>( | ||
*this, dwp_obj_file, DIERef::k_file_index_mask); | ||
for (const auto &symfile : symfiles.files()) { | ||
module_spec.GetSymbolFileSpec() = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a logging for each probing entries.
One thing I noticed that Microsoft debugger (like windbg) does well is they provide an option to show debug artifacts probing paths for both local and symbol servers. This provides an important self servicing capability for end users - even they do not know the debug info file locating algorithms, they can use the probing paths to figure out and rename to match the expected probing.
GetObjectFile()->GetModule(), &dwp_filespec, 0, | ||
FileSystem::Instance().GetByteSize(dwp_filespec), dwp_file_data_sp, | ||
dwp_file_data_offset); | ||
if (dwp_obj_file) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar, adding a logging saying - found a matching dwp file and use it.
Not sure I follow - one of the scenarios mentioned in this patch is "lldb loads which is stripped but has .gnu_debuglink pointing to .debug with skeleton DWARF and needs to find .debug.dwp" I don't think we should support that, for instance - we should load
Maybe? I figure once you've got to download one file, two isn't a substantial imposition... - it'd be a bit weird having a DWP file and a .debug file mashed up together, but can't see any reason it wouldn't work - with the logic of "check if this program has a cu_index in it, if so, treat it as a dwp, otherwise look for .dwp, otherwise look for the dwos". |
If the client strips the debug info first into "a.out.debug" and then runs llvm-dwp, they will end up with a "a.out.debug.dwp". We have clients that are doing this already and we want to support them. The compiler and linker drivers are staying out of this and we expect people to do this on their own, so this is what we end up with when there is no enforcement. I am not sure why this is such a sticking point. Lets make the debugger work for people.
Just more things that can go wrong for clients as they try to use split DWARF. But this fix isn't about that, that was just a tangent. |
Logging can be enabled with "log enable dwarf split". Also added logging to some tests.
OK, could we fix llvm-dwp to match the behavior, then? If the file has a .debug extension, strip that and add the .dwp extension.
They aren't doing it on their own though - they're using llvm-dwp and its defaults (they're passing it a .debug file and getting a .debug.dwp file - it's the defaults you/we are worried about, and how to make other tools work well with those defaults). We can change those defaults if they don't work well/don't create a consistent environment.
As I explained above - my concern is that supporting a wider variety of ways these files can be named/arranged means more variants that need to be supported across a variety of tooling (symbolizers and debuggers - not just LLVM's but binutils, etc too). But that's my 2c - if LLDB owners prefer this direction, so be it. Wouldn't mind hearing some other people's perspectives on the issues around limiting variation here. |
We currently have this happening here at Facebook all over the place. We have tools that convert DWARF to GSYM and they don't need the stripped executable at all, they
Here people are not using ".debug", but are using ".debuginfo"... Again, nothing is enforced and people are left to use llvm-objcopy + llvm-dwp how ever they want. Getting a solution that does everything might be nice. Any thoughts on modifying llvm-dwp to be able to do all of this and provide some path for people where it can either just create a .dwp file for a given executable or it can create a
I am happy to hear any other opinions as well. I tend to want to make my life easier and ease the support burden I run into everyday where people that know nothing about split DWARF are trying to use it and failing and require tech support to make it work for them. I am happy to suggest a path to follow, in fact I am going to write up the best practices on a DWARF group here at work that I can point poeple to. |
They could only use that for symbolizing, yeah? They wouldn't be able to debug their binary, because a debugger wouldn't know that, given the stripped binary, they need to append
I don't know that llvm-dwp needs to do everything, that's a bit against the grain of *nix tool design. But certainly llvm-objcopy could be more ergonomic (like a one-shot, that both strips the debug info from the binary, and produces the keep-only-debug in So I guess that'd be the smallest two features I'd suggest starting with
A wrapper script or program (maybe it could be built into llvm-objcopy, but I'm a little hesitant there, but would be open to other folks opinion on it for sure) that ingests the binary and produces the 3 products (stripped binary, binary.debug, binary.dwp) seems easy enough to provide.
Yeah, I do want people to not have problems here/for things to work - I worry that making too many different things work is good short term (yay, many users are happy without having to change anything), but harmful long term when there's a wider variety of ways people do things and then new tools have to learn all those ways (& probably don't learn them all in one go - so we end up with different tools implementing subsets of all possible lookup rules, which is more confusing/problematic for users). |
Just my 2 cents as a "random dude who works on DWARF". The interoperability of various gnu extensions and DWARF spec is not well defined. Which leads to situations like this. It would be great if tools took an option away from the user that "create" debug information or at least restricted so that all of this was standardized, but currently they are not. Even if we change them there are still legacy builds that is a free for all. |
Is there a website or something that details how to correctly save symbols for split DWARF? Is there an existing tool people use? If the answer is no, I would like to support all variations for now. I am happy to emit a warning with a URL for best practices when it comes to emitting and archiving split DWARF if a user is doing something we don't want them to. But if we have no enforcements anywhere and no website that details how to do it correctly, just saying "well you should know to do it the right way but we don't provide any documentation on how to do i t right and our debuggers or symbolizers won't load your info" doesn't make for a great user experience. The only contentious think in this patch seems to be allowing |
I believe the concern regarding standardizing the methods for locating debug information files is valid. Based on my experience, the only reliable approach to achieving this is either documenting it in a well-agreed spec, document, or website, or enforcing all symbolication tools to utilize a shared implementation. Microsoft employs the latter approach by mandating that all tools (such as Windbg, Microsoft Visual Studio debugger, Sysinternals tools, UMDH, etc.) share/use msdia.dll for symbolication. Additionally, as I mentioned in previous inline comments, the most useful feature for these symbolication tools is detailed symbol search paths logging/callbacks, which enable users to self-discover the implementation rules. In my opinion, improving in this direction is much more useful than enforcing consistent implementations across all tools, and it is also cheaper/easier to achieve. If a user can figure out through logging where their dwp/debuginfo/dSYM files are being searched by the debugger/tools, they are happy to self-fix the issue without concern for the implementation/consistency. Since there is currently no standard/official documentation yet, I will accept the patch to unblock the customers. Feel free to continue the discussion and improve the experience for users. |
When using split DWARF we can run into many different ways to store debug info:
<exe>
which contains skeleton DWARF and needs to find<exe>.dwp
<exe>
which is stripped but has .gnu_debuglink pointing to<exe>.debug
with skeleton DWARF and needs to find<exe>.dwp
<exe>
which is stripped but has .gnu_debuglink pointing to<exe>.debug
with skeleton DWARF and needs to find<exe>.debug.dwp
<exe>.debug
and needs to find<exe>.dwp
Previously we only handled the first two cases. This patch adds support for the latter two.