[llvm][dwarf][rfc][donotcommit] Enable print of ranges addresses from .debug_info.dwo #65516

ayermolo · 2023-09-06T18:52:24Z

Summary:
For split DWARF, some of the sections remain in the main binary: .debug_ranges and
.debug_addr for DWARF4, and .debug_addr for DWARF5. As a result, when llvm-dwarfdump
is run on .dwo/.dwp files it cannot show which ranges and DW_AT_low_pc addresses the
DIEs use, and the output contains "Error: " messages.
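
To make the dependency concrete: in a .dwo unit, DW_AT_low_pc is typically encoded as an address index (DW_FORM_addrx in DWARF5, DW_FORM_GNU_addr_index in DWARF4), and that index only becomes an address once it is combined with the skeleton CU's DW_AT_addr_base (DW_AT_GNU_addr_base in DWARF4) and the main binary's .debug_addr. The sketch below is a hypothetical standalone helper, not LLVM's API, and it assumes a little-endian target with 8-byte addresses.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper, not an LLVM API: turn an address index from a .dwo DIE
// (DW_FORM_addrx or DW_FORM_GNU_addr_index) into an address.
//   AddrSection - raw bytes of .debug_addr from the MAIN binary
//   AddrBase    - DW_AT_addr_base (DWARF5) or DW_AT_GNU_addr_base (DWARF4)
//                 read from the skeleton CU in the main binary
//   Index       - the index stored in the .dwo DIE
// Assumes a little-endian target with 8-byte addresses.
std::optional<uint64_t> resolveAddrIndex(const std::vector<uint8_t> &AddrSection,
                                         uint64_t AddrBase, uint64_t Index) {
  const uint64_t AddrSize = 8;
  const uint64_t Offset = AddrBase + Index * AddrSize;
  if (Offset + AddrSize > AddrSection.size())
    return std::nullopt; // index points past the end of .debug_addr
  uint64_t Addr = 0;
  for (uint64_t I = 0; I < AddrSize; ++I)
    Addr |= uint64_t(AddrSection[Offset + I]) << (8 * I); // little-endian
  return Addr;
}
```

A tool that reads only the .dwo has neither AddrSection nor AddrBase, so it has nothing to pass to a helper like this.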

I added a new option, --main-binary=<binary>, that creates a link in
DWARFContext between the DWO context and the main binary. This allows the tool to display
addresses for DW_AT_ranges and DW_AT_low_pc.

Example (DWARF5):

    DW_TAG_inlined_subroutine
      DW_AT_abstract_origin (0x00000b21 "flush_RL")
      DW_AT_ranges (indexed (0x0) rangelist = 0x00000054
         [0x0000000000403fe2, 0x0000000000403ff3)
         [0x0000000000403ff6, 0x0000000000403ffe))

    DW_TAG_subprogram
      DW_AT_low_pc (0x0000000000403940)
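
For the DW_AT_ranges line, the range list in .debug_rnglists.dwo usually stores address indices rather than raw addresses, so decoding it still needs .debug_addr from the main binary. A rough standalone decoder for the index-based entry kinds (DW_RLE_base_addressx, DW_RLE_startx_endx, DW_RLE_startx_length, DW_RLE_offset_pair) might look like the following. The names decodeRngList and AddrResolver, the resolver callback, and the decision to reject the remaining entry kinds are assumptions of this sketch, not LLVM's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

using Range = std::pair<uint64_t, uint64_t>; // [begin, end)
// Hypothetical callback: address index -> address, backed by the main
// binary's .debug_addr and the skeleton CU's DW_AT_addr_base.
using AddrResolver = std::function<std::optional<uint64_t>(uint64_t)>;

// Reads an unsigned LEB128 value at Pos and advances Pos past it.
static std::optional<uint64_t> readULEB(const std::vector<uint8_t> &Buf,
                                        size_t &Pos) {
  uint64_t Result = 0;
  for (unsigned Shift = 0; Pos < Buf.size() && Shift < 64; Shift += 7) {
    uint8_t Byte = Buf[Pos++];
    Result |= uint64_t(Byte & 0x7f) << Shift;
    if (!(Byte & 0x80))
      return Result;
  }
  return std::nullopt; // truncated or over-long encoding
}

// Decodes one DWARF5 range list starting at Offset in .debug_rnglists.dwo.
// CUBase is the initial base address for DW_RLE_offset_pair, if known.
std::optional<std::vector<Range>> decodeRngList(const std::vector<uint8_t> &Sec,
                                                size_t Offset,
                                                const AddrResolver &Resolve,
                                                std::optional<uint64_t> CUBase) {
  std::vector<Range> Out;
  std::optional<uint64_t> Base = CUBase;
  size_t Pos = Offset;
  while (Pos < Sec.size()) {
    const uint8_t Kind = Sec[Pos++];
    switch (Kind) {
    case 0x00: // DW_RLE_end_of_list
      return Out;
    case 0x01: { // DW_RLE_base_addressx: new base given by an address index
      auto Idx = readULEB(Sec, Pos);
      if (!Idx)
        return std::nullopt;
      Base = Resolve(*Idx);
      if (!Base)
        return std::nullopt;
      break;
    }
    case 0x02: { // DW_RLE_startx_endx: two address indices
      auto StartIdx = readULEB(Sec, Pos);
      auto EndIdx = readULEB(Sec, Pos);
      if (!StartIdx || !EndIdx)
        return std::nullopt;
      auto Start = Resolve(*StartIdx);
      auto End = Resolve(*EndIdx);
      if (!Start || !End)
        return std::nullopt;
      Out.emplace_back(*Start, *End);
      break;
    }
    case 0x03: { // DW_RLE_startx_length: address index + length
      auto StartIdx = readULEB(Sec, Pos);
      auto Len = readULEB(Sec, Pos);
      if (!StartIdx || !Len)
        return std::nullopt;
      auto Start = Resolve(*StartIdx);
      if (!Start)
        return std::nullopt;
      Out.emplace_back(*Start, *Start + *Len);
      break;
    }
    case 0x04: { // DW_RLE_offset_pair: offsets from the current base address
      auto Lo = readULEB(Sec, Pos);
      auto Hi = readULEB(Sec, Pos);
      if (!Lo || !Hi || !Base)
        return std::nullopt;
      Out.emplace_back(*Base + *Lo, *Base + *Hi);
      break;
    }
    default: // DW_RLE_base_address/start_end/start_length: not handled here
      return std::nullopt;
    }
  }
  return std::nullopt; // no DW_RLE_end_of_list before the end of the section
}
```

As a usage check, a hand-built list (base_addressx index 0 followed by two offset_pair entries) with index 0 resolving to 0x403fe2 reproduces the two ranges shown in the example. That byte sequence is illustrative only; it is not taken from the binary used in this PR.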

Original phab review: https://reviews.llvm.org/D159374


adrian-prantl

If I have to specify the dwo and the .o simultaneously, why would I do that instead of just running dwarfdump on the .o (which will follow links to the .dwo)?

This is also missing a test.

adrian-prantl · 2023-09-08T21:21:39Z

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h

@@ -125,6 +125,12 @@ class DWARFContext : public DIContext {
  DWARFUnitVector &getDWOUnits(bool Lazy = false);

  std::unique_ptr<const DWARFObject> DObj;
+  /// Can be optionally set by tools that work with .dwo/.dwp files to reference
+  /// main binary debug information. Usefull for accessing .debug_ranges and


adrian-prantl · 2023-09-08T21:23:36Z

llvm/lib/DebugInfo/DWARF/DWARFContext.cpp

@@ -1004,21 +1006,47 @@ void DWARFContext::dump(
                 DObj->getAbbrevDWOSection()))
    getDebugAbbrevDWO()->dump(OS);

+  std::unordered_map<uint64_t, DWARFUnit *> DwoIDToDUMap;


We usually use llvm::DenseMap.
https://llvm.org/docs/CodingStandards.html#c-standard-library

ayermolo · 2023-09-08T21:41:07Z

> If I have to specify the dwo and the .o simultaneously, why would I do that instead of just running dwarfdump on the .o (which will follow links to the .dwo)?
>
> This is also missing a test.

Not sure what you mean. llvm-dwarfdump won't follow DW_AT_comp_dir + DW_AT_dwo_name to the .dwo file if you use it on the main ELF binary. The idea behind the patch is to be able to dump .dwo/.dwp files directly, with all the references to .debug_ranges (DWARF4) and .debug_addr resolved.
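
For reference, the link a skeleton CU carries to its .dwo is just two attributes: DW_AT_dwo_name (DW_AT_GNU_dwo_name in DWARF4 GNU split DWARF), which is interpreted relative to DW_AT_comp_dir when it is not an absolute path. A minimal sketch of that path computation follows; dwoPathFor is a hypothetical helper, not llvm-dwarfdump's actual lookup logic.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: compute where a skeleton CU's .dwo file should live
// from its DW_AT_comp_dir and DW_AT_dwo_name (or DW_AT_GNU_dwo_name) values.
std::filesystem::path dwoPathFor(const std::string &CompDir,
                                 const std::string &DwoName) {
  std::filesystem::path Name(DwoName);
  // An absolute DW_AT_dwo_name is used as-is; a relative one is resolved
  // against the compilation directory recorded in the skeleton CU.
  if (Name.is_absolute() || CompDir.empty())
    return Name;
  return std::filesystem::path(CompDir) / Name;
}
```

With a .dwp, the lookup would instead go through the package's index keyed by DWO ID, which this sketch does not cover.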

I forgot to copy a comment from the original Phabricator review. I was planning on adding tests once it's clear this is not dead on arrival. :)

dwblaikie · 2023-09-16T03:51:59Z

(be good to have a link to the original phab review to make it clear whatever context is being carried from there - I think I commented on the phab version of this)

Yeah - currently if you dumped just the binary, it wouldn't dump associated dwo/dwp files - but we could make that happen & then it wouldn't need another input file argument (though might still need another flag to say whether to do this deep dumping, versus the current shallow behavior)

That could also use the existing filter flags that only dump part of the input file to avoid dumping, say, the whole dwp file or all the dwo files.

I don't feel /too/ strongly, but I do rather like the idea of relying on the existing paths/references in the format, rather than adding a new one.

ayermolo · 2023-09-18T17:12:54Z

> (be good to have a link to the original phab review to make it clear whatever context is being carried from there - I think I commented on the phab version of this)
>
> Yeah - currently if you dumped just the binary, it wouldn't dump associated dwo/dwp files - but we could make that happen & then it wouldn't need another input file argument (though might still need another flag to say whether to do this deep dumping, versus the current shallow behavior)
>
> That could also use the existing filter flags that only dump part of the input file to avoid dumping, say, the whole dwp file or all the dwo files.
>
> I don't feel /too/ strongly, but I do rather like the idea of relying on the existing paths/references in the format, rather than adding a new one.

Maybe I am missing something, but one downside of going from the main binary and using the existing references (getNonSkeletonCU?) is that we would need to parse through all the CUs in the main binary to match the DWO ID and figure out which CU we need the non-skeleton portion for. In DWARF4 the DWO ID isn't part of the unit header either :( For pure split DWARF it's not that big of a deal, but for more complex builds with hundreds of megabytes of monolithic DWARF, this can get annoying fast for the user.
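
One way to bound that cost is a single pass: index every skeleton CU by DWO ID once, then match each .dwo unit with a hash lookup (for DWARF4 this still means reading each CU's unit DIE to find DW_AT_GNU_dwo_id). The types and functions below (SkeletonCU, indexByDwoId, findSkeleton) are hypothetical. std::unordered_map is used only to keep the sketch self-contained; LLVM code would follow the llvm::DenseMap guidance from the review comment on DWARFContext.cpp.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified view of a skeleton CU in the main binary; a real
// tool would fill these fields while parsing the CU.
struct SkeletonCU {
  uint64_t DwoId;      // DWARF5: skeleton unit header; DWARF4: DW_AT_GNU_dwo_id
  uint64_t AddrBase;   // DW_AT_addr_base / DW_AT_GNU_addr_base
  uint64_t RangesBase; // DW_AT_GNU_ranges_base (DWARF4), otherwise 0
};

using SkeletonMap = std::unordered_map<uint64_t, const SkeletonCU *>;

// One linear pass over the main binary's CUs. The stored pointers refer into
// CUs, so the vector must outlive the map.
SkeletonMap indexByDwoId(const std::vector<SkeletonCU> &CUs) {
  SkeletonMap Map;
  Map.reserve(CUs.size());
  for (const SkeletonCU &CU : CUs)
    Map.emplace(CU.DwoId, &CU); // keeps the first CU if an ID repeats
  return Map;
}

// Average O(1) lookup for each .dwo unit; nullptr if no skeleton matches.
const SkeletonCU *findSkeleton(const SkeletonMap &Map, uint64_t DwoId) {
  auto It = Map.find(DwoId);
  return It == Map.end() ? nullptr : It->second;
}
```

The trade-off is one up-front scan plus memory proportional to the number of CUs, instead of a rescan for every .dwo unit being dumped.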

Do you think I need to add anyone else to this review?

dwblaikie · 2023-09-18T18:45:59Z

> Maybe I am missing something, but one downside of going from the main binary and using the existing references (getNonSkeletonCU?) is that we would need to parse through all the CUs in the main binary to match the DWO ID and figure out which CU we need the non-skeleton portion for. In DWARF4 the DWO ID isn't part of the unit header either :( For pure split DWARF it's not that big of a deal, but for more complex builds with hundreds of megabytes of monolithic DWARF, this can get annoying fast for the user.

Yeah, I was suggesting/thinking that the user could rely on --lookup or --find, but that might not always provide the features you want.

What's the use case you're interested in, I guess? If you didn't have Split DWARF, what would you be doing with the DWARF/trying to find out? (might be helpful to build features that aren't Split DWARF-specific, but work with Split DWARF too)

> Do you think I need to add anyone else to this review?

Don't think so?

ayermolo · 2023-09-18T18:50:49Z

> Yeah, I was suggesting/thinking that the user could rely on --lookup or --find, but that might not always provide the features you want.
>
> What's the use case you're interested in, I guess? If you didn't have Split DWARF, what would you be doing with the DWARF/trying to find out? (might be helpful to build features that aren't Split DWARF-specific, but work with Split DWARF too)

This is 100% Split DWARF specific. The main usage model is to be able to dump .debug_info.dwo, either in full or for specific DIEs (with the usual optional parent/children flags), and see addresses and ranges, i.e. to bring the functionality of monolithic DWARF to the split-DWARF space.

dwblaikie · 2023-09-18T20:21:42Z

> This is 100% Split DWARF specific. The main usage model is to be able to dump .debug_info.dwo, either in full or for specific DIEs (with the usual optional parent/children flags), and see addresses and ranges, i.e. to bring the functionality of monolithic DWARF to the split-DWARF space.

I'm trying to ask about the higher level goal - what's the problem you were trying to address by dumping this information? Presumably you were looking to see if some piece of the DWARF encoded some data correctly, etc? And what I'm asking is, if the binary hadn't been built with Split DWARF, what would be the right tools to help dump just the data you're interested in? (eg: if you were looking at a specific function and wanted to see how the address ranges of the function were emitted - the ability to search the DWARF by function name, using the index, or scoping it to a specific file could be useful - and that could be suitable with or without Split DWARF)

And if we had /that/ tool, and made it work with Split DWARF too, then we'd have a more unified mechanism for answering that sort of question.

ayermolo · 2023-09-18T22:13:05Z

> I'm trying to ask about the higher level goal - what's the problem you were trying to address by dumping this information? Presumably you were looking to see if some piece of the DWARF encoded some data correctly, etc? And what I'm asking is, if the binary hadn't been built with Split DWARF, what would be the right tools to help dump just the data you're interested in? (eg: if you were looking at a specific function and wanted to see how the address ranges of the function were emitted - the ability to search the DWARF by function name, using the index, or scoping it to a specific file could be useful - and that could be suitable with or without Split DWARF)
>
> And if we had /that/ tool, and made it work with Split DWARF too, then we'd have a more unified mechanism for answering that sort of question.

Without Split DWARF, the right tool would still have been llvm-dwarfdump. The goal is to be able to look at "random" DIEs, or all DIEs, and see the full debug information for each DIE. One concrete example is verifying that the debug information BOLT outputs is correct. This came up during internal verification of llvm-gsymutil: it was reporting an error that an address range was not contained in its parent. At a high level, when you look at llvm-dwarfdump output in the monolithic case, you can query a CU for all of its DIEs, or a random DIE, and see the full information, including addresses and ranges. This functionality is missing once Split DWARF is enabled and you try to dump the contents of the .debug_info.dwo section. Getting the same information requires the manual effort of figuring out the correct offset into .debug_ranges/.debug_addr from the CU, adding the correct index, etc.
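
For DWARF4 GNU split DWARF, the manual step described above roughly amounts to adding the skeleton CU's DW_AT_GNU_ranges_base to the DW_AT_ranges value from the .dwo DIE, then walking .debug_ranges in the main binary. The sketch below is written under stated assumptions (little-endian target, 8-byte addresses, base address initialised from the skeleton's DW_AT_low_pc); decodeGnuSplitRanges is a hypothetical name, not an LLVM function.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using Range = std::pair<uint64_t, uint64_t>; // [begin, end)

// Reads a little-endian 64-bit value; caller guarantees Pos + 8 <= size.
static uint64_t readU64LE(const std::vector<uint8_t> &Buf, size_t Pos) {
  uint64_t V = 0;
  for (int I = 0; I < 8; ++I)
    V |= uint64_t(Buf[Pos + I]) << (8 * I);
  return V;
}

// RangesSec:      .debug_ranges from the MAIN binary
// RangesBase:     DW_AT_GNU_ranges_base from the skeleton CU
// DwoRangesValue: DW_AT_ranges value found on the .dwo DIE
// CUBase:         initial base address (the skeleton CU's DW_AT_low_pc)
std::optional<std::vector<Range>>
decodeGnuSplitRanges(const std::vector<uint8_t> &RangesSec, uint64_t RangesBase,
                     uint64_t DwoRangesValue, uint64_t CUBase) {
  std::vector<Range> Out;
  uint64_t Base = CUBase;
  size_t Pos = RangesBase + DwoRangesValue; // effective offset in .debug_ranges
  while (Pos + 16 <= RangesSec.size()) {
    const uint64_t Begin = readU64LE(RangesSec, Pos);
    const uint64_t End = readU64LE(RangesSec, Pos + 8);
    Pos += 16;
    if (Begin == 0 && End == 0)
      return Out; // end-of-list entry
    if (Begin == UINT64_MAX) {
      Base = End; // base address selection entry
      continue;
    }
    Out.emplace_back(Base + Begin, Base + End); // offsets from current base
  }
  return std::nullopt; // ran off the section without an end-of-list entry
}
```

Once the ranges are resolved like this, a check such as the one llvm-gsymutil reported (child range not inside its parent) becomes a plain interval comparison.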

What I would like is to have this functionality in the existing tool, llvm-dwarfdump, so that it displays the same information for the monolithic and split-DWARF cases, preferably as fast as possible.

clayborg · 2023-09-19T01:25:36Z

I posted the PR for doing it the other way where we specify the main binary:

#66726

ayermolo · 2023-11-07T01:22:50Z

@dwblaikie Should I close this, or WDYT?

dwblaikie · 2023-11-07T17:51:38Z

> @dwblaikie Should I close this, or WDYT?

Yeah - sorry about this. I know the usability tradeoff (dwo->exe vs. exe->dwo) isn't super smooth either way, but yeah - let's continue over on the other review.

ayermolo requested a review from a team as a code owner September 6, 2023 18:52

ayermolo requested review from clayborg, dwblaikie, JDevlieghere and adrian-prantl September 6, 2023 18:58

adrian-prantl reviewed Sep 8, 2023


ayermolo closed this Nov 7, 2023
