-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[llvm][DebugInfo] Attach object-pointer to DISubprogram declarations #122742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm][DebugInfo] Attach object-pointer to DISubprogram declarations #122742
Conversation
Currently Clang only attaches `DW_AT_object_pointer` to `DW_TAG_subprogram` definitions. LLDB constructs C++ method types from their `DW_TAG_subprogram` declaration, which is also the point at which it needs to determine whether a method is static or not. LLDB's heuristic for this could be very simple if we emitted `DW_AT_object_pointer` on declarations. But since we don't, LLDB has to guess whether an argument is an implicit object parameter based on the DW_AT_name and DW_AT_type. To simplify LLDB's job (and to eventually support C++23's explicit object parameters), this patch adds the `DIFlagObjectPointer` to `DISubprogram` declarations. For reference, GCC attaches the object-pointer DIE to both the definition and declaration: https://godbolt.org/z/3TWjTfWon Fixes llvm#120973
@llvm/pr-subscribers-backend-nvptx @llvm/pr-subscribers-debuginfo Author: Michael Buch (Michael137) ChangesCurrently Clang only attaches To simplify LLDB's job (and to eventually support C++23's explicit object parameters), this patch adds the For reference, GCC attaches the object-pointer DIE to both the definition and declaration: https://godbolt.org/z/3TWjTfWon Fixes #120973 Full diff: https://github.com/llvm/llvm-project/pull/122742.diff 4 Files Affected:
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
index 0a8a1ad38c959f..d3450b8b0556fd 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
@@ -849,7 +849,9 @@ void DwarfUnit::constructTypeDIE(DIE &Buffer, const DIDerivedType *DTy) {
}
}
-void DwarfUnit::constructSubprogramArguments(DIE &Buffer, DITypeRefArray Args) {
+DIE *DwarfUnit::constructSubprogramArguments(DIE &Buffer, DITypeRefArray Args) {
+ // Args[0] is the return type.
+ DIE *ObjectPointer = nullptr;
for (unsigned i = 1, N = Args.size(); i < N; ++i) {
const DIType *Ty = Args[i];
if (!Ty) {
@@ -860,8 +862,14 @@ void DwarfUnit::constructSubprogramArguments(DIE &Buffer, DITypeRefArray Args) {
addType(Arg, Ty);
if (Ty->isArtificial())
addFlag(Arg, dwarf::DW_AT_artificial);
+ if (Ty->isObjectPointer()) {
+ assert(!ObjectPointer && "Can't have more than one object pointer");
+ ObjectPointer = &Arg;
+ }
}
}
+
+ return ObjectPointer;
}
void DwarfUnit::constructTypeDIE(DIE &Buffer, const DISubroutineType *CTy) {
@@ -1358,7 +1366,8 @@ void DwarfUnit::applySubprogramAttributes(const DISubprogram *SP, DIE &SPDie,
// Add arguments. Do not add arguments for subprogram definition. They will
// be handled while processing variables.
- constructSubprogramArguments(SPDie, Args);
+ if (auto *ObjectPointer = constructSubprogramArguments(SPDie, Args))
+ addDIEEntry(SPDie, dwarf::DW_AT_object_pointer, *ObjectPointer);
}
addThrownTypes(SPDie, SP->getThrownTypes());
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.h b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.h
index 163205378fb4b6..7a5295d826a483 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.h
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.h
@@ -268,7 +268,9 @@ class DwarfUnit : public DIEUnit {
void constructContainingTypeDIEs();
/// Construct function argument DIEs.
- void constructSubprogramArguments(DIE &Buffer, DITypeRefArray Args);
+ ///
+ /// \returns DIE of the object pointer if one exists. Nullptr otherwise.
+ DIE *constructSubprogramArguments(DIE &Buffer, DITypeRefArray Args);
/// Create a DIE with the given Tag, add the DIE to its parent, and
/// call insertDIE if MD is not null.
diff --git a/llvm/test/DebugInfo/X86/DW_AT_object_pointer.ll b/llvm/test/DebugInfo/X86/DW_AT_object_pointer.ll
index d9988ac31451e2..30d4203466766e 100644
--- a/llvm/test/DebugInfo/X86/DW_AT_object_pointer.ll
+++ b/llvm/test/DebugInfo/X86/DW_AT_object_pointer.ll
@@ -5,7 +5,15 @@
; CHECK-NOT: ""
; CHECK: DW_TAG
; CHECK: DW_TAG_class_type
-; CHECK: DW_AT_object_pointer [DW_FORM_ref4] (cu + 0x{{[0-9a-f]*}} => {[[PARAM:0x[0-9a-f]*]]})
+; CHECK: [[DECL:0x[0-9a-f]+]]: DW_TAG_subprogram
+; CHECK: DW_AT_name {{.*}} "A"
+; CHECK: DW_AT_object_pointer [DW_FORM_ref4]
+; CHECK-SAME: (cu + 0x{{[0-9a-f]*}} => {[[DECL_PARAM:0x[0-9a-f]*]]})
+; CHECK: [[DECL_PARAM]]: DW_TAG_formal_parameter
+;
+; CHECK: DW_TAG_subprogram
+; CHECK: DW_AT_specification [DW_FORM_ref4] (cu + {{.*}} => {[[DECL]]}
+; CHECK: DW_AT_object_pointer [DW_FORM_ref4] (cu + 0x{{[0-9a-f]*}} => {[[PARAM:0x[0-9a-f]*]]})
; CHECK: [[PARAM]]: DW_TAG_formal_parameter
; CHECK-NOT: DW_TAG
; CHECK: DW_AT_name [DW_FORM_strp] ( .debug_str[0x{{[0-9a-f]*}}] = "this")
diff --git a/llvm/test/DebugInfo/X86/dwarf-public-names.ll b/llvm/test/DebugInfo/X86/dwarf-public-names.ll
index c2274511d4191f..a484c094892d0c 100644
--- a/llvm/test/DebugInfo/X86/dwarf-public-names.ll
+++ b/llvm/test/DebugInfo/X86/dwarf-public-names.ll
@@ -61,7 +61,7 @@
; Skip the output to the header of the pubnames section.
; LINUX: debug_pubnames
-; LINUX-NEXT: unit_size = 0x00000128
+; LINUX-NEXT: unit_size =
; Check for each name in the output.
; LINUX-DAG: "ns"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since other compilers also do it this way, that sounds fine to me.
Not sure who maintains the NVPTX debug-info test, but it doesn't seem very maintainable. The hardcoded addresses and giant amount of debug-info makes it pretty brittle against changes like this |
Yeah, that's pretty uncool. @alexey-bataev could you please fix this test case (& any others like it) to be more in line with the testing done for other targets? |
It was discussed already a long time ago. There is no other way to test debug info for NVPTX target, since LLVM can generate only ptx files |
This is a bit of a maintenance burden - might be worth revisiting. But I take it "ptx files" amounts to "assembly files for which we don't have an assembler (integrated or otherwise)" that can be used in tests? OK - though we could still test more narrowly - the same way we test DWARF output where it's generated then dumped. We don't test every line of the dumped output, we just test specific parts of it for the features we're looking for. Without the symbolic output it's a bit tedious - could rely on the comments that are produced and test against those rather than the instructions themselves? (at least generally - could have a really small test that tests a handful of the assembly itself) - and perhaps the output could be more symbolic, doing label differences, etc, so that it's more stable/less prone to additions or removals throwing off the offsets? (or perhaps most of that stability can be gained with the usual filecheck matching that we do in the dwarfdump driven tests) |
No, it cannot be, nvptx do not support label differences, it is very old DWARF2 +some extensions |
That's one of several suggestions I made - good to understand why that particular one is infeasible. But scoping down the testing would be good. Like maybe don't check the abbrevs at all in most cases, just check the annotated comments that describe the values expected. Maybe include the values in the comments (like the tag comments include the abbrev number and DIE offset - the DW_AT_name comment could include the string name, rather than matching 20 lines of ASCII integers) It's also just a huge generic test covering 200 functions. This isn't generally how LLVM is tested - small, isolated examples using a handful (usually one or two, almost definitely single digits) of functions and instructions, not integration tests exercising all manner of functionality (the name's a hint too - it's called "debug-info.ll" rather than testing some specific feature of debug info) |
Not sure it will reduce the maintenance burden significantly
Am I missing something? There is only one small function |
Forgot to mention. Ptx does not support string literals in debug info, only byte/word/etc. values (at least before, not sure it has changed). |
Last thing. These tests not only test the debug info, but also its format in ptx. It checks that there is no label arithmetics, no strings, etc. |
…efinition with that of a declaration In llvm#122742 we will start attaching DW_AT_object_pointer to method declarations (in addition to definitions). Currently when LLDB parses a `DW_TAG_subprogram` definition, it will parse all the attributes of the declaration as well. If we have `DW_AT_object_pointer` on both, then we would overwrite the more specific attribute that we got from the defintion with the one from the specification. This is problematic because LLDB relies on getting the `DW_AT_name` from the `DW_AT_object_pointer`, which doesn't exist on the specification.
Currently blocked on #123089 (this will fix the failing LLDB |
I'm pretty sure anything would be better than doing nothing here.
While there's only one llvm Function, there's 200 DISubprograms in the DWARF - a bunch of intrinsics, by the looks of it.
Testing for the absence of things may be better done with a separate invocation using CHECK-NOT, or --implicit-check-not. Without needing to check every line of output.
Right, I figured as much - that's why I was suggesting having the comments include the textual string and checking against that, rather than against every numeric character code. |
…efinition with that of a declaration (#123089) In #122742 we will start attaching DW_AT_object_pointer to method declarations (in addition to definitions). Currently when LLDB parses a `DW_TAG_subprogram` definition, it will parse all the attributes of the declaration as well. If we have `DW_AT_object_pointer` on both, then we would overwrite the more specific attribute that we got from the defintion with the one from the specification. This is problematic because LLDB relies on getting the `DW_AT_name` from the `DW_AT_object_pointer`, which doesn't exist on the specification. Note GCC does attach `DW_AT_object_pointer` on declarations *and* definitions already (see https://godbolt.org/z/G1GvddY48), so there's definitely some expressions that will fail for GCC compiled binaries. This patch will fix those cases (e.g., I would expect `TestConstThis.py` to fail with GCC).
…ointer of definition with that of a declaration (#123089) In llvm/llvm-project#122742 we will start attaching DW_AT_object_pointer to method declarations (in addition to definitions). Currently when LLDB parses a `DW_TAG_subprogram` definition, it will parse all the attributes of the declaration as well. If we have `DW_AT_object_pointer` on both, then we would overwrite the more specific attribute that we got from the defintion with the one from the specification. This is problematic because LLDB relies on getting the `DW_AT_name` from the `DW_AT_object_pointer`, which doesn't exist on the specification. Note GCC does attach `DW_AT_object_pointer` on declarations *and* definitions already (see https://godbolt.org/z/G1GvddY48), so there's definitely some expressions that will fail for GCC compiled binaries. This patch will fix those cases (e.g., I would expect `TestConstThis.py` to fail with GCC).
We're seeing a 5-10% growth in .debug_info size (well, zstd compressed .debug_info.dwo size, to be more specific) that seems worrying. Might be worth reverting this and doing some size analysis to understand the cost involved? Would you like more data before that? |
Thanks for pinging A debug-info size increase was expected given this affects all method declarations. Though 10% does seem like a lot. Any ideas what we could do to reduce some of it? We do really need a way to determine the object pointer from declarations in LLDB. When parsing a method declaration, doing a lookup for a function definition to find an object pointer feels a bit awkward but i guess it would be an option |
Given the size of the increase, I'm wondering if there's some bug/issue with the patch. If not, we could consider alternate solutions, representation changes, etc. |
DW_FORM_implicit_const |
That would be a good start, though LLDB also deduces the function CV qualifiers from the object pointer.
I like this idea though. That would work for LLDB and probably bring down the size regression enough? Will have a look at doing that tomorrow. But feel free to revert in the meantime if this is causing issues |
goes to look up the right form Oh, that's what I was thinking of: DW_FORM_implicit_const - not sure if we use that yet, but it'd probably be the right thing to use here, because every method declaration would have the same/only one value for this (zero). I imagine that /should/ address any size issue. But the 5-10% regression seems so large that it feels like it might be caused by some unanticipated side effects of this patch, somehow - so I worry that trying fixes without understanding the current impact might not be as effective as we'd like :/ |
Yea that's fair. Do you have a more detailed size report by any chance? The patch is quite trivial. It should've just attached |
Finest grained we have is by section. Mostly abbrev and info section size increases - one small .debug_str size increase, but I imagine that's incidental/not related (maybe path differences between A/B, for instance). The abbrev increase isn't such a big deal (abbrev is a smaller section). Do you have any mechanisms for measuring size regressions? Otherwise I guess trying a bootstrap build and look at dsym size for the clang binary before/after the change? |
Nothing automated, but yea I'll kick off a bootstrap build now and check |
Also, re. the |
Yeah - this is one of those "DWARF is permissive" kind of things - the spec is a sign, not a cop, if we try something novel and it works out OK, we could propose it for standardization, etc (which might end up going a different way, of course - perhaps the DWARF committee figures out some other dense encoding - like a DIE-relative DIE offset in a ULEB?). We'd only want to do this when tuning for lldb, since it'd be useless at best and confusing at worst to a consumer not expecting it. |
Here are some results of the bootstrapped build (with and without this patch reverted and also replacing the Just looking at the dwarfdump of the top-of-tree build, I don't see anything surprising that would indicate a bug here. A lot more methods just have the Based on this, are you fine with going ahead with changing the |
…with DW_FORM_implicit_const We started attaching `DW_AT_object_pointer`s on method declarations in llvm#122742. However, that caused the `.debug_info` section size to increase significantly (by around ~10% on some projects). This was mainly due to the large number of new `DW_FORM_ref4` values. This patch tries to address that regression by changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for declarations. That way we don't pay for the 4 byte references on every attribute occurrence. In a local build of clang this barely affected the `.debug_info` section size (but did increase `.debug_abbrev` by up to 10%, which doesn't affect the total debug-info size much however). We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for this purpose may surprise consumers) and DWARFv5 (since that's where `DW_FORM_implicit_const` was first standardized).
Here's what the implicit_const version looks like: #124790 that I used to come up with the above numbers |
…with DW_FORM_implicit_const We started attaching `DW_AT_object_pointer`s on method declarations in llvm#122742. However, that caused the `.debug_info` section size to increase significantly (by around ~10% on some projects). This was mainly due to the large number of new `DW_FORM_ref4` values. This patch tries to address that regression by changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for declarations. That way we don't pay for the 4 byte references on every attribute occurrence. In a local build of clang this barely affected the `.debug_info` section size (but did increase `.debug_abbrev` by up to 10%, which doesn't affect the total debug-info size much however). We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for this purpose may surprise consumers) and DWARFv5 (since that's where `DW_FORM_implicit_const` was first standardized).
Though now that i think about those stats again, it does feel weird that we're getting that many more object pointer occurrences...will double check what that's about |
Nevermind from what I can tell it really is just the number of extra |
…rations (llvm#122742)" This introduces a substantial (5-10%) regression in .debug_info size, so we're discussing alternatives in llvm#122742 and llvm#124790. This reverts commit 7c72941.
…with DW_FORM_implicit_const (#124790) We started attaching `DW_AT_object_pointer`s on method declarations in #122742. However, that caused the `.debug_info` section size to increase significantly (by around ~10% on some projects). This was mainly due to the large number of new `DW_FORM_ref4` values. This patch tries to address that regression by changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for declarations. The value of `DW_FORM_implicit_const` will be the *index* of the object parameter in the list of formal parameters of the subprogram (i.e., if the first `DW_TAG_formal_parameter` is the object pointer, the `DW_FORM_implicit_const` would be `0`). The DWARFv5 spec only mentions the use of the `reference` attribute class to for `DW_AT_object_pointer`. So using a `DW_FORM_impilicit_const` would be an extension to (and not something mandated/specified by) the standard. Though it'd make sense to extend the wording in the spec to allow for this optimization. That way we don't pay for the 4 byte references on every attribute occurrence. In a local build of clang this barely affected the `.debug_info` section size (but did increase `.debug_abbrev` by up to 10%, which doesn't impact the total debug-info size much however). We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for this purpose may surprise consumers) and DWARFv5 (since that's where `DW_FORM_implicit_const` was first standardized).
…clarations with DW_FORM_implicit_const (#124790) We started attaching `DW_AT_object_pointer`s on method declarations in llvm/llvm-project#122742. However, that caused the `.debug_info` section size to increase significantly (by around ~10% on some projects). This was mainly due to the large number of new `DW_FORM_ref4` values. This patch tries to address that regression by changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for declarations. The value of `DW_FORM_implicit_const` will be the *index* of the object parameter in the list of formal parameters of the subprogram (i.e., if the first `DW_TAG_formal_parameter` is the object pointer, the `DW_FORM_implicit_const` would be `0`). The DWARFv5 spec only mentions the use of the `reference` attribute class to for `DW_AT_object_pointer`. So using a `DW_FORM_impilicit_const` would be an extension to (and not something mandated/specified by) the standard. Though it'd make sense to extend the wording in the spec to allow for this optimization. That way we don't pay for the 4 byte references on every attribute occurrence. In a local build of clang this barely affected the `.debug_info` section size (but did increase `.debug_abbrev` by up to 10%, which doesn't impact the total debug-info size much however). We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for this purpose may surprise consumers) and DWARFv5 (since that's where `DW_FORM_implicit_const` was first standardized).
…with DW_FORM_implicit_const (llvm#124790) We started attaching `DW_AT_object_pointer`s on method declarations in llvm#122742. However, that caused the `.debug_info` section size to increase significantly (by around ~10% on some projects). This was mainly due to the large number of new `DW_FORM_ref4` values. This patch tries to address that regression by changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for declarations. The value of `DW_FORM_implicit_const` will be the *index* of the object parameter in the list of formal parameters of the subprogram (i.e., if the first `DW_TAG_formal_parameter` is the object pointer, the `DW_FORM_implicit_const` would be `0`). The DWARFv5 spec only mentions the use of the `reference` attribute class to for `DW_AT_object_pointer`. So using a `DW_FORM_impilicit_const` would be an extension to (and not something mandated/specified by) the standard. Though it'd make sense to extend the wording in the spec to allow for this optimization. That way we don't pay for the 4 byte references on every attribute occurrence. In a local build of clang this barely affected the `.debug_info` section size (but did increase `.debug_abbrev` by up to 10%, which doesn't impact the total debug-info size much however). We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for this purpose may surprise consumers) and DWARFv5 (since that's where `DW_FORM_implicit_const` was first standardized).
…with DW_FORM_implicit_const (llvm#124790) We started attaching `DW_AT_object_pointer`s on method declarations in llvm#122742. However, that caused the `.debug_info` section size to increase significantly (by around ~10% on some projects). This was mainly due to the large number of new `DW_FORM_ref4` values. This patch tries to address that regression by changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for declarations. The value of `DW_FORM_implicit_const` will be the *index* of the object parameter in the list of formal parameters of the subprogram (i.e., if the first `DW_TAG_formal_parameter` is the object pointer, the `DW_FORM_implicit_const` would be `0`). The DWARFv5 spec only mentions the use of the `reference` attribute class to for `DW_AT_object_pointer`. So using a `DW_FORM_impilicit_const` would be an extension to (and not something mandated/specified by) the standard. Though it'd make sense to extend the wording in the spec to allow for this optimization. That way we don't pay for the 4 byte references on every attribute occurrence. In a local build of clang this barely affected the `.debug_info` section size (but did increase `.debug_abbrev` by up to 10%, which doesn't impact the total debug-info size much however). We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for this purpose may surprise consumers) and DWARFv5 (since that's where `DW_FORM_implicit_const` was first standardized).
Currently Clang only attaches
DW_AT_object_pointer
toDW_TAG_subprogram
definitions. LLDB constructs C++ method types from theirDW_TAG_subprogram
declaration, which is also the point at which it needs to determine whether a method is static or not. LLDB's heuristic for this could be very simple if we emittedDW_AT_object_pointer
on declarations. But since we don't, LLDB has to guess whether an argument is an implicit object parameter based on the DW_AT_name and DW_AT_type.To simplify LLDB's job (and to eventually support C++23's explicit object parameters), this patch adds the
DIFlagObjectPointer
toDISubprogram
declarations.For reference, GCC attaches the object-pointer DIE to both the definition and declaration: https://godbolt.org/z/3TWjTfWon
Fixes #120973