Skip to content

[llvm][DebugInfo] Attach object-pointer to DISubprogram declarations #122742

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 17, 2025

Conversation

Michael137
Copy link
Member

Currently Clang only attaches DW_AT_object_pointer to DW_TAG_subprogram definitions. LLDB constructs C++ method types from their DW_TAG_subprogram declaration, which is also the point at which it needs to determine whether a method is static or not. LLDB's heuristic for this could be very simple if we emitted DW_AT_object_pointer on declarations. But since we don't, LLDB has to guess whether an argument is an implicit object parameter based on the DW_AT_name and DW_AT_type.

To simplify LLDB's job (and to eventually support C++23's explicit object parameters), this patch adds the DIFlagObjectPointer to DISubprogram declarations.

For reference, GCC attaches the object-pointer DIE to both the definition and declaration: https://godbolt.org/z/3TWjTfWon

Fixes #120973

Currently Clang only attaches `DW_AT_object_pointer` to
`DW_TAG_subprogram` definitions. LLDB constructs C++ method
types from their `DW_TAG_subprogram` declaration, which is also
the point at which it needs to determine whether a method is
static or not. LLDB's heuristic for this could be very simple if we
emitted `DW_AT_object_pointer` on declarations. But since we don't,
LLDB has to guess whether an argument is an implicit object parameter
based on the DW_AT_name and DW_AT_type.

To simplify LLDB's job (and to eventually support C++23's explicit
object parameters), this patch adds the `DIFlagObjectPointer` to
`DISubprogram` declarations.

For reference, GCC attaches the object-pointer DIE to both the
definition and declaration: https://godbolt.org/z/3TWjTfWon

Fixes llvm#120973
@llvmbot
Copy link
Member

llvmbot commented Jan 13, 2025

@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-debuginfo

Author: Michael Buch (Michael137)

Changes

Currently Clang only attaches DW_AT_object_pointer to DW_TAG_subprogram definitions. LLDB constructs C++ method types from their DW_TAG_subprogram declaration, which is also the point at which it needs to determine whether a method is static or not. LLDB's heuristic for this could be very simple if we emitted DW_AT_object_pointer on declarations. But since we don't, LLDB has to guess whether an argument is an implicit object parameter based on the DW_AT_name and DW_AT_type.

To simplify LLDB's job (and to eventually support C++23's explicit object parameters), this patch adds the DIFlagObjectPointer to DISubprogram declarations.

For reference, GCC attaches the object-pointer DIE to both the definition and declaration: https://godbolt.org/z/3TWjTfWon

Fixes #120973


Full diff: https://github.com/llvm/llvm-project/pull/122742.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp (+11-2)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfUnit.h (+3-1)
  • (modified) llvm/test/DebugInfo/X86/DW_AT_object_pointer.ll (+9-1)
  • (modified) llvm/test/DebugInfo/X86/dwarf-public-names.ll (+1-1)
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
index 0a8a1ad38c959f..d3450b8b0556fd 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
@@ -849,7 +849,9 @@ void DwarfUnit::constructTypeDIE(DIE &Buffer, const DIDerivedType *DTy) {
   }
 }
 
-void DwarfUnit::constructSubprogramArguments(DIE &Buffer, DITypeRefArray Args) {
+DIE *DwarfUnit::constructSubprogramArguments(DIE &Buffer, DITypeRefArray Args) {
+  // Args[0] is the return type.
+  DIE *ObjectPointer = nullptr;
   for (unsigned i = 1, N = Args.size(); i < N; ++i) {
     const DIType *Ty = Args[i];
     if (!Ty) {
@@ -860,8 +862,14 @@ void DwarfUnit::constructSubprogramArguments(DIE &Buffer, DITypeRefArray Args) {
       addType(Arg, Ty);
       if (Ty->isArtificial())
         addFlag(Arg, dwarf::DW_AT_artificial);
+      if (Ty->isObjectPointer()) {
+        assert(!ObjectPointer && "Can't have more than one object pointer");
+        ObjectPointer = &Arg;
+      }
     }
   }
+
+  return ObjectPointer;
 }
 
 void DwarfUnit::constructTypeDIE(DIE &Buffer, const DISubroutineType *CTy) {
@@ -1358,7 +1366,8 @@ void DwarfUnit::applySubprogramAttributes(const DISubprogram *SP, DIE &SPDie,
 
     // Add arguments. Do not add arguments for subprogram definition. They will
     // be handled while processing variables.
-    constructSubprogramArguments(SPDie, Args);
+    if (auto *ObjectPointer = constructSubprogramArguments(SPDie, Args))
+      addDIEEntry(SPDie, dwarf::DW_AT_object_pointer, *ObjectPointer);
   }
 
   addThrownTypes(SPDie, SP->getThrownTypes());
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.h b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.h
index 163205378fb4b6..7a5295d826a483 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.h
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.h
@@ -268,7 +268,9 @@ class DwarfUnit : public DIEUnit {
   void constructContainingTypeDIEs();
 
   /// Construct function argument DIEs.
-  void constructSubprogramArguments(DIE &Buffer, DITypeRefArray Args);
+  ///
+  /// \returns DIE of the object pointer if one exists. Nullptr otherwise.
+  DIE *constructSubprogramArguments(DIE &Buffer, DITypeRefArray Args);
 
   /// Create a DIE with the given Tag, add the DIE to its parent, and
   /// call insertDIE if MD is not null.
diff --git a/llvm/test/DebugInfo/X86/DW_AT_object_pointer.ll b/llvm/test/DebugInfo/X86/DW_AT_object_pointer.ll
index d9988ac31451e2..30d4203466766e 100644
--- a/llvm/test/DebugInfo/X86/DW_AT_object_pointer.ll
+++ b/llvm/test/DebugInfo/X86/DW_AT_object_pointer.ll
@@ -5,7 +5,15 @@
 ; CHECK-NOT: ""
 ; CHECK: DW_TAG
 ; CHECK: DW_TAG_class_type
-; CHECK: DW_AT_object_pointer [DW_FORM_ref4]     (cu + 0x{{[0-9a-f]*}} => {[[PARAM:0x[0-9a-f]*]]})
+; CHECK: [[DECL:0x[0-9a-f]+]]: DW_TAG_subprogram
+; CHECK:                         DW_AT_name {{.*}} "A"
+; CHECK:                         DW_AT_object_pointer [DW_FORM_ref4] 
+; CHECK-SAME:                    (cu + 0x{{[0-9a-f]*}} => {[[DECL_PARAM:0x[0-9a-f]*]]})
+; CHECK: [[DECL_PARAM]]:         DW_TAG_formal_parameter
+;
+; CHECK: DW_TAG_subprogram
+; CHECK:   DW_AT_specification [DW_FORM_ref4] (cu + {{.*}} => {[[DECL]]}
+; CHECK:   DW_AT_object_pointer [DW_FORM_ref4]     (cu + 0x{{[0-9a-f]*}} => {[[PARAM:0x[0-9a-f]*]]})
 ; CHECK: [[PARAM]]:     DW_TAG_formal_parameter
 ; CHECK-NOT: DW_TAG
 ; CHECK: DW_AT_name [DW_FORM_strp]     ( .debug_str[0x{{[0-9a-f]*}}] = "this")
diff --git a/llvm/test/DebugInfo/X86/dwarf-public-names.ll b/llvm/test/DebugInfo/X86/dwarf-public-names.ll
index c2274511d4191f..a484c094892d0c 100644
--- a/llvm/test/DebugInfo/X86/dwarf-public-names.ll
+++ b/llvm/test/DebugInfo/X86/dwarf-public-names.ll
@@ -61,7 +61,7 @@
 
 ; Skip the output to the header of the pubnames section.
 ; LINUX: debug_pubnames
-; LINUX-NEXT: unit_size = 0x00000128
+; LINUX-NEXT: unit_size =
 
 ; Check for each name in the output.
 ; LINUX-DAG: "ns"

Copy link
Collaborator

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since other compilers also do it this way, that sounds fine to me.

@Michael137
Copy link
Member Author

Not sure who maintains the NVPTX debug-info test, but it doesn't seem very maintainable. The hardcoded addresses and giant amount of debug-info makes it pretty brittle against changes like this

@dwblaikie
Copy link
Collaborator

Not sure who maintains the NVPTX debug-info test, but it doesn't seem very maintainable. The hardcoded addresses and giant amount of debug-info makes it pretty brittle against changes like this

Yeah, that's pretty uncool.

@alexey-bataev could you please fix this test case (& any others like it) to be more in line with the testing done for other targets?

@alexey-bataev
Copy link
Member

Not sure who maintains the NVPTX debug-info test, but it doesn't seem very maintainable. The hardcoded addresses and giant amount of debug-info makes it pretty brittle against changes like this

Yeah, that's pretty uncool.

@alexey-bataev could you please fix this test case (& any others like it) to be more in line with the testing done for other targets?

It was discussed already a long time ago. There is no other way to test debug info for NVPTX target, since LLVM can generate only ptx files

@dwblaikie
Copy link
Collaborator

Not sure who maintains the NVPTX debug-info test, but it doesn't seem very maintainable. The hardcoded addresses and giant amount of debug-info makes it pretty brittle against changes like this

Yeah, that's pretty uncool.
@alexey-bataev could you please fix this test case (& any others like it) to be more in line with the testing done for other targets?

It was discussed already a long time ago. There is no other way to test debug info for NVPTX target, since LLVM can generate only ptx files

This is a bit of a maintenance burden - might be worth revisiting. But I take it "ptx files" amounts to "assembly files for which we don't have an assembler (integrated or otherwise)" that can be used in tests?

OK - though we could still test more narrowly - the same way we test DWARF output where it's generated then dumped. We don't test every line of the dumped output, we just test specific parts of it for the features we're looking for.

Without the symbolic output it's a bit tedious - could rely on the comments that are produced and test against those rather than the instructions themselves? (at least generally - could have a really small test that tests a handful of the assembly itself) - and perhaps the output could be more symbolic, doing label differences, etc, so that it's more stable/less prone to additions or removals throwing off the offsets? (or perhaps most of that stability can be gained with the usual filecheck matching that we do in the dwarfdump driven tests)

@alexey-bataev
Copy link
Member

No, it cannot be, nvptx do not support label differences, it is very old DWARF2 +some extensions

@dwblaikie
Copy link
Collaborator

No, it cannot be, nvptx do not support label differences, it is very old DWARF2 +some extensions

That's one of several suggestions I made - good to understand why that particular one is infeasible. But scoping down the testing would be good. Like maybe don't check the abbrevs at all in most cases, just check the annotated comments that describe the values expected. Maybe include the values in the comments (like the tag comments include the abbrev number and DIE offset - the DW_AT_name comment could include the string name, rather than matching 20 lines of ASCII integers)

It's also just a huge generic test covering 200 functions. This isn't generally how LLVM is tested - small, isolated examples using a handful (usually one or two, almost definitely single digits) of functions and instructions, not integration tests exercising all manner of functionality (the name's a hint too - it's called "debug-info.ll" rather than testing some specific feature of debug info)

@alexey-bataev
Copy link
Member

No, it cannot be, nvptx do not support label differences, it is very old DWARF2 +some extensions

That's one of several suggestions I made - good to understand why that particular one is infeasible. But scoping down the testing would be good. Like maybe don't check the abbrevs at all in most cases, just check the annotated comments that describe the values expected. Maybe include the values in the comments (like the tag comments include the abbrev number and DIE offset - the DW_AT_name comment could include the string name, rather than matching 20 lines of ASCII integers)

Not sure it will reduce the maintenance burden significantly

It's also just a huge generic test covering 200 functions. This isn't generally how LLVM is tested - small, isolated examples using a handful (usually one or two, almost definitely single digits) of functions and instructions, not integration tests exercising all manner of functionality (the name's a hint too - it's called "debug-info.ll" rather than testing some specific feature of debug info)

Am I missing something? There is only one small function _Z5saxpyifPfS_, which is already reduced version of the original code. This minimal reproducer allowed to test ptx support for debug info.

@alexey-bataev
Copy link
Member

alexey-bataev commented Jan 15, 2025

Forgot to mention. Ptx does not support string literals in debug info, only byte/word/etc. values (at least before, not sure it has changed).

@alexey-bataev
Copy link
Member

Last thing. These tests not only test the debug info, but also its format in ptx. It checks that there is no label arithmetics, no strings, etc.

Michael137 added a commit to Michael137/llvm-project that referenced this pull request Jan 15, 2025
…efinition with that of a declaration

In llvm#122742 we will start attaching DW_AT_object_pointer to method declarations (in addition to definitions).

Currently when LLDB parses a `DW_TAG_subprogram` definition, it will
parse all the attributes of the declaration as well. If we have
`DW_AT_object_pointer` on both, then we would overwrite the more
specific attribute that we got from the defintion with the one from the
specification. This is problematic because LLDB relies on getting the
`DW_AT_name` from the `DW_AT_object_pointer`, which doesn't exist on the
specification.
@Michael137
Copy link
Member Author

Currently blocked on #123089 (this will fix the failing LLDB TestConstThis.py test)

@dwblaikie
Copy link
Collaborator

No, it cannot be, nvptx do not support label differences, it is very old DWARF2 +some extensions

That's one of several suggestions I made - good to understand why that particular one is infeasible. But scoping down the testing would be good. Like maybe don't check the abbrevs at all in most cases, just check the annotated comments that describe the values expected. Maybe include the values in the comments (like the tag comments include the abbrev number and DIE offset - the DW_AT_name comment could include the string name, rather than matching 20 lines of ASCII integers)

Not sure it will reduce the maintenance burden significantly

I'm pretty sure anything would be better than doing nothing here.

It's also just a huge generic test covering 200 functions. This isn't generally how LLVM is tested - small, isolated examples using a handful (usually one or two, almost definitely single digits) of functions and instructions, not integration tests exercising all manner of functionality (the name's a hint too - it's called "debug-info.ll" rather than testing some specific feature of debug info)

Am I missing something? There is only one small function _Z5saxpyifPfS_, which is already reduced version of the original code. This minimal reproducer allowed to test ptx support for debug info.

While there's only one llvm Function, there's 200 DISubprograms in the DWARF - a bunch of intrinsics, by the looks of it.
Removing those would probably help reduce the size a fair bit - could be done by removing imports: !3, and all the associated metadata that references.

Last thing. These tests not only test the debug info, but also its format in ptx. It checks that there is no label arithmetics, no strings, etc.

Testing for the absence of things may be better done with a separate invocation using CHECK-NOT, or --implicit-check-not. Without needing to check every line of output.

Forgot to mention. Ptx does not support string literals in debug info, only byte/word/etc. values (at least before, not sure it has changed).

Right, I figured as much - that's why I was suggesting having the comments include the textual string and checking against that, rather than against every numeric character code.

Michael137 added a commit that referenced this pull request Jan 17, 2025
…efinition with that of a declaration (#123089)

In #122742 we will start
attaching DW_AT_object_pointer to method declarations (in addition to
definitions).

Currently when LLDB parses a `DW_TAG_subprogram` definition, it will
parse all the attributes of the declaration as well. If we have
`DW_AT_object_pointer` on both, then we would overwrite the more
specific attribute that we got from the defintion with the one from the
specification. This is problematic because LLDB relies on getting the
`DW_AT_name` from the `DW_AT_object_pointer`, which doesn't exist on the
specification.

Note GCC does attach `DW_AT_object_pointer` on declarations *and*
definitions already (see https://godbolt.org/z/G1GvddY48), so there's
definitely some expressions that will fail for GCC compiled binaries.
This patch will fix those cases (e.g., I would expect `TestConstThis.py`
to fail with GCC).
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 17, 2025
…ointer of definition with that of a declaration (#123089)

In llvm/llvm-project#122742 we will start
attaching DW_AT_object_pointer to method declarations (in addition to
definitions).

Currently when LLDB parses a `DW_TAG_subprogram` definition, it will
parse all the attributes of the declaration as well. If we have
`DW_AT_object_pointer` on both, then we would overwrite the more
specific attribute that we got from the defintion with the one from the
specification. This is problematic because LLDB relies on getting the
`DW_AT_name` from the `DW_AT_object_pointer`, which doesn't exist on the
specification.

Note GCC does attach `DW_AT_object_pointer` on declarations *and*
definitions already (see https://godbolt.org/z/G1GvddY48), so there's
definitely some expressions that will fail for GCC compiled binaries.
This patch will fix those cases (e.g., I would expect `TestConstThis.py`
to fail with GCC).
@Michael137 Michael137 merged commit 7c72941 into llvm:main Jan 17, 2025
8 checks passed
@Michael137 Michael137 deleted the clang/object-pointer-on-decl branch January 17, 2025 15:27
@dwblaikie
Copy link
Collaborator

We're seeing a 5-10% growth in .debug_info size (well, zstd compressed .debug_info.dwo size, to be more specific) that seems worrying.

Might be worth reverting this and doing some size analysis to understand the cost involved? Would you like more data before that?

@Michael137
Copy link
Member Author

Thanks for pinging

A debug-info size increase was expected given this affects all method declarations. Though 10% does seem like a lot. Any ideas what we could do to reduce some of it? We do really need a way to determine the object pointer from declarations in LLDB. When parsing a method declaration, doing a lookup for a function definition to find an object pointer feels a bit awkward but i guess it would be an option

@dwblaikie
Copy link
Collaborator

Given the size of the increase, I'm wondering if there's some bug/issue with the patch. If not, we could consider alternate solutions, representation changes, etc.
One alternate solution that comes to mind is to use a boolean to just say "this function /has/ an object pointer" (flag present, really really cheap to encode), but doesn't say which parameter is the object pointer (can leave that on the definition). Or if that feels too niche, we could consider using a parameter index (one byte, instead of 4 bytes of DIE offset), I thought we'd standardized a more general purpose constant-stored-in-abbrev like flag_present, in which case we could use that to encode the value 0 (zeroth parameter is the object pointer)...

@pogo59
Copy link
Collaborator

pogo59 commented Jan 27, 2025

I thought we'd standardized a more general purpose constant-stored-in-abbrev like flag_present

DW_FORM_implicit_const

@Michael137
Copy link
Member Author

Michael137 commented Jan 27, 2025

One alternate solution that comes to mind is to use a boolean to just say "this function /has/ an object pointer" (flag present, really really cheap to encode), but doesn't say which parameter is the object pointer (can leave that on the definition).

That would be a good start, though LLDB also deduces the function CV qualifiers from the object pointer.

we could consider using a parameter index (one byte, instead of 4 bytes of DIE offset), I thought we'd standardized a more general purpose constant-stored-in-abbrev like flag_present, in which case we could use that to encode the value 0 (zeroth parameter is the object pointer)...

I like this idea though. That would work for LLDB and probably bring down the size regression enough? Will have a look at doing that tomorrow. But feel free to revert in the meantime if this is causing issues

@dwblaikie
Copy link
Collaborator

One alternate solution that comes to mind is to use a boolean to just say "this function /has/ an object pointer" (flag present, really really cheap to encode), but doesn't say which parameter is the object pointer (can leave that on the definition).

That would be a good start, though LLDB also deduces the function CV qualifiers from the object pointer.

we could consider using a parameter index (one byte, instead of 4 bytes of DIE offset), I thought we'd standardized a more general purpose constant-stored-in-abbrev like flag_present, in which case we could use that to encode the value 0 (zeroth parameter is the object pointer)...

I like this idea though. That would work for LLDB and probably bring down the size regression enough? Will have a look at doing that tomorrow. But feel free to revert in the meantime if this is causing issues

goes to look up the right form Oh, that's what I was thinking of: DW_FORM_implicit_const - not sure if we use that yet, but it'd probably be the right thing to use here, because every method declaration would have the same/only one value for this (zero). I imagine that /should/ address any size issue.

But the 5-10% regression seems so large that it feels like it might be caused by some unanticipated side effects of this patch, somehow - so I worry that trying fixes without understanding the current impact might not be as effective as we'd like :/

@Michael137
Copy link
Member Author

One alternate solution that comes to mind is to use a boolean to just say "this function /has/ an object pointer" (flag present, really really cheap to encode), but doesn't say which parameter is the object pointer (can leave that on the definition).

That would be a good start, though LLDB also deduces the function CV qualifiers from the object pointer.

we could consider using a parameter index (one byte, instead of 4 bytes of DIE offset), I thought we'd standardized a more general purpose constant-stored-in-abbrev like flag_present, in which case we could use that to encode the value 0 (zeroth parameter is the object pointer)...

I like this idea though. That would work for LLDB and probably bring down the size regression enough? Will have a look at doing that tomorrow. But feel free to revert in the meantime if this is causing issues

goes to look up the right form Oh, that's what I was thinking of: DW_FORM_implicit_const - not sure if we use that yet, but it'd probably be the right thing to use here, because every method declaration would have the same/only one value for this (zero). I imagine that /should/ address any size issue.

But the 5-10% regression seems so large that it feels like it might be caused by some unanticipated side effects of this patch, somehow - so I worry that trying fixes without understanding the current impact might not be as effective as we'd like :/

Yea that's fair. Do you have a more detailed size report by any chance? The patch is quite trivial. It should've just attached DW_AT_object_pointer to any method declaration. Could it have affected some sort of de-duplication that isn't happening now?

@dwblaikie
Copy link
Collaborator

One alternate solution that comes to mind is to use a boolean to just say "this function /has/ an object pointer" (flag present, really really cheap to encode), but doesn't say which parameter is the object pointer (can leave that on the definition).

That would be a good start, though LLDB also deduces the function CV qualifiers from the object pointer.

we could consider using a parameter index (one byte, instead of 4 bytes of DIE offset), I thought we'd standardized a more general purpose constant-stored-in-abbrev like flag_present, in which case we could use that to encode the value 0 (zeroth parameter is the object pointer)...

I like this idea though. That would work for LLDB and probably bring down the size regression enough? Will have a look at doing that tomorrow. But feel free to revert in the meantime if this is causing issues

goes to look up the right form Oh, that's what I was thinking of: DW_FORM_implicit_const - not sure if we use that yet, but it'd probably be the right thing to use here, because every method declaration would have the same/only one value for this (zero). I imagine that /should/ address any size issue.
But the 5-10% regression seems so large that it feels like it might be caused by some unanticipated side effects of this patch, somehow - so I worry that trying fixes without understanding the current impact might not be as effective as we'd like :/

Yea that's fair. Do you have a more detailed size report by any chance? The patch is quite trivial. It should've just attached DW_AT_object_pointer to any method declaration. Could it have affected some sort of de-duplication that isn't happening now?

Finest grained we have is by section. Mostly abbrev and info section size increases - one small .debug_str size increase, but I imagine that's incidental/not related (maybe path differences between A/B, for instance). The abbrev increase isn't such a big deal (abbrev is a smaller section).

Do you have any mechanisms for measuring size regressions? Otherwise I guess trying a bootstrap build and look at dsym size for the clang binary before/after the change?

@Michael137
Copy link
Member Author

One alternate solution that comes to mind is to use a boolean to just say "this function /has/ an object pointer" (flag present, really really cheap to encode), but doesn't say which parameter is the object pointer (can leave that on the definition).

That would be a good start, though LLDB also deduces the function CV qualifiers from the object pointer.

we could consider using a parameter index (one byte, instead of 4 bytes of DIE offset), I thought we'd standardized a more general purpose constant-stored-in-abbrev like flag_present, in which case we could use that to encode the value 0 (zeroth parameter is the object pointer)...

I like this idea though. That would work for LLDB and probably bring down the size regression enough? Will have a look at doing that tomorrow. But feel free to revert in the meantime if this is causing issues

goes to look up the right form Oh, that's what I was thinking of: DW_FORM_implicit_const - not sure if we use that yet, but it'd probably be the right thing to use here, because every method declaration would have the same/only one value for this (zero). I imagine that /should/ address any size issue.
But the 5-10% regression seems so large that it feels like it might be caused by some unanticipated side effects of this patch, somehow - so I worry that trying fixes without understanding the current impact might not be as effective as we'd like :/

Yea that's fair. Do you have a more detailed size report by any chance? The patch is quite trivial. It should've just attached DW_AT_object_pointer to any method declaration. Could it have affected some sort of de-duplication that isn't happening now?

Finest grained we have is by section. Mostly abbrev and info section size increases - one small .debug_str size increase, but I imagine that's incidental/not related (maybe path differences between A/B, for instance). The abbrev increase isn't such a big deal (abbrev is a smaller section).

Do you have any mechanisms for measuring size regressions? Otherwise I guess trying a bootstrap build and look at dsym size for the clang binary before/after the change?

Nothing automated, but yea I'll kick off a bootstrap build now and check

@Michael137
Copy link
Member Author

Also, re. the DW_FORM_implici_const form. Would it be fine adding a DW_AT_object_pointer of that form (despite the spec saying it should be a reference...or is that just a recommendation, not mandated?).

@dwblaikie
Copy link
Collaborator

Also, re. the DW_FORM_implici_const form. Would it be fine adding a DW_AT_object_pointer of that form (despite the spec saying it should be a reference...or is that just a recommendation, not mandated?).

Yeah - this is one of those "DWARF is permissive" kind of things - the spec is a sign, not a cop, if we try something novel and it works out OK, we could propose it for standardization, etc (which might end up going a different way, of course - perhaps the DWARF committee figures out some other dense encoding - like a DIE-relative DIE offset in a ULEB?). We'd only want to do this when tuning for lldb, since it'd be useless at best and confusing at worst to a consumer not expecting it.

@Michael137
Copy link
Member Author

Michael137 commented Jan 28, 2025

@dwblaikie

$ bloaty `find build-debug-with-patch/lib -name *.o`  │$ bloaty `find build-debug-no-patch/lib -name *.o`  │$ bloaty `find build-debug-implicit-const/lib -name *.o`
    FILE SIZE        VM SIZE                          │    FILE SIZE        VM SIZE                        │    FILE SIZE        VM SIZE    
 --------------  --------------                       │ --------------  --------------                     │ --------------  -------------- 
  54.5%  2.33Gi  59.5%  2.33Gi    ,__debug_str        │  55.6%  2.33Gi  60.9%  2.33Gi    ,__debug_str      │  55.6%  2.33Gi  60.9%  2.33Gi    ,__debug_str
  20.5%   896Mi  22.4%   896Mi    ,__debug_info       │  18.8%   807Mi  20.6%   807Mi    ,__debug_info     │  18.9%   808Mi  20.6%   808Mi    ,__debug_info
   5.5%   240Mi   0.0%       0    String Table        │   5.6%   240Mi   0.0%       0    String Table      │   5.6%   240Mi   0.0%       0    String Table
   4.2%   184Mi   4.6%   184Mi    ,__text             │   4.3%   184Mi   4.7%   184Mi    ,__text           │   4.3%   184Mi   4.7%   184Mi    ,__text
   3.6%   157Mi   3.9%   157Mi    ,__debug_names      │   3.7%   157Mi   4.0%   157Mi    ,__debug_names    │   3.7%   157Mi   4.0%   157Mi    ,__debug_names
   3.3%   143Mi   3.6%   143Mi    ,__debug_str_offs   │   3.4%   143Mi   3.7%   143Mi    ,__debug_str_offs │   3.3%   143Mi   3.7%   143Mi    ,__debug_str_offs
   2.0%  88.4Mi   0.0%       0    [Unmapped]          │   2.1%  88.4Mi   0.0%       0    [Unmapped]        │   2.1%  88.4Mi   0.0%       0    [Unmapped]
   1.8%  79.8Mi   2.0%  79.8Mi    ,__debug_line       │   1.9%  79.8Mi   2.0%  79.8Mi    ,__debug_line     │   1.9%  79.8Mi   2.0%  79.8Mi    ,__debug_line
   1.6%  67.9Mi   1.7%  67.9Mi    ,__compact_unwind   │   1.6%  67.9Mi   1.7%  67.9Mi    ,__compact_unwind │   1.6%  67.9Mi   1.7%  67.9Mi    ,__compact_unwind
   0.9%  41.2Mi   0.0%       0    Symbol Table        │   1.0%  41.2Mi   0.0%       0    Symbol Table      │   1.0%  41.2Mi   0.0%       0    Symbol Table
   0.7%  31.6Mi   0.8%  31.6Mi    ,__debug_line_str   │   0.7%  31.6Mi   0.8%  31.6Mi    ,__debug_line_str │   0.7%  31.6Mi   0.8%  31.6Mi    ,__debug_line_str
   0.5%  20.9Mi   0.5%  20.9Mi    ,__debug_addr       │   0.5%  20.9Mi   0.5%  20.9Mi    ,__debug_addr     │   0.5%  20.9Mi   0.5%  20.9Mi    ,__debug_addr
   0.4%  17.0Mi   0.4%  17.0Mi    ,__const            │   0.4%  17.0Mi   0.4%  17.0Mi    ,__const          │   0.4%  17.0Mi   0.4%  17.0Mi    ,__const
   0.2%  9.40Mi   0.2%  9.40Mi    ,__debug_abbrev     │   0.2%  8.95Mi   0.2%  8.95Mi    ,__cstring        │   0.2%  9.54Mi   0.2%  9.54Mi    ,__debug_abbrev
   0.2%  8.95Mi   0.2%  8.95Mi    ,__cstring          │   0.2%  8.61Mi   0.2%  8.61Mi    ,__debug_abbrev   │   0.2%  8.95Mi   0.2%  8.95Mi    ,__cstring
   0.1%  2.58Mi   0.0%       0    [Mach-O Headers]    │   0.1%  2.58Mi   0.0%       0    [Mach-O Headers]  │   0.1%  2.58Mi   0.0%       0    [Mach-O Headers]
   0.0%   482Ki   0.0%   482Ki    ,__StaticInit       │   0.0%   482Ki   0.0%   482Ki    ,__StaticInit     │   0.0%   482Ki   0.0%   482Ki    ,__StaticInit
   0.0%       0   0.0%   464Ki    ,__bss              │   0.0%       0   0.0%   464Ki    ,__bss            │   0.0%       0   0.0%   464Ki    ,__bss
   0.0%   342Ki   0.0%   342Ki    ,__data             │   0.0%   342Ki   0.0%   342Ki    ,__data           │   0.0%   342Ki   0.0%   342Ki    ,__data
   0.0%   188Ki   0.0%   188Ki    ,__debug_rnglists   │   0.0%   188Ki   0.0%   188Ki    ,__debug_rnglists │   0.0%   188Ki   0.0%   188Ki    ,__debug_rnglists
   0.0%  22.1Ki   0.0%  57.8Ki    [10 Others]         │   0.0%  22.1Ki   0.0%  57.8Ki    [10 Others]       │   0.0%  22.1Ki   0.0%  57.8Ki    [10 Others]
 100.0%  4.27Gi 100.0%  3.91Gi    TOTAL               │ 100.0%  4.19Gi 100.0%  3.82Gi    TOTAL             │ 100.0%  4.19Gi 100.0%  3.82Gi    TOTAL

Here are some results of the bootstrapped build (with and without this patch reverted and also replacing the DW_FORM_ref4 with DW_FORM_implicit_const). With top-of-tree (which includes this PR) the .debug_info section size increased by ~11%, which aligns with what you were seeing. The implicit_const version pretty much has a negligible effect on the .debug_info (and total) size. Though it does increase the .debug_abbrev size from 8.61 Mi to 9.54 Mi. But that seems inconsequential in the larger breakdown.

Just looking at the dwarfdump of the top-of-tree build, I don't see anything surprising that would indicate a bug here. A lot more methods just have the DW_AT_object_pointer. E.g., the number of DW_AT_object_pointers occurrences in libclangASTMatchers.a increased from 19,650 to 169,958 (which would be ~600 KB worth of extra references). Which is pretty much exactly the amount by which that particular archive increased. The only diff in the dwarfdump's is the existence of this attribute.

Based on this, are you fine with going ahead with changing the DW_FORM_*. Or is there anything else worth looking at with the current regression?

Michael137 added a commit to Michael137/llvm-project that referenced this pull request Jan 28, 2025
…with DW_FORM_implicit_const

We started attaching `DW_AT_object_pointer`s on method declarations in llvm#122742. However, that caused the `.debug_info` section size to increase significantly (by around ~10% on some projects). This was mainly due to the large number of new `DW_FORM_ref4` values. This patch tries to address that regression by changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for declarations. That way we don't pay for the 4 byte references on every attribute occurrence. In a local build of clang this barely affected the `.debug_info` section size (but did increase `.debug_abbrev` by up to 10%, which doesn't affect the total debug-info size much however).

We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for this purpose may surprise consumers) and DWARFv5 (since that's where `DW_FORM_implicit_const` was first standardized).
@Michael137
Copy link
Member Author

Here's what the implicit_const version looks like: #124790 that I used to come up with the above numbers

Michael137 added a commit to Michael137/llvm-project that referenced this pull request Jan 28, 2025
…with DW_FORM_implicit_const

We started attaching `DW_AT_object_pointer`s on method declarations in llvm#122742. However, that caused the `.debug_info` section size to increase significantly (by around ~10% on some projects). This was mainly due to the large number of new `DW_FORM_ref4` values. This patch tries to address that regression by changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for declarations. That way we don't pay for the 4 byte references on every attribute occurrence. In a local build of clang this barely affected the `.debug_info` section size (but did increase `.debug_abbrev` by up to 10%, which doesn't affect the total debug-info size much however).

We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for this purpose may surprise consumers) and DWARFv5 (since that's where `DW_FORM_implicit_const` was first standardized).
@Michael137
Copy link
Member Author

Though now that i think about those stats again, it does feel weird that we're getting that many more object pointer occurrences...will double check what that's about

@Michael137
Copy link
Member Author

Nevermind from what I can tell it really is just the number of extra DW_AT_object_pointer. There's 208713 DW_TAG_subprograms in my example. So a large chunk of those got the attribute (169958). The majority of them don't have a corresponding definition DW_TAG_subprogram, which is where the spike came from

dwblaikie added a commit to dwblaikie/llvm-project that referenced this pull request Jan 28, 2025
…rations (llvm#122742)"

This introduces a substantial (5-10%) regression in .debug_info size, so
we're discussing alternatives in llvm#122742 and llvm#124790.

This reverts commit 7c72941.
alexfh pushed a commit that referenced this pull request Jan 29, 2025
…rations (#122742)" (#124853)

This introduces a substantial (5-10%) regression in .debug_info size, so
we're discussing alternatives in #122742 and #124790.

This reverts commit 7c72941.
Michael137 added a commit that referenced this pull request Jun 16, 2025
…with DW_FORM_implicit_const (#124790)

We started attaching `DW_AT_object_pointer`s on method declarations in
#122742. However, that caused
the `.debug_info` section size to increase significantly (by around ~10%
on some projects). This was mainly due to the large number of new
`DW_FORM_ref4` values. This patch tries to address that regression by
changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for
declarations. The value of `DW_FORM_implicit_const` will be the *index*
of the object parameter in the list of formal parameters of the
subprogram (i.e., if the first `DW_TAG_formal_parameter` is the object
pointer, the `DW_FORM_implicit_const` would be `0`). The DWARFv5 spec
only mentions the use of the `reference` attribute class to for
`DW_AT_object_pointer`. So using a `DW_FORM_impilicit_const` would be an
extension to (and not something mandated/specified by) the standard.
Though it'd make sense to extend the wording in the spec to allow for
this optimization.

That way we don't pay for the 4 byte references on every attribute
occurrence. In a local build of clang this barely affected the
`.debug_info` section size (but did increase `.debug_abbrev` by up to
10%, which doesn't impact the total debug-info size much however).

We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for
this purpose may surprise consumers) and DWARFv5 (since that's where
`DW_FORM_implicit_const` was first standardized).
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 16, 2025
…clarations with DW_FORM_implicit_const (#124790)

We started attaching `DW_AT_object_pointer`s on method declarations in
llvm/llvm-project#122742. However, that caused
the `.debug_info` section size to increase significantly (by around ~10%
on some projects). This was mainly due to the large number of new
`DW_FORM_ref4` values. This patch tries to address that regression by
changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for
declarations. The value of `DW_FORM_implicit_const` will be the *index*
of the object parameter in the list of formal parameters of the
subprogram (i.e., if the first `DW_TAG_formal_parameter` is the object
pointer, the `DW_FORM_implicit_const` would be `0`). The DWARFv5 spec
only mentions the use of the `reference` attribute class to for
`DW_AT_object_pointer`. So using a `DW_FORM_impilicit_const` would be an
extension to (and not something mandated/specified by) the standard.
Though it'd make sense to extend the wording in the spec to allow for
this optimization.

That way we don't pay for the 4 byte references on every attribute
occurrence. In a local build of clang this barely affected the
`.debug_info` section size (but did increase `.debug_abbrev` by up to
10%, which doesn't impact the total debug-info size much however).

We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for
this purpose may surprise consumers) and DWARFv5 (since that's where
`DW_FORM_implicit_const` was first standardized).
fschlimb pushed a commit to fschlimb/llvm-project that referenced this pull request Jun 18, 2025
…with DW_FORM_implicit_const (llvm#124790)

We started attaching `DW_AT_object_pointer`s on method declarations in
llvm#122742. However, that caused
the `.debug_info` section size to increase significantly (by around ~10%
on some projects). This was mainly due to the large number of new
`DW_FORM_ref4` values. This patch tries to address that regression by
changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for
declarations. The value of `DW_FORM_implicit_const` will be the *index*
of the object parameter in the list of formal parameters of the
subprogram (i.e., if the first `DW_TAG_formal_parameter` is the object
pointer, the `DW_FORM_implicit_const` would be `0`). The DWARFv5 spec
only mentions the use of the `reference` attribute class to for
`DW_AT_object_pointer`. So using a `DW_FORM_impilicit_const` would be an
extension to (and not something mandated/specified by) the standard.
Though it'd make sense to extend the wording in the spec to allow for
this optimization.

That way we don't pay for the 4 byte references on every attribute
occurrence. In a local build of clang this barely affected the
`.debug_info` section size (but did increase `.debug_abbrev` by up to
10%, which doesn't impact the total debug-info size much however).

We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for
this purpose may surprise consumers) and DWARFv5 (since that's where
`DW_FORM_implicit_const` was first standardized).
akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
…with DW_FORM_implicit_const (llvm#124790)

We started attaching `DW_AT_object_pointer`s on method declarations in
llvm#122742. However, that caused
the `.debug_info` section size to increase significantly (by around ~10%
on some projects). This was mainly due to the large number of new
`DW_FORM_ref4` values. This patch tries to address that regression by
changing the `DW_FORM_ref4` to a `DW_FORM_implicit_const` for
declarations. The value of `DW_FORM_implicit_const` will be the *index*
of the object parameter in the list of formal parameters of the
subprogram (i.e., if the first `DW_TAG_formal_parameter` is the object
pointer, the `DW_FORM_implicit_const` would be `0`). The DWARFv5 spec
only mentions the use of the `reference` attribute class to for
`DW_AT_object_pointer`. So using a `DW_FORM_impilicit_const` would be an
extension to (and not something mandated/specified by) the standard.
Though it'd make sense to extend the wording in the spec to allow for
this optimization.

That way we don't pay for the 4 byte references on every attribute
occurrence. In a local build of clang this barely affected the
`.debug_info` section size (but did increase `.debug_abbrev` by up to
10%, which doesn't impact the total debug-info size much however).

We guarded this on LLDB tuning (since using `DW_FORM_implicit_const` for
this purpose may surprise consumers) and DWARFv5 (since that's where
`DW_FORM_implicit_const` was first standardized).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DebugInfo] DW_AT_object_pointer not attached to method declarations
6 participants