Skip to content

[lldb][DWARFASTParserClang] Resolve nested types when parsing structures #66879

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

Endilll
Copy link
Contributor

@Endilll Endilll commented Sep 20, 2023

@Endilll Endilll added the lldb label Sep 20, 2023
@Endilll Endilll requested a review from Michael137 September 20, 2023 09:36
@llvmbot
Copy link
Member

llvmbot commented Sep 20, 2023

@llvm/pr-subscribers-lldb

Changes

Migrated from https://reviews.llvm.org/D156774


Full diff: https://github.com/llvm/llvm-project/pull/66879.diff

3 Files Affected:

  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp (+9-6)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.h (+1-1)
  • (modified) lldb/unittests/SymbolFile/DWARF/DWARFASTParserClangTests.cpp (+93)
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
index 37fb16d4e0351c9..fc340da8eba0763 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
@@ -2150,14 +2150,14 @@ bool DWARFASTParserClang::CompleteRecordType(const DWARFDIE &die,
 
     std::vector<std::unique_ptr<clang::CXXBaseSpecifier>> bases;
     // Parse members and base classes first
-    std::vector<DWARFDIE> member_function_dies;
+    std::vector<DWARFDIE> member_function_and_type_dies;
 
     DelayedPropertyList delayed_properties;
-    ParseChildMembers(die, clang_type, bases, member_function_dies,
+    ParseChildMembers(die, clang_type, bases, member_function_and_type_dies,
                       delayed_properties, default_accessibility, layout_info);
 
-    // Now parse any methods if there were any...
-    for (const DWARFDIE &die : member_function_dies)
+    // Now parse any methods or nested types if there were any...
+    for (const DWARFDIE &die : member_function_and_type_dies)
       dwarf->ResolveType(die);
 
     if (type_is_objc_object_or_interface) {
@@ -3153,7 +3153,7 @@ void DWARFASTParserClang::ParseSingleMember(
 bool DWARFASTParserClang::ParseChildMembers(
     const DWARFDIE &parent_die, CompilerType &class_clang_type,
     std::vector<std::unique_ptr<clang::CXXBaseSpecifier>> &base_classes,
-    std::vector<DWARFDIE> &member_function_dies,
+    std::vector<DWARFDIE> &member_function_and_type_dies,
     DelayedPropertyList &delayed_properties,
     const AccessType default_accessibility,
     ClangASTImporter::LayoutInfo &layout_info) {
@@ -3189,8 +3189,11 @@ bool DWARFASTParserClang::ParseChildMembers(
       break;
 
     case DW_TAG_subprogram:
+    case DW_TAG_enumeration_type:
+    case DW_TAG_structure_type:
+    case DW_TAG_union_type:
       // Let the type parsing code handle this one for us.
-      member_function_dies.push_back(die);
+      member_function_and_type_dies.push_back(die);
       break;
 
     case DW_TAG_inheritance:
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.h b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.h
index 88bfc490e890744..d4218959e61a8fa 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.h
@@ -157,7 +157,7 @@ class DWARFASTParserClang : public DWARFASTParser {
   bool ParseChildMembers(
       const DWARFDIE &die, lldb_private::CompilerType &class_compiler_type,
       std::vector<std::unique_ptr<clang::CXXBaseSpecifier>> &base_classes,
-      std::vector<DWARFDIE> &member_function_dies,
+      std::vector<DWARFDIE> &member_function_and_type_dies,
       DelayedPropertyList &delayed_properties,
       const lldb::AccessType default_accessibility,
       lldb_private::ClangASTImporter::LayoutInfo &layout_info);
diff --git a/lldb/unittests/SymbolFile/DWARF/DWARFASTParserClangTests.cpp b/lldb/unittests/SymbolFile/DWARF/DWARFASTParserClangTests.cpp
index 20a085741f73d00..d47f9b636ec15eb 100644
--- a/lldb/unittests/SymbolFile/DWARF/DWARFASTParserClangTests.cpp
+++ b/lldb/unittests/SymbolFile/DWARF/DWARFASTParserClangTests.cpp
@@ -462,3 +462,96 @@ TEST_F(DWARFASTParserClangTests, TestDefaultTemplateParamParsing) {
     }
   }
 }
+
+TEST_F(DWARFASTParserClangTests, EnsureNestedTypesOfTypeAreParsed) {
+  const char *yamldata = R"(
+--- !ELF
+FileHeader:
+  Class:   ELFCLASS64
+  Data:    ELFDATA2LSB
+  Type:    ET_EXEC
+  Machine: EM_386
+DWARF:
+  debug_str:
+    - Info
+    - B
+    - C
+    - Mask
+    - Enum
+  debug_abbrev:
+    - Table:
+        - Code:            0x1
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_yes
+          Attributes:
+            - Attribute:       DW_AT_language
+              Form:            DW_FORM_data2
+        - Code:            0x2
+          Tag:             DW_TAG_structure_type
+          Children:        DW_CHILDREN_yes
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strp
+        - Code:            0x3
+          Tag:             DW_TAG_union_type
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strp
+        - Code:            0x4
+          Tag:             DW_TAG_structure_type
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strp
+        - Code:            0x5
+          Tag:             DW_TAG_enumeration_type
+          Children:        DW_CHILDREN_yes
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strp
+        - Code:            0x6
+          Tag:             DW_TAG_enumerator
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strp  
+  debug_info:
+    - Version:         5
+      AddrSize:        8
+      UnitType:        DW_UT_compile
+      Entries:
+        - AbbrCode:        0x1
+          Values:
+            - Value:           0x21
+        - AbbrCode:        0x2
+          Values:
+            - Value:           0x0
+        - AbbrCode:        0x3
+          Values:
+            - Value:           0x5
+        - AbbrCode:        0x4
+          Values:
+            - Value:           0x7
+        - AbbrCode:        0x5
+          Values:
+            - Value:           0x9
+        - AbbrCode:        0x6
+          Values:
+            - Value:           0xE
+)";
+
+  YAMLModuleTester t(yamldata);
+
+  DWARFUnit *unit = t.GetDwarfUnit();
+  ASSERT_NE(unit, nullptr);
+  const DWARFDebugInfoEntry *cu_entry = unit->DIE().GetDIE();
+  ASSERT_EQ(cu_entry->Tag(), DW_TAG_compile_unit);
+  DWARFDIE cu_die(unit, cu_entry);
+
+  auto holder = std::make_unique<clang_utils::TypeSystemClangHolder>("ast");
+  auto &ast_ctx = *holder->GetAST();
+  DWARFASTParserClangStub ast_parser(ast_ctx);
+
+  EXPECT_TRUE(false);
+}

auto &ast_ctx = *holder->GetAST();
DWARFASTParserClangStub ast_parser(ast_ctx);

EXPECT_TRUE(false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Michael137 Can you help me with the rest of this test? I'm changing function that complete (presumably) incomplete CompilerType's, but I'm not sure how to gather all required input to call it. Maybe that's not even the right way to write the test I need to write.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for continuing!

I'll have a look at this unit-test locally. We can try doing ParseTypeFromDWARF on the Info DIE to do the DWARF parsing (we do it elsewhere in the test too). Then call CompleteType on the resulting type to go through the new codepath you added.

@@ -462,3 +462,96 @@ TEST_F(DWARFASTParserClangTests, TestDefaultTemplateParamParsing) {
}
}
}

TEST_F(DWARFASTParserClangTests, EnsureNestedTypesOfTypeAreParsed) {
const char *yamldata = R"(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this YAML way of describing debug info quite unfriendly, especially compared to the output of llvm-dwarfdump. Anyway, I tried to encode debug info for the following snippet:

struct Info {
  enum Mask : uintptr_t {
    Enum =  ~(uintptr_t)(((intptr_t)1 << 3) - 1);
  };
  union B {};
  struct C {};
}

Filtered out llvm-dwarfdump output:

0x0000000c: DW_TAG_compile_unit
              DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus_14)

0x0000002c:   DW_TAG_structure_type
                DW_AT_calling_convention [DW_FORM_data1]        (DW_CC_pass_by_value)
                DW_AT_name [DW_FORM_strx1]      ("Info")
                DW_AT_byte_size [DW_FORM_data1] (0x04)

0x0000003a:     DW_TAG_union_type
                  DW_AT_calling_convention [DW_FORM_data1]      (DW_CC_pass_by_value)
                  DW_AT_name [DW_FORM_strx1]    ("B")
                  DW_AT_byte_size [DW_FORM_data1]       (0x01)

0x00000049:     DW_TAG_structure_type
                  DW_AT_calling_convention [DW_FORM_data1]      (DW_CC_pass_by_value)
                  DW_AT_name [DW_FORM_strx1]    ("C")
                  DW_AT_byte_size [DW_FORM_data1]       (0x01)

0x0000003b:     DW_TAG_enumeration_type
                  DW_AT_type [DW_FORM_ref4]     (0x0000005e "uintptr_t")
                  DW_AT_name [DW_FORM_strx1]    ("Mask")
                  DW_AT_byte_size [DW_FORM_data1]       (0x08)

0x00000044:       DW_TAG_enumerator
                    DW_AT_name [DW_FORM_strx1]  ("Enum")
                    DW_AT_const_value [DW_FORM_udata]   (18446744073709551608)

Form: DW_FORM_strp
- Code: 0x5
Tag: DW_TAG_enumeration_type
Children: DW_CHILDREN_yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently I have to encode tree-like structure (see C++ code above) in a flat list, which doesn't seem natural. I wonder if YAML I've written is even sensible.

Copy link
Member

@Michael137 Michael137 Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the back and forth, but I had a look and I'd suggest writing an API test instead. The yaml looks fine to me and we could use it to write a unit-test as a follow-up, but we'll have to jump through some hoops to get the unit-test down the code-path we want anyway. So an API test would be sufficient.

See reference example: https://github.com/llvm/llvm-project/blob/main/lldb/test/API/lang/cpp/bool/TestCPPBool.py

Put your test program into a main.cpp and call the relevant member types from the expression evaluator using self.expect_expr() in the python test file. E.g., for this change the test would like:

class NestedTypesTestCase(TestBase):                                                   
    def test_with_run_command(self):                                               
        """Test that referencing nested types work in the expression parser"""                   
        self.build()                                                               
        lldbutil.run_to_source_breakpoint(                                         
            self, "// breakpoint 1", lldb.SBFileSpec("main.cpp")                   
        )                                                                          
                                                                                   
        self.expect_expr(                                                          
            "Info::Mask::Enum",                             
            result_type=......,
            result_value=....,                                                  
        )                                                                          
        self.expect_expr("Info::B::val", result_type=..., result_value=...)

Let me know if you need more details on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'll give it a try.

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need to complete nested types within a type? Seems like we can put that off until later. Right now if you parse a member function, any types it needs will be parsed lazily and only if needed, which is ok. If we start completing all types within types without ever needing to actually use then, it will make debugging slower and cause our memory usage to grow. I didn't see much explanation as to why this is needed in the bug report.

@clayborg
Copy link
Collaborator

If we have a end to end test that shows the problem you are trying to fix it would help as well.

@Michael137
Copy link
Member

Michael137 commented Sep 23, 2023

I didn't see much explanation as to why this is needed in the bug report.

The motivating example is something like:

struct Info {
  enum Mask : uintptr_t {
    Enum
  };
}

expr Info::Mask::Enum.

Is there a reason we need to complete nested types within a type? Seems like we can put that off until later. Right now if you parse a member function, any types it needs will be parsed lazily and only if needed, which is ok. If we start completing all types within types without ever needing to actually use then, it will make debugging slower and cause our memory usage to grow.

LLDB will first resolve the Info structure (and all its member fields, but not nested types). When clang looks for Mask, it wants to find it in a direct lookup into the Info DeclContext. But that lookup will fail because we never created the decls for the nested type. There is no fallback to the external source in such a case unfortunately so LLDB never has the chance to correct this. There was no obvious point to which we could defer completing the nested type when completing the outer. Maybe adding that external source fallback? But I might've missed something obvious. A potential option would be to limit the completion performed in this patch to enums, though a general solution would be nice.

@clayborg
Copy link
Collaborator

I didn't see much explanation as to why this is needed in the bug report.

The motivating example is something like:

struct Info {
  enum Mask : uintptr_t {
    Enum
  };
}

expr Info::Mask::Enum.

Is there a reason we need to complete nested types within a type? Seems like we can put that off until later. Right now if you parse a member function, any types it needs will be parsed lazily and only if needed, which is ok. If we start completing all types within types without ever needing to actually use then, it will make debugging slower and cause our memory usage to grow.

LLDB will first resolve the Info structure (and all its member fields, but not nested types). When clang looks for Mask, it wants to find it in a direct lookup into the Info DeclContext. But that lookup will fail because we never created the decls for the nested type. There is no fallback to the external source in such a case unfortunately so LLDB never has the chance to correct this. There was no obvious point to which we could defer completing the nested type when completing the outer. Maybe adding that external source fallback? But I might've missed something obvious. A potential option would be to limit the completion performed in this patch to enums, though a general solution would be nice.

What we do for classes is we create forward declarations for them at first. As soon as someone wants to know more, we complete the type via the external source interface. This works for expressions since this is what precomiled headers do, it also works with our CompilerType as we can ask for the forward type, the layout type, or the complete type. I wonder if we can make this work for enums as well.

I am worried about a class that contains many types. Usually when we need to know something about a type, like in your example above, we will get a callback to the ClangExternalASTSourceCallbacks methods:

  void FindExternalLexicalDecls(
      const clang::DeclContext *DC,
      llvm::function_ref<bool(clang::Decl::Kind)> IsKindWeWant,
      llvm::SmallVectorImpl<clang::Decl *> &Result) override;

  bool FindExternalVisibleDeclsByName(const clang::DeclContext *DC,
                                      clang::DeclarationName Name) override;

So if you typed Info::Mask::Enum, the expression parser will first ask about "Info" with no containing decl context (or maybe it specifies the translation unit decl context to indicate it is a root type) and we will return the "struct Info". Then it would call FindExternalVisibleDeclsByName() with the DeclContext set to "struct Info" and we will find "Mask" inside of this type, then it will ask about "Enum" within the "Info::Mask" type and it should be in the type definition. Maybe some isn't working correctly for enums in this lookup mechanism?

@Michael137
Copy link
Member

Michael137 commented Sep 24, 2023

I didn't see much explanation as to why this is needed in the bug report.

The motivating example is something like:

struct Info {
  enum Mask : uintptr_t {
    Enum
  };
}

expr Info::Mask::Enum.

Is there a reason we need to complete nested types within a type? Seems like we can put that off until later. Right now if you parse a member function, any types it needs will be parsed lazily and only if needed, which is ok. If we start completing all types within types without ever needing to actually use then, it will make debugging slower and cause our memory usage to grow.

LLDB will first resolve the Info structure (and all its member fields, but not nested types). When clang looks for Mask, it wants to find it in a direct lookup into the Info DeclContext. But that lookup will fail because we never created the decls for the nested type. There is no fallback to the external source in such a case unfortunately so LLDB never has the chance to correct this. There was no obvious point to which we could defer completing the nested type when completing the outer. Maybe adding that external source fallback? But I might've missed something obvious. A potential option would be to limit the completion performed in this patch to enums, though a general solution would be nice.

What we do for classes is we create forward declarations for them at first. As soon as someone wants to know more, we complete the type via the external source interface. This works for expressions since this is what precomiled headers do, it also works with our CompilerType as we can ask for the forward type, the layout type, or the complete type. I wonder if we can make this work for enums as well.

I am worried about a class that contains many types. Usually when we need to know something about a type, like in your example above, we will get a callback to the ClangExternalASTSourceCallbacks methods:

  void FindExternalLexicalDecls(
      const clang::DeclContext *DC,
      llvm::function_ref<bool(clang::Decl::Kind)> IsKindWeWant,
      llvm::SmallVectorImpl<clang::Decl *> &Result) override;

  bool FindExternalVisibleDeclsByName(const clang::DeclContext *DC,
                                      clang::DeclarationName Name) override;

So if you typed Info::Mask::Enum, the expression parser will first ask about "Info" with no containing decl context (or maybe it specifies the translation unit decl context to indicate it is a root type) and we will return the "struct Info". Then it would call FindExternalVisibleDeclsByName() with the DeclContext set to "struct Info" and we will find "Mask" inside of this type, then it will ask about "Enum" within the "Info::Mask" type and it should be in the type definition. Maybe some isn't working correctly for enums in this lookup mechanism?

Yup that was my understanding too. Though last I stepped through clang lookup for this case the call back into LLDB to get Mask in the Info DeclContext wasn't being done (or was "too late", I don't quite recall the details). It has been a while since I looked at this so I'll double check when I get the chance

@Michael137
Copy link
Member

Michael137 commented Sep 29, 2023

Thanks, the API test looks good

Apologies for the delay, didn't have a chance to follow-up on Greg's concern yet. But will hopefully get the time in the next few days

@Endilll
Copy link
Contributor Author

Endilll commented Oct 15, 2023

Superseded by #68705

@clayborg
Copy link
Collaborator

clayborg commented Jan 3, 2024

I recently ran into this issue when looking at new failures in:

#74786 (comment)

And this was referred to in bug report:

#53904

After looking at the code, I believe we do need to parse contained types for a struct/class/union as we don't get an external AST source call for them. So I do believe this is the right thing to do. Even if we do parse the extra types, if we have a class within a class, then it will add a forward declaration to it only, and expand it later if any only if it is needed. I have attached a slightly different version of this patch to #53904 for reference.

@clayborg
Copy link
Collaborator

clayborg commented Jan 3, 2024

So while you were able to work around issues with the expression parser with #68705, I do believe a modified version of this patch is needed.

@clayborg
Copy link
Collaborator

clayborg commented Jan 5, 2024

#77029 is a slightly modified version of this patch where it parses the types after the struct/union/class is created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants