Skip to content

Make only one function that needs to be implemented when searching for types #74786

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 13, 2023

Conversation

clayborg
Copy link
Collaborator

@clayborg clayborg commented Dec 7, 2023

This patch revives the effort to get this Phabricator patch into upstream:

https://reviews.llvm.org/D137900

This patch was accepted before in Phabricator but I found some -gsimple-template-names issues that are fixed in this patch.

A fixed up version of the description from the original patch starts now.

This patch started off trying to fix Module::FindFirstType() as it sometimes didn't work. The issue was the SymbolFile plug-ins didn't do any filtering of the matching types they produced, and they only looked up types using the type basename. This means if you have two types with the same basename, your type lookup can fail when only looking up a single type. We would ask the Module::FindFirstType to lookup "Foo::Bar" and it would ask the symbol file to find only 1 type matching the basename "Bar", and then we would filter out any matches that didn't match "Foo::Bar". So if the SymbolFile found "Foo::Bar" first, then it would work, but if it found "Baz::Bar" first, it would return only that type and it would be filtered out.

Discovering this issue lead me to think of the patch Alex Langford did a few months ago that was done for finding functions, where he allowed SymbolFile objects to make sure something fully matched before parsing the debug information into an AST type and other LLDB types. So this patch aimed to allow type lookups to also be much more efficient.

As LLDB has been developed over the years, we added more ways to to type lookups. These functions have lots of arguments. This patch aims to make one API that needs to be implemented that serves all previous lookups:

  • Find a single type
  • Find all types
  • Find types in a namespace

This patch introduces a TypeQuery class that contains all of the state needed to perform the lookup which is powerful enough to perform all of the type searches that used to be in our API. It contain a vector of CompilerContext objects that can fully or partially specify the lookup that needs to take place.

If you just want to lookup all types with a matching basename, regardless of the containing context, you can specify just a single CompilerContext entry that has a name and a CompilerContextKind mask of CompilerContextKind::AnyType.

Or you can fully specify the exact context to use when doing lookups like: CompilerContextKind::Namespace "std"
CompilerContextKind::Class "foo"
CompilerContextKind::Typedef "size_type"

This change expands on the clang modules code that already used a vector items, but it modifies it to work with expression type lookups which have contexts, or user lookups where users query for types. The clang modules type lookup is still an option that can be enabled on the TypeQuery objects.

This mirrors the most recent addition of type lookups that took a vector that allowed lookups to happen for the expression parser in certain places.

Prior to this we had the following APIs in Module:

void
Module::FindTypes(ConstString type_name, bool exact_match, size_t max_matches,
                  llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
                  TypeList &types);

void
Module::FindTypes(llvm::ArrayRef<CompilerContext> pattern, LanguageSet languages,
                  llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
                  TypeMap &types);

void Module::FindTypesInNamespace(ConstString type_name,
                                  const CompilerDeclContext &parent_decl_ctx,
                                  size_t max_matches, TypeList &type_list);

The new Module API is much simpler. It gets rid of all three above functions and replaces them with:

void FindTypes(const TypeQuery &query, TypeResults &results);

The TypeQuery class contains all of the needed settings:

  • The vector that allow efficient lookups in the symbol file classes since they can look at basename matches only realize fully matching types. Before this any basename that matched was fully realized only to be removed later by code outside of the SymbolFile layer which could cause many types to be realized when they didn't need to.
  • If the lookup is exact or not. If not exact, then the compiler context must match the bottom most items that match the compiler context, otherwise it must match exactly
  • If the compiler context match is for clang modules or not. Clang modules matches include a Module compiler context kind that allows types to be matched only from certain modules and these matches are not needed when d oing user type lookups.
  • An optional list of languages to use to limit the search to only certain languages

The TypeResults object contains all state required to do the lookup and store the results:

  • The max number of matches
  • The set of SymbolFile objects that have already been searched
  • The matching type list for any matches that are found

The benefits of this approach are:

  • Simpler API, and only one API to implement in SymbolFile classes
  • Replaces the FindTypesInNamespace that used a CompilerDeclContext as a way to limit the search, but this only worked if the TypeSystem matched the current symbol file's type system, so you couldn't use it to lookup a type in another module
  • Fixes a serious bug in our FindFirstType functions where if we were searching for "foo::bar", and we found a "baz::bar" first, the basename would match and we would only fetch 1 type using the basename, only to drop it from the matching list and returning no results

@llvmbot
Copy link
Member

llvmbot commented Dec 7, 2023

@llvm/pr-subscribers-lldb

Author: Greg Clayton (clayborg)

Changes

This patch revives the effort to get this Phabricator patch into upstream:

https://reviews.llvm.org/D137900

This patch was accepted before in Phabricator but I found some -gsimple-template-names issues that are fixed in this patch.

A fixed up version of the description from the original patch starts now.

This patch started off trying to fix Module::FindFirstType() as it sometimes didn't work. The issue was the SymbolFile plug-ins didn't do any filtering of the matching types they produced, and they only looked up types using the type basename. This means if you have two types with the same basename, your type lookup can fail when only looking up a single type. We would ask the Module::FindFirstType to lookup "Foo::Bar" and it would ask the symbol file to find only 1 type matching the basename "Bar", and then we would filter out any matches that didn't match "Foo::Bar". So if the SymbolFile found "Foo::Bar" first, then it would work, but if it found "Baz::Bar" first, it would return only that type and it would be filtered out.

Discovering this issue lead me to think of the patch Alex Langford did a few months ago that was done for finding functions, where he allowed SymbolFile objects to make sure something fully matched before parsing the debug information into an AST type and other LLDB types. So this patch aimed to allow type lookups to also be much more efficient.

As LLDB has been developed over the years, we added more ways to to type lookups. These functions have lots of arguments. This patch aims to make one API that needs to be implemented that serves all previous lookups:

  • Find a single type
  • Find all types
  • Find types in a namespace

This patch introduces a TypeQuery class that contains all of the state needed to perform the lookup which is powerful enough to perform all of the type searches that used to be in our API. It contain a vector of CompilerContext objects that can fully or partially specify the lookup that needs to take place.

If you just want to lookup all types with a matching basename, regardless of the containing context, you can specify just a single CompilerContext entry that has a name and a CompilerContextKind mask of CompilerContextKind::AnyType.

Or you can fully specify the exact context to use when doing lookups like: CompilerContextKind::Namespace "std"
CompilerContextKind::Class "foo"
CompilerContextKind::Typedef "size_type"

This change expands on the clang modules code that already used a vector<CompilerContext> items, but it modifies it to work with expression type lookups which have contexts, or user lookups where users query for types. The clang modules type lookup is still an option that can be enabled on the TypeQuery objects.

This mirrors the most recent addition of type lookups that took a vector<CompilerContext> that allowed lookups to happen for the expression parser in certain places.

Prior to this we had the following APIs in Module:

void
Module::FindTypes(ConstString type_name, bool exact_match, size_t max_matches,
                  llvm::DenseSet&lt;lldb_private::SymbolFile *&gt; &amp;searched_symbol_files,
                  TypeList &amp;types);

void
Module::FindTypes(llvm::ArrayRef&lt;CompilerContext&gt; pattern, LanguageSet languages,
                  llvm::DenseSet&lt;lldb_private::SymbolFile *&gt; &amp;searched_symbol_files,
                  TypeMap &amp;types);

void Module::FindTypesInNamespace(ConstString type_name,
                                  const CompilerDeclContext &amp;parent_decl_ctx,
                                  size_t max_matches, TypeList &amp;type_list);

The new Module API is much simpler. It gets rid of all three above functions and replaces them with:

void FindTypes(const TypeQuery &amp;query, TypeResults &amp;results);

The TypeQuery class contains all of the needed settings:

  • The vector<CompilerContext> that allow efficient lookups in the symbol file classes since they can look at basename matches only realize fully matching types. Before this any basename that matched was fully realized only to be removed later by code outside of the SymbolFile layer which could cause many types to be realized when they didn't need to.
  • If the lookup is exact or not. If not exact, then the compiler context must match the bottom most items that match the compiler context, otherwise it must match exactly
  • If the compiler context match is for clang modules or not. Clang modules matches include a Module compiler context kind that allows types to be matched only from certain modules and these matches are not needed when d oing user type lookups.
  • An optional list of languages to use to limit the search to only certain languages

The TypeResults object contains all state required to do the lookup and store the results:

  • The max number of matches
  • The set of SymbolFile objects that have already been searched
  • The matching type list for any matches that are found

The benefits of this approach are:

  • Simpler API, and only one API to implement in SymbolFile classes
  • Replaces the FindTypesInNamespace that used a CompilerDeclContext as a way to limit the search, but this only worked if the TypeSystem matched the current symbol file's type system, so you couldn't use it to lookup a type in another module
  • Fixes a serious bug in our FindFirstType functions where if we were searching for "foo::bar", and we found a "baz::bar" first, the basename would match and we would only fetch 1 type using the basename, only to drop it from the matching list and returning no results

Patch is 119.87 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74786.diff

52 Files Affected:

  • (modified) lldb/include/lldb/Core/Module.h (+10-67)
  • (modified) lldb/include/lldb/Core/ModuleList.h (+10-14)
  • (modified) lldb/include/lldb/Symbol/CompilerDecl.h (+7)
  • (modified) lldb/include/lldb/Symbol/CompilerDeclContext.h (+8)
  • (modified) lldb/include/lldb/Symbol/SymbolFile.h (+14-15)
  • (modified) lldb/include/lldb/Symbol/SymbolFileOnDemand.h (+2-11)
  • (modified) lldb/include/lldb/Symbol/Type.h (+301)
  • (modified) lldb/include/lldb/Symbol/TypeMap.h (+4-2)
  • (modified) lldb/include/lldb/Symbol/TypeSystem.h (+7-17)
  • (modified) lldb/include/lldb/lldb-forward.h (+2)
  • (modified) lldb/include/lldb/lldb-private-enumerations.h (+4-1)
  • (modified) lldb/source/API/SBModule.cpp (+24-30)
  • (modified) lldb/source/API/SBTarget.cpp (+12-26)
  • (modified) lldb/source/Commands/CommandObjectMemory.cpp (+10-13)
  • (modified) lldb/source/Commands/CommandObjectTarget.cpp (+16-17)
  • (modified) lldb/source/Core/Module.cpp (+2-92)
  • (modified) lldb/source/Core/ModuleList.cpp (+11-26)
  • (modified) lldb/source/DataFormatters/TypeFormat.cpp (+5-6)
  • (modified) lldb/source/Plugins/ExpressionParser/Clang/ClangASTSource.cpp (+39-58)
  • (modified) lldb/source/Plugins/LanguageRuntime/CPlusPlus/ItaniumABI/ItaniumABILanguageRuntime.cpp (+18-12)
  • (modified) lldb/source/Plugins/LanguageRuntime/ObjC/ObjCLanguageRuntime.cpp (+4-11)
  • (modified) lldb/source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.cpp (-9)
  • (modified) lldb/source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.h (-9)
  • (modified) lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.cpp (+9-14)
  • (modified) lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.h (+2-6)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp (+14-18)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFDIE.cpp (+48)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFDIE.h (+14-1)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp (+125-145)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h (+2-8)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp (+4-19)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.h (+2-7)
  • (modified) lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp (+23-14)
  • (modified) lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.h (+2-8)
  • (modified) lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp (+37-38)
  • (modified) lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.h (+2-13)
  • (modified) lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp (+68)
  • (modified) lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.h (+6)
  • (modified) lldb/source/Symbol/CompilerDecl.cpp (+5)
  • (modified) lldb/source/Symbol/CompilerDeclContext.cpp (+7)
  • (modified) lldb/source/Symbol/SymbolFile.cpp (-11)
  • (modified) lldb/source/Symbol/SymbolFileOnDemand.cpp (+3-20)
  • (modified) lldb/source/Symbol/Type.cpp (+135-9)
  • (modified) lldb/source/Symbol/TypeMap.cpp (+10-4)
  • (modified) lldb/source/Symbol/TypeSystem.cpp (+10)
  • (modified) lldb/source/Target/Language.cpp (+4-6)
  • (added) lldb/test/API/functionalities/type_find_first/Makefile (+2)
  • (added) lldb/test/API/functionalities/type_find_first/TestFindFirstType.py (+38)
  • (added) lldb/test/API/functionalities/type_find_first/main.cpp (+17)
  • (modified) lldb/test/API/lang/cpp/unique-types4/TestUniqueTypes4.py (+9-10)
  • (modified) lldb/test/API/lang/cpp/unique-types4/main.cpp (+4)
  • (modified) lldb/tools/lldb-test/lldb-test.cpp (+26-22)
diff --git a/lldb/include/lldb/Core/Module.h b/lldb/include/lldb/Core/Module.h
index 2973ee0e7ec4a..76de74aeb8101 100644
--- a/lldb/include/lldb/Core/Module.h
+++ b/lldb/include/lldb/Core/Module.h
@@ -415,70 +415,19 @@ class Module : public std::enable_shared_from_this<Module>,
   void FindGlobalVariables(const RegularExpression &regex, size_t max_matches,
                            VariableList &variable_list);
 
-  /// Find types by name.
-  ///
-  /// Type lookups in modules go through the SymbolFile. The SymbolFile needs to
-  /// be able to lookup types by basename and not the fully qualified typename.
-  /// This allows the type accelerator tables to stay small, even with heavily
-  /// templatized C++. The type search will then narrow down the search
-  /// results. If "exact_match" is true, then the type search will only match
-  /// exact type name matches. If "exact_match" is false, the type will match
-  /// as long as the base typename matches and as long as any immediate
-  /// containing namespaces/class scopes that are specified match. So to
-  /// search for a type "d" in "b::c", the name "b::c::d" can be specified and
-  /// it will match any class/namespace "b" which contains a class/namespace
-  /// "c" which contains type "d". We do this to allow users to not always
-  /// have to specify complete scoping on all expressions, but it also allows
-  /// for exact matching when required.
-  ///
-  /// \param[in] type_name
-  ///     The name of the type we are looking for that is a fully
-  ///     or partially qualified type name.
-  ///
-  /// \param[in] exact_match
-  ///     If \b true, \a type_name is fully qualified and must match
-  ///     exactly. If \b false, \a type_name is a partially qualified
-  ///     name where the leading namespaces or classes can be
-  ///     omitted to make finding types that a user may type
-  ///     easier.
-  ///
-  /// \param[out] types
-  ///     A type list gets populated with any matches.
+  /// Find types using a type matching object that contains all search
+  /// parameters.
   ///
-  void
-  FindTypes(ConstString type_name, bool exact_match, size_t max_matches,
-            llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
-            TypeList &types);
-
-  /// Find types by name.
-  ///
-  /// This behaves like the other FindTypes method but allows to
-  /// specify a DeclContext and a language for the type being searched
-  /// for.
-  ///
-  /// \param searched_symbol_files
-  ///     Prevents one file from being visited multiple times.
-  void
-  FindTypes(llvm::ArrayRef<CompilerContext> pattern, LanguageSet languages,
-            llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
-            TypeMap &types);
-
-  lldb::TypeSP FindFirstType(const SymbolContext &sc, ConstString type_name,
-                             bool exact_match);
-
-  /// Find types by name that are in a namespace. This function is used by the
-  /// expression parser when searches need to happen in an exact namespace
-  /// scope.
+  /// \see lldb_private::TypeQuery
   ///
-  /// \param[in] type_name
-  ///     The name of a type within a namespace that should not include
-  ///     any qualifying namespaces (just a type basename).
+  /// \param[in] query
+  ///     A type matching object that contains all of the details of the type
+  ///     search.
   ///
-  /// \param[out] type_list
-  ///     A type list gets populated with any matches.
-  void FindTypesInNamespace(ConstString type_name,
-                            const CompilerDeclContext &parent_decl_ctx,
-                            size_t max_matches, TypeList &type_list);
+  /// \param[in] results
+  ///     Any matching types will be populated into the \a results object using
+  ///     TypeMap::InsertUnique(...).
+  void FindTypes(const TypeQuery &query, TypeResults &results);
 
   /// Get const accessor for the module architecture.
   ///
@@ -1122,12 +1071,6 @@ class Module : public std::enable_shared_from_this<Module>,
 private:
   Module(); // Only used internally by CreateJITModule ()
 
-  void FindTypes_Impl(
-      ConstString name, const CompilerDeclContext &parent_decl_ctx,
-      size_t max_matches,
-      llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
-      TypeMap &types);
-
   Module(const Module &) = delete;
   const Module &operator=(const Module &) = delete;
 
diff --git a/lldb/include/lldb/Core/ModuleList.h b/lldb/include/lldb/Core/ModuleList.h
index 9826dd09e91d7..bb6128cc5c7ca 100644
--- a/lldb/include/lldb/Core/ModuleList.h
+++ b/lldb/include/lldb/Core/ModuleList.h
@@ -340,26 +340,22 @@ class ModuleList {
                                        lldb::SymbolType symbol_type,
                                        SymbolContextList &sc_list) const;
 
-  /// Find types by name.
+  /// Find types using a type matching object that contains all search
+  /// parameters.
   ///
   /// \param[in] search_first
   ///     If non-null, this module will be searched before any other
   ///     modules.
   ///
-  /// \param[in] name
-  ///     The name of the type we are looking for.
-  ///
-  /// \param[in] max_matches
-  ///     Allow the number of matches to be limited to \a
-  ///     max_matches. Specify UINT32_MAX to get all possible matches.
-  ///
-  /// \param[out] types
-  ///     A type list gets populated with any matches.
+  /// \param[in] query
+  ///     A type matching object that contains all of the details of the type
+  ///     search.
   ///
-  void FindTypes(Module *search_first, ConstString name,
-                 bool name_is_fully_qualified, size_t max_matches,
-                 llvm::DenseSet<SymbolFile *> &searched_symbol_files,
-                 TypeList &types) const;
+  /// \param[in] results
+  ///     Any matching types will be populated into the \a results object using
+  ///     TypeMap::InsertUnique(...).
+  void FindTypes(Module *search_first, const TypeQuery &query,
+                 lldb_private::TypeResults &results) const;
 
   bool FindSourceFile(const FileSpec &orig_spec, FileSpec &new_spec) const;
 
diff --git a/lldb/include/lldb/Symbol/CompilerDecl.h b/lldb/include/lldb/Symbol/CompilerDecl.h
index 67290b9be0663..d0c3eab669aa8 100644
--- a/lldb/include/lldb/Symbol/CompilerDecl.h
+++ b/lldb/include/lldb/Symbol/CompilerDecl.h
@@ -84,6 +84,13 @@ class CompilerDecl {
   // based argument index
   CompilerType GetFunctionArgumentType(size_t arg_idx) const;
 
+  /// Populate a valid compiler context from the current declaration.
+  ///
+  /// \returns A valid vector of CompilerContext entries that describes
+  /// this declaration. The first entry in the vector is the parent of
+  /// the subsequent entry, so the top most entry is the global namespace.
+  std::vector<lldb_private::CompilerContext> GetCompilerContext() const;
+
 private:
   TypeSystem *m_type_system = nullptr;
   void *m_opaque_decl = nullptr;
diff --git a/lldb/include/lldb/Symbol/CompilerDeclContext.h b/lldb/include/lldb/Symbol/CompilerDeclContext.h
index 61a9c9c341bfe..8640768c0eae4 100644
--- a/lldb/include/lldb/Symbol/CompilerDeclContext.h
+++ b/lldb/include/lldb/Symbol/CompilerDeclContext.h
@@ -11,6 +11,7 @@
 
 #include <vector>
 
+#include "lldb/Symbol/Type.h"
 #include "lldb/Utility/ConstString.h"
 #include "lldb/lldb-private.h"
 
@@ -56,6 +57,13 @@ class CompilerDeclContext {
     return m_type_system != nullptr && m_opaque_decl_ctx != nullptr;
   }
 
+  /// Populate a valid compiler context from the current decl context.
+  ///
+  /// \returns A valid vector of CompilerContext entries that describes
+  /// this declaration context. The first entry in the vector is the parent of
+  /// the subsequent entry, so the top most entry is the global namespace.
+  std::vector<lldb_private::CompilerContext> GetCompilerContext() const;
+
   std::vector<CompilerDecl> FindDeclByName(ConstString name,
                                            const bool ignore_using_decls);
 
diff --git a/lldb/include/lldb/Symbol/SymbolFile.h b/lldb/include/lldb/Symbol/SymbolFile.h
index a546b05bfd318..636becc9fe774 100644
--- a/lldb/include/lldb/Symbol/SymbolFile.h
+++ b/lldb/include/lldb/Symbol/SymbolFile.h
@@ -301,21 +301,20 @@ class SymbolFile : public PluginInterface {
                              bool include_inlines, SymbolContextList &sc_list);
   virtual void FindFunctions(const RegularExpression &regex,
                              bool include_inlines, SymbolContextList &sc_list);
-  virtual void
-  FindTypes(ConstString name, const CompilerDeclContext &parent_decl_ctx,
-            uint32_t max_matches,
-            llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
-            TypeMap &types);
-
-  /// Find types specified by a CompilerContextPattern.
-  /// \param languages
-  ///     Only return results in these languages.
-  /// \param searched_symbol_files
-  ///     Prevents one file from being visited multiple times.
-  virtual void
-  FindTypes(llvm::ArrayRef<CompilerContext> pattern, LanguageSet languages,
-            llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
-            TypeMap &types);
+
+  /// Find types using a type matching object that contains all search
+  /// parameters.
+  ///
+  /// \see lldb_private::TypeQuery
+  ///
+  /// \param[in] query
+  ///     A type matching object that contains all of the details of the type
+  ///     search.
+  ///
+  /// \param[in] results
+  ///     Any matching types will be populated into the \a results object using
+  ///     TypeMap::InsertUnique(...).
+  virtual void FindTypes(const TypeQuery &query, TypeResults &results) {}
 
   virtual void
   GetMangledNamesForFunction(const std::string &scope_qualified_name,
diff --git a/lldb/include/lldb/Symbol/SymbolFileOnDemand.h b/lldb/include/lldb/Symbol/SymbolFileOnDemand.h
index 9cbcef2a111d3..cde9f3c3b8ce1 100644
--- a/lldb/include/lldb/Symbol/SymbolFileOnDemand.h
+++ b/lldb/include/lldb/Symbol/SymbolFileOnDemand.h
@@ -152,17 +152,8 @@ class SymbolFileOnDemand : public lldb_private::SymbolFile {
       const std::string &scope_qualified_name,
       std::vector<lldb_private::ConstString> &mangled_names) override;
 
-  void
-  FindTypes(lldb_private::ConstString name,
-            const lldb_private::CompilerDeclContext &parent_decl_ctx,
-            uint32_t max_matches,
-            llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
-            lldb_private::TypeMap &types) override;
-
-  void FindTypes(llvm::ArrayRef<lldb_private::CompilerContext> pattern,
-                 lldb_private::LanguageSet languages,
-                 llvm::DenseSet<SymbolFile *> &searched_symbol_files,
-                 lldb_private::TypeMap &types) override;
+  void FindTypes(const lldb_private::TypeQuery &query,
+                 lldb_private::TypeResults &results) override;
 
   void GetTypes(lldb_private::SymbolContextScope *sc_scope,
                 lldb::TypeClass type_mask,
diff --git a/lldb/include/lldb/Symbol/Type.h b/lldb/include/lldb/Symbol/Type.h
index 15edbea3cc7ae..f0e6f943f6df7 100644
--- a/lldb/include/lldb/Symbol/Type.h
+++ b/lldb/include/lldb/Symbol/Type.h
@@ -12,11 +12,15 @@
 #include "lldb/Core/Declaration.h"
 #include "lldb/Symbol/CompilerDecl.h"
 #include "lldb/Symbol/CompilerType.h"
+#include "lldb/Symbol/TypeList.h"
+#include "lldb/Symbol/TypeMap.h"
+#include "lldb/Symbol/TypeSystem.h"
 #include "lldb/Utility/ConstString.h"
 #include "lldb/Utility/UserID.h"
 #include "lldb/lldb-private.h"
 
 #include "llvm/ADT/APSInt.h"
+#include "llvm/ADT/DenseSet.h"
 
 #include <optional>
 #include <set>
@@ -24,6 +28,23 @@
 namespace lldb_private {
 class SymbolFileCommon;
 
+/// A SmallBitVector that represents a set of source languages (\p
+/// lldb::LanguageType).  Each lldb::LanguageType is represented by
+/// the bit with the position of its enumerator. The largest
+/// LanguageType is < 64, so this is space-efficient and on 64-bit
+/// architectures a LanguageSet can be completely stack-allocated.
+struct LanguageSet {
+  llvm::SmallBitVector bitvector;
+  LanguageSet();
+
+  /// If the set contains a single language only, return it.
+  std::optional<lldb::LanguageType> GetSingularLanguage();
+  void Insert(lldb::LanguageType language);
+  bool Empty() const;
+  size_t Size() const;
+  bool operator[](unsigned i) const;
+};
+
 /// CompilerContext allows an array of these items to be passed to perform
 /// detailed lookups in SymbolVendor and SymbolFile functions.
 struct CompilerContext {
@@ -45,6 +66,286 @@ struct CompilerContext {
 bool contextMatches(llvm::ArrayRef<CompilerContext> context_chain,
                     llvm::ArrayRef<CompilerContext> pattern);
 
+FLAGS_ENUM(TypeQueryOptions){
+    e_none = 0u,
+    /// If set TypeQuery::m_context contains an exact context that must match
+    /// the full context. If not set TypeQuery::m_context can contain a partial
+    /// type match where the full context isn't fully specified.
+    e_exact_match = (1u << 0),
+    /// If set, TypeQuery::m_context is a clang module compiler context. If not
+    /// set TypeQuery::m_context is normal type lookup context.
+    e_module_search = (1u << 1),
+    /// When true, the find types call should stop the query as soon as a single
+    /// matching type is found. When false, the type query should find all
+    /// matching types.
+    e_find_one = (1u << 2),
+};
+LLDB_MARK_AS_BITMASK_ENUM(TypeQueryOptions)
+
+/// A class that contains all state required for type lookups.
+///
+/// Using a TypeQuery class for matching types simplifies the internal APIs we
+/// need to implement type lookups in LLDB. Type lookups can fully specify the
+/// exact typename by filling out a complete or partial CompilerContext array.
+/// This allows for powerful searches and also allows the SymbolFile classes to
+/// use the m_context array to lookup types by basename, then eliminate
+/// potential matches without having to resolve types into each TypeSystem. This
+/// makes type lookups vastly more efficient and allows the SymbolFile objects
+/// to stop looking up types when the type matching is complete, like if we are
+/// looking for only a single type in our search.
+class TypeQuery {
+public:
+  TypeQuery() = delete;
+
+  /// Construct a type match object using a fully or partially qualified name.
+  ///
+  /// The specified \a type_name will be chopped up and the m_context will be
+  /// populated by separating the string by looking for "::". We do this because
+  /// symbol files have indexes that contain only the type's basename. This also
+  /// allows symbol files to efficiently not realize types that don't match the
+  /// specified context. Example of \a type_name values that can be specified
+  /// include:
+  ///   "foo": Look for any type whose basename matches "foo".
+  ///     If \a exact_match is true, then the type can't be contained in any
+  ///     declaration context like a namespace, class, or other containing
+  ///     scope.
+  ///     If \a exact match is false, then we will find all matches including
+  ///     ones that are contained in other declaration contexts, including top
+  ///     level types.
+  ///   "foo::bar": Look for any type whose basename matches "bar" but make sure
+  ///     its parent declaration context is any named declaration context
+  ///     (namespace, class, struct, etc) whose name matches "foo".
+  ///     If \a exact_match is true, then the "foo" declaration context must
+  ///     appear at the source file level or inside of a function.
+  ///     If \a exact match is false, then the "foo" declaration context can
+  ///     be contained in any other declaration contexts.
+  ///   "class foo": Only match types that are classes whose basename matches
+  ///     "foo".
+  ///   "struct foo": Only match types that are structures whose basename
+  ///     matches "foo".
+  ///   "class foo::bar": Only match types that are classes whose basename
+  ///     matches "bar" and that are contained in any named declaration context
+  ///     named "foo".
+  ///
+  /// \param[in] type_name
+  ///   A fully or partially qualified type name. This name will be parsed and
+  ///   broken up and the m_context will be populated with the various parts of
+  ///   the name. This typename can be prefixed with "struct ", "class ",
+  ///   "union", "enum " or "typedef " before the actual type name to limit the
+  ///   results of the types that match. The declaration context can be
+  ///   specified with the "::" string. like "a::b::my_type".
+  ///
+  /// \param[in] options A set of boolean enumeration flags from the
+  ///   TypeQueryOptions enumerations. \see TypeQueryOptions.
+  TypeQuery(llvm::StringRef name, TypeQueryOptions options = e_none);
+
+  /// Construct a type match object that matches a type basename that exists
+  /// in the specified declaration context.
+  ///
+  /// This allows the m_context to be first populated using a declaration
+  /// context to exactly identify the containing declaration context of a type.
+  /// This can be used when you have a forward declaration to a type and you
+  /// need to search for its complete type.
+  ///
+  /// \param[in] decl_ctx
+  ///   A declaration context object that comes from a TypeSystem plug-in. This
+  ///   object will be asked to full the array of CompilerContext objects
+  ///   by adding the top most declaration context first into the array and then
+  ///   adding any containing declaration contexts.
+  ///
+  /// \param[in] type_basename
+  ///   The basename of the type to lookup in the specified declaration context.
+  ///
+  /// \param[in] options A set of boolean enumeration flags from the
+  ///   TypeQueryOptions enumerations. \see TypeQueryOptions.
+  TypeQuery(const CompilerDeclContext &decl_ctx, ConstString type_basename,
+            TypeQueryOptions options = e_none);
+  /// Construct a type match object using a compiler declaration that specifies
+  /// a typename and a declaration context to use when doing exact type lookups.
+  ///
+  /// This allows the m_context to be first populated using a type declaration.
+  /// The type declaration might have a declaration context and each TypeSystem
+  /// plug-in can populate the declaration context needed to perform an exact
+  /// lookup for a type.
+  /// This can be used when you have a forward declaration to a type and you
+  /// need to search for its complete type.
+  ///
+  /// \param[in] decl
+  ///   A type declaration context object that comes from a TypeSystem plug-in.
+  ///   This object will be asked to full the array of CompilerContext objects
+  ///   by adding the top most declaration context first into the array and then
+  ///   adding any containing declaration contexts, and ending with the exact
+  ///   typename and the kind of type it is (class, struct, union, enum, etc).
+  ///
+  /// \param[in] options A set of boolean enumeration flags from the
+  ///   TypeQueryOptions enumerations. \see TypeQueryOptions.
+  TypeQuery(const CompilerDecl &decl, TypeQueryOptions options = e_none);
+
+  /// Construct a type match object using a CompilerContext array.
+  ///
+  /// Clients can manually create compiler contexts and use these to find
+  /// matches when searching for types. There are two types of contexts that
+  /// are supported when doing type searchs: type contexts and clang module
+  /// contexts. Type contexts have contexts that specify the type and its
+  /// containing declaration context like namespaces and classes. Clang module
+  /// contexts specify contexts more completely to find exact matches within
+  /// clang module debug information. They will include the modules that the
+  /// type is included in and any functions that the type might be defined in.
+  /// This allows very fine grained type resolution.
+  ///
+  /// \param[in] context The compiler context to use when doing the search.
+  ///
+  /// \param[in] options A set of boolean enumeration flags from the
+  ///   TypeQueryOptions enumerations. \see TypeQueryOptions.
+  TypeQuery(const llvm::ArrayRef<lldb_private::CompilerContext> &context,
+            TypeQueryOptions options = e_none);
+
+  /// Construct a type match object that duplicates all matching criterea,
+  /// but not any searched symbol files or the type map for matches. This allows
+  /// the m_context to be modified prior to performing another search.
+  TypeQuery(cons...
[truncated]

@clayborg
Copy link
Collaborator Author

clayborg commented Dec 7, 2023

@adrian-prantl This patch is exactly the same as it was back when you accepted it in Phabricator with two things added:

  • fixes for -gsimple-template-names
  • updated the new SBType::FindDirectNestedType(...)

All tests pass on macOS.

Copy link

github-actions bot commented Dec 7, 2023

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r 365777ecbe18777431681fb54d068885044c6ef1...16c66eaebd28335a4e68e8d95617891962726748 lldb/test/API/functionalities/type_find_first/TestFindFirstType.py lldb/test/API/lang/cpp/unique-types4/TestUniqueTypes4.py
View the diff from darker here.
--- functionalities/type_find_first/TestFindFirstType.py	2023-12-13 00:21:40.000000 +0000
+++ functionalities/type_find_first/TestFindFirstType.py	2023-12-13 00:24:45.120711 +0000
@@ -6,22 +6,21 @@
 from lldbsuite.test.lldbtest import *
 from lldbsuite.test import lldbutil
 
 
 class TypeFindFirstTestCase(TestBase):
-
     NO_DEBUG_INFO_TESTCASE = True
 
     def test_find_first_type(self):
         """
-            Test SBTarget::FindFirstType() and SBModule::FindFirstType() APIs.
+        Test SBTarget::FindFirstType() and SBModule::FindFirstType() APIs.
 
-            This function had regressed after some past modification of the type
-            lookup internal code where if we had multiple types with the same
-            basename, FindFirstType() could end up failing depending on which
-            type was found first in the debug info indexes. This test will
-            ensure this doesn't regress in the future.
+        This function had regressed after some past modification of the type
+        lookup internal code where if we had multiple types with the same
+        basename, FindFirstType() could end up failing depending on which
+        type was found first in the debug info indexes. This test will
+        ensure this doesn't regress in the future.
         """
         self.build()
         target = self.createTestTarget()
         # Test the SBTarget APIs for FindFirstType
         integer_type = target.FindFirstType("Integer::Point")
--- lang/cpp/unique-types4/TestUniqueTypes4.py	2023-12-13 00:21:41.000000 +0000
+++ lang/cpp/unique-types4/TestUniqueTypes4.py	2023-12-13 00:24:45.149659 +0000
@@ -15,24 +15,31 @@
         lldbutil.run_to_source_breakpoint(
             self, "// Set breakpoint here", lldb.SBFileSpec("main.cpp")
         )
         # FIXME: these should successfully print the values
         self.expect(
-            "expression ns::Foo<double>::value", substrs=["'Foo' in namespace 'ns'"], error=True
+            "expression ns::Foo<double>::value",
+            substrs=["'Foo' in namespace 'ns'"],
+            error=True,
         )
         self.expect(
-            "expression ns::Foo<int>::value", substrs=["'Foo' in namespace 'ns'"], error=True
+            "expression ns::Foo<int>::value",
+            substrs=["'Foo' in namespace 'ns'"],
+            error=True,
         )
         self.expect(
-            "expression ns::Bar<double>::value", substrs=["'Bar' in namespace 'ns'"], error=True
+            "expression ns::Bar<double>::value",
+            substrs=["'Bar' in namespace 'ns'"],
+            error=True,
         )
         self.expect(
-            "expression ns::Bar<int>::value", substrs=["'Bar' in namespace 'ns'"], error=True
+            "expression ns::Bar<int>::value",
+            substrs=["'Bar' in namespace 'ns'"],
+            error=True,
         )
         self.expect_expr("ns::FooDouble::value", result_type="double", result_value="0")
         self.expect_expr("ns::FooInt::value", result_type="int", result_value="0")
-
 
     @skipIf(compiler=no_match("clang"))
     @skipIf(compiler_version=["<", "15.0"])
     def test_simple_template_names(self):
         self.do_test(dict(CFLAGS_EXTRAS="-gsimple-template-names"))

@felipepiovezan felipepiovezan self-requested a review December 8, 2023 00:43
@felipepiovezan
Copy link
Contributor

I am really struggling to handle the scope of the changes here, IIUC there are a handful of changes that could be split into separate commits and merged independently of each other: there is some code being moved around, new GetCompilerContext APIs, the new query, the replacing of the old queries with the new queries. All of these could be different changes? It would also making reverting things a lot easier if we end up breaking bots down the line, and some of these changes are just "NFC goodness" that would never need to be reverted

@clayborg
Copy link
Collaborator Author

clayborg commented Dec 8, 2023

I am really struggling to handle the scope of the changes here, IIUC there are a handful of changes that could be split into separate commits and merged independently of each other: there is some code being moved around, new GetCompilerContext APIs, the new query, the replacing of the old queries with the new queries. All of these could be different changes? It would also making reverting things a lot easier if we end up breaking bots down the line, and some of these changes are just "NFC goodness" that would never need to be reverted

I would prefer to keep this all in one as I spent a month last year making all of the changes needed for this patch and it was accepted in Phabricator by Adrian after many iterations. While some of the changes are "NFC goodness", they don't make sense without this patch and we usually don't commit changes that aren't used (the GetCompilerContext APIs). The only code that was moved was the Language set. And if we want to revert this, I would say this is easier to revert as a single commit.

@adrian-prantl thoughts?

@adrian-prantl
Copy link
Collaborator

Sorry, it's been a while since I last reviewed this and I must admit I forgot about this in the mean time. The only thing I'd like to check before we can land this is to hear from @Michael137 who has been actively working on FindTypes performance recently whether this aligns or clashes with what he's been working on so we don't get a mid-air collision between patches.

@clayborg
Copy link
Collaborator Author

clayborg commented Dec 9, 2023

Sorry, it's been a while since I last reviewed this and I must admit I forgot about this in the mean time. The only thing I'd like to check before we can land this is to hear from @Michael137 who has been actively working on FindTypes performance recently whether this aligns or clashes with what he's been working on so we don't get a mid-air collision between patches.

Sounds good, I look forward to hearing if this fixes the issues as well. Before this fix I ran a case where I did an expression like "MyClass::iterator" and before this fix, with linux and no accelerator tables, we would convert DWARF into clang AST types for over 2 million types, and after this fix it went down to 11. This was when debugging LLDB + LLVM + clang with debug info, so there were TONS of "iterator" entries in the manual DWARF index.

On Apple the type accelerator tables have the hash of the fully qualified name which can help reduce the number of types we parse, but we don't have this on linux even with DWARF5 + .debug_names

Copy link
Contributor

@hawkinsw hawkinsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the great work. I really appreciate the excellent documentation that you wrote. I am not well versed enough in the code to comment on the implementation, but your comments were so good that I intuitively understood the point of the new classes. I hope that some of these suggestions are helpful. Again, these are suggestions and you should not feel like you are under any obligation to accept them.

Thanks for the great work!

///
/// \returns A valid vector of CompilerContext entries that describes
/// this declaration context. The first entry in the vector is the parent of
/// the subsequent entry, so the top most entry is the global namespace.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// the subsequent entry, so the top most entry is the global namespace.
/// the subsequent entry, so the topmost entry is the global namespace.

///
/// \returns A valid vector of CompilerContext entries that describes
/// this declaration. The first entry in the vector is the parent of
/// the subsequent entry, so the top most entry is the global namespace.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// the subsequent entry, so the top most entry is the global namespace.
/// the subsequent entry, so the topmost entry is the global namespace.

/// \param[out] types
/// A type list gets populated with any matches.
/// \param[in] query
/// A type matching object that contains all of the details of the type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A type matching object that contains all of the details of the type
/// A type-matching object that contains all of the details of the type

/// The name of a type within a namespace that should not include
/// any qualifying namespaces (just a type basename).
/// \param[in] query
/// A type matching object that contains all of the details of the type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A type matching object that contains all of the details of the type
/// A type-matching object that contains all of the details of the type

llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
TypeMap &types);

/// Find types using a type matching object that contains all search
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Find types using a type matching object that contains all search
/// Find types using a type-matching object that contains all search

/// The \a m_context can be used in two ways: normal types searching with
/// the context containing a stanadard declaration context for a type, or
/// with the context being more complete for exact matches in clang modules.
/// Se this to true if you wish to search for a type in clang module.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Se this to true if you wish to search for a type in clang module.
/// Set this to true if you wish to search for a type in clang module.

protected:
/// A full or partial compiler context array where the parent declaration
/// contexts appear at the top of the array starting at index zero and the
/// last entry is contains the type and name of the type we are looking for.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// last entry is contains the type and name of the type we are looking for.
/// last entry contains the type and name of the type we are looking for.

/// An options bitmask that contains enabled options for the type query.
/// \see TypeQueryOptions.
TypeQueryOptions m_options;
/// If this has a value, then the language family must match at least one of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If this has a value, then the language family must match at least one of
/// If this variable has a value, then the language family must match at least one of

/// \see TypeQueryOptions.
TypeQueryOptions m_options;
/// If this has a value, then the language family must match at least one of
/// the specified languages. If this has no value, then the language of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// the specified languages. If this has no value, then the language of the
/// the specified languages. If this variable has no value, then the language of the

const TypeMap &GetTypeMap() const { return m_type_map; }

private:
/// Matching types get added to this maps as type search continues.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Matching types get added to this maps as type search continues.
/// Matching types get added to this map as type search continues.

@Michael137
Copy link
Member

Michael137 commented Dec 11, 2023

Sorry, it's been a while since I last reviewed this and I must admit I forgot about this in the mean time. The only thing I'd like to check before we can land this is to hear from @Michael137 who has been actively working on FindTypes performance recently whether this aligns or clashes with what he's been working on so we don't get a mid-air collision between patches.

Sounds good, I look forward to hearing if this fixes the issues as well. Before this fix I ran a case where I did an expression like "MyClass::iterator" and before this fix, with linux and no accelerator tables, we would convert DWARF into clang AST types for over 2 million types, and after this fix it went down to 11. This was when debugging LLDB + LLVM + clang with debug info, so there were TONS of "iterator" entries in the manual DWARF index.

On Apple the type accelerator tables have the hash of the fully qualified name which can help reduce the number of types we parse, but we don't have this on linux even with DWARF5 + .debug_names

To add more context to @adrian-prantl's comment, we've been working on reviving https://reviews.llvm.org/D101950, which changes the way we complete record types. The main blocker was performance, because as you mention, we often end up doing these expensive namespace searches, which gets much more noticeable with D101950. The problem we saw was mostly triggered by the second lookup that -gsimple-template-names does in FindTypes.

I applied your patch on top of D101950 and performance seems to be much better (by 2 orders of magnitude). It did break some tests locally for me, so I'm just skimming this patch to see what changed

///
/// \param[in] options A set of boolean enumeration flags from the
/// TypeQueryOptions enumerations. \see TypeQueryOptions.
TypeQuery(llvm::StringRef name, TypeQueryOptions options = e_none);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments refer to a parameter called type_name, but it's called name

@@ -437,26 +437,25 @@ lldb::SBType SBModule::FindFirstType(const char *name_cstr) {
LLDB_INSTRUMENT_VA(this, name_cstr);

ModuleSP module_sp(GetSP());
if (!name_cstr || !module_sp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we get rid of the early return here?

module_sp->FindTypes(query, results);
TypeSP type_sp = results.GetFirstType();
if (type_sp)
return SBType(type_sp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We previously would only return sb_type if sb_type.IsValid() == true. Does this mean we can now return an "invalid" SBType? Or is this checked elsewhere now?

/// matches "foo".
/// "class foo::bar": Only match types that are classes whose basename
/// matches "bar" and that are contained in any named declaration context
/// named "foo".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also document the special behaviour of a query that begins with ::? It looks like in those cases we set exact_match to true?

retval.Append(SBType(type_sp));
}
for (const TypeSP &type_sp : results.GetTypeMap().Types())
retval.Append(SBType(type_sp));
Copy link
Member

@Michael137 Michael137 Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a debug-assert here (or maybe in InsertUnique) that tests type_sp != nullptr? Or are we guaranteed somewhere that these won't be nullptr?

TypeQuery query(typename_cstr);
TypeResults results;
images.FindTypes(nullptr, query, results);
for (const TypeSP &type_sp : results.GetTypeMap().Types())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a debug-assert here (or maybe in InsertUnique) that tests type_sp != nullptr? Or are we guaranteed somewhere that these won't be nullptr?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change TypeResults::InsertUnique(...) to not add any invalid TypeSP objects.

SymbolContext sc;
if (module)
sc.module_sp = module->shared_from_this();
sc.SortTypeList(results.GetTypeMap(), type_list);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this type-map sorting used to also be done in the old FindTypes APIs, but can you leave a comment on why we need to do this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@@ -146,6 +133,9 @@ class TypeSystem : public PluginInterface,
virtual CompilerDeclContext
GetCompilerDeclContextForType(const CompilerType &type);

virtual std::vector<lldb_private::CompilerContext>
DeclContextGetCompilerContext(void *opaque_decl_ctx);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this function be marked const?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not put any limitations on TypeSystem plug-ins in case they can be more efficient if they can put off parsing information about a type until accessors are called on the DeclContext objects.

@@ -122,6 +106,9 @@ class TypeSystem : public PluginInterface,
virtual CompilerType DeclGetFunctionArgumentType(void *opaque_decl,
size_t arg_idx);

virtual std::vector<lldb_private::CompilerContext>
DeclGetCompilerContext(void *opaque_decl);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this function be marked const?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not put any limitations on TypeSystem plug-ins in case they can be more efficient if they can put off parsing information about a type until accessors are called on the Decl objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea makes sense, looks like that's what all the other TypeSystem APIs do too


// // There is an assumption 'name' is not a regex
// FindTypesByName(name.GetStringRef(), parent_decl_ctx, max_matches, types);
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

@@ -9070,6 +9070,66 @@ size_t TypeSystemClang::DeclGetFunctionNumArguments(void *opaque_decl) {
return 0;
}

static CompilerContextKind GetCompilerKind(clang::Decl::Kind clang_kind,
clang::DeclContext *decl_ctx) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
clang::DeclContext *decl_ctx) {
clang::DeclContext const *decl_ctx) {

@@ -12,33 +12,32 @@ class UniqueTypesTestCase4(TestBase):
def do_test(self, debug_flags):
"""Test that we display the correct template instantiation."""
self.build(dictionary=debug_flags)
exe = self.getBuildArtifact("a.out")
print(exe)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover from debugging?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I will remove

lldbutil.run_to_source_breakpoint(
self, "// Set breakpoint here", lldb.SBFileSpec("main.cpp")
)
# FIXME: these should successfully print the values
self.expect(
"expression ns::Foo<double>::value", substrs=["no member named"], error=True
"print ns::Foo<double>::value", substrs=["no template named 'Foo' in namespace 'ns'"], error=True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change these to print?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched them back to use expression

"expression ns::FooDouble::value",
substrs=["Couldn't look up symbols"],
error=True,
"print ns::FooDouble::value", substrs=["(double) 0"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be now turned into expect_expr

"expression ns::FooInt::value",
substrs=["Couldn't look up symbols"],
error=True,
"print ns::FooInt::value", substrs=["(int) 0"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expect_expr

Copy link
Member

@Michael137 Michael137 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (just left some small comments)

if (scope.consume_front("::"))
m_options |= e_exact_match;
if (!scope.empty()) {
std::pair<llvm::StringRef, llvm::StringRef> scope_pair =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need to block the PR on this, but wouldn't using the split(SmallVector<StringRef> &) overload make this whole block simpler?

}

bool TypeResults::AlreadySearched(lldb_private::SymbolFile *sym_file) {
return !m_searched_symbol_files.insert(sym_file).second;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return !m_searched_symbol_files.insert(sym_file).second;
return m_searched_symbol_files.contains(sym_file);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually want to try and insert it as this function is called as something that puts the symbol file into the map and returns false if it wasn't in the set. Or it tries to add it and returns true if it was alraedy in there. I have modified the header documentation to make this more clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense

@Michael137
Copy link
Member

BTW, lets prefix the commit with [lldb] before merging

Copy link

github-actions bot commented Dec 11, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 365777ecbe18777431681fb54d068885044c6ef1 16c66eaebd28335a4e68e8d95617891962726748 -- lldb/test/API/functionalities/type_find_first/main.cpp lldb/include/lldb/Core/Module.h lldb/include/lldb/Core/ModuleList.h lldb/include/lldb/Symbol/CompilerDecl.h lldb/include/lldb/Symbol/CompilerDeclContext.h lldb/include/lldb/Symbol/SymbolFile.h lldb/include/lldb/Symbol/SymbolFileOnDemand.h lldb/include/lldb/Symbol/Type.h lldb/include/lldb/Symbol/TypeMap.h lldb/include/lldb/Symbol/TypeSystem.h lldb/include/lldb/lldb-forward.h lldb/include/lldb/lldb-private-enumerations.h lldb/source/API/SBModule.cpp lldb/source/API/SBTarget.cpp lldb/source/Commands/CommandObjectMemory.cpp lldb/source/Commands/CommandObjectTarget.cpp lldb/source/Core/Module.cpp lldb/source/Core/ModuleList.cpp lldb/source/DataFormatters/TypeFormat.cpp lldb/source/Plugins/ExpressionParser/Clang/ClangASTSource.cpp lldb/source/Plugins/LanguageRuntime/CPlusPlus/ItaniumABI/ItaniumABILanguageRuntime.cpp lldb/source/Plugins/LanguageRuntime/ObjC/ObjCLanguageRuntime.cpp lldb/source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.cpp lldb/source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.h lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.cpp lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.h lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp lldb/source/Plugins/SymbolFile/DWARF/DWARFDIE.cpp lldb/source/Plugins/SymbolFile/DWARF/DWARFDIE.h lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.h lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.h lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.h lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.h lldb/source/Symbol/CompilerDecl.cpp lldb/source/Symbol/CompilerDeclContext.cpp lldb/source/Symbol/SymbolFile.cpp lldb/source/Symbol/SymbolFileOnDemand.cpp lldb/source/Symbol/Type.cpp lldb/source/Symbol/TypeMap.cpp lldb/source/Symbol/TypeSystem.cpp lldb/source/Target/Language.cpp lldb/test/API/lang/cpp/unique-types4/main.cpp lldb/tools/lldb-test/lldb-test.cpp
View the diff from clang-format here.
diff --git a/lldb/source/Commands/CommandObjectTarget.cpp b/lldb/source/Commands/CommandObjectTarget.cpp
index bc8bc51356..f84ecdeab2 100644
--- a/lldb/source/Commands/CommandObjectTarget.cpp
+++ b/lldb/source/Commands/CommandObjectTarget.cpp
@@ -2085,7 +2085,8 @@ protected:
           }
           if (INTERRUPT_REQUESTED(GetDebugger(),
                                   "Interrupted in dump all symtabs with {0} "
-                                  "of {1} dumped.", num_dumped, num_modules))
+                                  "of {1} dumped.",
+                                  num_dumped, num_modules))
             break;
 
           num_dumped++;
@@ -2113,9 +2114,10 @@ protected:
                 result.GetOutputStream().EOL();
                 result.GetOutputStream().EOL();
               }
-              if (INTERRUPT_REQUESTED(GetDebugger(),
-                    "Interrupted in dump symtab list with {0} of {1} dumped.",
-                    num_dumped, num_matches))
+              if (INTERRUPT_REQUESTED(
+                      GetDebugger(),
+                      "Interrupted in dump symtab list with {0} of {1} dumped.",
+                      num_dumped, num_matches))
                 break;
 
               num_dumped++;
@@ -2176,9 +2178,10 @@ protected:
       result.GetOutputStream().Format("Dumping sections for {0} modules.\n",
                                       num_modules);
       for (size_t image_idx = 0; image_idx < num_modules; ++image_idx) {
-        if (INTERRUPT_REQUESTED(GetDebugger(),
-              "Interrupted in dump all sections with {0} of {1} dumped",
-              image_idx, num_modules))
+        if (INTERRUPT_REQUESTED(
+                GetDebugger(),
+                "Interrupted in dump all sections with {0} of {1} dumped",
+                image_idx, num_modules))
           break;
 
         num_dumped++;
@@ -2197,9 +2200,10 @@ protected:
             FindModulesByName(target, arg_cstr, module_list, true);
         if (num_matches > 0) {
           for (size_t i = 0; i < num_matches; ++i) {
-            if (INTERRUPT_REQUESTED(GetDebugger(),
-                  "Interrupted in dump section list with {0} of {1} dumped.",
-                  i, num_matches))
+            if (INTERRUPT_REQUESTED(
+                    GetDebugger(),
+                    "Interrupted in dump section list with {0} of {1} dumped.",
+                    i, num_matches))
               break;
 
             Module *module = module_list.GetModulePointerAtIndex(i);
@@ -2339,9 +2343,10 @@ protected:
       }
 
       for (size_t i = 0; i < num_matches; ++i) {
-        if (INTERRUPT_REQUESTED(GetDebugger(),
-              "Interrupted in dump clang ast list with {0} of {1} dumped.",
-              i, num_matches))
+        if (INTERRUPT_REQUESTED(
+                GetDebugger(),
+                "Interrupted in dump clang ast list with {0} of {1} dumped.", i,
+                num_matches))
           break;
 
         Module *m = module_list.GetModulePointerAtIndex(i);
@@ -2480,8 +2485,8 @@ protected:
           for (ModuleSP module_sp : target_modules.ModulesNoLocking()) {
             if (INTERRUPT_REQUESTED(GetDebugger(),
                                     "Interrupted in dump all line tables with "
-                                    "{0} of {1} dumped", num_dumped,
-                                    num_modules))
+                                    "{0} of {1} dumped",
+                                    num_dumped, num_modules))
               break;
 
             if (DumpCompileUnitLineTable(

@clayborg
Copy link
Collaborator Author

BTW, lets prefix the commit with [lldb] before merging

Do we need to fix this in the commit message when I squash? Or do I just update the title of this PR?

@clayborg
Copy link
Collaborator Author

Sorry, it's been a while since I last reviewed this and I must admit I forgot about this in the mean time. The only thing I'd like to check before we can land this is to hear from @Michael137 who has been actively working on FindTypes performance recently whether this aligns or clashes with what he's been working on so we don't get a mid-air collision between patches.

Sounds good, I look forward to hearing if this fixes the issues as well. Before this fix I ran a case where I did an expression like "MyClass::iterator" and before this fix, with linux and no accelerator tables, we would convert DWARF into clang AST types for over 2 million types, and after this fix it went down to 11. This was when debugging LLDB + LLVM + clang with debug info, so there were TONS of "iterator" entries in the manual DWARF index.
On Apple the type accelerator tables have the hash of the fully qualified name which can help reduce the number of types we parse, but we don't have this on linux even with DWARF5 + .debug_names

To add more context to @adrian-prantl's comment, we've been working on reviving https://reviews.llvm.org/D101950, which changes the way we complete record types. The main blocker was performance, because as you mention, we often end up doing these expensive namespace searches, which gets much more noticeable with D101950. The problem we saw was mostly triggered by the second lookup that -gsimple-template-names does in FindTypes.

I applied your patch on top of D101950 and performance seems to be much better (by 2 orders of magnitude). It did break some tests locally for me, so I'm just skimming this patch to see what changed

All tests passed on my machine from with this, so you might try those tests without D101950 and see how things go? Happy to help if you need info on how to do things with the new lookups.

@Michael137
Copy link
Member

Sorry, it's been a while since I last reviewed this and I must admit I forgot about this in the mean time. The only thing I'd like to check before we can land this is to hear from @Michael137 who has been actively working on FindTypes performance recently whether this aligns or clashes with what he's been working on so we don't get a mid-air collision between patches.

Sounds good, I look forward to hearing if this fixes the issues as well. Before this fix I ran a case where I did an expression like "MyClass::iterator" and before this fix, with linux and no accelerator tables, we would convert DWARF into clang AST types for over 2 million types, and after this fix it went down to 11. This was when debugging LLDB + LLVM + clang with debug info, so there were TONS of "iterator" entries in the manual DWARF index.
On Apple the type accelerator tables have the hash of the fully qualified name which can help reduce the number of types we parse, but we don't have this on linux even with DWARF5 + .debug_names

To add more context to @adrian-prantl's comment, we've been working on reviving https://reviews.llvm.org/D101950, which changes the way we complete record types. The main blocker was performance, because as you mention, we often end up doing these expensive namespace searches, which gets much more noticeable with D101950. The problem we saw was mostly triggered by the second lookup that -gsimple-template-names does in FindTypes.
I applied your patch on top of D101950 and performance seems to be much better (by 2 orders of magnitude). It did break some tests locally for me, so I'm just skimming this patch to see what changed

All tests passed on my machine from with this, so you might try those tests without D101950 and see how things go? Happy to help if you need info on how to do things with the new lookups.

All tests are passing for me now, it had to do with some badly resolved merge conflicts

Do we need to fix this in the commit message when I squash? Or do I just update the title of this PR?

I think when you press the Squash and merge button it will give you the chance to correct the title of the commit

@clayborg
Copy link
Collaborator Author

@DavidSpickett I created a new PR to fix the buildbots:

#75566

This disables the TestUniqueTypes4.py on windows, and fixes the test in lldb/unittests/SymbolFile/PDB/SymbolFilePDBTests.cpp to test what it was testing before.

DavidSpickett pushed a commit that referenced this pull request Dec 15, 2023
This patch fixes the SymbolFilePDBTests::TestMaxMatches(...) by making
it test what it was testing before, see comments in the test case for
details.

It also disables TestUniqueTypes4.py for now until we can debug or fix
why it isn't working.
Michael137 pushed a commit to Michael137/llvm-project that referenced this pull request Dec 15, 2023
…hing for types (llvm#74786)

This patch revives the effort to get this Phabricator patch into
upstream:

https://reviews.llvm.org/D137900

This patch was accepted before in Phabricator but I found some
-gsimple-template-names issues that are fixed in this patch.

A fixed up version of the description from the original patch starts
now.

This patch started off trying to fix Module::FindFirstType() as it
sometimes didn't work. The issue was the SymbolFile plug-ins didn't do
any filtering of the matching types they produced, and they only looked
up types using the type basename. This means if you have two types with
the same basename, your type lookup can fail when only looking up a
single type. We would ask the Module::FindFirstType to lookup "Foo::Bar"
and it would ask the symbol file to find only 1 type matching the
basename "Bar", and then we would filter out any matches that didn't
match "Foo::Bar". So if the SymbolFile found "Foo::Bar" first, then it
would work, but if it found "Baz::Bar" first, it would return only that
type and it would be filtered out.

Discovering this issue lead me to think of the patch Alex Langford did a
few months ago that was done for finding functions, where he allowed
SymbolFile objects to make sure something fully matched before parsing
the debug information into an AST type and other LLDB types. So this
patch aimed to allow type lookups to also be much more efficient.

As LLDB has been developed over the years, we added more ways to to type
lookups. These functions have lots of arguments. This patch aims to make
one API that needs to be implemented that serves all previous lookups:

- Find a single type
- Find all types
- Find types in a namespace

This patch introduces a `TypeQuery` class that contains all of the state
needed to perform the lookup which is powerful enough to perform all of
the type searches that used to be in our API. It contain a vector of
CompilerContext objects that can fully or partially specify the lookup
that needs to take place.

If you just want to lookup all types with a matching basename,
regardless of the containing context, you can specify just a single
CompilerContext entry that has a name and a CompilerContextKind mask of
CompilerContextKind::AnyType.

Or you can fully specify the exact context to use when doing lookups
like: CompilerContextKind::Namespace "std"
CompilerContextKind::Class "foo"
CompilerContextKind::Typedef "size_type"

This change expands on the clang modules code that already used a
vector<CompilerContext> items, but it modifies it to work with
expression type lookups which have contexts, or user lookups where users
query for types. The clang modules type lookup is still an option that
can be enabled on the `TypeQuery` objects.

This mirrors the most recent addition of type lookups that took a
vector<CompilerContext> that allowed lookups to happen for the
expression parser in certain places.

Prior to this we had the following APIs in Module:

```
void
Module::FindTypes(ConstString type_name, bool exact_match, size_t max_matches,
                  llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
                  TypeList &types);

void
Module::FindTypes(llvm::ArrayRef<CompilerContext> pattern, LanguageSet languages,
                  llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
                  TypeMap &types);

void Module::FindTypesInNamespace(ConstString type_name,
                                  const CompilerDeclContext &parent_decl_ctx,
                                  size_t max_matches, TypeList &type_list);
```

The new Module API is much simpler. It gets rid of all three above
functions and replaces them with:

```
void FindTypes(const TypeQuery &query, TypeResults &results);
```
The `TypeQuery` class contains all of the needed settings:

- The vector<CompilerContext> that allow efficient lookups in the symbol
file classes since they can look at basename matches only realize fully
matching types. Before this any basename that matched was fully realized
only to be removed later by code outside of the SymbolFile layer which
could cause many types to be realized when they didn't need to.
- If the lookup is exact or not. If not exact, then the compiler context
must match the bottom most items that match the compiler context,
otherwise it must match exactly
- If the compiler context match is for clang modules or not. Clang
modules matches include a Module compiler context kind that allows types
to be matched only from certain modules and these matches are not needed
when d oing user type lookups.
- An optional list of languages to use to limit the search to only
certain languages

The `TypeResults` object contains all state required to do the lookup
and store the results:
- The max number of matches
- The set of SymbolFile objects that have already been searched
- The matching type list for any matches that are found

The benefits of this approach are:
- Simpler API, and only one API to implement in SymbolFile classes
- Replaces the FindTypesInNamespace that used a CompilerDeclContext as a
way to limit the search, but this only worked if the TypeSystem matched
the current symbol file's type system, so you couldn't use it to lookup
a type in another module
- Fixes a serious bug in our FindFirstType functions where if we were
searching for "foo::bar", and we found a "baz::bar" first, the basename
would match and we would only fetch 1 type using the basename, only to
drop it from the matching list and returning no results

(cherry picked from commit dd95877)
Michael137 pushed a commit to Michael137/llvm-project that referenced this pull request Dec 15, 2023
Fix unexpected pass after
llvm#74786.

(cherry picked from commit dcbf1e4)
@dwblaikie
Copy link
Collaborator

FWIW, we're seeing this breaking a pretty printer in google - previously a lookup was able to find absl::cord_internal::RefcountAndFlags::Flags::kNumFlags ( https://github.com/abseil/abseil-cpp/blob/8184f16e898fcb13b310b40430e13ba40679cf0e/absl/strings/internal/cord_internal.h#L202 ) (the name components are, in order: namespace, namespace, class, unscoped enumeration, enumerator)

The code we have (working via gala) has been doing this query twice to workaround this bug: #53904 - but it looks like since this change, that workaround has even broken and the name is no longer found?

I don't have a fully isolated repro yet, but figured I'd start with raising the flag in case anyone recognizes this.

@Endilll Endilll removed their request for review December 29, 2023 11:03
@dwblaikie
Copy link
Collaborator

@clayborg any thoughts on this ^ ?

@Michael137
Copy link
Member

FWIW, we're seeing this breaking a pretty printer in google - previously a lookup was able to find absl::cord_internal::RefcountAndFlags::Flags::kNumFlags ( https://github.com/abseil/abseil-cpp/blob/8184f16e898fcb13b310b40430e13ba40679cf0e/absl/strings/internal/cord_internal.h#L202 ) (the name components are, in order: namespace, namespace, class, unscoped enumeration, enumerator)

The code we have (working via gala) has been doing this query twice to workaround this bug: #53904 - but it looks like since this change, that workaround has even broken and the name is no longer found?

I don't have a fully isolated repro yet, but figured I'd start with raising the flag in case anyone recognizes this.

FWIW, you might be able to make use of the recently added SBType::FindDirectNestedType, to get to the enum value. Something like:

>>> var = lldb.frame.FindVariable('myVar')
>>> var_type = var.GetType()
>>> flags = var_type.FindDirectNestedType('Flags')
>>> flags
enum Flags {
    kNumFlags,
    kImmortalFlag
}
>>> flags.GetEnumMembers()['kNumFlags].GetValueAsUnsigned()
1

@clayborg
Copy link
Collaborator Author

clayborg commented Jan 3, 2024

absl::cord_internal::RefcountAndFlags::Flags::kNumFlags

I have a small repro case from the code pointer you gave. I will look into this and report back what I find.

Are you guys using the expression parser in your pretty printer? As Michael pointed out, you can get the enum type as a lldb::SBType in and explore the enumerators via the API. Above Michael used a variable to get the type, but you can use the lldb::SBType lldb::SBTarget::FindFirstType(const char *type) API:

>>> t = lldb.target.FindFirstType("absl::cord_internal::RefcountAndFlags::Flags")
>>> t
enum Flags {
    kNumFlags,
    kImmortalFlag,
    kRefIncrement
}
>>> t.GetEnumMembers()
<lldb.SBTypeEnumMemberList; proxy of <Swig Object of type 'lldb::SBTypeEnumMemberList *' at 0x107d0fdb0> >
>>> t.GetEnumMembers()['kNumFlags']
unsigned int kNumFlags
>>> t.GetEnumMembers()['kNumFlags'].GetValueAsUnsigned()
1

That being said, it does seem like this issue is due to #53904. Getting the type from the target and getting the enum value directly will be much more efficient than using the expression parser, but we do need to fix the expression parser as I wasn't aware of this bug where running the expression multiple times would eventually allow it to work.

@clayborg
Copy link
Collaborator Author

clayborg commented Jan 3, 2024

I am also not saying this patch hasn't made this issue happen more often, so I will check out my APIs to see how they are being called from the expression parser.

@clayborg
Copy link
Collaborator Author

clayborg commented Jan 3, 2024

So I traced down why things used to work and why they don't work now. The issue is we are being too accurate with our type lookups now. For my reduced example I had this code:

struct A {
  struct B {
    static int f;
  };
};

What would happen when you type "p A::B::f" was this:
We would look for 'A' in the root namespace and we would find it, the record decl looks like:

CXXRecordDecl 0x105504e08 <<invalid sloc>> <invalid sloc> struct A definition
`-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
  |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
  |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
  |-MoveConstructor exists simple trivial needs_implicit
  |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
  |-MoveAssignment exists simple trivial needs_implicit
  `-Destructor simple irrelevant trivial needs_implicit

We would look for 'B' in the root namespace, but since type lookups used to realize all basename matches (where 'B' would match 'A::B') and then we would throw away results that didn't match, but we did end up converting the A::B and it now exists in the 'A' CXXRecordDecl, but the compiler isn't looking for it there, so the expression fails with:

error: <user expression 0>:1:4: no member named 'B' in 'A'
    1 | A::B::f
      | ~~~^

When you run the expression the second time, the "struct A" now has a forward declaration for "A::B" in the CXXRecordDecl. So the second expression would look for 'A' in the root namespace and we would find it, the record decl
now looks like:

CXXRecordDecl 0x156b78a08 <<invalid sloc>> <invalid sloc> struct A definition
|-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
| |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
| |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveConstructor exists simple trivial needs_implicit
| |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveAssignment exists simple trivial needs_implicit
| `-Destructor simple irrelevant trivial needs_implicit
`-CXXRecordDecl 0x156b78b58 <<invalid sloc>> <invalid sloc> <undeserialized declarations> struct B

Note the last line is our definition for 'A::B' that was not there before, but now since it is there, the compiler will find it.

With the current upstream we run and we find "A" and have the same CXXRecordDecl as the first expression in the old LLDB, but now we will ask for "B" in the root namespace and we won't find anything because we are smart enough to not create every type whose basename matches "B", so we won't create "B", so it won't get added to A's CXXRecordDecl that this causes the expression to never be able to succeed.

So I am surprised to see that the compiler is asking for "B" in the root namespace, and not asking us to find "B" in the "A" decl context.

But the real issue might be that LLDB, which is acting like a precompiled header, and when asked us to find 'A', we return a forward declaration that is marked as "can be completed lazily via the precompiled header (external AST source) interface. And when we are asked to complete the 'A' type, it might expect us to add all contained types, like "struct A::B". We are not doing that right now, but maybe that is how the actual precompiled header support does things.

Michael137 pushed a commit to Michael137/llvm-project that referenced this pull request Jan 18, 2024
…hing for types (llvm#74786)

This patch revives the effort to get this Phabricator patch into
upstream:

https://reviews.llvm.org/D137900

This patch was accepted before in Phabricator but I found some
-gsimple-template-names issues that are fixed in this patch.

A fixed up version of the description from the original patch starts
now.

This patch started off trying to fix Module::FindFirstType() as it
sometimes didn't work. The issue was the SymbolFile plug-ins didn't do
any filtering of the matching types they produced, and they only looked
up types using the type basename. This means if you have two types with
the same basename, your type lookup can fail when only looking up a
single type. We would ask the Module::FindFirstType to lookup "Foo::Bar"
and it would ask the symbol file to find only 1 type matching the
basename "Bar", and then we would filter out any matches that didn't
match "Foo::Bar". So if the SymbolFile found "Foo::Bar" first, then it
would work, but if it found "Baz::Bar" first, it would return only that
type and it would be filtered out.

Discovering this issue lead me to think of the patch Alex Langford did a
few months ago that was done for finding functions, where he allowed
SymbolFile objects to make sure something fully matched before parsing
the debug information into an AST type and other LLDB types. So this
patch aimed to allow type lookups to also be much more efficient.

As LLDB has been developed over the years, we added more ways to to type
lookups. These functions have lots of arguments. This patch aims to make
one API that needs to be implemented that serves all previous lookups:

- Find a single type
- Find all types
- Find types in a namespace

This patch introduces a `TypeQuery` class that contains all of the state
needed to perform the lookup which is powerful enough to perform all of
the type searches that used to be in our API. It contain a vector of
CompilerContext objects that can fully or partially specify the lookup
that needs to take place.

If you just want to lookup all types with a matching basename,
regardless of the containing context, you can specify just a single
CompilerContext entry that has a name and a CompilerContextKind mask of
CompilerContextKind::AnyType.

Or you can fully specify the exact context to use when doing lookups
like: CompilerContextKind::Namespace "std"
CompilerContextKind::Class "foo"
CompilerContextKind::Typedef "size_type"

This change expands on the clang modules code that already used a
vector<CompilerContext> items, but it modifies it to work with
expression type lookups which have contexts, or user lookups where users
query for types. The clang modules type lookup is still an option that
can be enabled on the `TypeQuery` objects.

This mirrors the most recent addition of type lookups that took a
vector<CompilerContext> that allowed lookups to happen for the
expression parser in certain places.

Prior to this we had the following APIs in Module:

```
void
Module::FindTypes(ConstString type_name, bool exact_match, size_t max_matches,
                  llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
                  TypeList &types);

void
Module::FindTypes(llvm::ArrayRef<CompilerContext> pattern, LanguageSet languages,
                  llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
                  TypeMap &types);

void Module::FindTypesInNamespace(ConstString type_name,
                                  const CompilerDeclContext &parent_decl_ctx,
                                  size_t max_matches, TypeList &type_list);
```

The new Module API is much simpler. It gets rid of all three above
functions and replaces them with:

```
void FindTypes(const TypeQuery &query, TypeResults &results);
```
The `TypeQuery` class contains all of the needed settings:

- The vector<CompilerContext> that allow efficient lookups in the symbol
file classes since they can look at basename matches only realize fully
matching types. Before this any basename that matched was fully realized
only to be removed later by code outside of the SymbolFile layer which
could cause many types to be realized when they didn't need to.
- If the lookup is exact or not. If not exact, then the compiler context
must match the bottom most items that match the compiler context,
otherwise it must match exactly
- If the compiler context match is for clang modules or not. Clang
modules matches include a Module compiler context kind that allows types
to be matched only from certain modules and these matches are not needed
when d oing user type lookups.
- An optional list of languages to use to limit the search to only
certain languages

The `TypeResults` object contains all state required to do the lookup
and store the results:
- The max number of matches
- The set of SymbolFile objects that have already been searched
- The matching type list for any matches that are found

The benefits of this approach are:
- Simpler API, and only one API to implement in SymbolFile classes
- Replaces the FindTypesInNamespace that used a CompilerDeclContext as a
way to limit the search, but this only worked if the TypeSystem matched
the current symbol file's type system, so you couldn't use it to lookup
a type in another module
- Fixes a serious bug in our FindFirstType functions where if we were
searching for "foo::bar", and we found a "baz::bar" first, the basename
would match and we would only fetch 1 type using the basename, only to
drop it from the matching list and returning no results

(cherry picked from commit dd95877)
Michael137 pushed a commit to Michael137/llvm-project that referenced this pull request Jan 18, 2024
Fix unexpected pass after
llvm#74786.

(cherry picked from commit dcbf1e4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants