Backport the import-std-module patch and enable the fallback mode by default #2222

Teemperor · 2020-12-10T20:42:26Z

This PR contains the LLDB/ASTImporter patches for the import-std-module feature + 1 dependent Clang patch that just removes unspecified template args from printed C++ type names.

Also one final commit that enables the fallback mode by default.

…ion bodies Right now the ASTImporter assumes for most Expr nodes that they are always equal which leads to non-compatible declarations ending up being merged. This patch adds the basic framework for comparing Stmts (and with that also Exprs) and implements the custom checks for a few Stmt subclasses. I'll implement the remaining subclasses in follow up patches (mostly because there are a lot of subclasses and some of them require further changes like having GNU language in the testing framework) The motivation for this is that in LLDB we try to import libc++ source code and some of the types we are importing there contain expressions (e.g. because they use `enable_if<expr>`), so those declarations are currently merged even if they are completely different (e.g. `enable_if<value> ...` and `enable_if<!value> ...` are currently considered equal which is clearly not true). Reviewed By: martong, balazske Differential Revision: https://reviews.llvm.org/D87444 (cherry picked from commit c0bcd11)

…e more consistent There are several `::IsStructurallyEquivalent` overloads for Decl subclasses that are used for comparing declarations. There is also one overload that takes just two Decl pointers which ends up queuing the passed Decls to be later compared in `CheckKindSpecificEquivalence`. `CheckKindSpecificEquivalence` implements the dispatch logic for the different Decl subclasses. It is supposed to hand over the queued Decls to the subclass-specific `::IsStructurallyEquivalent` overload that will actually compare the Decl instance. It also seems to implement a few pieces of actual node comparison logic inbetween the dispatch code. This implementation causes that the different overloads of `::IsStructurallyEquivalent` do different (and sometimes no) comparisons depending on which overload of `::IsStructurallyEquivalent` ends up being called. For example, if I want to compare two FieldDecl instances, then I could either call the `::IsStructurallyEquivalent` with `Decl *` or with `FieldDecl *` parameters. The overload that takes FieldDecls is doing a correct comparison. However, the `Decl *` overload just queues the Decl pair. `CheckKindSpecificEquivalence` has no dispatch logic for `FieldDecl`, so it always returns true and never does any actual comparison. On the other hand, if I try to compare two FunctionDecl instances the two possible overloads of `::IsStructurallyEquivalent` have the opposite behaviour: The overload that takes `FunctionDecl` pointers isn't comparing the names of the FunctionDecls while the overload taking a plain `Decl` ends up comparing the function names (as the comparison logic for that is implemented in `CheckKindSpecificEquivalence`). This patch tries to make this set of functions more consistent by making `CheckKindSpecificEquivalence` a pure dispatch function without any subclass-specific comparison logic. Also the dispatch logic is now autogenerated so it can no longer miss certain subclasses. The comparison code from `CheckKindSpecificEquivalence` is moved to the respective `::IsStructurallyEquivalent` overload so that the comparison result no longer depends if one calls the `Decl *` overload or the overload for the specific subclass. The only difference is now that the `Decl *` overload is queuing the parameter while the subclass-specific overload is directly doing the comparison. `::IsStructurallyEquivalent` is an implementation detail and I don't think the behaviour causes any bugs in the current implementation (as carefully calling the right overload for the different classes works around the issue), so the test for this change is that I added some new code for comparing `MemberExpr`. The new comparison code always calls the dispatching overload and it previously failed as the dispatch didn't support FieldDecls. Reviewed By: martong, a_sidorin Differential Revision: https://reviews.llvm.org/D87619 (cherry picked from commit 7c4575e)

…of imported decls when merging ClassTemplateSpecializationDecls When importing a `ClassTemplateSpecializationDecl` definition into a TU with a matching `ClassTemplateSpecializationDecl` definition and a more recent forward decl, the ASTImporter currently will call `MapImported()` for the definitions, but will return the forward declaration from the `ASTImporter::Import()` call. This is triggering some assertions in LLDB when we try to fully import some DeclContexts before we delete the 'From' AST. The returned 'To' Decl before this patch is just the most recent forward decl but that's not the Decl with the definition to which the ASTImporter will import the child declarations. This patch just changes that the ASTImporter returns the definition that the imported Decl was merged with instead of the found forward declaration. Reviewed By: martong Differential Revision: https://reviews.llvm.org/D92016 (cherry picked from commit 0c926e6)

The test case isn't using the AST matchers for all checks as there doesn't seem to be support for matching NonTypeTemplateParmDecl default arguments. Otherwise this is simply importing the default arguments. Reviewed By: martong Differential Revision: https://reviews.llvm.org/D92106 (cherry picked from commit 89c1a7a)

…when trying to complete a forward decl SemaSourceWithPriorities is a special SemaSource that wraps our normal LLDB ExternalASTSource and the ASTReader (which is used for the C++ module loading). It's only active when the `import-std-module` setting is turned on. The `CompleteType` function there in `SemaSourceWithPriorities` is looping over all ExternalASTSources and asks each to complete the type. However, that loop is in another loop that keeps doing that until the type is complete. If that function is ever called on a type that is a forward decl then that causes LLDB to go into an infinite loop. I remember I added that second loop and the comment because I thought I saw a similar pattern in some other Clang code, but after some grepping I can't find that code anywhere and it seems the rest of the code base only calls CompleteType once (It would also be kinda silly to have calling it multiple times). So it seems that's just a silly mistake. The is implicitly tested by importing `std::pair`, but I also added a simpler dedicated test that creates a dummy libc++ module with some forward declarations and then imports them into the scratch AST context. At some point the ASTImporter will check if one of the forward decls could be completed by the ExternalASTSource, which will cause the `SemaSourceWithPriorities` to go into an infinite loop once it receives the `CompleteType` call. Reviewed By: shafik Differential Revision: https://reviews.llvm.org/D87289 (cherry picked from commit 32c8da4)

This adds support for substituting std::pair instantiations with enabled import-std-module. With the fixes in parent revisions we can currently substitute a single pair (however, a result that returns a second pair currently causes LLDB to crash while importing the second template instantiation). Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D85141 (cherry picked from commit b852225)

With the recent patches to the ASTImporter that improve template type importing (D87444), most of the import-std-module tests can now finally import the type of the STL container they are testing. This patch removes most of the casts that were added to simplify types to something the ASTImporter can import (for example, std::vector<int>::size_type was casted to `size_t` until now). Also adds the missing tests that require referencing the container type (for example simply printing the whole container) as here we couldn't use a casting workaround. The only casts that remain are in the forward_list tests that reference the iterator and the stack test. Both tests are still failing to import the respective container type correctly (or crash while trying to import). (cherry picked from commit cabee89)

"[lldb/DataFormatters] Display null C++ pointers as nullptr" added an assumption that the member we check for is always a nullptr, but it is actually never initialized. That causes the test to randomly fail due to the pointer having some random value that isn't 0. (cherry picked from commit dc848a0)

We can handle all the types in the expression evaluator now without casting. On Linux, we have a system header declaration that is still causing issues, so I'm skipping the test there until I get around to fix this. (cherry picked from commit a703998)

The test case isn't using the AST matchers for all checks as there doesn't seem to be support for matching TemplateTypeParmDecl default arguments. Otherwise this is simply importing the default arguments. Also updates several LLDB tests that now as intended omit the default template arguments of several std templates. Reviewed By: martong Differential Revision: https://reviews.llvm.org/D92103 (cherry picked from commit 3f6c856)

(cherry picked from commit 6face91)

LLDB is supposed to ask the Clang Driver what the default module cache path is and then use that value as the default for the `symbols.clang-modules-cache-path` setting. However, we use the property type `String` to change `symbols.clang-modules-cache-path` even though the type of that setting is `FileSpec`, so the setter will simply do nothing and return `false`. We also don't check the return value of the setter, so this whole code ends up not doing anything at all. This changes the setter to use the correct property type and adds an assert that we actually successfully set the default path. Also adds a test that checks that the default value for this setting is never unset/empty path as this would effectively disable the import-std-module feature from working by default. Reviewed By: JDevlieghere, shafik Differential Revision: https://reviews.llvm.org/D92772 (cherry picked from commit 208e3f5)

…types are in the scratch AST Several data formatters assume their types are in the Target's scratch AST and build new types from that scratch AST instance. However, types from different ASTs shouldn't be mixed, so this (unchecked) assumption may lead to problems if we ever have more than one scratch AST or someone somehow invokes data formatters on a type that are not in the scratch AST. Instead we can use in all the formatters just the TypeSystem of the type we're formatting. That's much simpler and avoids all the headache of finding the right TypeSystem that matches the one of the formatted type. Right now LLDB only has one scratch TypeSystemClang instance and we format only types that are in the scratch AST, so this doesn't change anything in the current way LLDB works. The intention here is to allow follow up refactorings that introduce multiple scratch ASTs with the same Target. Differential Revision: https://reviews.llvm.org/D92757 (cherry picked from commit 839e845)

arguments of types by default. This somewhat improves the worst-case printing of types like std::string, std::vector, etc., where many irrelevant default arguments can be included in the type as printed if we've lost the type sugar. (cherry picked from commit e7f3e21)

…pressed inline namespaces Commit 5f12f4f made suppressing inline namespaces when printing typenames default to true. As we're using the inline namespaces in LLDB to construct internal type names (which need internal namespaces in them to, for example, differentiate libc++'s std::__1::string from the std::string from libstdc++), this broke most of the type formatting logic. (cherry picked from commit da121ff)

…e in the internal type name Our type formatters/summaries match on the internal type name we generate in LLDB for Clang types. These names were generated using Clang's default printing policy. However Clang's default printing policy got tweaked over the last month to make the generated type names more readable (by for example excluding inline/anonymous namespaces and removing template arguments that have their default value). This broke the formatter system where LLDB's matching logic now no longer can format certain types as the new type names generated by Clang's default printing policy no longer match the type names that LLDB/the user specified. I already introduced LLDB's own type printing policy and fixed the inline/anonymous namespaces in da121ff (just to get the test suite passing again). This patch is restoring the old type printing behaviour where always include the template arguments in the internal type name (even if they match the default args). This should get template type formatters/summaries working again in the rare situation where we do know template default arguments within LLDB. This can only happen when either having a template that was parsed in the expression parser or when we get type information from a C++ module. The Clang change that removed defaulted template arguments from Clang's printing policy was e7f3e21 Reviewed By: labath Differential Revision: https://reviews.llvm.org/D92311 (cherry picked from commit e0e7bbe)

Clang has some type sugar that only serves as a way to preserve the way a user has typed a certain type in the source code. These types are currently not unwrapped when we query the type name for a Clang type, which means that this type sugar actually influences what formatters are picked for a certain type. Currently if a user decides to reference a type by doing `::GlobalDecl Var = 3;`, the type formatter for `GlobalDecl` will not be used (as the type sugar around the type gives it the name `::GlobalDecl`. The same goes for other ways to spell out a type such as `auto` etc. With this patch most of this type sugar gets stripped when the full type name is calculated. Typedefs are not getting desugared as that seems counterproductive. I also don't desugar atomic types as that's technically not type sugar. Reviewed By: jarin Differential Revision: https://reviews.llvm.org/D87481 (cherry picked from commit b5e49e9)

Like the other type sugar removed by RemoveWrappingTypes, SubstTemplateTypeParm is just pure sugar that should be ignored. If we don't ignore it (as we do now), LLDB will fail to read values from record fields that have a SubstTemplateTypeParm type. Only way to produce such a type in LLDB is to either use the `import-std-module` setting to get a template into the expression parser or just create your own template directly in the expression parser which is what we do in the test. Reviewed By: jarin Differential Revision: https://reviews.llvm.org/D85132 (cherry picked from commit 950f1bf)

…nd a Python>=3.7 crash Usually when we enter a SWIG wrapper function from Python, SWIG automatically adds a `Py_BEGIN_ALLOW_THREADS`/`Py_END_ALLOW_THREADS` around the call to the SB API C++ function. This will ensure that Python's GIL is released when we enter LLDB and locked again when we return to the wrapper code. D51569 changed this behaviour but only for the generated `__str__` wrappers. The added `nothreadallow` disables the injection of the GIL release/re-acquire code and the GIL is now kept locked when entering LLDB and is expected to be still locked when returning from the LLDB implementation. The main reason for that was that back when D51569 landed the wrapper itself created a Python string. These days it just creates a std::string and SWIG itself takes care of getting the GIL and creating the Python string from the std::string, so that workaround isn't necessary anymore. This patch just removes `nothreadallow` so that our `__str__` functions now behave like all other wrapper functions in that they release the GIL when calling into the SB API implementation. The motivation here is actually to work around another potential bug in LLDB. When one calls into the LLDB SB API while holding the GIL and that call causes LLDB to interpret some Python script via `ScriptInterpreterPython`, then the GIL will be unlocked when the control flow returns from the SB API. In the case of the `__str__` wrapper this would cause that the next call to a Python function requiring the GIL would fail (as SWIG will not try to reacquire the GIL as it isn't aware that LLDB removed it). The reason for this unexpected GIL release seems to be a workaround for recent Python versions: ``` // The only case we should go further and acquire the GIL: it is unlocked. if (PyGILState_Check()) return; ``` The early-exit here causes `InitializePythonRAII::m_was_already_initialized` to be always false and that causes that `InitializePythonRAII`'s destructor always directly unlocks the GIL via `PyEval_SaveThread`. I'm investigating how to properly fix this bug in a follow up patch, but for now this straightforward patch seems to be enough to unblock my other patches (and it also has the benefit of removing this workaround). The test for this is just a simple test for `std::deque` which has a synthetic child provider implemented as a Python script. Inspecting the deque object will cause `expect_expr` to create a string error message by calling `str(deque_object)`. Printing the ValueObject causes the Python script for the synthetic children to execute which then triggers the bug described above where the GIL ends up being unlocked. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D88302 (cherry picked from commit 070a1d5)

…Clang We keep referring to the single object created by this class as 'scratch AST/Context/TypeSystem' so at this point we might as well rename the class. It's also not involved at all in expression evaluation, so the 'ForExpressions' prefix is a bit misleading. (cherry picked from commit 973f390) Conflicts: lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp

Template specializations are not handled in many of the TypeSystemClang methods. For example, GetNumChildren does not handle the TemplateSpecialization type class, so template specializations always look like empty objects. This patch just desugars template specializations in the existing RemoveWrappingTypes desugaring helper. Differential Revision: https://reviews.llvm.org/D83858 (cherry picked from commit 93ec6cd)

Just adds the respective accessor functions to ASTContextMetadata instead of directly exposing the OriginMap to the whole world. (cherry picked from commit f6913e7)

…lbacks for the same target decl The ASTImporter has an `Imported(From, To)` callback that notifies subclasses that a declaration has been imported in some way. LLDB uses this in the `CompleteTagDeclsScope` to see which records have been imported into the scratch context. If the record was declared inside the expression, then the `CompleteTagDeclsScope` will forcibly import the full definition of that record to the scratch context so that the expression AST can safely be disposed later (otherwise we might end up going back to the deleted AST to complete the minimally imported record). The way this is implemented is that there is a list of decls that need to be imported (`m_decls_to_complete`) and we keep completing the declarations inside that list until the list is empty. Every `To` Decl we get via the `Imported` callback will be added to the list of Decls to be completed. There are some situations where the ASTImporter will actually give us two `Imported` calls with the same `To` Decl. One way where this happens is if the ASTImporter decides to merge an imported definition into an already imported one. Another way is that the ASTImporter just happens to get two calls to `ASTImporter::Import` for the same Decl. This for example happens when importing the DeclContext of a Decl requires importing the Decl itself, such as when importing a RecordDecl that was declared inside a function. The bug addressed in this patch is that when we end up getting two `Imported` calls for the same `To` Decl, then we would crash in the `CompleteTagDeclsScope`. That's because the first time we complete the Decl we remove the Origin tracking information (that maps the Decl back to from where it came from). The next time we try to complete the same `To` Decl the Origin tracking information is gone and we hit the `to_context_md->getOrigin(decl).ctx == m_src_ctx` assert (`getOrigin(decl).ctx` is a nullptr the second time as the Origin was deleted). This is actually a regression coming from D72495. Before D72495 `m_decls_to_complete` was actually a set so every declaration in there could only be queued once to be completed. The set was changed to a vector to make the iteration over it deterministic, but that also causes that we now potentially end up trying to complete a Decl twice. This patch essentially just reverts D72495 and makes the `CompleteTagDeclsScope` use a SetVector for the list of declarations to be completed. The SetVector should filter out the duplicates (as the original `set` did) and also ensure that the completion order is deterministic. I actually couldn't find any way to cause LLDB to reproduce this bug by merging declarations (this would require that we for example declare two namespaces in a non-top-level expression which isn't possible). But the bug reproduces very easily by just declaring a class in an expression, so that's what the test is doing. Reviewed By: shafik Differential Revision: https://reviews.llvm.org/D85648 (cherry picked from commit 7866b91) (cherry picked from commit 5c61c46f5c5bc142d9047bdb5f39ee7f36962c3c)

…imported std C++ module By now LLDB can import the 'std' C++ module to improve expression evaluation, but there are still a few problems to solve before we can do this by default. One is that importing the C++ module is slightly slower than normal expression evaluation (mostly because the disk access and loading the initial lookup data is quite slow in comparison to the barebone Clang setup the rest of the LLDB expression evaluator is usually doing). Another problem is that some complicated types in the standard library aren't fully supported yet by the ASTImporter, so we end up types that fail to import (which usually appears to the user as if the type is empty or there is just no result variable). To still allow people to adopt this mode in their daily debugging, this patch adds a setting that allows LLDB to automatically retry failed expression with a loaded C++ module. All success expressions will behave exactly as they would do before this patch. Failed expressions get a another parse attempt if we find a usable C++ module in the current execution context. This way we shouldn't have any performance/parsing regressions in normal debugging workflows, while the debugging workflows involving STL containers benefit from the C++ module type info. This setting is off by default for now with the intention to enable it by default on macOS soon-ish. The implementation is mostly just extracting the existing parse logic into its own function and then calling the parse function again if the first evaluation failed and we have a C++ module to retry the parsing with. Reviewed By: shafik, JDevlieghere, aprantl Differential Revision: https://reviews.llvm.org/D92784 (cherry picked from commit 9586082)

…ang::GetForTarget Also add some documentation while I'm at it. (cherry picked from commit 594308c)

… imported from C++ modules. Right now we have one large AST for all types in LLDB. All ODR violations in types we reconstruct are resolved by just letting the ASTImporter handle the conflicts (either by merging types or somehow trying to introduce a duplicated declaration in the AST). This works ok for the normal types we build from debug information as most of them are just simple CXXRecordDecls or empty template declarations. However, with a loaded `std` C++ module we have alternative versions of pretty much all declarations in the `std` namespace that are much more fleshed out than the debug information declarations. They have all the information that is lost when converting to DWARF, such as default arguments, template default arguments, the actual uninstantiated template declarations and so on. When we merge these C++ module types into the big scratch AST (that might already contain debug information types) we give the ASTImporter the tricky task of somehow creating a consistent AST out of all these declarations. Usually this ends in a messy AST that contains a mostly broken mix of both module and debug info declarations. The ASTImporter in LLDB is also importing types with the MinimalImport setting, which usually means the only information we have when merging two types is often just the name of the declaration and the information that it contains some child declarations. This makes it pretty much impossible to even implement a better merging logic (as the names of C++ module declarations and debug info declarations are identical). This patch works around this whole merging problem by separating C++ module types from debug information types. This is done by splitting up the single scratch AST into two: One default AST for debug information and a dedicated AST for C++ module types. The C++ module AST is implemented as a 'specialised AST' that lives within the default ScratchTypeSystemClang. When we select the scratch AST we can explicitly request that we want such a isolated sub-AST of the scratch AST. I kept the infrastructure more general as we probably can use the same mechanism for other features that introduce conflicting types (such as programs that are compiled with a custom -wchar-size= option). There are just two places where we explicitly have request the C++ module AST: When we export persistent declarations (`$mytype`) and when we create our persistent result variable (`$0`, `$1`, ...). There are a few formatters that were previously assuming that there is only one scratch AST which I cleaned up in a preparation revision here (D92757). Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D92759 (cherry picked from commit 47e7ecd)

This enables the import-std-module setting's fallback mode by default. With this all expressions are from no on retried with a loaded C++ module if they fail to parse without.

Teemperor · 2020-12-10T20:42:43Z

@swift-ci test

Teemperor and others added 27 commits December 10, 2020 20:03

[lldb][import-std-module] Add a test for typedef'd std types

c899b93

(cherry picked from commit 6face91)

[lldb][NFC] Document and encapsulate OriginMap in ASTContextMetadata

664cdeb

Just adds the respective accessor functions to ASTContextMetadata instead of directly exposing the OriginMap to the whole world. (cherry picked from commit f6913e7)

[lldb][NFC] Rename TypeSystemClang::GetScratch to ScratchTypeSystemCl…

b1dabed

…ang::GetForTarget Also add some documentation while I'm at it. (cherry picked from commit 594308c)

[lldb] Enable import-std-module's fallback mode by default

7f6e443

This enables the import-std-module setting's fallback mode by default. With this all expressions are from no on retried with a loaded C++ module if they fail to parse without.

Teemperor requested a review from JDevlieghere December 10, 2020 20:42

fredriss merged commit 4903d45 into swiftlang:apple/stable/20200714 Dec 10, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Backport the import-std-module patch and enable the fallback mode by default #2222

Backport the import-std-module patch and enable the fallback mode by default #2222

Uh oh!

Teemperor commented Dec 10, 2020

Uh oh!

Teemperor commented Dec 10, 2020

Uh oh!

Uh oh!

Backport the import-std-module patch and enable the fallback mode by default #2222

Backport the import-std-module patch and enable the fallback mode by default #2222

Uh oh!

Conversation

Teemperor commented Dec 10, 2020

Uh oh!

Teemperor commented Dec 10, 2020

Uh oh!

Uh oh!