Skip to content

[WIP] Improve ClangImporter Diagnostics #39923

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 4, 2022

Conversation

NuriAmari
Copy link
Contributor

In this diff, I took some of our discussion and made a small example using C functions. There are many ways to expand upon this strategy, this is just a baseline. High level summary:

  • Avoid wrapping return values in structs to bubble up diagnostics

    • Instead load notes into a queue
    • As you move up the tree, you can add more detail to the notes once you have the required context (ie. SourceLoc)
    • Finally, at some point you produce an error and attach the accumulated notes
    • Obviously, there is some "inaccuracy" with this queue, but we could add some more control (ex: note error target ids, note priority for sorting)
    • We have to a little bit careful that we assert notes are consumed and consumed properly, but I it is manageable
  • Use the fatal errors to suppress further diagnostics

    • This makes it easy to not worry about repeated lookups etc
    • The downside obviously is you only get a single error at a time
    • There are compromises, we can experiment with adding diagnostics to the ignore list and "recovering" from a fatal diagnostics is also just a single method call in situations where we think it appropriate to keep displaying errors

Overall, I think this strategy or one derived from it gives us a little less control, but I think its easier to work with and avoids some of the big problems we've been having with stubs

Copy link
Contributor

@zoecarver zoecarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach a lot better. I have two comments, though:

  1. It would be nice to factor out the logic for diagnosing errors into helper methods. You could do something as fine grained as diagnoseInvokedCFunctionNotImported(...) or something like diagnoseImportFailures(...). Anyway, that can come later.
  2. I think the biggest problem you're going to see is that these errors are emitted whenever we import an entire decl. So, if we ask for all the members of a record, you're going to get an error that the user tried to call a function which we failed to import (which they didn't). How do we solve this? There are two solutions that come to mind: either we could ensure that we only every lazily load types. This shouldn't actually be too hard, and I have patches open to implement this everywhere that it matters (which you're welcome to take over, or I could try to prioritize landing them). Alternatively, you could hook into the requests that are issued for direct lookups and enable certain diagnostics in the clang importer only when a direct lookup request is evaluated. This is the strategy I was showing you on our call yesterday. Essentially, whenever a user asks "please lookup this function for me" the compiler goes to look it up. If we find a clang decl for that function, then enable these diagnostics (maybe with a filter for certain types/decls if it gets too loud), then disable them again after we import (or fail to import) the decl/type. I'm happy to discuss this in more depth if you'd like.

@NuriAmari
Copy link
Contributor Author

  1. It would be nice to factor out the logic for diagnosing errors into helper methods. You could do something as fine grained as diagnoseInvokedCFunctionNotImported(...) or something like diagnoseImportFailures(...). Anyway, that can come later.

Sure 👍

  1. I think the biggest problem you're going to see is that these errors are emitted whenever we import an entire decl. So, if we ask for all the members of a record, you're going to get an error that the user tried to call a function which we failed to import (which they didn't). How do we solve this? There are two solutions that come to mind: either we could ensure that we only every lazily load types. This shouldn't actually be too hard, and I have patches open to implement this everywhere that it matters (which you're welcome to take over, or I could try to prioritize landing them). Alternatively, you could hook into the requests that are issued for direct lookups and enable certain diagnostics in the clang importer only when a direct lookup request is evaluated. This is the strategy I was showing you on our call yesterday. Essentially, whenever a user asks "please lookup this function for me" the compiler goes to look it up. If we find a clang decl for that function, then enable these diagnostics (maybe with a filter for certain types/decls if it gets too loud), then disable them again after we import (or fail to import) the decl/type. I'm happy to discuss this in more depth if you'd like.

Yeah, we were discussing this problem earlier. I think I plan to take it situation by situation. For the diagnostics I've been adding thus far, it has been quite lazy. The only eager importing I've faced, as you mention is Obj-C classes are on occasion fetched in bulk. I don't really want to commit to getting involved in making the ClangImporter lazier, at least not at this point. If I get this project finished and there is extra time I would be interested though. At least initially, my strategy would be to identify eager loading and make sure we don't suppress the relevant diagnostics. Reporting diagnostics for nodes the user didn't use is by no means new to the ClangImporter, so if there is a little extra noise, I can live with that. If it really ends up being too much noise, I will try the filtering strategy you propose. I need to look at the details, I'm just not sure how easy it will be to pass the diagnostic filter information down the the ClangImporter. We could just manipulate the DiagnosticEngine itself, but that seems like a bad idea.

@NuriAmari NuriAmari changed the title Alternate Diagnostic Strategy example [WIP] Alternate Diagnostic Strategy example Oct 27, 2021
@NuriAmari
Copy link
Contributor Author

Just leaving a note here for the future: I am using this PR as a way to share and discuss my approach, I don't expect an in-depth review, nor do I intend to merge this.

Copy link
Contributor

@drodriguez drodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When things are more crystalized, I would recommend adding a little bit of commenting here and there to explain why in some cases the notes are iterated, and why in some other cases they are consumed. Without the context, someone that has to work on this 6 months later is going to find those parts confusing.

@NuriAmari
Copy link
Contributor Author

Overall Structure

There are a few mechanisms I've introduced for controlling diagnostic emission that I'd like to explain. The overall diagnostic structure:

  • Notes are produced in place anywhere in the ClangImporter
    • These notes are added to a pending note queue
  • An error can later be produced, consuming relevant previously produced notes
  • This is helpful for providing more detailed context to errors without having to bubble up diagnostics through the ClangImporter

Next, there are three mechanisms in place to ensure diagnostics are only displayed once per directly accessed ClangDecl. That is the user should not see diagnostics relating to entities they have not directly imported and they should only see these diagnostics once.

diagnosticTarget

This is a field on the ClangImporter that is used to indicate the Clang entity we are interested in showing diagnostics about. This is most useful when the ClangImporter is reentrant. Consider importing a C struct. A name lookup is performed, and the diagnostic target is set to the Clang node corresponding to the C struct. While importing this C struct, the ClangImporter will import each field. We do not want to produce errors for each unimportable field until the user explicitly accesses them (until the diagnostic target points at them). Notes include the diagnostic target that was set when they were produced. This allows error to filter out notes that were produced for a different target but not yet consumed.

eagerImportActive

The task of only reporting diagnostics about used entities is made infinitely easier by lazy loading. However, sometimes the same direct import method used for a lazy load is used repeatedly for an eager load. For eager loads we usually want to suppress diagnostics, as eager loads contain imports the user did not explicitly as for and are usually only interested in successful imports anyways. We use the eagerImportActive boolean flag to indicate in methods like loadVisibleDecls, loadAllMembers etc that even if down the line the request looks lazy, it is not.

Visited diagnosticTargets

A set keeping track of the diagnostic targets for which diagnostics were already produced. This allows us to eliminate repetition and avoid accessing cached import results when we know diagnostics for the entity have not yet been displayed.

Outstanding Issues

Until recently, I did all my initial testing and debugging on small scale examples invoking the swift-frontend manually. This is convenient as it reduces debugging noise. As I've learned, this can be problematic as the configuration of the swift-frontend can be different. I have since discovered two related issues that I am still working through. They are as follows:

  1. The first unimported struct field be accessed with trigger a diagnostic typo check for similar fields, resulting in an eager import of all fields. Diagnostics during the typo check are suppressed and member name lookups are now cached, meaning diagnostics for remaining unimported fields are lost.

  2. A similar situation with ObjC methods, though more difficult. As far as I can tell, method lookups haven't quite been "requestified / lazied" to the same extent as record field lookup has been. Once again, typo check causes an eager import.

I have tried a few approaches, but I am proposing the following. We modify the start of DirectLookupRequest::evaluate to roughly perform the following

  1. Check the in use module loader is the ClangImporter
  2. If so and the name to lookup has somehow already been cached, check there is an imported result (nullptr) among the results.
  3. Perform a namelook up using the ClangImporter and check if any of the returned ClangDecls have not been reported on yet.
  4. If we get this far, perform a lazy lookup for this name again.

@NuriAmari
Copy link
Contributor Author

I may have solved the aforementioned issues, patch update soon.

Copy link
Contributor

@zoecarver zoecarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great. I only had time for these comments, but I will go over this again, and expand on my comments surrounding eagerImportActive and the non-linear flow.

Anyway, the bottom line is this looks good. The errors look super helpful, they are very clear and are in the correct place. That's great! Nice job :)

@NuriAmari NuriAmari changed the title [WIP] Alternate Diagnostic Strategy example [WIP] Improve ClangImporter Diagnostics Nov 2, 2021
@zoecarver
Copy link
Contributor

I don't love this... I feel like it's just waiting to cause problems. Especially because we import some decls eagerly and some lazily. Let's see if we can get rid of it somehow. I'll think about it after looking through this whole patch and reply here if I think of anything.

...

I am also not a fan, but I'm not sure how to do better. If there were distinct flows for eager vs lazy loading (ie. direct imports are only used for lazy loading), that would solve the problem, but it doesn't look like we're there yet.

I thought about this a bit, and realized something from earlier. We went into this whole thing with the mindset that TypeChecker::resolveDeclRefExpr was a problem we were trying to solve. In reality, this is not the case. I think TypeChecker::resolveDeclRefExpr is actually the solution to most of the problematic parts of this patch. This is the point at which the compiler tries to resolve a reference to something (a function, a field, a method, etc.), so, that is where we should actually put this logic.

To step back a bit, we really have three different ways we can represent a thing here. One is as a reference (in the abstract, we can think of this as a name or a string, but concretely, I mean a DeclRefExpr because this indicates that a user asked for it), another is as a Clang decl, and finally, we can represent it as an imported Swift decl. At various places in the compiler, we have various representations, and they all mean different things. If we have a decl ref, that means a user asked us for that thing. If we have a Clang decl, it means that thing exists in C++. And, if we have a Swift decl, that means it can be imported into Swift. As we discussed offline, having a Clang decl and no Swift decl actually conveys some information: there is some C++ thing that cannot be imported into Swift (i.e., we hit a problem importing it). But, this isn't the full picture, because the decl ref representation also carries some information. If we put these together, you see the heuristic we are actually looking for: when do we have a decl ref and a C++ decl, but no Swift decl. This difference is key because if we have a decl ref, that means a user requested the C++ decl.

Okay, so, where in the compiler do we have all of these representations? Unfortunately, not in DirectLookupRequest::evaluate. At this point, we only have a DeclName which is essentially a string. We don't know if a user asked us for this. If we want an actual decl ref, we have to walk up the call stack a bit, and we get to... resolveDeclRefExpr.

What's the problem with "putting this logic" in TypeChecker::resolveDeclRefExpr? The problem is that above we said we need all three representations, and in resolveDeclRefExpr we only have two: the decl ref and the Swift decl. So, we need to find the Clang decl. This should be pretty easy to do: you can just issue a ClangDirectLookupRequest (or maybe just pull out the body of ClangDirectLookupRequest::evaluate to get exactly what you need. Happy to help you with this offline). This also means, you can bail out of resolveDeclRefExpr and prevent typo checking, etc.

What do you think of this idea? Do you think it might help get rid of DiagnosticTarget and EagerImportActive?

@NuriAmari
Copy link
Contributor Author

Hi everyone,

In the lastest commit, I have rewritten the improvements to the ClangImporter diagnostics. The behaviour is largely unchanged (existing tests were not modified), but the implementation is entirely different. Here I include a brief summary of the changes --let me know what you think -- any questions or feedback are appreciated.

Goals

At a high level the my goal with these changes is:

- Remove the hackier parts of the previous design (ie. EagerImportActive)
- Implement a less fragile diagnostic emission strategy
    - Previously, when diagnostics were emitted was heavily influenced by
      the design of the ClangImporter (ie. Lazy vs eager loading, reentrancy, caching)
    - This change aims to as much as possible decouple the two systems

Design Changes

The implementation changes themselves are as follows. Diagnostics are now stored instead of being emitted immediatley. The ClangImporter::Implementation now has this new API allowing clients to diagnose a declaration that couldn't be imported, either by name or by ClangNode directly, whichever is more convenient. More on this later.

/// Emit any import diagnostics associated with the given name.
void diagnoseValue(SwiftLookupTable &table, DeclName name);

/// Emit any import diagnostics associated with the given Clang node.
void diagnoseTargetDirectly(ImportDiagnosticTarget target);

The diagnostics themselves are all associated with a particular ClangNode. There is typically a natural ClangNode to attach to (ex: Failed to import a function, attach a diagnostic to the function). Diagnostics are deduplicated by insisting that each emitted diagnostic has a unique (SourceLocation, DiagID, ClangNode) combination. This means it does not matter how many times or when a ClangNode is visited by the ClangImporter.

When either of the above methods are called, diagnostics attached to the target node, or any of its descendants* are emitted. To accomplish this, I traverse the Clang AST starting at the target node using what I am calling a DiagnosticWalker.

* It is not really a complete traversal, the ClangImporter sometimes bails early which is reflected in the traversal. See DiagnosticWalker::TraverseParmVarDecl for details.

Observed Benefits

Overall, the design is in my opinion simpler and more flexible. For example, the previous solution required the occasional bypass of a cache to ensure diagnostics are produced. This solution does not care if the ClangImporter caches or not. Similarly, while these changes are implemented on top of @zoecarver's changes to make the ClangImporter lazier, they do not depend on this lazy behaviour any longer. Even if the ClangImporter was entirely eager, we could still call diagnoseValue strategically to implement lazy diagnostics. For example, the previous solution required fiddling with some caches, this one does not care if the ClangImporter caches or not.

Lastly, we have much more control over what diagnostics are emitted since we control the traversal of the ClangAST. If tomorrow we decided that a failed import of a struct should only display diagnostics for fields who's names starts with 'A', we could do that without modifying the rest of the ClangImporter.

Remaining Changes

The initial motivation behind these changes was @zoecarver's great suggestion to tie the emission of diagnostics to the existence of a referencing Swift AST node. I have mostly done this, in that calls to diagnoseValue or diagnoseTargetDirectly are made outside the ClangImporter in locations where the existence of such a referencing Swift AST node is guaranteed. However, diagnoseValue calls are still made relatively close to the ClangImporter for the time being (this was just easier to start with). You'll notice I have a set DiagnosedValues to prevent repeated calls to diagnoseValue for the same target. This is to combat the dreaded resolveDeclRefExpr's repeated look up behaviour. This also means for two references of the same unimportable declaration, only the first is diagnosed. This is not a new limitation, but I think we can do bettter. If we think this is worth addressing, I intend to fix this by triggering the call to diagnoseValue in resolveDeclRefExpr after all lookups fail. I just haven't gotten around to this as it is far removed from the ClangImporter and may require some new requests.

Misc Additions

  • Integrate with Becca's HeaderLoc changes
  • Allow clang::NamedDecl's to be passed directly to some diagnostics
    • This is more convenient, but it also it also solves issues with passing a StringRef
      pointing to a String who's lifetime does not extend beyond the point the diagnostics is emitted

Copy link
Contributor

@zoecarver zoecarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really, really nicely done. This looks awesome.

I didn't do an in-depth review, but I'll try to do that later this week. I really like this new direction, it looks great.

@zoecarver
Copy link
Contributor

@swift-ci please smoke test.

@zoecarver
Copy link
Contributor

zoecarver commented Dec 28, 2021

@NuriAmari I'm really looking forward to landing this :)

Thanks for all your work on it. I appreciate you spending the time to implement it correctly.

@NuriAmari
Copy link
Contributor Author

Update

CR Revision "4" addresses oustanding CR comments.

CR Revision "5" introduces another flag for the eager displaying of diagnostics:

  • The idea is simple, load all visible declarations from any loaded modules, and all members within those declarations, producing diagnostics for any failures.
  • The new diagnostic system complicates things slightly, since we need to know what declarations we tried but failed to import. Currently, the ClangImporter is not very good at answering this question.
  • This patches is relatively simple, it covers most declarations (including all we have diagnostics for), but misses some. In some cases, eager import is unavailable currenlty (ex: namespace members are lazy load only it seems), or the code path for loading is very specific to the declaration. For example, I suspect inherited constructors for example, might be left out at the moment. It shouldn't be difficult to include them, I just haven't covered every case.

Side Comment

In this PR, Becca makes effort to reduce reentrancy and reduce the number of entry points into the ClangImporter. I haven't looked at the details, but if working on this has taught me anything, it's that this would be beneficial. It's quite difficult to implement solutions that work for a number of different types of imports when every one handled slightly differently. Any unification of API would help.

Copy link
Contributor

@zoecarver zoecarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the remaining comments, but other than that, I think it's ready.

@zoecarver
Copy link
Contributor

@swift-ci please smoke test.

@zoecarver
Copy link
Contributor

@swift-ci please test Windows.

This patch introduces new diagnostics to the ClangImporter to help
explain why certain C, Objective-C or C++ declarations fail to import
into Swift. This patch includes new diagnostics for the following entities:

- C functions
- C struct fields
- Macros
- Objective-C properties
- Objective-C methods

In particular, notes are attached to indicate when any of the above
entities fail to import as a result of refering an incomplete (only
forward declared) type.

The new diangostics are hidden behind two new flags, -enable-experimental-clang-importer-diagnostics
and -enable-experimental-eager-clang-module-diagnostics. The first flag emits diagnostics lazily,
while the second eagerly imports all declarations visible from loaded Clang modules. The first
flag is intended for day to day swiftc use, the second for module linting or debugging the importer.
@zoecarver
Copy link
Contributor

@swift-ci please smoke test.

Copy link
Contributor

@zoecarver zoecarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I'll try to land it tomorrow.

Thanks again @NuriAmari!

@zoecarver
Copy link
Contributor

@swift-ci please smoke test.

@zoecarver
Copy link
Contributor

@swift-ci please smoke macOS.

@zoecarver
Copy link
Contributor

@swift-ci please smoke macOS platform.

@zoecarver
Copy link
Contributor

@swift-ci Please smoke test macOS platform

2 similar comments
@zoecarver
Copy link
Contributor

@swift-ci Please smoke test macOS platform

@zoecarver
Copy link
Contributor

@swift-ci Please smoke test macOS platform

@drodriguez
Copy link
Contributor

@NuriAmari: thanks a lot for all the work! This looks amazing and I hope a lot of people find it useful. I know we will.

PD: I took my PTO seriously this year, so sorry for not chiming it earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants