[cxx-interop] Improve performance of deep template instantiations. #36553
Merged: varungandhi-apple merged 1 commit into swiftlang:main from zoecarver:cxx/fix-deep-template-spec on Mar 24, 2021.
This fix makes importing the standard library much, much faster. I took this example from valarray, which I think was the main culprit of the slow builds.
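For context, here is a minimal sketch (not the actual test case from this PR) of the valarray-style expression-template pattern that produces deeply nested specializations; the names below are hypothetical:

```cpp
// Hypothetical sketch: each chained operation wraps the operands' types,
// so the result type of an expression nests one level deeper per operation.
template <class L, class R>
struct BinaryExpr {
  template <class Other>
  BinaryExpr<BinaryExpr<L, R>, Other> operator+(const Other &) const {
    return {};
  }
};

struct Vec {
  template <class Other>
  BinaryExpr<Vec, Other> operator+(const Other &) const { return {}; }
};

int main() {
  Vec a, b, c, d;
  // Type: BinaryExpr<BinaryExpr<BinaryExpr<Vec, Vec>, Vec>, Vec>
  auto e = a + b + c + d;
  (void)e;
}
```

An importer that eagerly and recursively instantiates every member of every specialization it encounters has to walk this entire nested tree, which is why bailing out at a fixed depth helps so much.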
So, I realized it was untenable to import the C++ stdlib while valarray was taking this much time and memory. I tried to fix this in a few ways, and this one was the easiest. I think we should figure out a way to import only the submodule someone asks for: when someone writes import std.array, for example, we shouldn't also import std.string. We can fix that down the road, though, because it will be a bigger change to how we import modules. Another thing I tried was importing only the individual namespace sub-decls; right now, when we import a namespace we also import all of its children, which is pretty slow. This is sort of a problem with Clang; the thing is, Clang doesn't need to instantiate all the templates/types all the way down the way we do. Still, the Clang APIs were definitely the slowest part when I profiled this. I think we can/should make importing these class templates lazy in the future, but that will also be a bit more involved.
Also, just a side-note: the libc++ implementation of valarray is not very good at all (in fact, I'm not sure any stdlib has a good implementation of valarray). People probably shouldn't be using it. There are lots of things we could do to improve both runtime performance and compile-time performance.
One potential risk is that if the eager model becomes load-bearing, with other parts of the code relying on certain things being available at certain times, then switching to a lazy model later will be much more complex than switching earlier. More broadly, I think it would be nice if we could get the core model "right" first and build on top of that, especially since people aren't relying on this feature right now. You've been posting a bunch of PRs to fix or work around individual issues; maybe it's worthwhile to step back and reconsider how we're handling things and what we want the core importing behavior to be, so that things are fast by default and there is less need to work around individual issues.
(This is just me thinking out loud; I'm not familiar with the code to be able to judge that risk accurately here, it's quite possible that I'm overthinking it and the performance problems are easily fixable later.)
Could you elaborate more on the difference/why we're doing more work than Clang normally does?
I'm a little bit confused. This seems to contradict the comment in the code "The below template is super slow even after bailing at 8 template type params deep." Is it "super slow" relative to something else and not in absolute terms (taking several seconds or minutes)?
I completely agree. Before C++ interop stops being experimental, we need to figure out the modules issue. I didn't really say this explicitly, but even if we "fix" the module problem, we should still fix this, so that importing std.valarray, for example, doesn't take hours.

Your broader point about stepping back and thinking through the Clang importer's performance issues is also a good one; I think we should have a discussion about it on the forums. Making a lot of the template handling lazy would be a good idea, and as I describe below, it would bring us closer to what Clang does. (For example, the last bug I fixed, properly erroring on invalid inline static data members, wouldn't have been needed if we evaluated those lazily, as Clang does.)
I'll give it a try, but I have to admit, I'm not completely confident in this answer and I'd like to investigate further. Essentially, I think the problem is this; take the following example:
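(The snippet itself didn't survive in this thread, so here is a hypothetical reconstruction of the pattern being described, not the exact code from the discussion:)

```cpp
// Hypothetical reconstruction: the return type of test() nests X one level
// deeper, so each instantiated body drags in a new, deeper specialization.
template <class T>
struct X {
  X<X<T>> test() { return {}; }
};

int main() {
  X<int> x;
  // Clang only instantiates what is called: X<X<int>>, then X<X<X<int>>>, ...
  auto deeper = x.test().test();
  (void)deeper;
}
```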
Clang compiles this just fine; we don't. Clang (as I understand it, which may not be entirely correct) won't evaluate the return type until the function test is actually called, so every time X::test is called, it goes one level deeper. In Swift, the Clang importer needs to fully instantiate all function types so that we can generate Swift functions based on them. (Note: we don't need to fully instantiate all functions; we now do that lazily.) Unfortunately, I think we're going to have to keep fully instantiating function types; otherwise, I'm not sure how we could generate the "Swift version" of the function type. Because Clang doesn't need to instantiate any unused templates, it doesn't fall into the recursive/exponential traps that we are hitting.
You're right, I put this in a very confusing way. Two things: first, instantiating ~30,000 templates is relatively slow. Second, we're not actually (fully) importing the class templates I'm talking about. What the comment is trying to say is "fully importing this would be super slow." Hopefully that clears it up a bit; I will update the comment.
If I were to guess, it would probably work the same way as it does for other requests: getting the Swift version of the function type would become a request, and we'd only force it when needed instead of computing it eagerly. We might also need to requestify other things that use this, so they don't end up eagerly forcing the type. I understand that might require a more dramatic rework of how the Clang importer is set up, but it would be problematic if we hit a snag later on where we couldn't actually fix it for some reason, and people who start using this feature end up hitting pathological cases immediately with complex libraries.
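To illustrate the general idea only (this is not the Swift request-evaluator API; all names below are made up):

```cpp
#include <cstdio>
#include <functional>
#include <optional>

// Hypothetical sketch of "requestifying" a computation: nothing is computed
// when the owning object is created; the first get() forces the computation
// and memoizes the result, so later calls are free.
template <class T>
class Lazy {
public:
  explicit Lazy(std::function<T()> compute) : compute_(std::move(compute)) {}

  const T &get() {
    if (!value_)
      value_ = compute_();
    return *value_;
  }

private:
  std::function<T()> compute_;
  std::optional<T> value_;
};

int main() {
  // Stand-in for an expensive computation such as importing a function type.
  Lazy<int> swiftTypeRequest([] {
    std::puts("forcing the expensive computation...");
    return 42;
  });

  std::printf("%d\n", swiftTypeRequest.get());  // forces and caches
  std::printf("%d\n", swiftTypeRequest.get());  // cached; no recomputation
}
```

In the importer, the cached value would be the Swift version of the function type, and only code paths that actually need it would trigger the (potentially deep) instantiation behind it.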
That's a super interesting idea. I think I've seen that in other parts of the compiler/type checker, so maybe we could follow a similar model. I'll open a forum post tomorrow and we can discuss it more thoroughly.