[cxx-interop] Improve performance of deep template instantiations. #36553

Merged

Conversation

zoecarver
Contributor

If there are more than 10,000 instantiations of the same class template then we need to bail because it will take too long to mangle, instantiate, etc.

This is combined with the existing check for deep templates. This way we can bail on templates that are too big in both directions.

@zoecarver added the c++ interop (Feature: Interoperability with C++) label Mar 23, 2021
@@ -4,3 +4,48 @@ struct HasTypeWithSelfAsParam {
};

using WillBeInfinite = HasTypeWithSelfAsParam<int>;

namespace RegressionTest {
Contributor Author

This fix makes importing the standard library much, much faster. I took this example from valarray, which I think was the main culprit of the slow builds.

Contributor

  1. Can you share rough numbers for before vs after? Is the slowness here because of additional inefficiencies on the Swift side (are we trying to import the whole standard library at once vs granular headers used by C++ developers) or is this equally slow with Clang itself?
  2. Since this is a performance-based test, should it be under validation-test instead of test? You mention: "The below template is super slow even after bailing at 8 template type params deep." Given that, it doesn't seem worthwhile to have this be part of the usual test suite, otherwise it slows down people unnecessarily...

Contributor Author

  1. The numbers for this test: it takes less than a second now, where before it would take on the order of weeks. If we stopped bailing after 10,000 specializations, the CI would hopefully time out; otherwise, this test would take more RAM than the CI machines have.

So, I realized it was untenable to import the C++ stdlib while valarray was taking this much time and memory. I tried to fix this in a few ways, and I think this was the easiest. I think we should figure out a way to import only the submodules someone actually uses: when someone writes import std.array, for example, we shouldn't also import std.string. We can fix that down the road, though, because it will be a bigger fix related to how we import modules. Another thing I tried is importing only the individual namespace subdecls; right now, when we import a namespace, we also import all of its children (which is pretty slow).

This is sort of a problem with Clang; the thing is, Clang doesn't need to instantiate all the templates/types all the way down the way we do. That said, the Clang APIs were definitely the slowest part when I profiled this. I think we can/should make importing these class templates lazy in the future, but that will also be a bit more involved.

Also, just a side-note: the libc++ implementation of valarray is not very good at all (in fact, I'm not sure any stdlib has a good implementation of valarray). People probably shouldn't be using it. There are lots of things we could do to improve both runtime performance and compile-time performance.

  1. As I said above, this test actually takes very little time right now. It certainly takes less time than any of the execution tests. So I think it's fine to keep it where it is.

Contributor

Anyway, we can fix that down the road, though. Because I think that will be a bigger fix related to how we import modules

One potential risk is if it becomes load-bearing and other parts of the code rely on certain things being available at certain times, then changing the model to be lazy later on is much more complex compared to changing it earlier. More broadly, I think it would be nice if we could get the core model "right" first and build on top of that, especially since people aren't relying on this feature right now. You've been posting a bunch of PRs to fix/work around individual issues, maybe it's worthwhile to step back and reconsider how we're handling things, what we want the core importing behavior to be like, so that things are fast by default and there is less of a need to work around individual issues.

(This is just me thinking out loud; I'm not familiar with the code to be able to judge that risk accurately here, it's quite possible that I'm overthinking it and the performance problems are easily fixable later.)

This is sort of a problem with Clang, the thing is they don't need to instantiate all the templates/types all the way down in the same way we do.

Could you elaborate more on the difference/why we're doing more work than Clang normally does?

As I said above, this test actually takes very little time right now.

I'm a little bit confused. This seems to contradict the comment in the code "The below template is super slow even after bailing at 8 template type params deep." Is it "super slow" relative to something else and not in absolute terms (taking several seconds or minutes)?

Contributor Author
@zoecarver Mar 23, 2021

One potential risk is if it becomes load-bearing and other parts of the code rely on certain things being available at certain times, then changing the model to be lazy later on is much more complex compared to changing it earlier. More broadly, I think it would be nice if we could get the core model "right" first and build on top of that, especially since people aren't relying on this feature right now. You've been posting a bunch of PRs to fix/work around individual issues, maybe it's worthwhile to step back and reconsider how we're handling things, what we want the core importing behavior to be like, so that things are fast by default and there is less of a need to work around individual issues.

(This is just me thinking out loud; I'm not familiar with the code to be able to judge that risk accurately here, it's quite possible that I'm overthinking it and the performance problems are easily fixable later.)

I completely agree. Before C++ interop stops being experimental we need to figure out the modules issue. I guess I didn't really say this explicitly but I think even if we "fix" the module problem, we should still fix this. That way we still won't take hours to import std.valarray, for example.

Your more broad point about taking a step back and thinking about some of the performance issues with the Clang importer is also a good one. I think we should have a discussion on the forum about this. I think making a lot of template stuff lazy would be a good idea. As I describe below, that would make us more closely follow what Clang does. (For example, the last bug I fixed, to properly error for invalid inline static data members, wouldn't have been needed if we lazily evaluated those, as Clang does.)

Could you elaborate more on the difference/why we're doing more work than Clang normally does?

I'll give it a try, but I have to admit I'm not completely confident in this answer and I'd like to investigate further. Essentially, I think the problem is this. Take the following example:

template <class T> struct X { X<X<T>> test(); };
void test(X<int>);

Clang compiles this just fine; we don't. Clang (as I understand it, which may not be entirely correct) won't evaluate the return type until the function test is actually called. So, every time X::test is called, it goes one level deeper. In Swift, the Clang importer actually needs to fully instantiate all function types so that we can generate Swift functions based on them. (Note: we don't need to fully instantiate all function bodies; we now do that lazily.) But, unfortunately, I think we're going to need to continue to fully instantiate function types; otherwise, I'm not sure how we could generate the "Swift version" of the function type.

Because Clang doesn't need to instantiate any unused templates, it doesn't fall into these recursive/exponential traps that we are hitting.

I'm a little bit confused. This seems to contradict the comment in the code "The below template is super slow even after bailing at 8 template type params deep." Is it "super slow" relative to something else and not in absolute terms (taking several seconds or minutes)?

You're right. I put this in a very confusing way. Two things: first, instantiating ~30,000 templates is relatively slow. Second, we're not actually importing those class templates I'm talking about (at least not fully). What I'm trying to say in the comment is "fully importing this would be super slow." Hopefully that clears it up a bit. I will update the comment.

Contributor

But, unfortunately, I think we're going to need to continue to fully instantiate function types. Otherwise, I'm not sure how we could generate the "Swift version" of the function type.

If I were to guess, it would probably work in the same way as it does for other requests. Getting the Swift version of the function type would become a request, and we only force that when needed, instead of doing it eagerly. We may need to requestify other things which utilize this too, so that they don't end up eagerly forcing the type. I understand that might require a more dramatic rework of how the Clang importer is set up, but it would be problematic if we hit a snag later on where we couldn't actually fix it for some reason, and people who start using this feature end up hitting pathological cases immediately with complex libraries.

Contributor Author

That's a super interesting idea. I think I've seen that in other parts of the compiler/type checker, so maybe we could follow a similar model. I'll open a forum post tomorrow and we can discuss more thoroughly.

@zoecarver
Contributor Author

@swift-ci please smoke test.

@zoecarver
Contributor Author

@swift-ci please smoke test macOS.

If there are more than 10,000 instantiations of the same class template
then we need to bail because it will take too long to mangle,
instantiate, etc.
@zoecarver force-pushed the cxx/fix-deep-template-spec branch from fd56d3e to 3cd6c80 on March 23, 2021
@zoecarver
Contributor Author

@swift-ci please smoke test.

@zoecarver
Contributor Author

For what it's worth: this is the last patch we need to be able to import the C++ stdlib without crashing.

@zoecarver
Contributor Author

@varungandhi-apple alright if I land this?

@varungandhi-apple varungandhi-apple merged commit 6202dd6 into swiftlang:main Mar 24, 2021