Skip to content

SILOptimizer: try inplace_function for compile performance (Test) #42208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

troughton
Copy link
Contributor

@troughton troughton commented Apr 6, 2022

I was profiling some slow compiles and saw that the std::functions in UpdatingListIterator seems to be incurring a lot of new/delete overhead (around 5% of the compile time for a -O -wmo build), so was curious as to how much moving the out-of-line std::function allocations inline would affect performance.

I haven't been able to test performance on code locally due to difficulties in building the SwiftPM part of a toolchain (although the Swift tests pass), and I don't expect this to be merged as-is (since I’m guessing the inplace_function dependency might be an issue). It'd be great if someone could trigger a CI compiler performance test to see if this is worth pursuing; otherwise I can try to fix the toolchain build errors and get some local numbers sometime later this week.

inplace_function.h was taken from https://github.com/WG21-SG14/SG14/blob/master/SG14/inplace_function.h and hackily modified to assert on exceptions.

@troughton troughton force-pushed the inline-the-function-block branch from 47abc69 to bfe32b0 Compare April 8, 2022 11:55
@troughton
Copy link
Contributor Author

troughton commented Apr 8, 2022

I've run some tests locally and seem to be getting roughly a 2-4% speedup in the full build time on the project where I noticed the compile time issues; e.g.

swift build --configuration release 1612.49s user 53.46s system 408% cpu 6:47.81 total without, vs.
swift build --configuration release 1581.71s user 49.29s system 407% cpu 6:40.41 total with the changes.

@eeckstein or @gottesmm (as the last people to modify these files): if you think it's worth running a full compiler performance test on CI then I'd appreciate you starting that; otherwise, given these are fairly modest speedups, I'm also okay with closing this PR.

@eeckstein
Copy link
Contributor

nice work!
Eventually we want to get rid of InstModCallbacks at all. But in the meantime it sounds like a good compile time improvement.
@atrick any thoughts?

@atrick
Copy link
Contributor

atrick commented Apr 11, 2022

Thank you! Although this overhead is not surprising, it's unacceptable--we need a fix.

I really am surprised that std::function doesn't have any inline storage optimization though! llvm::unique_function does. Can you just use that instead?

I don't know if there's any possibility of including a header under a different software license. AFAIK we need to keep everything in the Swift repo under the same license. You'd need to check on the forum.

Just to make sure I understand, is the 5% overhead mostly from std::function allocation within UpdatingInstructionIterator here:

  UpdatingInstructionIterator.h
    callbacks(InstModCallbacks()
                    .onDelete([this](SILInstruction *toDelete) {
                      notifyDelete(toDelete);
                      toDelete->eraseFromParent();
                    })
                    .onCreateNewInst(
                        [this](SILInstruction *newlyCreatedInst) {
                          notifyNew(newlyCreatedInst);
                        }))

For the record, the InstructionDeleter could be fixed to directly call iteratorRegistry.notifyDelete and notifyNew. But that's unsafe within any SIL passes that still rely on InstModCallbacks. To do that safely, we would need to eliminate InstModCallbacks complete, which still requires fixing a handful of SIL passes.

@troughton
Copy link
Contributor Author

troughton commented Apr 12, 2022

I’ve looked into this a little more and now think that the timing improvements from before may have been benchmark noise. The ~5% impact was measured on the 5.6 branch and was mainly in UpdatingInstructionIteratorRegistry::rechainCallbacks; that particular compile time issue seems to have been fixed in a different way by d418192, since I don't see any real std::function overhead in SILOptimizer now when profiling main.

In terms of the timing from before: I ran the build a few more times with everything else closed on my machine, and the results were:

Without these changes:

swift build --configuration release  1081.76s user 34.91s system 579% cpu 3:12.78 total
swift build --configuration release  1070.14s user 34.85s system 570% cpu 3:13.81 total
swift build --configuration release  1071.76s user 33.99s system 572% cpu 3:13.10 total

With these changes:

swift build --configuration release  1065.35s user 34.08s system 565% cpu 3:14.58 total
swift build --configuration release  1066.52s user 33.85s system 568% cpu 3:13.43 total
swift build --configuration release  1067.52s user 35.01s system 568% cpu 3:14.03 total

That seems close enough to be in the margin of error, although I do find it interesting that the user time and CPU usage were consistently slightly lower than without the changes in this PR. Either that's just benchmark noise or is a tiny real improvement caused by some other factors (alignment changes?), but either way I don't feel it's enough to justify this PR.

To quickly address one other point:

I really am surprised that std::function doesn’t have any inline storage optimization though!

My understanding (from https://stackoverflow.com/a/46163732 and its comments) is std::function does have an inline storage optimisation (up to a libstdc++/libc++ determined size), but it was (in 5.6) being exceeded in some places.

Anyway, thank you for having fixed the underlying issue back in #41239 and I hope looking at this didn't take up too much of your time.

@troughton troughton closed this Apr 12, 2022
@eeckstein
Copy link
Contributor

Thanks for doing these experiments!

@atrick
Copy link
Contributor

atrick commented Apr 12, 2022

@eeckstein fixed a severe problem where these closures were allocating unnecessarily sometime in the 5.7 time frame. It should be in that file's git log. I suspect that's what you were running into.

Thanks for checking!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants