-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[stdlib] Speed up _Bitset in size-optimized builds #19664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Speed up _Bitset in size-optimized builds #19664
Conversation
@swift-ci please smoke benchmark |
@swift-ci please test |
This comment has been minimized.
This comment has been minimized.
The +90% DictionaryKeysContainsNative regression in the -Osize build seems to hint at an issue. |
It's also super weird, as this PR doesn't touch those paths. |
@swift-ci please smoke benchmark |
This comment has been minimized.
This comment has been minimized.
Wow. Divisions are slow! |
😲 |
@swift-ci please smoke benchmark |
@swift-ci test |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
The last commit switches multiplications to use |
@swift-ci please smoke benchmark |
@swift-ci please test |
@swift-ci please test macOS platform |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
I'm going to rebase this, revert the initial (mostly unsuccessful) commit and re-run the benchmarks before landing. |
…tion/division by powers-of-two The divisions don’t get eliminated in -Osize builds.
However, convert input values to UInts before dividing them.
We know it won’t overflow, because the result will not exceed the capacity, which is guaranteed to fit in an integer.
6cb27a2
to
267b80f
Compare
@swift-ci smoke test |
@swift-ci smoke benchmark |
Build comment file:Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR. Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein). Hardware Overview
|
Interesting. The AnyHashableWithAClass regression is new; however, I don't see how it could be caused by this change. I'll look into it later. |
Due to a possible low-level optimizer issue, a division-by-constant-power-of-two expression gets compiled into an actual division instruction with
-Osize
. Explicitly changing to unsigned arithmetic ensures they get emitted as right shifts.There is a similar issue with multiplications, although that one boils down to unnecessary overflow checking. Switching to
&*
makes sure those get compiled into corresponding left shifts.