Skip to content

[stdlib] Speed up _Bitset in size-optimized builds #19664

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lorentey
Copy link
Member

@lorentey lorentey commented Oct 2, 2018

Due to a possible low-level optimizer issue, a division-by-constant-power-of-two expression gets compiled into an actual division instruction with -Osize. Explicitly changing to unsigned arithmetic ensures they get emitted as right shifts.

There is a similar issue with multiplications, although that one boils down to unnecessary overflow checking. Switching to &* makes sure those get compiled into corresponding left shifts.

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

@swift-ci please smoke benchmark

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

@swift-ci please test

@swift-ci

This comment has been minimized.

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

The +90% DictionaryKeysContainsNative regression in the -Osize build seems to hint at an issue.

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

It's also super weird, as this PR doesn't touch those paths.

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

@swift-ci please smoke benchmark

@swift-ci

This comment has been minimized.

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

Wow. Divisions are slow!

@airspeedswift
Copy link
Member

😲

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

@swift-ci please smoke benchmark

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

@swift-ci test

@swift-ci

This comment has been minimized.

@swift-ci

This comment has been minimized.

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

The last commit switches multiplications to use &*, so that they too get compiled down to bit shifts.

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

@swift-ci please smoke benchmark

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

@swift-ci please test

@lorentey
Copy link
Member Author

lorentey commented Oct 2, 2018

@swift-ci please test macOS platform

@swift-ci

This comment has been minimized.

@lorentey lorentey requested a review from stephentyrone October 2, 2018 19:55
@swift-ci

This comment has been minimized.

@swift-ci

This comment has been minimized.

@lorentey
Copy link
Member Author

lorentey commented Oct 3, 2018

I'm going to rebase this, revert the initial (mostly unsuccessful) commit and re-run the benchmarks before landing.

…tion/division by powers-of-two

The divisions don’t get eliminated in -Osize builds.
However, convert input values to UInts before dividing them.
We know it won’t overflow, because the result will not exceed the capacity, which is guaranteed to fit in an integer.
@lorentey lorentey force-pushed the dictionary-subscript-setter-is-slow branch from 6cb27a2 to 267b80f Compare October 3, 2018 11:02
@lorentey
Copy link
Member Author

lorentey commented Oct 3, 2018

@swift-ci smoke test

@lorentey
Copy link
Member Author

lorentey commented Oct 3, 2018

@swift-ci smoke benchmark

@lorentey lorentey changed the title [stdlib] Partially force-inline Dictionary’s keyed subscript setter [stdlib] Speed up _Bitset in size-optimized builds Oct 3, 2018
@swift-ci
Copy link
Contributor

swift-ci commented Oct 3, 2018

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
AnyHashableWithAClass 88989 115574 +29.9% 0.77x
Improvement
SetSymmetricDifferenceInt0 426 386 -9.4% 1.10x
SetExclusiveOr 4282 3887 -9.2% 1.10x
CStringLongAscii 3550 3296 -7.2% 1.08x

Code size: -O

TEST OLD NEW DELTA RATIO
Improvement
Histogram.o 5004 4604 -8.0% 1.09x
HashQuadratic.o 6776 6328 -6.6% 1.07x
DictionaryCompactMapValues.o 26194 24546 -6.3% 1.07x
DictionaryCopy.o 10662 9992 -6.3% 1.07x
DictionaryRemove.o 18110 17006 -6.1% 1.06x
TwoSum.o 6664 6264 -6.0% 1.06x
DictionarySubscriptDefault.o 32891 31163 -5.3% 1.06x
DictionarySwap.o 31502 29854 -5.2% 1.06x
DictionaryKeysContains.o 19875 18963 -4.6% 1.05x
DictTest2.o 21296 20320 -4.6% 1.05x
DictTest3.o 29840 28528 -4.4% 1.05x
DictTest4Legacy.o 27338 26218 -4.1% 1.04x
RGBHistogram.o 27677 26605 -3.9% 1.04x
DictTest4.o 25884 24908 -3.8% 1.04x
ReversedCollections.o 11755 11371 -3.3% 1.03x
DictionaryGroup.o 18279 17751 -2.9% 1.03x
WordCount.o 67071 65279 -2.7% 1.03x
SetTests.o 61321 59697 -2.6% 1.03x
StringEdits.o 14925 14541 -2.6% 1.03x
CharacterProperties.o 19249 18833 -2.2% 1.02x
DictionaryLiteral.o 1420 1392 -2.0% 1.02x
StringRemoveDupes.o 16229 15909 -2.0% 1.02x
DictTest.o 52863 51887 -1.8% 1.02x
CountAlgo.o 21540 21172 -1.7% 1.02x
DictOfArraysToArrayOfDicts.o 33028 32500 -1.6% 1.02x
ReduceInto.o 24783 24399 -1.5% 1.02x
TestsUtils.o 24325 23957 -1.5% 1.02x
Hash.o 30051 29683 -1.2% 1.01x
ObjectiveCBridging.o 46142 45614 -1.1% 1.01x
DriverUtils.o 169441 167649 -1.1% 1.01x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
AnyHashableWithAClass 109426 139001 +27.0% 0.79x
Improvement
SetSymmetricDifferenceInt0 875 390 -55.4% 2.24x
SetExclusiveOr 8752 3927 -55.1% 2.23x
SetIsSubsetInt25 163 77 -52.8% 2.12x
SetUnionInt0 665 317 -52.3% 2.10x
SetUnion 6659 3182 -52.2% 2.09x
SetIntersectionInt0 127 62 -51.2% 2.05x
SetIntersect 1274 624 -51.0% 2.04x
SetSymmetricDifferenceInt25 469 238 -49.3% 1.97x
SetSubtractingInt0 141 72 -48.9% 1.96x
SetIsSubsetInt50 303 155 -48.8% 1.95x
SetIntersectionInt100 767 393 -48.8% 1.95x
SetIntersectionInt50 443 230 -48.1% 1.93x
SetUnionInt25 270 141 -47.8% 1.91x
SetIntersectionInt25 287 151 -47.4% 1.90x
TwoSum 2343 1244 -46.9% 1.88x
SetSymmetricDifferenceInt100 454 249 -45.2% 1.82x
SetUnionInt50 215 118 -45.1% 1.82x
SetSymmetricDifferenceInt50 456 253 -44.5% 1.80x
DictionarySwap 1956 1109 -43.3% 1.76x
DictionaryRemove 7544 4307 -42.9% 1.75x
SetUnionInt100 124 71 -42.7% 1.75x
DictionaryCopy 131805 76303 -42.1% 1.73x
SetSubtractingInt25 201 118 -41.3% 1.70x
SetIsSubsetInt100 523 308 -41.1% 1.70x
SetSubtractingInt100 337 199 -40.9% 1.69x
RGBHistogram 4044 2408 -40.5% 1.68x
SetSubtractingInt50 261 158 -39.5% 1.65x
SetSymmetricDifferenceBox0 1219 752 -38.3% 1.62x
SetExclusiveOr_OfObjects 12182 7519 -38.3% 1.62x
SetUnionBox0 939 596 -36.5% 1.58x
SetUnion_OfObjects 9374 5975 -36.3% 1.57x
SetIsSubsetInt0 400 256 -36.0% 1.56x
Histogram 962 618 -35.8% 1.56x
DictionarySwapAt 1700 1099 -35.4% 1.55x
DictionaryFilter 108942 71291 -34.6% 1.53x
DictionaryCompactMapValuesOfNilValue 8921 5867 -34.2% 1.52x
SetIntersectionBox0 220 145 -34.1% 1.52x
SetIntersectionBox25 424 283 -33.3% 1.50x
SetSubtractingBox0 230 155 -32.6% 1.48x
SetIsSubsetBox25 271 184 -32.1% 1.47x
Dictionary 597 409 -31.5% 1.46x
DictionarySubscriptDefaultMutation 510 355 -30.4% 1.44x
DictionaryGroup 455 324 -28.8% 1.40x
Dictionary2 867 641 -26.1% 1.35x
SetSymmetricDifferenceBox25 767 568 -25.9% 1.35x
SetUnionBox25 502 376 -25.1% 1.34x
Dictionary3 231 179 -22.5% 1.29x
SetIsSubsetBox0 471 366 -22.3% 1.29x
WordCountUniqueASCII 1985 1555 -21.7% 1.28x
SetSubtractingBox25 371 291 -21.6% 1.27x
DictionaryCompactMapValuesOfCastValue 16246 12765 -21.4% 1.27x
CharacterPropertiesPrecomputed 1924 1514 -21.3% 1.27x
Dictionary4 341 282 -17.3% 1.21x
FrequenciesUsingReduceInto 1442 1194 -17.2% 1.21x
DictionarySubscriptDefaultMutationArray 895 744 -16.9% 1.20x
Prims 1030 872 -15.3% 1.18x
PrimsSplit 1025 872 -14.9% 1.18x
Dictionary4OfObjects 425 363 -14.6% 1.17x
DictionaryRemoveOfObjects 23867 20796 -12.9% 1.15x
StringRemoveDupes 489 433 -11.5% 1.13x
CharacterPropertiesStashedMemo 2288 2057 -10.1% 1.11x
Dictionary4Legacy 715 658 -8.0% 1.09x
DictionaryLiteral 3050 2810 -7.9% 1.09x
CStringLongAscii 3549 3305 -6.9% 1.07x

Code size: -Osize

TEST OLD NEW DELTA RATIO
Improvement
HashQuadratic.o 6184 5832 -5.7% 1.06x
Histogram.o 4672 4480 -4.1% 1.04x
TwoSum.o 6125 5949 -2.9% 1.03x
DictionaryCompactMapValues.o 23578 22970 -2.6% 1.03x
DictionaryCopy.o 9345 9105 -2.6% 1.03x
DictTest2.o 16882 16498 -2.3% 1.02x
DictionaryRemove.o 15163 14827 -2.2% 1.02x
DictTest3.o 23618 23106 -2.2% 1.02x
DictionarySubscriptDefault.o 27939 27347 -2.1% 1.02x
DictionarySwap.o 27331 26771 -2.0% 1.02x
StringRemoveDupes.o 9213 9037 -1.9% 1.02x
DictTest4.o 21307 20939 -1.7% 1.02x
DictionaryKeysContains.o 17147 16859 -1.7% 1.02x
DictTest4Legacy.o 23017 22633 -1.7% 1.02x
StringEdits.o 14382 14174 -1.4% 1.01x
WordCount.o 55520 54720 -1.4% 1.01x
CountAlgo.o 14624 14432 -1.3% 1.01x
RGBHistogram.o 24253 23949 -1.3% 1.01x
ReduceInto.o 16938 16746 -1.1% 1.01x
ReversedCollections.o 11925 11797 -1.1% 1.01x
DictionaryGroup.o 16503 16327 -1.1% 1.01x
DictionaryLiteral.o 1541 1525 -1.0% 1.01x
SetTests.o 53377 52825 -1.0% 1.01x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
AnyHashableWithAClass 106832 152855 +43.1% 0.70x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@lorentey
Copy link
Member Author

lorentey commented Oct 3, 2018

Interesting. The AnyHashableWithAClass regression is new; however, I don't see how it could be caused by this change. I'll look into it later.

@lorentey lorentey merged commit da463e0 into swiftlang:master Oct 3, 2018
@lorentey lorentey deleted the dictionary-subscript-setter-is-slow branch October 3, 2018 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants