Skip to content

[stdlib] Refactor Range Overlaps for Performance #23293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

PatrickPijnappel
Copy link
Contributor

@PatrickPijnappel PatrickPijnappel commented Mar 14, 2019

Refactors the overlaps(_:) method for Range & ClosedRange to use fewer comparisons, and adds a benchmark.

Speed-ups for Int ranges based on local running of the benchmark:

  • Range x Range: 22%
  • Range x ClosedRange: 14%
  • ClosedRange x ClosedRange: 10%

Speed-ups for String ranges based on a different local test:

  • Range x Range: 22%
  • Range x ClosedRange: 48%
  • ClosedRange x ClosedRange: 44%

Benchmark available separately in #23294.

@PatrickPijnappel
Copy link
Contributor Author

@swift-ci Please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 6ac770b8f65b0eefb795916aa6f4f1d915f15a8a

@PatrickPijnappel
Copy link
Contributor Author

@swift-ci Please test OS X platform

@PatrickPijnappel PatrickPijnappel force-pushed the range-overlaps-performance-impl branch 2 times, most recently from 25f469e to 3fa7e82 Compare March 18, 2019 05:19
@airspeedswift
Copy link
Member

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
ObjectiveCBridgeStubFromNSDateRef 4060 4630 +14.0% 0.88x (?)
ObjectiveCBridgeStubToNSDate2 1380 1530 +10.9% 0.90x (?)
FlattenListLoop 3976 4334 +9.0% 0.92x
Array2D 6928 7520 +8.5% 0.92x (?)
MapReduceAnyCollection 369 398 +7.9% 0.93x (?)
FlattenListFlatMap 6076 6540 +7.6% 0.93x (?)
MapReduce 369 397 +7.6% 0.93x
Improvement
ClosedRangeOverlapsClosedRange 126 90 -28.6% 1.40x
RangeOverlapsClosedRange 127 109 -14.2% 1.17x
LessSubstringSubstring 44 39 -11.4% 1.13x
RangeOverlapsRange 154 138 -10.4% 1.12x
EqualStringSubstring 43 39 -9.3% 1.10x
EqualSubstringString 43 39 -9.3% 1.10x
LessSubstringSubstringGenericComparable 43 39 -9.3% 1.10x
StringHasSuffixAscii 2400 2220 -7.5% 1.08x (?)
EqualSubstringSubstring 42 39 -7.1% 1.08x (?)
EqualSubstringSubstringGenericEquatable 42 39 -7.1% 1.08x
SortStringsUnicode 3540 3305 -6.6% 1.07x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
FlattenListLoop 4069 4431 +8.9% 0.92x (?)
Array2D 6912 7520 +8.8% 0.92x
Improvement
ClosedRangeOverlapsClosedRange 126 89 -29.4% 1.42x
RangeOverlapsClosedRange 136 108 -20.6% 1.26x
LessSubstringSubstringGenericComparable 44 39 -11.4% 1.13x
EqualSubstringSubstring 43 39 -9.3% 1.10x (?)
EqualStringSubstring 43 39 -9.3% 1.10x (?)
LessSubstringSubstring 44 40 -9.1% 1.10x
EqualSubstringSubstringGenericEquatable 44 40 -9.1% 1.10x
EqualSubstringString 44 40 -9.1% 1.10x (?)
RangeOverlapsRange 150 139 -7.3% 1.08x
StringComparison_fastPrenormal 1050 980 -6.7% 1.07x (?)

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
StringToDataMedium 13400 14900 +11.2% 0.90x (?)
StrComplexWalk 8330 9110 +9.4% 0.91x (?)
StringToDataLargeUnicode 13950 15100 +8.2% 0.92x (?)
Improvement
ArrayOfPOD 856 775 -9.5% 1.10x (?)
DictionaryKeysContainsCocoa 125 114 -8.8% 1.10x (?)
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@PatrickPijnappel
Copy link
Contributor Author

@swift-ci Please benchmark

@PatrickPijnappel PatrickPijnappel force-pushed the range-overlaps-performance-impl branch from 3fa7e82 to 1ce3792 Compare March 18, 2019 22:12
@swift-ci
Copy link
Contributor

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
FlattenListLoop 3976 4335 +9.0% 0.92x
StringUTF16Builder 350 380 +8.6% 0.92x (?)
MapReduceAnyCollection 369 398 +7.9% 0.93x
FlattenListFlatMap 6066 6534 +7.7% 0.93x (?)
MapReduce 369 397 +7.6% 0.93x (?)
Improvement
ClosedRangeOverlapsClosedRange 126 90 -28.6% 1.40x
RangeOverlapsClosedRange 127 109 -14.2% 1.17x
LessSubstringSubstring 44 39 -11.4% 1.13x
RangeOverlapsRange 154 137 -11.0% 1.12x
EqualStringSubstring 43 39 -9.3% 1.10x (?)
EqualSubstringString 43 39 -9.3% 1.10x (?)
LessSubstringSubstringGenericComparable 43 39 -9.3% 1.10x
EqualSubstringSubstring 42 39 -7.1% 1.08x (?)
EqualSubstringSubstringGenericEquatable 42 39 -7.1% 1.08x (?)
SortStringsUnicode 3540 3300 -6.8% 1.07x (?)

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
FlattenListLoop 4069 4431 +8.9% 0.92x (?)
Array2D 6912 7520 +8.8% 0.92x
Improvement
ClosedRangeOverlapsClosedRange 126 89 -29.4% 1.42x
RangeOverlapsClosedRange 136 108 -20.6% 1.26x
LessSubstringSubstringGenericComparable 44 39 -11.4% 1.13x (?)
EqualSubstringSubstring 43 39 -9.3% 1.10x
EqualStringSubstring 43 39 -9.3% 1.10x (?)
LessSubstringSubstring 44 40 -9.1% 1.10x
EqualSubstringSubstringGenericEquatable 44 40 -9.1% 1.10x
EqualSubstringString 44 40 -9.1% 1.10x
RangeOverlapsRange 150 138 -8.0% 1.09x
StringComparison_fastPrenormal 1050 980 -6.7% 1.07x (?)

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
StrComplexWalk 8340 9070 +8.8% 0.92x (?)
Improvement
ArrayOfPOD 855 775 -9.4% 1.10x (?)
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@PatrickPijnappel
Copy link
Contributor Author

Just checked these out, didn't have time to investigate before. I think these regression must be noise, even though they are consistent – they don't use overlaps and overlaps is not used within the standard library itself so they can't use it indirectly either.

The benchmarks that actually use overlaps are consistently better.

@airspeedswift Thoughts?

@PatrickPijnappel
Copy link
Contributor Author

@swift-ci Please benchmark

@swift-ci
Copy link
Contributor

swift-ci commented May 3, 2019

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
DistinctClassFieldAccesses 179 204 +14.0% 0.88x
DataCreateEmptyArray 1650 1800 +9.1% 0.92x (?)
Improvement
ClosedRangeOverlapsClosedRange 56 41 -26.8% 1.37x
EqualStringSubstring 28 21 -25.0% 1.33x
RangeOverlapsClosedRange 64 49 -23.4% 1.31x (?)
EqualSubstringSubstring 27 21 -22.2% 1.29x
LessSubstringSubstring 27 21 -22.2% 1.29x
EqualSubstringSubstringGenericEquatable 27 21 -22.2% 1.29x
EqualSubstringString 27 21 -22.2% 1.29x
LessSubstringSubstringGenericComparable 27 21 -22.2% 1.29x
RangeOverlapsRange 74 65 -12.2% 1.14x (?)
StringComparison_longSharedPrefix 355 318 -10.4% 1.12x (?)

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
DistinctClassFieldAccesses 180 203 +12.8% 0.89x (?)
ArrayInClass 845 935 +10.7% 0.90x (?)
ArrayLiteral2 113 124 +9.7% 0.91x (?)
Breadcrumbs.UTF16ToIdxRange.longMixed 76 82 +7.9% 0.93x (?)
Improvement
LessSubstringSubstring 28 21 -25.0% 1.33x
LessSubstringSubstringGenericComparable 28 21 -25.0% 1.33x
RangeOverlapsClosedRange 68 52 -23.5% 1.31x
EqualSubstringSubstring 27 21 -22.2% 1.29x
EqualStringSubstring 27 21 -22.2% 1.29x
EqualSubstringSubstringGenericEquatable 27 21 -22.2% 1.29x
EqualSubstringString 27 21 -22.2% 1.29x (?)
ClosedRangeOverlapsClosedRange 55 48 -12.7% 1.15x
StringComparison_longSharedPrefix 356 320 -10.1% 1.11x (?)
DataToStringEmpty 500 450 -10.0% 1.11x (?)
RangeOverlapsRange 72 65 -9.7% 1.11x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
ExclusivityGlobal 104 113 +8.7% 0.92x (?)
UTF8Decode_InitFromBytes 127 137 +7.9% 0.93x (?)
Improvement
EqualSubstringSubstringGenericEquatable 30 24 -20.0% 1.25x (?)
EqualSubstringSubstring 31 25 -19.4% 1.24x (?)
LessSubstringSubstring 31 25 -19.4% 1.24x (?)
EqualSubstringString 32 26 -18.7% 1.23x (?)
LessSubstringSubstringGenericComparable 29 24 -17.2% 1.21x (?)
EqualStringSubstring 31 26 -16.1% 1.19x (?)
ArrayOfPOD 542 469 -13.5% 1.16x (?)
ArrayOfGenericPOD2 617 575 -6.8% 1.07x (?)
StringToDataSmall 750 700 -6.7% 1.07x (?)
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@PatrickPijnappel
Copy link
Contributor Author

@swift-ci Please smoke test

@PatrickPijnappel PatrickPijnappel force-pushed the range-overlaps-performance-impl branch from 1ce3792 to 4e1afce Compare May 3, 2019 08:47
@PatrickPijnappel
Copy link
Contributor Author

@swift-ci Please smoke test

1 similar comment
@PatrickPijnappel
Copy link
Contributor Author

@swift-ci Please smoke test

@PatrickPijnappel PatrickPijnappel force-pushed the range-overlaps-performance-impl branch from 4e1afce to 87f02b1 Compare May 3, 2019 23:15
@PatrickPijnappel
Copy link
Contributor Author

@swift-ci Please smoke test

@PatrickPijnappel
Copy link
Contributor Author

@airspeedswift Should be good to merge!

@PatrickPijnappel PatrickPijnappel merged commit 70be266 into swiftlang:master May 13, 2019
@PatrickPijnappel PatrickPijnappel deleted the range-overlaps-performance-impl branch May 13, 2019 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants