[String] Custom Iterators for String Views #20438

Merged: 4 commits, Nov 12, 2018

milseman (Member) commented Nov 8, 2018

Provide a custom iterator rather than relying on the IndexingIterator,
as an indexing model is less efficient for stateful processing of
strings. Provides around a 30% speedup.
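To illustrate the difference, here is a minimal sketch. It assumes a hypothetical `ScalarView` collection backed by a plain `[UInt8]` array of valid UTF-8; this is not the standard library's internal representation. Driven by the default `IndexingIterator`, each step calls the subscript to decode a scalar and then `index(after:)`, which inspects the same lead byte again to find where the next scalar starts. The custom `Iterator` keeps its byte offset as state and decodes each scalar once, advancing by the length it just decoded.

```swift
// Hypothetical UTF-8-backed view of Unicode scalars; assumes `bytes` is valid
// UTF-8. An illustration only, not the standard library's implementation.
struct ScalarView: Collection {
    let bytes: [UInt8]

    var startIndex: Int { return 0 }
    var endIndex: Int { return bytes.count }

    // Number of bytes in the scalar whose lead byte is at offset `i`.
    func scalarLength(at i: Int) -> Int {
        let lead = bytes[i]
        if lead < 0x80 { return 1 }
        if lead < 0xE0 { return 2 }
        if lead < 0xF0 { return 3 }
        return 4
    }

    // Decodes the scalar at offset `i` and also returns its length in bytes.
    func decode(at i: Int) -> (scalar: Unicode.Scalar, length: Int) {
        let length = scalarLength(at: i)
        // Keep only the payload bits of the lead byte.
        let leadMask: [UInt8] = [0x7F, 0x1F, 0x0F, 0x07]
        var value = UInt32(bytes[i] & leadMask[length - 1])
        // Each continuation byte contributes six more bits.
        for k in 1..<length {
            value = (value << 6) | UInt32(bytes[i + k] & 0x3F)
        }
        return (Unicode.Scalar(value)!, length)
    }

    // Indexing model: the default IndexingIterator calls the subscript and
    // then index(after:) for every element, inspecting the lead byte twice.
    subscript(i: Int) -> Unicode.Scalar { return decode(at: i).scalar }
    func index(after i: Int) -> Int { return i + scalarLength(at: i) }

    // Custom iterator: carries its position as state and decodes each
    // scalar once, advancing by the length it just decoded.
    struct Iterator: IteratorProtocol {
        let view: ScalarView
        var position: Int

        mutating func next() -> Unicode.Scalar? {
            guard position < view.endIndex else { return nil }
            let (scalar, length) = view.decode(at: position)
            position += length
            return scalar
        }
    }

    func makeIterator() -> Iterator {
        return Iterator(view: self, position: startIndex)
    }
}
```

With this in place, `for`-`in` loops over the view go through `makeIterator()`, while index-based algorithms still use the subscript and `index(after:)`.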

milseman (Member Author) commented Nov 8, 2018

@swift-ci please benchmark

milseman (Member Author) commented Nov 8, 2018

@swift-ci please smoke test

milseman (Member Author) commented Nov 8, 2018

@swift-ci please benchmark

milseman (Member Author) commented Nov 8, 2018

@swift-ci please smoke test

milseman (Member Author) commented Nov 8, 2018

@swift-ci please benchmark

swift-ci (Contributor) commented Nov 8, 2018

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Improvement
StringWalk 3805 1397 -63.3% 2.72x
StrComplexWalk 5174 2314 -55.3% 2.24x
CharIteration_utf16_unicodeScalars 6560 3991 -39.2% 1.64x
CharIteration_tweet_unicodeScalars 7485 5045 -32.6% 1.48x
CharIteration_ascii_unicodeScalars 3785 2583 -31.8% 1.47x
CharIteration_korean_unicodeScalars 5174 3734 -27.8% 1.39x
CharIteration_russian_unicodeScalars 4499 3262 -27.5% 1.38x
CharIteration_punctuated_unicodeScalars 928 694 -25.2% 1.34x
CharIteration_chinese_unicodeScalars 3997 2990 -25.2% 1.34x
CharIteration_japanese_unicodeScalars 6790 5119 -24.6% 1.33x
CharIteration_punctuatedJapanese_unicodeScalars 1044 800 -23.4% 1.30x
StringEqualPointerComparison 657 600 -8.7% 1.09x
CharIteration_ascii_unicodeScalars_Backwards 5085 4714 -7.3% 1.08x
CharIteration_tweet_unicodeScalars_Backwards 10095 9369 -7.2% 1.08x
CharIndexing_tweet_unicodeScalars_Backwards 10237 9505 -7.2% 1.08x
CharIndexing_ascii_unicodeScalars_Backwards 5218 4861 -6.8% 1.07x

Code size: -O

TEST OLD NEW DELTA RATIO
Improvement
StrComplexWalk.o 3361 2753 -18.1% 1.22x
CharacterProperties.o 26301 22021 -16.3% 1.19x
StringWalk.o 42919 42007 -2.1% 1.02x
Hash.o 29570 29058 -1.7% 1.02x
CSVParsing.o 32721 32273 -1.4% 1.01x
WordCount.o 45460 44996 -1.0% 1.01x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
StrComplexWalk 5900 6422 +8.8% 0.92x
Improvement
StringWalk 8131 3462 -57.4% 2.35x
CharIteration_ascii_unicodeScalars 4661 2535 -45.6% 1.84x
CharIteration_tweet_unicodeScalars 8945 4949 -44.7% 1.81x
CharIteration_utf16_unicodeScalars 6742 3968 -41.1% 1.70x
CharIteration_punctuatedJapanese_unicodeScalars 1168 697 -40.3% 1.68x
CharIteration_russian_unicodeScalars 5270 3211 -39.1% 1.64x
CharIteration_japanese_unicodeScalars 7688 4854 -36.9% 1.58x
CharIteration_korean_unicodeScalars 5977 3794 -36.5% 1.58x
CharIteration_chinese_unicodeScalars 4562 2920 -36.0% 1.56x
CharIteration_punctuated_unicodeScalars 1097 714 -34.9% 1.54x
CSVParsingAlt2 2293 1713 -25.3% 1.34x
HashTest 2419 1948 -19.5% 1.24x
StringEdits 98907 88202 -10.8% 1.12x
CharIteration_ascii_unicodeScalars_Backwards 5424 4905 -9.6% 1.11x
StringEqualPointerComparison 628 571 -9.1% 1.10x
CharIteration_russian_unicodeScalars_Backwards 5824 5371 -7.8% 1.08x
Combos 619 575 -7.1% 1.08x

Code size: -Osize

TEST OLD NEW DELTA RATIO
Improvement
StrComplexWalk.o 3282 2718 -17.2% 1.21x
StringWalk.o 37400 36424 -2.6% 1.03x
Hash.o 22447 21911 -2.4% 1.02x
CSVParsing.o 33089 32641 -1.4% 1.01x
WordCount.o 41708 41292 -1.0% 1.01x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
DictionaryOfAnyHashableStrings_insert 7358 9596 +30.4% 0.77x
ArrayOfPOD 781 859 +10.0% 0.91x (?)
Improvement
StrComplexWalk 46794 8755 -81.3% 5.34x
StringWalk 46496 14491 -68.8% 3.21x
CharIteration_utf16_unicodeScalars 202303 116505 -42.4% 1.74x
CharIteration_japanese_unicodeScalars 288605 170986 -40.8% 1.69x
CharIteration_punctuatedJapanese_unicodeScalars 41411 25145 -39.3% 1.65x
CharIteration_tweet_unicodeScalars 454547 277199 -39.0% 1.64x
CharIteration_chinese_unicodeScalars 176922 108191 -38.8% 1.64x
CharIteration_ascii_unicodeScalars 230115 140877 -38.8% 1.63x
CharIteration_korean_unicodeScalars 227345 139197 -38.8% 1.63x
CharIteration_punctuated_unicodeScalars 48939 31775 -35.1% 1.54x
CharIteration_russian_unicodeScalars 181574 118288 -34.9% 1.54x
CSVParsingAlt2 4014 2669 -33.5% 1.50x
ArrayAppendLazyMap 225316 162249 -28.0% 1.39x
DropLastArrayLazy 12629 10224 -19.0% 1.24x
PrefixArrayLazy 37399 30601 -18.2% 1.22x
DropFirstArrayLazy 37341 30568 -18.1% 1.22x
SuffixArrayLazy 12449 10249 -17.7% 1.21x
FatCompactMap 337483 279160 -17.3% 1.21x
HashTest 11930 10151 -14.9% 1.18x
CharIndexing_chinese_unicodeScalars 269307 246717 -8.4% 1.09x
CharIndexing_japanese_unicodeScalars 426563 391057 -8.3% 1.09x
CharIndexing_punctuated_unicodeScalars 78224 71969 -8.0% 1.09x
CharIndexing_russian_unicodeScalars 294021 271007 -7.8% 1.08x
CharIndexing_korean_unicodeScalars 343917 317381 -7.7% 1.08x
CharIndexing_punctuatedJapanese_unicodeScalars 61719 56989 -7.7% 1.08x
How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
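For reference, the DELTA and RATIO columns can be recomputed from OLD and NEW. This small sketch assumes DELTA = (NEW - OLD) / OLD and RATIO = OLD / NEW, which matches the rows checked here (StringWalk at -O and StrComplexWalk at -Osize). The helper names are hypothetical and not part of the benchmark scripts, and the sketch does not reproduce the bot's rule for which rows to show.

```swift
import Foundation

// Hypothetical helpers (not from the benchmark scripts) that recompute the
// DELTA and RATIO columns from the OLD and NEW columns.
func delta(old: Double, new: Double) -> Double {
    // Percentage change; negative means NEW is smaller (faster or smaller code).
    return (new - old) / old * 100
}

func ratio(old: Double, new: Double) -> Double {
    // Values above 1.0 are improvements, values below 1.0 are regressions.
    return old / new
}

// StringWalk, Performance: -O (prints "-63.3% 2.72x")
print(String(format: "%+.1f%% %.2fx", delta(old: 3805, new: 1397), ratio(old: 3805, new: 1397)))
// StrComplexWalk, Performance: -Osize (prints "+8.8% 0.92x")
print(String(format: "%+.1f%% %.2fx", delta(old: 5900, new: 6422), ratio(old: 5900, new: 6422)))
```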

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

milseman changed the title from "[String] Custom iterator for UnicodeScalarView" to "[String] Custom iterator String Views" on Nov 9, 2018
milseman changed the title from "[String] Custom iterator String Views" to "[String] Custom Iterators for String Views" on Nov 9, 2018
Gives us modest wins on complex grapheme strings, but up to 40% on
ASCII-heavy strings.
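As a rough illustration of why a stateful Character iterator can help most on ASCII-heavy text, here is a sketch of a hypothetical `FastCharacterIterator`; it is not the code from this PR. It relies on a property of the Unicode grapheme-breaking rules (UAX #29): there is always a Character boundary between two adjacent ASCII scalars, except for CR followed by LF. When that holds, the iterator emits the byte directly; otherwise it falls back to asking `String` for the next Character.

```swift
// Hypothetical Character iterator with an ASCII fast path; a sketch, not
// the standard library's implementation.
struct FastCharacterIterator: IteratorProtocol {
    private let string: String
    private let bytes: [UInt8]
    private var offset = 0  // UTF-8 offset of the next Character

    init(_ string: String) {
        self.string = string
        self.bytes = Array(string.utf8)
    }

    mutating func next() -> Character? {
        guard offset < bytes.count else { return nil }
        let byte = bytes[offset]
        let following: UInt8? = offset + 1 < bytes.count ? bytes[offset + 1] : nil

        // Fast path: an ASCII byte followed by end-of-text or another ASCII
        // byte is a complete Character, except for CR followed by LF.
        if byte < 0x80 {
            let isCRLF = byte == 0x0D && following == 0x0A
            let nextIsASCIIOrEnd = following.map { $0 < 0x80 } ?? true
            if nextIsASCIIOrEnd && !isCRLF {
                offset += 1
                return Character(Unicode.Scalar(byte))
            }
        }

        // Slow path: `offset` is a Character boundary, so let String find
        // where the next grapheme cluster ends.
        let start = string.utf8.index(string.utf8.startIndex, offsetBy: offset)
        let character = string[start...].first!
        offset += String(character).utf8.count
        return character
    }
}
```

Usage: `Array(IteratorSequence(FastCharacterIterator(s)))` should produce the same Characters as `Array(s)`; the fast path only skips work when the next Character is a lone ASCII byte.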
milseman (Member Author) commented Nov 9, 2018

@swift-ci please test

milseman (Member Author) commented Nov 9, 2018

@swift-ci please benchmark

swift-ci (Contributor) commented Nov 9, 2018

Build failed
Swift Test OS X Platform
Git Sha - 2c48ac2646ddb46512581c989287a9af4b2f874d

swift-ci (Contributor) commented Nov 9, 2018

Build failed
Swift Test Linux Platform
Git Sha - 2c48ac2646ddb46512581c989287a9af4b2f874d

milseman (Member Author) commented Nov 9, 2018

@swift-ci please smoke test linux platform

swift-ci (Contributor) commented Nov 9, 2018

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
NopDeinit 51933 67722 +30.4% 0.77x
Improvement
StringWalk 3805 1397 -63.3% 2.72x
StrComplexWalk 5211 2361 -54.7% 2.21x
StringHasPrefixUnicode 181333 108627 -40.1% 1.67x
CharIteration_utf16_unicodeScalars 6451 3976 -38.4% 1.62x
CharIteration_tweet_unicodeScalars 7485 5044 -32.6% 1.48x
CharIteration_ascii_unicodeScalars 3791 2583 -31.9% 1.47x
CountAlgoString 3381 2317 -31.5% 1.46x
CharIteration_korean_unicodeScalars 5193 3729 -28.2% 1.39x
CharIteration_punctuatedJapanese_unicodeScalars 1047 771 -26.4% 1.36x
CharIteration_punctuated_unicodeScalars 917 688 -25.0% 1.33x
CharIteration_chinese_unicodeScalars 3981 2987 -25.0% 1.33x
CharIteration_japanese_unicodeScalars 6787 5118 -24.6% 1.33x
CharacterPropertiesPrecomputed 1146 871 -24.0% 1.32x
CharIteration_russian_unicodeScalars 4499 3498 -22.2% 1.29x
RemoveWhereFilterString 506 409 -19.2% 1.24x
RomanNumbers 94392 83222 -11.8% 1.13x

Code size: -O

TEST OLD NEW DELTA RATIO
Improvement
CharacterProperties.o 26301 19045 -27.6% 1.38x
Combos.o 9901 7361 -25.7% 1.35x
StrComplexWalk.o 3361 2753 -18.1% 1.22x
RomanNumbers.o 5935 5199 -12.4% 1.14x
CountAlgo.o 14884 13391 -10.0% 1.11x
StringEdits.o 14327 12951 -9.6% 1.11x
StringRemoveDupes.o 8255 7600 -7.9% 1.09x
ReduceInto.o 19258 17897 -7.1% 1.08x
StringWalk.o 42919 40938 -4.6% 1.05x
RemoveWhere.o 27351 26407 -3.5% 1.04x
WordCount.o 45460 44276 -2.6% 1.03x
Hash.o 29570 29058 -1.7% 1.02x
DriverUtils.o 146703 144463 -1.5% 1.02x
CSVParsing.o 32721 32257 -1.4% 1.01x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
StrComplexWalk 5900 6877 +16.6% 0.86x
StringEqualPointerComparison 571 628 +10.0% 0.91x
CStringLongAscii 466 509 +9.2% 0.92x
Improvement
StringWalk 7945 3648 -54.1% 2.18x
CharIteration_tweet_unicodeScalars 8945 4997 -44.1% 1.79x
CharIteration_ascii_unicodeScalars 4525 2555 -43.5% 1.77x
CharIteration_utf16_unicodeScalars 6767 3978 -41.2% 1.70x
CharIteration_punctuatedJapanese_unicodeScalars 1168 697 -40.3% 1.68x
StringHasPrefixUnicode 181456 108690 -40.1% 1.67x
CharIteration_russian_unicodeScalars 5277 3211 -39.2% 1.64x
CharIteration_korean_unicodeScalars 5977 3733 -37.5% 1.60x
CharIteration_japanese_unicodeScalars 7685 4854 -36.8% 1.58x
CharIteration_chinese_unicodeScalars 4562 2920 -36.0% 1.56x
CharIteration_punctuated_unicodeScalars 1097 714 -34.9% 1.54x
CountAlgoString 3259 2268 -30.4% 1.44x
RemoveWhereFilterString 508 402 -20.9% 1.26x
HashTest 2409 1931 -19.8% 1.25x
CSVParsingAlt2 2120 1706 -19.5% 1.24x
RomanNumbers 93117 83681 -10.1% 1.11x
CharIteration_russian_unicodeScalars_Backwards 5817 5389 -7.4% 1.08x
CharIteration_ascii_unicodeScalars_Backwards 5268 4913 -6.7% 1.07x

Code size: -Osize

TEST OLD NEW DELTA RATIO
Improvement
Combos.o 10245 7729 -24.6% 1.33x
StrComplexWalk.o 3282 2718 -17.2% 1.21x
CharacterProperties.o 21637 19229 -11.1% 1.13x
CountAlgo.o 14668 13072 -10.9% 1.12x
RomanNumbers.o 6662 6046 -9.2% 1.10x
StringEdits.o 13342 12126 -9.1% 1.10x
ReduceInto.o 14360 13250 -7.7% 1.08x
StringRemoveDupes.o 8201 7609 -7.2% 1.08x
StringWalk.o 37400 35418 -5.3% 1.06x
RemoveWhere.o 25137 24238 -3.6% 1.04x
WordCount.o 41708 40596 -2.7% 1.03x
Hash.o 22447 21911 -2.4% 1.02x
DriverUtils.o 134471 132439 -1.5% 1.02x
CSVParsing.o 33089 32625 -1.4% 1.01x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
SetSubtractingInt100 581 867 +49.2% 0.67x
SetSubtractingInt50 509 648 +27.3% 0.79x
SetSubtractingInt25 463 566 +22.2% 0.82x
SetSubtractingInt0 369 445 +20.6% 0.83x
DictionaryGroup 4554 4907 +7.8% 0.93x (?)
Improvement
StrComplexWalk 81510 9253 -88.6% 8.81x
StringWalk 46854 14784 -68.4% 3.17x
RemoveWhereFilterString 1272 675 -46.9% 1.88x
CSVParsingAlt2 4198 2464 -41.3% 1.70x
CharIteration_punctuatedJapanese_unicodeScalars 55326 33760 -39.0% 1.64x
StringHasPrefixUnicode 186065 114035 -38.7% 1.63x
CharIteration_japanese_unicodeScalars 376640 232205 -38.3% 1.62x
CharIteration_ascii_unicodeScalars 311884 192338 -38.3% 1.62x
CharIteration_korean_unicodeScalars 304635 188069 -38.3% 1.62x
CharIteration_chinese_unicodeScalars 237645 146789 -38.2% 1.62x
CharIteration_tweet_unicodeScalars 616328 382375 -38.0% 1.61x
CharIteration_russian_unicodeScalars 257353 160875 -37.5% 1.60x
CharIteration_punctuated_unicodeScalars 64442 43022 -33.2% 1.50x
StringRemoveDupes 1135 761 -33.0% 1.49x
CharIteration_utf16_unicodeScalars 252845 177729 -29.7% 1.42x
CountAlgoString 5354 4067 -24.0% 1.32x
FrequenciesUsingReduceInto 4895 4075 -16.8% 1.20x
HashTest 12080 10095 -16.4% 1.20x
StringEdits 165909 139038 -16.2% 1.19x
DropLastArrayLazy 11264 10221 -9.3% 1.10x
SuffixArrayLazy 11254 10216 -9.2% 1.10x
ArrayOfPOD 858 780 -9.1% 1.10x (?)
LessSubstringSubstringGenericComparable 25 23 -8.0% 1.09x (?)

milseman (Member Author) commented Nov 9, 2018

@swift-ci please smoke test and merge

milseman (Member Author) commented Nov 9, 2018

(full testing passed on both platforms prior to the merge conflict in the ABI checker file)

milseman (Member Author) commented Nov 9, 2018

@swift-ci please smoke test

milseman (Member Author) commented Nov 9, 2018

@swift-ci please smoke test and merge

milseman (Member Author) commented Nov 9, 2018

(again, testing passed, but there was a merge conflict in the ABI file)

milseman (Member Author) commented Nov 9, 2018

@swift-ci please smoke test and merge

milseman (Member Author) commented Nov 9, 2018

@swift-ci please smoke test and merge

milseman (Member Author) commented Nov 9, 2018

@swift-ci please smoke test

milseman (Member Author) commented Nov 9, 2018

@swift-ci please smoke test os x platform

milseman (Member Author) commented

@swift-ci please smoke test linux platform

lorentey (Member) left a comment

Looks good! 👍

milseman merged commit 9315b3a into swiftlang:master on Nov 12, 2018
milseman deleted the uniterator branch on Nov 12, 2018 00:14