Skip to content

[stdlib] Make some trivial Substring methods inlinable #22616

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 20, 2020

Conversation

karwa
Copy link
Contributor

@karwa karwa commented Feb 14, 2019

These look like simple oversights.

CC @milseman

@karwa karwa force-pushed the substring_inline branch 2 times, most recently from 451edf6 to 2c00a23 Compare February 14, 2019 15:02
@slavapestov
Copy link
Contributor

This limits future evolution of the standard library. Is there a measurable performance benefit from these changes that justifies any loss in flexibility?

@karwa
Copy link
Contributor Author

karwa commented Feb 14, 2019

@slavapestov

  • Substring.(subscript|replaceSubrange): trivial forwarding to Slice, like everything else. No way the stdlib could evolve differently given everything else that is inlinable. Almost everything else in Substring is already inlinable.
  • Substring.UTF8View: the equivalent methods in .UTF16View are inlinable.

@milseman should be able to say more about if these were intentionally withheld from inlining (if so, maybe it's worth a comment).

I don't know if we actually benchmark these code-paths. I think it's likely going to be worth it to inline the subscript, at least, and replaceSubrange can now be specialised.

@milseman
Copy link
Member

@karwa the reasoning is usually the other way around: we prefer to not mark something inlinable unless there is sufficient justification. Marking a method inlinable can significantly harm performance, as it will block inlining of non-inlinable code that the method calls.

String.replaceSubrange is not inlinable, so making Substring.replaceSubrange is more likely to hurt performance than not. If Substring.replaceSubrange is not inlinable, then String.replaceSubrange can be inlined fully into it.

Substring.subscript could be a candidate, as well as the Substring.UTF8View methods. Do you have a benchmark or scenario demonstrating the performance improvement?

@karwa
Copy link
Contributor Author

karwa commented Feb 16, 2019

@milseman Nope, I just noticed these while adding withContiguousStorageIfAvailable forwarding (String.UTF8View implements it, Substring.UTF8View does not). Could you request some from CI to start with? It can take a solid half-day to rebuild a branch and run benchmarks locally for me.

I didn't notice that String.replaceSubrange wasn't inlinable. Reverted that change.

@milseman
Copy link
Member

@swift-ci please benchmark

@karwa
Copy link
Contributor Author

karwa commented Feb 19, 2019

@milseman Hmm... that doesn't seem to have worked. Mind trying again?
EDIT: Oh, okay.

@swift-ci
Copy link
Contributor

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
Data.hash.Empty 65 74 +13.8% 0.88x (?)
LessSubstringSubstring 39 44 +12.8% 0.89x
EqualStringSubstring 39 44 +12.8% 0.89x (?)
EqualSubstringSubstringGenericEquatable 39 44 +12.8% 0.89x
EqualSubstringString 39 44 +12.8% 0.89x
LessSubstringSubstringGenericComparable 39 44 +12.8% 0.89x (?)
SortStringsUnicode 3255 3640 +11.8% 0.89x
EqualSubstringSubstring 40 44 +10.0% 0.91x
DataSetCountSmall 131 142 +8.4% 0.92x (?)
StringComparison_latin1 720 776 +7.8% 0.93x (?)
Improvement
StringUTF16Builder 390 350 -10.3% 1.11x
StringBuilder 369 334 -9.5% 1.10x (?)
FlattenListLoop 4331 3970 -8.3% 1.09x (?)
Array2D 7520 6912 -8.1% 1.09x (?)
StringAdder 467 430 -7.9% 1.09x
Breadcrumbs.UTF16ToIdx.longMixed 398 367 -7.8% 1.08x (?)
MapReduce 397 368 -7.3% 1.08x (?)
ObjectiveCBridgeStubNSDateRefAccess 400 371 -7.2% 1.08x (?)
MapReduceAnyCollection 398 370 -7.0% 1.08x
FlattenListFlatMap 6871 6408 -6.7% 1.07x (?)

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
StringMatch.o 4606 5358 +16.3% 0.86x
StringEdits.o 11983 12735 +6.3% 0.94x
RomanNumbers.o 8507 8795 +3.4% 0.97x
CSVParsing.o 31265 32017 +2.4% 0.98x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
DataAppendDataLargeToLarge 36800 51200 +39.1% 0.72x (?)
Data.hash.Empty 68 77 +13.2% 0.88x
EqualSubstringSubstring 39 44 +12.8% 0.89x
EqualStringSubstring 39 44 +12.8% 0.89x
EqualSubstringString 39 44 +12.8% 0.89x (?)
LessSubstringSubstringGenericComparable 39 44 +12.8% 0.89x
SortStringsUnicode 3265 3600 +10.3% 0.91x
LessSubstringSubstring 40 44 +10.0% 0.91x (?)
EqualSubstringSubstringGenericEquatable 40 44 +10.0% 0.91x
StringComparison_latin1 722 786 +8.9% 0.92x (?)
StringMatch 8500 9200 +8.2% 0.92x
DataSetCountSmall 134 145 +8.2% 0.92x (?)
Improvement
StringBuilder 361 327 -9.4% 1.10x (?)
RandomIntegersLCG 828 757 -8.6% 1.09x (?)
SortLettersInPlace 561 514 -8.4% 1.09x (?)
StringBuilderSmallReservingCapacity 385 353 -8.3% 1.09x (?)

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
StringMatch.o 4513 5353 +18.6% 0.84x
StringEdits.o 11958 12694 +6.2% 0.94x
RomanNumbers.o 8842 9282 +5.0% 0.95x
CSVParsing.o 31009 31833 +2.7% 0.97x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
Data.hash.Empty 148 165 +11.5% 0.90x (?)
ArrayOfPOD 776 855 +10.2% 0.91x (?)
StrComplexWalk 6700 7330 +9.4% 0.91x (?)
Improvement
Dictionary3 707 613 -13.3% 1.15x
Dictionary4 1056 970 -8.1% 1.09x (?)

Code size: -swiftlibs

TEST OLD NEW DELTA RATIO
Regression
libswiftStdlibUnittest.dylib 360448 364544 +1.1% 0.99x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@milseman
Copy link
Member

(Just to double check the trend)

@swift-ci please benchmark

@CodaFi
Copy link
Contributor

CodaFi commented Nov 22, 2019

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
ArrayAppendGenericStructs 1340 2140 +59.7% 0.63x (?)
Data.hash.Empty 74 80 +8.1% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
CSVParsing.UTF8 61 44 -27.9% 1.39x
LazilyFilteredArrayContains 34800 29500 -15.2% 1.18x
RandomShuffleLCG2 768 704 -8.3% 1.09x
RemoveWhereMoveInts 36 33 -8.3% 1.09x (?)
FlattenListLoop 4453 4092 -8.1% 1.09x (?)
RemoveWhereSwapInts 65 60 -7.7% 1.08x
ObjectiveCBridgeStubFromNSDateRef 4200 3890 -7.4% 1.08x (?)
MapReduce 400 371 -7.2% 1.08x (?)
Set.isSubset.Seq.Int.Empty 210 195 -7.1% 1.08x (?)
MapReduceAnyCollection 397 369 -7.1% 1.08x (?)
Set.isStrictSuperset.Seq.Empty.Int 290 270 -6.9% 1.07x (?)
Set.isSuperset.Seq.Empty.Int 88 82 -6.8% 1.07x (?)
Set.isSubset.Int.Empty 88 82 -6.8% 1.07x (?)
Set.isStrictSubset.Int.Empty 88 82 -6.8% 1.07x (?)
Set.isStrictSubset.Seq.Int.Empty 208 194 -6.7% 1.07x (?)
ArrayLiteral2 137 128 -6.6% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
StringMatch.o 4752 5248 +10.4% 0.91x
FindStringNaive.o 10979 11475 +4.5% 0.96x
StringEdits.o 12143 12639 +4.1% 0.96x

Performance: -Osize

Regression OLD NEW DELTA RATIO
FindString.Loop1.Substring 817 916 +12.1% 0.89x (?)
StringMatch 8200 8900 +8.5% 0.92x
Data.hash.Empty 74 80 +8.1% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
CSVParsing.UTF8 61 45 -26.2% 1.36x
Set.isSubset.Int.Empty 96 88 -8.3% 1.09x
ObjectiveCBridgeStubNSDateRefAccess 400 371 -7.2% 1.08x (?)
Set.isDisjoint.Seq.Int.Empty 90 84 -6.7% 1.07x (?)
Set.isSuperset.Seq.Empty.Int 90 84 -6.7% 1.07x
Set.isStrictSubset.Int.Empty 90 84 -6.7% 1.07x

Code size: -Osize

Regression OLD NEW DELTA RATIO
StringMatch.o 4572 5055 +10.6% 0.90x
FindStringNaive.o 10491 11075 +5.6% 0.95x
StringEdits.o 12032 12528 +4.1% 0.96x
RomanNumbers.o 8149 8333 +2.3% 0.98x

Performance: -Onone

Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubFromNSDateRef 4850 4370 -9.9% 1.11x (?)
DataCreateSmallArray 94200 87650 -7.0% 1.07x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@milseman
Copy link
Member

milseman commented Dec 3, 2019

Some of those seem spurious, but the CSVParser.UTF8 one doesn't.

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Dec 3, 2019

Performance: -O

Regression OLD NEW DELTA RATIO
NSStringConversion.Long 514 584 +13.6% 0.88x (?)
IterateData 889 1004 +12.9% 0.89x (?)
NSStringConversion.Medium 246 265 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
CSVParsing.UTF8 45 31 -31.1% 1.45x
EqualSubstringSubstring 28 22 -21.4% 1.27x
LessSubstringSubstring 28 22 -21.4% 1.27x
EqualStringSubstring 28 22 -21.4% 1.27x
EqualSubstringSubstringGenericEquatable 28 22 -21.4% 1.27x
EqualSubstringString 28 22 -21.4% 1.27x
LessSubstringSubstringGenericComparable 28 22 -21.4% 1.27x
StringComparison_longSharedPrefix 364 327 -10.2% 1.11x (?)
StringHasPrefixAscii 1370 1280 -6.6% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
StringMatch.o 4752 5248 +10.4% 0.91x
FindStringNaive.o 10979 11475 +4.5% 0.96x
StringEdits.o 12143 12639 +4.1% 0.96x

Performance: -Osize

Regression OLD NEW DELTA RATIO
NSStringConversion.Long 488 560 +14.8% 0.87x (?)
Set.subtracting.Empty.Box 7 8 +14.3% 0.88x (?)
 
Improvement OLD NEW DELTA RATIO
CSVParsing.UTF8 45 31 -31.1% 1.45x
LessSubstringSubstring 27 21 -22.2% 1.29x
EqualSubstringString 27 21 -22.2% 1.29x
LessSubstringSubstringGenericComparable 27 21 -22.2% 1.29x
EqualSubstringSubstring 28 22 -21.4% 1.27x (?)
EqualStringSubstring 28 22 -21.4% 1.27x
EqualSubstringSubstringGenericEquatable 28 22 -21.4% 1.27x (?)
FlattenListLoop 3217 2734 -15.0% 1.18x (?)
ObjectiveCBridgeStubNSDateRefAccess 196 174 -11.2% 1.13x (?)
StringComparison_longSharedPrefix 357 320 -10.4% 1.12x (?)
OpenClose 63 58 -7.9% 1.09x (?)
SortIntPyramid 475 440 -7.4% 1.08x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
StringMatch.o 4572 5055 +10.6% 0.90x
FindStringNaive.o 10491 11075 +5.6% 0.95x
StringEdits.o 12032 12528 +4.1% 0.96x
RomanNumbers.o 8149 8333 +2.3% 0.98x

Performance: -Onone

Improvement OLD NEW DELTA RATIO
EqualSubstringSubstringGenericEquatable 32 25 -21.9% 1.28x (?)
LessSubstringSubstringGenericComparable 32 25 -21.9% 1.28x
EqualSubstringSubstring 32 26 -18.7% 1.23x
LessSubstringSubstring 32 26 -18.7% 1.23x
EqualStringSubstring 32 27 -15.6% 1.19x
EqualSubstringString 32 27 -15.6% 1.19x (?)
ArrayOfGenericPOD2 793 712 -10.2% 1.11x

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@karwa
Copy link
Contributor Author

karwa commented Dec 5, 2019

@milseman what do you think?

@milseman
Copy link
Member

milseman commented Dec 9, 2019

I'm in favor

@swift-ci please test

@milseman
Copy link
Member

@swift-ci please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - ab36b51

@karwa
Copy link
Contributor Author

karwa commented Feb 11, 2020

@milseman It seems the test was manually aborted:

00:21:39 ninja: build stopped: interrupted by user.
00:21:39 Build was aborted
00:21:39 Aborted by [email protected]

@CodaFi
Copy link
Contributor

CodaFi commented Feb 13, 2020

@swift-ci test

@milseman milseman merged commit 2897daa into swiftlang:master Feb 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants