Implementation for SE-0228: Fix ExpressibleByStringInterpolation #19963

Closed
wants to merge 46 commits into from

@beccadax commented Oct 20, 2018
Contributor

Just a rebase of the previous PR, #18590.

This PR implements a new string interpolation API with a different, finalized ABI from what we're currently shipping. We'd like to land it before the ABI freezes; otherwise we'll have to also support the existing initializers forever.
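For readers who have not followed the proposal, here is a minimal sketch of what a conforming type looks like under the SE-0228 design as accepted. `LogMessage` and the `redacted:` overload are hypothetical names made up for illustration; only the protocol requirements (`init(literalCapacity:interpolationCount:)`, `appendLiteral(_:)`, `appendInterpolation(...)`, `init(stringInterpolation:)`) come from the proposal, and details on this branch may differ.

```swift
// Hypothetical type, used only to illustrate the SE-0228 protocol shape.
struct LogMessage: ExpressibleByStringInterpolation {
    var text: String

    // Literals without any interpolation go through this initializer.
    init(stringLiteral value: String) {
        text = value
    }

    // Interpolated literals are accumulated in `StringInterpolation` first,
    // then handed over in one piece.
    init(stringInterpolation: StringInterpolation) {
        text = stringInterpolation.buffer
    }

    struct StringInterpolation: StringInterpolationProtocol {
        var buffer = ""

        // The compiler passes size hints so storage can be reserved up front.
        // The "* 2" guess per interpolation is an arbitrary assumption.
        init(literalCapacity: Int, interpolationCount: Int) {
            buffer.reserveCapacity(literalCapacity + interpolationCount * 2)
        }

        // Called once per literal segment.
        mutating func appendLiteral(_ literal: String) {
            buffer += literal
        }

        // Called once per unlabeled \(...) segment.
        mutating func appendInterpolation<T>(_ value: T) {
            buffer += String(describing: value)
        }

        // Hypothetical labeled overload, written as \(redacted: ...).
        mutating func appendInterpolation(redacted value: String) {
            buffer += String(repeating: "*", count: value.count)
        }
    }
}

let user = "becca"
let token = "abc123"
let message: LogMessage = "login by \(user), token \(redacted: token)"
print(message.text)  // prints "login by becca, token ******"
```

The point of the design is that each segment is appended in source order through a mutating call, so a conforming type can add labeled or typed overloads without going through an intermediate `String`.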

Remaining issues to resolve:

  • ~1% compiler performance regression (down from about 1.5% before). Still looking into that final percentage point.
  • StringWordBuilderReservingCapacity showing ~10% slowdown. Need to investigate.
  • A few other slowdowns in random benchmarks with no apparent connection to string interpolation. Need to see how much of that is noise and how much is real.
  • I need to do one last sweep through it for little cleanups I might have missed.

Issues we'll wait to address:

  • Double and Float80 are faster to interpolate than before, but Float is slower. It's hard to test this in isolation while the rest of the branch is in flux, so we plan to land the change and then look at Float interpolation.
  • On Linux only, the optimizer can't completely optimize away a string interpolation whose result is not used. We were already skipping this test on 32-bit architectures because it was broken there. Filed as SR-9008 so we don't lose track of it.

Implements SE-0228. Resolves SR-1260, SR-2303, and SR-3969. Resolves rdar://43621912.

cc @milseman @ravikandhadai

@beccadax
Contributor Author

@swift-ci please test

@beccadax
Contributor Author

@swift-ci please test compiler performance

@swift-ci
Contributor

Build comment file:

Summary for master full

Unexpected test results, excluded stats for RxSwift, Alamofire, Wordy, ReactiveSwift

Regressions found (see below)

Debug-batch

debug-batch brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (3)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 12,640,921,351,020 12,641,723,574,681 802,223,661 0.01%
LLVM.NumLLVMBytesOutput 662,083,340 664,908,042 2,824,702 0.43%
time.swift-driver.wall 1425.6s 1436.2s 10.6s 0.74%

debug-batch detailed

Regressed (8)
name old new delta delta_pct
AST.NumSourceLinesPerSecond 1,030,294 1,041,031 10,737 1.04% ⛔
Driver.NumDriverPipePolls 243,822 251,876 8,054 3.3% ⛔
Driver.NumDriverPipeReads 256,987 265,129 8,142 3.17% ⛔
Sema.IsDynamicRequest 1,048,711 1,059,694 10,983 1.05% ⛔
Sema.IsObjCRequest 876,684 886,004 9,320 1.06% ⛔
Sema.NumConstraintsConsideredForEdgeContraction 19,046,552 30,570,720 11,524,168 60.51% ⛔
Sema.NumDeclsValidated 1,082,903 1,139,247 56,344 5.2% ⛔
Sema.SetterAccessLevelRequest 68,767 78,408 9,641 14.02% ⛔
Improved (13)
name old new delta delta_pct
IRModule.NumIRInsts 27,074,209 26,458,651 -615,558 -2.27% ✅
Sema.ExtendedNominalRequest 1,515,737 1,496,810 -18,927 -1.25% ✅
Sema.InheritedDeclsReferencedRequest 57,332,776 56,299,090 -1,033,686 -1.8% ✅
Sema.NamedLazyMemberLoadSuccessCount 12,667,636 10,718,389 -1,949,247 -15.39% ✅
Sema.NominalTypeLookupDirectCount 20,069,298 18,948,643 -1,120,655 -5.58% ✅
Sema.NumConstraintScopes 11,941,716 11,741,707 -200,009 -1.67% ✅
Sema.NumDeclsDeserialized 15,526,965 15,323,480 -203,485 -1.31% ✅
Sema.NumLazyGenericEnvironments 3,353,325 3,299,042 -54,283 -1.62% ✅
Sema.NumLazyGenericEnvironmentsLoaded 87,831 86,648 -1,183 -1.35% ✅
Sema.NumLeafScopes 8,887,552 8,076,125 -811,427 -9.13% ✅
Sema.NumTypesDeserialized 6,440,344 6,366,955 -73,389 -1.14% ✅
Sema.SelfBoundsFromWhereClauseRequest 31,574,686 29,330,767 -2,243,919 -7.11% ✅
Sema.SuperclassDeclRequest 47,777,023 47,079,359 -697,664 -1.46% ✅
Unchanged (delta < 1.0% or delta < 100.0ms) (74)
name old new delta delta_pct
AST.NumASTBytesAllocated 15,174,614,322 15,242,256,677 67,642,355 0.45%
AST.NumDecls 45,698 45,698 0 0.0%
AST.NumDependencies 99,631 99,632 1 0.0%
AST.NumImportedExternalDefinitions 722,995 720,065 -2,930 -0.41%
AST.NumInfixOperators 19,270 19,270 0 0.0%
AST.NumLinkLibraries 0 0 0 0.0%
AST.NumLoadedModules 120,032 120,032 0 0.0%
AST.NumLocalTypeDecls 79 79 0 0.0%
AST.NumObjCMethods 11,948 11,948 0 0.0%
AST.NumPostfixOperators 14 14 0 0.0%
AST.NumPrecedenceGroups 8,925 8,925 0 0.0%
AST.NumPrefixOperators 61 61 0 0.0%
AST.NumReferencedDynamicNames 38 38 0 0.0%
AST.NumReferencedMemberNames 2,473,362 2,451,711 -21,651 -0.88%
AST.NumReferencedTopLevelNames 147,846 148,146 300 0.2%
AST.NumSourceBuffers 139,775 139,762 -13 -0.01%
AST.NumSourceLines 1,508,176 1,508,176 0 0.0%
AST.NumTotalClangImportedEntities 2,534,085 2,532,304 -1,781 -0.07%
AST.NumUsedConformances 141,724 140,612 -1,112 -0.78%
Driver.ChildrenMaxRSS 49,801,834,496 50,130,507,776 328,673,280 0.66%
Driver.DriverDepCascadingDynamic 0 0 0 0.0%
Driver.DriverDepCascadingExternal 0 0 0 0.0%
Driver.DriverDepCascadingMember 0 0 0 0.0%
Driver.DriverDepCascadingNominal 0 0 0 0.0%
Driver.DriverDepCascadingTopLevel 0 0 0 0.0%
Driver.DriverDepDynamic 0 0 0 0.0%
Driver.DriverDepExternal 0 0 0 0.0%
Driver.DriverDepMember 0 0 0 0.0%
Driver.DriverDepNominal 0 0 0 0.0%
Driver.DriverDepTopLevel 0 0 0 0.0%
Driver.NumDriverJobsRun 9,620 9,620 0 0.0%
Driver.NumDriverJobsSkipped 0 0 0 0.0%
Driver.NumProcessFailures 0 0 0 0.0%
Frontend.MaxMallocUsage 198,157,522,144 198,100,226,504 -57,295,640 -0.03%
Frontend.NumInstructionsExecuted 12,640,921,351,020 12,641,723,574,681 802,223,661 0.01%
Frontend.NumProcessFailures 0 0 0 0.0%
IRModule.NumIRAliases 70,865 70,865 0 0.0%
IRModule.NumIRBasicBlocks 2,392,943 2,400,545 7,602 0.32%
IRModule.NumIRComdatSymbols 0 0 0 0.0%
IRModule.NumIRFunctions 1,238,912 1,243,693 4,781 0.39%
IRModule.NumIRGlobals 1,455,063 1,458,333 3,270 0.22%
IRModule.NumIRIFuncs 0 0 0 0.0%
IRModule.NumIRNamedMetaData 46,916 46,916 0 0.0%
IRModule.NumIRValueSymbols 2,388,557 2,396,590 8,033 0.34%
LLVM.NumLLVMBytesOutput 662,083,340 664,908,042 2,824,702 0.43%
Parse.NumFunctionsParsed 1,023,954 1,023,954 0 0.0%
Parse.NumIterableDeclContextParsed 361,425 361,425 0 0.0%
SILModule.NumSILGenDefaultWitnessTables 0 0 0 0.0%
SILModule.NumSILGenFunctions 1,179,679 1,183,105 3,426 0.29%
SILModule.NumSILGenGlobalVariables 24,089 24,089 0 0.0%
SILModule.NumSILGenVtables 4,500 4,500 0 0.0%
SILModule.NumSILGenWitnessTables 27,028 27,028 0 0.0%
SILModule.NumSILOptDefaultWitnessTables 0 0 0 0.0%
SILModule.NumSILOptFunctions 916,962 916,008 -954 -0.1%
SILModule.NumSILOptGlobalVariables 24,509 24,509 0 0.0%
SILModule.NumSILOptVtables 8,605 8,605 0 0.0%
SILModule.NumSILOptWitnessTables 54,529 54,550 21 0.04%
Sema.AccessLevelRequest 1,167,662 1,177,266 9,604 0.82%
Sema.DefaultAndMaxAccessLevelRequest 27,369 27,363 -6 -0.02%
Sema.EnumRawTypeRequest 8,565 8,565 0 0.0%
Sema.InheritedTypeRequest 338,038 337,718 -320 -0.09%
Sema.NamedLazyMemberLoadFailureCount 14,232 14,232 0 0.0%
Sema.NumConformancesDeserialized 1,694,357 1,684,348 -10,009 -0.59%
Sema.NumFunctionsTypechecked 672,598 670,475 -2,123 -0.32%
Sema.NumGenericSignatureBuilders 535,438 532,014 -3,424 -0.64%
Sema.NumLazyIterableDeclContexts 3,062,676 3,059,301 -3,375 -0.11%
Sema.NumTypesValidated 698,799 698,707 -92 -0.01%
Sema.NumUnloadedLazyIterableDeclContexts 2,437,132 2,437,970 838 0.03%
Sema.OverriddenDeclsRequest 921,881 930,528 8,647 0.94%
Sema.RequirementRequest 21,734 21,734 0 0.0%
Sema.SuperclassTypeRequest 16,038 16,038 0 0.0%
Sema.TypeDeclsFromWhereClauseRequest 12,947 12,946 -1 -0.01%
Sema.USRGenerationRequest 228,857 228,857 0 0.0%
Sema.UnderlyingTypeDeclsReferencedRequest 1,710,833 1,709,608 -1,225 -0.07%

Release

release brief

Regressed (1)
name old new delta delta_pct
time.swift-driver.wall 3058.6s 3091.9s 33.3s 1.09% ⛔
Improved (1)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 41,281,893,753,877 34,802,066,407,765 -6,479,827,346,112 -15.7% ✅
Unchanged (delta < 1.0% or delta < 100.0ms) (1)
name old new delta delta_pct
LLVM.NumLLVMBytesOutput 549,242,072 552,630,532 3,388,460 0.62%

release detailed

Regressed (2)
name old new delta delta_pct
IRModule.NumIRBasicBlocks 2,308,975 2,424,123 115,148 4.99% ⛔
Sema.NumDeclsValidated 558,735 613,398 54,663 9.78% ⛔
Improved (4)
name old new delta delta_pct
AST.NumImportedExternalDefinitions 165,607 163,001 -2,606 -1.57% ✅
AST.NumTotalClangImportedEntities 552,778 545,887 -6,891 -1.25% ✅
Sema.NumConstraintScopes 10,919,367 10,786,844 -132,523 -1.21% ✅
Sema.NumLazyIterableDeclContexts 494,296 487,725 -6,571 -1.33% ✅
Unchanged (delta < 1.0% or delta < 100.0ms) (17)
name old new delta delta_pct
AST.NumLoadedModules 10,292 10,226 -66 -0.64%
AST.NumUsedConformances 145,741 144,477 -1,264 -0.87%
IRModule.NumIRFunctions 991,062 990,640 -422 -0.04%
IRModule.NumIRGlobals 1,097,600 1,092,929 -4,671 -0.43%
IRModule.NumIRInsts 19,305,606 19,261,314 -44,292 -0.23%
IRModule.NumIRValueSymbols 1,926,368 1,920,856 -5,512 -0.29%
LLVM.NumLLVMBytesOutput 549,242,072 552,630,532 3,388,460 0.62%
SILModule.NumSILGenFunctions 433,067 432,955 -112 -0.03%
SILModule.NumSILOptFunctions 618,660 621,313 2,653 0.43%
Sema.NumConformancesDeserialized 1,250,131 1,252,896 2,765 0.22%
Sema.NumDeclsDeserialized 3,794,782 3,764,277 -30,505 -0.8%
Sema.NumFunctionsTypechecked 336,401 334,487 -1,914 -0.57%
Sema.NumGenericSignatureBuilders 122,587 121,967 -620 -0.51%
Sema.NumLazyGenericEnvironments 800,778 792,880 -7,898 -0.99%
Sema.NumLazyGenericEnvironmentsLoaded 15,578 15,433 -145 -0.93%
Sema.NumTypesDeserialized 2,129,694 2,114,506 -15,188 -0.71%
Sema.NumTypesValidated 254,946 254,836 -110 -0.04%
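Reading the summary above, the Regressed / Improved / Unchanged buckets appear to follow from the sign and size of the delta alone, using the thresholds printed in the "Unchanged" heading. The sketch below is an assumption based on the rows in this comment, not the CI tool's actual code; `bucket` and `Bucket` are hypothetical names.

```swift
enum Bucket { case regressed, improved, unchanged }

// Hypothetical classifier. Thresholds are copied from the
// "Unchanged (delta < 1.0% or delta < 100.0ms)" heading.
func bucket(old: Double, new: Double, isTimeInSeconds: Bool = false) -> Bucket {
    let delta = new - old
    // Assumption: a counter that was zero before is treated as 0% change.
    let deltaPct = old == 0 ? 0 : delta / old * 100
    if abs(deltaPct) < 1.0 { return .unchanged }
    if isTimeInSeconds && abs(delta) < 0.1 { return .unchanged }
    // Any increase counts as a regression, whatever the metric measures.
    return delta > 0 ? .regressed : .improved
}

// Rows taken from the tables above.
print(bucket(old: 1_082_903, new: 1_139_247))                 // Sema.NumDeclsValidated (debug)
print(bucket(old: 1425.6, new: 1436.2, isTimeInSeconds: true)) // time.swift-driver.wall (debug)
print(bucket(old: 41_281_893_753_877, new: 34_802_066_407_765)) // Frontend.NumInstructionsExecuted (release)
```

Under this reading a metric such as AST.NumSourceLinesPerSecond is flagged as regressed simply because its value went up, so the markers are worth checking against what each counter actually measures.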

@beccadax
Contributor Author

@swift-ci please smoke benchmark

@swift-ci
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
FloatingPointPrinting_Float_interpolated 37043 46170 +24.6% 0.80x
StringInterpolation 8666 10708 +23.6% 0.81x
IterateData 1623 1904 +17.3% 0.85x
RangeIterationSigned 171 200 +17.0% 0.86x
RandomDoubleLCG 913 1055 +15.6% 0.87x
ChainedFilterMap 1219 1404 +15.2% 0.87x (?)
Improvement
StringInterpolationManySmallSegments 17695 7846 -55.7% 2.26x
StringInterpolationSmall 4034 2142 -46.9% 1.88x
FloatingPointPrinting_Double_interpolated 60377 49248 -18.4% 1.23x
FloatingPointPrinting_Float80_interpolated 67680 56542 -16.5% 1.20x
ArrayAppendAsciiSubstring 29732 25304 -14.9% 1.17x
ArrayAppendStrings 8740 7943 -9.1% 1.10x
Array2D 7512 6909 -8.0% 1.09x
MapReduceAnyCollection 398 370 -7.0% 1.08x
MapReduce 427 398 -6.8% 1.07x
Added
TEST MIN MAX MEAN
CustomStringInterpolation 10568 10796 10646
CustomStringNoInterpolation 174 178 175

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
StringInterpolation.o 11386 16803 +47.6% 0.68x
DeadArray.o 1872 2706 +44.6% 0.69x
SequenceAlgos.o 23331 27827 +19.3% 0.84x
Exclusivity.o 4083 4659 +14.1% 0.88x
TwoSum.o 5960 6739 +13.1% 0.88x
DictionarySwap.o 27574 30790 +11.7% 0.90x
ByteSwap.o 1960 2179 +11.2% 0.90x
Fibonacci.o 1936 2150 +11.1% 0.90x
StringBuilder.o 11679 12951 +10.9% 0.90x
PopFront.o 5577 6179 +10.8% 0.90x
LinkedList.o 2263 2486 +9.9% 0.91x
DictionaryKeysContains.o 16211 17806 +9.8% 0.91x
DictionaryRemove.o 15166 16110 +6.2% 0.94x
DictTest4Legacy.o 26994 28370 +5.1% 0.95x
DropLast.o 25451 26747 +5.1% 0.95x
ErrorHandling.o 2758 2886 +4.6% 0.96x
PopFrontGeneric.o 5061 5285 +4.4% 0.96x
DictTest3.o 28392 29608 +4.3% 0.96x
DictTest4.o 26116 27220 +4.2% 0.96x
NopDeinit.o 5907 6140 +3.9% 0.96x
DictTest2.o 19736 20504 +3.9% 0.96x
HashQuadratic.o 5800 6019 +3.8% 0.96x
DriverUtils.o 168161 172849 +2.8% 0.97x
CountAlgo.o 20893 21468 +2.8% 0.97x
RomanNumbers.o 11205 11509 +2.7% 0.97x
RGBHistogram.o 24773 25349 +2.3% 0.98x
SevenBoom.o 1950 1992 +2.2% 0.98x
DictionaryGroup.o 17295 17647 +2.0% 0.98x
XorLoop.o 2280 2322 +1.8% 0.98x
main.o 46322 47154 +1.8% 0.98x
Memset.o 2392 2434 +1.8% 0.98x
StringEdits.o 14052 14292 +1.7% 0.98x
Suffix.o 26169 26601 +1.7% 0.98x
WordCount.o 65304 66296 +1.5% 0.99x
MonteCarloPi.o 1920 1946 +1.4% 0.99x
NSDictionaryCastToSwift.o 1984 2010 +1.3% 0.99x
PointerArithmetics.o 2086 2112 +1.2% 0.99x
Ackermann.o 2160 2186 +1.2% 0.99x
BitCount.o 2200 2226 +1.2% 0.99x
RandomValues.o 4135 4177 +1.0% 0.99x
Improvement
MapReduce.o 24653 21517 -12.7% 1.15x
StringTests.o 10695 9607 -10.2% 1.11x
ObjectAllocation.o 4635 4171 -10.0% 1.11x
StackPromo.o 2583 2343 -9.3% 1.10x
DropWhile.o 23724 21788 -8.2% 1.09x
ChainedFilterMap.o 3492 3230 -7.5% 1.08x
BinaryFloatingPointProperties.o 8039 7489 -6.8% 1.07x
DictionaryCopy.o 8929 8337 -6.6% 1.07x
ReduceInto.o 24735 23183 -6.3% 1.07x
DropFirst.o 25500 24028 -5.8% 1.06x
LazyFilter.o 9794 9234 -5.7% 1.06x
Hash.o 29075 27683 -4.8% 1.05x
SetTests.o 59689 57113 -4.3% 1.05x
MonteCarloE.o 3624 3490 -3.7% 1.04x
DictionaryBridgeToObjC.o 6959 6703 -3.7% 1.04x
DictionaryCompactMapValues.o 22510 21694 -3.6% 1.04x
PrefixWhile.o 24046 23342 -2.9% 1.03x
Prefix.o 24945 24241 -2.8% 1.03x
SortIntPyramids.o 9701 9429 -2.8% 1.03x
ObjectiveCBridgingStubs.o 9915 9637 -2.8% 1.03x
ObjectiveCBridging.o 45589 44325 -2.8% 1.03x
ArrayLiteral.o 3531 3461 -2.0% 1.02x
ArrayAppend.o 38982 38310 -1.7% 1.02x
Walsh.o 9432 9282 -1.6% 1.02x
Queue.o 14603 14411 -1.3% 1.01x
StrToInt.o 6659 6579 -1.2% 1.01x
StrComplexWalk.o 3456 3418 -1.1% 1.01x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
MapReduceLazyCollectionShort 62 85 +37.1% 0.73x
FloatingPointPrinting_Float_interpolated 36978 46166 +24.8% 0.80x
StringInterpolation 8940 11041 +23.5% 0.81x
RangeIterationSigned 171 200 +17.0% 0.86x
IterateData 1581 1779 +12.5% 0.89x
Improvement
StringInterpolationManySmallSegments 16073 7856 -51.1% 2.05x
StringInterpolationSmall 4000 2169 -45.8% 1.84x
ArrayAppendStrings 7858 7024 -10.6% 1.12x
PrefixWhileAnyCollectionLazy 176 159 -9.7% 1.11x
DropLastAnyCollection 65 59 -9.2% 1.10x
Array2D 7208 6611 -8.3% 1.09x
RandomDoubleLCG 967 895 -7.4% 1.08x
Added
TEST MIN MAX MEAN
CustomStringInterpolation 10727 10842 10766
CustomStringNoInterpolation 195 198 196

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
StringInterpolation.o 9163 14627 +59.6% 0.63x
DeadArray.o 1698 2494 +46.9% 0.68x
Fibonacci.o 1845 2218 +20.2% 0.83x
ByteSwap.o 1869 2242 +20.0% 0.83x
LinkedList.o 2124 2497 +17.6% 0.85x
Exclusivity.o 3815 4295 +12.6% 0.89x
StringBuilder.o 11210 12599 +12.4% 0.89x
TwoSum.o 5645 6302 +11.6% 0.90x
RomanNumbers.o 5806 6382 +9.9% 0.91x
PopFront.o 5214 5694 +9.2% 0.92x
DictionaryKeysContains.o 15035 16374 +8.9% 0.92x
PopFrontGeneric.o 4937 5369 +8.8% 0.92x
SequenceAlgos.o 26092 28195 +8.1% 0.93x
DictTest4Legacy.o 22649 24073 +6.3% 0.94x
DictTest4.o 21259 22571 +6.2% 0.94x
DictTest2.o 15986 16882 +5.6% 0.95x
DictTest3.o 22514 23586 +4.8% 0.95x
ErrorHandling.o 3062 3206 +4.7% 0.96x
DriverUtils.o 144185 149977 +4.0% 0.96x
CountAlgo.o 14368 14848 +3.3% 0.97x
main.o 43593 44393 +1.8% 0.98x
PointerArithmetics.o 1987 2019 +1.6% 0.98x
DictOfArraysToArrayOfDicts.o 32500 33017 +1.6% 0.98x
XorLoop.o 2074 2106 +1.5% 0.98x
Ackermann.o 2085 2117 +1.5% 0.98x
Memset.o 2141 2173 +1.5% 0.99x
RemoveWhere.o 24302 24606 +1.3% 0.99x
CSVParsing.o 37068 37516 +1.2% 0.99x
CString.o 6322 6386 +1.0% 0.99x
Improvement
MapReduce.o 22365 18605 -16.8% 1.20x
StringTests.o 8497 7185 -15.4% 1.18x
ObjectAllocation.o 4505 3850 -14.5% 1.17x
DropWhile.o 23420 20668 -11.8% 1.13x
ReduceInto.o 17179 15531 -9.6% 1.11x
DropFirst.o 24692 22452 -9.1% 1.10x
ChainedFilterMap.o 3492 3188 -8.7% 1.10x
BinaryFloatingPointProperties.o 7737 7097 -8.3% 1.09x
DictionaryCopy.o 8137 7465 -8.3% 1.09x
DictionarySwap.o 25339 23371 -7.8% 1.08x
DictionarySubscriptDefault.o 27147 25163 -7.3% 1.08x
Suffix.o 25953 24081 -7.2% 1.08x
LazyFilter.o 9137 8481 -7.2% 1.08x
Hash.o 22303 20783 -6.8% 1.07x
SetTests.o 52561 49601 -5.6% 1.06x
MonteCarloE.o 3858 3650 -5.4% 1.06x
DictionaryRemove.o 14027 13291 -5.2% 1.06x
ObjectiveCBridging.o 44135 41895 -5.1% 1.05x
PrefixWhile.o 24446 23246 -4.9% 1.05x
DictionaryBridgeToObjC.o 6605 6285 -4.8% 1.05x
DictionaryCompactMapValues.o 21054 20190 -4.1% 1.04x
Walsh.o 6322 6122 -3.2% 1.03x
SortIntPyramids.o 10010 9706 -3.0% 1.03x
ObjectiveCBridgingStubs.o 9101 8831 -3.0% 1.03x
Prefix.o 24441 23849 -2.4% 1.02x
ArrayLiteral.o 3160 3096 -2.0% 1.02x
StackPromo.o 2446 2398 -2.0% 1.02x
ArrayAppend.o 37886 37278 -1.6% 1.02x
DictTest.o 48465 47777 -1.4% 1.01x
Substring.o 20129 19857 -1.4% 1.01x
RGBHistogram.o 22685 22381 -1.3% 1.01x
StrComplexWalk.o 3573 3526 -1.3% 1.01x
ReversedCollections.o 11365 11245 -1.1% 1.01x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
ArrayOfPOD 757 859 +13.5% 0.88x (?)
StringEqualPointerComparison 3571 4028 +12.8% 0.89x
StringHasPrefixAscii 5032 5543 +10.2% 0.91x
StringHasSuffixAscii 5114 5628 +10.1% 0.91x
Improvement
StringInterpolationManySmallSegments 18586 10880 -41.5% 1.71x
StringInterpolationSmall 5753 3454 -40.0% 1.67x
FloatingPointPrinting_Float80_interpolated 119243 99108 -16.9% 1.20x
ArrayAppendStrings 10363 8672 -16.3% 1.19x
Added
TEST MIN MAX MEAN
CustomStringInterpolation 12038 12179 12085
CustomStringNoInterpolation 1058 1058 1058

Code size: Swift libraries

TEST OLD NEW DELTA RATIO
Regression
libswiftSwiftReflectionTest.dylib 49152 61440 +25.0% 0.80x
libswiftSwiftPrivate.dylib 40960 45056 +10.0% 0.91x
libswiftStdlibUnittest.dylib 409600 442368 +8.0% 0.93x
libswiftsimd.dylib 286720 303104 +5.7% 0.95x
libswiftXCTest.dylib 81920 86016 +5.0% 0.95x
libswiftNetwork.dylib 163840 167936 +2.5% 0.98x
libswiftSwiftOnoneSupport.dylib 217088 221184 +1.9% 0.98x
Improvement
libswiftSwiftPrivateLibcExtras.dylib 24576 20480 -16.7% 1.20x
libswiftFoundation.dylib 1835008 1609728 -12.3% 1.14x
libswiftCore.dylib 3969024 3833856 -3.4% 1.04x
How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).
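As a rough guide to the DELTA and RATIO columns, the numbers in the tables are consistent with DELTA being the percentage change relative to OLD and RATIO being OLD divided by NEW. The sketch below assumes exactly that; `BenchmarkRow` and `isReported` are hypothetical names, and the 8% and 1% thresholds are the ones quoted above.

```swift
// Hypothetical model of one table row.
struct BenchmarkRow {
    let name: String
    let old: Double
    let new: Double

    // Percentage change relative to OLD; positive means slower or larger.
    var deltaPercent: Double { (new - old) / old * 100 }

    // OLD / NEW, so values below 1.0 indicate a regression.
    var ratio: Double { old / new }
}

// A row is listed only when it exceeds the reporting threshold:
// 8% for performance, 1% for code size.
func isReported(_ row: BenchmarkRow, isCodeSize: Bool) -> Bool {
    let threshold = isCodeSize ? 1.0 : 8.0
    return abs(row.deltaPercent) > threshold
}

// First row of the "Performance: -O" regression table.
let row = BenchmarkRow(name: "FloatingPointPrinting_Float_interpolated",
                       old: 37043, new: 46170)
print(row.deltaPercent)                      // about 24.6
print(row.ratio)                             // about 0.80
print(isReported(row, isCodeSize: false))    // true
```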

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@eeckstein
Contributor

We should investigate the performance and code size regressions before we land this.

@beccadax
Contributor Author

@swift-ci please smoke benchmark

@beccadax
Contributor Author

Let's see how much StringInterpolation.o actually grew...

@swift-ci
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
FloatingPointPrinting_Float_interpolated 37087 53130 +43.3% 0.70x
StringInterpolation 8916 11536 +29.4% 0.77x
RangeIterationSigned 171 200 +17.0% 0.86x
RandomDoubleLCG 910 1057 +16.2% 0.86x
ChainedFilterMap 1220 1405 +15.2% 0.87x (?)
Improvement
StringInterpolationManySmallSegments 17360 8067 -53.5% 2.15x
StringInterpolationSmall 4050 2112 -47.9% 1.92x
ArrayAppendAsciiSubstring 29641 25308 -14.6% 1.17x
ArrayAppendStrings 8641 7956 -7.9% 1.09x

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
DeadArray.o 1872 2706 +44.6% 0.69x
SequenceAlgos.o 23331 27827 +19.3% 0.84x
Exclusivity.o 4083 4659 +14.1% 0.88x
TwoSum.o 5960 6739 +13.1% 0.88x
DictionarySwap.o 27574 30790 +11.7% 0.90x
ByteSwap.o 1960 2179 +11.2% 0.90x
Fibonacci.o 1936 2150 +11.1% 0.90x
StringBuilder.o 11679 12951 +10.9% 0.90x
PopFront.o 5577 6179 +10.8% 0.90x
LinkedList.o 2263 2486 +9.9% 0.91x
DictionaryKeysContains.o 16211 17806 +9.8% 0.91x
DictionaryRemove.o 15166 16110 +6.2% 0.94x
DictTest4Legacy.o 26994 28370 +5.1% 0.95x
DropLast.o 25451 26747 +5.1% 0.95x
StringInterpolation.o 11386 11957 +5.0% 0.95x
ErrorHandling.o 2758 2886 +4.6% 0.96x
PopFrontGeneric.o 5061 5285 +4.4% 0.96x
DictTest3.o 28392 29608 +4.3% 0.96x
DictTest4.o 26116 27220 +4.2% 0.96x
NopDeinit.o 5907 6140 +3.9% 0.96x
DictTest2.o 19736 20504 +3.9% 0.96x
HashQuadratic.o 5800 6019 +3.8% 0.96x
DriverUtils.o 168161 172849 +2.8% 0.97x
CountAlgo.o 20893 21468 +2.8% 0.97x
RomanNumbers.o 11205 11509 +2.7% 0.97x
RGBHistogram.o 24773 25349 +2.3% 0.98x
SevenBoom.o 1950 1992 +2.2% 0.98x
DictionaryGroup.o 17295 17647 +2.0% 0.98x
XorLoop.o 2280 2322 +1.8% 0.98x
Memset.o 2392 2434 +1.8% 0.98x
StringEdits.o 14052 14292 +1.7% 0.98x
Suffix.o 26169 26601 +1.7% 0.98x
WordCount.o 65304 66296 +1.5% 0.99x
MonteCarloPi.o 1920 1946 +1.4% 0.99x
NSDictionaryCastToSwift.o 1984 2010 +1.3% 0.99x
PointerArithmetics.o 2086 2112 +1.2% 0.99x
Ackermann.o 2160 2186 +1.2% 0.99x
BitCount.o 2200 2226 +1.2% 0.99x
RandomValues.o 4135 4177 +1.0% 0.99x
Improvement
MapReduce.o 24653 21517 -12.7% 1.15x
StringTests.o 10695 9607 -10.2% 1.11x
ObjectAllocation.o 4635 4171 -10.0% 1.11x
StackPromo.o 2583 2343 -9.3% 1.10x
DropWhile.o 23724 21788 -8.2% 1.09x
ChainedFilterMap.o 3492 3230 -7.5% 1.08x
BinaryFloatingPointProperties.o 8039 7489 -6.8% 1.07x
DictionaryCopy.o 8929 8337 -6.6% 1.07x
ReduceInto.o 24735 23183 -6.3% 1.07x
DropFirst.o 25212 23708 -6.0% 1.06x
LazyFilter.o 9794 9234 -5.7% 1.06x
Hash.o 29075 27683 -4.8% 1.05x
SetTests.o 59689 57113 -4.3% 1.05x
MonteCarloE.o 3624 3490 -3.7% 1.04x
DictionaryBridgeToObjC.o 6959 6703 -3.7% 1.04x
DictionaryCompactMapValues.o 22510 21694 -3.6% 1.04x
PrefixWhile.o 24046 23342 -2.9% 1.03x
SortIntPyramids.o 9701 9429 -2.8% 1.03x
ObjectiveCBridgingStubs.o 9915 9637 -2.8% 1.03x
ObjectiveCBridging.o 45589 44325 -2.8% 1.03x
Prefix.o 24673 24161 -2.1% 1.02x
ArrayLiteral.o 3531 3461 -2.0% 1.02x
ArrayAppend.o 38982 38310 -1.7% 1.02x
Walsh.o 9432 9282 -1.6% 1.02x
Queue.o 14603 14411 -1.3% 1.01x
StrToInt.o 6659 6579 -1.2% 1.01x
StrComplexWalk.o 3456 3418 -1.1% 1.01x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
MapReduceLazyCollectionShort 41 85 +107.3% 0.48x
StringInterpolation 8700 11824 +35.9% 0.74x
FloatingPointPrinting_Float_interpolated 37257 46839 +25.7% 0.80x
RangeIterationSigned 171 200 +17.0% 0.86x
StringWordBuilderReservingCapacity 1146 1246 +8.7% 0.92x (?)
Improvement
StringInterpolationManySmallSegments 16037 7920 -50.6% 2.02x
StringInterpolationSmall 3962 2112 -46.7% 1.88x
FloatingPointPrinting_Double_interpolated 60113 49863 -17.1% 1.21x
FloatingPointPrinting_Float80_interpolated 64788 56270 -13.1% 1.15x
PrefixWhileAnyCollectionLazy 176 159 -9.7% 1.11x
DropLastAnyCollection 65 59 -9.2% 1.10x
IterateData 1768 1609 -9.0% 1.10x (?)
RandomDoubleLCG 981 893 -9.0% 1.10x (?)
ArrayAppendStrings 7797 7113 -8.8% 1.10x

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
DeadArray.o 1698 2494 +46.9% 0.68x
Fibonacci.o 1845 2218 +20.2% 0.83x
ByteSwap.o 1869 2242 +20.0% 0.83x
LinkedList.o 2124 2497 +17.6% 0.85x
Exclusivity.o 3815 4295 +12.6% 0.89x
StringBuilder.o 11210 12599 +12.4% 0.89x
TwoSum.o 5645 6302 +11.6% 0.90x
RomanNumbers.o 5806 6382 +9.9% 0.91x
StringInterpolation.o 9163 10039 +9.6% 0.91x
PopFront.o 5214 5694 +9.2% 0.92x
DictionaryKeysContains.o 15035 16374 +8.9% 0.92x
PopFrontGeneric.o 4937 5369 +8.8% 0.92x
SequenceAlgos.o 26092 28195 +8.1% 0.93x
DictTest4Legacy.o 22649 24073 +6.3% 0.94x
DictTest4.o 21259 22571 +6.2% 0.94x
DictTest2.o 15986 16882 +5.6% 0.95x
DictTest3.o 22514 23586 +4.8% 0.95x
ErrorHandling.o 3062 3206 +4.7% 0.96x
DriverUtils.o 144185 149977 +4.0% 0.96x
CountAlgo.o 14368 14848 +3.3% 0.97x
PointerArithmetics.o 1987 2019 +1.6% 0.98x
DictOfArraysToArrayOfDicts.o 32500 33017 +1.6% 0.98x
XorLoop.o 2074 2106 +1.5% 0.98x
Ackermann.o 2085 2117 +1.5% 0.98x
Memset.o 2141 2173 +1.5% 0.99x
RemoveWhere.o 24302 24606 +1.3% 0.99x
CSVParsing.o 37068 37516 +1.2% 0.99x
CString.o 6322 6386 +1.0% 0.99x
Improvement
MapReduce.o 22365 18605 -16.8% 1.20x
StringTests.o 8497 7185 -15.4% 1.18x
ObjectAllocation.o 4505 3850 -14.5% 1.17x
DropWhile.o 23420 20668 -11.8% 1.13x
ReduceInto.o 17179 15531 -9.6% 1.11x
DropFirst.o 23796 21556 -9.4% 1.10x
ChainedFilterMap.o 3492 3188 -8.7% 1.10x
BinaryFloatingPointProperties.o 7737 7097 -8.3% 1.09x
DictionaryCopy.o 8137 7465 -8.3% 1.09x
DictionarySwap.o 25339 23371 -7.8% 1.08x
DictionarySubscriptDefault.o 27147 25163 -7.3% 1.08x
Suffix.o 25953 24081 -7.2% 1.08x
LazyFilter.o 9137 8481 -7.2% 1.08x
Hash.o 22303 20783 -6.8% 1.07x
SetTests.o 52561 49601 -5.6% 1.06x
MonteCarloE.o 3858 3650 -5.4% 1.06x
DictionaryRemove.o 14027 13291 -5.2% 1.06x
ObjectiveCBridging.o 44135 41895 -5.1% 1.05x
PrefixWhile.o 24446 23246 -4.9% 1.05x
DictionaryBridgeToObjC.o 6605 6285 -4.8% 1.05x
DictionaryCompactMapValues.o 21054 20190 -4.1% 1.04x
Walsh.o 6322 6122 -3.2% 1.03x
SortIntPyramids.o 10010 9706 -3.0% 1.03x
ObjectiveCBridgingStubs.o 9101 8831 -3.0% 1.03x
Prefix.o 23777 23169 -2.6% 1.03x
ArrayLiteral.o 3160 3096 -2.0% 1.02x
StackPromo.o 2446 2398 -2.0% 1.02x
ArrayAppend.o 37886 37278 -1.6% 1.02x
DictTest.o 48465 47777 -1.4% 1.01x
Substring.o 20129 19857 -1.4% 1.01x
RGBHistogram.o 22685 22381 -1.3% 1.01x
StrComplexWalk.o 3573 3526 -1.3% 1.01x
ReversedCollections.o 11365 11245 -1.1% 1.01x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
StringEqualPointerComparison 3600 4057 +12.7% 0.89x
ArrayOfPOD 755 842 +11.5% 0.90x
StringHasPrefixAscii 5032 5600 +11.3% 0.90x
ArrayOfGenericPOD2 1066 1180 +10.7% 0.90x (?)
StringHasSuffixAscii 5171 5714 +10.5% 0.90x
Improvement
StringInterpolationSmall 6349 3636 -42.7% 1.75x
StringInterpolationManySmallSegments 18220 10691 -41.3% 1.70x
ArrayAppendStrings 10400 8645 -16.9% 1.20x
FloatingPointPrinting_Double_interpolated 95618 80134 -16.2% 1.19x
Combos 2494 2219 -11.0% 1.12x (?)

Code size: Swift libraries

TEST OLD NEW DELTA RATIO
Regression
libswiftSwiftReflectionTest.dylib 49152 57344 +16.7% 0.86x
libswiftSwiftPrivate.dylib 40960 45056 +10.0% 0.91x
libswiftStdlibUnittest.dylib 409600 442368 +8.0% 0.93x
libswiftsimd.dylib 286720 303104 +5.7% 0.95x
libswiftXCTest.dylib 81920 86016 +5.0% 0.95x
libswiftNetwork.dylib 163840 167936 +2.5% 0.98x
libswiftSwiftOnoneSupport.dylib 217088 221184 +1.9% 0.98x
Improvement
libswiftSwiftPrivateLibcExtras.dylib 24576 20480 -16.7% 1.20x
libswiftFoundation.dylib 1830912 1609728 -12.1% 1.14x
libswiftCore.dylib 3964928 3829760 -3.4% 1.04x

@beccadax
Contributor Author

So the existing benchmarks in StringInterpolation.o grew only 5% in -O and 9.6% in -Osize; it was the newly added benchmarks that made the file look like it had grown 48-60% in the earlier run.
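To make the arithmetic behind that comparison explicit, here is a small sketch using the StringInterpolation.o sizes from the two smoke benchmark runs above. It assumes the second run differs from the first only in leaving the new benchmarks out of that file; `SizeSample` is a hypothetical name.

```swift
// Hypothetical container for the three sizes reported for one object file.
struct SizeSample {
    let old: Int                // size before this PR
    let withNewBenchmarks: Int  // first run, new benchmarks included
    let existingOnly: Int       // second run, existing benchmarks only

    // Percentage growth of `new` relative to the pre-PR size.
    func growthPercent(_ new: Int) -> Double {
        Double(new - old) / Double(old) * 100
    }

    // Size attributable to the new benchmarks, under the stated assumption.
    var bytesFromNewBenchmarks: Int { withNewBenchmarks - existingOnly }
}

let optO = SizeSample(old: 11386, withNewBenchmarks: 16803, existingOnly: 11957)
let optSize = SizeSample(old: 9163, withNewBenchmarks: 14627, existingOnly: 10039)

print(optO.growthPercent(optO.existingOnly))        // about 5.0
print(optSize.growthPercent(optSize.existingOnly))  // about 9.6
print(optO.bytesFromNewBenchmarks)                  // 4846
print(optSize.bytesFromNewBenchmarks)               // 4588
```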

Still need to get a handle on the rest of the code size increase, of course. I'm guessing it's because we're inlining more—we just need to find the right balance there.

Before that, though, let's try a potential cheap fix for the FloatingPointPrinting_Float_interpolated performance regression.

@beccadax
Contributor Author

@swift-ci please smoke benchmark

@swift-ci
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
FloatingPointPrinting_Float_interpolated 37033 50196 +35.5% 0.74x
StringInterpolation 8676 10734 +23.7% 0.81x
ChainedFilterMap 1219 1405 +15.3% 0.87x (?)
RandomDoubleLCG 933 1057 +13.3% 0.88x (?)
RangeIterationSigned 181 200 +10.5% 0.91x
Improvement
StringInterpolationManySmallSegments 18000 8131 -54.8% 2.21x
StringInterpolationSmall 3970 2111 -46.8% 1.88x
ArrayAppendAsciiSubstring 29655 25414 -14.3% 1.17x
ArrayAppendStrings 8660 8029 -7.3% 1.08x

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
DeadArray.o 1872 2706 +44.6% 0.69x
SequenceAlgos.o 23331 27827 +19.3% 0.84x
Exclusivity.o 4083 4659 +14.1% 0.88x
TwoSum.o 5960 6739 +13.1% 0.88x
DictionarySwap.o 27574 30790 +11.7% 0.90x
ByteSwap.o 1960 2179 +11.2% 0.90x
Fibonacci.o 1936 2150 +11.1% 0.90x
StringBuilder.o 11679 12951 +10.9% 0.90x
PopFront.o 5577 6179 +10.8% 0.90x
LinkedList.o 2263 2486 +9.9% 0.91x
DictionaryKeysContains.o 16211 17806 +9.8% 0.91x
DictionaryRemove.o 15166 16110 +6.2% 0.94x
DictTest4Legacy.o 26994 28370 +5.1% 0.95x
DropLast.o 25451 26747 +5.1% 0.95x
StringInterpolation.o 11386 11957 +5.0% 0.95x
ErrorHandling.o 2758 2886 +4.6% 0.96x
PopFrontGeneric.o 5061 5285 +4.4% 0.96x
DictTest3.o 28392 29608 +4.3% 0.96x
DictTest4.o 26116 27220 +4.2% 0.96x
FloatingPointPrinting.o 7223 7511 +4.0% 0.96x
NopDeinit.o 5907 6140 +3.9% 0.96x
DictTest2.o 19736 20504 +3.9% 0.96x
HashQuadratic.o 5800 6019 +3.8% 0.96x
DriverUtils.o 168161 172849 +2.8% 0.97x
CountAlgo.o 20893 21468 +2.8% 0.97x
RomanNumbers.o 11205 11509 +2.7% 0.97x
RGBHistogram.o 24773 25349 +2.3% 0.98x
SevenBoom.o 1950 1992 +2.2% 0.98x
DictionaryGroup.o 17295 17647 +2.0% 0.98x
XorLoop.o 2280 2322 +1.8% 0.98x
Memset.o 2392 2434 +1.8% 0.98x
StringEdits.o 14052 14292 +1.7% 0.98x
Suffix.o 26169 26601 +1.7% 0.98x
WordCount.o 65304 66296 +1.5% 0.99x
MonteCarloPi.o 1920 1946 +1.4% 0.99x
NSDictionaryCastToSwift.o 1984 2010 +1.3% 0.99x
PointerArithmetics.o 2086 2112 +1.2% 0.99x
Ackermann.o 2160 2186 +1.2% 0.99x
BitCount.o 2200 2226 +1.2% 0.99x
RandomValues.o 4135 4177 +1.0% 0.99x
Improvement
MapReduce.o 24653 21517 -12.7% 1.15x
StringTests.o 10695 9607 -10.2% 1.11x
ObjectAllocation.o 4635 4171 -10.0% 1.11x
StackPromo.o 2583 2343 -9.3% 1.10x
DropWhile.o 23724 21788 -8.2% 1.09x
ChainedFilterMap.o 3492 3230 -7.5% 1.08x
BinaryFloatingPointProperties.o 8039 7489 -6.8% 1.07x
DictionaryCopy.o 8929 8337 -6.6% 1.07x
ReduceInto.o 24735 23183 -6.3% 1.07x
DropFirst.o 25212 23708 -6.0% 1.06x
LazyFilter.o 9794 9234 -5.7% 1.06x
Hash.o 29075 27683 -4.8% 1.05x
SetTests.o 59689 57113 -4.3% 1.05x
MonteCarloE.o 3624 3490 -3.7% 1.04x
DictionaryBridgeToObjC.o 6959 6703 -3.7% 1.04x
DictionaryCompactMapValues.o 22510 21694 -3.6% 1.04x
PrefixWhile.o 24046 23342 -2.9% 1.03x
SortIntPyramids.o 9701 9429 -2.8% 1.03x
ObjectiveCBridgingStubs.o 9915 9637 -2.8% 1.03x
ObjectiveCBridging.o 45589 44325 -2.8% 1.03x
Prefix.o 24673 24161 -2.1% 1.02x
ArrayLiteral.o 3531 3461 -2.0% 1.02x
ArrayAppend.o 38982 38310 -1.7% 1.02x
Walsh.o 9432 9282 -1.6% 1.02x
Queue.o 14603 14411 -1.3% 1.01x
StrToInt.o 6659 6579 -1.2% 1.01x
StrComplexWalk.o 3456 3418 -1.1% 1.01x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
MapReduceLazyCollectionShort 41 85 +107.3% 0.48x
FloatingPointPrinting_Float_interpolated 37529 51607 +37.5% 0.73x
StringInterpolation 8642 10812 +25.1% 0.80x
RangeIterationSigned 171 200 +17.0% 0.86x
StringWordBuilderReservingCapacity 1146 1246 +8.7% 0.92x (?)
Improvement
StringInterpolationManySmallSegments 16367 7806 -52.3% 2.10x
StringInterpolationSmall 3959 2107 -46.8% 1.88x
FloatingPointPrinting_Double_interpolated 60776 48993 -19.4% 1.24x
IterateData 1774 1591 -10.3% 1.12x
PrefixWhileAnyCollectionLazy 176 159 -9.7% 1.11x
ArrayAppendStrings 7825 7081 -9.5% 1.11x (?)
DropLastAnyCollection 65 59 -9.2% 1.10x
RandomDoubleLCG 971 891 -8.2% 1.09x

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
DeadArray.o 1698 2494 +46.9% 0.68x
Fibonacci.o 1845 2218 +20.2% 0.83x
ByteSwap.o 1869 2242 +20.0% 0.83x
LinkedList.o 2124 2497 +17.6% 0.85x
Exclusivity.o 3815 4295 +12.6% 0.89x
StringBuilder.o 11210 12599 +12.4% 0.89x
TwoSum.o 5645 6302 +11.6% 0.90x
RomanNumbers.o 5806 6382 +9.9% 0.91x
StringInterpolation.o 9163 10039 +9.6% 0.91x
PopFront.o 5214 5694 +9.2% 0.92x
DictionaryKeysContains.o 15035 16374 +8.9% 0.92x
PopFrontGeneric.o 4937 5369 +8.8% 0.92x
SequenceAlgos.o 26092 28195 +8.1% 0.93x
DictTest4Legacy.o 22649 24073 +6.3% 0.94x
DictTest4.o 21259 22571 +6.2% 0.94x
DictTest2.o 15986 16882 +5.6% 0.95x
DictTest3.o 22514 23586 +4.8% 0.95x
ErrorHandling.o 3062 3206 +4.7% 0.96x
DriverUtils.o 144185 149977 +4.0% 0.96x
CountAlgo.o 14368 14848 +3.3% 0.97x
FloatingPointPrinting.o 6576 6768 +2.9% 0.97x
PointerArithmetics.o 1987 2019 +1.6% 0.98x
DictOfArraysToArrayOfDicts.o 32500 33017 +1.6% 0.98x
XorLoop.o 2074 2106 +1.5% 0.98x
Ackermann.o 2085 2117 +1.5% 0.98x
Memset.o 2141 2173 +1.5% 0.99x
RemoveWhere.o 24302 24606 +1.3% 0.99x
CSVParsing.o 37068 37516 +1.2% 0.99x
CString.o 6322 6386 +1.0% 0.99x
Improvement
MapReduce.o 22365 18605 -16.8% 1.20x
StringTests.o 8497 7185 -15.4% 1.18x
ObjectAllocation.o 4505 3850 -14.5% 1.17x
DropWhile.o 23420 20668 -11.8% 1.13x
ReduceInto.o 17179 15531 -9.6% 1.11x
DropFirst.o 23796 21556 -9.4% 1.10x
ChainedFilterMap.o 3492 3188 -8.7% 1.10x
BinaryFloatingPointProperties.o 7737 7097 -8.3% 1.09x
DictionaryCopy.o 8137 7465 -8.3% 1.09x
DictionarySwap.o 25339 23371 -7.8% 1.08x
DictionarySubscriptDefault.o 27147 25163 -7.3% 1.08x
Suffix.o 25953 24081 -7.2% 1.08x
LazyFilter.o 9137 8481 -7.2% 1.08x
Hash.o 22303 20783 -6.8% 1.07x
SetTests.o 52561 49601 -5.6% 1.06x
MonteCarloE.o 3858 3650 -5.4% 1.06x
DictionaryRemove.o 14027 13291 -5.2% 1.06x
ObjectiveCBridging.o 44135 41895 -5.1% 1.05x
PrefixWhile.o 24446 23246 -4.9% 1.05x
DictionaryBridgeToObjC.o 6605 6285 -4.8% 1.05x
DictionaryCompactMapValues.o 21054 20190 -4.1% 1.04x
Walsh.o 6322 6122 -3.2% 1.03x
SortIntPyramids.o 10010 9706 -3.0% 1.03x
ObjectiveCBridgingStubs.o 9101 8831 -3.0% 1.03x
Prefix.o 23777 23169 -2.6% 1.03x
ArrayLiteral.o 3160 3096 -2.0% 1.02x
StackPromo.o 2446 2398 -2.0% 1.02x
ArrayAppend.o 37886 37278 -1.6% 1.02x
DictTest.o 48465 47777 -1.4% 1.01x
Substring.o 20129 19857 -1.4% 1.01x
RGBHistogram.o 22685 22381 -1.3% 1.01x
StrComplexWalk.o 3573 3526 -1.3% 1.01x
ReversedCollections.o 11365 11245 -1.1% 1.01x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
StringEqualPointerComparison 3600 4057 +12.7% 0.89x
ArrayOfPOD 756 842 +11.4% 0.90x
StringHasPrefixAscii 5032 5600 +11.3% 0.90x
ArrayOfGenericPOD2 1066 1180 +10.7% 0.90x (?)
StringHasSuffixAscii 5171 5714 +10.5% 0.90x
Improvement
StringInterpolationSmall 6342 3389 -46.6% 1.87x
StringInterpolationManySmallSegments 18108 10585 -41.5% 1.71x
FloatingPointPrinting_Double_interpolated 102227 69273 -32.2% 1.48x
FloatingPointPrinting_Float80_interpolated 135522 100147 -26.1% 1.35x
ArrayAppendStrings 10767 8671 -19.5% 1.24x

Code size: Swift libraries

TEST OLD NEW DELTA RATIO
Regression
libswiftSwiftReflectionTest.dylib 49152 57344 +16.7% 0.86x
libswiftSwiftPrivate.dylib 40960 45056 +10.0% 0.91x
libswiftStdlibUnittest.dylib 409600 442368 +8.0% 0.93x
libswiftsimd.dylib 286720 303104 +5.7% 0.95x
libswiftXCTest.dylib 81920 86016 +5.0% 0.95x
libswiftNetwork.dylib 163840 167936 +2.5% 0.98x
libswiftSwiftOnoneSupport.dylib 217088 221184 +1.9% 0.98x
Improvement
libswiftSwiftPrivateLibcExtras.dylib 24576 20480 -16.7% 1.20x
libswiftFoundation.dylib 1830912 1609728 -12.1% 1.14x
libswiftCore.dylib 3964928 3829760 -3.4% 1.04x
How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).
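
For readers checking the tables by hand, the DELTA and RATIO columns appear to follow from OLD and NEW as sketched below. The formulas are inferred from the table values, not stated by the CI bot, and `BenchmarkRow` is a hypothetical type used only for illustration.

```swift
import Foundation

/// One row of a comparison table (hypothetical helper, not part of the benchmark suite).
struct BenchmarkRow {
    let name: String
    let old: Double
    let new: Double

    /// Percentage change from OLD to NEW; positive means NEW is larger (slower or bigger).
    var deltaPercent: Double { return (new - old) / old * 100 }

    /// OLD divided by NEW; values below 1.0 correspond to regressions.
    var ratio: Double { return old / new }
}

// Values taken from the "Performance: -Osize" table above.
let row = BenchmarkRow(name: "MapReduceLazyCollectionShort", old: 41, new: 85)
print(String(format: "%+.1f%% %.2fx", row.deltaPercent, row.ratio))
// prints "+107.3% 0.48x"
```

The same arithmetic reproduces other rows, for example StringInterpolationManySmallSegments (16367 to 7806) gives -52.3% and 2.10x.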

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@beccadax force-pushed the interpolation-rework branch from 4033d0d to cbb1678 (October 23, 2018 01:24)
@beccadax
Contributor Author

beccadax commented Oct 23, 2018

Did some offline benchmarking since CI was looking a little noisy. The TextOutputStreamable conformance does actually improve the performance of FloatingPointPrinting_Float_interpolated:

IMPLEMENTATION | MIN(μs) | MAX(μs) | MEAN(μs) | SD(μs) | MEDIAN(μs)
No TextOutputStreamable | 108,746 | 114,023 | 109,509 | 571 | 109,359
description.write(to:) | 108,830 | 111,132 | 109,499 | 524 | 109,311
_writeASCII(_:) (proposed) | 106,688 | 111,552 | 107,625 | 725 | 107,491

It's just that something else about new string interpolation is slower for all float types than the old version, and TextOutputStreamable helps Double and Float80 enough to make up for it, but not Float.

Comparing this branch's -emit-sil dump of run_FloatingPointPrinting_Float_interpolated(_:) to master's, this branch generates 551 lines of SIL in 70 basic blocks, while master generates only 390 lines in 47 basic blocks. The broad sketches of the two functions are similar, but this branch generates a number of basic blocks that either are never generated in master or are optimized away. I'll try to clean things up tomorrow so I can get a reasonable diff and figure out what the changes represent.

@beccadax
Contributor Author

beccadax commented Oct 25, 2018

A large part of the problem is the inlining of an early exit from append(_:). This is intended to improve performance, and on the whole it does, but it also tends to hugely convolute control flow. For example, run_Fibonacci(_:) (which uses string interpolation only on a failure path inlined from CheckResults(_:)) has 13 basic blocks without this inlining, but 69(!) with it. It also inflates code size (changes < 1% omitted):

Benchmark Size with inlining Size without inlining Percentage Change
Regression (2)
MapReduce.o 34,875 b 35,627 b 2.2%
DropWhile.o 23,835 b 24,251 b 1.7%
Improvement (74)
DeadArray.o 3,472 b 2,437 b -29.8%
ByteSwap.o 3,029 b 2,245 b -25.9%
Fibonacci.o 2,869 b 2,165 b -24.5%
LinkedList.o 3,013 b 2,309 b -23.4%
StringInterpolation.o 16,241 b 12,529 b -22.9%
Exclusivity.o 4,858 b 3,861 b -20.5%
SequenceAlgos.o 31,691 b 25,531 b -19.4%
MonteCarloPi.o 2,528 b 2,149 b -15.0%
SevenBoom.o 2,560 b 2,181 b -14.8%
RangeIteration.o 2,592 b 2,213 b -14.6%
PointerArithmetics.o 2,640 b 2,261 b -14.4%
ProtocolDispatch2.o 2,653 b 2,277 b -14.2%
BitCount.o 2,848 b 2,469 b -13.3%
NSDictionaryCastToSwift.o 3,072 b 2,693 b -12.3%
Integrate.o 3,104 b 2,725 b -12.2%
StackPromo.o 3,435 b 3,035 b -11.6%
Ackermann.o 3,520 b 3,141 b -10.8%
FloatingPointPrinting.o 7,221 b 6,453 b -10.6%
ArrayLiteral.o 3,888 b 3,509 b -9.7%
DictionaryBridge.o 3,995 b 3,627 b -9.2%
StringBuilder.o 16,443 b 14,939 b -9.1%
TwoSum.o 13,163 b 11,963 b -9.1%
Memset.o 4,368 b 3,989 b -8.7%
RandomValues.o 4,336 b 3,957 b -8.7%
XorLoop.o 4,336 b 3,957 b -8.7%
OpenClose.o 5,237 b 4,871 b -7.0%
Calculator.o 5,648 b 5,257 b -6.9%
DriverUtils.o 222,971 b 208,251 b -6.6%
HashQuadratic.o 12,571 b 11,803 b -6.1%
PopFrontGeneric.o 12,203 b 11,499 b -5.8%
ChainedFilterMap.o 4,834 b 4,562 b -5.6%
ObjectAllocation.o 4,552 b 4,325 b -5.0%
ObjectiveCNoBridgingStubs.o 7,616 b 7,237 b -5.0%
DictionaryKeysContains.o 29,403 b 27,989 b -4.8%
MonteCarloE.o 6,795 b 6,475 b -4.7%
RC4.o 8,139 b 7,771 b -4.5%
NopDeinit.o 8,241 b 7,888 b -4.3%
RomanNumbers.o 20,155 b 19,291 b -4.3%
DictTest2.o 26,987 b 25,867 b -4.2%
DropLast.o 33,051 b 31,675 b -4.2%
StrComplexWalk.o 8,944 b 8,568 b -4.2%
DictionarySwap.o 44,331 b 42,651 b -3.8%
Suffix.o 35,403 b 34,043 b -3.8%
StrToInt.o 10,237 b 9,869 b -3.6%
ArraySubscript.o 10,763 b 10,395 b -3.4%
Queue.o 21,339 b 20,603 b -3.4%
RangeAssignment.o 11,323 b 10,955 b -3.3%
ObjectiveCBridging.o 56,635 b 54,891 b -3.1%
DictionaryGroup.o 29,307 b 28,491 b -2.8%
ObjectiveCBridgingStubs.o 10,512 b 10,215 b -2.8%
SortLettersInPlace.o 13,632 b 13,258 b -2.7%
CountAlgo.o 26,651 b 25,963 b -2.6%
DictionaryBridgeToObjC.o 10,539 b 10,267 b -2.6%
RangeReplaceableCollectionPlusDefault.o 14,171 b 13,803 b -2.6%
Substring.o 45,563 b 44,379 b -2.6%
Combos.o 28,331 b 27,611 b -2.5%
PopFront.o 11,547 b 11,275 b -2.4%
DictTest4.o 36,459 b 35,611 b -2.3%
RGBHistogram.o 37,003 b 36,203 b -2.2%
Walsh.o 14,811 b 14,491 b -2.2%
Hash.o 37,819 b 37,019 b -2.1%
NibbleSort.o 17,243 b 16,875 b -2.1%
COWTree.o 18,539 b 18,171 b -2.0%
TestsUtils.o 21,067 b 20,699 b -1.7%
SortIntPyramids.o 17,051 b 16,779 b -1.6%
CString.o 24,455 b 24,087 b -1.5%
DictionaryCompactMapValues.o 37,115 b 36,555 b -1.5%
LuhnAlgoEager.o 24,459 b 24,091 b -1.5%
LuhnAlgoLazy.o 24,459 b 24,091 b -1.5%
SortLargeExistentials.o 23,787 b 23,419 b -1.5%
LazyFilter.o 13,947 b 13,755 b -1.4%
StringRemoveDupes.o 27,227 b 26,859 b -1.4%
WordCount.o 86,539 b 85,403 b -1.3%
BinaryFloatingPointProperties.o 15,755 b 15,579 b -1.1%
DictionarySubscriptDefault.o 47,915 b 47,387 b -1.1%

Even with the costs, though, inlining this check is often profitable, and removing it will cause performance regressions. On an iMac Pro:


With inlining vs. without — Regression (8)
TEST WITH INLINING WITHOUT INLINING DELTA RATIO
DataReplaceLarge 19054 23629 +24.0% 0.81x (?)
PointerArithmetics 22377 26316 +17.6% 0.85x
StringEqualPointerComparison 245 273 +11.4% 0.90x
DropWhileArrayLazy 88 96 +9.1% 0.92x
PrefixWhileAnyCollectionLazy 35 38 +8.6% 0.92x
DropWhileAnyCollectionLazy 65 70 +7.7% 0.93x
DataAppendSequence 12690 13485 +6.3% 0.94x (?)
ArrayAppendFromGeneric 335 355 +6.0% 0.94x (?)
With inlining vs. without — Improvement (14)
TEST WITH INLINING WITHOUT INLINING DELTA RATIO
Dictionary4 319 231 -27.6% 1.38x
SumUsingReduceInto 450 349 -22.4% 1.29x
SumUsingReduce 451 353 -21.7% 1.28x
DataAppendDataLargeToMedium 23393 18523 -20.8% 1.26x (?)
Dictionary4OfObjects 383 311 -18.8% 1.23x
StaticArray 6 5 -16.7% 1.20x (?)
DataCount 23 20 -13.0% 1.15x
DictionaryCompactMapValuesOfCastValue 10597 9329 -12.0% 1.14x
DataAppendBytes 3939 3564 -9.5% 1.11x (?)
StringAdder 863 791 -8.3% 1.09x
FatCompactMap 876 810 -7.5% 1.08x
DataAppendDataSmallToMedium 4933 4606 -6.6% 1.07x (?)
DataAppendDataMediumToMedium 5261 4917 -6.5% 1.07x
ObjectiveCBridgeStubNSDataAppend 1678 1595 -4.9% 1.05x (?)

The test being inlined is complicated—probably unnecessarily—and I'm hoping a simpler version will get us the best of both worlds. Barring that, perhaps the SIL optimization folks can make some suggestions. I mean, we can do better than this:

[Figure: incredibly convoluted control flow graph, with almost all of the blocks in the failure path.]

@milseman
Member

Just do the part that checks for the canonical empty string, that eliminates most of that CFG.

@beccadax
Contributor Author

@milseman That’s what I left my computer benchmarking: an _isEmptySingleton check in an inlinable method, and count == 0 && !_isNative in a non-inlinable one.
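
To illustrate the split described above, here is a minimal sketch of a cheap inlinable check in front of an out-of-line slow path. `TextBuffer`, its `Storage` enum, and `appendSlow(_:)` are hypothetical stand-ins. This is not the standard library's `_StringGuts` code, and the real checks (`_isEmptySingleton`, `count == 0 && !_isNative`) are only approximated.

```swift
// Hypothetical stand-in for a string buffer; not the standard library's _StringGuts.
public struct TextBuffer {
    @usableFromInline
    enum Storage {
        case emptySingleton      // the shared canonical empty value
        case native([UInt8])     // owned storage, which may still hold zero bytes
    }

    @usableFromInline
    var storage: Storage

    public init() { storage = .emptySingleton }
    public init(utf8 text: String) { storage = .native(Array(text.utf8)) }

    /// Fast path: one cheap test that is small enough to inline at every call site.
    @inlinable
    public mutating func append(_ other: TextBuffer) {
        if case .emptySingleton = other.storage { return }
        appendSlow(other)
    }

    /// Slow path: the fuller emptiness check and the real work stay out of line.
    @usableFromInline
    @inline(never)
    mutating func appendSlow(_ other: TextBuffer) {
        guard case .native(let incoming) = other.storage, !incoming.isEmpty else { return }
        switch storage {
        case .emptySingleton:
            storage = .native(incoming)
        case .native(var bytes):
            bytes.append(contentsOf: incoming)
            storage = .native(bytes)
        }
    }

    /// The accumulated bytes decoded as UTF-8.
    public var text: String {
        switch storage {
        case .emptySingleton: return ""
        case .native(let bytes): return String(decoding: bytes, as: UTF8.self)
        }
    }
}

var buffer = TextBuffer()
buffer.append(TextBuffer())             // returns from the inlined check
buffer.append(TextBuffer(utf8: ""))     // rejected by the out-of-line check
buffer.append(TextBuffer(utf8: "ab"))
buffer.append(TextBuffer(utf8: "c"))
print(buffer.text)                      // prints "abc"
```

The idea is that each call site only grows by a single comparison and a call, while the branching that inflated the control flow graph stays in one shared function.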

@beccadax
Contributor Author

On my own machine, with everything closed down, literally unplugged from the network, I got wildly different benchmark results for this last night vs. this morning. But I think this is at least an improvement. We'll see what CI says.

@beccadax force-pushed the interpolation-rework branch 2 times, most recently from 30b4055 to 878057b (October 25, 2018 18:38)
@beccadax
Contributor Author

@swift-ci please smoke benchmark


if (auto subExpr = expr->getSubExpr()) {
auto subExprType = CS.getType(subExpr);
CS.addConstraint(ConstraintKind::Bind, subExprType, tv, locator);
Contributor

One small change that I think might work here: instead of binding the sub-expression type to a type variable, you can return subExprType directly and allocate a type variable only if there is no sub-expression.

Contributor Author

Lower priority than the runtime performance stuff, but I'll look at it. Thanks!

interpolationProto->lookupDirect(tc.Context.Id_StringInterpolation);
if (associatedTypeArray.empty()) {
tc.diagnose(expr->getStartLoc(), diag::interpolation_broken_proto);
return nullptr;
Contributor

I think the interpolation protocol lookup and field lookup could be moved to TypeChecker, just like we do for integers (e.g. TypeChecker::getMaxIntegerType).

Contributor Author

Other parts of CSGen look up associated types in various ad-hoc ways like "find the one associated type in this protocol (let's hope there's just one)" or "directly call getIdentifier() with a string literal containing the associated type's name, then look it up". That's not necessarily a good thing either, though, so maybe I should just clean all of those up.

// Must be Conversion; if it's Equal, then in semi-rare cases, the
// interpolation temporary variable cannot be @lvalue.
CS.addConstraint(ConstraintKind::Conversion, appendingExprType,
interpolationTV, appendingLocator);
Contributor

I'm really curious what an example of the behavior described in the comment might look like.

Contributor Author

There's a minor, non-ABI improvement I want to make to this code that I can't do because it triggers this behavior. Once the branch is merged, I'll redo that work and show you the problem.

@swift-ci
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
StringInterpolation 8810 10452 +18.6% 0.84x
RandomDoubleLCG 910 1057 +16.2% 0.86x
FloatingPointPrinting_Float_interpolated 43105 49647 +15.2% 0.87x
MapReduceLazyCollectionShort 31 34 +9.7% 0.91x
StringWordBuilderReservingCapacity 1160 1260 +8.6% 0.92x (?)
Improvement
StringInterpolationManySmallSegments 18122 7862 -56.6% 2.31x
StringInterpolationSmall 3995 1998 -50.0% 2.00x
FloatingPointPrinting_Double_interpolated 69313 52757 -23.9% 1.31x
FloatingPointPrinting_Float80_interpolated 70158 56722 -19.2% 1.24x
ArrayAppendAsciiSubstring 29653 25276 -14.8% 1.17x
StringBuilderSmallReservingCapacity 500 452 -9.6% 1.11x
ArrayAppendStrings 8684 7938 -8.6% 1.09x
StringBuilder 490 452 -7.8% 1.08x
StringAdder 552 510 -7.6% 1.08x

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
DeadArray.o 1872 2141 +14.4% 0.87x
SequenceAlgos.o 23375 25432 +8.8% 0.92x
DictionaryKeysContains.o 16211 17566 +8.4% 0.92x
DropWhile.o 23724 25388 +7.0% 0.93x
PrefixWhile.o 24046 25678 +6.8% 0.94x
Prefix.o 24673 26161 +6.0% 0.94x
MapReduce.o 24653 25933 +5.2% 0.95x
DropFirst.o 25212 26460 +5.0% 0.95x
DropLast.o 25451 26683 +4.8% 0.95x
ObjectAllocation.o 4635 4859 +4.8% 0.95x
Suffix.o 26169 27401 +4.7% 0.96x
ObjectiveCBridging.o 45589 47381 +3.9% 0.96x
StringBuilder.o 11679 12087 +3.5% 0.97x
StringTests.o 10695 11063 +3.4% 0.97x
DictionarySubscriptDefault.o 30379 31387 +3.3% 0.97x
Hash.o 29075 29971 +3.1% 0.97x
DictionarySwap.o 27598 28302 +2.6% 0.98x
DictTest3.o 28420 29108 +2.4% 0.98x
ReduceInto.o 24735 25327 +2.4% 0.98x
Substring.o 27833 28457 +2.2% 0.98x
DictionaryCopy.o 8929 9121 +2.2% 0.98x
WordCount.o 65304 66568 +1.9% 0.98x
SetTests.o 59717 60741 +1.7% 0.98x
DictionaryGroup.o 17319 17575 +1.5% 0.99x
DictionaryRemove.o 15194 15386 +1.3% 0.99x
DictTest4Legacy.o 27042 27378 +1.2% 0.99x
DictTest4.o 26168 26488 +1.2% 0.99x
ErrorHandling.o 2794 2826 +1.1% 0.99x
ObjectiveCBridgingStubs.o 9915 10021 +1.1% 0.99x
BinaryFloatingPointProperties.o 8039 8124 +1.1% 0.99x
Improvement
StackPromo.o 2583 2247 -13.0% 1.15x
Fibonacci.o 1936 1734 -10.4% 1.12x
SevenBoom.o 1950 1751 -10.2% 1.11x
RangeIteration.o 1976 1777 -10.1% 1.11x
ByteSwap.o 1960 1763 -10.1% 1.11x
NSDictionaryCastToSwift.o 1984 1785 -10.0% 1.11x
MonteCarloPi.o 1920 1728 -10.0% 1.11x
PointerArithmetics.o 2086 1887 -9.5% 1.11x
ProtocolDispatch2.o 2254 2042 -9.4% 1.10x
Ackermann.o 2160 1964 -9.1% 1.10x
BitCount.o 2200 2003 -9.0% 1.10x
ArrayLiteral.o 3531 3216 -8.9% 1.10x
XorLoop.o 2280 2083 -8.6% 1.09x
LinkedList.o 2263 2070 -8.5% 1.09x
Memset.o 2392 2199 -8.1% 1.09x
Integrate.o 2828 2629 -7.0% 1.08x
Walsh.o 9432 8837 -6.3% 1.07x
OpenClose.o 3926 3728 -5.0% 1.05x
ArraySubscript.o 4248 4036 -5.0% 1.05x
RandomValues.o 4135 3936 -4.8% 1.05x
FloatingPointPrinting.o 7223 6887 -4.7% 1.05x
StrToInt.o 6659 6358 -4.5% 1.05x
StrComplexWalk.o 3456 3305 -4.4% 1.05x
PopFrontGeneric.o 5080 4872 -4.1% 1.04x
RC4.o 4959 4771 -3.8% 1.04x
RangeAssignment.o 5264 5069 -3.7% 1.04x
StringInterpolation.o 11402 10993 -3.6% 1.04x
Calculator.o 3336 3218 -3.5% 1.04x
DictionaryBridge.o 3634 3508 -3.5% 1.04x
HashQuadratic.o 5800 5603 -3.4% 1.04x
NopDeinit.o 5907 5708 -3.4% 1.03x
ObjectiveCNoBridgingStubs.o 8323 8111 -2.5% 1.03x
RangeReplaceableCollectionPlusDefault.o 7620 7428 -2.5% 1.03x
CString.o 8795 8587 -2.4% 1.02x
ArrayAppend.o 38982 38182 -2.1% 1.02x
TwoSum.o 5960 5844 -1.9% 1.02x
SortLettersInPlace.o 12042 11811 -1.9% 1.02x
MonteCarloE.o 3624 3565 -1.6% 1.02x
ChainedFilterMap.o 3492 3437 -1.6% 1.02x
RomanNumbers.o 11205 11029 -1.6% 1.02x
Exclusivity.o 4083 4019 -1.6% 1.02x
NibbleSort.o 14520 14301 -1.5% 1.02x
COWTree.o 14600 14394 -1.4% 1.01x
Queue.o 14603 14411 -1.3% 1.01x
DriverUtils.o 166903 164807 -1.3% 1.01x
LuhnAlgoLazy.o 18256 18055 -1.1% 1.01x
LuhnAlgoEager.o 18258 18058 -1.1% 1.01x
Combos.o 16541 16366 -1.1% 1.01x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
FloatingPointPrinting_Float_interpolated 38481 49758 +29.3% 0.77x
StringInterpolation 8714 11099 +27.4% 0.79x
StringWordBuilderReservingCapacity 1146 1274 +11.2% 0.90x (?)
DropWhileAnyCollectionLazy 260 282 +8.5% 0.92x
FloatingPointPrinting_Float_description_uniform 5130 5543 +8.1% 0.93x
Improvement
StringInterpolationManySmallSegments 16684 8198 -50.9% 2.04x
StringInterpolationSmall 4003 2005 -49.9% 2.00x
FloatingPointPrinting_Double_interpolated 64524 52737 -18.3% 1.22x
FloatingPointPrinting_Float80_interpolated 67332 56759 -15.7% 1.19x
ArrayAppendStrings 8009 7030 -12.2% 1.14x
DropLastAnyCollection 67 59 -11.9% 1.14x
StringBuilderSmallReservingCapacity 499 453 -9.2% 1.10x
StringBuilder 489 453 -7.4% 1.08x

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
DeadArray.o 1698 2150 +26.6% 0.79x
DictionaryKeysContains.o 15035 16278 +8.3% 0.92x
StringBuilder.o 11210 11735 +4.7% 0.96x
DictTest3.o 22542 23438 +4.0% 0.96x
Substring.o 20129 20801 +3.3% 0.97x
ObjectiveCBridgingStubs.o 9101 9373 +3.0% 0.97x
Hash.o 22303 22847 +2.4% 0.98x
ObjectAllocation.o 4505 4606 +2.2% 0.98x
SequenceAlgos.o 26136 26552 +1.6% 0.98x
DictTest4Legacy.o 22697 23049 +1.6% 0.98x
ErrorHandling.o 3098 3146 +1.5% 0.98x
RomanNumbers.o 5806 5886 +1.4% 0.99x
StringRemoveDupes.o 8989 9101 +1.2% 0.99x
ReduceInto.o 17179 17387 +1.2% 0.99x
DictTest4.o 21311 21535 +1.1% 0.99x
WordCount.o 54952 55512 +1.0% 0.99x
Improvement
FloatingPointPrinting.o 6576 6160 -6.3% 1.07x
StackPromo.o 2446 2318 -5.2% 1.06x
Fibonacci.o 1845 1770 -4.1% 1.04x
ByteSwap.o 1869 1794 -4.0% 1.04x
NSDictionaryCastToSwift.o 1877 1802 -4.0% 1.04x
BitCount.o 1837 1765 -3.9% 1.04x
MonteCarloPi.o 1781 1713 -3.8% 1.04x
ProtocolDispatch2.o 2135 2056 -3.7% 1.04x
ArrayLiteral.o 3160 3048 -3.5% 1.04x
SevenBoom.o 2019 1948 -3.5% 1.04x
LinkedList.o 2124 2065 -2.8% 1.03x
Integrate.o 2705 2630 -2.8% 1.03x
DictionaryBridge.o 3671 3591 -2.2% 1.02x
ArrayAppend.o 37886 37134 -2.0% 1.02x
Queue.o 13307 13075 -1.7% 1.02x
StringInterpolation.o 9179 9027 -1.7% 1.02x
RangeIteration.o 1861 1834 -1.5% 1.01x
PointerArithmetics.o 1987 1960 -1.4% 1.01x
XorLoop.o 2074 2052 -1.1% 1.01x
ReversedCollections.o 11365 11245 -1.1% 1.01x
HashQuadratic.o 5352 5296 -1.0% 1.01x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
StringHasPrefixAscii 5030 6655 +32.3% 0.76x
StringEqualPointerComparison 3585 4086 +14.0% 0.88x
StringHasSuffixAscii 5114 5800 +13.4% 0.88x
Dictionary3 576 642 +11.5% 0.90x
ArrayOfPOD 759 841 +10.8% 0.90x
ArrayOfGenericPOD2 1066 1179 +10.6% 0.90x (?)
Improvement
StringInterpolationSmall 6373 3343 -47.5% 1.91x
StringInterpolationManySmallSegments 18109 10768 -40.5% 1.68x
FloatingPointPrinting_Double_interpolated 99132 76103 -23.2% 1.30x
ArrayAppendStrings 10357 8730 -15.7% 1.19x

Code size: Swift libraries

TEST OLD NEW DELTA RATIO
Regression
libswiftSwiftReflectionTest.dylib 49152 57344 +16.7% 0.86x
libswiftSwiftPrivate.dylib 40960 45056 +10.0% 0.91x
libswiftsimd.dylib 286720 290816 +1.4% 0.99x
Improvement
libswiftSwiftPrivateLibcExtras.dylib 24576 20480 -16.7% 1.20x
libswiftFoundation.dylib 1765376 1503232 -14.8% 1.17x
libswiftCore.dylib 3964928 3842048 -3.1% 1.03x
libswiftStdlibUnittest.dylib 409600 405504 -1.0% 1.01x

@beccadax
Contributor Author

beccadax commented Oct 25, 2018

Not there yet, but much, much better.

CFG for Fibonacci now looks like this:

[Figure: CFG for Fibonacci. The error path contains a long chain with three basic blocks per potential append.]

In principle, I think we should be able to resolve these at compile time in most or all cases: an instance which has passed through _StringGuts.reserveCapacitySlow(_:) is always native storage rather than the empty singleton; an instance which doesn't pass through reserveCapacitySlow(_:) should have a value known at compile time; and we can tell at compile time whether we'll call reserveCapacitySlow(_:) or not. But I don't know how to convince the optimizer of this, nor how to extend this past the first append, since _appendSlow(_:) can turn self into the empty singleton on occasion (although I think never in string interpolation).

@beccadax
Contributor Author

Gonna rebase to get floating-point string formatting improvements from @airspeedswift.

@milseman
Member

reserveCapacitySlow, IIRC, can be called on lazily bridged Cocoa strings, so the result is not necessarily native.

Otherwise, yeah, we should come up with some pattern or behavior to allow the optimizer to constant-fold all the subsequent branches.

@beccadax force-pushed the interpolation-rework branch from 878057b to 3e6ac59 (October 25, 2018 22:11)
@beccadax
Contributor Author

@milseman Right—what I meant is that it's never the empty singleton. (Although in string interpolation, I don't think we can ever call reserveCapacitySlow(_:) on a Cocoa string—we always create our own strings.)

beccadax and others added 27 commits October 31, 2018 20:58
Previously, the parser generated compound method names for appendInterpolation(…) methods because this helped when finding appendInterpolation methods declared in the same file. However, this turned out to prevent default arguments from working.

This commit returns it to adding base names only and instead explicitly calls loadAllMembers() on the StringInterpolation type. I’m not sure why types in other expressions don’t need this but types in these generated expressions do, but it’s much closer to the problem and doesn’t seem to have any ill effects.
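
As a hedged illustration of why default arguments matter here, the sketch below shows a user-defined appendInterpolation method whose second parameter has a default value. The `hex:` label and `uppercase:` parameter are hypothetical names chosen for this example and do not come from the PR. The sketch assumes a toolchain that ships the SE-0228 API.

```swift
extension String.StringInterpolation {
    /// Hypothetical interpolation: `uppercase` has a default, so call sites may omit it.
    mutating func appendInterpolation(hex value: Int, uppercase: Bool = false) {
        appendLiteral(String(value, radix: 16, uppercase: uppercase))
    }
}

let lower = "0x\(hex: 255)"                     // relies on the default argument
let upper = "0x\(hex: 255, uppercase: true)"    // passes every argument explicitly
print(lower, upper)                             // prints "0xff 0xFF"
```

Both call sites resolve to the same method. The first only type-checks if the generated call can omit the defaulted parameter, which is the behavior the commit message says compound method names prevented.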
This change:

1. Adds accessors for the bit attached to NominalTypeDecl::LookupTable. These let you more easily search for and break on changes; more importantly, they give this bit a name and some documentation explaining what (I think) it means.

2. Includes the “ignoreNewExtensions” parameter in the debug output from NominalTypeDecl::lookupDirect().

3. Adds a MemberLookupTable::dump() method.
These methods are never used and appear to be holdovers from an earlier implementation.
Previously, nothing would guarantee that, if a nominal type `T` with lazy members had extensions adding members with name `foo`, and these members were already in `T`’s LookupTable, and a new extension to `T` added another member named `foo`, the new extension’s member would be added to the LookupTable.

This change makes it so that adding an extension clears the isLookupTablePopulated() flag, and so that when the flag is cleared on a type with lazy members, it updates all existing entries instead of just the ones present in the type itself.

Finally, it removes a workaround in string interpolation which is no longer necessary because of this change.
Apparently, the macOS STL is more forgiving about tuples vs. pairs than the Linux one.
As currently written, the optimizer can completely remove the involvement of CustomString in some cases. We probably don’t want that, so let’s fix it.
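
One common way to keep an optimizer from discarding a value under test is to pass it to a function that is never inlined. The sketch below shows that pattern with a custom interpolated type. It is not the fix made in this commit, and this `CustomString` and `blackHole(_:)` are defined locally for illustration, on the assumption that the benchmark's type looks broadly similar.

```swift
/// Illustrative type that adopts the new interpolation protocols.
struct CustomString: ExpressibleByStringInterpolation {
    var value: String

    init(stringLiteral value: String) { self.value = value }
    init(stringInterpolation: StringInterpolation) { self.value = stringInterpolation.buffer }

    struct StringInterpolation: StringInterpolationProtocol {
        var buffer = ""

        init(literalCapacity: Int, interpolationCount: Int) {
            // Rough size estimate; the factor of 2 is an arbitrary guess.
            buffer.reserveCapacity(literalCapacity + interpolationCount * 2)
        }

        mutating func appendLiteral(_ literal: String) { buffer += literal }
        mutating func appendInterpolation<T>(_ value: T) { buffer += String(describing: value) }
    }
}

/// Opaque sink: because it is never inlined, the caller must materialize its argument.
@inline(never)
func blackHole<T>(_ value: T) {}

let n = 42
let s: CustomString = "n = \(n)!"
blackHole(s)          // keeps the interpolation observable to the optimizer
print(s.value)        // prints "n = 42!"
```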
Probably responsible for at least part of performance gap.
This change tests that, in string interpolation, code completion doesn’t suggest Void functions but suggests functions returning other types.

This is a change in behavior, but I’m not convinced the old version was correct in the first place—why would you want to interpolate a function that returns Void? It’s valid but not useful, and code completion seems to try to avoid suggesting Void functions in other similar circumstances.
This already didn't work on 32-bit platforms; now it's not working on 64-bit Linux. Filed as SR-9008.
They will return, probably in a separate pull request; I just want to see how much they're inflating StringInterpolation.o's code size.
Gets most of the performance win without most of the code size loss.
Should avoid relying so heavily on the optimizer realizing it can remove branches.

This had somewhat complicated performance impacts in local benchmarking; we’ll see what it does on CI.
This should allow us to separately control how appends are inlined for string interpolation.