Implementation for SE-0228: Fix ExpressibleByStringInterpolation #19963

beccadax · 2018-10-20T01:13:02Z

Just a rebase of the previous PR, #18590.

This PR implements a new string interpolation API with a different, finalized ABI from what we're currently shipping. We'd like to land it before the ABI freezes; otherwise we'll have to also support the existing initializers forever.
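For readers unfamiliar with SE-0228, here is a minimal sketch of what the new API looks like from the user's side. The `LogMessage` type and its `redacted:` interpolation are hypothetical names invented for illustration and are not part of this PR. Only the protocol names and requirements (`ExpressibleByStringInterpolation`, `StringInterpolationProtocol`, `appendLiteral`, `appendInterpolation`) come from the proposal.

```swift
// Hypothetical example type; not code from this PR.
struct LogMessage: ExpressibleByStringInterpolation {
    // The interpolation buffer that the compiler fills in segment by segment.
    struct StringInterpolation: StringInterpolationProtocol {
        var output = ""

        // Called once, with hints about the literal's size.
        init(literalCapacity: Int, interpolationCount: Int) {
            output.reserveCapacity(literalCapacity + interpolationCount * 2)
        }

        // Called for each literal segment.
        mutating func appendLiteral(_ literal: String) {
            output.append(literal)
        }

        // Called for a plain \(value) segment.
        mutating func appendInterpolation<T>(_ value: T) {
            output.append(String(describing: value))
        }

        // Called for a \(redacted: value) segment (hypothetical label).
        mutating func appendInterpolation(redacted value: String) {
            output.append(String(repeating: "*", count: value.count))
        }
    }

    let text: String

    init(stringLiteral value: String) {
        text = value
    }

    init(stringInterpolation: StringInterpolation) {
        text = stringInterpolation.output
    }
}

let user = "ada"
let message: LogMessage = "login by \(redacted: user) took \(42) ms"
print(message.text)  // prints "login by *** took 42 ms"
```

Each `\(...)` segment becomes a call to an `appendInterpolation` overload, so a conforming type can accept extra labels and parameters that the old initializer-based design could not express.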

Remaining issues to resolve:

- ~1% compiler performance regression (down from about 1.5% before). Still looking into that final percentage point.
- StringWordBuilderReservingCapacity showing ~10% slowdown. Need to investigate.
- A few other slowdowns in random benchmarks with no apparent connection to string interpolation. Need to see how much of that is noise and how much is real.
- I need to do one last sweep through it for little cleanups I might have missed.

Issues we'll wait to address:

- Double and Float80 are faster to interpolate than before, but Float is slower. It's hard to test this in isolation while the rest of the branch is in flux, so we plan to land the change and then look at Float interpolation.
- On Linux only, the optimizer can't completely optimize away a string interpolation whose result is not used. We were already skipping this test on 32-bit architectures because it was broken there. Filed as SR-9008 so we don't lose track of it.
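As context for the optimizer item above: under SE-0228 an interpolated literal is roughly equivalent to a sequence of calls on an interpolation buffer, so deleting an unused interpolation means the optimizer has to see through every one of those calls. The sketch below writes that sequence out by hand using the public `DefaultStringInterpolation` API. It is an approximation for illustration; the exact code the compiler emits may differ.

```swift
let name = "world"

// What the programmer writes.
let sugared = "Hello, \(name)!"

// Roughly what it expands to: reserve space, append each segment in order,
// then build the final String from the buffer.
var interpolation = DefaultStringInterpolation(literalCapacity: 8, interpolationCount: 1)
interpolation.appendLiteral("Hello, ")
interpolation.appendInterpolation(name)
interpolation.appendLiteral("!")
let manual = String(stringInterpolation: interpolation)

print(manual == sugared)  // prints "true"
```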

Implements SE-0228. Resolves SR-1260, SR-2303, and SR-3969. Resolves rdar://43621912.

cc @milseman @ravikandhadai

beccadax · 2018-10-20T01:15:04Z

@swift-ci please test

beccadax · 2018-10-20T07:13:22Z

@swift-ci please test compiler performance

swift-ci · 2018-10-20T14:21:52Z

Build comment file:

Summary for master full

Unexpected test results, excluded stats for RxSwift, Alamofire, Wordy, ReactiveSwift

Regressions found (see below)

Debug-batch

debug-batch brief

Regressed (0)

name	old	new	delta	delta_pct

Improved (0)

name	old	new	delta	delta_pct

Unchanged (delta < 1.0% or delta < 100.0ms) (3)

name	old	new	delta	delta_pct
Frontend.NumInstructionsExecuted	12,640,921,351,020	12,641,723,574,681	802,223,661	0.01%
LLVM.NumLLVMBytesOutput	662,083,340	664,908,042	2,824,702	0.43%
time.swift-driver.wall	1425.6s	1436.2s	10.6s	0.74%

debug-batch detailed

Regressed (8)

name	old	new	delta	delta_pct
AST.NumSourceLinesPerSecond	1,030,294	1,041,031	10,737	1.04% ⛔
Driver.NumDriverPipePolls	243,822	251,876	8,054	3.3% ⛔
Driver.NumDriverPipeReads	256,987	265,129	8,142	3.17% ⛔
Sema.IsDynamicRequest	1,048,711	1,059,694	10,983	1.05% ⛔
Sema.IsObjCRequest	876,684	886,004	9,320	1.06% ⛔
Sema.NumConstraintsConsideredForEdgeContraction	19,046,552	30,570,720	11,524,168	60.51% ⛔
Sema.NumDeclsValidated	1,082,903	1,139,247	56,344	5.2% ⛔
Sema.SetterAccessLevelRequest	68,767	78,408	9,641	14.02% ⛔

Improved (13)

name	old	new	delta	delta_pct
IRModule.NumIRInsts	27,074,209	26,458,651	-615,558	-2.27% ✅
Sema.ExtendedNominalRequest	1,515,737	1,496,810	-18,927	-1.25% ✅
Sema.InheritedDeclsReferencedRequest	57,332,776	56,299,090	-1,033,686	-1.8% ✅
Sema.NamedLazyMemberLoadSuccessCount	12,667,636	10,718,389	-1,949,247	-15.39% ✅
Sema.NominalTypeLookupDirectCount	20,069,298	18,948,643	-1,120,655	-5.58% ✅
Sema.NumConstraintScopes	11,941,716	11,741,707	-200,009	-1.67% ✅
Sema.NumDeclsDeserialized	15,526,965	15,323,480	-203,485	-1.31% ✅
Sema.NumLazyGenericEnvironments	3,353,325	3,299,042	-54,283	-1.62% ✅
Sema.NumLazyGenericEnvironmentsLoaded	87,831	86,648	-1,183	-1.35% ✅
Sema.NumLeafScopes	8,887,552	8,076,125	-811,427	-9.13% ✅
Sema.NumTypesDeserialized	6,440,344	6,366,955	-73,389	-1.14% ✅
Sema.SelfBoundsFromWhereClauseRequest	31,574,686	29,330,767	-2,243,919	-7.11% ✅
Sema.SuperclassDeclRequest	47,777,023	47,079,359	-697,664	-1.46% ✅

Unchanged (delta < 1.0% or delta < 100.0ms) (74)

name	old	new	delta	delta_pct
AST.NumASTBytesAllocated	15,174,614,322	15,242,256,677	67,642,355	0.45%
AST.NumDecls	45,698	45,698	0	0.0%
AST.NumDependencies	99,631	99,632	1	0.0%
AST.NumImportedExternalDefinitions	722,995	720,065	-2,930	-0.41%
AST.NumInfixOperators	19,270	19,270	0	0.0%
AST.NumLinkLibraries	0	0	0	0.0%
AST.NumLoadedModules	120,032	120,032	0	0.0%
AST.NumLocalTypeDecls	79	79	0	0.0%
AST.NumObjCMethods	11,948	11,948	0	0.0%
AST.NumPostfixOperators	14	14	0	0.0%
AST.NumPrecedenceGroups	8,925	8,925	0	0.0%
AST.NumPrefixOperators	61	61	0	0.0%
AST.NumReferencedDynamicNames	38	38	0	0.0%
AST.NumReferencedMemberNames	2,473,362	2,451,711	-21,651	-0.88%
AST.NumReferencedTopLevelNames	147,846	148,146	300	0.2%
AST.NumSourceBuffers	139,775	139,762	-13	-0.01%
AST.NumSourceLines	1,508,176	1,508,176	0	0.0%
AST.NumTotalClangImportedEntities	2,534,085	2,532,304	-1,781	-0.07%
AST.NumUsedConformances	141,724	140,612	-1,112	-0.78%
Driver.ChildrenMaxRSS	49,801,834,496	50,130,507,776	328,673,280	0.66%
Driver.DriverDepCascadingDynamic	0	0	0	0.0%
Driver.DriverDepCascadingExternal	0	0	0	0.0%
Driver.DriverDepCascadingMember	0	0	0	0.0%
Driver.DriverDepCascadingNominal	0	0	0	0.0%
Driver.DriverDepCascadingTopLevel	0	0	0	0.0%
Driver.DriverDepDynamic	0	0	0	0.0%
Driver.DriverDepExternal	0	0	0	0.0%
Driver.DriverDepMember	0	0	0	0.0%
Driver.DriverDepNominal	0	0	0	0.0%
Driver.DriverDepTopLevel	0	0	0	0.0%
Driver.NumDriverJobsRun	9,620	9,620	0	0.0%
Driver.NumDriverJobsSkipped	0	0	0	0.0%
Driver.NumProcessFailures	0	0	0	0.0%
Frontend.MaxMallocUsage	198,157,522,144	198,100,226,504	-57,295,640	-0.03%
Frontend.NumInstructionsExecuted	12,640,921,351,020	12,641,723,574,681	802,223,661	0.01%
Frontend.NumProcessFailures	0	0	0	0.0%
IRModule.NumIRAliases	70,865	70,865	0	0.0%
IRModule.NumIRBasicBlocks	2,392,943	2,400,545	7,602	0.32%
IRModule.NumIRComdatSymbols	0	0	0	0.0%
IRModule.NumIRFunctions	1,238,912	1,243,693	4,781	0.39%
IRModule.NumIRGlobals	1,455,063	1,458,333	3,270	0.22%
IRModule.NumIRIFuncs	0	0	0	0.0%
IRModule.NumIRNamedMetaData	46,916	46,916	0	0.0%
IRModule.NumIRValueSymbols	2,388,557	2,396,590	8,033	0.34%
LLVM.NumLLVMBytesOutput	662,083,340	664,908,042	2,824,702	0.43%
Parse.NumFunctionsParsed	1,023,954	1,023,954	0	0.0%
Parse.NumIterableDeclContextParsed	361,425	361,425	0	0.0%
SILModule.NumSILGenDefaultWitnessTables	0	0	0	0.0%
SILModule.NumSILGenFunctions	1,179,679	1,183,105	3,426	0.29%
SILModule.NumSILGenGlobalVariables	24,089	24,089	0	0.0%
SILModule.NumSILGenVtables	4,500	4,500	0	0.0%
SILModule.NumSILGenWitnessTables	27,028	27,028	0	0.0%
SILModule.NumSILOptDefaultWitnessTables	0	0	0	0.0%
SILModule.NumSILOptFunctions	916,962	916,008	-954	-0.1%
SILModule.NumSILOptGlobalVariables	24,509	24,509	0	0.0%
SILModule.NumSILOptVtables	8,605	8,605	0	0.0%
SILModule.NumSILOptWitnessTables	54,529	54,550	21	0.04%
Sema.AccessLevelRequest	1,167,662	1,177,266	9,604	0.82%
Sema.DefaultAndMaxAccessLevelRequest	27,369	27,363	-6	-0.02%
Sema.EnumRawTypeRequest	8,565	8,565	0	0.0%
Sema.InheritedTypeRequest	338,038	337,718	-320	-0.09%
Sema.NamedLazyMemberLoadFailureCount	14,232	14,232	0	0.0%
Sema.NumConformancesDeserialized	1,694,357	1,684,348	-10,009	-0.59%
Sema.NumFunctionsTypechecked	672,598	670,475	-2,123	-0.32%
Sema.NumGenericSignatureBuilders	535,438	532,014	-3,424	-0.64%
Sema.NumLazyIterableDeclContexts	3,062,676	3,059,301	-3,375	-0.11%
Sema.NumTypesValidated	698,799	698,707	-92	-0.01%
Sema.NumUnloadedLazyIterableDeclContexts	2,437,132	2,437,970	838	0.03%
Sema.OverriddenDeclsRequest	921,881	930,528	8,647	0.94%
Sema.RequirementRequest	21,734	21,734	0	0.0%
Sema.SuperclassTypeRequest	16,038	16,038	0	0.0%
Sema.TypeDeclsFromWhereClauseRequest	12,947	12,946	-1	-0.01%
Sema.USRGenerationRequest	228,857	228,857	0	0.0%
Sema.UnderlyingTypeDeclsReferencedRequest	1,710,833	1,709,608	-1,225	-0.07%

Release

release brief

Regressed (1)

name	old	new	delta	delta_pct
time.swift-driver.wall	3058.6s	3091.9s	33.3s	1.09% ⛔

Improved (1)

name	old	new	delta	delta_pct
Frontend.NumInstructionsExecuted	41,281,893,753,877	34,802,066,407,765	-6,479,827,346,112	-15.7% ✅

Unchanged (delta < 1.0% or delta < 100.0ms) (1)

name	old	new	delta	delta_pct
LLVM.NumLLVMBytesOutput	549,242,072	552,630,532	3,388,460	0.62%

release detailed

Regressed (2)

name	old	new	delta	delta_pct
IRModule.NumIRBasicBlocks	2,308,975	2,424,123	115,148	4.99% ⛔
Sema.NumDeclsValidated	558,735	613,398	54,663	9.78% ⛔

Improved (4)

name	old	new	delta	delta_pct
AST.NumImportedExternalDefinitions	165,607	163,001	-2,606	-1.57% ✅
AST.NumTotalClangImportedEntities	552,778	545,887	-6,891	-1.25% ✅
Sema.NumConstraintScopes	10,919,367	10,786,844	-132,523	-1.21% ✅
Sema.NumLazyIterableDeclContexts	494,296	487,725	-6,571	-1.33% ✅

Unchanged (delta < 1.0% or delta < 100.0ms) (17)

name	old	new	delta	delta_pct
AST.NumLoadedModules	10,292	10,226	-66	-0.64%
AST.NumUsedConformances	145,741	144,477	-1,264	-0.87%
IRModule.NumIRFunctions	991,062	990,640	-422	-0.04%
IRModule.NumIRGlobals	1,097,600	1,092,929	-4,671	-0.43%
IRModule.NumIRInsts	19,305,606	19,261,314	-44,292	-0.23%
IRModule.NumIRValueSymbols	1,926,368	1,920,856	-5,512	-0.29%
LLVM.NumLLVMBytesOutput	549,242,072	552,630,532	3,388,460	0.62%
SILModule.NumSILGenFunctions	433,067	432,955	-112	-0.03%
SILModule.NumSILOptFunctions	618,660	621,313	2,653	0.43%
Sema.NumConformancesDeserialized	1,250,131	1,252,896	2,765	0.22%
Sema.NumDeclsDeserialized	3,794,782	3,764,277	-30,505	-0.8%
Sema.NumFunctionsTypechecked	336,401	334,487	-1,914	-0.57%
Sema.NumGenericSignatureBuilders	122,587	121,967	-620	-0.51%
Sema.NumLazyGenericEnvironments	800,778	792,880	-7,898	-0.99%
Sema.NumLazyGenericEnvironmentsLoaded	15,578	15,433	-145	-0.93%
Sema.NumTypesDeserialized	2,129,694	2,114,506	-15,188	-0.71%
Sema.NumTypesValidated	254,946	254,836	-110	-0.04%

beccadax · 2018-10-20T15:41:34Z

@swift-ci please smoke benchmark

swift-ci · 2018-10-20T16:32:10Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
FloatingPointPrinting_Float_interpolated	37043	46170	+24.6%	0.80x
StringInterpolation	8666	10708	+23.6%	0.81x
IterateData	1623	1904	+17.3%	0.85x
RangeIterationSigned	171	200	+17.0%	0.86x
RandomDoubleLCG	913	1055	+15.6%	0.87x
ChainedFilterMap	1219	1404	+15.2%	0.87x (?)
Improvement
StringInterpolationManySmallSegments	17695	7846	-55.7%	2.26x
StringInterpolationSmall	4034	2142	-46.9%	1.88x
FloatingPointPrinting_Double_interpolated	60377	49248	-18.4%	1.23x
FloatingPointPrinting_Float80_interpolated	67680	56542	-16.5%	1.20x
ArrayAppendAsciiSubstring	29732	25304	-14.9%	1.17x
ArrayAppendStrings	8740	7943	-9.1%	1.10x
Array2D	7512	6909	-8.0%	1.09x
MapReduceAnyCollection	398	370	-7.0%	1.08x
MapReduce	427	398	-6.8%	1.07x
Added
TEST	MIN	MAX	MEAN	MAX_RSS
CustomStringInterpolation	10568	10796	10646	—
CustomStringNoInterpolation	174	178	175	—

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
StringInterpolation.o	11386	16803	+47.6%	0.68x
DeadArray.o	1872	2706	+44.6%	0.69x
SequenceAlgos.o	23331	27827	+19.3%	0.84x
Exclusivity.o	4083	4659	+14.1%	0.88x
TwoSum.o	5960	6739	+13.1%	0.88x
DictionarySwap.o	27574	30790	+11.7%	0.90x
ByteSwap.o	1960	2179	+11.2%	0.90x
Fibonacci.o	1936	2150	+11.1%	0.90x
StringBuilder.o	11679	12951	+10.9%	0.90x
PopFront.o	5577	6179	+10.8%	0.90x
LinkedList.o	2263	2486	+9.9%	0.91x
DictionaryKeysContains.o	16211	17806	+9.8%	0.91x
DictionaryRemove.o	15166	16110	+6.2%	0.94x
DictTest4Legacy.o	26994	28370	+5.1%	0.95x
DropLast.o	25451	26747	+5.1%	0.95x
ErrorHandling.o	2758	2886	+4.6%	0.96x
PopFrontGeneric.o	5061	5285	+4.4%	0.96x
DictTest3.o	28392	29608	+4.3%	0.96x
DictTest4.o	26116	27220	+4.2%	0.96x
NopDeinit.o	5907	6140	+3.9%	0.96x
DictTest2.o	19736	20504	+3.9%	0.96x
HashQuadratic.o	5800	6019	+3.8%	0.96x
DriverUtils.o	168161	172849	+2.8%	0.97x
CountAlgo.o	20893	21468	+2.8%	0.97x
RomanNumbers.o	11205	11509	+2.7%	0.97x
RGBHistogram.o	24773	25349	+2.3%	0.98x
SevenBoom.o	1950	1992	+2.2%	0.98x
DictionaryGroup.o	17295	17647	+2.0%	0.98x
XorLoop.o	2280	2322	+1.8%	0.98x
main.o	46322	47154	+1.8%	0.98x
Memset.o	2392	2434	+1.8%	0.98x
StringEdits.o	14052	14292	+1.7%	0.98x
Suffix.o	26169	26601	+1.7%	0.98x
WordCount.o	65304	66296	+1.5%	0.99x
MonteCarloPi.o	1920	1946	+1.4%	0.99x
NSDictionaryCastToSwift.o	1984	2010	+1.3%	0.99x
PointerArithmetics.o	2086	2112	+1.2%	0.99x
Ackermann.o	2160	2186	+1.2%	0.99x
BitCount.o	2200	2226	+1.2%	0.99x
RandomValues.o	4135	4177	+1.0%	0.99x
Improvement
MapReduce.o	24653	21517	-12.7%	1.15x
StringTests.o	10695	9607	-10.2%	1.11x
ObjectAllocation.o	4635	4171	-10.0%	1.11x
StackPromo.o	2583	2343	-9.3%	1.10x
DropWhile.o	23724	21788	-8.2%	1.09x
ChainedFilterMap.o	3492	3230	-7.5%	1.08x
BinaryFloatingPointProperties.o	8039	7489	-6.8%	1.07x
DictionaryCopy.o	8929	8337	-6.6%	1.07x
ReduceInto.o	24735	23183	-6.3%	1.07x
DropFirst.o	25500	24028	-5.8%	1.06x
LazyFilter.o	9794	9234	-5.7%	1.06x
Hash.o	29075	27683	-4.8%	1.05x
SetTests.o	59689	57113	-4.3%	1.05x
MonteCarloE.o	3624	3490	-3.7%	1.04x
DictionaryBridgeToObjC.o	6959	6703	-3.7%	1.04x
DictionaryCompactMapValues.o	22510	21694	-3.6%	1.04x
PrefixWhile.o	24046	23342	-2.9%	1.03x
Prefix.o	24945	24241	-2.8%	1.03x
SortIntPyramids.o	9701	9429	-2.8%	1.03x
ObjectiveCBridgingStubs.o	9915	9637	-2.8%	1.03x
ObjectiveCBridging.o	45589	44325	-2.8%	1.03x
ArrayLiteral.o	3531	3461	-2.0%	1.02x
ArrayAppend.o	38982	38310	-1.7%	1.02x
Walsh.o	9432	9282	-1.6%	1.02x
Queue.o	14603	14411	-1.3%	1.01x
StrToInt.o	6659	6579	-1.2%	1.01x
StrComplexWalk.o	3456	3418	-1.1%	1.01x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
MapReduceLazyCollectionShort	62	85	+37.1%	0.73x
FloatingPointPrinting_Float_interpolated	36978	46166	+24.8%	0.80x
StringInterpolation	8940	11041	+23.5%	0.81x
RangeIterationSigned	171	200	+17.0%	0.86x
IterateData	1581	1779	+12.5%	0.89x
Improvement
StringInterpolationManySmallSegments	16073	7856	-51.1%	2.05x
StringInterpolationSmall	4000	2169	-45.8%	1.84x
ArrayAppendStrings	7858	7024	-10.6%	1.12x
PrefixWhileAnyCollectionLazy	176	159	-9.7%	1.11x
DropLastAnyCollection	65	59	-9.2%	1.10x
Array2D	7208	6611	-8.3%	1.09x
RandomDoubleLCG	967	895	-7.4%	1.08x
Added
TEST	MIN	MAX	MEAN	MAX_RSS
CustomStringInterpolation	10727	10842	10766	—
CustomStringNoInterpolation	195	198	196	—

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
StringInterpolation.o	9163	14627	+59.6%	0.63x
DeadArray.o	1698	2494	+46.9%	0.68x
Fibonacci.o	1845	2218	+20.2%	0.83x
ByteSwap.o	1869	2242	+20.0%	0.83x
LinkedList.o	2124	2497	+17.6%	0.85x
Exclusivity.o	3815	4295	+12.6%	0.89x
StringBuilder.o	11210	12599	+12.4%	0.89x
TwoSum.o	5645	6302	+11.6%	0.90x
RomanNumbers.o	5806	6382	+9.9%	0.91x
PopFront.o	5214	5694	+9.2%	0.92x
DictionaryKeysContains.o	15035	16374	+8.9%	0.92x
PopFrontGeneric.o	4937	5369	+8.8%	0.92x
SequenceAlgos.o	26092	28195	+8.1%	0.93x
DictTest4Legacy.o	22649	24073	+6.3%	0.94x
DictTest4.o	21259	22571	+6.2%	0.94x
DictTest2.o	15986	16882	+5.6%	0.95x
DictTest3.o	22514	23586	+4.8%	0.95x
ErrorHandling.o	3062	3206	+4.7%	0.96x
DriverUtils.o	144185	149977	+4.0%	0.96x
CountAlgo.o	14368	14848	+3.3%	0.97x
main.o	43593	44393	+1.8%	0.98x
PointerArithmetics.o	1987	2019	+1.6%	0.98x
DictOfArraysToArrayOfDicts.o	32500	33017	+1.6%	0.98x
XorLoop.o	2074	2106	+1.5%	0.98x
Ackermann.o	2085	2117	+1.5%	0.98x
Memset.o	2141	2173	+1.5%	0.99x
RemoveWhere.o	24302	24606	+1.3%	0.99x
CSVParsing.o	37068	37516	+1.2%	0.99x
CString.o	6322	6386	+1.0%	0.99x
Improvement
MapReduce.o	22365	18605	-16.8%	1.20x
StringTests.o	8497	7185	-15.4%	1.18x
ObjectAllocation.o	4505	3850	-14.5%	1.17x
DropWhile.o	23420	20668	-11.8%	1.13x
ReduceInto.o	17179	15531	-9.6%	1.11x
DropFirst.o	24692	22452	-9.1%	1.10x
ChainedFilterMap.o	3492	3188	-8.7%	1.10x
BinaryFloatingPointProperties.o	7737	7097	-8.3%	1.09x
DictionaryCopy.o	8137	7465	-8.3%	1.09x
DictionarySwap.o	25339	23371	-7.8%	1.08x
DictionarySubscriptDefault.o	27147	25163	-7.3%	1.08x
Suffix.o	25953	24081	-7.2%	1.08x
LazyFilter.o	9137	8481	-7.2%	1.08x
Hash.o	22303	20783	-6.8%	1.07x
SetTests.o	52561	49601	-5.6%	1.06x
MonteCarloE.o	3858	3650	-5.4%	1.06x
DictionaryRemove.o	14027	13291	-5.2%	1.06x
ObjectiveCBridging.o	44135	41895	-5.1%	1.05x
PrefixWhile.o	24446	23246	-4.9%	1.05x
DictionaryBridgeToObjC.o	6605	6285	-4.8%	1.05x
DictionaryCompactMapValues.o	21054	20190	-4.1%	1.04x
Walsh.o	6322	6122	-3.2%	1.03x
SortIntPyramids.o	10010	9706	-3.0%	1.03x
ObjectiveCBridgingStubs.o	9101	8831	-3.0%	1.03x
Prefix.o	24441	23849	-2.4%	1.02x
ArrayLiteral.o	3160	3096	-2.0%	1.02x
StackPromo.o	2446	2398	-2.0%	1.02x
ArrayAppend.o	37886	37278	-1.6%	1.02x
DictTest.o	48465	47777	-1.4%	1.01x
Substring.o	20129	19857	-1.4%	1.01x
RGBHistogram.o	22685	22381	-1.3%	1.01x
StrComplexWalk.o	3573	3526	-1.3%	1.01x
ReversedCollections.o	11365	11245	-1.1%	1.01x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
ArrayOfPOD	757	859	+13.5%	0.88x (?)
StringEqualPointerComparison	3571	4028	+12.8%	0.89x
StringHasPrefixAscii	5032	5543	+10.2%	0.91x
StringHasSuffixAscii	5114	5628	+10.1%	0.91x
Improvement
StringInterpolationManySmallSegments	18586	10880	-41.5%	1.71x
StringInterpolationSmall	5753	3454	-40.0%	1.67x
FloatingPointPrinting_Float80_interpolated	119243	99108	-16.9%	1.20x
ArrayAppendStrings	10363	8672	-16.3%	1.19x
Added
TEST	MIN	MAX	MEAN	MAX_RSS
CustomStringInterpolation	12038	12179	12085	—
CustomStringNoInterpolation	1058	1058	1058	—

Code size: Swift libraries

TEST	OLD	NEW	DELTA	RATIO
Regression
libswiftSwiftReflectionTest.dylib	49152	61440	+25.0%	0.80x
libswiftSwiftPrivate.dylib	40960	45056	+10.0%	0.91x
libswiftStdlibUnittest.dylib	409600	442368	+8.0%	0.93x
libswiftsimd.dylib	286720	303104	+5.7%	0.95x
libswiftXCTest.dylib	81920	86016	+5.0%	0.95x
libswiftNetwork.dylib	163840	167936	+2.5%	0.98x
libswiftSwiftOnoneSupport.dylib	217088	221184	+1.9%	0.98x
Improvement
libswiftSwiftPrivateLibcExtras.dylib	24576	20480	-16.7%	1.20x
libswiftFoundation.dylib	1835008	1609728	-12.3%	1.14x
libswiftCore.dylib	3969024	3833856	-3.4%	1.04x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).
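The DELTA and RATIO columns in the performance tables appear to be derived from OLD and NEW as a percentage change and an old/new quotient. The sketch below reproduces two rows from the -O report under that assumption. The `compare` function is a hypothetical helper written for this note, not the benchmark driver's actual code.

```swift
// Hypothetical helper: derives the DELTA (%) and RATIO columns from OLD and NEW.
// Assumes DELTA = (new / old - 1) * 100 and RATIO = old / new, which matches
// the rows in the reports above.
func compare(old: Double, new: Double) -> (deltaPercent: Double, ratio: Double) {
    let delta = (new / old - 1) * 100
    let ratio = old / new
    // Round to the precision shown in the tables (one and two decimals).
    return ((delta * 10).rounded() / 10, (ratio * 100).rounded() / 100)
}

// StringInterpolation, -O: a regression (NEW is slower, ratio below 1).
let regression = compare(old: 8666, new: 10708)
print(regression.deltaPercent, regression.ratio)

// StringInterpolationSmall, -O: an improvement (NEW is faster, ratio above 1).
let improvement = compare(old: 4034, new: 2142)
print(improvement.deltaPercent, improvement.ratio)
```

A ratio below 1.00x therefore means the new build is slower or larger, and a ratio above 1.00x means it is faster or smaller.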

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

eeckstein · 2018-10-22T17:33:02Z

We should investigate the performance and code size regressions before we land this.

beccadax · 2018-10-22T21:23:36Z

@swift-ci please smoke benchmark

beccadax · 2018-10-22T21:24:07Z

Let's see how much StringInterpolation.o actually grew...

swift-ci · 2018-10-22T22:35:03Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
FloatingPointPrinting_Float_interpolated	37087	53130	+43.3%	0.70x
StringInterpolation	8916	11536	+29.4%	0.77x
RangeIterationSigned	171	200	+17.0%	0.86x
RandomDoubleLCG	910	1057	+16.2%	0.86x
ChainedFilterMap	1220	1405	+15.2%	0.87x (?)
Improvement
StringInterpolationManySmallSegments	17360	8067	-53.5%	2.15x
StringInterpolationSmall	4050	2112	-47.9%	1.92x
ArrayAppendAsciiSubstring	29641	25308	-14.6%	1.17x
ArrayAppendStrings	8641	7956	-7.9%	1.09x

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
DeadArray.o	1872	2706	+44.6%	0.69x
SequenceAlgos.o	23331	27827	+19.3%	0.84x
Exclusivity.o	4083	4659	+14.1%	0.88x
TwoSum.o	5960	6739	+13.1%	0.88x
DictionarySwap.o	27574	30790	+11.7%	0.90x
ByteSwap.o	1960	2179	+11.2%	0.90x
Fibonacci.o	1936	2150	+11.1%	0.90x
StringBuilder.o	11679	12951	+10.9%	0.90x
PopFront.o	5577	6179	+10.8%	0.90x
LinkedList.o	2263	2486	+9.9%	0.91x
DictionaryKeysContains.o	16211	17806	+9.8%	0.91x
DictionaryRemove.o	15166	16110	+6.2%	0.94x
DictTest4Legacy.o	26994	28370	+5.1%	0.95x
DropLast.o	25451	26747	+5.1%	0.95x
StringInterpolation.o	11386	11957	+5.0%	0.95x
ErrorHandling.o	2758	2886	+4.6%	0.96x
PopFrontGeneric.o	5061	5285	+4.4%	0.96x
DictTest3.o	28392	29608	+4.3%	0.96x
DictTest4.o	26116	27220	+4.2%	0.96x
NopDeinit.o	5907	6140	+3.9%	0.96x
DictTest2.o	19736	20504	+3.9%	0.96x
HashQuadratic.o	5800	6019	+3.8%	0.96x
DriverUtils.o	168161	172849	+2.8%	0.97x
CountAlgo.o	20893	21468	+2.8%	0.97x
RomanNumbers.o	11205	11509	+2.7%	0.97x
RGBHistogram.o	24773	25349	+2.3%	0.98x
SevenBoom.o	1950	1992	+2.2%	0.98x
DictionaryGroup.o	17295	17647	+2.0%	0.98x
XorLoop.o	2280	2322	+1.8%	0.98x
Memset.o	2392	2434	+1.8%	0.98x
StringEdits.o	14052	14292	+1.7%	0.98x
Suffix.o	26169	26601	+1.7%	0.98x
WordCount.o	65304	66296	+1.5%	0.99x
MonteCarloPi.o	1920	1946	+1.4%	0.99x
NSDictionaryCastToSwift.o	1984	2010	+1.3%	0.99x
PointerArithmetics.o	2086	2112	+1.2%	0.99x
Ackermann.o	2160	2186	+1.2%	0.99x
BitCount.o	2200	2226	+1.2%	0.99x
RandomValues.o	4135	4177	+1.0%	0.99x
Improvement
MapReduce.o	24653	21517	-12.7%	1.15x
StringTests.o	10695	9607	-10.2%	1.11x
ObjectAllocation.o	4635	4171	-10.0%	1.11x
StackPromo.o	2583	2343	-9.3%	1.10x
DropWhile.o	23724	21788	-8.2%	1.09x
ChainedFilterMap.o	3492	3230	-7.5%	1.08x
BinaryFloatingPointProperties.o	8039	7489	-6.8%	1.07x
DictionaryCopy.o	8929	8337	-6.6%	1.07x
ReduceInto.o	24735	23183	-6.3%	1.07x
DropFirst.o	25212	23708	-6.0%	1.06x
LazyFilter.o	9794	9234	-5.7%	1.06x
Hash.o	29075	27683	-4.8%	1.05x
SetTests.o	59689	57113	-4.3%	1.05x
MonteCarloE.o	3624	3490	-3.7%	1.04x
DictionaryBridgeToObjC.o	6959	6703	-3.7%	1.04x
DictionaryCompactMapValues.o	22510	21694	-3.6%	1.04x
PrefixWhile.o	24046	23342	-2.9%	1.03x
SortIntPyramids.o	9701	9429	-2.8%	1.03x
ObjectiveCBridgingStubs.o	9915	9637	-2.8%	1.03x
ObjectiveCBridging.o	45589	44325	-2.8%	1.03x
Prefix.o	24673	24161	-2.1%	1.02x
ArrayLiteral.o	3531	3461	-2.0%	1.02x
ArrayAppend.o	38982	38310	-1.7%	1.02x
Walsh.o	9432	9282	-1.6%	1.02x
Queue.o	14603	14411	-1.3%	1.01x
StrToInt.o	6659	6579	-1.2%	1.01x
StrComplexWalk.o	3456	3418	-1.1%	1.01x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
MapReduceLazyCollectionShort	41	85	+107.3%	0.48x
StringInterpolation	8700	11824	+35.9%	0.74x
FloatingPointPrinting_Float_interpolated	37257	46839	+25.7%	0.80x
RangeIterationSigned	171	200	+17.0%	0.86x
StringWordBuilderReservingCapacity	1146	1246	+8.7%	0.92x (?)
Improvement
StringInterpolationManySmallSegments	16037	7920	-50.6%	2.02x
StringInterpolationSmall	3962	2112	-46.7%	1.88x
FloatingPointPrinting_Double_interpolated	60113	49863	-17.1%	1.21x
FloatingPointPrinting_Float80_interpolated	64788	56270	-13.1%	1.15x
PrefixWhileAnyCollectionLazy	176	159	-9.7%	1.11x
DropLastAnyCollection	65	59	-9.2%	1.10x
IterateData	1768	1609	-9.0%	1.10x (?)
RandomDoubleLCG	981	893	-9.0%	1.10x (?)
ArrayAppendStrings	7797	7113	-8.8%	1.10x

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
DeadArray.o	1698	2494	+46.9%	0.68x
Fibonacci.o	1845	2218	+20.2%	0.83x
ByteSwap.o	1869	2242	+20.0%	0.83x
LinkedList.o	2124	2497	+17.6%	0.85x
Exclusivity.o	3815	4295	+12.6%	0.89x
StringBuilder.o	11210	12599	+12.4%	0.89x
TwoSum.o	5645	6302	+11.6%	0.90x
RomanNumbers.o	5806	6382	+9.9%	0.91x
StringInterpolation.o	9163	10039	+9.6%	0.91x
PopFront.o	5214	5694	+9.2%	0.92x
DictionaryKeysContains.o	15035	16374	+8.9%	0.92x
PopFrontGeneric.o	4937	5369	+8.8%	0.92x
SequenceAlgos.o	26092	28195	+8.1%	0.93x
DictTest4Legacy.o	22649	24073	+6.3%	0.94x
DictTest4.o	21259	22571	+6.2%	0.94x
DictTest2.o	15986	16882	+5.6%	0.95x
DictTest3.o	22514	23586	+4.8%	0.95x
ErrorHandling.o	3062	3206	+4.7%	0.96x
DriverUtils.o	144185	149977	+4.0%	0.96x
CountAlgo.o	14368	14848	+3.3%	0.97x
PointerArithmetics.o	1987	2019	+1.6%	0.98x
DictOfArraysToArrayOfDicts.o	32500	33017	+1.6%	0.98x
XorLoop.o	2074	2106	+1.5%	0.98x
Ackermann.o	2085	2117	+1.5%	0.98x
Memset.o	2141	2173	+1.5%	0.99x
RemoveWhere.o	24302	24606	+1.3%	0.99x
CSVParsing.o	37068	37516	+1.2%	0.99x
CString.o	6322	6386	+1.0%	0.99x
Improvement
MapReduce.o	22365	18605	-16.8%	1.20x
StringTests.o	8497	7185	-15.4%	1.18x
ObjectAllocation.o	4505	3850	-14.5%	1.17x
DropWhile.o	23420	20668	-11.8%	1.13x
ReduceInto.o	17179	15531	-9.6%	1.11x
DropFirst.o	23796	21556	-9.4%	1.10x
ChainedFilterMap.o	3492	3188	-8.7%	1.10x
BinaryFloatingPointProperties.o	7737	7097	-8.3%	1.09x
DictionaryCopy.o	8137	7465	-8.3%	1.09x
DictionarySwap.o	25339	23371	-7.8%	1.08x
DictionarySubscriptDefault.o	27147	25163	-7.3%	1.08x
Suffix.o	25953	24081	-7.2%	1.08x
LazyFilter.o	9137	8481	-7.2%	1.08x
Hash.o	22303	20783	-6.8%	1.07x
SetTests.o	52561	49601	-5.6%	1.06x
MonteCarloE.o	3858	3650	-5.4%	1.06x
DictionaryRemove.o	14027	13291	-5.2%	1.06x
ObjectiveCBridging.o	44135	41895	-5.1%	1.05x
PrefixWhile.o	24446	23246	-4.9%	1.05x
DictionaryBridgeToObjC.o	6605	6285	-4.8%	1.05x
DictionaryCompactMapValues.o	21054	20190	-4.1%	1.04x
Walsh.o	6322	6122	-3.2%	1.03x
SortIntPyramids.o	10010	9706	-3.0%	1.03x
ObjectiveCBridgingStubs.o	9101	8831	-3.0%	1.03x
Prefix.o	23777	23169	-2.6%	1.03x
ArrayLiteral.o	3160	3096	-2.0%	1.02x
StackPromo.o	2446	2398	-2.0%	1.02x
ArrayAppend.o	37886	37278	-1.6%	1.02x
DictTest.o	48465	47777	-1.4%	1.01x
Substring.o	20129	19857	-1.4%	1.01x
RGBHistogram.o	22685	22381	-1.3%	1.01x
StrComplexWalk.o	3573	3526	-1.3%	1.01x
ReversedCollections.o	11365	11245	-1.1%	1.01x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
StringEqualPointerComparison	3600	4057	+12.7%	0.89x
ArrayOfPOD	755	842	+11.5%	0.90x
StringHasPrefixAscii	5032	5600	+11.3%	0.90x
ArrayOfGenericPOD2	1066	1180	+10.7%	0.90x (?)
StringHasSuffixAscii	5171	5714	+10.5%	0.90x
Improvement
StringInterpolationSmall	6349	3636	-42.7%	1.75x
StringInterpolationManySmallSegments	18220	10691	-41.3%	1.70x
ArrayAppendStrings	10400	8645	-16.9%	1.20x
FloatingPointPrinting_Double_interpolated	95618	80134	-16.2%	1.19x
Combos	2494	2219	-11.0%	1.12x (?)

Code size: Swift libraries

TEST	OLD	NEW	DELTA	RATIO
Regression
libswiftSwiftReflectionTest.dylib	49152	57344	+16.7%	0.86x
libswiftSwiftPrivate.dylib	40960	45056	+10.0%	0.91x
libswiftStdlibUnittest.dylib	409600	442368	+8.0%	0.93x
libswiftsimd.dylib	286720	303104	+5.7%	0.95x
libswiftXCTest.dylib	81920	86016	+5.0%	0.95x
libswiftNetwork.dylib	163840	167936	+2.5%	0.98x
libswiftSwiftOnoneSupport.dylib	217088	221184	+1.9%	0.98x
Improvement
libswiftSwiftPrivateLibcExtras.dylib	24576	20480	-16.7%	1.20x
libswiftFoundation.dylib	1830912	1609728	-12.1%	1.14x
libswiftCore.dylib	3964928	3829760	-3.4%	1.04x

beccadax · 2018-10-22T22:48:38Z

So StringInterpolation.o's existing benchmarks grew 5% in -O and 9.6% in -Osize; it's only the new benchmarks that made it look like it had grown 50-60%.
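The arithmetic behind that comparison can be checked directly from the two reports. The sketch below uses the StringInterpolation.o sizes from the first run and from the rerun, and assumes, as the comment above states, that the rerun's figure reflects only the pre-existing benchmarks. The `SizeSample` type and `percentGrowth` helper are hypothetical names for this illustration.

```swift
// Object sizes copied from the "Code size" tables of the two reports.
struct SizeSample {
    let mode: String
    let old: Int                     // baseline size
    let newWithAddedBenchmarks: Int  // first report
    let newExistingOnly: Int         // rerun (assumed to exclude the added benchmarks)
}

// Percentage growth rounded to one decimal, as in the reports.
func percentGrowth(from old: Int, to new: Int) -> Double {
    let percent = Double(new - old) / Double(old) * 100
    return (percent * 10).rounded() / 10
}

let samples = [
    SizeSample(mode: "-O", old: 11386, newWithAddedBenchmarks: 16803, newExistingOnly: 11957),
    SizeSample(mode: "-Osize", old: 9163, newWithAddedBenchmarks: 14627, newExistingOnly: 10039),
]

for sample in samples {
    let apparent = percentGrowth(from: sample.old, to: sample.newWithAddedBenchmarks)
    let existing = percentGrowth(from: sample.old, to: sample.newExistingOnly)
    let addedSize = sample.newWithAddedBenchmarks - sample.newExistingOnly
    print("\(sample.mode): apparent +\(apparent)%, existing code +\(existing)%, added benchmarks account for \(addedSize)")
}
```

Under that assumption, most of the apparent growth in both modes comes from the added benchmark code rather than from the existing benchmarks getting larger.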

Still need to get a handle on the rest of the code size increase, of course. I'm guessing it's because we're inlining more—we just need to find the right balance there.

Before that, though, let's try a potential cheap fix for the FloatingPointPrinting_Float_interpolated performance regression.

beccadax · 2018-10-22T22:48:55Z

@swift-ci please smoke benchmark

swift-ci · 2018-10-22T23:18:23Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
FloatingPointPrinting_Float_interpolated	37033	50196	+35.5%	0.74x
StringInterpolation	8676	10734	+23.7%	0.81x
ChainedFilterMap	1219	1405	+15.3%	0.87x (?)
RandomDoubleLCG	933	1057	+13.3%	0.88x (?)
RangeIterationSigned	181	200	+10.5%	0.91x
Improvement
StringInterpolationManySmallSegments	18000	8131	-54.8%	2.21x
StringInterpolationSmall	3970	2111	-46.8%	1.88x
ArrayAppendAsciiSubstring	29655	25414	-14.3%	1.17x
ArrayAppendStrings	8660	8029	-7.3%	1.08x

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
DeadArray.o	1872	2706	+44.6%	0.69x
SequenceAlgos.o	23331	27827	+19.3%	0.84x
Exclusivity.o	4083	4659	+14.1%	0.88x
TwoSum.o	5960	6739	+13.1%	0.88x
DictionarySwap.o	27574	30790	+11.7%	0.90x
ByteSwap.o	1960	2179	+11.2%	0.90x
Fibonacci.o	1936	2150	+11.1%	0.90x
StringBuilder.o	11679	12951	+10.9%	0.90x
PopFront.o	5577	6179	+10.8%	0.90x
LinkedList.o	2263	2486	+9.9%	0.91x
DictionaryKeysContains.o	16211	17806	+9.8%	0.91x
DictionaryRemove.o	15166	16110	+6.2%	0.94x
DictTest4Legacy.o	26994	28370	+5.1%	0.95x
DropLast.o	25451	26747	+5.1%	0.95x
StringInterpolation.o	11386	11957	+5.0%	0.95x
ErrorHandling.o	2758	2886	+4.6%	0.96x
PopFrontGeneric.o	5061	5285	+4.4%	0.96x
DictTest3.o	28392	29608	+4.3%	0.96x
DictTest4.o	26116	27220	+4.2%	0.96x
FloatingPointPrinting.o	7223	7511	+4.0%	0.96x
NopDeinit.o	5907	6140	+3.9%	0.96x
DictTest2.o	19736	20504	+3.9%	0.96x
HashQuadratic.o	5800	6019	+3.8%	0.96x
DriverUtils.o	168161	172849	+2.8%	0.97x
CountAlgo.o	20893	21468	+2.8%	0.97x
RomanNumbers.o	11205	11509	+2.7%	0.97x
RGBHistogram.o	24773	25349	+2.3%	0.98x
SevenBoom.o	1950	1992	+2.2%	0.98x
DictionaryGroup.o	17295	17647	+2.0%	0.98x
XorLoop.o	2280	2322	+1.8%	0.98x
Memset.o	2392	2434	+1.8%	0.98x
StringEdits.o	14052	14292	+1.7%	0.98x
Suffix.o	26169	26601	+1.7%	0.98x
WordCount.o	65304	66296	+1.5%	0.99x
MonteCarloPi.o	1920	1946	+1.4%	0.99x
NSDictionaryCastToSwift.o	1984	2010	+1.3%	0.99x
PointerArithmetics.o	2086	2112	+1.2%	0.99x
Ackermann.o	2160	2186	+1.2%	0.99x
BitCount.o	2200	2226	+1.2%	0.99x
RandomValues.o	4135	4177	+1.0%	0.99x
Improvement
MapReduce.o	24653	21517	-12.7%	1.15x
StringTests.o	10695	9607	-10.2%	1.11x
ObjectAllocation.o	4635	4171	-10.0%	1.11x
StackPromo.o	2583	2343	-9.3%	1.10x
DropWhile.o	23724	21788	-8.2%	1.09x
ChainedFilterMap.o	3492	3230	-7.5%	1.08x
BinaryFloatingPointProperties.o	8039	7489	-6.8%	1.07x
DictionaryCopy.o	8929	8337	-6.6%	1.07x
ReduceInto.o	24735	23183	-6.3%	1.07x
DropFirst.o	25212	23708	-6.0%	1.06x
LazyFilter.o	9794	9234	-5.7%	1.06x
Hash.o	29075	27683	-4.8%	1.05x
SetTests.o	59689	57113	-4.3%	1.05x
MonteCarloE.o	3624	3490	-3.7%	1.04x
DictionaryBridgeToObjC.o	6959	6703	-3.7%	1.04x
DictionaryCompactMapValues.o	22510	21694	-3.6%	1.04x
PrefixWhile.o	24046	23342	-2.9%	1.03x
SortIntPyramids.o	9701	9429	-2.8%	1.03x
ObjectiveCBridgingStubs.o	9915	9637	-2.8%	1.03x
ObjectiveCBridging.o	45589	44325	-2.8%	1.03x
Prefix.o	24673	24161	-2.1%	1.02x
ArrayLiteral.o	3531	3461	-2.0%	1.02x
ArrayAppend.o	38982	38310	-1.7%	1.02x
Walsh.o	9432	9282	-1.6%	1.02x
Queue.o	14603	14411	-1.3%	1.01x
StrToInt.o	6659	6579	-1.2%	1.01x
StrComplexWalk.o	3456	3418	-1.1%	1.01x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
MapReduceLazyCollectionShort	41	85	+107.3%	0.48x
FloatingPointPrinting_Float_interpolated	37529	51607	+37.5%	0.73x
StringInterpolation	8642	10812	+25.1%	0.80x
RangeIterationSigned	171	200	+17.0%	0.86x
StringWordBuilderReservingCapacity	1146	1246	+8.7%	0.92x (?)
Improvement
StringInterpolationManySmallSegments	16367	7806	-52.3%	2.10x
StringInterpolationSmall	3959	2107	-46.8%	1.88x
FloatingPointPrinting_Double_interpolated	60776	48993	-19.4%	1.24x
IterateData	1774	1591	-10.3%	1.12x
PrefixWhileAnyCollectionLazy	176	159	-9.7%	1.11x
ArrayAppendStrings	7825	7081	-9.5%	1.11x (?)
DropLastAnyCollection	65	59	-9.2%	1.10x
RandomDoubleLCG	971	891	-8.2%	1.09x

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
DeadArray.o	1698	2494	+46.9%	0.68x
Fibonacci.o	1845	2218	+20.2%	0.83x
ByteSwap.o	1869	2242	+20.0%	0.83x
LinkedList.o	2124	2497	+17.6%	0.85x
Exclusivity.o	3815	4295	+12.6%	0.89x
StringBuilder.o	11210	12599	+12.4%	0.89x
TwoSum.o	5645	6302	+11.6%	0.90x
RomanNumbers.o	5806	6382	+9.9%	0.91x
StringInterpolation.o	9163	10039	+9.6%	0.91x
PopFront.o	5214	5694	+9.2%	0.92x
DictionaryKeysContains.o	15035	16374	+8.9%	0.92x
PopFrontGeneric.o	4937	5369	+8.8%	0.92x
SequenceAlgos.o	26092	28195	+8.1%	0.93x
DictTest4Legacy.o	22649	24073	+6.3%	0.94x
DictTest4.o	21259	22571	+6.2%	0.94x
DictTest2.o	15986	16882	+5.6%	0.95x
DictTest3.o	22514	23586	+4.8%	0.95x
ErrorHandling.o	3062	3206	+4.7%	0.96x
DriverUtils.o	144185	149977	+4.0%	0.96x
CountAlgo.o	14368	14848	+3.3%	0.97x
FloatingPointPrinting.o	6576	6768	+2.9%	0.97x
PointerArithmetics.o	1987	2019	+1.6%	0.98x
DictOfArraysToArrayOfDicts.o	32500	33017	+1.6%	0.98x
XorLoop.o	2074	2106	+1.5%	0.98x
Ackermann.o	2085	2117	+1.5%	0.98x
Memset.o	2141	2173	+1.5%	0.99x
RemoveWhere.o	24302	24606	+1.3%	0.99x
CSVParsing.o	37068	37516	+1.2%	0.99x
CString.o	6322	6386	+1.0%	0.99x
Improvement
MapReduce.o	22365	18605	-16.8%	1.20x
StringTests.o	8497	7185	-15.4%	1.18x
ObjectAllocation.o	4505	3850	-14.5%	1.17x
DropWhile.o	23420	20668	-11.8%	1.13x
ReduceInto.o	17179	15531	-9.6%	1.11x
DropFirst.o	23796	21556	-9.4%	1.10x
ChainedFilterMap.o	3492	3188	-8.7%	1.10x
BinaryFloatingPointProperties.o	7737	7097	-8.3%	1.09x
DictionaryCopy.o	8137	7465	-8.3%	1.09x
DictionarySwap.o	25339	23371	-7.8%	1.08x
DictionarySubscriptDefault.o	27147	25163	-7.3%	1.08x
Suffix.o	25953	24081	-7.2%	1.08x
LazyFilter.o	9137	8481	-7.2%	1.08x
Hash.o	22303	20783	-6.8%	1.07x
SetTests.o	52561	49601	-5.6%	1.06x
MonteCarloE.o	3858	3650	-5.4%	1.06x
DictionaryRemove.o	14027	13291	-5.2%	1.06x
ObjectiveCBridging.o	44135	41895	-5.1%	1.05x
PrefixWhile.o	24446	23246	-4.9%	1.05x
DictionaryBridgeToObjC.o	6605	6285	-4.8%	1.05x
DictionaryCompactMapValues.o	21054	20190	-4.1%	1.04x
Walsh.o	6322	6122	-3.2%	1.03x
SortIntPyramids.o	10010	9706	-3.0%	1.03x
ObjectiveCBridgingStubs.o	9101	8831	-3.0%	1.03x
Prefix.o	23777	23169	-2.6%	1.03x
ArrayLiteral.o	3160	3096	-2.0%	1.02x
StackPromo.o	2446	2398	-2.0%	1.02x
ArrayAppend.o	37886	37278	-1.6%	1.02x
DictTest.o	48465	47777	-1.4%	1.01x
Substring.o	20129	19857	-1.4%	1.01x
RGBHistogram.o	22685	22381	-1.3%	1.01x
StrComplexWalk.o	3573	3526	-1.3%	1.01x
ReversedCollections.o	11365	11245	-1.1%	1.01x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
StringEqualPointerComparison	3600	4057	+12.7%	0.89x
ArrayOfPOD	756	842	+11.4%	0.90x
StringHasPrefixAscii	5032	5600	+11.3%	0.90x
ArrayOfGenericPOD2	1066	1180	+10.7%	0.90x (?)
StringHasSuffixAscii	5171	5714	+10.5%	0.90x
Improvement
StringInterpolationSmall	6342	3389	-46.6%	1.87x
StringInterpolationManySmallSegments	18108	10585	-41.5%	1.71x
FloatingPointPrinting_Double_interpolated	102227	69273	-32.2%	1.48x
FloatingPointPrinting_Float80_interpolated	135522	100147	-26.1%	1.35x
ArrayAppendStrings	10767	8671	-19.5%	1.24x

Code size: Swift libraries

TEST	OLD	NEW	DELTA	RATIO
Regression
libswiftSwiftReflectionTest.dylib	49152	57344	+16.7%	0.86x
libswiftSwiftPrivate.dylib	40960	45056	+10.0%	0.91x
libswiftStdlibUnittest.dylib	409600	442368	+8.0%	0.93x
libswiftsimd.dylib	286720	303104	+5.7%	0.95x
libswiftXCTest.dylib	81920	86016	+5.0%	0.95x
libswiftNetwork.dylib	163840	167936	+2.5%	0.98x
libswiftSwiftOnoneSupport.dylib	217088	221184	+1.9%	0.98x
Improvement
libswiftSwiftPrivateLibcExtras.dylib	24576	20480	-16.7%	1.20x
libswiftFoundation.dylib	1830912	1609728	-12.1%	1.14x
libswiftCore.dylib	3964928	3829760	-3.4%	1.04x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).
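
As a reading aid, here is a minimal sketch of how the DELTA and RATIO columns appear to relate to OLD and NEW. The helper names are hypothetical and the formulas are inferred from the table values, not taken from the benchmark driver.

```swift
// Hypothetical helpers; the formulas are inferred from the table columns,
// not copied from the benchmark tooling.
func deltaPercent(old: Double, new: Double) -> Double {
    (new - old) / old * 100
}

func ratio(old: Double, new: Double) -> Double {
    old / new
}

// StringInterpolation under -Osize, from the table above: 8642 -> 10812.
let delta = (deltaPercent(old: 8642, new: 10812) * 10).rounded() / 10
let speed = (ratio(old: 8642, new: 10812) * 100).rounded() / 100
```

A positive DELTA with a RATIO below 1.00x means the new build is slower (or larger); a negative DELTA with a RATIO above 1.00x means it improved.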

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

beccadax · 2018-10-23T03:05:24Z

Did some offline benchmarking since CI was looking a little noisy. The TextOutputStreamable conformance does actually improve the performance of FloatingPointPrinting_Float_interpolated:

IMPLEMENTATION	MIN(μs)	MAX(μs)	MEAN(μs)	SD(μs)	MEDIAN(μs)
No `TextOutputStreamable`	108,746	114,023	109,509	571	109,359
`description.write(to:)`	108,830	111,132	109,499	524	109,311
`_writeASCII(_:)` (proposed)	106,688	111,552	107,625	725	107,491

It's just that something else about new string interpolation is slower for all float types than the old version, and TextOutputStreamable helps Double and Float80 enough to make up for it, but not Float.
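
For readers unfamiliar with the protocol, here is a minimal sketch of a `TextOutputStreamable` conformance that forwards to `description`, in the style of the `description.write(to:)` row above. The `Celsius` type is hypothetical and used only for illustration; `_writeASCII(_:)` is internal to the standard library and is not shown.

```swift
// Hypothetical type used only for illustration; not part of this PR.
struct Celsius: CustomStringConvertible, TextOutputStreamable {
    var degrees: Double

    var description: String { "\(degrees) C" }

    // Write straight into the target stream by forwarding `description`.
    func write<Target: TextOutputStream>(to target: inout Target) {
        description.write(to: &target)
    }
}

var output = ""
Celsius(degrees: 21.5).write(to: &output)

// The same value used inside a string interpolation.
let interpolated = "Temp: \(Celsius(degrees: 21.5))"
```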

Comparing this branch's -emit-sil dump of run_FloatingPointPrinting_Float_interpolated(_:) to master's, this branch generates 551 lines of SIL in 70 basic blocks, while master generates only 390 lines in 47 basic blocks. The broad sketches of the two functions are similar, but this branch generates a number of basic blocks that either are never generated in master or are optimized away. I'll try to clean things up tomorrow so I can get a reasonable diff and figure out what the changes represent.

beccadax · 2018-10-25T00:03:30Z

A large part of the problem is the inlining of an early exit from append(_:). This is intended to improve performance, and on the whole it does, but it also tends to hugely convolute control flow. For example, run_Fibonacci(_:) (which uses string interpolation only on a failure path inlined from CheckResults(_:)) has 13 basic blocks without this inlining, but 69(!) with it. It also inflates code size (changes < 1% omitted):

Benchmark	Size with inlining	Size without inlining	Percentage Change
Regression (2)
MapReduce.o	34,875 b	35,627 b	2.2%
DropWhile.o	23,835 b	24,251 b	1.7%
Improvement (74)
DeadArray.o	3,472 b	2,437 b	-29.8%
ByteSwap.o	3,029 b	2,245 b	-25.9%
Fibonacci.o	2,869 b	2,165 b	-24.5%
LinkedList.o	3,013 b	2,309 b	-23.4%
StringInterpolation.o	16,241 b	12,529 b	-22.9%
Exclusivity.o	4,858 b	3,861 b	-20.5%
SequenceAlgos.o	31,691 b	25,531 b	-19.4%
MonteCarloPi.o	2,528 b	2,149 b	-15.0%
SevenBoom.o	2,560 b	2,181 b	-14.8%
RangeIteration.o	2,592 b	2,213 b	-14.6%
PointerArithmetics.o	2,640 b	2,261 b	-14.4%
ProtocolDispatch2.o	2,653 b	2,277 b	-14.2%
BitCount.o	2,848 b	2,469 b	-13.3%
NSDictionaryCastToSwift.o	3,072 b	2,693 b	-12.3%
Integrate.o	3,104 b	2,725 b	-12.2%
StackPromo.o	3,435 b	3,035 b	-11.6%
Ackermann.o	3,520 b	3,141 b	-10.8%
FloatingPointPrinting.o	7,221 b	6,453 b	-10.6%
ArrayLiteral.o	3,888 b	3,509 b	-9.7%
DictionaryBridge.o	3,995 b	3,627 b	-9.2%
StringBuilder.o	16,443 b	14,939 b	-9.1%
TwoSum.o	13,163 b	11,963 b	-9.1%
Memset.o	4,368 b	3,989 b	-8.7%
RandomValues.o	4,336 b	3,957 b	-8.7%
XorLoop.o	4,336 b	3,957 b	-8.7%
OpenClose.o	5,237 b	4,871 b	-7.0%
Calculator.o	5,648 b	5,257 b	-6.9%
DriverUtils.o	222,971 b	208,251 b	-6.6%
HashQuadratic.o	12,571 b	11,803 b	-6.1%
PopFrontGeneric.o	12,203 b	11,499 b	-5.8%
ChainedFilterMap.o	4,834 b	4,562 b	-5.6%
ObjectAllocation.o	4,552 b	4,325 b	-5.0%
ObjectiveCNoBridgingStubs.o	7,616 b	7,237 b	-5.0%
DictionaryKeysContains.o	29,403 b	27,989 b	-4.8%
MonteCarloE.o	6,795 b	6,475 b	-4.7%
RC4.o	8,139 b	7,771 b	-4.5%
NopDeinit.o	8,241 b	7,888 b	-4.3%
RomanNumbers.o	20,155 b	19,291 b	-4.3%
DictTest2.o	26,987 b	25,867 b	-4.2%
DropLast.o	33,051 b	31,675 b	-4.2%
StrComplexWalk.o	8,944 b	8,568 b	-4.2%
DictionarySwap.o	44,331 b	42,651 b	-3.8%
Suffix.o	35,403 b	34,043 b	-3.8%
StrToInt.o	10,237 b	9,869 b	-3.6%
ArraySubscript.o	10,763 b	10,395 b	-3.4%
Queue.o	21,339 b	20,603 b	-3.4%
RangeAssignment.o	11,323 b	10,955 b	-3.3%
ObjectiveCBridging.o	56,635 b	54,891 b	-3.1%
DictionaryGroup.o	29,307 b	28,491 b	-2.8%
ObjectiveCBridgingStubs.o	10,512 b	10,215 b	-2.8%
SortLettersInPlace.o	13,632 b	13,258 b	-2.7%
CountAlgo.o	26,651 b	25,963 b	-2.6%
DictionaryBridgeToObjC.o	10,539 b	10,267 b	-2.6%
RangeReplaceableCollectionPlusDefault.o	14,171 b	13,803 b	-2.6%
Substring.o	45,563 b	44,379 b	-2.6%
Combos.o	28,331 b	27,611 b	-2.5%
PopFront.o	11,547 b	11,275 b	-2.4%
DictTest4.o	36,459 b	35,611 b	-2.3%
RGBHistogram.o	37,003 b	36,203 b	-2.2%
Walsh.o	14,811 b	14,491 b	-2.2%
Hash.o	37,819 b	37,019 b	-2.1%
NibbleSort.o	17,243 b	16,875 b	-2.1%
COWTree.o	18,539 b	18,171 b	-2.0%
TestsUtils.o	21,067 b	20,699 b	-1.7%
SortIntPyramids.o	17,051 b	16,779 b	-1.6%
CString.o	24,455 b	24,087 b	-1.5%
DictionaryCompactMapValues.o	37,115 b	36,555 b	-1.5%
LuhnAlgoEager.o	24,459 b	24,091 b	-1.5%
LuhnAlgoLazy.o	24,459 b	24,091 b	-1.5%
SortLargeExistentials.o	23,787 b	23,419 b	-1.5%
LazyFilter.o	13,947 b	13,755 b	-1.4%
StringRemoveDupes.o	27,227 b	26,859 b	-1.4%
WordCount.o	86,539 b	85,403 b	-1.3%
BinaryFloatingPointProperties.o	15,755 b	15,579 b	-1.1%
DictionarySubscriptDefault.o	47,915 b	47,387 b	-1.1%

Even with the costs, though, inlining this check is often profitable, and removing it will cause performance regressions. On an iMac Pro:

With inlining vs. without — Regression (8)

TEST	WITH INLINING	WITHOUT INLINING	DELTA	RATIO
DataReplaceLarge	19054	23629	+24.0%	0.81x (?)
PointerArithmetics	22377	26316	+17.6%	0.85x
StringEqualPointerComparison	245	273	+11.4%	0.90x
DropWhileArrayLazy	88	96	+9.1%	0.92x
PrefixWhileAnyCollectionLazy	35	38	+8.6%	0.92x
DropWhileAnyCollectionLazy	65	70	+7.7%	0.93x
DataAppendSequence	12690	13485	+6.3%	0.94x (?)
ArrayAppendFromGeneric	335	355	+6.0%	0.94x (?)

With inlining vs. without — Improvement (14)

TEST	WITH INLINING	WITHOUT INLINING	DELTA	RATIO
Dictionary4	319	231	-27.6%	1.38x
SumUsingReduceInto	450	349	-22.4%	1.29x
SumUsingReduce	451	353	-21.7%	1.28x
DataAppendDataLargeToMedium	23393	18523	-20.8%	1.26x (?)
Dictionary4OfObjects	383	311	-18.8%	1.23x
StaticArray	6	5	-16.7%	1.20x (?)
DataCount	23	20	-13.0%	1.15x
DictionaryCompactMapValuesOfCastValue	10597	9329	-12.0%	1.14x
DataAppendBytes	3939	3564	-9.5%	1.11x (?)
StringAdder	863	791	-8.3%	1.09x
FatCompactMap	876	810	-7.5%	1.08x
DataAppendDataSmallToMedium	4933	4606	-6.6%	1.07x (?)
DataAppendDataMediumToMedium	5261	4917	-6.5%	1.07x
ObjectiveCBridgeStubNSDataAppend	1678	1595	-4.9%	1.05x (?)

The test being inlined is complicated—probably unnecessarily—and I'm hoping a simpler version will get us the best of both worlds. Barring that, perhaps the SIL optimization folks can make some suggestions. I mean, we can do better than this:

milseman · 2018-10-25T01:57:10Z

Just do the part that checks for the canonical empty string, that eliminates most of that CFG.

beccadax · 2018-10-25T02:48:40Z

@milseman That’s what I left my computer benchmarking—_isEmptySingleton check in an inlinable method, count == 0 && !_isNative in a non-inlinable one.
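
A minimal sketch of that split, using a hypothetical `ByteBuffer` type rather than the real `String` internals: the cheap emptiness test sits in an `@inlinable` method so callers can fold it away, and the rest stays behind a call that is never inlined.

```swift
// Hypothetical stand-in for the real string storage; names are illustrative only.
public struct ByteBuffer {
    @usableFromInline
    var bytes: [UInt8] = []

    public init() {}

    public var count: Int { bytes.count }

    // Cheap check, visible to callers so the optimizer can inline it
    // and drop the call when the argument is known to be empty.
    @inlinable
    public mutating func append(_ other: [UInt8]) {
        if other.isEmpty { return }
        _appendSlow(other)
    }

    // Heavier path kept out of line so its control flow is not
    // duplicated into every call site.
    @usableFromInline
    @inline(never)
    mutating func _appendSlow(_ other: [UInt8]) {
        bytes.append(contentsOf: other)
    }
}

var buffer = ByteBuffer()
buffer.append([])          // returns early in the inlinable check
buffer.append([1, 2, 3])   // falls through to the out-of-line path
```

The trade-off is the one discussed above: the more of the check that is inlined, the more branches land in each caller's CFG.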

beccadax · 2018-10-25T17:58:47Z

On my own machine, with everything closed down, literally unplugged from the network, I got wildly different benchmark results for this last night vs. this morning. But I think this is at least an improvement. We'll see what CI says.

beccadax · 2018-10-25T18:46:12Z

@swift-ci please smoke benchmark

xedin · 2018-10-25T18:39:58Z

lib/Sema/CSGen.cpp

+
+      if (auto subExpr = expr->getSubExpr()) {
+        auto subExprType = CS.getType(subExpr);
+        CS.addConstraint(ConstraintKind::Bind, subExprType, tv, locator);


One small change which I think might work here - instead of binding sub-expr type to type variable, you can return subExprType directly and allocate type variable only if there was no sub-expression...

Lower priority than the runtime performance stuff, but I'll look at it. Thanks!

xedin · 2018-10-25T18:41:29Z

lib/Sema/CSGen.cpp

+          interpolationProto->lookupDirect(tc.Context.Id_StringInterpolation);
+        if (associatedTypeArray.empty()) {
+          tc.diagnose(expr->getStartLoc(), diag::interpolation_broken_proto);
+          return nullptr;


I think the interpolation protocol lookup and field lookup could be moved to TypeChecker, just like we do for integers, e.g. TypeChecker::getMaxIntegerType.

Other parts of CSGen look up associated types in various ad-hoc ways like "find the one associated type in this protocol (let's hope there's just one)" or "directly call getIdentifier() with a string literal containing the associated type's name, then look it up". That's not necessarily a good thing either, though, so maybe I should just clean all of those up.

xedin · 2018-10-25T18:42:38Z

lib/Sema/CSGen.cpp

+        // Must be Conversion; if it's Equal, then in semi-rare cases, the 
+        // interpolation temporary variable cannot be @lvalue.
+        CS.addConstraint(ConstraintKind::Conversion, appendingExprType,
+                         interpolationTV, appendingLocator);


I'm really curious what an example of the behavior described in the comment might look like.

There's a minor, non-ABI improvement I want to make to this code that I can't do because it triggers this behavior. Once the branch is merged, I'll redo that work and show you the problem.

swift-ci · 2018-10-25T19:35:40Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
StringInterpolation	8810	10452	+18.6%	0.84x
RandomDoubleLCG	910	1057	+16.2%	0.86x
FloatingPointPrinting_Float_interpolated	43105	49647	+15.2%	0.87x
MapReduceLazyCollectionShort	31	34	+9.7%	0.91x
StringWordBuilderReservingCapacity	1160	1260	+8.6%	0.92x (?)
Improvement
StringInterpolationManySmallSegments	18122	7862	-56.6%	2.31x
StringInterpolationSmall	3995	1998	-50.0%	2.00x
FloatingPointPrinting_Double_interpolated	69313	52757	-23.9%	1.31x
FloatingPointPrinting_Float80_interpolated	70158	56722	-19.2%	1.24x
ArrayAppendAsciiSubstring	29653	25276	-14.8%	1.17x
StringBuilderSmallReservingCapacity	500	452	-9.6%	1.11x
ArrayAppendStrings	8684	7938	-8.6%	1.09x
StringBuilder	490	452	-7.8%	1.08x
StringAdder	552	510	-7.6%	1.08x

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
DeadArray.o	1872	2141	+14.4%	0.87x
SequenceAlgos.o	23375	25432	+8.8%	0.92x
DictionaryKeysContains.o	16211	17566	+8.4%	0.92x
DropWhile.o	23724	25388	+7.0%	0.93x
PrefixWhile.o	24046	25678	+6.8%	0.94x
Prefix.o	24673	26161	+6.0%	0.94x
MapReduce.o	24653	25933	+5.2%	0.95x
DropFirst.o	25212	26460	+5.0%	0.95x
DropLast.o	25451	26683	+4.8%	0.95x
ObjectAllocation.o	4635	4859	+4.8%	0.95x
Suffix.o	26169	27401	+4.7%	0.96x
ObjectiveCBridging.o	45589	47381	+3.9%	0.96x
StringBuilder.o	11679	12087	+3.5%	0.97x
StringTests.o	10695	11063	+3.4%	0.97x
DictionarySubscriptDefault.o	30379	31387	+3.3%	0.97x
Hash.o	29075	29971	+3.1%	0.97x
DictionarySwap.o	27598	28302	+2.6%	0.98x
DictTest3.o	28420	29108	+2.4%	0.98x
ReduceInto.o	24735	25327	+2.4%	0.98x
Substring.o	27833	28457	+2.2%	0.98x
DictionaryCopy.o	8929	9121	+2.2%	0.98x
WordCount.o	65304	66568	+1.9%	0.98x
SetTests.o	59717	60741	+1.7%	0.98x
DictionaryGroup.o	17319	17575	+1.5%	0.99x
DictionaryRemove.o	15194	15386	+1.3%	0.99x
DictTest4Legacy.o	27042	27378	+1.2%	0.99x
DictTest4.o	26168	26488	+1.2%	0.99x
ErrorHandling.o	2794	2826	+1.1%	0.99x
ObjectiveCBridgingStubs.o	9915	10021	+1.1%	0.99x
BinaryFloatingPointProperties.o	8039	8124	+1.1%	0.99x
Improvement
StackPromo.o	2583	2247	-13.0%	1.15x
Fibonacci.o	1936	1734	-10.4%	1.12x
SevenBoom.o	1950	1751	-10.2%	1.11x
RangeIteration.o	1976	1777	-10.1%	1.11x
ByteSwap.o	1960	1763	-10.1%	1.11x
NSDictionaryCastToSwift.o	1984	1785	-10.0%	1.11x
MonteCarloPi.o	1920	1728	-10.0%	1.11x
PointerArithmetics.o	2086	1887	-9.5%	1.11x
ProtocolDispatch2.o	2254	2042	-9.4%	1.10x
Ackermann.o	2160	1964	-9.1%	1.10x
BitCount.o	2200	2003	-9.0%	1.10x
ArrayLiteral.o	3531	3216	-8.9%	1.10x
XorLoop.o	2280	2083	-8.6%	1.09x
LinkedList.o	2263	2070	-8.5%	1.09x
Memset.o	2392	2199	-8.1%	1.09x
Integrate.o	2828	2629	-7.0%	1.08x
Walsh.o	9432	8837	-6.3%	1.07x
OpenClose.o	3926	3728	-5.0%	1.05x
ArraySubscript.o	4248	4036	-5.0%	1.05x
RandomValues.o	4135	3936	-4.8%	1.05x
FloatingPointPrinting.o	7223	6887	-4.7%	1.05x
StrToInt.o	6659	6358	-4.5%	1.05x
StrComplexWalk.o	3456	3305	-4.4%	1.05x
PopFrontGeneric.o	5080	4872	-4.1%	1.04x
RC4.o	4959	4771	-3.8%	1.04x
RangeAssignment.o	5264	5069	-3.7%	1.04x
StringInterpolation.o	11402	10993	-3.6%	1.04x
Calculator.o	3336	3218	-3.5%	1.04x
DictionaryBridge.o	3634	3508	-3.5%	1.04x
HashQuadratic.o	5800	5603	-3.4%	1.04x
NopDeinit.o	5907	5708	-3.4%	1.03x
ObjectiveCNoBridgingStubs.o	8323	8111	-2.5%	1.03x
RangeReplaceableCollectionPlusDefault.o	7620	7428	-2.5%	1.03x
CString.o	8795	8587	-2.4%	1.02x
ArrayAppend.o	38982	38182	-2.1%	1.02x
TwoSum.o	5960	5844	-1.9%	1.02x
SortLettersInPlace.o	12042	11811	-1.9%	1.02x
MonteCarloE.o	3624	3565	-1.6%	1.02x
ChainedFilterMap.o	3492	3437	-1.6%	1.02x
RomanNumbers.o	11205	11029	-1.6%	1.02x
Exclusivity.o	4083	4019	-1.6%	1.02x
NibbleSort.o	14520	14301	-1.5%	1.02x
COWTree.o	14600	14394	-1.4%	1.01x
Queue.o	14603	14411	-1.3%	1.01x
DriverUtils.o	166903	164807	-1.3%	1.01x
LuhnAlgoLazy.o	18256	18055	-1.1%	1.01x
LuhnAlgoEager.o	18258	18058	-1.1%	1.01x
Combos.o	16541	16366	-1.1%	1.01x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
FloatingPointPrinting_Float_interpolated	38481	49758	+29.3%	0.77x
StringInterpolation	8714	11099	+27.4%	0.79x
StringWordBuilderReservingCapacity	1146	1274	+11.2%	0.90x (?)
DropWhileAnyCollectionLazy	260	282	+8.5%	0.92x
FloatingPointPrinting_Float_description_uniform	5130	5543	+8.1%	0.93x
Improvement
StringInterpolationManySmallSegments	16684	8198	-50.9%	2.04x
StringInterpolationSmall	4003	2005	-49.9%	2.00x
FloatingPointPrinting_Double_interpolated	64524	52737	-18.3%	1.22x
FloatingPointPrinting_Float80_interpolated	67332	56759	-15.7%	1.19x
ArrayAppendStrings	8009	7030	-12.2%	1.14x
DropLastAnyCollection	67	59	-11.9%	1.14x
StringBuilderSmallReservingCapacity	499	453	-9.2%	1.10x
StringBuilder	489	453	-7.4%	1.08x

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
DeadArray.o	1698	2150	+26.6%	0.79x
DictionaryKeysContains.o	15035	16278	+8.3%	0.92x
StringBuilder.o	11210	11735	+4.7%	0.96x
DictTest3.o	22542	23438	+4.0%	0.96x
Substring.o	20129	20801	+3.3%	0.97x
ObjectiveCBridgingStubs.o	9101	9373	+3.0%	0.97x
Hash.o	22303	22847	+2.4%	0.98x
ObjectAllocation.o	4505	4606	+2.2%	0.98x
SequenceAlgos.o	26136	26552	+1.6%	0.98x
DictTest4Legacy.o	22697	23049	+1.6%	0.98x
ErrorHandling.o	3098	3146	+1.5%	0.98x
RomanNumbers.o	5806	5886	+1.4%	0.99x
StringRemoveDupes.o	8989	9101	+1.2%	0.99x
ReduceInto.o	17179	17387	+1.2%	0.99x
DictTest4.o	21311	21535	+1.1%	0.99x
WordCount.o	54952	55512	+1.0%	0.99x
Improvement
FloatingPointPrinting.o	6576	6160	-6.3%	1.07x
StackPromo.o	2446	2318	-5.2%	1.06x
Fibonacci.o	1845	1770	-4.1%	1.04x
ByteSwap.o	1869	1794	-4.0%	1.04x
NSDictionaryCastToSwift.o	1877	1802	-4.0%	1.04x
BitCount.o	1837	1765	-3.9%	1.04x
MonteCarloPi.o	1781	1713	-3.8%	1.04x
ProtocolDispatch2.o	2135	2056	-3.7%	1.04x
ArrayLiteral.o	3160	3048	-3.5%	1.04x
SevenBoom.o	2019	1948	-3.5%	1.04x
LinkedList.o	2124	2065	-2.8%	1.03x
Integrate.o	2705	2630	-2.8%	1.03x
DictionaryBridge.o	3671	3591	-2.2%	1.02x
ArrayAppend.o	37886	37134	-2.0%	1.02x
Queue.o	13307	13075	-1.7%	1.02x
StringInterpolation.o	9179	9027	-1.7%	1.02x
RangeIteration.o	1861	1834	-1.5%	1.01x
PointerArithmetics.o	1987	1960	-1.4%	1.01x
XorLoop.o	2074	2052	-1.1%	1.01x
ReversedCollections.o	11365	11245	-1.1%	1.01x
HashQuadratic.o	5352	5296	-1.0%	1.01x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHasPrefixAscii	5030	6655	+32.3%	0.76x
StringEqualPointerComparison	3585	4086	+14.0%	0.88x
StringHasSuffixAscii	5114	5800	+13.4%	0.88x
Dictionary3	576	642	+11.5%	0.90x
ArrayOfPOD	759	841	+10.8%	0.90x
ArrayOfGenericPOD2	1066	1179	+10.6%	0.90x (?)
Improvement
StringInterpolationSmall	6373	3343	-47.5%	1.91x
StringInterpolationManySmallSegments	18109	10768	-40.5%	1.68x
FloatingPointPrinting_Double_interpolated	99132	76103	-23.2%	1.30x
ArrayAppendStrings	10357	8730	-15.7%	1.19x

Code size: Swift libraries

TEST	OLD	NEW	DELTA	RATIO
Regression
libswiftSwiftReflectionTest.dylib	49152	57344	+16.7%	0.86x
libswiftSwiftPrivate.dylib	40960	45056	+10.0%	0.91x
libswiftsimd.dylib	286720	290816	+1.4%	0.99x
Improvement
libswiftSwiftPrivateLibcExtras.dylib	24576	20480	-16.7%	1.20x
libswiftFoundation.dylib	1765376	1503232	-14.8%	1.17x
libswiftCore.dylib	3964928	3842048	-3.1%	1.03x
libswiftStdlibUnittest.dylib	409600	405504	-1.0%	1.01x

beccadax · 2018-10-25T21:27:36Z

Not there yet, but much, much better.

CFG for Fibonacci now looks like this:

In principle, I think we should be able to resolve these at compile time in most or all cases: an instance which has passed through _StringGuts.reserveCapacitySlow(_:) is always native storage rather than the empty singleton; an instance which doesn't pass through reserveCapacitySlow(_:) should have a value known at compile time; and we can tell at compile time whether we'll call reserveCapacitySlow(_:) or not. But I don't know how to convince the optimizer of this, nor how to extend this past the first append, since _appendSlow(_:) can occasionally turn self into the empty singleton (although I think never in string interpolation).

beccadax · 2018-10-25T21:42:50Z

Gonna rebase to get floating-point string formatting improvements from @airspeedswift.

milseman · 2018-10-25T21:54:14Z

reserveCapacitySlow, IIRC, can be called on lazily bridged Cocoa strings, so not necessarily native.

Otherwise, yeah, we should come up with some pattern or behavior to allow the optimizer to constant-fold all the subsequent branches.

beccadax · 2018-10-25T22:26:37Z

@milseman Right—what I meant is that it's never the empty singleton. (Although in string interpolation, I don't think we can ever call reserveCapacitySlow(_:) on a Cocoa string—we always create our own strings.)

Previously, the parser generated compound method names for appendInterpolation(…) methods because this helped when finding appendInterpolation methods declared in the same file. However, this turned out to prevent default arguments from working. This commit returns it to adding base names only and instead explicitly calls loadAllMembers() on the StringInterpolation type. I’m not sure why types in other expressions don’t need this but types in these generated expressions do, but it’s much closer to the problem and doesn’t seem to have any ill effects.
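
To illustrate the user-facing feature at stake, here is a minimal sketch of an `appendInterpolation` method with a default argument. The `hex:` method is hypothetical and is not part of this PR.

```swift
extension String.StringInterpolation {
    // Hypothetical interpolation method; `uppercase` has a default value,
    // so callers may omit it inside the interpolation.
    mutating func appendInterpolation(hex value: Int, uppercase: Bool = false) {
        appendLiteral(String(value, radix: 16, uppercase: uppercase))
    }
}

let lower = "\(hex: 255)"                   // default argument used
let upper = "\(hex: 255, uppercase: true)"  // default argument overridden
```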

This change: 1. Adds accessors for the bit attached to NominalTypeDecl::LookupTable. These let you more easily search for and break on changes; more importantly, they give this bit a name and some documentation explaining what (I think) it means. 2. Includes the “ignoreNewExtensions” parameter in the debug output from NominalTypeDecl::lookupDirect(). 3. Adds a MemberLookupTable::dump() method.

Previously, nothing would guarantee that, if a nominal type `T` with lazy members had extensions adding members with name `foo`, and these members were already in `T`’s LookupTable, and a new extension to `T` added another member named `foo`, the new extension’s member would be added to the LookupTable. This change makes it so that adding an extension clears the isLookupTablePopulated() flag, and so that when the flag is cleared on a type with lazy members, it updates all existing entries instead of just the ones present in the type itself. Finally, it removes a workaround in string interpolation which is no longer necessary because of this change.

This change tests that, in string interpolation, code completion doesn’t suggest Void functions but suggests functions returning other types. This is a change in behavior, but I’m not convinced the old version was correct in the first place—why would you want to interpolate a function that returns Void? It’s valid but not useful, and code completion seems to try to avoid suggesting Void functions in other similar circumstances.

beccadax mentioned this pull request Oct 20, 2018

[DNM] String interpolation rework #18590

Closed

beccadax force-pushed the interpolation-rework branch from 4033d0d to cbb1678 Compare October 23, 2018 01:24

beccadax force-pushed the interpolation-rework branch 2 times, most recently from 30b4055 to 878057b Compare October 25, 2018 18:38

xedin reviewed Oct 25, 2018

beccadax force-pushed the interpolation-rework branch from 878057b to 3e6ac59 Compare October 25, 2018 22:11

beccadax and others added 27 commits October 31, 2018 20:58

Remove dead MemberLookupTable::addExtensionMembers()

6edf5ab

These methods are never used and appear to be holdovers from an earlier implementation.

Fix Linux-only build failure

dcf8c36

Apparently, the macOS STL is more forgiving about tuples vs. pairs than the Linux one.

Benchmark interpolation with custom types

5721732

Correct arguable CustomStringInterpolation benchmark flaw

a846c1a

As currently written, the optimizer can completely remove the involvement of CustomString in some cases. We probably don’t want that, so let’s fix it.

Add missing @inlinable annotation

d0fda51

Probably responsible for at least part of performance gap.

Test inlining an empty-string check

9060b9c

Inline the SmallString check for reserveCapacity()

13c002b

Update for function type changes

2cb546b

Update sil_location test for updated mangling

a7732fa

Use multiline string literals in examples

5f375c4

Remove inaccurate comments

6b9184e

Improve "informal requirement" comment

d6a265a

Test mixture of throwing and nonthrowing interpolations

6e690a2

Style fixes

a124952

Delete ExpressibleByStringInterpolation deprecation test

a9d6d2d

XFAIL Linux-only test failure

2acf62b

This already didn't work on 32-bit platforms; now it's not working on 64-bit Linux. Filed as SR-9008.

Tweak constraints to improve performance

3a2297d

Remove new string interpolation benchmarks

435880c

They will return, probably in a separate pull request; I just want to see how much they're inflating StringInterpolation.o's code size.

Inline only part of the append(_:) length check

4b31761

Gets most of the performance win without most of the code size loss.

Create interpolation strings with appropriate initial capacity

ed44e2f

Should avoid relying so heavily on the optimizer realizing it can remove branches. This had somewhat complicated performance impacts in local benchmarking; we’ll see what it does on CI.
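
For context, here is a minimal sketch of the SE-0228 shape in which an interpolation is built by hand, with the literal capacity and interpolation count passed up front. That this commit sizes the initial string from exactly these values is an assumption.

```swift
// Roughly the shape used for "Hello, \(name)!" under SE-0228;
// written by hand here purely for illustration.
let name = "world"

// 8 = combined length of the literal segments "Hello, " and "!".
var interpolation = DefaultStringInterpolation(literalCapacity: 8,
                                               interpolationCount: 1)
interpolation.appendLiteral("Hello, ")
interpolation.appendInterpolation(name)
interpolation.appendLiteral("!")

let greeting = String(stringInterpolation: interpolation)
```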

Give DefaultStringInterpolation its own append paths

b52f807

This should allow us to separately control how appends are inlined for string interpolation.

Confess my (current) source and ABI stability sins

94260a9

Skip += fast path for literal segments, too

f23d1cd

beccadax force-pushed the interpolation-rework branch from e854689 to f23d1cd Compare November 1, 2018 04:35

beccadax mentioned this pull request Nov 1, 2018

Final: Implementation for SE-0228: Fix ExpressibleByStringInterpolation #20214

Merged

beccadax closed this Nov 1, 2018
