SILOptimizer: a new optimization to hoist destroys of memory locations #26803

Conversation
@swift-ci test
@swift-ci benchmark
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%. If you see any unexpected regressions, you should consider fixing the regressions before merging. Noise: sometimes the performance results (not code size!) contain false alarms.
Build failed
Build failed
@swift-ci test
Build failed
Build failed
For this review I'll need to assume that general destroy_addr hoisting and splitting is something we will want at -O. Initializing the MemoryLocation data and the global dataflow state needed to simultaneously solve for all memory sublocations in the function is nontrivial. It would be gratuitous to do that just to handle a specific switch_enum pattern that might occur somewhere in that function. But I can instead view this as an eventual replacement for DestroyHoisting in CopyForwarding--that was originally designed as a -Onone peephole, which we later decided not to enable at -Onone for fear of affecting debugging. Our intention (for the past few years) was to just replace CopyForwarding/DestroyHoisting with Opaque Values and CopyPropagation. Assuming that happens, I wasn't sure under what circumstance we would still need destroy_addr hoisting. The problem of switching over a mutable enum is at least one good example of why destroy_addr hoisting will remain useful. I have a lingering concern because it should be possible to mutate a CoW enum payload in place at -Onone. For that, maybe we'll just need to wait for "move" semantics in the language.
This approach is also very different because it aggressively splits destroys. I've always assumed that canonical SIL should have combined destroys so it's easy to discover the lifetime of aggregates.
Setting all that aside, and assuming we just want to hoist and split all destroy_addr instructions at -O, this is a beautiful way to accomplish that. It will be great to have a complete solution that doesn't depend on a particular shape of SIL. The implementation in this PR is very clean and high quality.
I like the way that MemoryLocations and MemoryDataflow can be used as utilities now, each with independent state and lifetime, without being wedged into some inheritance pattern.
On the algorithm:
In the top-level comments, there's no explanation of how aggregates and their sublocations are handled. But that's really essential to why this approach was taken, rather than a more efficient per-variable lifetime discovery. I think of this design as a "destroy splitting" approach as much as destroy hoisting. Can you explain that it splits destroys into many smaller destroy operations, and why that would usually be desirable?
use(pair.first)
use(pair.second)
// any side effect
destroy(pair)
becomes:
use(pair.first)
destroy(pair.first)
use(pair.second)
destroy(pair.second)
Isn't this potentially a code size problem?
Also, this seems contrary to most other SIL analyses where the code tries to recognize aggregate value lifetimes as a single unit. For example, I'm not sure CopyForwarding will even kick in after this hoisting (and destroy splitting) is done.
On analysis invalidation:
This pass mutates the IR while querying the MemoryDataflow and MemoryLocations. For functions that do this, there should be a note explaining why this is guaranteed safe. I'm sure you thought about it, but I didn't try to prove to myself why it's safe. When other developers modify this code or cargo-cult it, I want them to understand that this is normally something you need to think very carefully about. In particular, explain why registerProjection is necessary and sufficient (I still wouldn't be able to explain that).
On performance:
It would be interesting to know why destroy hoisting causes significant regressions, if only to understand whether there is some later optimization that currently depends on the shape of those destroys--maybe that optimization needs to be fixed.
I'm also baffled by the Onone performance gains that were reported. Seeing that, I initially assumed you had enabled this at Onone. But clearly these changes should not affect Onone, so what's going on?
On testing:
I'm guessing you already did this, but I want to point out that store splitting should be fully tested (on SCK at least) without applying the isSplittingEnabler filter.
On compile time:
I think this style of dataflow can become expensive. For compile time, it's the pathological cases that matter, not the typical case. I usually see the cost show up as time needed to (re)initialize all the per-block bitsets. It will just boil down to how many memory locations need to be tracked and how often it gets recomputed. The way our pipeline works, analyses can easily be recomputed thousands of times. I'm not too worried about recomputing lifetimes yet since it's only being used by a single function pass for now. But I'm afraid that running before SROA, and without opaque values, means tracking lots of locations for no reason.
I'm sure, initially, the cost is hidden by other gratuitously costly parts of the pipeline: type checker, ARC, RLE, DSE, LLVM. But can you determine the worst case time in SCK, or maybe count the amount of worst-case allocation? If it shows up at all, we might want to filter the address producers to avoid allocating locations that are obviously going to be SROA'd anyway, or obviously can't be optimized, like address-only types. Alternatively, if you just implement a simple location filter and determine that it then never allocates more than 64 locations, there's nothing at all to worry about.
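To make the filtering idea concrete, here is a hypothetical sketch that caps the number of tracked locations so the per-block bit vectors stay small. The predicates are placeholders standing in for the real checks, not the pass's actual API:

    #include <vector>

    // Placeholder predicates, e.g. "is this an address-only location?" and
    // "will SROA scalarize this allocation anyway?" (illustrative only).
    static bool isAddressOnlyLoc(unsigned) { return false; }
    static bool willBeScalarized(unsigned) { return false; }

    // Cap so that the per-block bit vectors fit in a single machine word.
    static constexpr unsigned MaxTrackedLocations = 64;

    // Collect only the locations worth tracking; give up beyond the cap.
    static bool selectLocations(unsigned numLocs,
                                std::vector<unsigned> &tracked) {
      for (unsigned i = 0; i < numLocs; ++i) {
        if (isAddressOnlyLoc(i) || willBeScalarized(i))
          continue;
        if (tracked.size() == MaxTrackedLocations)
          return false; // too many locations: skip the optimization
        tracked.push_back(i);
      }
      return true;
    }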
Bits usedLocs(locations.getNumLocations());

// Initialize the dataflow, which tells us which destroy_addr instructions are
// reachable through a switch_enum, without a use of the location inbetween.
in between
Please add brief comments explaining how this problem maps to the forward data flow: e.g.
Compute forward reaching split-enable-points. The split-enable-point is the first
program point at which it is useful to hoist the destroy. genSet is initialized
at each memory location's split-enable-point. killSet are the points
where the location is used. If a location is used on some path from
the gen-point to the destroy, then the destroy will not be hoisted
beyond that point. If the location is used on all paths, the destroy
will not be hoisted at all. If it is not used on some path, it will
be hoisted to the latest use points. Hence the use of
solveForwardWithUnion.
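As a reader's aid, a minimal, self-contained sketch of a forward bit-vector dataflow with a union join, in the spirit of what solveForwardWithUnion does. The Block type and every name below are illustrative, not the pass's actual API; gen/kill are assumed to be pre-sized to the number of locations:

    #include <llvm/ADT/BitVector.h>
    #include <vector>

    struct Block {
      std::vector<int> preds;      // indices of predecessor blocks
      llvm::BitVector gen, kill;   // gen: split-enable points; kill: uses
      llvm::BitVector entry, exit; // solved: fact reaches on SOME path
    };

    void solveForwardWithUnion(std::vector<Block> &blocks, unsigned numLocs) {
      for (auto &b : blocks) {
        b.entry.resize(numLocs);
        b.exit.resize(numLocs);
      }
      // Iterate to a fixpoint (reverse post-order would converge fastest).
      bool changed = true;
      while (changed) {
        changed = false;
        for (auto &b : blocks) {
          // Union join: a fact reaches the block entry if it reaches the
          // exit of ANY predecessor.
          llvm::BitVector entry(numLocs);
          for (int p : b.preds)
            entry |= blocks[p].exit;
          // Transfer: a use kills the fact; a split-enable point generates it.
          llvm::BitVector exit(entry);
          exit.reset(b.kill);
          exit |= b.gen;
          if (entry != b.entry || exit != b.exit) {
            b.entry = std::move(entry);
            b.exit = std::move(exit);
            changed = true;
          }
        }
      }
    }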
// This enables the switch-enum optimization (see above).
// TODO: investigate the benchmark regressions and enable store-splitting more
// aggressively.
void DestroyHoisting::splitStores(MemoryDataflow &dataFlow) {
To me, splitting a store means breaking it into multiple narrower stores. This can be called either decomposition or expansion, take your pick.
// destroy_addr %a
// store %s to [init] %a
//
// This is not a good thing in general. If we would do it unconditionally, it
I'm very curious why it's bad to split the stores. I'm guessing you are too and just haven't investigated yet...
}

// Initialize the dataflow for moving destroys up the control flow.
void DestroyHoisting::initDataflow(MemoryDataflow &dataFlow) {
Again, I'd like a brief comment somewhere that maps the problem to the dataflow.
Compute backward reaching destroys. genSet is initialized at the
destroy. killSet are the points where locations are used.
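To make that mapping concrete as well: the backward problem is symmetric to the forward sketch above. Reusing the same illustrative Block type (again, not the pass's actual API), the per-block step would look roughly like:

    // Backward step: which destroys reach this point against the control
    // flow. The join is over successors instead of predecessors.
    void backwardStep(std::vector<Block> &blocks, Block &b,
                      const std::vector<int> &succs, unsigned numLocs) {
      llvm::BitVector exit(numLocs);
      for (int s : succs)
        exit |= blocks[s].entry; // join over successor entries
      b.exit = exit;
      b.entry = exit;
      b.entry.reset(b.kill);     // a use of the location kills the fact
      b.entry |= b.gen;          // the destroy itself generates the fact
    }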
// Handling blocks which eventually end up in an unreachable is surprisingly
// complicated. The conservative approach would be to assume all locations are
// dead at the unreachable. But that would block release hoisting whenever there
// is a unreachable block in the way (which is pretty common for calling
I'm confused. Do you mean that we actually want to hoist real destroy_addr's that are on the unreachable path? So, if you pretend there's a destroy at the exit, will that somehow prevent hoisting of the real destroy? I don't know why that would happen. Could you clarify this in the comments?
if (!dataFlow.getState(singlePred)->exitReachable)
  return false;

// Check if none of the locations are touched in the unreachable-block.
This comment illustrates why I prefer per-variable, path discovery data flow...
// point of \p builder.
SILValue DestroyHoisting::createAddress(unsigned locIdx, SILBuilder &builder) {
  auto *loc = locations.getLocation(locIdx);
  if (loc->parentIdx < 0)
It might be worthwhile to have a MemoryLocationIndex wrapper so we don't need to guess what it means for a location index to be negative.
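For illustration, something along these lines (a hypothetical type, not existing code in this PR):

    #include <cassert>

    // Wrapper making the -1 sentinel explicit instead of a bare int.
    struct MemoryLocationIndex {
      int idx = -1; // -1 encodes "no parent location"

      bool hasParent() const { return idx >= 0; }
      unsigned get() const {
        assert(hasParent() && "no parent location");
        return static_cast<unsigned>(idx);
      }
    };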
// Recursively create (or get) the address of the parent location.
SILValue baseAddr = createAddress(loc->parentIdx, builder);
auto *projInst = cast<SingleValueInstruction>(loc->representativeValue);
if (!isa<BeginAccessInst>(loc->representativeValue) &&
Why can't you handle beginAccess addresses? No matter where the address came from, the code always assumes that all address users are within the access scope.
domTree = DA->get(function);

// Recursively create (or get) the address of the parent location.
SILValue baseAddr = createAddress(loc->parentIdx, builder);
Why is baseAddr created before checking if loc->representativeValue is available?
// The function entry block: insert all destroys which are still active.
insertDestroys(activeDestroys, activeDestroys, toRemove,
               nullptr, block);
} else if (SILBasicBlock *pred = block->getSinglePredecessorBlock()) {
Please add a note that this condition assumes no critical edges are present.
@atrick Thanks for reviewing!

First of all: this optimization does not split or combine destroys. It only moves destroys. I will add a comment explaining this.

Analysis invalidation: I'll add comments.

Onone performance: It's like with all other optimizations which run with -O/-Osize: the -Onone benchmarks mainly call generic code in the (optimized) stdlib. So optimization differences in the stdlib will be reflected in -Onone performance. I investigated some of the performance regressions and it turned out, again, that many of them are false alarms. The code is identical, but different code alignment causes the perf differences.

Testing: yes, I tested the benchmarks with unconditional store splitting. I can also run the validation tests.

Compile time: it's a good idea to get statistics on the max size of the bitfields. I'll do that.
… files and directories. Replacing the TemporaryFile/TemporaryDirectory classes with withTemporaryFile/withTemporaryDirectory functions.

Using classes+deinit for resource management is problematic because the lifetime of classes is not guaranteed to be the lexical scope. Compiler optimizations may shrink the lifetime to the last use. One way to fix this would be to make sure that the TemporaryFile class instances are referenced as long as the managed resource (= the file or directory) is in use. That could be done e.g. with the stdlib's withExtendedLifetime function. But that would be cumbersome.

The better way to fix this is to change the API to a closure-based approach (there are several examples for this in the swift stdlib, e.g. withExtendedLifetime). It not only fixes the lifetime problem, but also makes the lifetime immediately visible in the source code. Also, it avoids the memory allocation for the class instance.

This change is triggered by a new compiler optimization: swiftlang/swift#26803. With this optimization the temporary file in the ManifestLoader.parse() function would break. Although it would be easy to work around the problem just in ManifestLoader.parse(), I decided to do the right fix, which is more future-proof.

Implementation note: I kept TemporaryFile as a struct, because it contains several useful fields. It's passed as an argument to the closure. In case of withTemporaryDirectory, only the directory path is relevant, which is now directly passed to the closure. So there is no need for a TemporaryDirectory struct.
@eeckstein I'm much more comfortable with this approach now that I see it's not splitting destroys. This is really hidden though. I read the code as if …
… more flexible. Add the possibility to solve with a custom join operator.
It causes a use-after-free problem with more aggressive destroy hoisting in the compiler. Also replace tabs with spaces in this file.
Force-pushed from 9bc68b0 to 2d447be
@swift-ci test
Build failed
@swift-ci test macOS
@swift-ci test macOS
Build failed
Build failed
Force-pushed from 2d447be to 0e08976
@swift-ci test
@swift-ci test
Build failed
@swift-ci test macOS
} // end anonymous namespace

SILTransform *swift::createDestroyHoisting() {
AFAICT this function is never called...
Edit: never mind. I see it is used in macros (with Passes.def).
DestroyHoisting moves destroys of memory locations up the control flow as far as possible.

Besides destroy_addr, a "store [assign]" is also considered a destroy, because it is equivalent to a destroy_addr plus a "store [init]".

The main purpose of this optimization is to minimize copy-on-write operations for arrays, etc., especially if such COW containers are used as enum payloads and modified in-place. E.g.

switch e {
case .A(var arr):
    arr.append(x)
    self = .A(arr)
...

In such a case, DestroyHoisting can move the destroy of the self-assignment up before the switch and thus let the array buffer be single-referenced at the time of the append.

When we have ownership SIL throughout the pass pipeline, this optimization will replace the current destroy hoisting optimization in CopyForwarding.

For now, this optimization only runs in the mandatory pipeline (but not for -Onone) where we already have ownership SIL.

https://bugs.swift.org/browse/SR-10605
rdar://problem/50463362