Skip to content

SILOptimizer: a new optimization to hoist destroys of memory locations #26803

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 28, 2019

Conversation

eeckstein
Copy link
Contributor

@eeckstein eeckstein commented Aug 23, 2019

DestroyHoisting moves destroys of memory locations up the control flow as far as possible.
Beside destroy_addr, also "store [assign]" is considered a destroy, because is is equivalent to an destroy_addr + a "store [init]".
The main purpose of this optimization is to minimize copy-on-write operations for arrays, etc. Especially if such COW containers are used as enum payloads and modified in-place. E.g.

switch e {
  case .A(var arr):
    arr.append(x)
    self = .A(arr)
...

In such a case DestroyHoisting can move the destroy of the self-assignment up before the switch and thus let the array buffer only be single-referenced at the time of the append.

When we have ownership SIL throughout the pass pipeline this optimization will replace the current destroy hoisting optimization in CopyForwarding.
For now, this optimization only runs in the mandatory pipeline (but not for -Onone) where we already have ownership SIL.

https://bugs.swift.org/browse/SR-10605
rdar://problem/50463362

@eeckstein eeckstein requested a review from atrick August 23, 2019 13:27
@eeckstein
Copy link
Contributor Author

@swift-ci test

@eeckstein
Copy link
Contributor Author

@swift-ci benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
DropWhileArray 25 35 +40.0% 0.71x
Dictionary4 154 201 +30.5% 0.77x
PrefixWhileArray 41 53 +29.3% 0.77x
Calculator 131 161 +22.9% 0.81x
Chars2 2950 3550 +20.3% 0.83x
String.replaceSubrange.RepChar 1696 1993 +17.5% 0.85x (?)
Set.isDisjoint.Int.Empty 53 61 +15.1% 0.87x
OpenClose 60 69 +15.0% 0.87x (?)
DropFirstSequence 29 33 +13.8% 0.88x (?)
DropFirstSequenceLazy 29 33 +13.8% 0.88x (?)
Dictionary4OfObjects 198 224 +13.1% 0.88x
CharacterLiteralsLarge 63 71 +12.7% 0.89x
RemoveWhereMoveInts 16 18 +12.5% 0.89x (?)
DataAppendBytesSmall 165 185 +12.1% 0.89x (?)
StringComparison_ascii 338 378 +11.8% 0.89x (?)
ProtocolDispatch 196 217 +10.7% 0.90x (?)
PointerArithmetics 21700 23900 +10.1% 0.91x
Set.isSubset.Seq.Int.Empty 119 129 +8.4% 0.92x (?)
Set.isStrictSubset.Int.Empty 52 56 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
RemoveWhereFilterStrings 372 297 -20.2% 1.25x
Set.isSuperset.Seq.Box0 239 201 -15.9% 1.19x
RemoveWhereMoveStrings 354 299 -15.5% 1.18x
PrefixWhileAnyCollectionLazy 40 34 -15.0% 1.18x (?)
DropLastAnySeqCntRangeLazy 32318 27624 -14.5% 1.17x
ObjectiveCBridgeStubDateAccess 152 130 -14.5% 1.17x
SequenceAlgosAnySequence 19200 16500 -14.1% 1.16x (?)
DropLastAnySeqCRangeIterLazy 32087 27641 -13.9% 1.16x (?)
SuffixAnySeqCntRangeLazy 31778 27448 -13.6% 1.16x
SuffixAnySeqCRangeIterLazy 32003 27703 -13.4% 1.16x
ObjectiveCBridgeFromNSArrayAnyObject 16400 14200 -13.4% 1.15x (?)
ObjectiveCBridgeStubFromNSStringRef 94 83 -11.7% 1.13x (?)
ObjectiveCBridgeStubNSDateRefAccess 196 174 -11.2% 1.13x (?)
StringEnumRawValueInitialization 400 360 -10.0% 1.11x (?)
DataAccessBytesMedium 50 45 -10.0% 1.11x
String.replaceSubrange.RepChar.Small 261 235 -10.0% 1.11x
ArrayAppendUTF16Substring 10440 9432 -9.7% 1.11x (?)
NopDeinit 31600 28600 -9.5% 1.10x (?)
ArrayAppendAsciiSubstring 10440 9468 -9.3% 1.10x (?)
ParseFloat.Float.Exp 11 10 -9.1% 1.10x (?)
SuffixAnyCollectionLazy 15768 14432 -8.5% 1.09x (?)
ObjectAllocation 101 93 -7.9% 1.09x (?)
LazilyFilteredArrayContains 22000 20300 -7.7% 1.08x (?)
ObjectiveCBridgeFromNSArrayAnyObjectToString 29200 27000 -7.5% 1.08x (?)
PrefixAnyCollectionLazy 47072 43564 -7.5% 1.08x (?)
MapReduceSequence 431 399 -7.4% 1.08x (?)
PrefixWhileAnySeqCntRange 190 176 -7.4% 1.08x (?)
DropLastAnyCollectionLazy 15639 14498 -7.3% 1.08x (?)
DropFirstAnyCollectionLazy 46895 43497 -7.2% 1.08x (?)
PrefixWhileAnySeqCRangeIter 189 176 -6.9% 1.07x

Code size: -O

Regression OLD NEW DELTA RATIO
DictionaryKeysContains.o 9962 10139 +1.8% 0.98x
ErrorHandling.o 2959 3007 +1.6% 0.98x
StringMatch.o 4726 4774 +1.0% 0.99x
 
Improvement OLD NEW DELTA RATIO
DictOfArraysToArrayOfDicts.o 27184 26192 -3.6% 1.04x
ObserverUnappliedMethod.o 5102 4926 -3.4% 1.04x
RangeReplaceableCollectionPlusDefault.o 6198 6038 -2.6% 1.03x
BinaryFloatingPointConversionFromBinaryInteger.o 25360 24805 -2.2% 1.02x
LinkedList.o 2231 2183 -2.2% 1.02x
InsertCharacter.o 5336 5224 -2.1% 1.02x
COWTree.o 12610 12346 -2.1% 1.02x
DiffingMyers.o 8523 8347 -2.1% 1.02x
NSDictionaryCastToSwift.o 1685 1653 -1.9% 1.02x
ObserverForwarderStruct.o 3554 3490 -1.8% 1.02x
RemoveWhere.o 24591 24207 -1.6% 1.02x
RomanNumbers.o 8417 8321 -1.1% 1.01x
Substring.o 18473 18265 -1.1% 1.01x
CSVParsing.o 58616 57960 -1.1% 1.01x
StringRemoveDupes.o 7483 7403 -1.1% 1.01x
SequenceAlgos.o 20103 19895 -1.0% 1.01x

Performance: -Osize

Regression OLD NEW DELTA RATIO
DropLastArrayLazy 5 9 +80.0% 0.56x
ArrayAppendSequence 680 850 +25.0% 0.80x (?)
String.replaceSubrange.RepChar 1714 1965 +14.6% 0.87x (?)
DropWhileAnySeqCRangeIter 97 111 +14.4% 0.87x
PrefixWhileAnyCollection 99 112 +13.1% 0.88x
ObjectiveCBridgeStubNSDateRefAccess 174 196 +12.6% 0.89x (?)
Set.isDisjoint.Seq.Int.Empty 49 55 +12.2% 0.89x (?)
Calculator 133 148 +11.3% 0.90x (?)
Chars2 3200 3550 +10.9% 0.90x (?)
Set.isStrictSubset.Int.Empty 51 55 +7.8% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
RemoveWhereFilterStrings 373 299 -19.8% 1.25x
Set.isSuperset.Seq.Box0 246 203 -17.5% 1.21x
RemoveWhereMoveStrings 352 292 -17.0% 1.21x
DropLastAnyCollection 32 27 -15.6% 1.19x (?)
DropLastAnySeqCntRangeLazy 32057 27482 -14.3% 1.17x
SuffixAnySeqCntRangeLazy 31893 27375 -14.2% 1.17x
SequenceAlgosAnySequence 19200 16500 -14.1% 1.16x
PrefixWhileAnySeqCRangeIterLazy 94 81 -13.8% 1.16x
PrefixWhileAnySeqCntRangeLazy 94 81 -13.8% 1.16x
PrefixAnySeqCntRange 24429 21053 -13.8% 1.16x
DropLastAnySeqCRangeIterLazy 32021 27614 -13.8% 1.16x
SuffixAnySeqCRangeIterLazy 31987 27585 -13.8% 1.16x
DropFirstAnySeqCntRange 32116 27702 -13.7% 1.16x
DropFirstAnySeqCRangeIter 32078 27686 -13.7% 1.16x
PrefixAnySeqCRangeIter 24389 21053 -13.7% 1.16x
ObjectiveCBridgeFromNSArrayAnyObject 16400 14200 -13.4% 1.15x (?)
ArrayAppendLatin1Substring 11484 10116 -11.9% 1.14x (?)
ArrayAppendUTF16Substring 11268 9972 -11.5% 1.13x (?)
String.replaceSubrange.RepChar.Small 261 234 -10.3% 1.12x (?)
ArrayAppendAsciiSubstring 11268 10116 -10.2% 1.11x (?)
StringEnumRawValueInitialization 400 360 -10.0% 1.11x (?)
Set.isDisjoint.Int.Empty 61 55 -9.8% 1.11x (?)
Set.isDisjoint.Seq.Box.Empty 99 91 -8.1% 1.09x
ObjectiveCBridgeFromNSArrayAnyObjectToString 29200 26900 -7.9% 1.09x (?)
SuffixAnyCollectionLazy 15704 14506 -7.6% 1.08x (?)
PrefixAnyCollectionLazy 46891 43511 -7.2% 1.08x (?)
CharIteration_russian_unicodeScalars_Backwards 3360 3120 -7.1% 1.08x (?)
ObjectAllocation 122 114 -6.6% 1.07x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
Queue.o 13065 14361 +9.9% 0.91x
StringMatch.o 4530 4594 +1.4% 0.99x
ErrorHandling.o 2879 2911 +1.1% 0.99x
 
Improvement OLD NEW DELTA RATIO
StringRemoveDupes.o 4769 4529 -5.0% 1.05x
InsertCharacter.o 4815 4671 -3.0% 1.03x
SortIntPyramids.o 12333 12013 -2.6% 1.03x
BinaryFloatingPointConversionFromBinaryInteger.o 25472 24853 -2.4% 1.02x
ObserverForwarderStruct.o 3706 3626 -2.2% 1.02x
ReversedCollections.o 8991 8799 -2.1% 1.02x
LinkedList.o 2248 2200 -2.1% 1.02x
RangeReplaceableCollectionPlusDefault.o 5654 5542 -2.0% 1.02x
DictOfArraysToArrayOfDicts.o 24898 24514 -1.5% 1.02x
ObserverUnappliedMethod.o 5246 5166 -1.5% 1.02x
CSVParsing.o 57940 57252 -1.2% 1.01x
DictTest3.o 13594 13450 -1.1% 1.01x

Performance: -Onone

Regression OLD NEW DELTA RATIO
ObjectiveCBridgeStubDateMutation 348 457 +31.3% 0.76x
ArrayAppendReserved 2190 2700 +23.3% 0.81x
ArrayAppend 2440 2960 +21.3% 0.82x (?)
DataSubscriptMedium 45 54 +20.0% 0.83x
BinaryFloatingPointPropertiesNextUp 83 98 +18.1% 0.85x
DataCreateEmpty 650 760 +16.9% 0.86x (?)
BinaryFloatingPointPropertiesBinade 50 58 +16.0% 0.86x
String.replaceSubrange.RepChar 1773 2036 +14.8% 0.87x (?)
Memset 7217 8196 +13.6% 0.88x
PrefixCountableRange 287 323 +12.5% 0.89x
DropFirstCountableRange 287 323 +12.5% 0.89x
ObjectiveCBridgeStubToNSDateRef 2780 3100 +11.5% 0.90x (?)
SuffixCountableRange 99 110 +11.1% 0.90x
DropLastCountableRange 100 110 +10.0% 0.91x (?)
ObjectiveCBridgeStringHash 43 47 +9.3% 0.91x (?)
DataSubscriptSmall 69 75 +8.7% 0.92x (?)
ArraySetElement 869 942 +8.4% 0.92x
BinaryFloatingPointPropertiesUlp 87 94 +8.0% 0.93x (?)
Sim2DArray 6168 6662 +8.0% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NSStringConversion.Long 3611 2688 -25.6% 1.34x
NSStringConversion.Medium 1400 1088 -22.3% 1.29x
NSStringConversion.LongUTF8 1349 1081 -19.9% 1.25x
Dict.FilterAllMatch.24k 2915 2369 -18.7% 1.23x
ArrayAppendRepeatCol 241480 196670 -18.6% 1.23x
Data.init.Sequence.64kB.Count.RE 19861 16234 -18.3% 1.22x
Data.init.Sequence.809B.Count.RE 24592 20108 -18.2% 1.22x
Dict.FilterAllMatch.20k 2502 2048 -18.1% 1.22x
Data.append.Sequence.64kB.Count.RE 19850 16265 -18.1% 1.22x
Data.append.Sequence.64kB.Count.RE.I 19806 16234 -18.0% 1.22x
Data.append.Sequence.809B.Count.RE.I 24609 20179 -18.0% 1.22x
Data.append.Sequence.809B.Count.RE 24533 20123 -18.0% 1.22x
Data.init.Sequence.64kB.Count.RE.I 19814 16268 -17.9% 1.22x
Data.init.Sequence.809B.Count.RE.I 24587 20196 -17.9% 1.22x
Data.init.Sequence.64kB.Count0.RE.I 19754 16235 -17.8% 1.22x
Data.init.Sequence.809B.Count0.RE.I 24659 20271 -17.8% 1.22x
Data.append.Sequence.64kB.Count0.RE 19714 16214 -17.8% 1.22x
Data.init.Sequence.64kB.Count0.RE 19692 16222 -17.6% 1.21x
DataAppendSequence 2444600 2015500 -17.6% 1.21x
Data.append.Sequence.64kB.Count0.RE.I 19683 16251 -17.4% 1.21x
Data.append.Sequence.809B.Count0.RE.I 24479 20251 -17.3% 1.21x
StringBuilderWithLongSubstring 2030 1680 -17.2% 1.21x (?)
SetSubtractingInt0 250 207 -17.2% 1.21x
Dict.FilterAllMatch.16k 2125 1762 -17.1% 1.21x
Data.init.Sequence.809B.Count0.RE 24513 20342 -17.0% 1.21x
Data.append.Sequence.809B.Count0.RE 24528 20366 -17.0% 1.20x
StringMatch 35300 29600 -16.1% 1.19x
SuffixArrayLazy 26851 22520 -16.1% 1.19x
Dict.FilterAllMatch.28k 3955 3333 -15.7% 1.19x (?)
DropLastArrayLazy 26708 22539 -15.6% 1.18x
DropWhileAnyCollection 32469 27412 -15.6% 1.18x
PrefixWhileAnyCollection 47849 40445 -15.5% 1.18x
PrefixWhileCountableRange 23533 19903 -15.4% 1.18x
DropFirstAnySeqCRangeIterLazy 32757 27750 -15.3% 1.18x
PrefixAnyCollection 24545 20814 -15.2% 1.18x
SuffixAnySeqCRangeIter 36825 31228 -15.2% 1.18x
PrefixAnySeqCRangeIterLazy 24606 20874 -15.2% 1.18x (?)
SetSubtractingInt25 317 269 -15.1% 1.18x
SequenceAlgosRange 1976130 1678200 -15.1% 1.18x
DropWhileAnySeqCntRange 35054 29785 -15.0% 1.18x
DropFirstAnySeqCRangeIter 32744 27829 -15.0% 1.18x
SuffixAnySeqCntRange 36485 31027 -15.0% 1.18x
DropFirstAnySeqCntRangeLazy 32844 27936 -14.9% 1.18x
SequenceAlgosAnySequence 20100 17100 -14.9% 1.18x
DropFirstAnyCollection 24559 20896 -14.9% 1.18x
PrefixArrayLazy 79975 68051 -14.9% 1.18x (?)
DropFirstAnySeqCntRange 32723 27858 -14.9% 1.17x
PrefixAnySeqCRangeIter 24848 21157 -14.9% 1.17x
DropWhileCountableRange 8049 6854 -14.8% 1.17x (?)
DropLastAnyCollection 8192 6980 -14.8% 1.17x
PrefixAnySeqCntRangeLazy 24507 20898 -14.7% 1.17x
SuffixAnySeqCntRangeLazy 36708 31307 -14.7% 1.17x
SuffixAnySeqCRangeIterLazy 37090 31664 -14.6% 1.17x
DropWhileCountableRangeLazy 33273 28406 -14.6% 1.17x
DropWhileAnySeqCntRangeLazy 33726 28794 -14.6% 1.17x
DropWhileAnySeqCRangeIter 34856 29779 -14.6% 1.17x
LazilyFilteredRange 822850 703180 -14.5% 1.17x
DropLastAnySeqCntRange 37555 32096 -14.5% 1.17x
RandomIntegersLCG 31153 26627 -14.5% 1.17x
SuffixAnyCollection 8168 6992 -14.4% 1.17x
DropLastAnySeqCntRangeLazy 37776 32354 -14.4% 1.17x
SuffixCountableRangeLazy 29171 24992 -14.3% 1.17x
DropLastCountableRangeLazy 29016 24896 -14.2% 1.17x
DropFirstCountableRangeLazy 87140 74767 -14.2% 1.17x
DropLastAnySeqCRangeIter 37569 32253 -14.1% 1.16x
PrefixWhileAnySeqCRangeIterLazy 26532 22782 -14.1% 1.16x
DropWhileAnyCollectionLazy 33591 28850 -14.1% 1.16x
DropWhileAnySeqCRangeIterLazy 33568 28837 -14.1% 1.16x
DropFirstArrayLazy 79637 68439 -14.1% 1.16x
PrefixAnySeqCntRange 24881 21389 -14.0% 1.16x
PrefixWhileAnySeqCntRangeLazy 26407 22794 -13.7% 1.16x
DropLastAnySeqCRangeIterLazy 37731 32596 -13.6% 1.16x
PrefixWhileCountableRangeLazy 25988 22476 -13.5% 1.16x
PrefixWhileAnyCollectionLazy 26449 22891 -13.5% 1.16x
RandomDoubleLCG 39298 34026 -13.4% 1.15x
PrefixWhileAnySeqCntRange 37555 32564 -13.3% 1.15x (?)
NSStringConversion.UTF8 1732 1505 -13.1% 1.15x
PrefixCountableRangeLazy 87709 76412 -12.9% 1.15x
StringUTF16SubstringBuilder 11520 10060 -12.7% 1.15x (?)
PrefixWhileAnySeqCRangeIter 37253 32590 -12.5% 1.14x
ObjectiveCBridgeFromNSArrayAnyObject 17700 15500 -12.4% 1.14x (?)
DropLastAnyCollectionLazy 47088 41264 -12.4% 1.14x
SuffixAnyCollectionLazy 46734 40996 -12.3% 1.14x
PrefixAnyCollectionLazy 141682 124386 -12.2% 1.14x
SetSubtractingInt50 353 312 -11.6% 1.13x
DropFirstAnyCollectionLazy 140899 124827 -11.4% 1.13x (?)
DictionaryGroup 5280 4698 -11.0% 1.12x (?)
String.replaceSubrange.RepChar.Small 269 240 -10.8% 1.12x (?)
SetSubtractingInt100 416 375 -9.9% 1.11x (?)
CSVParsingAltIndices2 7392 6666 -9.8% 1.11x (?)
Set.subtracting.Seq.Int0 4759 4298 -9.7% 1.11x (?)
RemoveWhereFilterStrings 2844 2571 -9.6% 1.11x (?)
RomanNumbers2 7120 6462 -9.2% 1.10x (?)
Set.isStrictSuperset.Seq.Box0 70692 64159 -9.2% 1.10x
Set.isStrictSuperset.Seq.Box25 7149 6489 -9.2% 1.10x
Set.isSubset.Seq.Box0 7118 6463 -9.2% 1.10x
RemoveWhereQuadraticInts 8728 7926 -9.2% 1.10x (?)
Set.isStrictSubset.Seq.Box0 7117 6468 -9.1% 1.10x
Set.isSuperset.Seq.Box0 1629 1484 -8.9% 1.10x (?)
Set.isSuperset.Seq.Int.Empty 1312 1196 -8.8% 1.10x (?)
ParseInt.Large.Binary 8516 7774 -8.7% 1.10x
CSVParsing.Scalar 495 452 -8.7% 1.10x (?)
Set.isStrictSubset.Seq.Box25 7605 6950 -8.6% 1.09x
Set.subtracting.Seq.Int25 5016 4584 -8.6% 1.09x
ParseInt.Large.UncommonRadix 3361 3072 -8.6% 1.09x (?)
ParseInt.Large.Hex 2674 2446 -8.5% 1.09x
Set.isSuperset.Seq.Int0 1400 1281 -8.5% 1.09x (?)
RemoveWhereQuadraticStrings 9757 8929 -8.5% 1.09x (?)
ParseInt.Large.Decimal 2914 2667 -8.5% 1.09x (?)
Set.isSubset.Seq.Box25 7613 6970 -8.4% 1.09x
Set.subtracting.Seq.Box0 6135 5617 -8.4% 1.09x
RandomDoubleDef 55100 50500 -8.3% 1.09x (?)
Set.subtracting.Seq.Int50 5187 4754 -8.3% 1.09x (?)
SumUsingReduceInto 202010 185163 -8.3% 1.09x (?)
CSVParsing.UTF16 450 413 -8.2% 1.09x (?)
ObjectiveCBridgeFromNSArrayAnyObjectToString 30600 28100 -8.2% 1.09x (?)
CSVParsing.UTF8 482 443 -8.1% 1.09x (?)
SuffixArray 3777 3476 -8.0% 1.09x (?)
PrefixArray 11355 10452 -8.0% 1.09x (?)
Set.subtracting.Seq.Int100 5620 5184 -7.8% 1.08x (?)
ReversedArray2 19340 17849 -7.7% 1.08x (?)
ParseInt.Small.Binary 16859 15575 -7.6% 1.08x
RemoveWhereQuadraticString 2726 2519 -7.6% 1.08x (?)
DropFirstArray 11349 10496 -7.5% 1.08x (?)
DropLastArray 3768 3487 -7.5% 1.08x (?)
Set.subtracting.Seq.Box25 6940 6424 -7.4% 1.08x
StrToInt 42140 39010 -7.4% 1.08x
StringEdits 125700 116700 -7.2% 1.08x (?)
SequenceAlgosContiguousArray 947360 879610 -7.2% 1.08x (?)
Set.isDisjoint.Seq.Int50 2226 2067 -7.1% 1.08x (?)
Set.isSuperset.Seq.Int50 2175 2020 -7.1% 1.08x (?)
Set.isSuperset.Seq.Int100 4343 4034 -7.1% 1.08x (?)
ParseInt.Small.Hex 9501 8829 -7.1% 1.08x (?)
Set.isDisjoint.Seq.Int25 3307 3076 -7.0% 1.08x (?)
Set.intersection.Seq.Box0 1807 1681 -7.0% 1.07x (?)
SequenceAlgosArray 1005420 935500 -7.0% 1.07x (?)
DictionaryCompactMapValuesOfCastValue 41904 39042 -6.8% 1.07x (?)
ParseInt.Small.Decimal 9635 8978 -6.8% 1.07x (?)
Set.intersection.Seq.Box25 2139 1994 -6.8% 1.07x (?)
SumUsingReduce 203892 190108 -6.8% 1.07x (?)
Breadcrumbs.MutatedUTF16ToIdx.ASCII 15 14 -6.7% 1.07x (?)
Breadcrumbs.UTF16ToIdxRange.longASCII 241 225 -6.6% 1.07x (?)
UTF8Decode 36951 34506 -6.6% 1.07x (?)

Code size: -swiftlibs

Regression OLD NEW DELTA RATIO
libswiftStdlibUnittest.dylib 356352 368640 +3.4% 0.97x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 9bc68b0e551e6f279d6ba5dc9cf2a6cf5d80d073

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 9bc68b0e551e6f279d6ba5dc9cf2a6cf5d80d073

@eeckstein
Copy link
Contributor Author

@swift-ci test

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 9bc68b0e551e6f279d6ba5dc9cf2a6cf5d80d073

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 9bc68b0e551e6f279d6ba5dc9cf2a6cf5d80d073

@eeckstein eeckstein requested a review from gottesmm August 26, 2019 09:11
Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this review I'll need to assume that general destroy_addr hoisting and splitting is something we will want at -O. Initializing the MemoryLocation data and global dataflow state needed simultanesouly solve for all memory sublocations in the function is nontrivial. It would be gratuitous to do that just to handle a specific switch_enum pattern that might occur somewhere in that function. But I can instead view this as an eventual replacement for DestroyHoisting in CopyForwarding--that was originally designed as a -Onone peephole, which we later decided not to enable at -Onone for fear of affecting debugging. Our intention (for the past few years) was to just replace CopyForwarding/DestroyHoisting with Opaque Values and CopyPropagation. Assuming that happens, I wasn't sure under what circumstance we would still need destroy_addr hoisting. The problem of switching over a mutable enum is at least one good example of why destroy_addr hoisting will remain useful. I have a lingering concern because it should be possible to mutate a CoW enum payload in place at -Onone. For that, maybe we'll just need to wait for "move" semantics in the language.

This approach is also very different because it aggresively splits destroys. I've always assumed that canonical SIL should have combined destroys so it's easy to discover the lifetime of aggregates.

Setting all that aside, and assuming we just want to hoist and split all destroy_addr instructions at -O, this is a beautiful way to accomplish that. It will be great to have a complete solution that doesn't depend on a particular shape of SIL. The implementation in this PR is very clean and high quality.

I like the way that MemoryLocations and MemoryDataflow can be used as utilities now, each with independent state and lifetime, without being wedged into some inheritance pattern.

On the algorithm:

In the top-level comments, there's no explanation of how aggregates and their sublocations are handled. But that's really essential to why this approach was taken, rather than a more efficient per-variable lifetime discovery. I think of this design as a "destroy splitting" approach as much as destroy hoisting. Can you explain that it splits destroys into many smaller destroy operations, and why that would usually be desirable?

use(pair.first)
use(pair.second)
// any side effect
destroy(pair)

becomes:

use(pair.first)
destroy(pair.first)
use(pair.second)
destroy(pair.second)

Isn't this potentially a code size problem?

Also, this seems contrary to most other SIL analyses where the code tries to recognize aggregate value lifetimes as a single unit. For example, I'm not sure CopyForwarding will even kick in after this hoisting (and destroy splitting) is done.

On analysis invalidation:

This pass mutates the IR while querying the MemoryDataflow and MemoryLocations. For functions that do this, there should be a note explaining why this is guaranteed safe. I'm sure you thought about it, but I didn't try to prove to myself why it's safe. When other developers modify this code or cargo cult it, I want them to understand that this is normally something you need to think very carefully about. In particular explain why registerProjection is necessary and sufficient (I still wouldn't be able to explain that).

On performance:

It would be interesting to know why destroy hoisting causes significant regressions only to understand if there is some later optimization that currently depends on the shape of those destroys--maybe that optimization needs to be fixed.

I'm also baffled by the Onone performance gains that were reported. Seeing that, I initially assumed you had enabled this at Onone. But clearly these changes should not affect Onone, so what's going on?

On testing:

I'm guessing you already did this, but want to point out that store splitting should be fully tested (on SCK at least) without applying the isSplittingEnabler filter.

On compile time:

I think this style of dataflow can become expensive. For compile time, it's the pathological cases that matter, not the typical case. I usually see the cost show up as time needed to (re)initialize all the per-block bitsets. It will just boil down to how many memory locations need to be tracked and how often it gets recomputed. The way our pipeline works, analyses can easily be recomputed thousands of times. I'm not too worried about recomputing lifetimes yet since it's only being used a single function pass for now. But I'm afraid that running before SROA, and without opaque values means tracking lots of locations for no reason.

I'm sure, initially, the cost is hidden by other gratuitously costly parts of the pipeline: type checker, ARC, RLE, DSE, LLVM. But can you determine the worst case time in SCK, or maybe count the amount of worst-case allocation? If it shows up at all, we might want to filter the address producers to avoid allocating locations that are obviously going to be SROA'd anyway, or obviously can't be optimized, like address-only types. Alternatively, if you just implement a simple location filter and determine that it then never allocates more than 64 locations, there's nothing at all to worry about.

Bits usedLocs(locations.getNumLocations());

// Initialize the dataflow, which tells us which destroy_addr instructions are
// reachable through a switch_enum, without a use of the location inbetween.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in between

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add brief comments explaining how this problem maps to the foward data flow: e.g.

Compute forward reaching split-enable-points. The split-enable-point is the first
program point at which it is useful to hoist the destroy. genSet is initialized
at each memory location's split-enable-point. killSet are the points
where the location is used. If a location is used on some path from
the gen-point to the destroy, then the destroy will not be hoisted
beyond that point. If the location is used on all paths, the destroy
will not be hoisted at all. If it is not used on some path, it will
be hoisted to the latest use points. Hence the use of
solveForwardWithUnion.

// This enables the switch-enum optimization (see above).
// TODO: investigate the benchmark regressions and enable store-splitting more
// aggressively.
void DestroyHoisting::splitStores(MemoryDataflow &dataFlow) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, splitting a store means breaking it into multiple narrower stores. This can be called either decomposition or expansion, take your pick.

// destroy_addr %a
// store %s to [init] %a
//
// This is not a good thing in general. If we would do it unconditionally, it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very curious why it's bad to split the stores. I'm guessing you are too and just haven't investigated yet...

}

// Initialize the dataflow for moving destroys up the control flow.
void DestroyHoisting::initDataflow(MemoryDataflow &dataFlow) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'd like a brief comment somewhere that maps the problem to the dataflow.

Compute backward reaching destroys. genSet is initialized at the
destroy. killSet are the points where locations are used.

// Handling blocks which eventually end up in an unreachable is surprisingly
// complicated. The conservative approach would be to assume all locations are
// dead at the unreachable. But that would block release hoisting whenever there
// is a unreachable block in the way (which is pretty common for calling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. Do you mean that we actually want to hoist real destroy_addr's that are on the unreachable path? So, if you pretend there's a destroy at the exit that will somehow prevent hoisting of the real destroy? I don't know what that would happen. Could you clarify this in the comments?

if (!dataFlow.getState(singlePred)->exitReachable)
return false;

// Check if none of the locations are touched in the unreachable-block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment illustrates why I prefer per-variable, path discovery data flow...

// point of \p builder.
SILValue DestroyHoisting::createAddress(unsigned locIdx, SILBuilder &builder) {
auto *loc = locations.getLocation(locIdx);
if (loc->parentIdx < 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worthwhile to have a MemoryLocationIndex wrapper so we don't need to guess what it means for a location to be negative.

// Recursively create (or get) the address of the parent location.
SILValue baseAddr = createAddress(loc->parentIdx, builder);
auto *projInst = cast<SingleValueInstruction>(loc->representativeValue);
if (!isa<BeginAccessInst>(loc->representativeValue) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you handle beginAccess addresses? No matter where the address came from, the code always assumes that all address users are within the access scope.

domTree = DA->get(function);

// Recursively create (or get) the address of the parent location.
SILValue baseAddr = createAddress(loc->parentIdx, builder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is baseAddr created before checking if loc->representativeValue is available?

// The function entry block: insert all destroys which are still active.
insertDestroys(activeDestroys, activeDestroys, toRemove,
nullptr, block);
} else if (SILBasicBlock *pred = block->getSinglePredecessorBlock()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a note that this condition assumes no critical edges are present.

@eeckstein
Copy link
Contributor Author

@atrick Thanks for reviewing!
I will address your specific comments on the code in a new version (I also have to fix a bug).
Here are some answers to your top-level comments:

First of all: this optimization does not split or combine destroys. It only moves destroys. I will add a comment explaining this.

analysis invalidation: I'll add comments

Onone performance: It's like with all other optimizations which run with -O/-Osize: the -Onone benchmarks mainly call generic code in the (optimized) stdlib. So optimization differences in the stdlib will be reflected in -Onone performance.

I investigated some of the performance regressions and it turned out again, that many of them are false alarms. The code is identical, but different code alignment causes the perf differences.

testing: yes, I tested the benchmarks with unconditional store splitting. I can also run the validation tests.

compile time: it's a good idea to get statistics on the max size of the bitfields. I'll do that.

eeckstein added a commit to eeckstein/swift-package-manager that referenced this pull request Aug 27, 2019
… files and directories.

Replacing the TemporaryFile/TemporaryDirectory classes with withTemporaryFile/withTemporaryDirectory functions.

Using classes+deinit for resource management is problematic because the lifetime of classes is not guaranteed to be the lexical scope. Compiler optimizations may shrink the lifetime to the last use. One way to fix this is would be to make sure that the TemporaryFile class instances are referenced as long as the managed resource (= the file or directory) is in use. That could be done e.g. with stdlib's withExtendedLifetime function. But that would be cumbersome.

The better way to fix this is to change the ABI to a closure based approach (there are several examples for this in the swift stdlib, e.g. withExtendedLifetime). It does not only fix the lifetime problem, but also makes the lifetime it immediately visible in the source code. Also, it avoids the memory allocation for allocating the class instance.

This change is triggered by a new compiler optimization: swiftlang/swift#26803. With this optimization the temporary file in the ManifestLoader.parse() function would break.
Although it would be easy to workaround the problem just in ManifestLoader.parse(), I decided to do the right fix, which is more future prove.

Implementation note: I kept the TemporaryFile as a struct, because it contains several useful fields. It's passed as argument to the closure. In case of withTemporaryDirectory, only the directory path is relevant, which is now directly passed to the closure. So no need of a TemporaryDirectory struct.
@atrick
Copy link
Contributor

atrick commented Aug 27, 2019

@eeckstein I'm much more comfortable with this approach now that I see it's not splitting destroys. This is really hidden though. I read the code as if getUsedLocationsOfInst did what it said and only reported locations used by that instruction. I see that under the hood, it's calling getUsedLocationsOfAddr which very sneakily calls loc->selfAndParents. There should be at least a prominent comment on getUsedLocationsOfInst since it isn't clear from the function body. It's also helpful to explain this in the top-level description of the algorithm since it's really central to what this pass does.

… more flexible.

Add the possibility to solve with a custom join operator.
It causes a use-after-free problem with more aggressive destroy hoisting in the compiler.
Also replace tabs with spaces in this file.
@eeckstein
Copy link
Contributor Author

@swift-ci test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 2d447be565c7253d398930a20977629bf176820b

@eeckstein
Copy link
Contributor Author

@swift-ci test macOS

1 similar comment
@eeckstein
Copy link
Contributor Author

@swift-ci test macOS

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 2d447be565c7253d398930a20977629bf176820b

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 2d447be565c7253d398930a20977629bf176820b

DestroyHoisting moves destroys of memory locations up the control flow as far as possible.
Beside destroy_addr, also "store [assign]" is considered a destroy, because is is equivalent to an destroy_addr + a "store [init]".
The main purpose of this optimization is to minimize copy-on-write operations for arrays, etc. Especially if such COW containers  are used as enum payloads and modified in-place. E.g.

switch e {
  case .A(var arr):
    arr.append(x)
    self = .A(arr)
...

In such a case DestroyHoisting can move the destroy of the self-assignment up before the switch and thus let the array buffer only be single-referenced at the time of the append.

When we have ownership SIL throughout the pass pipeline this optimization will replace the current destroy hoisting optimization in CopyForwarding.
For now, this optimization only runs in the mandatory pipeline (but not for -Onone) where we already have ownership SIL.

SR-10605
rdar://problem/50463362
@eeckstein
Copy link
Contributor Author

@swift-ci test

@eeckstein
Copy link
Contributor Author

@swift-ci test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 0e08976

@eeckstein
Copy link
Contributor Author

@swift-ci test macOS

@eeckstein eeckstein merged commit 8ea5df1 into swiftlang:master Aug 28, 2019
@eeckstein eeckstein deleted the destroy-hoisting branch August 28, 2019 17:49
aemino pushed a commit to aemino/swift-package-manager that referenced this pull request Sep 14, 2019
… files and directories.

Replacing the TemporaryFile/TemporaryDirectory classes with withTemporaryFile/withTemporaryDirectory functions.

Using classes+deinit for resource management is problematic because the lifetime of classes is not guaranteed to be the lexical scope. Compiler optimizations may shrink the lifetime to the last use. One way to fix this is would be to make sure that the TemporaryFile class instances are referenced as long as the managed resource (= the file or directory) is in use. That could be done e.g. with stdlib's withExtendedLifetime function. But that would be cumbersome.

The better way to fix this is to change the ABI to a closure based approach (there are several examples for this in the swift stdlib, e.g. withExtendedLifetime). It does not only fix the lifetime problem, but also makes the lifetime it immediately visible in the source code. Also, it avoids the memory allocation for allocating the class instance.

This change is triggered by a new compiler optimization: swiftlang/swift#26803. With this optimization the temporary file in the ManifestLoader.parse() function would break.
Although it would be easy to workaround the problem just in ManifestLoader.parse(), I decided to do the right fix, which is more future prove.

Implementation note: I kept the TemporaryFile as a struct, because it contains several useful fields. It's passed as argument to the closure. In case of withTemporaryDirectory, only the directory path is relevant, which is now directly passed to the closure. So no need of a TemporaryDirectory struct.

} // end anonymous namespace

SILTransform *swift::createDestroyHoisting() {
Copy link
Contributor

@zoecarver zoecarver Nov 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT this function is never called...

Edit: nevermind. I see it is used in macros (with Passes.def).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants