Skip to content

SILOptimizer: A new "TempLValueOpt" optimization for copy_addr #32428

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 22, 2020

Conversation

eeckstein
Copy link
Contributor

Optimizes copies from a temporary (an "l-value") to a destination.

    %temp = alloc_stack $Ty
    instructions_which_store_to %temp
    copy_addr [take] %temp to [initialization] %destination
    dealloc_stack %temp

is optimized to

    instructions_which_store_to %destination

The name tempLValueOpt refers to the TempRValueOpt pass, which performs a related transformation, just with the temporary on the "right" side.
The tempLValueOpt is similar to CopyForwarding::backwardPropagateCopy.
It's more restricted (e.g. the copy-source must be an alloc_stack, the copy-destination must be initialized).
That enables other patterns to be optimized, which backwardPropagateCopy cannot handle.

This PR also contains some other small improvements related to copy_addr and other address instructions. For details see the commit messages.

The effect of this optimizations is that code with address-only types (e.g. generic functions) has significantly less copy instructions, especially initializations of address-only enums (e.g. optionals).
This is a benefit for performance and code size and can also reduce the amount of generic metadata which is generated at runtime.

@eeckstein eeckstein requested a review from atrick June 17, 2020 10:57
@eeckstein
Copy link
Contributor Author

@swift-ci test

@eeckstein
Copy link
Contributor Author

@swift-ci benchmark

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - b48a8544e1824f4f2dac7b6cae1df0352baa2bc3

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - b48a8544e1824f4f2dac7b6cae1df0352baa2bc3

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 139 167 +20.1% 0.83x (?)
UTF8Decode_InitFromCustom_contiguous 140 168 +20.0% 0.83x
ObjectiveCBridgeStubDateAccess 130 152 +16.9% 0.86x
Data.init.Sequence.2047B.Count.I 54 63 +16.7% 0.86x
Data.hash.Empty 45 52 +15.6% 0.87x (?)
Dictionary4 195 224 +14.9% 0.87x
Data.init.Sequence.2049B.Count.I 55 62 +12.7% 0.89x
IterateData 840 945 +12.5% 0.89x (?)
Dictionary4OfObjects 224 252 +12.5% 0.89x
NSStringConversion.LongUTF8 298 321 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
Data.append.Sequence.64kB.Count.RE.I 44 21 -52.3% 2.10x
Data.append.Sequence.64kB.Count.RE 44 21 -52.3% 2.10x
AnyHashableWithAClass 84000 52500 -37.5% 1.60x
Data.append.Sequence.809B.Count.RE 88 58 -34.1% 1.52x
DataAppendSequence 8800 6100 -30.7% 1.44x
DataCreateEmptyArray 1150 800 -30.4% 1.44x
Data.append.Sequence.809B.Count.RE.I 90 63 -30.0% 1.43x
ObserverForwarderStruct 715 505 -29.4% 1.42x (?)
ObserverUnappliedMethod 960 700 -27.1% 1.37x
StringEnumRawValueInitialization 460 360 -21.7% 1.28x
String.replaceSubrange.RepChar 2049 1652 -19.4% 1.24x
ArrayLiteral2 81 67 -17.3% 1.21x (?)
Set.isSuperset.Seq.Empty.Int 50 42 -16.0% 1.19x
DataCreateSmallArray 2150 1850 -14.0% 1.16x
ObjectiveCBridgeFromNSArrayAnyObject 14000 12300 -12.1% 1.14x (?)
Set.isStrictSubset.Int.Empty 52 46 -11.5% 1.13x (?)
DictionaryOfAnyHashableStrings_lookup 2784 2472 -11.2% 1.13x
Set.isStrictSubset.Seq.Int.Empty 126 112 -11.1% 1.12x
Set.subtracting.Empty.Box 9 8 -11.1% 1.12x
Diffing.Pangrams 3205 2849 -11.1% 1.12x (?)
Set.isDisjoint.Seq.Box.Empty 90 81 -10.0% 1.11x (?)
Set.isDisjoint.Seq.Int.Empty 52 47 -9.6% 1.11x (?)
Set.isSubset.Seq.Int.Empty 126 114 -9.5% 1.11x (?)
ObjectiveCBridgeFromNSSetAnyObject 25200 23000 -8.7% 1.10x (?)
ObjectiveCBridgeFromNSDictionaryAnyObject 24800 22700 -8.5% 1.09x (?)
ObjectiveCBridgeFromNSArrayAnyObjectToString 25700 23600 -8.2% 1.09x (?)
Set.isStrictSuperset.Seq.Empty.Int 172 158 -8.1% 1.09x (?)
SortLargeExistentials 14900 13800 -7.4% 1.08x (?)
ObjectiveCBridgeFromNSArrayAnyObjectToStringForced 25200 23400 -7.1% 1.08x (?)
DictionaryOfAnyHashableStrings_insert 3220 2996 -7.0% 1.07x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 209 195 -6.7% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
SortIntPyramids.o 8743 8959 +2.5% 0.98x
ObserverUnappliedMethod.o 5041 5113 +1.4% 0.99x
 
Improvement OLD NEW DELTA RATIO
DictionaryKeysContains.o 8620 8384 -2.7% 1.03x
PrimsNonStrongRef.o 123028 121044 -1.6% 1.02x
BinaryFloatingPointConversionFromBinaryInteger.o 29568 29192 -1.3% 1.01x
ChaCha.o 14377 14209 -1.2% 1.01x

Performance: -Osize

Regression OLD NEW DELTA RATIO
Dictionary4 177 246 +39.0% 0.72x
Dictionary4OfObjects 228 284 +24.6% 0.80x
UTF8Decode_InitDecoding 138 165 +19.6% 0.84x
UTF8Decode_InitFromCustom_contiguous 139 165 +18.7% 0.84x
DataSubscriptSmall 17 19 +11.8% 0.89x
Array2D 4384 4832 +10.2% 0.91x (?)
FlattenListLoop 1375 1512 +10.0% 0.91x (?)
Set.isSuperset.Seq.Int.Empty 83 91 +9.6% 0.91x (?)
IterateData 867 949 +9.5% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
AnyHashableWithAClass 84000 52500 -37.5% 1.60x
DataCreateEmptyArray 1200 800 -33.3% 1.50x
ObserverForwarderStruct 665 465 -30.1% 1.43x (?)
ObserverUnappliedMethod 970 710 -26.8% 1.37x (?)
String.replaceSubrange.RepChar 2058 1639 -20.4% 1.26x
FrequenciesUsingReduce 3970 3200 -19.4% 1.24x
DataCreateSmallArray 2300 1900 -17.4% 1.21x (?)
Set.isDisjoint.Seq.Box.Empty 95 81 -14.7% 1.17x (?)
Set.isDisjoint.Seq.Int.Empty 50 43 -14.0% 1.16x (?)
DictionaryRemoveOfObjects 19000 16400 -13.7% 1.16x (?)
ObjectiveCBridgeFromNSArrayAnyObject 13800 12100 -12.3% 1.14x (?)
DataCountSmall 17 15 -11.8% 1.13x
Set.isSubset.Seq.Int.Empty 121 107 -11.6% 1.13x (?)
SortIntPyramid 835 740 -11.4% 1.13x (?)
Set.isSuperset.Seq.Empty.Int 53 47 -11.3% 1.13x (?)
Set.isStrictSubset.Seq.Int.Empty 126 114 -9.5% 1.11x (?)
SortAdjacentIntPyramids 1055 960 -9.0% 1.10x (?)
DictionaryOfAnyHashableStrings_lookup 2712 2472 -8.8% 1.10x (?)
Set.isDisjoint.Box.Empty 96 88 -8.3% 1.09x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 210 195 -7.1% 1.08x (?)
ObjectiveCBridgeFromNSDictionaryAnyObjectToString 43500 40500 -6.9% 1.07x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 205 191 -6.8% 1.07x (?)
ObjectiveCBridgeFromNSSetAnyObject 24400 22800 -6.6% 1.07x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
StaticArray.o 10895 11279 +3.5% 0.97x
SortIntPyramids.o 8466 8688 +2.6% 0.97x
ObserverForwarderStruct.o 2787 2855 +2.4% 0.98x
ObserverUnappliedMethod.o 5089 5161 +1.4% 0.99x
 
Improvement OLD NEW DELTA RATIO
DictionaryKeysContains.o 8443 8175 -3.2% 1.03x
ChaCha.o 11355 11155 -1.8% 1.02x
DictOfArraysToArrayOfDicts.o 21385 21065 -1.5% 1.02x
BinaryFloatingPointConversionFromBinaryInteger.o 28512 28152 -1.3% 1.01x

Performance: -Onone

Regression OLD NEW DELTA RATIO
UTF8Decode_InitFromCustom_contiguous 150 177 +18.0% 0.85x
UTF8Decode_InitDecoding 147 173 +17.7% 0.85x (?)
ArrayAppendReserved 2720 2950 +8.5% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
AnyHashableWithAClass 86000 54500 -36.6% 1.58x
Set.isDisjoint.Int0 938 692 -26.2% 1.36x
SetSubtractingInt0 204 159 -22.1% 1.28x
String.replaceSubrange.RepChar 2135 1689 -20.9% 1.26x
SetSubtractingInt25 264 213 -19.3% 1.24x
SetSubtractingBox25 635 523 -17.6% 1.21x
Set.isDisjoint.Int50 771 636 -17.5% 1.21x
SetUnionInt50 345 286 -17.1% 1.21x
SetSubtractingInt50 304 253 -16.8% 1.20x (?)
Set.isDisjoint.Int25 788 656 -16.8% 1.20x
SetUnionInt25 389 325 -16.5% 1.20x
Data.init.Sequence.2049B.Count.I 5481 4628 -15.6% 1.18x
Data.init.Sequence.809B.Count 4354 3684 -15.4% 1.18x (?)
Data.init.Sequence.809B.Count.I 4352 3684 -15.3% 1.18x (?)
Data.init.Sequence.64kB.Count 3481 2947 -15.3% 1.18x (?)
Data.init.Sequence.2047B.Count.I 5460 4623 -15.3% 1.18x
Data.init.Sequence.513B.Count.I 4180 3541 -15.3% 1.18x
Data.init.Sequence.511B.Count.I 4155 3524 -15.2% 1.18x
Data.append.Sequence.809B.Count.I 4359 3703 -15.0% 1.18x
Data.append.Sequence.809B.Count 4348 3701 -14.9% 1.17x
SetUnion 6020 5130 -14.8% 1.17x (?)
SetUnionInt0 604 515 -14.7% 1.17x (?)
Data.append.Sequence.64kB.Count 3493 2991 -14.4% 1.17x (?)
SetSubtractingInt100 369 316 -14.4% 1.17x (?)
SetUnionBox25 977 838 -14.2% 1.17x (?)
Data.init.Sequence.64kB.Count.I 3491 2998 -14.1% 1.16x (?)
Data.append.Sequence.64kB.Count.I 3503 3010 -14.1% 1.16x (?)
SetUnionBox0 1721 1482 -13.9% 1.16x
SetUnion_OfObjects 17200 14830 -13.8% 1.16x (?)
CharIndexing_tweet_unicodeScalars_Backwards 859160 745720 -13.2% 1.15x (?)
Set.isDisjoint.Box0 2100 1825 -13.1% 1.15x
FrequenciesUsingReduce 6310 5510 -12.7% 1.15x (?)
CharIndexing_korean_unicodeScalars_Backwards 420520 368040 -12.5% 1.14x (?)
SetSubtractingBox0 445 391 -12.1% 1.14x (?)
CharIteration_japanese_unicodeScalars_Backwards 305920 268800 -12.1% 1.14x (?)
CharIndexing_punctuated_unicodeScalars_Backwards 93480 82280 -12.0% 1.14x (?)
CharIndexing_ascii_unicodeScalars_Backwards 426800 376200 -11.9% 1.13x (?)
CharIndexing_punctuatedJapanese_unicodeScalars_Backwards 73280 64600 -11.8% 1.13x (?)
Set.isDisjoint.Int100 551 487 -11.6% 1.13x (?)
CharIndexing_russian_unicodeScalars_Backwards 357240 315920 -11.6% 1.13x (?)
DataCreateSmall 53850 47700 -11.4% 1.13x (?)
CharIndexing_japanese_unicodeScalars_Backwards 510520 452800 -11.3% 1.13x (?)
CharIndexing_chinese_unicodeScalars_Backwards 323240 286720 -11.3% 1.13x (?)
Hanoi 8800 7840 -10.9% 1.12x (?)
ObjectiveCBridgeFromNSArrayAnyObject 14700 13100 -10.9% 1.12x (?)
DataCreateMedium 451300 402500 -10.8% 1.12x (?)
Data.append.Sequence.64kB.Count0 3341 2999 -10.2% 1.11x (?)
CharIteration_punctuatedJapanese_unicodeScalars_Backwards 43360 38960 -10.1% 1.11x (?)
Data.init.Sequence.511B.Count0.I 4089 3682 -10.0% 1.11x (?)
Data.init.Sequence.2049B.Count0.I 5312 4788 -9.9% 1.11x (?)
Data.init.Sequence.513B.Count0.I 4100 3696 -9.9% 1.11x (?)
CharIteration_punctuated_unicodeScalars_Backwards 55320 49880 -9.8% 1.11x (?)
Dict.CopyKeyValue.24k 3546 3202 -9.7% 1.11x (?)
Data.init.Sequence.64kB.Count0.I 3351 3026 -9.7% 1.11x (?)
Data.append.Sequence.809B.Count0 4193 3787 -9.7% 1.11x (?)
Dict.CopyKeyValue.16k 2533 2288 -9.7% 1.11x (?)
Data.init.Sequence.2047B.Count0.I 5303 4791 -9.7% 1.11x (?)
CharIteration_tweet_unicodeScalars_Backwards 495320 447600 -9.6% 1.11x (?)
Dict.CopyKeyValue.20k 3017 2728 -9.6% 1.11x (?)
Dict.CopyKeyValue.28k 4631 4193 -9.5% 1.10x (?)
Data.init.Sequence.64kB.Count0 3361 3044 -9.4% 1.10x (?)
CharIteration_korean_unicodeScalars_Backwards 245200 222480 -9.3% 1.10x (?)
Data.append.Sequence.64kB.Count0.I 3337 3034 -9.1% 1.10x (?)
CharIteration_chinese_unicodeScalars_Backwards 189880 172840 -9.0% 1.10x (?)
CharIteration_ascii_unicodeScalars_Backwards 250200 227760 -9.0% 1.10x (?)
BitCount 40937 37292 -8.9% 1.10x
CharIndexing_chinese_unicodeScalars 304680 277800 -8.8% 1.10x (?)
DictionaryRemoveOfObjects 31500 28800 -8.6% 1.09x (?)
CharIndexing_punctuated_unicodeScalars 87800 80360 -8.5% 1.09x (?)
CharIteration_utf16_unicodeScalars_Backwards 198800 182000 -8.5% 1.09x (?)
CharIndexing_punctuatedJapanese_unicodeScalars 69200 63360 -8.4% 1.09x (?)
CharIndexing_ascii_unicodeScalars 400960 367240 -8.4% 1.09x (?)
CharIndexing_japanese_unicodeScalars 483120 443200 -8.3% 1.09x (?)
ArrayOfRef 4820 4430 -8.1% 1.09x (?)
PrefixArrayLazy 61414 56503 -8.0% 1.09x (?)
ObjectiveCBridgeFromNSSetAnyObject 25400 23400 -7.9% 1.09x (?)
ArrayOfGenericRef 4960 4570 -7.9% 1.09x (?)
ObjectiveCBridgeFromNSArrayAnyObjectToStringForced 26000 24000 -7.7% 1.08x (?)
CharIndexing_tweet_unicodeScalars 797280 737880 -7.5% 1.08x (?)
PrefixAnyCollectionLazy 135587 125498 -7.4% 1.08x (?)
DropLastArrayLazy 20519 19009 -7.4% 1.08x (?)
CharIndexing_russian_unicodeScalars 333080 308640 -7.3% 1.08x (?)
HTTP2StateMachine 2519740 2336859 -7.3% 1.08x (?)
CharIteration_russian_unicodeScalars_Backwards 210640 195360 -7.3% 1.08x (?)
CharIndexing_korean_unicodeScalars 388760 360680 -7.2% 1.08x (?)
FindString.Rec3.String 560 520 -7.1% 1.08x (?)
Radix2CooleyTukeyf 107136 99936 -6.7% 1.07x (?)
FindString.Rec3.Substring 569 531 -6.7% 1.07x (?)
PrefixCountableRangeLazy 80014 74735 -6.6% 1.07x (?)

Code size: -swiftlibs

Improvement OLD NEW DELTA RATIO
libswift_Differentiation.dylib 188416 184320 -2.2% 1.02x
libswiftAccelerate.dylib 258048 253952 -1.6% 1.02x
libswiftCore.dylib 4128768 4075520 -1.3% 1.01x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@eeckstein
Copy link
Contributor Author

@swift-ci test

1 similar comment
@eeckstein
Copy link
Contributor Author

@swift-ci test

@atrick
Copy link
Contributor

atrick commented Jun 17, 2020

On Alias Analysis support for enums. This is something alias analysis should definitely handle. I'm not surprised is wasn't handled, since the code is neglected. But it's definitely worth fixing.

With something so fundamental, I think it's good to have negative tests for any complex cases, particularly in aliasAddressProjection, which I'm not sure I fully understand. The way aliasAddressProjection and aliasInner are mutually recursive makes it hard to reason about. We obviously need to be very careful about introducing false positives here.

For example, we should test that
E -> init_existential/enum
aliases with
E -> init_existential/enum -> struct_element_addr

While we're at it:
E -> init_existential/enum -> struct_element_addr #1
does not alias with
E -> init_existential/enum -> struct_element_addr #2

More intersting:
V -> struct_element_addr -> init_existential/enum
aliases with
V -> struct_element_addr -> init_existential/enum

What about the fact that existential storage can actually alias even though the existential values are from different storage addresses? I think you at least need to come up with a proof that this it's generally safe to treat them as non-aliasing. I don't think it's as simple as "it makes sense for alias analysis to look through them". How do we know it's always correct to "look through them"?

@atrick
Copy link
Contributor

atrick commented Jun 17, 2020

On MemoryBehaviorVisitor::visitCopyAddrInst. This makes sense to me. I'm surprised we didn't have it already. Again, I would have added a couple of negative tests though... those are really much more important than the positive test!

@atrick
Copy link
Contributor

atrick commented Jun 18, 2020

On SILCombiner::combineCopyAndDestroy...

In principle, I don't like scanning the instruction stream within a SILCombineVisitor. SILCombine should only be for SSA-based peepholes. (I have the same criticism of SILCombiner::visitInjectEnumAddrInst).

This particular case might be justifiable in practice, since it bails at the first instruction that mayWriteMemory, but only if it's really needed by SILCombine. However, I don't see any reason this should be in SILCombine in the first place. Is it needed to enable some other SILCombines? Why would we add a SILCombine for something that's already handled in another pass by a more general analysis?

combineCopyAndDestroy is something that should be done during destroy hoisting.

CopyForwarding::hoistDestroy can easily be run as a separate pass. (The implementation is mostly in CopyForwarding::forwardCopiesOf--it's being gated by the copy analysis of just to avoid doing extra work.) Then we can do "DestroyHoisting" without paying the huge cost of doing a global dataflow analysis of all memory sublocations simultaneously--and it will handle more cases. Just by adding an AliasAnalysis query, it should be able to handle any case where either (a) we can recognize all address uses (aka known-use) or (b) alias analysis can recognize no potential aliasing access between the copy and destroy (aka known-instruction-window).

CopyForwarding::hoistDestroy is not limited to a single basic block and not limited to a single destroy. Any case that would be handled by SILCombiner::combineCopyAndDestroy would trivially be handled by CopyForwarding::hoistDestroy.

CopyForwarding::hoistDestroy currently assumes a "known-use" model. As with most our analyses, we really need both a "known-use" and "known-instruction-window" mode. The known-use mode is great because it's very efficient, its cost does not scale with the size of the function, and it handles arbitrarily complex code. Of course, it doesn't handle any SSA def-use pattern that we haven't specifically taught it to recognize.

A known-instruction-window mode is good for handling many patterns where we can't analyze the a value's source and can't find all aliases in the known-use mode. It's expensive, because it needs Alias and Escape Analysis, but important at -O. (CopyForwarding was originally supposed to be a -Onone peephole).

So, in short, we can just add an AliasAnalysis query to the existing destroy hoisting and easily separate it from forwardCopiesOf if need be.

Note that the AliasAnalysis query needs to bail on anything that can either read, write, or release memory, so it's fairly limited, but could still handle cases where the destroy is close to the copy, but we can't analysis the copy's source address for some reason.

Somehow, your current implementation seems to be ignoring cases like this:

copy_addr %source to %destination
use %source
destroy_addr %source

@atrick
Copy link
Contributor

atrick commented Jun 18, 2020

From what I can tell so far, tempLValueOpt is an attempt to add AliasAnalysis to CopyForwarding. I think it's a very bad idea for SILCombine to depend on Alias/Escape Analysis. SILCombine should be SSA peepholes that can rerun frequently after other transforms without recomputing interprocedural analysis. (We really need to do SILCombiner::visitInjectEnumAddrInst in a separate pass too).

Scanning the instructions in SILCombine also doesn't make sense to me. That's quadratic even if SILCombine does nothing, and cubic or worse of SILCombine makes changes--multiplied by the cost of the alias analysis queries. I can't even reason about what happens in the presence of other SILCombines that invalidate alias and escape analysis or adds new instructions that are unrecognized by the previous analysis.

It would be fine to just add AliasAnalysis to CopyForwarding, since we're not using it as the -Onone peephole that it was meant to be.

I haven't looked yet at why CopyForwarding doesn't handle these patterns without your change, but there are two reasonable approaches:

  1. If the address use analysis currently can't recognize something, add the SSA def-use patterns to visitAddressUse.

OR

  1. When we can't analyze all address uses, don't bailout entirely. Instead, just skip AnalyzeBackwardUse, and query AliasAnalysis. If the loop in backwardPropagateCopy is too hard to read, then it would be easy factor it into two separate routines, one with a loop that checks known-uses, and the other with a loop that checks AliasAnalysis. But aside from that, the function of the pass is basically the same.

@atrick
Copy link
Contributor

atrick commented Jun 18, 2020

Ok, TempLValue is actually a different algorithm than backwardPropagateCopy. This is really just handling the case of a true "temporary":

alloc_stack temp
init temp
copy temp -> dest
dealloc_stack temp

...with no other uses of the temp outside the range of instructions from alloc to copy.

So this could be a separate TempLValue pass. It's easy to extract destroy hoisting out of CopyForwarding since this new pass also requires it. Should I do that?

I'm still not sure why backwardPropagateCopy wouldn't handle these cases if it used AliasAnalysis, but it may make sense to separate passes that depend on AliasAnalysis from those that don't.

@eeckstein
Copy link
Contributor Author

@atrick Thanks for reviewing!

  • alias analysis and membehavior: 100% agreed. I added tests. I even found some deficiencies because of the tests and therefore changed the implementation a bit.

  • existentials: AFAIK, existentials only share the same memory if they are copied. But initialization with init_existential_addr can only return the operand address or a new allocated memory. And even if we would also handle open_existential_addr, it would be fine, because copy-on-write existentials are only an "implementation detail" of IRGen and this behavior not "observable" in SIL.

  • TempLValueOpt: I thought long about if I should make it a separate pass or put it into SILCombine. My laziness won. I agree, it's better as a separate pass. I changed it.

  • copyaddr-destroy peephole: I tried to add this to CopyForwarding, but somehow it didn't work because (I think) CopyForwarding only handles destroys which are in a certain relationship with propagated copies (but I might be wrong). I don't fully understand the logic here. Anyway, this peephole is also done by the DestroyHoisting pass, but it only runs on OSSA. Once we have OSSA in the full pipeline we can run DestroyHoisting later and then then we can remove the peephole from TempLValueOpt.

@eeckstein
Copy link
Contributor Author

@swift-ci test

1 similar comment
@eeckstein
Copy link
Contributor Author

@swift-ci test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - efaa464a0481047c19ec51076034957c84d10fa6

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - efaa464a0481047c19ec51076034957c84d10fa6

@atrick
Copy link
Contributor

atrick commented Jun 18, 2020

This is mostly for the record:

The problem with CopyForwarding is that it's analyzing copy's destination just so it doesn't need to check AliasAnalysis later.

One problem is that findAddressRootAndUsers only looks through one level of init_enum/existential, and some of your cases have two levels, which must cause isIdentifiedDestValue to fail. That's actually pretty trivial to fix.

But the more basic problem is that visitAddressUsers bails out on all kinds of interesting uses. I'm not sure if they are relevant to your cases, but that entire analysis of the destination's lifetime is irrelevant once you use alias analysis.

@atrick
Copy link
Contributor

atrick commented Jun 18, 2020

It would be super helpful to describe this analysis and the transformation in a way that someone can reason about. It's simple enough to specify in a comment:

alloc_stack %temp
"first use" %temp // (beginning of %temp lifetime)
// no definition of %dest_root
// no writes to %dest
copy_addr [take] %temp to [initialization] %dest // (end of %temp lifetime)
// no uses of %temp
dealloc_stack %temp

All projections between %dest_root and %dest are hoisted above beginning of %temp lifetime.

All uses of %temp are then directly replaced by %dest.

Also, explain why this does not handle reads from %dest:

alloc_stack %temp
"first use" %temp // (beginning of %temp lifetime)
use %dest
copy_addr [take] %temp to [initialization] %dest // (end of %temp lifetime)

Are you just assuming there needs to be a write to %dest before the initialization? I'm not sure where that's guaranteed. I'm not even sure what a "write" has to do with a "destroy". You could either somehow check that %dest isn't be destroyed, or just check for both reads and writes--then you don't care if the copy_addr is an initialization.


Actually, I'm not even sure why there needs to be a destroy at all before the [initialization] for concrete types. The original problem with CopyForwarding is that it wanted SILGen to produce straightforward initialize-destroy patterns. That was a deliberate assumption for something that was supposed to run right after SILGen, but not verified in canonical SIL. Eventually we wanted to rerun CopyForwarding later and decided passes shouldn't make unverified SIL assumptions (unless the pass happens to be ARC).

@eeckstein
Copy link
Contributor Author

@swift-ci test

1 similar comment
@eeckstein
Copy link
Contributor Author

@swift-ci test

@eeckstein
Copy link
Contributor Author

You are right, destroy_addr is not modeled as "write". I have to check for reads and writes.

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 8a58a2a87d036f0bd052cfa0386144fc2f9dfa59

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 8a58a2a87d036f0bd052cfa0386144fc2f9dfa59

@eeckstein
Copy link
Contributor Author

@swift-ci test

@eeckstein
Copy link
Contributor Author

@swift-ci test

@eeckstein eeckstein changed the title SILCombine: a new "tempLValueOpt" optimization for copy_addr SILOptimizer: A new "TempLValueOpt" optimization for copy_addr Jun 19, 2020
@eeckstein
Copy link
Contributor Author

Looks like we need to model a take (e.g. load [take], copy_addr [take]) as a write to memory. I added a commit which fixes this. It can cause memory lifetime failures in OSSA.

destroy->eraseFromParent();
return true;
}
if (inst->mayWriteToMemory())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's where destroy hoisting does not check for reads. Maybe you didn't push the latest change yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I forgot to change this. I pushed a new version with a fix

@eeckstein
Copy link
Contributor Author

@swift-ci smoke test

Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the two comment blocks in destroy hoisting.

…ntial_addr.

Look "through" those instructions when trying to find the underlying objects.
Currently we only have load [take] in OSSA which needed to be changed.
(copy_addr is not handled in MemBehavior at all, yet)

Even if the memory is physically not modified, conceptually it's "destroyed" when the value is taken.
Optimizations, like TempRValueOpt rely on this behavior when the check for may-writes.

This fixes a MemoryLifetime failure in TempRValueOpt.
Only let copy_addr have side effects if the source or destination really aliases with the address in question.
Optimizes copies from a temporary (an "l-value") to a destination.

    %temp = alloc_stack $Ty
    instructions_which_store_to %temp
    copy_addr [take] %temp to %destination
    dealloc_stack %temp

is optimized to

    destroy_addr %destination
    instructions_which_store_to %destination

The name TempLValueOpt refers to the TempRValueOpt pass, which performs a related transformation, just with the temporary on the "right" side.
The TempLValueOpt is similar to CopyForwarding::backwardPropagateCopy.
It's more restricted (e.g. the copy-source must be an alloc_stack).
That enables other patterns to be optimized, which backwardPropagateCopy cannot handle.

This pass also performs a small peephole optimization which simplifies copy_addr - destroy sequences.

    copy_addr %source to %destination
    destroy_addr %source

is replace with

    copy_addr [take] %source to %destination
@eeckstein
Copy link
Contributor Author

@swift-ci smoke test

@eeckstein eeckstein merged commit a09112a into swiftlang:master Jun 22, 2020
@eeckstein eeckstein deleted the templvalueopt branch June 22, 2020 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants