SILOptimizer: A new "TempLValueOpt" optimization for copy_addr #32428

eeckstein · 2020-06-17T10:57:39Z

Optimizes copies from a temporary (an "l-value") to a destination.

    %temp = alloc_stack $Ty
    instructions_which_store_to %temp
    copy_addr [take] %temp to [initialization] %destination
    dealloc_stack %temp

is optimized to

    instructions_which_store_to %destination

The name tempLValueOpt refers to the TempRValueOpt pass, which performs a related transformation, just with the temporary on the "right" side.
The tempLValueOpt is similar to CopyForwarding::backwardPropagateCopy.
It's more restricted (e.g. the copy-source must be an alloc_stack, the copy-destination must be initialized).
That enables other patterns to be optimized, which backwardPropagateCopy cannot handle.

This PR also contains some other small improvements related to copy_addr and other address instructions. For details see the commit messages.

The effect of this optimizations is that code with address-only types (e.g. generic functions) has significantly less copy instructions, especially initializations of address-only enums (e.g. optionals).
This is a benefit for performance and code size and can also reduce the amount of generic metadata which is generated at runtime.

eeckstein · 2020-06-17T10:57:52Z

@swift-ci test

eeckstein · 2020-06-17T10:58:37Z

@swift-ci benchmark

swift-ci · 2020-06-17T12:10:03Z

Build failed
Swift Test OS X Platform
Git Sha - b48a8544e1824f4f2dac7b6cae1df0352baa2bc3

swift-ci · 2020-06-17T12:16:46Z

Build failed
Swift Test Linux Platform
Git Sha - b48a8544e1824f4f2dac7b6cae1df0352baa2bc3

swift-ci · 2020-06-17T12:20:46Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode_InitDecoding	139	167	+20.1%	0.83x (?)
UTF8Decode_InitFromCustom_contiguous	140	168	+20.0%	0.83x
ObjectiveCBridgeStubDateAccess	130	152	+16.9%	0.86x
Data.init.Sequence.2047B.Count.I	54	63	+16.7%	0.86x
Data.hash.Empty	45	52	+15.6%	0.87x (?)
Dictionary4	195	224	+14.9%	0.87x
Data.init.Sequence.2049B.Count.I	55	62	+12.7%	0.89x
IterateData	840	945	+12.5%	0.89x (?)
Dictionary4OfObjects	224	252	+12.5%	0.89x
NSStringConversion.LongUTF8	298	321	+7.7%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
Data.append.Sequence.64kB.Count.RE.I	44	21	-52.3%	2.10x
Data.append.Sequence.64kB.Count.RE	44	21	-52.3%	2.10x
AnyHashableWithAClass	84000	52500	-37.5%	1.60x
Data.append.Sequence.809B.Count.RE	88	58	-34.1%	1.52x
DataAppendSequence	8800	6100	-30.7%	1.44x
DataCreateEmptyArray	1150	800	-30.4%	1.44x
Data.append.Sequence.809B.Count.RE.I	90	63	-30.0%	1.43x
ObserverForwarderStruct	715	505	-29.4%	1.42x (?)
ObserverUnappliedMethod	960	700	-27.1%	1.37x
StringEnumRawValueInitialization	460	360	-21.7%	1.28x
String.replaceSubrange.RepChar	2049	1652	-19.4%	1.24x
ArrayLiteral2	81	67	-17.3%	1.21x (?)
Set.isSuperset.Seq.Empty.Int	50	42	-16.0%	1.19x
DataCreateSmallArray	2150	1850	-14.0%	1.16x
ObjectiveCBridgeFromNSArrayAnyObject	14000	12300	-12.1%	1.14x (?)
Set.isStrictSubset.Int.Empty	52	46	-11.5%	1.13x (?)
DictionaryOfAnyHashableStrings_lookup	2784	2472	-11.2%	1.13x
Set.isStrictSubset.Seq.Int.Empty	126	112	-11.1%	1.12x
Set.subtracting.Empty.Box	9	8	-11.1%	1.12x
Diffing.Pangrams	3205	2849	-11.1%	1.12x (?)
Set.isDisjoint.Seq.Box.Empty	90	81	-10.0%	1.11x (?)
Set.isDisjoint.Seq.Int.Empty	52	47	-9.6%	1.11x (?)
Set.isSubset.Seq.Int.Empty	126	114	-9.5%	1.11x (?)
ObjectiveCBridgeFromNSSetAnyObject	25200	23000	-8.7%	1.10x (?)
ObjectiveCBridgeFromNSDictionaryAnyObject	24800	22700	-8.5%	1.09x (?)
ObjectiveCBridgeFromNSArrayAnyObjectToString	25700	23600	-8.2%	1.09x (?)
Set.isStrictSuperset.Seq.Empty.Int	172	158	-8.1%	1.09x (?)
SortLargeExistentials	14900	13800	-7.4%	1.08x (?)
ObjectiveCBridgeFromNSArrayAnyObjectToStringForced	25200	23400	-7.1%	1.08x (?)
DictionaryOfAnyHashableStrings_insert	3220	2996	-7.0%	1.07x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed	209	195	-6.7%	1.07x (?)

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
SortIntPyramids.o	8743	8959	+2.5%	0.98x
ObserverUnappliedMethod.o	5041	5113	+1.4%	0.99x

Improvement	OLD	NEW	DELTA	RATIO
DictionaryKeysContains.o	8620	8384	-2.7%	1.03x
PrimsNonStrongRef.o	123028	121044	-1.6%	1.02x
BinaryFloatingPointConversionFromBinaryInteger.o	29568	29192	-1.3%	1.01x
ChaCha.o	14377	14209	-1.2%	1.01x

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
Dictionary4	177	246	+39.0%	0.72x
Dictionary4OfObjects	228	284	+24.6%	0.80x
UTF8Decode_InitDecoding	138	165	+19.6%	0.84x
UTF8Decode_InitFromCustom_contiguous	139	165	+18.7%	0.84x
DataSubscriptSmall	17	19	+11.8%	0.89x
Array2D	4384	4832	+10.2%	0.91x (?)
FlattenListLoop	1375	1512	+10.0%	0.91x (?)
Set.isSuperset.Seq.Int.Empty	83	91	+9.6%	0.91x (?)
IterateData	867	949	+9.5%	0.91x (?)

Improvement	OLD	NEW	DELTA	RATIO
AnyHashableWithAClass	84000	52500	-37.5%	1.60x
DataCreateEmptyArray	1200	800	-33.3%	1.50x
ObserverForwarderStruct	665	465	-30.1%	1.43x (?)
ObserverUnappliedMethod	970	710	-26.8%	1.37x (?)
String.replaceSubrange.RepChar	2058	1639	-20.4%	1.26x
FrequenciesUsingReduce	3970	3200	-19.4%	1.24x
DataCreateSmallArray	2300	1900	-17.4%	1.21x (?)
Set.isDisjoint.Seq.Box.Empty	95	81	-14.7%	1.17x (?)
Set.isDisjoint.Seq.Int.Empty	50	43	-14.0%	1.16x (?)
DictionaryRemoveOfObjects	19000	16400	-13.7%	1.16x (?)
ObjectiveCBridgeFromNSArrayAnyObject	13800	12100	-12.3%	1.14x (?)
DataCountSmall	17	15	-11.8%	1.13x
Set.isSubset.Seq.Int.Empty	121	107	-11.6%	1.13x (?)
SortIntPyramid	835	740	-11.4%	1.13x (?)
Set.isSuperset.Seq.Empty.Int	53	47	-11.3%	1.13x (?)
Set.isStrictSubset.Seq.Int.Empty	126	114	-9.5%	1.11x (?)
SortAdjacentIntPyramids	1055	960	-9.0%	1.10x (?)
DictionaryOfAnyHashableStrings_lookup	2712	2472	-8.8%	1.10x (?)
Set.isDisjoint.Box.Empty	96	88	-8.3%	1.09x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed	210	195	-7.1%	1.08x (?)
ObjectiveCBridgeFromNSDictionaryAnyObjectToString	43500	40500	-6.9%	1.07x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed	205	191	-6.8%	1.07x (?)
ObjectiveCBridgeFromNSSetAnyObject	24400	22800	-6.6%	1.07x (?)

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
StaticArray.o	10895	11279	+3.5%	0.97x
SortIntPyramids.o	8466	8688	+2.6%	0.97x
ObserverForwarderStruct.o	2787	2855	+2.4%	0.98x
ObserverUnappliedMethod.o	5089	5161	+1.4%	0.99x

Improvement	OLD	NEW	DELTA	RATIO
DictionaryKeysContains.o	8443	8175	-3.2%	1.03x
ChaCha.o	11355	11155	-1.8%	1.02x
DictOfArraysToArrayOfDicts.o	21385	21065	-1.5%	1.02x
BinaryFloatingPointConversionFromBinaryInteger.o	28512	28152	-1.3%	1.01x

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode_InitFromCustom_contiguous	150	177	+18.0%	0.85x
UTF8Decode_InitDecoding	147	173	+17.7%	0.85x (?)
ArrayAppendReserved	2720	2950	+8.5%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
AnyHashableWithAClass	86000	54500	-36.6%	1.58x
Set.isDisjoint.Int0	938	692	-26.2%	1.36x
SetSubtractingInt0	204	159	-22.1%	1.28x
String.replaceSubrange.RepChar	2135	1689	-20.9%	1.26x
SetSubtractingInt25	264	213	-19.3%	1.24x
SetSubtractingBox25	635	523	-17.6%	1.21x
Set.isDisjoint.Int50	771	636	-17.5%	1.21x
SetUnionInt50	345	286	-17.1%	1.21x
SetSubtractingInt50	304	253	-16.8%	1.20x (?)
Set.isDisjoint.Int25	788	656	-16.8%	1.20x
SetUnionInt25	389	325	-16.5%	1.20x
Data.init.Sequence.2049B.Count.I	5481	4628	-15.6%	1.18x
Data.init.Sequence.809B.Count	4354	3684	-15.4%	1.18x (?)
Data.init.Sequence.809B.Count.I	4352	3684	-15.3%	1.18x (?)
Data.init.Sequence.64kB.Count	3481	2947	-15.3%	1.18x (?)
Data.init.Sequence.2047B.Count.I	5460	4623	-15.3%	1.18x
Data.init.Sequence.513B.Count.I	4180	3541	-15.3%	1.18x
Data.init.Sequence.511B.Count.I	4155	3524	-15.2%	1.18x
Data.append.Sequence.809B.Count.I	4359	3703	-15.0%	1.18x
Data.append.Sequence.809B.Count	4348	3701	-14.9%	1.17x
SetUnion	6020	5130	-14.8%	1.17x (?)
SetUnionInt0	604	515	-14.7%	1.17x (?)
Data.append.Sequence.64kB.Count	3493	2991	-14.4%	1.17x (?)
SetSubtractingInt100	369	316	-14.4%	1.17x (?)
SetUnionBox25	977	838	-14.2%	1.17x (?)
Data.init.Sequence.64kB.Count.I	3491	2998	-14.1%	1.16x (?)
Data.append.Sequence.64kB.Count.I	3503	3010	-14.1%	1.16x (?)
SetUnionBox0	1721	1482	-13.9%	1.16x
SetUnion_OfObjects	17200	14830	-13.8%	1.16x (?)
CharIndexing_tweet_unicodeScalars_Backwards	859160	745720	-13.2%	1.15x (?)
Set.isDisjoint.Box0	2100	1825	-13.1%	1.15x
FrequenciesUsingReduce	6310	5510	-12.7%	1.15x (?)
CharIndexing_korean_unicodeScalars_Backwards	420520	368040	-12.5%	1.14x (?)
SetSubtractingBox0	445	391	-12.1%	1.14x (?)
CharIteration_japanese_unicodeScalars_Backwards	305920	268800	-12.1%	1.14x (?)
CharIndexing_punctuated_unicodeScalars_Backwards	93480	82280	-12.0%	1.14x (?)
CharIndexing_ascii_unicodeScalars_Backwards	426800	376200	-11.9%	1.13x (?)
CharIndexing_punctuatedJapanese_unicodeScalars_Backwards	73280	64600	-11.8%	1.13x (?)
Set.isDisjoint.Int100	551	487	-11.6%	1.13x (?)
CharIndexing_russian_unicodeScalars_Backwards	357240	315920	-11.6%	1.13x (?)
DataCreateSmall	53850	47700	-11.4%	1.13x (?)
CharIndexing_japanese_unicodeScalars_Backwards	510520	452800	-11.3%	1.13x (?)
CharIndexing_chinese_unicodeScalars_Backwards	323240	286720	-11.3%	1.13x (?)
Hanoi	8800	7840	-10.9%	1.12x (?)
ObjectiveCBridgeFromNSArrayAnyObject	14700	13100	-10.9%	1.12x (?)
DataCreateMedium	451300	402500	-10.8%	1.12x (?)
Data.append.Sequence.64kB.Count0	3341	2999	-10.2%	1.11x (?)
CharIteration_punctuatedJapanese_unicodeScalars_Backwards	43360	38960	-10.1%	1.11x (?)
Data.init.Sequence.511B.Count0.I	4089	3682	-10.0%	1.11x (?)
Data.init.Sequence.2049B.Count0.I	5312	4788	-9.9%	1.11x (?)
Data.init.Sequence.513B.Count0.I	4100	3696	-9.9%	1.11x (?)
CharIteration_punctuated_unicodeScalars_Backwards	55320	49880	-9.8%	1.11x (?)
Dict.CopyKeyValue.24k	3546	3202	-9.7%	1.11x (?)
Data.init.Sequence.64kB.Count0.I	3351	3026	-9.7%	1.11x (?)
Data.append.Sequence.809B.Count0	4193	3787	-9.7%	1.11x (?)
Dict.CopyKeyValue.16k	2533	2288	-9.7%	1.11x (?)
Data.init.Sequence.2047B.Count0.I	5303	4791	-9.7%	1.11x (?)
CharIteration_tweet_unicodeScalars_Backwards	495320	447600	-9.6%	1.11x (?)
Dict.CopyKeyValue.20k	3017	2728	-9.6%	1.11x (?)
Dict.CopyKeyValue.28k	4631	4193	-9.5%	1.10x (?)
Data.init.Sequence.64kB.Count0	3361	3044	-9.4%	1.10x (?)
CharIteration_korean_unicodeScalars_Backwards	245200	222480	-9.3%	1.10x (?)
Data.append.Sequence.64kB.Count0.I	3337	3034	-9.1%	1.10x (?)
CharIteration_chinese_unicodeScalars_Backwards	189880	172840	-9.0%	1.10x (?)
CharIteration_ascii_unicodeScalars_Backwards	250200	227760	-9.0%	1.10x (?)
BitCount	40937	37292	-8.9%	1.10x
CharIndexing_chinese_unicodeScalars	304680	277800	-8.8%	1.10x (?)
DictionaryRemoveOfObjects	31500	28800	-8.6%	1.09x (?)
CharIndexing_punctuated_unicodeScalars	87800	80360	-8.5%	1.09x (?)
CharIteration_utf16_unicodeScalars_Backwards	198800	182000	-8.5%	1.09x (?)
CharIndexing_punctuatedJapanese_unicodeScalars	69200	63360	-8.4%	1.09x (?)
CharIndexing_ascii_unicodeScalars	400960	367240	-8.4%	1.09x (?)
CharIndexing_japanese_unicodeScalars	483120	443200	-8.3%	1.09x (?)
ArrayOfRef	4820	4430	-8.1%	1.09x (?)
PrefixArrayLazy	61414	56503	-8.0%	1.09x (?)
ObjectiveCBridgeFromNSSetAnyObject	25400	23400	-7.9%	1.09x (?)
ArrayOfGenericRef	4960	4570	-7.9%	1.09x (?)
ObjectiveCBridgeFromNSArrayAnyObjectToStringForced	26000	24000	-7.7%	1.08x (?)
CharIndexing_tweet_unicodeScalars	797280	737880	-7.5%	1.08x (?)
PrefixAnyCollectionLazy	135587	125498	-7.4%	1.08x (?)
DropLastArrayLazy	20519	19009	-7.4%	1.08x (?)
CharIndexing_russian_unicodeScalars	333080	308640	-7.3%	1.08x (?)
HTTP2StateMachine	2519740	2336859	-7.3%	1.08x (?)
CharIteration_russian_unicodeScalars_Backwards	210640	195360	-7.3%	1.08x (?)
CharIndexing_korean_unicodeScalars	388760	360680	-7.2%	1.08x (?)
FindString.Rec3.String	560	520	-7.1%	1.08x (?)
Radix2CooleyTukeyf	107136	99936	-6.7%	1.07x (?)
FindString.Rec3.Substring	569	531	-6.7%	1.07x (?)
PrefixCountableRangeLazy	80014	74735	-6.6%	1.07x (?)

Code size: -swiftlibs

Improvement	OLD	NEW	DELTA	RATIO
libswift_Differentiation.dylib	188416	184320	-2.2%	1.02x
libswiftAccelerate.dylib	258048	253952	-1.6%	1.02x
libswiftCore.dylib	4128768	4075520	-1.3%	1.01x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

eeckstein · 2020-06-17T12:56:04Z

@swift-ci test

eeckstein · 2020-06-17T12:56:29Z

@swift-ci test

atrick · 2020-06-17T23:34:37Z

On Alias Analysis support for enums. This is something alias analysis should definitely handle. I'm not surprised is wasn't handled, since the code is neglected. But it's definitely worth fixing.

With something so fundamental, I think it's good to have negative tests for any complex cases, particularly in aliasAddressProjection, which I'm not sure I fully understand. The way aliasAddressProjection and aliasInner are mutually recursive makes it hard to reason about. We obviously need to be very careful about introducing false positives here.

For example, we should test that
E -> init_existential/enum
aliases with
E -> init_existential/enum -> struct_element_addr

While we're at it:
E -> init_existential/enum -> struct_element_addr #1
does not alias with
E -> init_existential/enum -> struct_element_addr #2

More intersting:
V -> struct_element_addr -> init_existential/enum
aliases with
V -> struct_element_addr -> init_existential/enum

What about the fact that existential storage can actually alias even though the existential values are from different storage addresses? I think you at least need to come up with a proof that this it's generally safe to treat them as non-aliasing. I don't think it's as simple as "it makes sense for alias analysis to look through them". How do we know it's always correct to "look through them"?

atrick · 2020-06-17T23:39:50Z

On MemoryBehaviorVisitor::visitCopyAddrInst. This makes sense to me. I'm surprised we didn't have it already. Again, I would have added a couple of negative tests though... those are really much more important than the positive test!

atrick · 2020-06-18T07:29:03Z

On SILCombiner::combineCopyAndDestroy...

In principle, I don't like scanning the instruction stream within a SILCombineVisitor. SILCombine should only be for SSA-based peepholes. (I have the same criticism of SILCombiner::visitInjectEnumAddrInst).

This particular case might be justifiable in practice, since it bails at the first instruction that mayWriteMemory, but only if it's really needed by SILCombine. However, I don't see any reason this should be in SILCombine in the first place. Is it needed to enable some other SILCombines? Why would we add a SILCombine for something that's already handled in another pass by a more general analysis?

combineCopyAndDestroy is something that should be done during destroy hoisting.

CopyForwarding::hoistDestroy can easily be run as a separate pass. (The implementation is mostly in CopyForwarding::forwardCopiesOf--it's being gated by the copy analysis of just to avoid doing extra work.) Then we can do "DestroyHoisting" without paying the huge cost of doing a global dataflow analysis of all memory sublocations simultaneously--and it will handle more cases. Just by adding an AliasAnalysis query, it should be able to handle any case where either (a) we can recognize all address uses (aka known-use) or (b) alias analysis can recognize no potential aliasing access between the copy and destroy (aka known-instruction-window).

CopyForwarding::hoistDestroy is not limited to a single basic block and not limited to a single destroy. Any case that would be handled by SILCombiner::combineCopyAndDestroy would trivially be handled by CopyForwarding::hoistDestroy.

CopyForwarding::hoistDestroy currently assumes a "known-use" model. As with most our analyses, we really need both a "known-use" and "known-instruction-window" mode. The known-use mode is great because it's very efficient, its cost does not scale with the size of the function, and it handles arbitrarily complex code. Of course, it doesn't handle any SSA def-use pattern that we haven't specifically taught it to recognize.

A known-instruction-window mode is good for handling many patterns where we can't analyze the a value's source and can't find all aliases in the known-use mode. It's expensive, because it needs Alias and Escape Analysis, but important at -O. (CopyForwarding was originally supposed to be a -Onone peephole).

So, in short, we can just add an AliasAnalysis query to the existing destroy hoisting and easily separate it from forwardCopiesOf if need be.

Note that the AliasAnalysis query needs to bail on anything that can either read, write, or release memory, so it's fairly limited, but could still handle cases where the destroy is close to the copy, but we can't analysis the copy's source address for some reason.

Somehow, your current implementation seems to be ignoring cases like this:

copy_addr %source to %destination
use %source
destroy_addr %source

atrick · 2020-06-18T07:30:56Z

From what I can tell so far, tempLValueOpt is an attempt to add AliasAnalysis to CopyForwarding. I think it's a very bad idea for SILCombine to depend on Alias/Escape Analysis. SILCombine should be SSA peepholes that can rerun frequently after other transforms without recomputing interprocedural analysis. (We really need to do SILCombiner::visitInjectEnumAddrInst in a separate pass too).

Scanning the instructions in SILCombine also doesn't make sense to me. That's quadratic even if SILCombine does nothing, and cubic or worse of SILCombine makes changes--multiplied by the cost of the alias analysis queries. I can't even reason about what happens in the presence of other SILCombines that invalidate alias and escape analysis or adds new instructions that are unrecognized by the previous analysis.

It would be fine to just add AliasAnalysis to CopyForwarding, since we're not using it as the -Onone peephole that it was meant to be.

I haven't looked yet at why CopyForwarding doesn't handle these patterns without your change, but there are two reasonable approaches:

If the address use analysis currently can't recognize something, add the SSA def-use patterns to visitAddressUse.

OR

When we can't analyze all address uses, don't bailout entirely. Instead, just skip AnalyzeBackwardUse, and query AliasAnalysis. If the loop in backwardPropagateCopy is too hard to read, then it would be easy factor it into two separate routines, one with a loop that checks known-uses, and the other with a loop that checks AliasAnalysis. But aside from that, the function of the pass is basically the same.

atrick · 2020-06-18T09:05:59Z

Ok, TempLValue is actually a different algorithm than backwardPropagateCopy. This is really just handling the case of a true "temporary":

alloc_stack temp
init temp
copy temp -> dest
dealloc_stack temp

...with no other uses of the temp outside the range of instructions from alloc to copy.

So this could be a separate TempLValue pass. It's easy to extract destroy hoisting out of CopyForwarding since this new pass also requires it. Should I do that?

I'm still not sure why backwardPropagateCopy wouldn't handle these cases if it used AliasAnalysis, but it may make sense to separate passes that depend on AliasAnalysis from those that don't.

eeckstein · 2020-06-18T15:09:32Z

@atrick Thanks for reviewing!

alias analysis and membehavior: 100% agreed. I added tests. I even found some deficiencies because of the tests and therefore changed the implementation a bit.
existentials: AFAIK, existentials only share the same memory if they are copied. But initialization with init_existential_addr can only return the operand address or a new allocated memory. And even if we would also handle open_existential_addr, it would be fine, because copy-on-write existentials are only an "implementation detail" of IRGen and this behavior not "observable" in SIL.
TempLValueOpt: I thought long about if I should make it a separate pass or put it into SILCombine. My laziness won. I agree, it's better as a separate pass. I changed it.
copyaddr-destroy peephole: I tried to add this to CopyForwarding, but somehow it didn't work because (I think) CopyForwarding only handles destroys which are in a certain relationship with propagated copies (but I might be wrong). I don't fully understand the logic here. Anyway, this peephole is also done by the DestroyHoisting pass, but it only runs on OSSA. Once we have OSSA in the full pipeline we can run DestroyHoisting later and then then we can remove the peephole from TempLValueOpt.

eeckstein · 2020-06-18T15:09:48Z

@swift-ci test

eeckstein · 2020-06-18T15:10:02Z

@swift-ci test

swift-ci · 2020-06-18T16:16:58Z

Build failed
Swift Test OS X Platform
Git Sha - efaa464a0481047c19ec51076034957c84d10fa6

swift-ci · 2020-06-18T16:26:28Z

Build failed
Swift Test Linux Platform
Git Sha - efaa464a0481047c19ec51076034957c84d10fa6

atrick · 2020-06-18T17:24:13Z

This is mostly for the record:

The problem with CopyForwarding is that it's analyzing copy's destination just so it doesn't need to check AliasAnalysis later.

One problem is that findAddressRootAndUsers only looks through one level of init_enum/existential, and some of your cases have two levels, which must cause isIdentifiedDestValue to fail. That's actually pretty trivial to fix.

But the more basic problem is that visitAddressUsers bails out on all kinds of interesting uses. I'm not sure if they are relevant to your cases, but that entire analysis of the destination's lifetime is irrelevant once you use alias analysis.

atrick · 2020-06-18T17:25:20Z

It would be super helpful to describe this analysis and the transformation in a way that someone can reason about. It's simple enough to specify in a comment:

alloc_stack %temp
"first use" %temp // (beginning of %temp lifetime)
// no definition of %dest_root
// no writes to %dest
copy_addr [take] %temp to [initialization] %dest // (end of %temp lifetime)
// no uses of %temp
dealloc_stack %temp

All projections between %dest_root and %dest are hoisted above beginning of %temp lifetime.

All uses of %temp are then directly replaced by %dest.

Also, explain why this does not handle reads from %dest:

alloc_stack %temp
"first use" %temp // (beginning of %temp lifetime)
use %dest
copy_addr [take] %temp to [initialization] %dest // (end of %temp lifetime)

Are you just assuming there needs to be a write to %dest before the initialization? I'm not sure where that's guaranteed. I'm not even sure what a "write" has to do with a "destroy". You could either somehow check that %dest isn't be destroyed, or just check for both reads and writes--then you don't care if the copy_addr is an initialization.

Actually, I'm not even sure why there needs to be a destroy at all before the [initialization] for concrete types. The original problem with CopyForwarding is that it wanted SILGen to produce straightforward initialize-destroy patterns. That was a deliberate assumption for something that was supposed to run right after SILGen, but not verified in canonical SIL. Eventually we wanted to rerun CopyForwarding later and decided passes shouldn't make unverified SIL assumptions (unless the pass happens to be ARC).

eeckstein · 2020-06-18T19:13:54Z

@swift-ci test

eeckstein · 2020-06-18T19:14:46Z

@swift-ci test

eeckstein · 2020-06-18T19:15:31Z

You are right, destroy_addr is not modeled as "write". I have to check for reads and writes.

swift-ci · 2020-06-18T20:38:11Z

Build failed
Swift Test Linux Platform
Git Sha - 8a58a2a87d036f0bd052cfa0386144fc2f9dfa59

swift-ci · 2020-06-18T21:02:07Z

Build failed
Swift Test OS X Platform
Git Sha - 8a58a2a87d036f0bd052cfa0386144fc2f9dfa59

eeckstein · 2020-06-19T09:46:18Z

@swift-ci test

eeckstein · 2020-06-19T09:46:31Z

@swift-ci test

eeckstein · 2020-06-19T10:39:33Z

Looks like we need to model a take (e.g. load [take], copy_addr [take]) as a write to memory. I added a commit which fixes this. It can cause memory lifetime failures in OSSA.

atrick · 2020-06-19T22:04:35Z

lib/SILOptimizer/Transforms/TempLValueOpt.cpp

+      destroy->eraseFromParent();
+      return true;
+    }
+    if (inst->mayWriteToMemory())


Here's where destroy hoisting does not check for reads. Maybe you didn't push the latest change yet.

Yes, I forgot to change this. I pushed a new version with a fix

eeckstein · 2020-06-22T08:30:12Z

@swift-ci smoke test

atrick

LGTM except for the two comment blocks in destroy hoisting.

…ntial_addr. Look "through" those instructions when trying to find the underlying objects.

Currently we only have load [take] in OSSA which needed to be changed. (copy_addr is not handled in MemBehavior at all, yet) Even if the memory is physically not modified, conceptually it's "destroyed" when the value is taken. Optimizations, like TempRValueOpt rely on this behavior when the check for may-writes. This fixes a MemoryLifetime failure in TempRValueOpt.

Only let copy_addr have side effects if the source or destination really aliases with the address in question.

Optimizes copies from a temporary (an "l-value") to a destination. %temp = alloc_stack $Ty instructions_which_store_to %temp copy_addr [take] %temp to %destination dealloc_stack %temp is optimized to destroy_addr %destination instructions_which_store_to %destination The name TempLValueOpt refers to the TempRValueOpt pass, which performs a related transformation, just with the temporary on the "right" side. The TempLValueOpt is similar to CopyForwarding::backwardPropagateCopy. It's more restricted (e.g. the copy-source must be an alloc_stack). That enables other patterns to be optimized, which backwardPropagateCopy cannot handle. This pass also performs a small peephole optimization which simplifies copy_addr - destroy sequences. copy_addr %source to %destination destroy_addr %source is replace with copy_addr [take] %source to %destination

eeckstein · 2020-06-22T11:48:54Z

@swift-ci smoke test

eeckstein requested a review from atrick June 17, 2020 10:57

eeckstein force-pushed the templvalueopt branch from b48a854 to 84a9d78 Compare June 17, 2020 12:55

eeckstein force-pushed the templvalueopt branch from 84a9d78 to efaa464 Compare June 18, 2020 14:49

eeckstein force-pushed the templvalueopt branch from efaa464 to 8a58a2a Compare June 18, 2020 19:13

eeckstein force-pushed the templvalueopt branch from 8a58a2a to 7952833 Compare June 19, 2020 09:45

eeckstein changed the title ~~SILCombine: a new "tempLValueOpt" optimization for copy_addr~~ SILOptimizer: A new "TempLValueOpt" optimization for copy_addr Jun 19, 2020

atrick reviewed Jun 19, 2020

View reviewed changes

eeckstein force-pushed the templvalueopt branch from 7952833 to 23d49fb Compare June 22, 2020 08:28

atrick approved these changes Jun 22, 2020

View reviewed changes

eeckstein added 4 commits June 22, 2020 13:47

AliasAnalysis: better handling of init_enum_data_addr and init_existe…

c7c2c13

…ntial_addr. Look "through" those instructions when trying to find the underlying objects.

AliasAnalysis: handle copy_addr in MemBehaviorVisitor.

c82f78b

Only let copy_addr have side effects if the source or destination really aliases with the address in question.

eeckstein force-pushed the templvalueopt branch from 23d49fb to 9e92389 Compare June 22, 2020 11:48

eeckstein merged commit a09112a into swiftlang:master Jun 22, 2020

eeckstein deleted the templvalueopt branch June 22, 2020 14:09

SILOptimizer: A new "TempLValueOpt" optimization for copy_addr #32428

SILOptimizer: A new "TempLValueOpt" optimization for copy_addr #32428

Uh oh!

Conversation

eeckstein commented Jun 17, 2020

Uh oh!

eeckstein commented Jun 17, 2020

Uh oh!

eeckstein commented Jun 17, 2020

Uh oh!

swift-ci commented Jun 17, 2020

Uh oh!

swift-ci commented Jun 17, 2020

Uh oh!

swift-ci commented Jun 17, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

eeckstein commented Jun 17, 2020

Uh oh!

eeckstein commented Jun 17, 2020

Uh oh!

atrick commented Jun 17, 2020

Uh oh!

atrick commented Jun 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

atrick commented Jun 18, 2020

Uh oh!

atrick commented Jun 18, 2020

Uh oh!

atrick commented Jun 18, 2020

Uh oh!

eeckstein commented Jun 18, 2020

Uh oh!

eeckstein commented Jun 18, 2020

Uh oh!

eeckstein commented Jun 18, 2020

Uh oh!

swift-ci commented Jun 18, 2020

Uh oh!

swift-ci commented Jun 18, 2020

Uh oh!

atrick commented Jun 18, 2020

Uh oh!

atrick commented Jun 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eeckstein commented Jun 18, 2020

Uh oh!

eeckstein commented Jun 18, 2020

Uh oh!

eeckstein commented Jun 18, 2020

Uh oh!

swift-ci commented Jun 18, 2020

Uh oh!

swift-ci commented Jun 18, 2020

Uh oh!

eeckstein commented Jun 19, 2020

Uh oh!

eeckstein commented Jun 19, 2020

Uh oh!

eeckstein commented Jun 19, 2020

Uh oh!

atrick Jun 19, 2020

Choose a reason for hiding this comment

Uh oh!

eeckstein Jun 22, 2020

Choose a reason for hiding this comment

Uh oh!

eeckstein commented Jun 22, 2020

Uh oh!

atrick left a comment

Choose a reason for hiding this comment

Uh oh!

eeckstein commented Jun 22, 2020

atrick commented Jun 17, 2020 •

edited

Loading

atrick commented Jun 18, 2020 •

edited

Loading