[CopyPropagation] Added lexical destroy folding. #41113

nate-chandler · 2022-02-01T00:38:34Z

The new utility folds patterns like

// %borrowee is some owned value
%lifetime = begin_borrow %borrowee
...
// %copy is some transitive copy of %borrowee
apply %f(%copy)
end_borrow %lifetime
destroy_value %borrowee

into

%move = move_value [lexical] %borrowee
%lifetime begin_borrow [lexical] %move
...
end_borrow %lifetime
apply %f(%move)

It is intended to be run after ShrinkBorrowScope moves the end_borrow up to just before a relevant apply and after CanonicalizeOSSALifetime moves destroy_value instructions up to just after their last guaranteed use, at which point these patterns will exist.

In 0325e29, the implementation of deinit barrier predicate was copied out of ShrinkBorrowScope into MemAccessUtils. Delete the source of that copy from ShrinkBorrowScope and just call the now common utility.

eeckstein

Some initial comments

lib/SILOptimizer/Utils/LexicalDestroyFolding.cpp

eeckstein · 2022-02-03T13:55:00Z

lib/SILOptimizer/Utils/LexicalDestroyFolding.cpp

+///
+/// Returns whether any change was made.
+bool LexicalDestroyFolding::run() {
+  if (!findCandidates())


We should try to use a more functional programming style to make the code easier to read. Here, it's completely obfuscated which variables are input and which are output of which function call. It's as bad as using global variables.
For example:

bool LexicalDestroyFolding::run(BeginBorrowInst *introducer) { SmallVector<...> candidates; if (!findCandidates(introducer, candidates)) ... SmallPtrSet<...> users; if (!findUsers(introducer, users)) ... if (!filterCandidates(candidates)) ...

Ideally, the LexicalDestroyFolding class is not needed at all.

bool LexicalDestroyFolding::run(BeginBorrowInst *introducer) {
SmallVector<...> candidates;
if (!findCandidates(introducer, candidates)) ...

SmallPtrSet<...> users;
if (!findUsers(introducer, users)) ...

if (!filterCandidates(candidates)) ...

I think the specific suggestion was reasonable. When you can break apart an algorithm functionally into stages with distinct input and output, that's fantastic.

Ideally, the LexicalDestroyFolding class is not needed at all.

But let's not get carried away. Passing around shared mutable state by reference makes the code less functional. To make the code more functional, you need to encapsulate all of the shared mutable state with the same lifetime into a single class. That allows a single stage of the algorithm to be decomposed into methods that each do one functional thing.

When an algorithm passes around shared mutable state in multiple data structures, it does several bad non-functional things:

The functions no longer have a functional API. It's impossible to distinguish the function input/output from the shared mutable state.

The shared mutable state is now accessed via many different local variables making it much harder to understand where it is accessed!

It strongly discourages breaking an algorithm up into multiple functions. The code becomes very difficult to refactor.

To understand an algorithm, the first thing you need to see are central data structures that are not localized to a single function. If those aren't encapsulated at the right level in a central class, then you need to read the entire implementation first.

Because candidates gets written to by two functions, it's now local to LexicalDestroyFolding::run. The other fields are remaining as members variables of LexicalDestroyFolding because they're written once and then immutable. It's documented when those member variables get defined.

ok, so that's the opposite of what I was trying to say. I think Erik's specific suggestion to make 'users' and 'candidates' local was fine.

The general advice of "don't use class members to define state because that isn't functional" was wrong. Defining "local" variables and passing them by non-const reference to multiple functions is not functional either. It actually hides the mutable shared state which is even worse, and it doesn't establish any connection between the code that shares mutable state. Those variables are not really local at all. They are hidden globals that assume different name in each function.

void implementation() { ... DataStructure data1 DataStructure data2 helper1(data1, data2) // pass by ref .... way later... helper2(data1, data2) // pass by ref, I also modify shared state ... helper3(data1, data2) }

It's not a problem in your small run() function, but it becomes indecipherable very quickly. You can find lots of examples in our code base where several data structures are defined locally, then all passed individually into multiple functions. Outright terrible design.

"don't use class members to define state because that isn't functional"

That's not what I wanted to say. It makes sense to encapsulate multiple variables into structs/classes, because passing dozens of small variables to functions is not good for readability either.
Also, often it makes sense to have things like caches, etc. as global state. In this example I would keep dominanceTree and deleter as class members. Often it's a tradeoff and subject to personal taste.

But I would say that the algorithm in the run function, where results are produced by some called functions and used by other called functions, is clearly a case where those values should be local variables.
Documenting those "side-effects" is only the second best choice. The best comments are the comments which are not needed, because the behavior is obvious from the source code.

For the record, I think we're all in pretty strong agreement.

lib/SILOptimizer/Utils/LexicalDestroyFolding.cpp

test/SILOptimizer/outliner.swift

swift-ci · 2022-02-04T07:32:14Z

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
Diffing.Myers.Similar	496	551	+11.1%	0.90x (?)
Set.subtracting.Seq.Empty.Int	207	229	+10.6%	0.90x
DictionaryBridgeToObjC_Access	925	1017	+9.9%	0.91x (?)
DataToStringLargeUnicode	5950	6500	+9.2%	0.92x (?)
Set.isDisjoint.Box.Empty	80	87	+8.7%	0.92x
Set.isSuperset.Seq.Empty.Int	72	78	+8.3%	0.92x (?)
Set.isSubset.Int.Empty	72	78	+8.3%	0.92x (?)
Set.isStrictSubset.Int.Empty	62	67	+8.1%	0.93x (?)
Set.isDisjoint.Seq.Int.Empty	65	70	+7.7%	0.93x

Improvement	OLD	NEW	DELTA	RATIO
FlattenListLoop	2542	1669	-34.3%	1.52x (?)
FlattenListFlatMap	6151	4061	-34.0%	1.51x (?)
DictionarySubscriptDefaultMutationArray	563	426	-24.3%	1.32x
StringFromLongWholeSubstring	5	4	-20.0%	1.25x
ArrayAppendAsciiSubstring	18864	15516	-17.7%	1.22x (?)
ArrayAppendLatin1Substring	19224	15948	-17.0%	1.21x
ArrayAppendUTF16Substring	18864	15696	-16.8%	1.20x
DictionarySwapAt	828	716	-13.5%	1.16x
DictionaryBridgeToObjC_Bridge	17	15	-11.8%	1.13x (?)
StringBuilderSmallReservingCapacity	342	314	-8.2%	1.09x (?)
Array2D	7504	6912	-7.9%	1.09x (?)
RemoveWhereSwapInts	40	37	-7.5%	1.08x (?)
FloatingPointPrinting_Float80_interpolated	54400	50400	-7.4%	1.08x (?)
DropLastAnySeqCRangeIter	630	588	-6.7%	1.07x
RandomShuffleLCG2	480	448	-6.7%	1.07x (?)
DropLastSequence	596	557	-6.5%	1.07x (?)

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
ErrorHandling.o	3281	3425	+4.4%	0.96x

Improvement	OLD	NEW	DELTA	RATIO
MonteCarloE.o	2905	2817	-3.0%	1.03x
FlattenList.o	3953	3841	-2.8%	1.03x
RangeAssignment.o	3363	3275	-2.6%	1.03x
BucketSort.o	8727	8551	-2.0%	1.02x
ArraySetElement.o	1324	1300	-1.8%	1.02x
ReversedCollections.o	8844	8684	-1.8%	1.02x
COWArrayGuaranteedParameterOverhead.o	1383	1359	-1.7%	1.02x
Sim2DArray.o	1404	1380	-1.7%	1.02x
Array2D.o	2933	2885	-1.6%	1.02x
RomanNumbers.o	6924	6812	-1.6%	1.02x
RangeOverlaps.o	6791	6698	-1.4%	1.01x
ArraySubscript.o	2382	2350	-1.3%	1.01x
ObserverClosure.o	2468	2436	-1.3%	1.01x
SortLettersInPlace.o	8720	8608	-1.3%	1.01x
ObserverPartiallyAppliedMethod.o	2515	2483	-1.3%	1.01x
PopFrontGeneric.o	2694	2662	-1.2%	1.01x
DictionaryGroup.o	12413	12269	-1.2%	1.01x
ClassArrayGetter.o	3853	3811	-1.1%	1.01x
Memset.o	2227	2203	-1.1%	1.01x
Combos.o	5974	5910	-1.1%	1.01x
SuperChars.o	1513	1497	-1.1%	1.01x
SortIntPyramids.o	9299	9203	-1.0%	1.01x
DictionaryOfAnyHashableStrings.o	8000	7920	-1.0%	1.01x
Join.o	1609	1593	-1.0%	1.01x
Phonebook.o	9659	9563	-1.0%	1.01x

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
FlattenListLoop	1769	2640	+49.2%	0.67x (?)
FlattenListFlatMap	5459	6625	+21.4%	0.82x (?)
Set.isDisjoint.Seq.Int.Empty	74	81	+9.5%	0.91x (?)
Set.isDisjoint.Int.Empty	75	81	+8.0%	0.93x (?)
Set.isStrictSubset.Int.Empty	65	70	+7.7%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
NSError	266	166	-37.6%	1.60x (?)
DictionarySubscriptDefaultMutationArray	660	532	-19.4%	1.24x
DictionarySwapAt	840	740	-11.9%	1.14x
DictionaryBridgeToObjC_Bridge	17	15	-11.8%	1.13x (?)
FloatingPointPrinting_Float80_interpolated	54600	48600	-11.0%	1.12x (?)
NormalizedIterator_fastPrenormal	710	640	-9.9%	1.11x
RemoveWhereMoveInts	37	34	-8.1%	1.09x (?)
Array2D	7504	6944	-7.5%	1.08x (?)
SubstringEqualString	272	254	-6.6%	1.07x

Code size: -Osize

Improvement	OLD	NEW	DELTA	RATIO
SuperChars.o	1462	1434	-1.9%	1.02x
MonteCarloE.o	2845	2797	-1.7%	1.02x
RomanNumbers.o	5230	5149	-1.5%	1.02x
FlattenList.o	3801	3744	-1.5%	1.02x
RangeAssignment.o	3221	3173	-1.5%	1.02x
StringRemoveDupes.o	3760	3720	-1.1%	1.01x
BucketSort.o	8469	8385	-1.0%	1.01x

Performance (x86_64): -Onone

Regression	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubFromNSDate	7010	8760	+25.0%	0.80x (?)
StringWordBuilderReservingCapacity	2570	2960	+15.2%	0.87x (?)
StringWordBuilder	2630	3000	+14.1%	0.88x (?)
CStringLongAscii	320	346	+8.1%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
StringMatch	50400	43100	-14.5%	1.17x (?)
DataReplaceMediumBuffer	7100	6300	-11.3%	1.13x (?)
SubstringTrimmingASCIIWhitespace	1051	971	-7.6%	1.08x (?)
ArrayOfPOD	1145	1068	-6.7%	1.07x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

swift-ci · 2022-02-04T08:00:33Z

Build failed
Swift Test OS X Platform
Git Sha - 30ee8323def1c61c3889234bc39aba9d6dbe5bcd

atrick

I see a great improvement over the last version of the code. I can tell you put a lot of effort into decomposing the algorithm into minimal functions. And you put a lot of effort into communicating each functional piece with clear, detailed, precise comments. This code is already far better than most of the code base. I just have a few nits and questions inline.

So... as long as we're indulging in a discussion about code organization, allow me to abuse the PR comments to proselytize. The following comments are to continue the discussion with @eeckstein--I don't expect you to respond with changes in this PR...

What goes in a class? The first thing I look at are the members. I don't want to start reading implementation until I can understand the meaning of the shared state. While reading method implementation, I want to quickly refer back to those members. If you think the purpose of a class can be understood without those members, then they probably should be defined in a different class, or could be locals. So, when I see members tacked onto the end of the class for convenience, it strikes me as a misuse.

In LexicalDestroyFolding, we have several members that constitute the class context. Those are effectively immutable globals relative to the environment where the class was instantiated. This is fine. It's much better to use "globals" for things that are conceptually global than to pass pieces of context around in arguments. Passing individual pieces of context as arguments:

obscures function interfaces
makes it impossible to establish common idioms for accessing context
strongly discourages people from decomposing code into functions.
makes it 10x more difficult to refactor code or update code when context changes

In other words, superficially avoiding "globals" is actually what prevents programmers from writing good functional code. I continually run into problems in our code base caused by the need to pass around or rediscover context.

For larger passes, to avoid massive amounts of boilerplate, I wrap the context into a separate class that can passed around. I would prefer an actual global (as thread-local storage) for context so developers aren't discouraged from good modular design by extra boilerplate.

After the contextual class members, LexicalDestroyFolding contains results of the analysis. Those could be defined separately from the context. If the result of an analysis either contains more than one data type, or if it's helpful to wrap the underlying data structure in an interface, then it merits defining a class rather than passing around standalone vectors and sets.

IR transformations tend to have this structure:

  Result1 result1 = Analysis1(context).analyze()
  Result2 result2 = Analysis2(result1, context).analyze()
  Rewriter(result1, result2, context).rewrite()

And I find it simpler to do this in practice:

  Result1 result1
  result1.compute(context)
  Result2 result2
  result2.compute(result1, context)
  Rewriter(result1, result2, context).rewrite()

I do make the API surface a standalone function when possible:

  bool transform(candidate, context)

I have many times combined multiple analyses together into a class along with rewriting because

I was trying to be clever by combining overlapping parts of analyses and
rewriting into a single traversal
I was defining state at the wrong conceptual level to optimize
memory allocation
I was trying to avoid the boilerplate of defining multiple classes
and repeating the "global" context in all of them

I always regret doing that.

include/swift/SILOptimizer/Utils/CanonicalOSSALifetime.h

lib/SILOptimizer/Utils/LexicalDestroyFolding.cpp

eeckstein · 2022-02-04T10:22:58Z

@atrick looks like we posted a comment at the same time 🙂

It's much better to use "globals" for things that are conceptually global than to pass pieces of context around in arguments.

Yes.
But that does not include variables which are not global within the algorithm implemented in the class.

What I mean is: If the class is has a well designed state from the outside view, that's all good. But that's not an excuse to ignore best practices when implementing the internals of the class.

Pulled out a simple check--that CanonicalizeOSSALifetime now uses to determine whether to continue hoisting a destroy_value instruction--into a predicate that can be used by LexicalDestroyFolding.

The new utility folds patterns like TOP: // %borrowee is some owned value %lifetime = begin_borrow %borrowee BOTTOM: // %copy is some transitive copy of %borrowee apply %f(%copy) end_borrow %lifetime destroy_value %borrowee into TOP: %move = move_value [lexical] %borrowee %lifetime begin_borrow [lexical] %move BOTTOM: end_borrow %lifetime apply %f(%move) It is intended to be run after ShrinkBorrowScope moves the end_borrow up to just before a relevant apply and after CanonicalizeOSSALifetime moves destroy_value instructions up to just after their last guaranteed use, at which point these patterns will exist.

LexicalDestroyFolding allows us to elide the spurious retain/release that was being tested here. rdar://87255563

atrick · 2022-02-05T01:54:16Z

What I mean is: If the class is has a well designed state from the outside view, that's all good. But that's not an excuse to ignore best practices when implementing the internals of the class.

Right. If one class implements multiple steps, each with distinct state, that becomes confusing and violates the principle that all class members have the same lifetime. The reason for the tension here is that to fix it you either need to (1) artificially plumb data structures down through multiple levels of API. That has its own problem. Or (2) define another class. Then you end up with more total lines of code because of the boilerplate. I'm usually in favor of #2 because I don't think total lines of code matters as long as the code is modular and doesn't require mental energy to process.

@nate-chandler sorry for hijacking the PR. feel free to merge at your leisure.

nate-chandler added 3 commits January 31, 2022 16:36

[NFC] Renamed helper function.

d6ab215

[ShrinkBorrowScope] Used extract deinit barrier implementation.

8d28bec

In 0325e29, the implementation of deinit barrier predicate was copied out of ShrinkBorrowScope into MemAccessUtils. Delete the source of that copy from ShrinkBorrowScope and just call the now common utility.

[Gardening] Fixed typo in comment.

d5cbc7e

nate-chandler requested a review from atrick February 1, 2022 00:38

nate-chandler marked this pull request as draft February 1, 2022 00:38

nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch 4 times, most recently from 99435fa to ecaf7e8 Compare February 1, 2022 22:01

nate-chandler marked this pull request as ready for review February 1, 2022 22:02

nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from ecaf7e8 to 413b3cc Compare February 1, 2022 23:16

[PrunedLiveness] Added areUsesOutsideBoundary.

2c0a35a

nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch 3 times, most recently from 32fe5f0 to f1148c6 Compare February 2, 2022 20:54

swiftlang deleted a comment from swift-ci Feb 2, 2022

nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from f1148c6 to d8d64f9 Compare February 2, 2022 22:11

swiftlang deleted a comment from swift-ci Feb 3, 2022

nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from d8d64f9 to 254e77f Compare February 3, 2022 03:18

eeckstein reviewed Feb 3, 2022

View reviewed changes

nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from 254e77f to c045508 Compare February 3, 2022 17:39

atrick reviewed Feb 3, 2022

View reviewed changes

test/SILOptimizer/outliner.swift Outdated Show resolved Hide resolved

nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch 2 times, most recently from 0cff6f6 to 30ee832 Compare February 4, 2022 01:05

atrick approved these changes Feb 4, 2022

View reviewed changes

[CanonicalizeOSSALt] Extracted destroy pred.

5c1a172

Pulled out a simple check--that CanonicalizeOSSALifetime now uses to determine whether to continue hoisting a destroy_value instruction--into a predicate that can be used by LexicalDestroyFolding.

nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from 30ee832 to 05e41ca Compare February 4, 2022 17:10

swiftlang deleted a comment from swift-ci Feb 4, 2022

nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from 05e41ca to 1e93b25 Compare February 4, 2022 19:24

swiftlang deleted a comment from swift-ci Feb 4, 2022

nate-chandler added 3 commits February 4, 2022 14:41

[CopyPropagation] Run LexicalDestroyFolding.

9e5e935

[Test] Reenabled SILOpt/shrink_borrow_scope.

257e1d3

LexicalDestroyFolding allows us to elide the spurious retain/release that was being tested here. rdar://87255563

nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from 1e93b25 to 257e1d3 Compare February 4, 2022 22:41

swiftlang deleted a comment from swift-ci Feb 4, 2022

nate-chandler merged commit fe74a7d into swiftlang:main Feb 5, 2022

nate-chandler deleted the lexical_lifetimes/destroy_folding/add branch February 5, 2022 06:23

[CopyPropagation] Added lexical destroy folding. #41113

[CopyPropagation] Added lexical destroy folding. #41113

Uh oh!

Conversation

nate-chandler commented Feb 1, 2022

Uh oh!

eeckstein left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eeckstein Feb 3, 2022

Choose a reason for hiding this comment

Uh oh!

atrick Feb 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nate-chandler Feb 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

atrick Feb 4, 2022

Choose a reason for hiding this comment

Uh oh!

eeckstein Feb 4, 2022

Choose a reason for hiding this comment

Uh oh!

atrick Feb 5, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

swift-ci commented Feb 4, 2022

Performance (x86_64): -O

Code size: -O

Performance (x86_64): -Osize

Code size: -Osize

Performance (x86_64): -Onone

Code size: -swiftlibs

Uh oh!

swift-ci commented Feb 4, 2022

Uh oh!

atrick left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eeckstein commented Feb 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

atrick commented Feb 5, 2022

Uh oh!

Uh oh!

atrick Feb 3, 2022 •

edited

Loading

nate-chandler Feb 4, 2022 •

edited

Loading

atrick left a comment •

edited

Loading

eeckstein commented Feb 4, 2022 •

edited

Loading