Skip to content

[CopyPropagation] Added lexical destroy folding. #41113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

nate-chandler
Copy link
Contributor

The new utility folds patterns like

// %borrowee is some owned value
%lifetime = begin_borrow %borrowee
...
// %copy is some transitive copy of %borrowee
apply %f(%copy)
end_borrow %lifetime
destroy_value %borrowee

into

%move = move_value [lexical] %borrowee
%lifetime begin_borrow [lexical] %move
...
end_borrow %lifetime
apply %f(%move)

It is intended to be run after ShrinkBorrowScope moves the end_borrow up to just before a relevant apply and after CanonicalizeOSSALifetime moves destroy_value instructions up to just after their last guaranteed use, at which point these patterns will exist.

In 0325e29, the implementation of
deinit barrier predicate was copied out of ShrinkBorrowScope into
MemAccessUtils.  Delete the source of that copy from ShrinkBorrowScope
and just call the now common utility.
@nate-chandler nate-chandler requested a review from atrick February 1, 2022 00:38
@nate-chandler nate-chandler marked this pull request as draft February 1, 2022 00:38
@nate-chandler nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch 4 times, most recently from 99435fa to ecaf7e8 Compare February 1, 2022 22:01
@nate-chandler nate-chandler marked this pull request as ready for review February 1, 2022 22:02
@nate-chandler nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from ecaf7e8 to 413b3cc Compare February 1, 2022 23:16
@nate-chandler nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch 3 times, most recently from 32fe5f0 to f1148c6 Compare February 2, 2022 20:54
@swiftlang swiftlang deleted a comment from swift-ci Feb 2, 2022
@swiftlang swiftlang deleted a comment from swift-ci Feb 2, 2022
@nate-chandler nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from f1148c6 to d8d64f9 Compare February 2, 2022 22:11
@swiftlang swiftlang deleted a comment from swift-ci Feb 3, 2022
@swiftlang swiftlang deleted a comment from swift-ci Feb 3, 2022
@nate-chandler nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from d8d64f9 to 254e77f Compare February 3, 2022 03:18
Copy link
Contributor

@eeckstein eeckstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments

///
/// Returns whether any change was made.
bool LexicalDestroyFolding::run() {
if (!findCandidates())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to use a more functional programming style to make the code easier to read. Here, it's completely obfuscated which variables are input and which are output of which function call. It's as bad as using global variables.
For example:

bool LexicalDestroyFolding::run(BeginBorrowInst *introducer) {
  SmallVector<...> candidates;
  if (!findCandidates(introducer, candidates)) ...

  SmallPtrSet<...> users;
  if (!findUsers(introducer, users)) ...

  if (!filterCandidates(candidates)) ...

Ideally, the LexicalDestroyFolding class is not needed at all.

Copy link
Contributor

@atrick atrick Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool LexicalDestroyFolding::run(BeginBorrowInst *introducer) {
SmallVector<...> candidates;
if (!findCandidates(introducer, candidates)) ...

SmallPtrSet<...> users;
if (!findUsers(introducer, users)) ...

if (!filterCandidates(candidates)) ...

I think the specific suggestion was reasonable. When you can break apart an algorithm functionally into stages with distinct input and output, that's fantastic.

Ideally, the LexicalDestroyFolding class is not needed at all.

But let's not get carried away. Passing around shared mutable state by reference makes the code less functional. To make the code more functional, you need to encapsulate all of the shared mutable state with the same lifetime into a single class. That allows a single stage of the algorithm to be decomposed into methods that each do one functional thing.

When an algorithm passes around shared mutable state in multiple data structures, it does several bad non-functional things:

  • The functions no longer have a functional API. It's impossible to distinguish the function input/output from the shared mutable state.

  • The shared mutable state is now accessed via many different local variables making it much harder to understand where it is accessed!

  • It strongly discourages breaking an algorithm up into multiple functions. The code becomes very difficult to refactor.

  • To understand an algorithm, the first thing you need to see are central data structures that are not localized to a single function. If those aren't encapsulated at the right level in a central class, then you need to read the entire implementation first.

Copy link
Contributor Author

@nate-chandler nate-chandler Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because candidates gets written to by two functions, it's now local to LexicalDestroyFolding::run. The other fields are remaining as members variables of LexicalDestroyFolding because they're written once and then immutable. It's documented when those member variables get defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so that's the opposite of what I was trying to say. I think Erik's specific suggestion to make 'users' and 'candidates' local was fine.

The general advice of "don't use class members to define state because that isn't functional" was wrong. Defining "local" variables and passing them by non-const reference to multiple functions is not functional either. It actually hides the mutable shared state which is even worse, and it doesn't establish any connection between the code that shares mutable state. Those variables are not really local at all. They are hidden globals that assume different name in each function.

void implementation() {
  ...
  DataStructure data1
  DataStructure data2
  helper1(data1, data2) // pass by ref
 .... way later...
  helper2(data1, data2) // pass by ref, I also modify shared state
 ... 
  helper3(data1, data2)
}

It's not a problem in your small run() function, but it becomes indecipherable very quickly. You can find lots of examples in our code base where several data structures are defined locally, then all passed individually into multiple functions. Outright terrible design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"don't use class members to define state because that isn't functional"

That's not what I wanted to say. It makes sense to encapsulate multiple variables into structs/classes, because passing dozens of small variables to functions is not good for readability either.
Also, often it makes sense to have things like caches, etc. as global state. In this example I would keep dominanceTree and deleter as class members. Often it's a tradeoff and subject to personal taste.

But I would say that the algorithm in the run function, where results are produced by some called functions and used by other called functions, is clearly a case where those values should be local variables.
Documenting those "side-effects" is only the second best choice. The best comments are the comments which are not needed, because the behavior is obvious from the source code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, I think we're all in pretty strong agreement.

@nate-chandler nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from 254e77f to c045508 Compare February 3, 2022 17:39
@nate-chandler nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch 2 times, most recently from 0cff6f6 to 30ee832 Compare February 4, 2022 01:05
@swift-ci
Copy link
Contributor

swift-ci commented Feb 4, 2022

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
Diffing.Myers.Similar 496 551 +11.1% 0.90x (?)
Set.subtracting.Seq.Empty.Int 207 229 +10.6% 0.90x
DictionaryBridgeToObjC_Access 925 1017 +9.9% 0.91x (?)
DataToStringLargeUnicode 5950 6500 +9.2% 0.92x (?)
Set.isDisjoint.Box.Empty 80 87 +8.7% 0.92x
Set.isSuperset.Seq.Empty.Int 72 78 +8.3% 0.92x (?)
Set.isSubset.Int.Empty 72 78 +8.3% 0.92x (?)
Set.isStrictSubset.Int.Empty 62 67 +8.1% 0.93x (?)
Set.isDisjoint.Seq.Int.Empty 65 70 +7.7% 0.93x
 
Improvement OLD NEW DELTA RATIO
FlattenListLoop 2542 1669 -34.3% 1.52x (?)
FlattenListFlatMap 6151 4061 -34.0% 1.51x (?)
DictionarySubscriptDefaultMutationArray 563 426 -24.3% 1.32x
StringFromLongWholeSubstring 5 4 -20.0% 1.25x
ArrayAppendAsciiSubstring 18864 15516 -17.7% 1.22x (?)
ArrayAppendLatin1Substring 19224 15948 -17.0% 1.21x
ArrayAppendUTF16Substring 18864 15696 -16.8% 1.20x
DictionarySwapAt 828 716 -13.5% 1.16x
DictionaryBridgeToObjC_Bridge 17 15 -11.8% 1.13x (?)
StringBuilderSmallReservingCapacity 342 314 -8.2% 1.09x (?)
Array2D 7504 6912 -7.9% 1.09x (?)
RemoveWhereSwapInts 40 37 -7.5% 1.08x (?)
FloatingPointPrinting_Float80_interpolated 54400 50400 -7.4% 1.08x (?)
DropLastAnySeqCRangeIter 630 588 -6.7% 1.07x
RandomShuffleLCG2 480 448 -6.7% 1.07x (?)
DropLastSequence 596 557 -6.5% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
ErrorHandling.o 3281 3425 +4.4% 0.96x
 
Improvement OLD NEW DELTA RATIO
MonteCarloE.o 2905 2817 -3.0% 1.03x
FlattenList.o 3953 3841 -2.8% 1.03x
RangeAssignment.o 3363 3275 -2.6% 1.03x
BucketSort.o 8727 8551 -2.0% 1.02x
ArraySetElement.o 1324 1300 -1.8% 1.02x
ReversedCollections.o 8844 8684 -1.8% 1.02x
COWArrayGuaranteedParameterOverhead.o 1383 1359 -1.7% 1.02x
Sim2DArray.o 1404 1380 -1.7% 1.02x
Array2D.o 2933 2885 -1.6% 1.02x
RomanNumbers.o 6924 6812 -1.6% 1.02x
RangeOverlaps.o 6791 6698 -1.4% 1.01x
ArraySubscript.o 2382 2350 -1.3% 1.01x
ObserverClosure.o 2468 2436 -1.3% 1.01x
SortLettersInPlace.o 8720 8608 -1.3% 1.01x
ObserverPartiallyAppliedMethod.o 2515 2483 -1.3% 1.01x
PopFrontGeneric.o 2694 2662 -1.2% 1.01x
DictionaryGroup.o 12413 12269 -1.2% 1.01x
ClassArrayGetter.o 3853 3811 -1.1% 1.01x
Memset.o 2227 2203 -1.1% 1.01x
Combos.o 5974 5910 -1.1% 1.01x
SuperChars.o 1513 1497 -1.1% 1.01x
SortIntPyramids.o 9299 9203 -1.0% 1.01x
DictionaryOfAnyHashableStrings.o 8000 7920 -1.0% 1.01x
Join.o 1609 1593 -1.0% 1.01x
Phonebook.o 9659 9563 -1.0% 1.01x

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
FlattenListLoop 1769 2640 +49.2% 0.67x (?)
FlattenListFlatMap 5459 6625 +21.4% 0.82x (?)
Set.isDisjoint.Seq.Int.Empty 74 81 +9.5% 0.91x (?)
Set.isDisjoint.Int.Empty 75 81 +8.0% 0.93x (?)
Set.isStrictSubset.Int.Empty 65 70 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NSError 266 166 -37.6% 1.60x (?)
DictionarySubscriptDefaultMutationArray 660 532 -19.4% 1.24x
DictionarySwapAt 840 740 -11.9% 1.14x
DictionaryBridgeToObjC_Bridge 17 15 -11.8% 1.13x (?)
FloatingPointPrinting_Float80_interpolated 54600 48600 -11.0% 1.12x (?)
NormalizedIterator_fastPrenormal 710 640 -9.9% 1.11x
RemoveWhereMoveInts 37 34 -8.1% 1.09x (?)
Array2D 7504 6944 -7.5% 1.08x (?)
SubstringEqualString 272 254 -6.6% 1.07x

Code size: -Osize

Improvement OLD NEW DELTA RATIO
SuperChars.o 1462 1434 -1.9% 1.02x
MonteCarloE.o 2845 2797 -1.7% 1.02x
RomanNumbers.o 5230 5149 -1.5% 1.02x
FlattenList.o 3801 3744 -1.5% 1.02x
RangeAssignment.o 3221 3173 -1.5% 1.02x
StringRemoveDupes.o 3760 3720 -1.1% 1.01x
BucketSort.o 8469 8385 -1.0% 1.01x

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
ObjectiveCBridgeStubFromNSDate 7010 8760 +25.0% 0.80x (?)
StringWordBuilderReservingCapacity 2570 2960 +15.2% 0.87x (?)
StringWordBuilder 2630 3000 +14.1% 0.88x (?)
CStringLongAscii 320 346 +8.1% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
StringMatch 50400 43100 -14.5% 1.17x (?)
DataReplaceMediumBuffer 7100 6300 -11.3% 1.13x (?)
SubstringTrimmingASCIIWhitespace 1051 971 -7.6% 1.08x (?)
ArrayOfPOD 1145 1068 -6.7% 1.07x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@swift-ci
Copy link
Contributor

swift-ci commented Feb 4, 2022

Build failed
Swift Test OS X Platform
Git Sha - 30ee8323def1c61c3889234bc39aba9d6dbe5bcd

Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a great improvement over the last version of the code. I can tell you put a lot of effort into decomposing the algorithm into minimal functions. And you put a lot of effort into communicating each functional piece with clear, detailed, precise comments. This code is already far better than most of the code base. I just have a few nits and questions inline.

So... as long as we're indulging in a discussion about code organization, allow me to abuse the PR comments to proselytize. The following comments are to continue the discussion with @eeckstein--I don't expect you to respond with changes in this PR...

What goes in a class? The first thing I look at are the members. I don't want to start reading implementation until I can understand the meaning of the shared state. While reading method implementation, I want to quickly refer back to those members. If you think the purpose of a class can be understood without those members, then they probably should be defined in a different class, or could be locals. So, when I see members tacked onto the end of the class for convenience, it strikes me as a misuse.

In LexicalDestroyFolding, we have several members that constitute the class context. Those are effectively immutable globals relative to the environment where the class was instantiated. This is fine. It's much better to use "globals" for things that are conceptually global than to pass pieces of context around in arguments. Passing individual pieces of context as arguments:

  • obscures function interfaces
  • makes it impossible to establish common idioms for accessing context
  • strongly discourages people from decomposing code into functions.
  • makes it 10x more difficult to refactor code or update code when context changes

In other words, superficially avoiding "globals" is actually what prevents programmers from writing good functional code. I continually run into problems in our code base caused by the need to pass around or rediscover context.

For larger passes, to avoid massive amounts of boilerplate, I wrap the context into a separate class that can passed around. I would prefer an actual global (as thread-local storage) for context so developers aren't discouraged from good modular design by extra boilerplate.

After the contextual class members, LexicalDestroyFolding contains results of the analysis. Those could be defined separately from the context. If the result of an analysis either contains more than one data type, or if it's helpful to wrap the underlying data structure in an interface, then it merits defining a class rather than passing around standalone vectors and sets.

IR transformations tend to have this structure:

  Result1 result1 = Analysis1(context).analyze()
  Result2 result2 = Analysis2(result1, context).analyze()
  Rewriter(result1, result2, context).rewrite()

And I find it simpler to do this in practice:

  Result1 result1
  result1.compute(context)
  Result2 result2
  result2.compute(result1, context)
  Rewriter(result1, result2, context).rewrite()

I do make the API surface a standalone function when possible:

  bool transform(candidate, context)

I have many times combined multiple analyses together into a class along with rewriting because

  • I was trying to be clever by combining overlapping parts of analyses and
    rewriting into a single traversal

  • I was defining state at the wrong conceptual level to optimize
    memory allocation

  • I was trying to avoid the boilerplate of defining multiple classes
    and repeating the "global" context in all of them

I always regret doing that.

@eeckstein
Copy link
Contributor

eeckstein commented Feb 4, 2022

@atrick looks like we posted a comment at the same time 🙂

It's much better to use "globals" for things that are conceptually global than to pass pieces of context around in arguments.

Yes.
But that does not include variables which are not global within the algorithm implemented in the class.

What I mean is: If the class is has a well designed state from the outside view, that's all good. But that's not an excuse to ignore best practices when implementing the internals of the class.

Pulled out a simple check--that CanonicalizeOSSALifetime now uses to
determine whether to continue hoisting a destroy_value instruction--into
a predicate that can be used by LexicalDestroyFolding.
@nate-chandler nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from 30ee832 to 05e41ca Compare February 4, 2022 17:10
@swiftlang swiftlang deleted a comment from swift-ci Feb 4, 2022
@swiftlang swiftlang deleted a comment from swift-ci Feb 4, 2022
@swiftlang swiftlang deleted a comment from swift-ci Feb 4, 2022
@nate-chandler nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from 05e41ca to 1e93b25 Compare February 4, 2022 19:24
@swiftlang swiftlang deleted a comment from swift-ci Feb 4, 2022
The new utility folds patterns like

  TOP:
    // %borrowee is some owned value
    %lifetime = begin_borrow %borrowee

  BOTTOM:
    // %copy is some transitive copy of %borrowee
    apply %f(%copy)
    end_borrow %lifetime
    destroy_value %borrowee

into

  TOP:
    %move = move_value [lexical] %borrowee
    %lifetime begin_borrow [lexical] %move
  BOTTOM:
    end_borrow %lifetime
    apply %f(%move)

It is intended to be run after ShrinkBorrowScope moves the end_borrow up
to just before a relevant apply and after CanonicalizeOSSALifetime moves
destroy_value instructions up to just after their last guaranteed use,
at which point these patterns will exist.
LexicalDestroyFolding allows us to elide the spurious retain/release
that was being tested here.

rdar://87255563
@nate-chandler nate-chandler force-pushed the lexical_lifetimes/destroy_folding/add branch from 1e93b25 to 257e1d3 Compare February 4, 2022 22:41
@swiftlang swiftlang deleted a comment from swift-ci Feb 4, 2022
@swiftlang swiftlang deleted a comment from swift-ci Feb 4, 2022
@atrick
Copy link
Contributor

atrick commented Feb 5, 2022

What I mean is: If the class is has a well designed state from the outside view, that's all good. But that's not an excuse to ignore best practices when implementing the internals of the class.

Right. If one class implements multiple steps, each with distinct state, that becomes confusing and violates the principle that all class members have the same lifetime. The reason for the tension here is that to fix it you either need to (1) artificially plumb data structures down through multiple levels of API. That has its own problem. Or (2) define another class. Then you end up with more total lines of code because of the boilerplate. I'm usually in favor of #2 because I don't think total lines of code matters as long as the code is modular and doesn't require mental energy to process.

@nate-chandler sorry for hijacking the PR. feel free to merge at your leisure.

@nate-chandler nate-chandler merged commit fe74a7d into swiftlang:main Feb 5, 2022
@nate-chandler nate-chandler deleted the lexical_lifetimes/destroy_folding/add branch February 5, 2022 06:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants