-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[CopyPropagation] Added lexical destroy folding. #41113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CopyPropagation] Added lexical destroy folding. #41113
Conversation
In 0325e29, the implementation of deinit barrier predicate was copied out of ShrinkBorrowScope into MemAccessUtils. Delete the source of that copy from ShrinkBorrowScope and just call the now common utility.
99435fa
to
ecaf7e8
Compare
ecaf7e8
to
413b3cc
Compare
32fe5f0
to
f1148c6
Compare
f1148c6
to
d8d64f9
Compare
d8d64f9
to
254e77f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments
/// | ||
/// Returns whether any change was made. | ||
bool LexicalDestroyFolding::run() { | ||
if (!findCandidates()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to use a more functional programming style to make the code easier to read. Here, it's completely obfuscated which variables are input and which are output of which function call. It's as bad as using global variables.
For example:
bool LexicalDestroyFolding::run(BeginBorrowInst *introducer) {
SmallVector<...> candidates;
if (!findCandidates(introducer, candidates)) ...
SmallPtrSet<...> users;
if (!findUsers(introducer, users)) ...
if (!filterCandidates(candidates)) ...
Ideally, the LexicalDestroyFolding
class is not needed at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool LexicalDestroyFolding::run(BeginBorrowInst *introducer) {
SmallVector<...> candidates;
if (!findCandidates(introducer, candidates)) ...SmallPtrSet<...> users;
if (!findUsers(introducer, users)) ...if (!filterCandidates(candidates)) ...
I think the specific suggestion was reasonable. When you can break apart an algorithm functionally into stages with distinct input and output, that's fantastic.
Ideally, the
LexicalDestroyFolding
class is not needed at all.
But let's not get carried away. Passing around shared mutable state by reference makes the code less functional. To make the code more functional, you need to encapsulate all of the shared mutable state with the same lifetime into a single class. That allows a single stage of the algorithm to be decomposed into methods that each do one functional thing.
When an algorithm passes around shared mutable state in multiple data structures, it does several bad non-functional things:
-
The functions no longer have a functional API. It's impossible to distinguish the function input/output from the shared mutable state.
-
The shared mutable state is now accessed via many different local variables making it much harder to understand where it is accessed!
-
It strongly discourages breaking an algorithm up into multiple functions. The code becomes very difficult to refactor.
-
To understand an algorithm, the first thing you need to see are central data structures that are not localized to a single function. If those aren't encapsulated at the right level in a central class, then you need to read the entire implementation first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because candidates
gets written to by two functions, it's now local to LexicalDestroyFolding::run
. The other fields are remaining as members variables of LexicalDestroyFolding
because they're written once and then immutable. It's documented when those member variables get defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so that's the opposite of what I was trying to say. I think Erik's specific suggestion to make 'users' and 'candidates' local was fine.
The general advice of "don't use class members to define state because that isn't functional" was wrong. Defining "local" variables and passing them by non-const reference to multiple functions is not functional either. It actually hides the mutable shared state which is even worse, and it doesn't establish any connection between the code that shares mutable state. Those variables are not really local at all. They are hidden globals that assume different name in each function.
void implementation() {
...
DataStructure data1
DataStructure data2
helper1(data1, data2) // pass by ref
.... way later...
helper2(data1, data2) // pass by ref, I also modify shared state
...
helper3(data1, data2)
}
It's not a problem in your small run() function, but it becomes indecipherable very quickly. You can find lots of examples in our code base where several data structures are defined locally, then all passed individually into multiple functions. Outright terrible design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"don't use class members to define state because that isn't functional"
That's not what I wanted to say. It makes sense to encapsulate multiple variables into structs/classes, because passing dozens of small variables to functions is not good for readability either.
Also, often it makes sense to have things like caches, etc. as global state. In this example I would keep dominanceTree
and deleter
as class members. Often it's a tradeoff and subject to personal taste.
But I would say that the algorithm in the run
function, where results are produced by some called functions and used by other called functions, is clearly a case where those values should be local variables.
Documenting those "side-effects" is only the second best choice. The best comments are the comments which are not needed, because the behavior is obvious from the source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, I think we're all in pretty strong agreement.
254e77f
to
c045508
Compare
0cff6f6
to
30ee832
Compare
Performance (x86_64): -O
Code size: -O
Performance (x86_64): -Osize
Code size: -Osize
Performance (x86_64): -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
Build failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a great improvement over the last version of the code. I can tell you put a lot of effort into decomposing the algorithm into minimal functions. And you put a lot of effort into communicating each functional piece with clear, detailed, precise comments. This code is already far better than most of the code base. I just have a few nits and questions inline.
So... as long as we're indulging in a discussion about code organization, allow me to abuse the PR comments to proselytize. The following comments are to continue the discussion with @eeckstein--I don't expect you to respond with changes in this PR...
What goes in a class? The first thing I look at are the members. I don't want to start reading implementation until I can understand the meaning of the shared state. While reading method implementation, I want to quickly refer back to those members. If you think the purpose of a class can be understood without those members, then they probably should be defined in a different class, or could be locals. So, when I see members tacked onto the end of the class for convenience, it strikes me as a misuse.
In LexicalDestroyFolding, we have several members that constitute the class context. Those are effectively immutable globals relative to the environment where the class was instantiated. This is fine. It's much better to use "globals" for things that are conceptually global than to pass pieces of context around in arguments. Passing individual pieces of context as arguments:
- obscures function interfaces
- makes it impossible to establish common idioms for accessing context
- strongly discourages people from decomposing code into functions.
- makes it 10x more difficult to refactor code or update code when context changes
In other words, superficially avoiding "globals" is actually what prevents programmers from writing good functional code. I continually run into problems in our code base caused by the need to pass around or rediscover context.
For larger passes, to avoid massive amounts of boilerplate, I wrap the context into a separate class that can passed around. I would prefer an actual global (as thread-local storage) for context so developers aren't discouraged from good modular design by extra boilerplate.
After the contextual class members, LexicalDestroyFolding contains results of the analysis. Those could be defined separately from the context. If the result of an analysis either contains more than one data type, or if it's helpful to wrap the underlying data structure in an interface, then it merits defining a class rather than passing around standalone vectors and sets.
IR transformations tend to have this structure:
Result1 result1 = Analysis1(context).analyze()
Result2 result2 = Analysis2(result1, context).analyze()
Rewriter(result1, result2, context).rewrite()
And I find it simpler to do this in practice:
Result1 result1
result1.compute(context)
Result2 result2
result2.compute(result1, context)
Rewriter(result1, result2, context).rewrite()
I do make the API surface a standalone function when possible:
bool transform(candidate, context)
I have many times combined multiple analyses together into a class along with rewriting because
-
I was trying to be clever by combining overlapping parts of analyses and
rewriting into a single traversal -
I was defining state at the wrong conceptual level to optimize
memory allocation -
I was trying to avoid the boilerplate of defining multiple classes
and repeating the "global" context in all of them
I always regret doing that.
@atrick looks like we posted a comment at the same time 🙂
Yes. What I mean is: If the class is has a well designed state from the outside view, that's all good. But that's not an excuse to ignore best practices when implementing the internals of the class. |
Pulled out a simple check--that CanonicalizeOSSALifetime now uses to determine whether to continue hoisting a destroy_value instruction--into a predicate that can be used by LexicalDestroyFolding.
30ee832
to
05e41ca
Compare
05e41ca
to
1e93b25
Compare
The new utility folds patterns like TOP: // %borrowee is some owned value %lifetime = begin_borrow %borrowee BOTTOM: // %copy is some transitive copy of %borrowee apply %f(%copy) end_borrow %lifetime destroy_value %borrowee into TOP: %move = move_value [lexical] %borrowee %lifetime begin_borrow [lexical] %move BOTTOM: end_borrow %lifetime apply %f(%move) It is intended to be run after ShrinkBorrowScope moves the end_borrow up to just before a relevant apply and after CanonicalizeOSSALifetime moves destroy_value instructions up to just after their last guaranteed use, at which point these patterns will exist.
LexicalDestroyFolding allows us to elide the spurious retain/release that was being tested here. rdar://87255563
1e93b25
to
257e1d3
Compare
Right. If one class implements multiple steps, each with distinct state, that becomes confusing and violates the principle that all class members have the same lifetime. The reason for the tension here is that to fix it you either need to (1) artificially plumb data structures down through multiple levels of API. That has its own problem. Or (2) define another class. Then you end up with more total lines of code because of the boilerplate. I'm usually in favor of #2 because I don't think total lines of code matters as long as the code is modular and doesn't require mental energy to process. @nate-chandler sorry for hijacking the PR. feel free to merge at your leisure. |
The new utility folds patterns like
into
It is intended to be run after ShrinkBorrowScope moves the end_borrow up to just before a relevant apply and after CanonicalizeOSSALifetime moves destroy_value instructions up to just after their last guaranteed use, at which point these patterns will exist.