-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[SILOptimizer] Alter FSO arg explosion heuristic. #27239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SILOptimizer] Alter FSO arg explosion heuristic. #27239
Conversation
@swift-ci please smoke benchmark |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -OsizePerformance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
@swift-ci please test |
Build failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
My main concern here is that I'm not sure if exploding arguments for the sake of reducing ARC traffic helps at all, given that we pass most of our arguments as guaranteed. Some of the assumptions in the FSO come from a time where we had the owned-convention as default.
I think it's worth doing some more experiments (in case you didn't do it yet), e.g. compare benchmark results with disabling argument explosion at all, etc.
lib/SILOptimizer/FunctionSignatureTransforms/ArgumentExplosionTransform.cpp
Outdated
Show resolved
Hide resolved
/// live non-trivial leaf and (b) specializing will not introduce a thunk. | ||
/// Breaking apart the non-trivial leaves is beneficial because it imbues | ||
/// the leaves with different RC identities, enabling the low level ARC | ||
/// optimizer to pair retains/releases in an easier way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this is beneficial now, but how about when we have ownership SIL at this point? Will it sill be beneficial?
Anyway, I think it's worth running the benchmarks with and without only this rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! The comment below
/// TODO: Eliminate the second case. That splitting up is only beneficial
/// before semantic ARC.
is about exactly that, eliminating the second rule once we have ownership SIL at this point. As you say, I'll run the benchmarks without this rule to see how it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, opened a second PR here to run the benchmarks without this rule: #27252 .
lib/SILOptimizer/FunctionSignatureTransforms/ArgumentExplosionTransform.cpp
Outdated
Show resolved
Hide resolved
lib/SILOptimizer/FunctionSignatureTransforms/ArgumentExplosionTransform.cpp
Outdated
Show resolved
Hide resolved
// If it is known without taking the owned-to-guaranteed transformation into | ||
// account both that exploding will reduce ARC traffic (because an upper bound | ||
// for the number of live non-trivial leaves is less thann the non-trivial | ||
// leaf count) and also that the explosion will fit within the heuristic upper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, when I think about it: is this really true, given that we pass most of the arguments as guaranteed by default? I think it mainly depends on the call site. Only if we need a retain-release pair at the call site, eliminating a dead argument will give a benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eeckstein it is not that simple. What if we want to move that value pass the call site. Consider a dead guaranteed parameter. Without eliminating the argument we will not be able to reduce the lifetime and do other ARC optimizations.
I would put it this way, it only makes sense if we eliminate an ARC "constraint" on the lifetime of the non-trivially typed thing.
@swift-ci please benchmark |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -OsizePerformance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
@nate-chandler btw the <= 6 field heuristic used by loadable from address is here in this utility: https://github.com/apple/swift/blob/master/lib/SILOptimizer/Utils/Local.cpp#L1497 |
@swift-ci please test macos |
I'm slightly concerned the positive impact of this might be overstated a bit in the benchmark results, since it looks like most of the wins are just repeated variations of 2 tests. |
@swift-ci please test compiler performance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick comments
lib/SILOptimizer/FunctionSignatureTransforms/FunctionSignatureOpts.h
Outdated
Show resolved
Hide resolved
@Catfish-Man That's fair. Still doing measurements over here and specifically will be looking at the code size impact on more realistic targets including the compatibility suite. |
Added getAllLeafTypes to ProjectionTree. The new method vends, via an out paramter, a vector containing the types of all the leaves in a projection tree in the order that they appear. The method relies uses a new convenience on ProjectionTreeNode, isLeaf to include only the types of those nodes which are leaves. Excerpted from @GottesM's swiftlang#16756.
Added getUsers to ProjectionTree. The new method vands, via an out parameter, a set of all the users of all of the nodes in the projection tree that are themselves not in the projection tree by way of getNonProjUsers. Took this opportunity to tweak getNonProjUsers to vend a const ArrayRef rather than a SmallVector. Excerpted from @GottesM's swiftlang#16756.
…leases. Added getPartiallyPostDomReleaseSet to ConsumedArgToEpilogueReleaseMatcher. Given an argument, the new method returns the array of releases of the argument if there is an array thereof and if the releases therein do not jointly post-dominate the argument. Excerpted from @GottesM's swiftlang#16756.
Replaced some namespace qualified references to ArrayRef and Optional with the unqualified type. Reordered the includes per clang-format. Excerpted from @GottesM's swiftlang#16756.
Added brief doc for FunctionSignatureOpts' ArgumentDescriptor's method canOptimizeLiveArg and tweaked the style to add braces around the body of a single line if clause.
The command-line option for sil-fso-enable-generics was previously visible outside the FunctionSignatureOpts translation unit. It is not any longer.
The new flag -sil-fso-optimize-if-not-called forced function signature optimization to run even on functions which are not called. Doing so is helpful for tests to alleviate the burden of writing code to actually call a function in whose function signature optimization we are interested.
Debug:
from https://ci.swift.org/job/swift-PR-compiler-performance-macOS/539/artifact/comment.md |
Release:
|
d99fadd
to
d1f2517
Compare
@swift-ci please test |
Build failed |
Build failed |
9318377
to
3cb413c
Compare
@swift-ci please test |
1 similar comment
@swift-ci please test |
Build failed |
Build failed |
49da12f
to
87dd1b0
Compare
@swift-ci please test |
4 similar comments
@swift-ci please test |
@swift-ci please test |
@swift-ci please test |
@swift-ci please test |
Previously, CallerAnalysis::FunctionInfo.foundAllCallers(), which is documented to return true only when specialization of a function will not require a thunk, returned true for functions which are possibly used externally. Now, that member function only returns false for functions which may be used externally since dead code elimination will not be able to remove them.
The new rule is that an argument will be exploded if one of the following sets of conditions hold: (1) (a) Specializing the function will result in a thunk. That is, the thunk that is generated cannot be inlined everywhere. (b) The argument has dead non-trivial leaves. (c) The argument has fewer than three live leaves. (2) (a) Specializing the function will not result in a thunk. That is, the thunk that is generated will be inlined everywhere and eliminated as dead code. (b) The argument has dead potentially trivial leaves. (c) The argument has fewer than six live leaves. This change is based heavily on @GottesM's swiftlang#16756 . rdar://problem/39957093
87dd1b0
to
9567bd4
Compare
@swift-ci please benchmark |
@swift-ci please test |
@swift-ci please test compiler performance |
Build failed |
Build failed |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -OsizePerformance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
@swift-ci please test macos platform |
Build failed |
@swift-ci please clean smoke test macos platform |
Release:
Debug:
Source: https://ci.swift.org/job/swift-PR-compiler-performance-macOS/544/artifact/ |
@swift-ci please clean test macos platform |
1 similar comment
@swift-ci please clean test macos platform |
The new rule on the basis of which an argument will be exploded is as follows:
(1) An argument with more than 3 live leaves after considering the results of the owned-to-guaranteed transformation will not be exploded.
(2) Rule (1) permitting, an argument all of whose non-trivial leaves are live will be exploded if and only if (a) there is more than one non-trivial leaf and (b) specialization will not result in a thunk. This rule was beneficial before semantic ARC because it avoided collapsing RC identities. It is not useful in the face of semantic ARC and should be retired.
(3) Rule (1) permitting, an argument with dead non-trivial leaves will be exploded.
rdar://problem/39957093