-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Update and reimplement AddressLowering pass (for SIL opaque values). #41557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci test |
0349674
to
69f8018
Compare
@eeckstein I rewrote the header comments for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few initial comments.
I'll continue reviewing next week
/// reuse the borrowed value's storage. This means that we SIL cannot allow | ||
/// guaranteed opaque uses unless they are projections of the definition. In | ||
/// particular, borrowed structs, tuples, and enums of address-only types are | ||
/// not allowed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not allowed? Maybe you can add an example to clarify this.
Does that mean they are not allowed throughout the pipeline? Is this a general SIL invariant, and is this documented in SIL.rst?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIL.rst
Forwarding Address-Only Values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Address-only values are potentially unmovable when borrowed. This
means that they cannot be forwarded with guaranteed ownership unless
the forwarded value has the same representation as in the original
value and can reuse the same storage. Non-destructive projection is
allowed, such as `struct_extract`. Aggregation, such as `struct`, and
destructive disaggregation, such as `switch_enum` is not allowed. This
is an invariant for OSSA with opaque SIL values for these reasons:
1. To avoid implicit semantic copies. For move-only values, this allows
complete diagnostics. And in general, it makes it impossible for SIL
passes to "accidentally" create copies.
2. To reuse borrowed storage. This allows the optimizer to share the same
storage for multiple exclusive reads of the same variable, avoiding
copies. It may also be necessary to support native Swift atomics, which
will be unmovable-when-borrowed.
SILVerifier::checkOwnershipForwardingInst
// Address-only values are potentially unmovable when borrowed. Ensure that
// guaranteed address-only values are forwarded with the same
// representation. Non-destructive projection is allowed. Aggregation and
// destructive disaggregation is not allowed. See SIL.rst, Forwarding
// Addres-Only Values.
if (ownership == OwnershipKind::Guaranteed
&& OwnershipForwardingMixin::isAddressOnly(i)) {
require(OwnershipForwardingMixin::hasSameRepresentation(i),
"Forwarding a guaranteed address-only value requires the same "
"representation since no move or copy is allowed.");
}
SILVerifier::visitSILPhiArgument
// Address-only values are potentially unmovable when borrowed. See also
// checkOwnershipForwardingInst. A phi implies a move of its arguments
// because they can't necessarilly all reuse the same storage.
require((!arg->getType().isAddressOnly()
|| arg->getOwnershipKind() != OwnershipKind::Guaranteed) &&
"Guaranteed address-only phi not allowed--implies a copy");
AddressLowering file comment
/// This pass removes "opaque SILValues" by translating them into addressable
/// memory locations such as a stack locations. This is mandatory for IRGen.
///
/// Lowering to LLVM IR requires each SILValue's type to be a valid "SIL storage
/// type". Opaque SILValues have address-only types. Address-only values require
/// indirect storage in LLVM, so their SIL storage type must be an address type.
///
/// This pass never creates copies except to replace explicit value copies
/// (copy_value, load [copy], store). For move-only values, this allows complete
/// diagnostics. And in general, it makes it impossible for SIL passes to
/// "accidentally" create copies.
///
/// This pass inserts moves (copy_addr [take] [initialize]) of owned values to
/// - compose aggregates
/// - resolve phi interference
///
/// For guarantee values, this pass inserts neither copies nor
/// moves. Address-only values are potentially unmovable when borrowed. This
/// means that guaranteed address-only aggregates and phis are prohibited. This
/// SIL invariant is enforced by SILVerifier::checkOwnershipForwardingInst() and
/// SILVerifier::visitSILPhiArgument().
///
/// ## Step #1: Map opaque values
@rjmccall have I explained this well enough?
/// not allowed. | ||
/// | ||
/// When owned values are consumed by phis, multiple storage locations are | ||
/// required to avoid interfering with other phi operands. However, the value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe clarify what "phi operands" are: operands of terminators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In OSSA (and even SIL in general) it vastly simplifies everything to distinguish between phis and terminator results. The opcode names are complete nonsense, but in the SILArgument API, the PhiOperand abstraction, and all the code we've written in the past year, the distinction is very clear.
Added (Phi values are block arguments in which phi's arguments are branch operands).
It seems very time consuming to fill out a git comment for every typo or minor suggestion. I just copy-paste all the suggestions into one file while reviewing. No line numbers needed. |
} | ||
}); | ||
auto allocPt = firstOpeningInst ? std::next(firstOpeningInst->getIterator()) | ||
: pass.function->begin()->begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that all alloc_stacks are created at the function begin (except the type has opened archetypes)?
Isn't this very bad for LLVM optimizations, because LLVM cannot separate the lifetimes of the individual stack values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eeckstein Initially (5 years ago) I was paranoid about -Onone compile time. Not anymore. We certainly could do an analysis here to figure out the lifetime of stack slots.
I don't think shrink wrapping will make any difference until address lowering is run after inlining, so nothing needs to happen in this PR. It is, however, still worth thinking about.
We can't actually figure out the optimal alloc_stack placement until we know all the places where we'll end up storing into the reused location. This would be a completely separate analysis. I can't think of any reason to complicate address lowering with that analysis. If that's a valuable thing to do, then why should it be confined to this pass? And why confined to address-only values? Shouldn't we perform shrink wrapping on all stack allocations?
I'm open to all suggestions.
@aschwaighofer any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another concern is that we have several peephole optimizations which only kick in if all uses of an alloc_stack
are in the same block.
But I agree, this pass is already very complex. It would make more sense to implement stack-location shrink wrapping as a separate pass. It might be necessary to do it before opaque values are turned on to not regress performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another concern is that we have several peephole optimizations which only kick in if all uses of an alloc_stack are in the same block.
That's too bad. For the sake of optimizer stability, I don't think we should have any peephole optimizations that make such an assumption.
I agree that shrink-wrapping should be done before address lowering is enabled after Mem2Reg runs at -O. Can you think of any reason this would cause a problem before Mem2Reg?
Note that any address-only local variables will still have the same local alloc_stack scope prior to Mem2Reg.
I'm not sure we would want to do shrink-wrapping at -Onone. We would need to figure out how to preserve variable inspection after last-use.
No, that's fine for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next few comments
|
||
/// Invoke \p cleanup on all paths exiting a call. | ||
static void | ||
cleanupAfterCall(FullApplySite apply, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, it would be better to use the existing
insertAfterFullEvaluation
and call apass.prepareBuilder
Yes.
which would also remove the FIXME below.
The FIXME's identify cases that don't have unit tests in this PR. So the goal here isn't to get rid of them. Follow up work is on branches blocked by this PR.
|
||
/// For use-projections, identifies the operand index of the composing use. | ||
/// Only valid for non-phi use projections. | ||
uint16_t projectedOperandNum = InvalidOper; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be 32 bits.
There are (crazy) situations where 16 bits are not enough. E.g. see #35289
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are (crazy) situations where 16 bits are not enough. E.g. see #35289
@eeckstein In this case, I'll simply avoid projecting storage into a use beyond 64k operands. It is worthwhile for ValueStorage to be 2 words, and introducing a PointerIntPair is not worthwhile. Note that this actually still optimizes the preposterous test case you pointed out, because that type is nested.
d6f31a7
In general, the way to handle the problem of importing a static array larger than 64k is definitely not to use 32-bits for an operand index. This will blow up the compiler in any many places where the complexity is cubic or worse when the number of operands are considered proportional to the number of instructions and size of the use list.
The only hope of making the compiler handle types this large is to actively guard against analyzing operands lists larger then 64K. That means going through all the -Onone passes and inserting checks to bypass any analysis of those instructions.
// ValueStorageMap | ||
// | ||
// Map Opaque SILValues to abstract storage units. | ||
//===----------------------------------------------------------------------===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one piece of information which I found essential for understanding the algorithm and which took me a long time to figure out by reading the code. I would describe it as follows:
"
In general, each opaque SILValue is mapped to a separate alloc_stack
.
As an optimization - to avoid copies -, projected values reuse the memory of their "parent" aggregate (or phi) values.
This can be either a
- def-projection, e.g.
%x = struct_extract %y, #f
where %x
shares the memory location of field f
in %y
.
- use-projection, e.g.
%s = struct(..., %t, ...)
where %t
shares the memory location of the n'th field in %s
.
- phi-projection, e.g.
br bb2(%b)
bb2(%a):
where %b
shares the memory of %a
.
"
I'm not sure where to put this comment. Either here or in AddressLowering.h where you already described the different kind of projections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this description inside the algorithm docs. This is now documented in two places but I think that's reasonable.
2132628
The def-projections are not "just optimizations". The existing file-level docs right above the new docs explains why...
void OpaqueStorageAllocation::allocatePhi(PhiValue phi) { | ||
// Coalesces phi operand storage with the phi storage. The algorithm processes | ||
// all incoming values at once, so it is is run when visiting the block | ||
// argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add a comment that in most cases all incoming values of a phi can resuse the phi's storage (at least I think so).
And provide an example, where this is not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the PhiStorageOptimizer was not well documented at the top level. I'm pasting the file comment here because I can't refer to commits after rebasing
/// PhiStorageOptimizer implements an analysis used by AddressLowering
/// to reuse storage across block arguments.
///
/// In OSSA, phi operands can often be coalesced because they are
/// consuming--they end the lifetime of their operand. This optimization may
/// fail to coalesce an operand for two major reasons:
///
/// 1. This phi operand is already coalesced with other storage, possibly of a
/// different type:
///
/// %field = struct_extract %struct : $Struct<T>, #field
/// br bb(%field : $T)
///
/// bb(%phi : @owned $T):
/// ...
///
/// 2. This phi operand interferes with another coalesced phi operand.
///
/// Only one of the call results below, either %get0 or %get1, can be coalesced
/// with %phi. The %phi will itself be coalesced with this function's indirect
/// @out argument.
///
/// sil [ossa] @function : $@convention(thin) <T> () -> @out T {
/// bb0:
/// %get0 = apply %get<T>() : $@convention(thin) <τ_0_0>() -> @out τ_0_0
/// %get1 = apply %get<T>() : $@convention(thin) <τ_0_0>() -> @out τ_0_0
/// cond_br undef, bb2, bb1
///
/// bb1:
/// destroy_value %get0 : $T
/// br bb3(%get1 : $T)
///
/// bb2:
/// destroy_value %get1 : $T
/// br bb3(%get0 : $T)
///
/// bb3(%phi : @owned $T):
/// return %phi : $T
///
/// TODO: Liveness is currently recorded at the block level. This could be
/// extended to handle operand with nonoverlapping liveness in the same
/// block. In this case, %get0 and %get1 could both be coalesced with a bit of
/// extra book-keeping:
///
/// bb0:
/// %get0 = apply %get<T>() : $@convention(thin) <τ_0_0>() -> @out τ_0_0
///
/// bb1:
/// destroy_value %get0 : $T
/// %get1 = apply %get<T>() : $@convention(thin) <τ_0_0>() -> @out τ_0_0
/// br bb3(%get1 : $T)
///
/// bb2:
/// br bb3(%get0 : $T)
///
/// bb3(%phi : @owned $T):
SILValue materializeTupleExtract(SILInstruction *extractInst, | ||
SILValue elementValue, unsigned fieldIdx); | ||
|
||
SILValue materializeProjectionIntoUse(Operand *operand, bool intoPhiOperand); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
materializeProjectionIntoUse
sounds like the same as materializeUseProjectionStorage
.
IIUC, materializeUseProjectionStorage
adds some caching around materializeProjectionIntoUse
.
Maybe you can find better names for these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, materializeUseProjectionStorage adds some caching around materializeProjectionIntoUse
These are mutually recursive for materializing use projections. I renamed the more general entry point to recursivelyMaterializeStorage
SILValue def = operand->get(); | ||
SILValue destAddr; | ||
if (operand->get()->getType().isAddressOnly(*pass.F)) { | ||
if (def->getType().isAddressOnly(*pass.function)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this just be def->getType().isAddress()
?
If it's an address type but not an address-only type we are creating a store here, which should be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this just be def->getType().isAddress()?
No. An address can't be passed into a struct/tuple. I renamed this to initializeComposingUse
.
Merge the AddressLowering pass from its old development branch and update it so we can begin incrementally enabling it under a flag. This has been reimplemented for simplicity. There's no point in looking at the old code.
This could happen as a result of specialization or concrete address-only values. For now, it's just tested by SIL unit tests.
Mostly documentation and typos.
Compute the latestOpeningInst, not the firstOpeningInst.
In classic compiler terminology, this is a "phi copy" algorithm. But the documentation now tries to clearly distinguish between "semantics copies" vs. moves, where moves are "storage copies".
Explain high-level objectives and terminology with more precision.
8cb6207
to
7a06d15
Compare
Avoid attempting to coalesce enum payloads.
7a06d15
to
72817df
Compare
I'm planning to merge this PR so we can collaborate on bug fixes, particularly in SILGen. The pass is guarded by a flag, so it should have no effect on 5.7 functionality. We only need it in tree for testing other areas of the compiler. |
@swift-ci test |
Merge the AddressLowering pass from its old development branch and update it so we can begin incrementally enabling it under a flag.
This has been reimplemented for simplicity. There's no point in looking at the old code.
Note: OSSA is required to correctly handle ownership when lowering. The original prototype attempted to "fake" OSSA for generic values, which wasn't viable. This new version has been completely updated for OSSA. The memory lifetime verifier is another prerequisite for this pass and is also now enabled.
The focus of the current implementation is SIL-level testing of the lowering algorithm. The objective of this algorithm is to avoid unnecessary copies by reusing storage when lowering composition of aggregates and merging subjects across phis. Tests have been developed for the algorithm's corner cases.
The next step is to test lowering immediately after SILGen and develop functional testing across the breadth of SILGen's output.