Update and reimplement AddressLowering pass (for SIL opaque values). #41557

atrick · 2022-02-25T06:06:40Z

Merge the AddressLowering pass from its old development branch and update it so we can begin incrementally enabling it under a flag.

This has been reimplemented for simplicity. There's no point in looking at the old code.

Note: OSSA is required to correctly handle ownership when lowering. The original prototype attempted to "fake" OSSA for generic values, which wasn't viable. This new version has been completely updated for OSSA. The memory lifetime verifier is another prerequisite for this pass and is also now enabled.

The focus of the current implementation is SIL-level testing of the lowering algorithm. The objective of this algorithm is to avoid unnecessary copies by reusing storage when lowering composition of aggregates and merging subjects across phis. Tests have been developed for the algorithm's corner cases.

The next step is to test lowering immediately after SILGen and develop functional testing across the breadth of SILGen's output.

atrick · 2022-02-25T06:06:50Z

@swift-ci test

atrick · 2022-03-01T08:19:41Z

@eeckstein I rewrote the header comments for ValueStorage, added comments to `runOnFunction, and clarified SIL examples a bit.

eeckstein

A few initial comments.
I'll continue reviewing next week

lib/SILOptimizer/Mandatory/AddressLowering.cpp

eeckstein · 2022-02-28T12:52:25Z

lib/SILOptimizer/Mandatory/AddressLowering.cpp

+/// reuse the borrowed value's storage. This means that we SIL cannot allow
+/// guaranteed opaque uses unless they are projections of the definition. In
+/// particular, borrowed structs, tuples, and enums of address-only types are
+/// not allowed.


Why is this not allowed? Maybe you can add an example to clarify this.
Does that mean they are not allowed throughout the pipeline? Is this a general SIL invariant, and is this documented in SIL.rst?

SIL.rst

Forwarding Address-Only Values ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Address-only values are potentially unmovable when borrowed. This means that they cannot be forwarded with guaranteed ownership unless the forwarded value has the same representation as in the original value and can reuse the same storage. Non-destructive projection is allowed, such as `struct_extract`. Aggregation, such as `struct`, and destructive disaggregation, such as `switch_enum` is not allowed. This is an invariant for OSSA with opaque SIL values for these reasons: 1. To avoid implicit semantic copies. For move-only values, this allows complete diagnostics. And in general, it makes it impossible for SIL passes to "accidentally" create copies. 2. To reuse borrowed storage. This allows the optimizer to share the same storage for multiple exclusive reads of the same variable, avoiding copies. It may also be necessary to support native Swift atomics, which will be unmovable-when-borrowed.

SILVerifier::checkOwnershipForwardingInst

// Address-only values are potentially unmovable when borrowed. Ensure that // guaranteed address-only values are forwarded with the same // representation. Non-destructive projection is allowed. Aggregation and // destructive disaggregation is not allowed. See SIL.rst, Forwarding // Addres-Only Values. if (ownership == OwnershipKind::Guaranteed && OwnershipForwardingMixin::isAddressOnly(i)) { require(OwnershipForwardingMixin::hasSameRepresentation(i), "Forwarding a guaranteed address-only value requires the same " "representation since no move or copy is allowed."); }

SILVerifier::visitSILPhiArgument

// Address-only values are potentially unmovable when borrowed. See also // checkOwnershipForwardingInst. A phi implies a move of its arguments // because they can't necessarilly all reuse the same storage. require((!arg->getType().isAddressOnly() || arg->getOwnershipKind() != OwnershipKind::Guaranteed) && "Guaranteed address-only phi not allowed--implies a copy");

AddressLowering file comment

/// This pass removes "opaque SILValues" by translating them into addressable /// memory locations such as a stack locations. This is mandatory for IRGen. /// /// Lowering to LLVM IR requires each SILValue's type to be a valid "SIL storage /// type". Opaque SILValues have address-only types. Address-only values require /// indirect storage in LLVM, so their SIL storage type must be an address type. /// /// This pass never creates copies except to replace explicit value copies /// (copy_value, load [copy], store). For move-only values, this allows complete /// diagnostics. And in general, it makes it impossible for SIL passes to /// "accidentally" create copies. /// /// This pass inserts moves (copy_addr [take] [initialize]) of owned values to /// - compose aggregates /// - resolve phi interference /// /// For guarantee values, this pass inserts neither copies nor /// moves. Address-only values are potentially unmovable when borrowed. This /// means that guaranteed address-only aggregates and phis are prohibited. This /// SIL invariant is enforced by SILVerifier::checkOwnershipForwardingInst() and /// SILVerifier::visitSILPhiArgument(). /// /// ## Step #1: Map opaque values

@rjmccall have I explained this well enough?

eeckstein · 2022-02-28T12:54:26Z

lib/SILOptimizer/Mandatory/AddressLowering.cpp

+/// not allowed.
+///
+/// When owned values are consumed by phis, multiple storage locations are
+/// required to avoid interfering with other phi operands. However, the value


Maybe clarify what "phi operands" are: operands of terminators

In OSSA (and even SIL in general) it vastly simplifies everything to distinguish between phis and terminator results. The opcode names are complete nonsense, but in the SILArgument API, the PhiOperand abstraction, and all the code we've written in the past year, the distinction is very clear.

Added (Phi values are block arguments in which phi's arguments are branch operands).

lib/SILOptimizer/Mandatory/AddressLowering.cpp

lib/SILOptimizer/Mandatory/AddressLowering.h

lib/SILOptimizer/Mandatory/AddressLowering.cpp

lib/SILOptimizer/Mandatory/AddressLowering.h

lib/SILOptimizer/Mandatory/AddressLowering.cpp

atrick · 2022-03-08T07:23:05Z

I'll continue reviewing next week

It seems very time consuming to fill out a git comment for every typo or minor suggestion. I just copy-paste all the suggestions into one file while reviewing. No line numbers needed.

lib/SILOptimizer/Mandatory/AddressLowering.cpp

eeckstein · 2022-03-07T11:44:58Z

lib/SILOptimizer/Mandatory/AddressLowering.cpp

+    }
+  });
+  auto allocPt = firstOpeningInst ? std::next(firstOpeningInst->getIterator())
+                                  : pass.function->begin()->begin();


Does this mean that all alloc_stacks are created at the function begin (except the type has opened archetypes)?
Isn't this very bad for LLVM optimizations, because LLVM cannot separate the lifetimes of the individual stack values?

@eeckstein Initially (5 years ago) I was paranoid about -Onone compile time. Not anymore. We certainly could do an analysis here to figure out the lifetime of stack slots.

I don't think shrink wrapping will make any difference until address lowering is run after inlining, so nothing needs to happen in this PR. It is, however, still worth thinking about.

We can't actually figure out the optimal alloc_stack placement until we know all the places where we'll end up storing into the reused location. This would be a completely separate analysis. I can't think of any reason to complicate address lowering with that analysis. If that's a valuable thing to do, then why should it be confined to this pass? And why confined to address-only values? Shouldn't we perform shrink wrapping on all stack allocations?

I'm open to all suggestions.

@aschwaighofer any ideas?

Another concern is that we have several peephole optimizations which only kick in if all uses of an alloc_stack are in the same block.

But I agree, this pass is already very complex. It would make more sense to implement stack-location shrink wrapping as a separate pass. It might be necessary to do it before opaque values are turned on to not regress performance.

Another concern is that we have several peephole optimizations which only kick in if all uses of an alloc_stack are in the same block.

That's too bad. For the sake of optimizer stability, I don't think we should have any peephole optimizations that make such an assumption.

I agree that shrink-wrapping should be done before address lowering is enabled after Mem2Reg runs at -O. Can you think of any reason this would cause a problem before Mem2Reg?

Note that any address-only local variables will still have the same local alloc_stack scope prior to Mem2Reg.

I'm not sure we would want to do shrink-wrapping at -Onone. We would need to figure out how to preserve variable inspection after last-use.

eeckstein · 2022-03-08T07:34:43Z

It seems very time consuming to fill out a git comment for every typo or minor suggestion.

No, that's fine for me.

eeckstein

next few comments

eeckstein · 2022-03-09T15:10:42Z

lib/SILOptimizer/Mandatory/AddressLowering.cpp

+
+/// Invoke \p cleanup on all paths exiting a call.
+static void
+cleanupAfterCall(FullApplySite apply,


Still, it would be better to use the existing insertAfterFullEvaluation and call a pass.prepareBuilder

Yes.

which would also remove the FIXME below.

The FIXME's identify cases that don't have unit tests in this PR. So the goal here isn't to get rid of them. Follow up work is on branches blocked by this PR.

lib/SILOptimizer/Mandatory/AddressLowering.cpp

eeckstein · 2022-03-09T15:28:28Z

lib/SILOptimizer/Mandatory/AddressLowering.h

+
+  /// For use-projections, identifies the operand index of the composing use.
+  /// Only valid for non-phi use projections.
+  uint16_t projectedOperandNum = InvalidOper;


This should be 32 bits.
There are (crazy) situations where 16 bits are not enough. E.g. see #35289

There are (crazy) situations where 16 bits are not enough. E.g. see #35289

@eeckstein In this case, I'll simply avoid projecting storage into a use beyond 64k operands. It is worthwhile for ValueStorage to be 2 words, and introducing a PointerIntPair is not worthwhile. Note that this actually still optimizes the preposterous test case you pointed out, because that type is nested.
d6f31a7

In general, the way to handle the problem of importing a static array larger than 64k is definitely not to use 32-bits for an operand index. This will blow up the compiler in any many places where the complexity is cubic or worse when the number of operands are considered proportional to the number of instructions and size of the use list.

The only hope of making the compiler handle types this large is to actively guard against analyzing operands lists larger then 64K. That means going through all the -Onone passes and inserting checks to bypass any analysis of those instructions.

eeckstein · 2022-03-09T15:42:08Z

lib/SILOptimizer/Mandatory/AddressLowering.cpp

+//                              ValueStorageMap
+//
+//              Map Opaque SILValues to abstract storage units.
+//===----------------------------------------------------------------------===//


There is one piece of information which I found essential for understanding the algorithm and which took me a long time to figure out by reading the code. I would describe it as follows:

"
In general, each opaque SILValue is mapped to a separate alloc_stack.
As an optimization - to avoid copies -, projected values reuse the memory of their "parent" aggregate (or phi) values.
This can be either a

def-projection, e.g.

%x = struct_extract %y, #f

where %x shares the memory location of field f in %y.

use-projection, e.g.

%s = struct(..., %t, ...)

where %t shares the memory location of the n'th field in %s.

phi-projection, e.g.

br bb2(%b) bb2(%a):

where %b shares the memory of %a.
"

I'm not sure where to put this comment. Either here or in AddressLowering.h where you already described the different kind of projections.

I added this description inside the algorithm docs. This is now documented in two places but I think that's reasonable.
2132628

The def-projections are not "just optimizations". The existing file-level docs right above the new docs explains why...

lib/SILOptimizer/Mandatory/AddressLowering.h

eeckstein · 2022-03-09T15:57:18Z

lib/SILOptimizer/Mandatory/AddressLowering.cpp

+void OpaqueStorageAllocation::allocatePhi(PhiValue phi) {
+  // Coalesces phi operand storage with the phi storage. The algorithm processes
+  // all incoming values at once, so it is is run when visiting the block
+  // argument.


It would be nice to add a comment that in most cases all incoming values of a phi can resuse the phi's storage (at least I think so).
And provide an example, where this is not the case.

Yes, the PhiStorageOptimizer was not well documented at the top level. I'm pasting the file comment here because I can't refer to commits after rebasing

/// PhiStorageOptimizer implements an analysis used by AddressLowering /// to reuse storage across block arguments. /// /// In OSSA, phi operands can often be coalesced because they are /// consuming--they end the lifetime of their operand. This optimization may /// fail to coalesce an operand for two major reasons: /// /// 1. This phi operand is already coalesced with other storage, possibly of a /// different type: /// /// %field = struct_extract %struct : $Struct<T>, #field /// br bb(%field : $T) /// /// bb(%phi : @owned $T): /// ... /// /// 2. This phi operand interferes with another coalesced phi operand. /// /// Only one of the call results below, either %get0 or %get1, can be coalesced /// with %phi. The %phi will itself be coalesced with this function's indirect /// @out argument. /// /// sil [ossa] @function : $@convention(thin) <T> () -> @out T { /// bb0: /// %get0 = apply %get<T>() : $@convention(thin) <τ_0_0>() -> @out τ_0_0 /// %get1 = apply %get<T>() : $@convention(thin) <τ_0_0>() -> @out τ_0_0 /// cond_br undef, bb2, bb1 /// /// bb1: /// destroy_value %get0 : $T /// br bb3(%get1 : $T) /// /// bb2: /// destroy_value %get1 : $T /// br bb3(%get0 : $T) /// /// bb3(%phi : @owned $T): /// return %phi : $T /// /// TODO: Liveness is currently recorded at the block level. This could be /// extended to handle operand with nonoverlapping liveness in the same /// block. In this case, %get0 and %get1 could both be coalesced with a bit of /// extra book-keeping: /// /// bb0: /// %get0 = apply %get<T>() : $@convention(thin) <τ_0_0>() -> @out τ_0_0 /// /// bb1: /// destroy_value %get0 : $T /// %get1 = apply %get<T>() : $@convention(thin) <τ_0_0>() -> @out τ_0_0 /// br bb3(%get1 : $T) /// /// bb2: /// br bb3(%get0 : $T) /// /// bb3(%phi : @owned $T):

lib/SILOptimizer/Mandatory/AddressLowering.cpp

eeckstein · 2022-03-09T16:09:57Z

lib/SILOptimizer/Mandatory/AddressLowering.cpp

+  SILValue materializeTupleExtract(SILInstruction *extractInst,
+                                   SILValue elementValue, unsigned fieldIdx);
+
+  SILValue materializeProjectionIntoUse(Operand *operand, bool intoPhiOperand);


materializeProjectionIntoUse sounds like the same as materializeUseProjectionStorage.
IIUC, materializeUseProjectionStorage adds some caching around materializeProjectionIntoUse.
Maybe you can find better names for these functions.

IIUC, materializeUseProjectionStorage adds some caching around materializeProjectionIntoUse

These are mutually recursive for materializing use projections. I renamed the more general entry point to recursivelyMaterializeStorage

eeckstein · 2022-03-09T16:28:06Z

lib/SILOptimizer/Mandatory/AddressLowering.cpp

  SILValue def = operand->get();
-  SILValue destAddr;
-  if (operand->get()->getType().isAddressOnly(*pass.F)) {
+  if (def->getType().isAddressOnly(*pass.function)) {


Shouldn't this just be def->getType().isAddress()?
If it's an address type but not an address-only type we are creating a store here, which should be wrong.

Shouldn't this just be def->getType().isAddress()?

No. An address can't be passed into a struct/tuple. I renamed this to initializeComposingUse.

lib/SILOptimizer/Mandatory/AddressLowering.cpp

Merge the AddressLowering pass from its old development branch and update it so we can begin incrementally enabling it under a flag. This has been reimplemented for simplicity. There's no point in looking at the old code.

This could happen as a result of specialization or concrete address-only values. For now, it's just tested by SIL unit tests.

Mostly documentation and typos.

Compute the latestOpeningInst, not the firstOpeningInst.

In classic compiler terminology, this is a "phi copy" algorithm. But the documentation now tries to clearly distinguish between "semantics copies" vs. moves, where moves are "storage copies".

Explain high-level objectives and terminology with more precision.

Avoid attempting to coalesce enum payloads.

atrick · 2022-03-21T21:24:43Z

I'm planning to merge this PR so we can collaborate on bug fixes, particularly in SILGen. The pass is guarded by a flag, so it should have no effect on 5.7 functionality. We only need it in tree for testing other areas of the compiler.

atrick · 2022-03-21T21:24:50Z

@swift-ci test

atrick requested review from nate-chandler, eeckstein and aschwaighofer February 25, 2022 06:07

atrick force-pushed the addrlower-update branch 2 times, most recently from 0349674 to 69f8018 Compare March 1, 2022 08:18

atrick force-pushed the addrlower-update branch from 69f8018 to aab182f Compare March 1, 2022 08:22

eeckstein reviewed Mar 4, 2022

View reviewed changes

atrick force-pushed the addrlower-update branch from 6045f44 to 254ef75 Compare March 8, 2022 07:18

atrick force-pushed the addrlower-update branch from 254ef75 to b2fbd5d Compare March 8, 2022 07:29

eeckstein reviewed Mar 8, 2022

View reviewed changes

eeckstein reviewed Mar 9, 2022

View reviewed changes

atrick added 12 commits March 9, 2022 17:18

Move -enable-sil-opaque-value to SILOptions.

8a7e955

Add emitLoad/emitStore to OpaqueValue type lowering.

8457ba3

SILModule::hasLoweredAddress

2907c61

Add support for indirect tuple-type results.

b187ba0

This could happen as a result of specialization or concrete address-only values. For now, it's just tested by SIL unit tests.

[SIL-opaque] Code review suggestions

f4176b9

Mostly documentation and typos.

Fix alloc_stack placement for open_existential.

c4e167f

Compute the latestOpeningInst, not the firstOpeningInst.

Rename "phi copy" to "phi move" for consistency with documentation.

71175f7

In classic compiler terminology, this is a "phi copy" algorithm. But the documentation now tries to clearly distinguish between "semantics copies" vs. moves, where moves are "storage copies".

[SIL-opaque] avoid handling operands past 64k

d50c8f6

[SIL-opaque] More file-level documentation

fd45bd0

Explain high-level objectives and terminology with more precision.

[SIL-opaque] Add section-level comment explaining storage allocation

355ca16

[SIL-opaque] remove cleanupAfterCall helper

ef2bf97

atrick force-pushed the addrlower-update branch 2 times, most recently from 8cb6207 to 7a06d15 Compare March 10, 2022 05:28

[SIL-opaque] add a test case for phi coalescing

f79c6c8

Avoid attempting to coalesce enum payloads.

atrick added 3 commits March 9, 2022 22:16

[SIL-opaque] in-depth top-level documentation for phi coalescing.

e116998

[SIL-opaque] minor NFC review feedback

7f88908

[SIL-opaque] rename materialization functions.

72817df

atrick force-pushed the addrlower-update branch from 7a06d15 to 72817df Compare March 10, 2022 06:17

[SIL-opaque] rename initializeComposingUse

6f3a0c3

atrick merged commit 406aa86 into swiftlang:main Mar 22, 2022

atrick deleted the addrlower-update branch March 22, 2022 05:15

Update and reimplement AddressLowering pass (for SIL opaque values). #41557

Update and reimplement AddressLowering pass (for SIL opaque values). #41557

Uh oh!

Conversation

atrick commented Feb 25, 2022

Uh oh!

atrick commented Feb 25, 2022

Uh oh!

atrick commented Mar 1, 2022

Uh oh!

eeckstein left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

atrick commented Mar 8, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

atrick Mar 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eeckstein Mar 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eeckstein commented Mar 8, 2022

Uh oh!

eeckstein left a comment

Choose a reason for hiding this comment

Uh oh!

eeckstein Mar 9, 2022 • edited by atrick Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

atrick Mar 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

atrick Mar 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

atrick Mar 9, 2022 •

edited

Loading

eeckstein Mar 9, 2022 •

edited

Loading

eeckstein Mar 9, 2022 •

edited by atrick

Loading

atrick Mar 9, 2022 •

edited

Loading

atrick Mar 9, 2022 •

edited

Loading