
Copy-on-write representation in SIL: instructions and builtins #31728


Merged: 4 commits, merged on May 14, 2020


@eeckstein (Contributor) commented May 12, 2020

This is the first PR for the copy-on-write (COW) representation in SIL. It adds new SIL instructions, instruction flags, and builtins that the stdlib can use to generate the new SIL instructions.
It's still NFC (no functional change) with regard to the generated code, because the new instructions and builtins are not used in the library yet.

The goal is to have a clean representation of COW data structures in SIL, most importantly for the stdlib's Array. This helps the optimizer and avoids hacks in code where we currently have to hard-code the Array type.

For example, the optimizer will then know that an Array is immutable when it is immutable in the source code. This is currently not the case.

Note that this work is purely about the SIL representation, not about a language feature. The builtins allow the stdlib to use the COW representation. It's possible to build a nicer language feature around this, but that is not in the scope of this work.

In case anyone is interested, this is the remaining (not yet finished) work: #31730

I added a section to SIL.rst that gives an overview of the COW representation. I've copied it here for convenience:

Copy-on-Write Representation

Copy-on-Write (COW) data structures are implemented by a reference to an object
which is copied on mutation in case it's not uniquely referenced.

A COW mutation sequence in SIL typically looks like:

    (%uniq, %buffer) = begin_cow_mutation %immutable_buffer : $BufferClass
    cond_br %uniq, bb_uniq, bb_not_unique
  bb_uniq:
    br bb_mutate(%buffer : $BufferClass)
  bb_not_unique:
    %copied_buffer = apply %copy_buffer_function(%buffer) : ...
    br bb_mutate(%copied_buffer : $BufferClass)
  bb_mutate(%mutable_buffer : $BufferClass):
    %field = ref_element_addr %mutable_buffer : $BufferClass, #BufferClass.Field
    store %value to %field : $*ValueType
    %new_immutable_buffer = end_cow_mutation %mutable_buffer : $BufferClass
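For readers more familiar with Swift source than with SIL, the mutation sequence above corresponds roughly to the classic hand-written COW pattern. The following is a minimal sketch, not stdlib code: Buffer and COWBox are hypothetical names, and the real stdlib function isKnownUniquelyReferenced(_:) stands in for the uniqueness check that begin_cow_mutation performs. The compiler does not emit begin_cow_mutation/end_cow_mutation for a user-defined type like this; the builtins in this PR are intended for the stdlib.

```swift
// Hypothetical buffer class standing in for $BufferClass.
final class Buffer {
    var field: Int
    init(field: Int) { self.field = field }
}

// Hypothetical COW value type wrapping a buffer reference.
struct COWBox {
    private var buffer: Buffer
    private(set) var copies = 0   // counts the bb_not_unique path, for illustration

    init(_ value: Int) { buffer = Buffer(field: value) }

    var value: Int { buffer.field }

    mutating func set(_ newValue: Int) {
        // Roughly begin_cow_mutation + cond_br: is the buffer uniquely referenced?
        if !isKnownUniquelyReferenced(&buffer) {
            // bb_not_unique: replace the shared buffer with a copy.
            buffer = Buffer(field: buffer.field)
            copies += 1
        }
        // bb_mutate: store into the now-unique buffer.
        buffer.field = newValue
        // end_cow_mutation has no source-level counterpart and compiles to a no-op.
    }
}

let a = COWBox(1)
var b = a        // a and b share one buffer
b.set(2)         // shared: copies, then mutates
b.set(3)         // unique: mutates in place
print(a.value, b.value, b.copies)  // 1 3 1
```

The copy happens only on the first mutation of b; after that, b holds the only reference to its buffer, so the uniqueness check succeeds.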

Loading from a COW data structure looks like:

    %field1 = ref_element_addr [immutable] %immutable_buffer : $BufferClass, #BufferClass.Field
    %value1 = load %field1 : $*FieldType
    ...
    %field2 = ref_element_addr [immutable] %immutable_buffer : $BufferClass, #BufferClass.Field
    %value2 = load %field2 : $*FieldType

The immutable attribute means that loads from ref_element_addr
and ref_tail_addr instructions that have the same operand yield
equivalent values.
In other words, it's guaranteed that a buffer's properties are not mutated
between two ref_element_addr/ref_tail_addr [immutable] instructions as long
as they have the same buffer reference as operand.
This is true even if, e.g., the buffer 'escapes' to an unknown function.

In the example above, %value2 is equal to %value1 because the operand
of both ref_element_addr instructions is the same %immutable_buffer.
Conceptually, the content of a COW buffer object can be seen as part of
the same static (immutable) SSA value as the buffer reference.
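The guarantee encoded by the immutable attribute mirrors Swift's value semantics: reading an unmodified COW value twice gives the same result, even if the value is passed to a function the optimizer cannot see into. A minimal Swift sketch, assuming the hypothetical types Buffer and COWBox and a hypothetical function opaque, which is marked @inline(never) to play the role of the "unknown function". The compiler does not emit [immutable] for user-defined types like these, so this only illustrates the source-level property the attribute lets the optimizer rely on, not the generated SIL.

```swift
// Hypothetical buffer class standing in for $BufferClass.
final class Buffer {
    var field: Int
    init(field: Int) { self.field = field }
}

// Hypothetical COW wrapper; not a stdlib type.
struct COWBox {
    private var buffer: Buffer
    init(_ value: Int) { buffer = Buffer(field: value) }

    // A read of the buffer's field.
    var value: Int { buffer.field }

    mutating func set(_ newValue: Int) {
        if !isKnownUniquelyReferenced(&buffer) {
            buffer = Buffer(field: buffer.field)  // copy before mutating a shared buffer
        }
        buffer.field = newValue
    }
}

// Stands in for an "unknown function" that the buffer escapes to.
@inline(never)
func opaque(_ box: COWBox) -> Int {
    var local = box               // shares the buffer with the caller's box
    let bumped = local.value + 100
    local.set(bumped)             // not unique: copies, leaving the caller's buffer untouched
    return local.value
}

let box = COWBox(42)
let v1 = box.value        // first read
let escaped = opaque(box) // the buffer escapes
let v2 = box.value        // second read through the same, unchanged buffer
print(v1, v2, escaped)    // 42 42 142
```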

The lifetime of a COW value is strictly separated into mutable and
immutable regions by begin_cow_mutation and
end_cow_mutation instructions:

  %b1 = alloc_ref $BufferClass
  // The buffer %b1 is mutable
  %b2 = end_cow_mutation %b1 : $BufferClass
  // The buffer %b2 is immutable
  (%u1, %b3) = begin_cow_mutation %b2 : $BufferClass
  // The buffer %b3 is mutable
  %b4 = end_cow_mutation %b3 : $BufferClass
  // The buffer %b4 is immutable
  ...

Both begin_cow_mutation and end_cow_mutation consume their operand
and return the new buffer as an owned value.
begin_cow_mutation compiles down to a uniqueness check, and
end_cow_mutation compiles to a no-op.

Although the physical pointer value of the returned buffer reference is the
same as the operand, it's important to generate a new buffer reference in
SIL. This prevents the optimizer from moving buffer accesses from a mutable
region into an immutable region, and vice versa.

Because the buffer content is conceptually part of the
buffer reference SSA value, there must be a new buffer reference every time
the buffer content is mutated.
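To see why a fresh reference matters, here is a toy model written in Swift. It is not how the SIL optimizer is implemented: SSA buffer references are integer IDs, several IDs may name the same physical object, and an [immutable] load reuses the result of an earlier load with the same operand, similar to CSE. ToyFunction and run are hypothetical names, and the run function follows the %b1 to %b4 sequence shown earlier.

```swift
// Toy model only (hypothetical; not how the SIL optimizer is implemented).
// Each SSA buffer reference is an integer ID. Several IDs can name the same
// physical object, just like the operand and result of begin/end_cow_mutation.
struct ToyFunction {
    // When false, begin/end_cow_mutation return their operand unchanged.
    let freshReferences: Bool
    private var nextRef = 0
    private var nextObject = 0
    private var objectOf: [Int: Int] = [:]   // SSA reference -> physical object
    private var fieldOf: [Int: Int] = [:]    // physical object -> field value
    private var loadCache: [Int: Int] = [:]  // SSA reference -> earlier load result

    init(freshReferences: Bool) { self.freshReferences = freshReferences }

    private mutating func makeRef(to object: Int) -> Int {
        nextRef += 1
        objectOf[nextRef] = object
        return nextRef
    }

    // alloc_ref plus the initializing store.
    mutating func allocRef(initial: Int) -> Int {
        nextObject += 1
        fieldOf[nextObject] = initial
        return makeRef(to: nextObject)
    }

    // begin_cow_mutation / end_cow_mutation (uniqueness check and copy omitted).
    mutating func beginCOWMutation(_ ref: Int) -> Int { rename(ref) }
    mutating func endCOWMutation(_ ref: Int) -> Int { rename(ref) }

    private mutating func rename(_ ref: Int) -> Int {
        guard freshReferences else { return ref }
        let object = objectOf[ref]!
        return makeRef(to: object)   // same physical object, new SSA reference
    }

    // ref_element_addr + store, in a mutable region.
    mutating func store(_ ref: Int, _ value: Int) {
        let object = objectOf[ref]!
        fieldOf[object] = value
    }

    // ref_element_addr [immutable] + load: a load with the same operand as an
    // earlier one is assumed to be equal, so the earlier result is reused.
    mutating func loadImmutable(_ ref: Int) -> Int {
        if let cached = loadCache[ref] { return cached }
        let object = objectOf[ref]!
        let value = fieldOf[object]!
        loadCache[ref] = value
        return value
    }
}

// Load, mutate the field to 2, load again; returns both loaded values.
func run(freshReferences: Bool) -> (Int, Int) {
    var f = ToyFunction(freshReferences: freshReferences)
    let b1 = f.allocRef(initial: 1)
    let b2 = f.endCOWMutation(b1)
    let v1 = f.loadImmutable(b2)
    let b3 = f.beginCOWMutation(b2)
    f.store(b3, 2)
    let b4 = f.endCOWMutation(b3)
    let v2 = f.loadImmutable(b4)
    return (v1, v2)
}

print(run(freshReferences: true))   // (1, 2)
print(run(freshReferences: false))  // (1, 1): the stale load was reused
```

With fresh references, the load after the mutation has a new operand and observes the stored value. If end_cow_mutation returned its operand unchanged, the "same operand" rule would wrongly reuse the value loaded before the mutation.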

To illustrate this, let's look at an example where a COW value is mutated in
a loop. As with a scalar SSA value, mutating a COW buffer requires a
phi argument in the loop header block (for simplicity, the code for copying a
non-unique buffer is not shown):

  header_block(%b_phi : $BufferClass):
    (%u, %b_mutate) = begin_cow_mutation %b_phi : $BufferClass
    // Store something to %b_mutate
    %b_immutable = end_cow_mutation %b_mutate : $BufferClass
    cond_br %loop_cond, exit_block, backedge_block
  backedge_block:
    br header_block(%b_immutable : $BufferClass)
  exit_block:

Two adjacent begin_cow_mutation and end_cow_mutation instructions
don't need to be in the same function.

@eeckstein requested review from jckarter and atrick May 12, 2020 11:05
@eeckstein (Contributor Author):

@swift-ci smoke test

@atrick (Contributor) left a comment:

Finished reviewing the new instructions. The "requested change" flag is for the ARCAnalysis.cpp change.

@atrick (Contributor) left a comment:

Builtins mostly look good. I just have some questions and think some comments are warranted.

    @@ -2249,9 +2281,16 @@ ValueDecl *swift::getBuiltinValueDecl(ASTContext &Context, Identifier Id) {

    case BuiltinValueKind::IsUnique:
    case BuiltinValueKind::IsUnique_native:
    case BuiltinValueKind::BeginCOWmutation:
    case BuiltinValueKind::BeginCOWmutation_native:
Review comment (Contributor):

I don't understand this, since getIsUniqueOperation returns a single bool result.

[EDIT] Oh, now I get it. Builtin.isUnique could be treated exactly like the new builtin, except that we can't assume there's an eventual end mutation marker. Could we use begin_cow_mutation for Builtin.isUnique anyway and just not worry that there will not be an end_cow_mutation?

This is a little tricky, so it would be helpful to have a short comment.

@eeckstein (Contributor Author):

Eventually we should convert all COW data structures in the stdlib to Begin/EndCOWMutation. So isUnique should go away anyway (except some uses in assert conditions).

@atrick (Contributor), May 13, 2020:

I think there will still be a public isUniquelyReferenced API, though. The question is whether it should have a different implementation from CoW mutation. I'm just curious what the plan is, so we can design for it.

@atrick (Contributor) left a comment:

SIL.rst docs look good

@eeckstein (Contributor Author):

@swift-ci smoke test

@eeckstein (Contributor Author):

@atrick Thanks for reviewing! I pushed a new version.

eeckstein added 4 commits May 14, 2020 08:39

* a new [immutable] attribute on ref_element_addr and ref_tail_addr
* new instructions: begin_cow_mutation and end_cow_mutation

These new instructions are intended to be used for the stdlib's COW containers, e.g. Array.
They allow more aggressive optimizations, especially for Array.

* Builtin.COWBufferForReading -> ref_element_addr [immutable] / ref_tail_addr [immutable]
* Builtin.beginCOWmutation -> begin_cow_mutation
* Builtin.endCOWmutation -> end_cow_mutation
@atrick (Contributor) left a comment:

Awesome. Thanks!

@eeckstein (Contributor Author):

@swift-ci smoke test and merge


@swift-ci merged commit ec0b560 into swiftlang:master May 14, 2020
@eeckstein deleted the cow-instructions branch May 14, 2020 09:22