[sil] add instruction copy_to_ref #30787

zoecarver · 2020-04-03T01:58:48Z

Storing class properties as a tuple or struct can be beneficial because most optimization passes only work on non-class datatypes (struct, tuple, and enum). To get the best performance it's ideal to memcpy the entire object to the first element of the class reference (rather than emit a dozen ref_element_addr, tuple_extract, and store instructions). That's where this instruction comes into play (from SIL.rst):

copy_to_ref %0 to %1 : $C
  // %0 must be a pointer.
  // %1 must be class type.

Given an instance of a tuple or struct, copy_to_ref bitcasts the "from" value to a byte
pointer and emits a memcpy to the "to" operand which is a reference. The "from" value must have all stored properties of the "to" operand. Here's an example use:

   class Foo {
    var x: Int
    var y: Int
    var z: Int
    var cond: Bool
  }

   %f = alloc_ref $Foo
  %sa = alloc_stack $(Int, Int, Int, Bool)
  %t = tuple (%0 : $Int, %1 : Int, %2 : Int, %3 : Bool)
  store %t to %sa : $*(Int, Int, Int, Bool)
  
  copy_to_ref %sa to %f : $Foo

See #30736 for previous discussion/rational.

This sil instruction can be used to copy an address to a class reference. It does this by memcpying the address to the first element in the class reference (the element after the reference count object). The source must contain all stored properties of the class reference and must be an address. See SIL.rst for more details.

zoecarver · 2020-04-03T01:59:58Z

test/IRGen/copy_to_ref.sil

+  var x : Int
+}
+
+// CHECK-LABEL: define swiftcc void @test_int


As you can see, even without any added optimizations, the llvm-ir codegen is substantially better.

eeckstein · 2020-04-03T12:09:28Z

Some discussion points:

We need to think about ownership. This doesn't matter for trivial types, like Int, but it's important for all non-trivial types. It would make sense that copy_to_ref consumes the source operand, rather than to copy it. Because then the source would just be "moved" to the destination, which is much cheaper than copying. It can be done with memcpy (at least for most types).
In this case "copy_to_ref" is the wrong name - it should be "store_to_ref".

A "store_to_ref" is a shortcut for a series of

  %e = ref_element_addr %dest, field<n>
  %v = tuple_extract %src, <n>
  store %v to [init] %e

The question we have to answer is: what benefit would such a new instruction give us compared to using this series of instructions? Note that converting such a series to a memcpy can also be done by IRGen or an llvm optimization.

There are some optimizations, like LetPropertiesOpt, which rely on ref_element_addr to be there to check all possible accesses to a specific class property. Those optimizations would need to take "store_to_ref" into account.

I'd like to bring up two alternatives. I'm not sure by myself if they are better - just for brainstorming:

Split a "store_to_ref" into two instructions: a new ref_payload_addr and a regular store

  %a = ref_payload_addr %dest // projects the whole content of the class object (the implementation is equivalent to ref_element_addr of the first field)
  store %src to [init] %a

I assume an alloc_ref and a store_to_ref always come as a pair. We could add the source value as an additional operand to alloc_ref

  %ref = alloc_ref $Foo (%src)

zoecarver · 2020-04-03T18:48:04Z

@eeckstein thanks for those suggestions. I like that neither of your alternatives use addresses. This will make it easier to optimize all the allocations away.

The only downside is that in order to replace all ref_element_addrs with tuple_element_addrs we would have to have to create a load. It's rare that a class will have no uses other than initialization but, it's fairly common that a class's only use after initialization will be ref_element_addr instructions. If we can replace all those, we can remove the class and there's a good chance that mem2reg can get rid of the tuple alloc_stack. Anyway, I don't think that should be an issue at all, just a thing to keep in mind.

I tested some various implementations and found that both using the ref_element_addr/tuple_extract/store and copy_to_ref (or whatever equivilat instruction we come up with) genereate the **exact** same llvm-ir.

class XYZClass {
  var x : Int
  var y : Int
  var z : Int
}

sil @using_copy : $@convention(thin) () -> XYZClass {
bb0:
  %0 = integer_literal $Builtin.Int64, 1
  %1 = struct $Int (%0 : $Builtin.Int64)
  %2 = integer_literal $Builtin.Int64, 2
  %3 = struct $Int (%2 : $Builtin.Int64)
  %4 = tuple (%1 : $Int, %3 : $Int, %3 : $Int)

  %tsa = alloc_stack $(Int, Int, Int)
  %f = alloc_ref $XYZClass
  
  store %4 to %tsa : $*(Int, Int, Int)

  copy_to_ref %tsa : $*(Int, Int, Int) to %f : $XYZClass
  
  dealloc_stack %tsa : $*(Int, Int, Int)
  
  return %f : $XYZClass
}

sil @using_store : $@convention(thin) () -> XYZClass {
bb0:
  %0 = integer_literal $Builtin.Int64, 1
  %1 = struct $Int (%0 : $Builtin.Int64)
  %2 = integer_literal $Builtin.Int64, 2
  %3 = struct $Int (%2 : $Builtin.Int64)
  %4 = tuple (%1 : $Int, %3 : $Int, %3 : $Int)

  %f = alloc_ref $XYZClass
  
  %ex = ref_element_addr %f: $XYZClass, #XYZClass.x
  %x = tuple_extract %4 : $(Int, Int, Int), 0
  store %x to %ex : $*Int
  
  %ey = ref_element_addr %f: $XYZClass, #XYZClass.y
  %y = tuple_extract %4 : $(Int, Int, Int), 1
  store %y to %ey : $*Int
  
  %ez = ref_element_addr %f: $XYZClass, #XYZClass.z
  %z = tuple_extract %4 : $(Int, Int, Int), 2
  store %z to %ez : $*Int
  
  return %f : $XYZClass
}

sil_vtable XYZClass { }

Given that, it seems like the best course of action may be not to add any instructions. What do you think? Is it worth the sil code size/clarity improvements?

eeckstein · 2020-04-06T13:28:43Z

Given that, it seems like the best course of action may be not to add any instructions. What do you think?

I agree

Is it worth the sil code size/clarity improvements?

IMO, no

zoecarver · 2020-04-06T16:02:44Z

@eeckstein great, glad we're in agreement. See #30810.

zoecarver added 2 commits April 2, 2020 15:46

Add IRGen test for copy_to_ref to test both IRGen and llvm optimization

1dde9c3

zoecarver requested a review from eeckstein April 3, 2020 01:59

zoecarver commented Apr 3, 2020

View reviewed changes

zoecarver mentioned this pull request Apr 4, 2020

[opt] store properties of stack alloc_ref in tuple #30810

Closed

zoecarver closed this Apr 6, 2020

zoecarver mentioned this pull request Apr 6, 2020

[opt] promote stack alloc_ref to object #30743

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[sil] add instruction copy_to_ref #30787

[sil] add instruction copy_to_ref #30787

Uh oh!

zoecarver commented Apr 3, 2020

Uh oh!

zoecarver Apr 3, 2020

Uh oh!

eeckstein commented Apr 3, 2020 •

edited

Loading

Uh oh!

zoecarver commented Apr 3, 2020

Uh oh!

eeckstein commented Apr 6, 2020

Uh oh!

zoecarver commented Apr 6, 2020

Uh oh!

Uh oh!

[sil] add instruction copy_to_ref #30787

[sil] add instruction copy_to_ref #30787

Uh oh!

Conversation

zoecarver commented Apr 3, 2020

Uh oh!

zoecarver Apr 3, 2020

Choose a reason for hiding this comment

Uh oh!

eeckstein commented Apr 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zoecarver commented Apr 3, 2020

Uh oh!

eeckstein commented Apr 6, 2020

Uh oh!

zoecarver commented Apr 6, 2020

Uh oh!

Uh oh!

eeckstein commented Apr 3, 2020 •

edited

Loading