| 1 | +# Key Path Memory Layout |
| 2 | + |
| 3 | +**Key path objects** are laid out at runtime as a heap object with a |
| 4 | +variable-sized payload containing a sequence of encoded components describing |
| 5 | +how the key path traverses a value. When the compiler sees a key path literal, |
| 6 | +it generates a **key path pattern** that can be efficiently interpreted by |
| 7 | +the runtime to instantiate a key path object when needed. This document |
| 8 | +describes the layout of both. The key path pattern layout is designed in such a |
| 9 | +way that it can be transformed in-place into a key path object with a one-time |
| 10 | +initialization in the common case where the entire path is fully specialized |
| 11 | +and crosses no resilience boundaries. |
| 12 | + |
| 13 | +## ABI Concerns For Key Paths |
| 14 | + |
| 15 | +For completeness, this document describes the layout of both key path objects |
| 16 | +and patterns; note however that the instantiated runtime layout of key path |
| 17 | +objects is an implementation detail of the Swift runtime, and *only key path |
| 18 | +patterns* are strictly ABI, since they are emitted by the compiler. The |
| 19 | +runtime has the freedom to change the runtime layout of key path objects, but |
| 20 | +will have to maintain the ability to instantiate from key path patterns emitted |
| 21 | +by previous ABI-stable versions of the Swift compiler. |
| 22 | + |
| 23 | +## Key Path Objects |
| 24 | + |
| 25 | +### Buffer Header |
| 26 | + |
| 27 | +Key path objects begin with the standard Swift heap object header, followed by a |
| 28 | +key path object header. Relative to the end of the heap object header: |
| 29 | + |
| 30 | +Offset | Description |
| 31 | +------- | ---------------------------------------------- |
| 32 | +`0` | Pointer to KVC compatibility C string, or null |
| 33 | +`1*sizeof(Int)` | Key path buffer header (32 bits) |
| 34 | + |
| 35 | +If the key path is Cocoa KVC-compatible, the first word will be a pointer to |
| 36 | +the equivalent KVC string as a null-terminated UTF-8 C string. It will be null |
| 37 | +otherwise. The **key path buffer header** in the second word contains the |
| 38 | +following bit fields: |
| 39 | + |
| 40 | +Bits (LSB zero) | Description |
| 41 | +--------------- | ----------- |
| 42 | +0...23 | **Buffer size** in bytes |
| 43 | +24...29 | Reserved. Must be zero in Swift 4 runtime |
| 44 | +30 | 1 = Has **reference prefix**, 0 = No reference prefix |
| 45 | +31 | 1 = Is **trivial**, 0 = Has destructor |
| 46 | + |
| 47 | +The *buffer size* indicates the total size in bytes of the components following |
| 48 | +the key path buffer header. A `ReferenceWritableKeyPath` may have a *reference |
| 49 | +prefix* of read-only components that can be projected before initiating |
| 50 | +mutation; bit 30 is set if one is present. A key path may capture values that |
| 51 | +require cleanup when the key path object is deallocated, but a key path that |
| 52 | +does not capture any values with cleanups will have the *trivial* bit 31 set to |
| 53 | +fast-path deallocation. |
| 54 | + |
| 55 | +Components are always pointer-aligned, so the first component always starts at |
| 56 | +offset `2*sizeof(Int)`. On 64-bit platforms, this leaves four bytes of padding. |
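The bit fields of the key path buffer header can be unpacked with a few masks and shifts. The following is a minimal sketch, not runtime code: the type name `KeyPathBufferHeader` and its property names are hypothetical and chosen only for illustration; the bit positions are the ones given in the table above.

```swift
/// Hypothetical wrapper around the 32-bit key path buffer header.
/// The names are illustrative; only the bit positions come from the text.
struct KeyPathBufferHeader {
    var rawValue: UInt32

    /// Bits 0...23: size in bytes of the components following the header.
    var bufferSize: Int { return Int(rawValue & 0x00FF_FFFF) }

    /// Bits 24...29: reserved, expected to be zero.
    var reservedBits: UInt32 { return (rawValue >> 24) & 0x3F }

    /// Bit 30: set if the key path has a reference prefix.
    var hasReferencePrefix: Bool { return rawValue & (1 << 30) != 0 }

    /// Bit 31: set if the key path captures nothing that needs cleanup.
    var isTrivial: Bool { return rawValue & (1 << 31) != 0 }
}

/// Offset of the first component, measured from the same origin as the
/// offset table above (two pointer-sized words).
let firstComponentOffset = 2 * MemoryLayout<Int>.size
```

For example, a header value of `0xC000_0028` decodes as trivial, with a reference prefix, and a buffer size of 40 bytes.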
| 57 | + |
| 58 | +### Components |
| 59 | + |
| 60 | +After the buffer header, one or more **key path components** appear in memory |
| 61 | +in sequence. Each component begins with a 32-bit **key path component header** |
| 62 | +describing the following component. |
| 63 | + |
| 64 | +Bits (LSB zero) | Description |
| 65 | +--------------- | ----------- |
| 66 | +0...28 | **Payload** (meaning is dependent on component kind) |
| 67 | +29...30 | **Component kind** |
| 68 | +31 | 1 = **End of reference prefix**, 0 = Not end of reference prefix |
| 69 | + |
| 70 | +If the key path has a *reference prefix*, then exactly one component must have |
| 71 | +the *end of reference prefix* bit set in its component header. This indicates |
| 72 | +that the component after the end of the reference prefix will initiate mutation. |
| 73 | + |
| 74 | +The following *component kinds* are recognized: |
| 75 | + |
| 76 | +Value in bits 29...30 | Description |
| 77 | +--------------------- | ----------- |
| 78 | +0 | Struct/tuple/self stored property |
| 79 | +1 | Computed |
| 80 | +2 | Class stored property |
| 81 | +3 | Optional chaining/forcing/wrapping |
| 82 | + |
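Splitting a component header into its three fields might look like the sketch below. `ComponentKind` and `ComponentHeader` are hypothetical names introduced here for illustration; the raw values and bit ranges follow the two tables above.

```swift
/// Hypothetical names for the component kinds stored in bits 29...30.
enum ComponentKind: UInt32 {
    case structStored = 0   // struct/tuple/self stored property
    case computed = 1
    case classStored = 2
    case optional = 3       // optional chaining/forcing/wrapping
}

/// Hypothetical wrapper around the 32-bit key path component header.
struct ComponentHeader {
    var rawValue: UInt32

    /// Bits 0...28: payload, whose meaning depends on the kind.
    var payload: UInt32 { return rawValue & 0x1FFF_FFFF }

    /// Bits 29...30: component kind. All four 2-bit values are valid
    /// kinds, so the force-unwrap cannot fail.
    var kind: ComponentKind {
        return ComponentKind(rawValue: (rawValue >> 29) & 0x3)!
    }

    /// Bit 31: set on the last component of the reference prefix.
    var endOfReferencePrefix: Bool { return rawValue & (1 << 31) != 0 }
}
```

With these definitions, `0x8000_0080` decodes as a struct stored property with payload 128 that ends the reference prefix.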
| 83 | +- A **struct stored property** component, when given |
| 84 | + a value of the base type in memory, can project the component value in-place |
| 85 | + at a fixed offset within the base value. This applies for struct stored |
| 86 | + properties, tuple fields, and the `.self` identity component (which trivially |
| 87 | + projects at offset zero). The |
| 88 | + *payload* contains the offset in bytes of the projected field in the |
| 89 | + aggregate, or the special value `0x1FFF_FFFF`, which indicates that the |
| 90 | + offset is too large to pack into the payload and is stored in the next 32 bits |
| 91 | + after the header. |
| 92 | +- A **class stored property** component, when given a reference to a class |
| 93 | + instance, can project the component value inside the class instance at |
| 94 | + a fixed offset. The |
| 95 | + *payload* contains the offset in bytes of the projected field from the |
| 96 | + address point of the object, or the special value `0x1FFF_FFFF`, which |
| 97 | + indicates that the offset is too large to pack into the payload and is stored |
| 98 | + in the next 32 bits after the header. |
| 99 | +- An **optional** component performs an operation involving `Optional` values. |
| 100 | + The *payload* contains one of the following values: |
| 101 | + |
| 102 | + Value in payload | Description |
| 103 | + ---------------- | ----------- |
| 104 | + 0 | **Optional chaining** |
| 105 | + 1 | **Optional wrapping** |
| 106 | + 2 | **Optional force-unwrapping** |
| 107 | + |
| 108 | + A *chaining* component behaves like the postfix `?` operator, immediately |
| 109 | + ending the key path application and returning nil when the base value is nil, |
| 110 | + or unwrapping the base value and continuing projection on the non-optional |
| 111 | + payload when non-nil. If an optional chain ends in a non-optional value, |
| 112 | + an implicit *wrapping* component is inserted to wrap it up in an |
| 113 | + optional value. A *force-unwrapping* operator behaves like the postfix |
| 114 | + `!` operator, trapping if the base value is nil, or unwrapping the value |
| 115 | + inside the optional if not. |
| 116 | + |
| 117 | +- A **computed** component uses the conservative access pattern of |
| 118 | + `get`/`set`/`materializeForSet` to project from the base value. This is used |
| 119 | + as a general fallback component for any key path component without a more |
| 120 | + specialized representation, including not only computed properties but |
| 121 | + also subscripts, stored properties that require reabstraction, properties |
| 122 | + with behaviors or custom key path components (when we get those), and weak or |
| 123 | + unowned properties. The payload contains additional bitfields describing the |
| 124 | + component: |
| 125 | + |
| 126 | + Bits (LSB zero) | Description |
| 127 | + --------------- | ----------- |
| 128 | + 24 | 1 = **Has captured arguments**, 0 = no captures |
| 129 | + 25...26 | **Identifier kind** |
| 130 | + 27 | 1 = **Settable**, 0 = **Get-Only** |
| 131 | + 28 | 1 = **Mutating** (implies settable), 0 = Nonmutating |
| 132 | + |
| 133 | + The component can *capture* context which is stored after the component in |
| 134 | + the key path object, such as generic arguments from its original context, |
| 135 | + subscript index arguments, and so on. Bit 24 is set if there are any such |
| 136 | + captures. Bits 25 and 26 discriminate the *identifier* which is used to |
| 137 | + determine equality of key paths referring to the same components. If |
| 138 | + bit 27 is set, then the component is **settable** and can be written through, |
| 139 | + and bit 28 indicates whether the set operation **is mutating** to the base |
| 140 | + value, that is, whether setting through the component changes the base value |
| 141 | + like a value-semantics property or modifies state indirectly like a class |
| 142 | + property or `UnsafePointer.pointee`. |
| 143 | + |
| 144 | + After the header, the component contains the following word-aligned fields: |
| 145 | + |
| 146 | + Offset from header | Description |
| 147 | + ------------------ | ----------- |
| 148 | + `1*sizeof(Int)` | The **identifier** of the component. |
| 149 | + `2*sizeof(Int)` | The **getter function** for the component. |
| 150 | + `3*sizeof(Int)` | (if settable) The **setter function** for the component |
| 151 | + |
| 152 | + The combination of the identifier kind bits and the identifier word are |
| 153 | + compared by the `==` operation on two key paths to determine whether they |
| 154 | + are equivalent. Neither the kind bits nor the identifier word |
| 155 | + have any stable semantic meaning other than as unique identifiers. |
| 156 | + In practice, the compiler picks a stable unique artifact of the |
| 157 | + underlying declaration, such as the naturally-abstracted getter entry point |
| 158 | + for a computed property, the offset of a reabstracted stored property, or |
| 159 | + an Objective-C selector for an imported ObjC property, to identify the |
| 160 | + component. The identifier kind bits are used to discriminate |
| 161 | + possibly-overlapping domains. |
| 162 | + |
| 163 | + The getter function is a pointer to a Swift function with the signature |
| 164 | + `@convention(thin) (@in Base, UnsafeRawPointer) -> @out Value`. When |
| 165 | + the component is applied, the getter is invoked with a copy of the base |
| 166 | + value and is passed a pointer to the captured arguments of the |
| 167 | + component. If the component has no captures, the second argument is |
| 168 | + undefined. |
| 169 | + |
| 170 | + The setter function is also a pointer to a Swift function. This field is |
| 171 | + only present if the *settable* bit of the header is set. If the |
| 172 | + component is nonmutating, then the function has signature |
| 173 | + `@convention(thin) (@in Base, @in Value, UnsafeRawPointer) -> ()`, |
| 174 | + or if it is mutating, then the function has signature |
| 175 | + `@convention(thin) (@inout Base, @in Value, UnsafeRawPointer) -> ()`. |
| 176 | + When a mutating application of the key path is completed, the setter is |
| 177 | + invoked with a copy of the base value (if nonmutating) or a reference to |
| 178 | + the base value (if mutating), along with a copy of the updated component |
| 179 | + value, and a pointer to the captured arguments of the component. If |
| 180 | + the component has no captures, the third argument is undefined. |
| 181 | + |
| 182 | + TODO: Make getter/nonmutating setter take base borrowed, |
| 183 | + yield borrowed result (materializeForGet); use materializeForSet |
| 184 | + |
| 185 | + If the component has captures, the capture area appears after the other |
| 186 | + fields, at offset `3*sizeof(Int)` for a get-only component or |
| 187 | + `4*sizeof(Int)` for a settable component. The area begins with a two-word |
| 188 | + header: |
| 189 | + |
| 190 | + Offset from start | Description |
| 191 | + ----------------- | ----------- |
| 192 | + `0` | Size of captures in bytes |
| 193 | + `1*sizeof(Int)` | Pointer to **argument witness table** |
| 194 | + |
| 195 | + followed by the captures themselves. The *argument witness table* contains |
| 196 | + pointers to functions needed for maintaining the captures: |
| 197 | + |
| 198 | + Offset | Description |
| 199 | + ---------------- | ----------- |
| 200 | + `0` | **Destroy**, or null if trivial |
| 201 | + `1*sizeof(Int)` | **Copy** |
| 202 | + `2*sizeof(Int)` | **Is Equal** |
| 203 | + `3*sizeof(Int)` | **Hash** |
| 204 | + |
| 205 | + The *destroy* function, if not null, has signature |
| 206 | + `@convention(thin) (UnsafeMutableRawPointer) -> ()` and is invoked to |
| 207 | + destroy the captures when the key path object is deallocated. |
| 208 | + |
| 209 | + The *copy* function has signature |
| 210 | + `@convention(thin) (_ src: UnsafeRawPointer, |
| 211 | + _ dest: UnsafeMutableRawPointer) -> ()` |
| 212 | + and is invoked when the captures need to be copied into a new key path |
| 213 | + object, for example when two key paths are appended. |
| 214 | + |
| 215 | + The *is equal* function has signature |
| 216 | + `@convention(thin) (UnsafeRawPointer, UnsafeRawPointer) -> Bool` |
| 217 | + and is invoked when the component is compared for equality with another |
| 218 | + computed component with the same identifier. |
| 219 | + |
| 220 | + The *hash* function has signature |
| 221 | + `@convention(thin) (UnsafeRawPointer, UnsafeRawPointer) -> Int` |
| 222 | + and is invoked when the key path containing the component is hashed. |
| 223 | + The implementation understands a return value of zero to mean that the |
| 224 | + captures should have no effect on the hash value of the key path. |
| 225 | + |
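The payload interpretations described in the list above can be summarized in code. This is a sketch under stated assumptions: every name below (`overflowOffsetSentinel`, `storedOffset`, `OptionalOperation`, `ComputedFlags`) is hypothetical, and the `nextWord` parameter simply stands in for the 32 bits that follow the header when the offset does not fit in the payload.

```swift
/// Sentinel payload meaning "the offset is stored in the next 32 bits".
let overflowOffsetSentinel: UInt32 = 0x1FFF_FFFF

/// Resolves the byte offset of a struct or class stored property component.
/// `nextWord` is a stand-in for the 32 bits after the header; it is only
/// consulted when the payload equals the sentinel.
func storedOffset(payload: UInt32, nextWord: UInt32) -> Int {
    if payload == overflowOffsetSentinel {
        return Int(nextWord)
    }
    return Int(payload)
}

/// Payload values of an optional component.
enum OptionalOperation: UInt32 {
    case chaining = 0
    case wrapping = 1
    case forceUnwrapping = 2
}

/// Bit fields carried in the payload of a computed component.
struct ComputedFlags {
    var payload: UInt32

    /// Bit 24: captured arguments follow the component's fixed fields.
    var hasCaptures: Bool { return payload & (1 << 24) != 0 }
    /// Bits 25...26: identifier kind, used only for equality.
    var identifierKind: UInt32 { return (payload >> 25) & 0x3 }
    /// Bit 27: a setter function is present.
    var isSettable: Bool { return payload & (1 << 27) != 0 }
    /// Bit 28: the setter takes the base `inout`.
    var isMutating: Bool { return payload & (1 << 28) != 0 }
}
```

Keeping the sentinel check in one function makes the two stored property kinds share the same offset decoding, which matches how the text describes them.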
| 226 | +After every component except for the final component, a pointer-aligned |
| 227 | +pointer to the metadata for the type of the projected component is stored. |
| 228 | +(The type of the final component can be found from the `Value` generic |
| 229 | +argument of the `KeyPath<Root, Value>` type.) |
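Putting the size rules together, the buffer size recorded in the header can be estimated from the components. The sketch below assumes that every component header is padded to one pointer-sized word and that capture sizes are already a multiple of the pointer size; the function names are hypothetical.

```swift
/// Pointer size on the current platform (8 on 64-bit, 4 on 32-bit).
let word = MemoryLayout<Int>.size

/// Size in bytes of a computed component.
/// Pass `captureSize: nil` for a component without captures.
func computedComponentSize(isSettable: Bool, captureSize: Int?) -> Int {
    var size = word                 // component header, padded to a word
    size += word                    // identifier
    size += word                    // getter function
    if isSettable { size += word }  // setter function
    if let captureSize = captureSize {
        size += 2 * word            // capture size + argument witness table
        size += captureSize         // the captures themselves
    }
    return size
}

/// Total buffer size: the components, plus one type metadata pointer
/// after every component except the final one.
func bufferSize(componentSizes: [Int]) -> Int {
    let metadataPointers = max(componentSizes.count - 1, 0)
    return componentSizes.reduce(0, +) + metadataPointers * word
}
```

Under these assumptions, three stored property components (one word each) plus two metadata pointers occupy five words, or 40 bytes on a 64-bit platform.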
| 230 | + |
| 231 | +### Examples |
| 232 | + |
| 233 | +Given: |
| 234 | + |
| 235 | +```swift |
| 236 | +struct A { |
| 237 | + var padding: (128 x UInt8) |
| 238 | + var b: B |
| 239 | +} |
| 240 | + |
| 241 | +class B { |
| 242 | + var padding: (240 x UInt8) |
| 243 | + var c: C |
| 244 | +} |
| 245 | + |
| 246 | +struct C { |
| 247 | + var padding: (384 x UInt8) |
| 248 | + var d: D |
| 249 | +} |
| 250 | +``` |
| 251 | + |
| 252 | +On a 64-bit platform, a key path object representing `\A.b.c.d` might look like |
| 253 | +this in memory: |
| 254 | + |
| 255 | +Word | Contents |
| 256 | +---- | -------- |
| 257 | +0 | isa pointer to `ReferenceWritableKeyPath<A, D>` |
| 258 | +1 | reference counts |
| 259 | +`-` | `-` |
| 260 | +2 | buffer header 0xC000_0028 - trivial, reference prefix, buffer size 40 |
| 261 | +`-` | `-` |
| 262 | +3 | component header 0x8000_0080 - struct component, offset 128, end of prefix |
| 263 | +4 | type metadata pointer for `B` |
| 264 | +`-` | `-` |
| 265 | +5 | component header 0x4000_0100 - class component, offset 256 |
| 266 | +6 | type metadata pointer for `C` |
| 267 | +`-` | `-` |
| 268 | +7 | component header 0x0000_0180 - struct component, offset 384 |
| 269 | + |
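The header values in this table can be reproduced by packing the fields by hand. The helper functions below are hypothetical and exist only to check the arithmetic; kind values 0 and 2 are the struct and class stored property kinds from the component kind table.

```swift
/// Packs a key path buffer header from its fields (illustrative helper).
func makeBufferHeader(size: UInt32, hasReferencePrefix: Bool, isTrivial: Bool) -> UInt32 {
    var header = size & 0x00FF_FFFF              // bits 0...23: buffer size
    if hasReferencePrefix { header |= 1 << 30 }  // bit 30
    if isTrivial { header |= 1 << 31 }           // bit 31
    return header
}

/// Packs a component header from its fields (illustrative helper).
func makeComponentHeader(kind: UInt32, payload: UInt32,
                         endOfReferencePrefix: Bool = false) -> UInt32 {
    var header = payload & 0x1FFF_FFFF           // bits 0...28: payload
    header |= (kind & 0x3) << 29                 // bits 29...30: kind
    if endOfReferencePrefix { header |= 1 << 31 } // bit 31
    return header
}

// Five words of components on a 64-bit platform: 5 * 8 = 40 bytes.
let exampleBuffer = makeBufferHeader(size: 40, hasReferencePrefix: true, isTrivial: true)
let componentB = makeComponentHeader(kind: 0, payload: 128, endOfReferencePrefix: true)
let componentC = makeComponentHeader(kind: 2, payload: 256)
let componentD = makeComponentHeader(kind: 0, payload: 384)
```

Here `exampleBuffer` is `0xC000_0028`, and the three component headers are `0x8000_0080`, `0x4000_0100`, and `0x0000_0180`, matching words 2, 3, 5, and 7 of the table.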
| 270 | +If we add: |
| 271 | + |
| 272 | +```swift |
| 273 | +struct D { |
| 274 | + var computed: E { get set } |
| 275 | +} |
| 276 | + |
| 277 | +struct E { |
| 278 | + subscript(b: B) -> F { get } |
| 279 | +} |
| 280 | +``` |
| 281 | + |
| 282 | +then `\D.computed[B()]` would look like: |
| 283 | + |
| 284 | +Word | Contents |
| 285 | +---- | -------- |
| 286 | +0 | isa pointer to `KeyPath<D, F>` |
| 287 | +1 | reference counts |
| 288 | +`-` | `-` |
| 289 | +2 | buffer header 0x0000_0058 - buffer size 88 |
| 290 | +`-` | `-` |
| 291 | +3 | component header 0x3800_0000 - computed, settable, mutating |
| 292 | +4 | identifier pointer |
| 293 | +5 | getter |
| 294 | +6 | setter |
| 295 | +7 | type metadata pointer for `E` |
| 296 | +`-` | `-` |
| 297 | +8 | component header 0x2100_0000 - computed, has captures |
| 298 | +9 | identifier pointer |
| 299 | +10 | getter |
| 300 | +11 | argument size 8 |
| 301 | +12 | pointer to argument witnesses for releasing/retaining/equating/hashing `B` |
| 302 | +13 | value of `B()` |
| 303 | + |
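The two computed component headers in this table follow from the payload bit fields of a computed component. The helper below is a hypothetical packing function used only to verify the values; it assumes an identifier kind of zero, which is what the zero bits 25 and 26 in both headers suggest.

```swift
/// Packs the header of a computed component (illustrative helper).
func makeComputedHeader(hasCaptures: Bool, identifierKind: UInt32,
                        isSettable: Bool, isMutating: Bool) -> UInt32 {
    var header: UInt32 = 1 << 29              // component kind 1 = computed
    if hasCaptures { header |= 1 << 24 }      // bit 24: has captured arguments
    header |= (identifierKind & 0x3) << 25    // bits 25...26: identifier kind
    if isSettable { header |= 1 << 27 }       // bit 27: settable
    if isMutating { header |= 1 << 28 }       // bit 28: mutating
    return header
}

// First component: settable and mutating, no captures.
let propertyHeader = makeComputedHeader(hasCaptures: false, identifierKind: 0,
                                        isSettable: true, isMutating: true)

// Second component: get-only subscript that captures its index argument.
let subscriptHeader = makeComputedHeader(hasCaptures: true, identifierKind: 0,
                                         isSettable: false, isMutating: false)

// Words 3 through 13 hold the components: 11 words of 8 bytes each.
let exampleBufferSize = 11 * 8
```

These evaluate to `0x3800_0000`, `0x2100_0000`, and 88 (`0x58`), matching words 3, 8, and 2 of the table.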
| 304 | +## Key Path Patterns |
| 305 | + |
| 306 | +(to be written) |