[pitch] Span-providing properties in the standard library #2620

Merged: 17 commits, Jan 16, 2025
295 changes: 295 additions & 0 deletions proposals/AAAA-stdlib-span-properties.md
@@ -0,0 +1,295 @@
# Add `Span`-providing Properties to Standard Library Types

* Proposal: [PR-2620](https://github.com/swiftlang/swift-evolution/pull/2620)
* Author: [Guillaume Lessard](https://github.com/glessard)
* Review Manager: (tbd)
* Status: **Pitch**
* Roadmap: [BufferView Roadmap](https://forums.swift.org/t/66211)
* Bug: rdar://137710901
* Implementation: (tbd)
* Review: [pitch](https://forums.swift.org/t/76138)

[SE-0446]: https://github.com/swiftlang/swift-evolution/blob/main/proposals/0446-non-escapable.md
[SE-0447]: https://github.com/swiftlang/swift-evolution/blob/main/proposals/0447-span-access-shared-contiguous-storage.md
[PR-2305]: https://github.com/swiftlang/swift-evolution/pull/2305
[SE-0453]: https://github.com/swiftlang/swift-evolution/blob/main/proposals/0453-vector.md

## Introduction

We recently [introduced][SE-0447] the `Span` and `RawSpan` types, but did not provide ways to obtain instances of either from existing types. This proposal adds properties that vend a lifetime-dependent `Span` from a variety of standard library types, as well as vend a lifetime-dependent `RawSpan` when the underlying element type supports it.

## Motivation

Many standard library container types can provide direct access to their internal representation. Up to now, it has only been possible to do so in an unsafe way. The standard library provides this unsafe functionality with closure-taking functions such as `withUnsafeBufferPointer()`, `withContiguousStorageIfAvailable()` and `withUnsafeBytes()`. These functions have a few different drawbacks, most prominently their reliance on unsafe types, which makes them unpalatable in security-conscious environments. Closure-taking API can also be difficult to compose with new features and with one another. These issues are addressed head-on with non-escapable types in general, and `Span` in particular. With this proposal, compatible standard library types will provide access to their internal representation via computed properties of type `Span` and `RawSpan`.
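For reference, this is roughly what the existing closure-taking access looks like today. The sketch uses only `withUnsafeBufferPointer()` from the current standard library; the `values` array is purely illustrative. It also shows that nothing stops the unsafe pointer from leaving the closure, which is the kind of hazard that makes these functions unpalatable.

```swift
let values: [Int32] = [1, 2, 3, 4]

// Typed access to the array's contiguous storage, valid only inside the closure.
let sum = values.withUnsafeBufferPointer { buffer in
  buffer.reduce(0, +)
}

// The compiler accepts returning the pointer itself, even though it must not
// be used once the closure has returned. It is deliberately never read here.
let escaped = values.withUnsafeBufferPointer { $0 }
_ = escaped
```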

## Proposed solution

Computed properties returning [non-escapable][SE-0446] copyable values represent a particular case of lifetime relationships between two bindings. While initializing a non-escapable value in general requires [lifetime annotations][PR-2305] in order to correctly describe the lifetime relationship, the specific case of computed properties returning non-escapable copyable values can only represent one type of relationship between the parent binding and the non-escapable instance it provides: a borrowing relationship.

In the example below, we have an instance of type `A`, with a well-defined lifetime because it is non-copyable. An instance of `A` can provide access to an instance of type `B`, which borrows the instance of `A`:

```swift
struct A: ~Copyable, Escapable {}
struct B: ~Escapable, Copyable {
  init(_ a: borrowing A) {}
}
extension A {
  var b: B { B(self) }
}

func function() {
  var a = A()
  var b = a.b // access to `a` begins here
  read(b)
  // `b` has ended here, ending access to `a`
  modify(&a)  // `modify()` can have exclusive access to `a`
}
```
If we were to attempt using `b` again after the call to `modify(&a)`, the compiler would report an overlapping access error, due to attempting to mutate `a` (with `modify(&a)`) while it is already being accessed through `b`'s borrow. Note that the copyability of `B` means that it cannot represent a mutation of `A`; it therefore represents a non-exclusive borrowing relationship.

Given this, we propose to enable the definition of a borrowing relationship via a computed property. With this feature we then propose to add `storage` computed properties to standard library types that can share their internal typed storage, as well as `bytes` computed properties to those standard library types that can safely share their internal storage as untyped memory.

One of the purposes of `Span` is to provide a safer alternative to `UnsafeBufferPointer`. This proposal builds on it and allows us to rewrite code reliant on `withUnsafeBufferPointer()` to use `storage` properties instead. Eventually, code that requires access to contiguous memory can be rewritten to use `Span`, gaining better composability in the process. For example:

```swift
let result = try myArray.withUnsafeBufferPointer { buffer in
  let indices = findElements(buffer)
  var myResult = MyResult()
  for i in indices {
    try myResult.modify(buffer[i])
  }
  return myResult // hand the accumulated value back out of the closure
}
```

This closure-based call is difficult to evolve, for example by giving `result` a non-copyable type, adding a concurrent task, or adopting typed throws. An alternative based on a vended `Span` property would look like this:

```swift
let span = myArray.storage
let indices = findElements(span)
var myResult = MyResult()
for i in indices {
  try myResult.modify(span[i])
}
```

In this version, code evolution is not constrained by a closure. Incorrect escapes of `span` will be diagnosed by the compiler, and the `modify()` function can be updated with typed throws, concurrency or other features as necessary.
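To make the comparison concrete, here is a self-contained version of the closure-based form that compiles against today's standard library. `MyResult`, `NegativeElement`, `findElements(_:)` and `process(_:)` are hypothetical stand-ins invented for this sketch; the proposal does not define them. Note how the result and the thrown error both have to travel through the closure's generic signature.

```swift
// Hypothetical error and accumulator types, for illustration only.
struct NegativeElement: Error {
  let value: Int
}

struct MyResult {
  var total = 0

  mutating func modify(_ value: Int) throws {
    guard value >= 0 else { throw NegativeElement(value: value) }
    total += value
  }
}

// Hypothetical search: returns the indices of the even elements.
func findElements(_ buffer: UnsafeBufferPointer<Int>) -> [Int] {
  buffer.indices.filter { buffer[$0] % 2 == 0 }
}

func process(_ myArray: [Int]) throws -> MyResult {
  // The closure's return type and its errors are threaded through
  // `withUnsafeBufferPointer`'s `rethrows` signature.
  try myArray.withUnsafeBufferPointer { buffer -> MyResult in
    let indices = findElements(buffer)
    var myResult = MyResult()
    for i in indices {
      try myResult.modify(buffer[i])
    }
    return myResult
  }
}
```

With a `Span`-vending property, the body of `process(_:)` would no longer need the closure, and `findElements(_:)` would presumably take a `Span<Int>` instead of an unsafe buffer.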

## Detailed Design

Computed property getters returning non-escapable and copyable types (`~Escapable & Copyable`) become possible, requiring no additional annotations. The lifetime of their returned value depends on the type vending them. A `~Escapable & Copyable` value borrows another binding. In terms of the law of exclusivity, a borrow is a read-only access. Multiple borrows are allowed to overlap, but cannot overlap with any mutation.

A computed property getter defined on an `Escapable` type and returning a `~Escapable & Copyable` value establishes a borrowing lifetime relationship of the returned value on the callee's binding. As long as the returned value exists (including local copies), the callee's binding remains borrowed.

A computed property getter defined on a non-escapable and copyable (`~Escapable & Copyable`) type and returning a `~Escapable & Copyable` value copies the lifetime dependency of the callee. The returned value becomes an additional borrow of the callee's dependency, but is otherwise independent from the callee.

A computed property getter defined on a non-escapable and non-copyable (`~Escapable & ~Copyable`) type and returning a `~Escapable & Copyable` value establishes a borrowing lifetime relationship of the returned value on the callee's binding. As long as the returned value exists (including local copies), the callee's binding remains borrowed.

By allowing the language to define lifetime dependencies in these limited ways, we can add `Span`-providing properties to standard library types.
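The exclusivity rule these borrows rely on can already be observed with `borrowing` parameters on non-copyable types (Swift 5.9 or later). The following is only an analogy, not the proposed feature: `Ledger`, `total(of:)` and `ledgerTotals()` are hypothetical names, and the borrow here lasts for a function call rather than for the lifetime of a returned `~Escapable` value.

```swift
// A non-copyable value has exactly one owner, so its lifetime is well defined.
struct Ledger: ~Copyable {
  var entries: [Int] = []

  mutating func record(_ amount: Int) {
    entries.append(amount)
  }
}

// A read-only borrow: the function can inspect the ledger but not mutate it.
func total(of ledger: borrowing Ledger) -> Int {
  ledger.entries.reduce(0, +)
}

func ledgerTotals() -> (before: Int, after: Int) {
  var ledger = Ledger()
  ledger.record(5)
  ledger.record(7)
  let before = total(of: ledger) // the borrow ends when the call returns
  ledger.record(1)               // so this mutation has exclusive access
  let after = total(of: ledger)
  return (before, after)
}
```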

#### <a name="extensions"></a>Extensions to Standard Library types

The standard library and Foundation will provide `storage` computed properties, returning lifetime-dependent `Span` instances. These computed properties are the safe and composable replacements for the existing `withUnsafeBufferPointer` closure-taking functions.

```swift
extension Array {
  /// Share this `Array`'s elements as a `Span`
  var storage: Span<Element> { get }
}

extension ArraySlice {
  /// Share this `ArraySlice`'s elements as a `Span`
  var storage: Span<Element> { get }
}

extension ContiguousArray {
  /// Share this `ContiguousArray`'s elements as a `Span`
  var storage: Span<Element> { get }
}

extension String.UTF8View {
  // Review thread on this declaration:
  //
  // Contributor: What is the behavior (and what are the performance
  // characteristics) of this operation when a String has storage other than
  // contiguous UTF-8? Would it be desirable to return an optional, or
  // alternatively require the user to know or have called
  // makeContiguousUTF8() on the String as a precondition?
  //
  // @glessard (Contributor Author, Nov 19, 2024): Behind the scenes, we are
  // implementing a new "lazy eager bridging" behaviour for bridged Array and
  // bridged String. They will always succeed, but they will sometimes entail
  // an allocation and copy (i.e. "usually O(1), sometimes O(n)"). An initial
  // step is here. For native non-bridged instances, always O(1).
  //
  // Contributor: Cool! The implementation details would of course be out of
  // scope for this text, but it might be worth calling out here the
  // performance implications of calling storage and bytes on these types,
  // pros and cons for usability vs predictability, etc. (given that this is
  // supposed to be a safe, performant API).
  //
  // Contributor Author: I appended a paragraph to the detailed design to
  // spell this out.

  /// Share this `UTF8View`'s code units as a `Span`
  var storage: Span<Unicode.UTF8.CodeUnit> { get }
}

extension Substring.UTF8View {
  /// Share this `UTF8View`'s code units as a `Span`
  var storage: Span<Unicode.UTF8.CodeUnit> { get }
}

extension CollectionOfOne {
  /// Share this `Collection`'s element as a `Span`
  var storage: Span<Element> { get }
}

extension SIMD_N_ { // where _N_ ∈ {2, 3, 4, 8, 16, 32, 64}
  /// Share this vector's elements as a `Span`
  var storage: Span<Scalar> { get }
}

extension KeyValuePairs {
  /// Share this `Collection`'s elements as a `Span`
  var storage: Span<(Key, Value)> { get }
}
```

Conditional on the acceptance of [`Vector`][SE-0453], we will also add the following:

```swift
extension Vector where Element: ~Copyable {
  /// Share this vector's elements as a `Span`
  var storage: Span<Element> { get }
}
```

#### Accessing the raw bytes of a `Span`

When a `Span`'s element is `BitwiseCopyable`, we allow viewing the underlying storage as raw bytes with `RawSpan`:

```swift
extension Span where Element: BitwiseCopyable {
  /// Share the raw bytes of this `Span`'s elements
  var bytes: RawSpan { get }
}
```

The returned `RawSpan` instance will borrow the same binding as is borrowed by the `Span`.
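For comparison, the untyped view that `bytes` is meant to supersede is available today through `withUnsafeBytes()`. This sketch uses only existing standard library API; `words` and `rawBytes` are illustrative names. The bytes are copied out of the closure because the raw buffer itself must not escape it.

```swift
let words: [UInt16] = [0x0102, 0x0304]

// View the elements as raw bytes and copy them into an owned array.
// The order of the bytes within each element depends on the platform's endianness.
let rawBytes: [UInt8] = words.withUnsafeBytes { Array($0) }

// Each `UInt16` contributes `MemoryLayout<UInt16>.stride` bytes.
let expectedCount = words.count * MemoryLayout<UInt16>.stride
```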

#### Extensions to unsafe buffer types

We hope that `Span` and `RawSpan` will become the standard ways to access shared contiguous memory in Swift, but current API provide `UnsafeBufferPointer` and `UnsafeRawBufferPointer` instances to do this. We will provide ways to unsafely obtain `Span` and `RawSpan` instances from them, in order to bridge `UnsafeBufferPointer` to contexts that use `Span`, or `UnsafeRawBufferPointer` to contexts that use `RawSpan`.

```swift
extension UnsafeBufferPointer {
  /// Unsafely view this buffer as a `Span`
  var storage: Span<Element> { get }
}

extension UnsafeMutableBufferPointer {
  /// Unsafely view this buffer as a `Span`
  var storage: Span<Element> { get }
}

extension UnsafeRawBufferPointer {
  /// Unsafely view this raw buffer as a `RawSpan`
  var bytes: RawSpan { get }
}

extension UnsafeMutableRawBufferPointer {
  /// Unsafely view this raw buffer as a `RawSpan`
  var bytes: RawSpan { get }
}
```

All of these unsafe conversions return a value whose lifetime is dependent on the _binding_ of the unsafe buffer value (for example, the variable holding an `UnsafeBufferPointer`). Note that this does not keep the underlying memory alive, as is usual where the `UnsafePointer` family of types is involved. The programmer must ensure the following invariants for as long as the `Span` or `RawSpan` binding is valid:

- The underlying memory remains initialized.
- The underlying memory is not mutated.

Failure to keep these invariants results in undefined behaviour.

Review comment (Contributor), on the paragraph introducing these invariants:

This sentence is somewhat mysterious:

> All of these unsafe conversions return a value whose lifetime is dependent on the binding of the UnsafeBufferPointer.

Here's an attempted concrete explanation...

Swift does not manage the lifetime of unsafe buffer values. Consequently, when an unsafe buffer type provides a storage property, the resulting Span value depends on the variable that the unsafe buffer is bound to. The compiler enforces that the span does not escape the lexical scope of the variable that it depends on. It is up to the programmer to ensure that the unsafe buffer is valid within that lexical scope, which is typically true.

For example, if the unsafe buffer is a function argument, then its storage is valid within the function body and can even be returned if the function's result depends on the argument:

```swift
@lifetime(borrow ubp)
func returnUBPStorage(ubp: UnsafeRawBufferPointer) -> RawSpan {
  return ubp.bytes // OK: the span can be returned since the function's return value also has a dependence on 'ubp'.
}
```

It is an error to use the unsafe buffer's storage outside of its variable's lexical scope, regardless of the buffer's origin:

```swift
@lifetime(borrow ubp)
func copyUBPStorage(ubp: UnsafeRawBufferPointer) -> RawSpan {
  let localBuffer = ubp
  return localBuffer.bytes // ERROR: lifetime-dependent value escapes its scope
                           //        it depends on the lifetime of variable 'localBuffer'
}

func escapeUBPStorage(array: [Int64]) {
  var span = RawSpan()
  array.withUnsafeBytes {
    span = $0.bytes // ERROR: lifetime-dependent value escapes its scope
                    //        it depends on the lifetime of argument '$0'
  }
  read(span)
}
```
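The two invariants for unsafe buffers can be illustrated with manually managed memory, using only API that exists today. In this sketch `sumOfScratchBuffer()` is a hypothetical helper; the read-only `UnsafeBufferPointer` view stands in for where a `Span` obtained from the buffer would be used under this proposal.

```swift
func sumOfScratchBuffer() -> Int {
  let buffer = UnsafeMutableBufferPointer<Int>.allocate(capacity: 4)
  // Deallocation happens only after every read below has finished.
  defer { buffer.deallocate() }

  // Invariant 1: the memory is initialized before anything views it.
  buffer.initialize(repeating: 0)
  for i in buffer.indices {
    buffer[i] = (i + 1) * 10
  }

  // Invariant 2: no mutation happens while the read-only view is in use.
  let view = UnsafeBufferPointer(buffer)
  let sum = view.reduce(0, +)

  // `Int` is a trivial type, so no explicit deinitialization is needed.
  return sum
}
```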

#### Extensions to `Foundation.Data`

While the `swift-foundation` package and the `Foundation` framework are not governed by the Swift evolution process, `Data` is similar in use to standard library types, and the project acknowledges that it is desirable for it to have similar API when appropriate. Accordingly, we would add the following properties to `Foundation.Data`:

```swift
extension Foundation.Data {
  /// Share this `Data`'s bytes as a `Span`
  var storage: Span<UInt8> { get }

  /// Share this `Data`'s bytes as a `RawSpan`
  var bytes: RawSpan { get }
}
```

Unlike with the standard library types, we plan to have a `bytes` property on `Foundation.Data` directly. This type conceptually consists of untyped bytes, and `bytes` is likely to be the primary way to directly access its memory. As `Data`'s API presents its storage as a collection of `UInt8` elements, we provide both `bytes` and `storage`. Types similar to `Data` may choose to provide both typed and untyped `Span` properties.
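The dual nature of `Data` described above is visible in its current API: it is a collection of `UInt8`, and it also exposes its memory as untyped bytes through a closure. The sketch below uses only existing Foundation API; `payload` and the derived values are illustrative.

```swift
import Foundation

let payload = Data([0xDE, 0xAD, 0xBE, 0xEF])

// Typed view: `Data` is a collection whose elements are `UInt8`.
let typedSum = payload.reduce(0) { $0 + Int($1) }

// Untyped view: the same memory seen as raw bytes, valid only inside the closure.
let rawCount = payload.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in
  raw.count
}
```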

#### <a name="performance"></a>Performance

The `storage` and `bytes` properties should be performant and return their `Span` or `RawSpan` with very little work, in O(1) time. This is the case for all native standard library types. There is a performance wrinkle for bridged `Array` and `String` instances on Darwin-based platforms, where they can be bridged to Objective-C types that do not guarantee contiguous storage. In such cases the implementation will eagerly copy the underlying data to the native Swift form, and return a `Span` or `RawSpan` pointing to that copy.

This eager copy behaviour will be specific to the `storage` and `bytes` properties, and therefore the memory usage behaviour of existing unchanged code will remain the same. New code that adopts the `storage` and `bytes` properties will occasionally have higher memory usage due to the eager copies, but we believe this performance compromise is the right approach for the standard library. The alternative is to compromise the design for all platforms supported by Swift, and we consider that a non-starter.

As a result of the eager copy behaviour for bridged `String.UTF8View` and `Array` instances, the `storage` property for these types will have a documented performance characteristic of "amortized constant time performance."
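Code that wants to control when a potential copy happens can already ask a `String` to make its storage contiguous up front. This sketch uses the existing `makeContiguousUTF8()`, `isContiguousUTF8` and `withContiguousStorageIfAvailable()` API; whether the proposed `storage` property would reuse that contiguous form is an implementation detail this proposal does not specify.

```swift
var text = "hello"

// For a native Swift string this does no work; for a string bridged from
// Objective-C it may allocate and copy, which is the O(n) case.
text.makeContiguousUTF8()
let isContiguous = text.isContiguousUTF8

// Once the string is contiguous, the closure-based fast path is available.
let byteCount = text.utf8.withContiguousStorageIfAvailable { $0.count }
```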

## Source compatibility

This proposal is additive and source-compatible with existing code.

## ABI compatibility

This proposal is additive and ABI-compatible with existing code.

## Implications on adoption

The additions described in this proposal require a new version of the Swift standard library and runtime.

## Alternatives considered

#### Adding `withSpan()` and `withBytes()` closure-taking functions

The `storage` and `bytes` properties aim to be safe replacements for the `withUnsafeBufferPointer()` and `withUnsafeBytes()` closure-taking functions. We could consider `withSpan()` and `withBytes()` closure-taking functions that would provide a quicker migration away from the older unsafe functions. We do not believe the closure-taking functions are desirable in the long run. In the short run, there may be a desire to clearly mark the scope where a `Span` instance is used. The default method would be to explicitly consume a `Span` instance:
```swift
var a = ContiguousArray(0..<8)
var span = a.storage
read(span)
_ = consume span
a.append(8)
```

In order to visually distinguish this lifetime, we could simply use a `do` block:
```swift
var a = ContiguousArray(0..<8)
do {
  let span = a.storage
  read(span)
}
a.append(8)
```

A more targeted solution may be a consuming function that takes a non-escaping closure:
```swift
var a = ContiguousArray(0..<8)
var span = a.storage
consuming(span) { span in
  read(span)
}
a.append(8)
```

During the evolution of Swift, we have learned that closure-based API are difficult to compose, especially with one another. They can also require alterations to support new language features. For example, the generalization of closure-taking API for non-copyable values as well as typed throws is ongoing; adding more closure-taking API may make future feature evolution more labor-intensive. By instead relying on returned values, whether from computed properties or functions, we build for greater composability. Use cases where this approach falls short should be reported as enhancement requests or bugs.

#### Giving the properties different names

We chose the names `storage` and `bytes` because those reflect _what_ they represent. Another option would be to name the properties after _how_ they represent what they do, which would be `span` and `rawSpan`. It is possible the name `storage` would be deemed to clash too much with existing properties of types that would like to provide views of their internal storage with `Span`-providing properties. For example, the Standard Library's concrete `SIMD`-conforming types have a property `var _storage`. The current proposal means that making this property of `SIMD` types into public API would entail a name change more significant than simply removing its leading underscore.

#### Disallowing the definition of non-escapable properties of non-escapable types

The particular case of the lifetime dependence created by a property of a copyable non-escapable type is not as simple as when the parent type is escapable. There are two possible ways to define the lifetime of the new instance: it can either depend on the lifetime of the original instance, or it can acquire the lifetime of the original instance and be otherwise independent. We believe that both these cases can be useful, but that in the majority of cases the desired behaviour will be to have an independent return value, where the newly returned value borrows the same binding as the callee. Therefore we believe that it is reasonable to reserve the unannotated spelling for this more common case.

The original version of this pitch disallowed this. As a consequence, the `bytes` property had to be added on each individual type, rather than having `bytes` as a conditional property of `Span`.

#### Omitting extensions to `UnsafeBufferPointer` and related types

We could omit the extensions to `UnsafeBufferPointer` and related types, and rely instead on future `Span` and `RawSpan` initializers. The initializers can have the advantage of being able to communicate semantics (somewhat) through their parameter labels. However, they also have a very different shape than the `storage` computed properties we are proposing. We believe that adding the same API on both safe and unsafe types is advantageous, even if the preconditions for the properties cannot be statically enforced.

## <a name="directions"></a>Future directions

Note: The future directions stated in [SE-0447](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0447-span-access-shared-contiguous-storage.md#Directions) apply here as well.

#### <a name="MutableSpan"></a>Safe mutations with `MutableSpan<T>`

Some data structures can delegate mutations of their owned memory. In the standard library the function `withUnsafeMutableBufferPointer()` provides this functionality in an unsafe manner. We expect to add a `MutableSpan` type to support delegating mutations of initialized memory. Standard library types will then add a way to vend `MutableSpan` instances. This could be with a closure-taking `withMutableSpan()` function, or a new property, such as `var mutableStorage`. Note that a computed property providing mutable access needs to have a different name than the `storage` properties proposed here, because we cannot overload the return type of computed properties based on whether mutation is desired.
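For context, this is the unsafe mutation delegation that exists today and that a `MutableSpan` would eventually replace. The sketch uses only the current `withUnsafeMutableBufferPointer()` API; `samples` is an illustrative name.

```swift
var samples = [3, 1, 2]

// The array lends out its storage for in-place mutation during the closure.
samples.withUnsafeMutableBufferPointer { buffer in
  for i in buffer.indices {
    buffer[i] *= 2
  }
}
```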

#### <a name="ContiguousStorage"></a>A `ContiguousStorage` protocol

An early version of the `Span` proposal ([SE-0447]) proposed a `ContiguousStorage` protocol by which a type could indicate that it can provide a `Span`. `ContiguousStorage` would form a bridge between generically-typed interfaces and a performant concrete implementation. It would supersede the rejected [SE-0256](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0256-contiguous-collection.md), and many of the standard library collections could conform to `ContiguousStorage`.

The properties added by this proposal are largely the concrete implementations of `ContiguousStorage`. As such, it seems like an obvious enhancement to this proposal.

Unfortunately, a major issue prevents us from proposing it at this time: the ability to suppress requirements on `associatedtype` declarations was deferred during the review of [SE-0427](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0427-noncopyable-generics.md). Once this restriction is lifted, then we could propose a `ContiguousStorage` protocol.

The other limitation stated in [SE-0447][SE-0447]'s section about `ContiguousStorage` is "the inability to declare a `_read` accessor as a protocol requirement." This proposal's addition to enable defining a borrowing relationship via a computed property is a solution to that, as long as we don't need to use a coroutine accessor to produce a `Span`. While allowing the return of `Span`s through coroutine accessors may be undesirable, whether it is undesirable is unclear until coroutine accessors are formalized in the language.
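Until such a protocol exists, the closest generic bridge is the `withContiguousStorageIfAvailable()` requirement on `Sequence`. The sketch below shows a generic function taking the contiguous fast path when a type offers one and falling back to iteration otherwise; `sum(of:)` is a hypothetical helper, and a `ContiguousStorage` protocol would presumably let the fast path be expressed as a constraint instead of a runtime check.

```swift
func sum<S: Sequence>(of elements: S) -> Int where S.Element == Int {
  // Fast path: the sequence can expose contiguous storage (e.g. `Array`).
  if let fast = elements.withContiguousStorageIfAvailable({ $0.reduce(0, +) }) {
    return fast
  }
  // Slow path: the default implementation returned `nil`, so iterate instead.
  return elements.reduce(0, +)
}
```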

## Acknowledgements

Thanks to Ben Rimmington for suggesting that the `bytes` property should be on `Span` rather than on every type.