| 1 | +# `AttributedString` UTF-8 and UTF-16 Views |
| 2 | + |
| 3 | +* Proposal: [SF-0012](0012-attributedstring-utf8-utf16-views.md) |
| 4 | +* Authors: [Jeremy Schonfeld](https://github.com/jmschonfeld) |
| 5 | +* Review Manager: [Tina Liu](https://github.com/itingliu) |
| 6 | +* Status: **Accepted** |
| 7 | +* Implementation: [swiftlang/swift-foundation#1066](https://github.com/swiftlang/swift-foundation/pull/1066) |
| 8 | + |
| 9 | +## Introduction/Motivation |
| 10 | + |
| 11 | +In macOS 12-aligned releases, Foundation added the `AttributedString` type as a new API representing rich/attributed text. `AttributedString` itself is not a collection; instead, it offers several views into its contents, each of which is a `Collection` over a different element type. Today, `AttributedString` offers three views: the character view (`.characters`), which provides a collection of grapheme clusters using the `Character` element type; the Unicode scalar view (`.unicodeScalars`), which provides a collection of `Unicode.Scalar`s; and the attribute runs view (`.runs`), which provides a collection of the attribute runs present across the text using the `AttributedString.Runs.Run` element type. These three views form the critical APIs required to interact with an `AttributedString` via its text (either at the visual, grapheme cluster level or the underlying scalar level) and its runs. However, more advanced use cases require other ways to view an `AttributedString`'s text. |
| 12 | + |
| 13 | +When working with the text content of an `AttributedString`, it is sometimes necessary to view not only the characters or Unicode scalars, but also the underlying UTF-8 or UTF-16 code units that make up that text. This is especially useful when interoperating with other types that use UTF-8 or UTF-16 code units as their currency types (for example, `NSAttributedString` and `NSString`, which use UTF-16 offsets as their indices and UTF-16 code units as their elements). Today, `String` has UTF-8 and UTF-16 views that can be used to perform these encoding-specific operations, but `AttributedString` offers no equivalent. This proposal remedies that by adding equivalent UTF-8 and UTF-16 views to `AttributedString`, offering easy access to the encoded forms of the text. |
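To make the interoperability point concrete, the following sketch uses only the existing `String` views and `NSString` (it does not use the proposed `AttributedString` API). It shows that the number of elements differs per view and that a UTF-16 offset computed through `String.utf16` matches the location `NSString` reports. The sample text and variable names are illustrative choices, not part of the proposal.

```swift
import Foundation

// "Café 👍", written with escapes so the scalar sequence is unambiguous.
let text = "Caf\u{E9} \u{1F44D}"

// The same text has a different length in each view.
let characterCount = text.count        // 6 grapheme clusters
let utf8Count = text.utf8.count        // 10 UTF-8 code units
let utf16Count = text.utf16.count      // 7 UTF-16 code units

// NSString measures its length in UTF-16 code units.
let nsLength = NSString(string: text).length

// The UTF-16 offset of the emoji matches the location NSString reports.
let emojiIndex = text.firstIndex(of: "\u{1F44D}")!
let utf16Offset = text.utf16.distance(from: text.startIndex, to: emojiIndex)
let nsLocation = NSString(string: text).range(of: "\u{1F44D}").location

print(characterCount, utf8Count, utf16Count, nsLength, utf16Offset, nsLocation)
```

Without a UTF-16 view, performing the same offset translation on an `AttributedString` requires first converting its contents to a `String`.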
| 14 | + |
| 15 | +## Proposed solution |
| 16 | + |
| 17 | +Just like `String`, `AttributedString` will offer new, immutable UTF-8 and UTF-16 views via the `.utf8` and `.utf16` properties. Developers will be able to use these new views as in the following example: |
| 18 | + |
| 19 | +```swift |
| 20 | +var attrStr: AttributedString |
| 21 | + |
| 22 | +// Iterate over the UTF-8 code units |
| 23 | +for codeUnit in attrStr.utf8 { |
| 24 | + print(codeUnit) |
| 25 | +} |
| 26 | + |
| 27 | +// Determine the UTF-8 offset of a particular index |
| 28 | +let offset = attrStr.utf8.distance(from: attrStr.startIndex, to: someOtherIndex) |
| 29 | +``` |
| 30 | + |
| 31 | +## Detailed design |
| 32 | + |
| 33 | +We propose adding the following API surface: |
| 34 | + |
| 35 | +```swift |
| 36 | +@available(FoundationPreview 6.2, *) |
| 37 | +extension AttributedString { |
| 38 | + public struct UTF8View : BidirectionalCollection, CustomStringConvertible, Sendable { |
| 39 | + public typealias Element = UTF8.CodeUnit |
| 40 | + public typealias Index = AttributedString.Index |
| 41 | + public typealias SubSequence = AttributedString.UTF8View |
| 42 | + } |
| 43 | + |
| 44 | + public struct UTF16View : BidirectionalCollection, CustomStringConvertible, Sendable { |
| 45 | + public typealias Element = UTF16.CodeUnit |
| 46 | + public typealias Index = AttributedString.Index |
| 47 | + public typealias SubSequence = AttributedString.UTF16View |
| 48 | + } |
| 49 | + |
| 50 | + public var utf8: UTF8View { get } |
| 51 | + public var utf16: UTF16View { get } |
| 52 | +} |
| 53 | + |
| 54 | +@available(macOS 12, iOS 15, tvOS 15, watchOS 8, *) |
| 55 | +protocol AttributedStringProtocol { |
| 56 | + // ... |
| 57 | + |
| 58 | + @available(FoundationPreview 6.2, *) |
| 59 | + var utf8: AttributedString.UTF8View { get } |
| 60 | + @available(FoundationPreview 6.2, *) |
| 61 | + var utf16: AttributedString.UTF16View { get } |
| 62 | +} |
| 63 | + |
| 64 | + |
| 65 | +@available(FoundationPreview 6.2, *) |
| 66 | +extension AttributedStringProtocol { |
| 67 | + public var utf8: AttributedString.UTF8View { get } |
| 68 | + public var utf16: AttributedString.UTF16View { get } |
| 69 | +} |
| 70 | + |
| 71 | +@available(FoundationPreview 6.2, *) |
| 72 | +extension AttributedSubstring { |
| 73 | + public var utf8: AttributedString.UTF8View { get } |
| 74 | + public var utf16: AttributedString.UTF16View { get } |
| 75 | +} |
| 76 | +``` |
| 77 | + |
| 78 | +_Note: omitted here for brevity, `AttributedString.UTF8View` and `AttributedString.UTF16View` must implement all relevant optional protocol requirements from `BidirectionalCollection` to ensure efficient operations over the underlying storage._ |
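Because both views use `AttributedString.Index` as their index type, an index obtained from a view can be used to subscript the attributed string itself. The sketch below illustrates this under the assumption that it is built with a toolchain and deployment target where the proposed `utf16` view is available; the sample text, offsets, and URL are hypothetical.

```swift
import Foundation

// Assumes the proposed `utf16` view is available (FoundationPreview 6.2).
var attrStr = AttributedString("Caf\u{E9} \u{1F44D}")
let utf16 = attrStr.utf16

// Translate a UTF-16 offset and length (for example, from an NSRange)
// into AttributedString indices. The emoji starts at offset 5 and
// occupies 2 UTF-16 code units.
let start = utf16.index(utf16.startIndex, offsetBy: 5)
let end = utf16.index(start, offsetBy: 2)

// The indices are shared with the attributed string, so they can be
// used directly to set an attribute on that range.
attrStr[start..<end].link = URL(string: "https://example.com")
```

The views themselves are read-only; mutation goes through the `AttributedString`, with the views serving only to locate positions in the text.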
| 79 | + |
| 80 | +## Source compatibility |
| 81 | + |
| 82 | +All of these changes are additive and have no impact on source compatibility. The requirements added to `AttributedStringProtocol` come with default implementations and as such are not ABI/API breaking changes. |
| 83 | + |
| 84 | +## Implications on adoption |
| 85 | + |
| 86 | +These new views will be annotated with `FoundationPreview 6.2` availability. On platforms where availability is relevant, these APIs may only be used on versions where these new views are present. |
| 87 | + |
| 88 | +## Future directions |
| 89 | + |
| 90 | +No future directions are considered at this time. |
| 91 | + |
| 92 | +## Alternatives considered |
| 93 | + |
| 94 | +No alternatives are considered at this time. |