Commit 0d163c9

Update SE-0180 per core team discussion
Dave Abrahams authored
1 parent 830d6dd commit 0d163c9

1 file changed: +37 -20 lines changed

proposals/0180-string-index-overhaul.md

Lines changed: 37 additions & 20 deletions
@@ -5,6 +5,8 @@
 * Review Manager: [Ted Kremenek](https://github.com/tkremenek)
 * Status: **Active review (June 4...8)**
 * Pull Request Implementing This Proposal: https://github.com/apple/swift/pull/9806
+* Previous Revision: [1](https://github.com/apple/swift-evolution/blob/72b8d90becd60b7cc7695607ae908ef251f1e966/proposals/0180-string-index-overhaul.md)
+
 
 *During the review process, add the following fields as needed:*
 
@@ -139,32 +141,48 @@ let end = String.Index(encodedOffset: n)
 assert(end == String.endIndex)
 ```
 
-# Comparison and Slicing Semantics
+# Comparison and Subscript Semantics
 
 When two indices being compared correspond to positions that are valid
 in any single `String` view, comparison semantics are already fully
-specified by the `Collection` requirements. Where no single `String`
-view contains both index values, the indices compare unequal and
-ordering is determined by comparison of `encodedOffsets`. These index
-values are not totally ordered but do satisfy strict weak ordering
-requirements, which is sufficient for algorithms such as `sort` to
-exhibit sensible behavior. We might consider loosening the specified
-requirements on these algorithms and on `Comparable` to support strict
-weak ordering, but for now we can treat such index pairs as being
-outside the domain of comparison, like any other indices from
-completely distinct collections.
-
-An index that does not fall on an exact boundary in a given `String`
-or `Substring` view will be “rounded down” to the nearest boundary
-when used on that view. So, for example,
+specified by the `Collection` requirements. The other cases occur
+when indices fall between Unicode scalar boundaries in views having
+distinct encodings. For example, the string `"\u{1f773}"` (“🝳”) is
+encoded as `0xD83D, 0xDF73` in UTF-16 and `0xF0, 0x9F, 0x9D, 0xB3` in
+UTF-8, and there is no obvious way to compare the second positions in
+each of those sequences. The proposed rule is that such indices are
+compared by comparing their `encodedOffset`s. Such index values are
+not totally ordered but do satisfy strict weak ordering requirements,
+which is sufficient for algorithms such as `sort` to exhibit sensible
+behavior. We might consider loosening the specified requirements on
+these algorithms and on `Comparable` to support strict weak ordering,
+but for now we can treat such index pairs as being formally outside
+the domain of comparison, like any other indices from completely
+distinct collections.
+
+With respect to subscripts, an index that does not fall on an exact
+boundary in a given `String` or `Substring` view will be treated as
+falling at its `encodedOffset` in the underlying code units, with the
+actual contents of the result being an emergent property of applying
+the usual Unicode rules for decoding those code units. For example,
+when slicing a string with an index `i` that falls between two
+`Character` boundaries, `i.encodedOffset` is treated as a position in
+the string's underlying code units, and the `Character`s of the result
+are determined by performing standard Unicode grapheme breaking on the
+resulting sequence of code units.
 
 ```swift
-let s = "e\u{301}galite\u{301}" // "égalité"
-print(s[s.unicodeScalars.indices.dropFirst().first!...]) // "égalité"
-print(s[..<s.unicodeScalars.indices.last!]) // "égalit"
-print(s[s.unicodeScalars.indices.dropFirst().first!]) // "é"
+let s = "e\u{301}galite\u{301}"           // "égalité"
+let i = Array(s.unicodeScalars.indices)
+print(s[i[1]...])                         // "◌́galité"
+print(s[..<i.last!])                      // "égalite"
+print(s[i[1]])                            // "◌́"
 ```
 
+Similarly, assignment to a slice of a string is performed by replacing
+the corresponding code units, and again the resulting `Character`s are
+determined by re-applying standard grapheme breaking rules.
+
 Replacing the failable APIs listed [above](#motivation) that detect
 whether an index represents a valid position in a given view, and
 enhancement that explicitly round index positions to nearby boundaries
@@ -371,4 +389,3 @@ This proposal makes no changes to the resilience of any APIs.
 ## Alternatives considered
 
 The only alternative considered was no action.
-
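
Note on the revised subscript semantics: the sketch below is not part of the commit. It models the rule in the new text, where an index that falls inside a `Character` is treated as an offset into the underlying code units and the result is whatever those code units decode to. It works on raw UTF-16 code unit arrays rather than calling the `String.Index` APIs under review, so UTF-16 as the underlying encoding and the names `units`, `tail`, `head`, and `alchemical` are illustrative assumptions.

```swift
// Illustrative sketch only: models the proposed rule on raw code units,
// assuming UTF-16 storage. It does not exercise the String.Index API.
let s = "e\u{301}galite\u{301}"            // "égalité"
let units = Array(s.utf16)                 // every scalar here is one UTF-16 code unit

// Offset 1 falls inside the first Character ("e" followed by U+0301).
// Decoding from there yields a string that starts with the bare combining mark.
let tail = String(decoding: units[1...], as: UTF16.self)

// Offset 8 falls inside the last Character; decoding up to it drops the final accent.
let head = String(decoding: units[..<8], as: UTF16.self)

print(units.count)                                // 9
print(tail.unicodeScalars.first! == "\u{301}")    // true
print(head == "e\u{301}galite")                   // true

// The two encodings of U+1F773 quoted in the proposal text.
let alchemical = "\u{1f773}"
print(Array(alchemical.utf16) == [0xD83D, 0xDF73])            // true
print(Array(alchemical.utf8) == [0xF0, 0x9F, 0x9D, 0xB3])     // true
```

Because the second position in each encoding of U+1F773 lands mid-scalar, neither view gives it a natural meaning in the other, which is why the proposal falls back to comparing `encodedOffset`s for such index pairs.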
