|
5 | 5 | * Review Manager: [Ted Kremenek](https://github.com/tkremenek)
|
6 | 6 | * Status: **Active review (June 4...8)**
|
7 | 7 | * Pull Request Implementing This Proposal: https://github.com/apple/swift/pull/9806
|
| 8 | +* Previous Revision: [1](https://github.com/apple/swift-evolution/blob/72b8d90becd60b7cc7695607ae908ef251f1e966/proposals/0180-string-index-overhaul.md) |
| 9 | + |
8 | 10 |
|
9 | 11 | *During the review process, add the following fields as needed:*
|
10 | 12 |
|
@@ -139,32 +141,48 @@ let end = String.Index(encodedOffset: n)
|
139 | 141 | assert(end == String.endIndex)
|
140 | 142 | ```
|
141 | 143 |
|
142 |
| -# Comparison and Slicing Semantics |
| 144 | +# Comparison and Subscript Semantics |
143 | 145 |
|
144 | 146 | When two indices being compared correspond to positions that are valid
|
145 | 147 | in any single `String` view, comparison semantics are already fully
|
146 |
| -specified by the `Collection` requirements. Where no single `String` |
147 |
| -view contains both index values, the indices compare unequal and |
148 |
| -ordering is determined by comparison of `encodedOffsets`. These index |
149 |
| -values are not totally ordered but do satisfy strict weak ordering |
150 |
| -requirements, which is sufficient for algorithms such as `sort` to |
151 |
| -exhibit sensible behavior. We might consider loosening the specified |
152 |
| -requirements on these algorithms and on `Comparable` to support strict |
153 |
| -weak ordering, but for now we can treat such index pairs as being |
154 |
| -outside the domain of comparison, like any other indices from |
155 |
| -completely distinct collections. |
156 |
| - |
157 |
| -An index that does not fall on an exact boundary in a given `String` |
158 |
| -or `Substring` view will be “rounded down” to the nearest boundary |
159 |
| -when used on that view. So, for example, |
| 148 | +specified by the `Collection` requirements. The other cases occur |
| 149 | +when indices fall between Unicode scalar boundaries in views having |
| 150 | +distinct encodings. For example, the string `"\u{1f773}"` (“🝳”) is |
| 151 | +encoded as `0xD83D, 0xDF73` in UTF-16 and `0xF0, 0x9F, 0x9D, 0xB3` in |
| 152 | +UTF-8, and there is no obvious way to compare the second positions in |
| 153 | +each of those sequences. The proposed rule is that such indices are |
| 154 | +compared by comparing their `encodedOffset`s. Such index values are |
| 155 | +not totally ordered but do satisfy strict weak ordering requirements, |
| 156 | +which is sufficient for algorithms such as `sort` to exhibit sensible |
| 157 | +behavior. We might consider loosening the specified requirements on |
| 158 | +these algorithms and on `Comparable` to support strict weak ordering, |
| 159 | +but for now we can treat such index pairs as being formally outside |
| 160 | +the domain of comparison, like any other indices from completely |
| 161 | +distinct collections. |
| 162 | + |
| 163 | +With respect to subscripts, an index that does not fall on an exact |
| 164 | +boundary in a given `String` or `Substring` view will be treated as |
| 165 | +falling at its `encodedOffset` in the underlying code units, with the |
| 166 | +actual contents of the result being an emergent property of applying |
| 167 | +the usual Unicode rules for decoding those code units. For example, |
| 168 | +when slicing a string with an index `i` that falls between two |
| 169 | +`Character` boundaries, `i.encodedOffset` is treated as a position in |
| 170 | +the string's underlying code units, and the `Character`s of the result |
| 171 | +are determined by performing standard Unicode grapheme breaking on the |
| 172 | +resulting sequence of code units. |
160 | 173 |
|
161 | 174 | ```swift
|
162 |
| -let s = "e\u{301}galite\u{301}" // "égalité" |
163 |
| -print(s[s.unicodeScalars.indices.dropFirst().first!...]) // "égalité" |
164 |
| -print(s[..<s.unicodeScalars.indices.last!]) // "égalit" |
165 |
| -print(s[s.unicodeScalars.indices.dropFirst().first!]) // "é" |
| 175 | +let s = "e\u{301}galite\u{301}" // "égalité" |
| 176 | +let i = Array(s.unicodeScalars.indices) |
| 177 | +print(s[i[1]...]) // "◌́galité" |
| 178 | +print(s[..<i.last!]) // "égalite" |
| 179 | +print(s[i[1]]) // "◌́" |
166 | 180 | ```
|
167 | 181 |
|
| 182 | +Similarly, assignment to a slice of a string is performed by replacing |
| 183 | +the corresponding code units, and again the resulting `Character`s are |
| 184 | +determined by re-applying standard grapheme breaking rules. |
| 185 | + |
168 | 186 | Replacing the failable APIs listed [above](#motivation) that detect
|
169 | 187 | whether an index represents a valid position in a given view, and
|
170 | 188 | enhancements that explicitly round index positions to nearby boundaries
|
@@ -371,4 +389,3 @@ This proposal makes no changes to the resilience of any APIs.
|
371 | 389 | ## Alternatives considered
|
372 | 390 |
|
373 | 391 | The only alternative considered was no action.
|
374 |
| - |
|
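A note on the revised "Comparison and Subscript Semantics" text above: the two facts it relies on can be checked directly in Swift. The sketch below is illustrative only and is not part of the proposal; the constant names `alchemical`, `utf16Units`, and `utf8Units` are made up for this note. It confirms the code units quoted for `"\u{1f773}"`, and that `i[1]` in the `égalité` example is a Unicode scalar position lying inside the first `Character`. It deliberately does not assert what slicing at `i[1]` returns, since that is the behavior under review.

```swift
// Illustrative sketch; names other than `s` and `i` are hypothetical.

// One Unicode scalar, encoded as different numbers of code units per view.
let alchemical = "\u{1f773}"
let utf16Units = Array(alchemical.utf16)   // surrogate pair: two code units
let utf8Units = Array(alchemical.utf8)     // four code units

assert(alchemical.unicodeScalars.count == 1)
assert(utf16Units == [0xD83D, 0xDF73])
assert(utf8Units == [0xF0, 0x9F, 0x9D, 0xB3])

// The string from the proposal's example: "e" + U+0301 at both ends.
let s = "e\u{301}galite\u{301}"
let i = Array(s.unicodeScalars.indices)

// 9 scalars but only 7 Characters, so some scalar indices
// do not sit on Character boundaries.
assert(i.count == 9)
assert(s.count == 7)

// i[1] addresses the combining acute accent inside the first Character.
assert(s.unicodeScalars[i[1]] == "\u{301}")
```

Because the UTF-16 and UTF-8 views hold two and four code units for the same scalar, their interior positions have no shared scalar boundary to compare against, which is the case the `encodedOffset` rule is meant to cover.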