@@ -27,7 +27,7 @@ work that could be done in the Swift 4 timeframe.
 ### Ergonomics

 It's worth noting that ergonomics and correctness are mutually-reinforcing. An
-API that is easy to use—but incorrectly—cannot be considered an ergonomic
+API that is easy to use--but incorrectly--cannot be considered an ergonomic
 success. Conversely, an API that's simply hard to use is also hard to use
 correctly. Achieving optimal performance without compromising ergonomics or
 correctness is a greater challenge.
@@ -46,13 +46,13 @@ its overall complexity.

 **Method Arity** | **Standard Library** | **Foundation**
 ---|:---:|:---:
-0: `ƒ()` | 5 | 7
-1: `ƒ(:)` | 19 | 48
-2: `ƒ(::)` | 13 | 19
-3: `ƒ(:::)` | 5 | 11
-4: `ƒ(::::)` | 1 | 7
-5: `ƒ(:::::)` | - | 2
-6: `ƒ(::::::)` | - | 1
+0: `f()` | 5 | 7
+1: `f(:)` | 19 | 48
+2: `f(::)` | 13 | 19
+3: `f(:::)` | 5 | 11
+4: `f(::::)` | 1 | 7
+5: `f(:::::)` | - | 2
+6: `f(::::::)` | - | 1

 **API Kind** | **Standard Library** | **Foundation**
 ---|:---:|:---:
@@ -100,8 +100,8 @@ constitutes correct behavior in an extremely complex domain, so
 Unicode-correctness is, and will remain, a fundamental design principle behind
 Swift's `String`. That said, the Unicode standard is an evolving document, so
 this objective reference-point is not fixed.<sup id="a1">[1](#f1)</sup> While
-many of the most important operations—e.g. string hashing, equality, and
-non-localized comparison—[will be stable](#collation-semantics), the semantics
+many of the most important operations--e.g. string hashing, equality, and
+non-localized comparison--[will be stable](#collation-semantics), the semantics
 of others, such as grapheme breaking and localized comparison and case
 conversion, are expected to change as platforms are updated, so programs should
 be written so their correctness does not depend on precise stability of these
@@ -188,7 +188,7 @@ pattern matching | locale, case/diacritic/width-insensitivity
 The defaults for case-, diacritic-, and width-insensitivity are sometimes different for
 localized operations than for non-localized operations, so for example a
 localized search should be case-insensitive by default, and a non-localized search
-should be case-sensitive by default. We propose a standard “language” of
+should be case-sensitive by default. We propose a standard "language" of
 defaulted parameters to be used for these purposes, with usage roughly like this:

 ```swift
@@ -229,13 +229,13 @@ extension Unicode {

 #### Collation Semantics

-What Unicode says about collation—which is used in `<`, `==`, and hashing—turns
+What Unicode says about collation--which is used in `<`, `==`, and hashing--turns
 out to be quite interesting, once you pick it apart. The full Unicode Collation
 Algorithm (UCA) works like this:

 1. Fully normalize both strings.
 2. Convert each string to a sequence of numeric triples to form a collation key.
-3. “Flatten” the key by concatenating the sequence of first elements to the
+3. "Flatten" the key by concatenating the sequence of first elements to the
    sequence of second elements to the sequence of third elements.
 4. Lexicographically compare the flattened keys.

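To make steps 2-4 concrete, here is a minimal, self-contained Swift sketch. The weights below are invented for illustration only (a real implementation reads primary/secondary/tertiary weights from Unicode's DUCET data), and `collationKey`/`collates` are hypothetical helpers, not proposed API:

```swift
// Toy illustration of UCA steps 2-4 with made-up weights; real implementations
// look up (primary, secondary, tertiary) triples in the Unicode DUCET table.
let toyWeights: [Unicode.Scalar: (UInt16, UInt16, UInt16)] = [
    "a": (0x1000, 0x20, 0x02),
    "A": (0x1000, 0x20, 0x08),   // same primary/secondary as "a", differs at tertiary
    "b": (0x1001, 0x20, 0x02),
]

// Step 2 + step 3: build the triples, then flatten primaries ++ secondaries ++ tertiaries.
func collationKey(_ s: String) -> [UInt16] {
    let triples = s.unicodeScalars.compactMap { toyWeights[$0] }
    return triples.map { $0.0 } + triples.map { $0.1 } + triples.map { $0.2 }
}

// Step 4: lexicographic comparison of the flattened keys.
func collates(_ lhs: String, before rhs: String) -> Bool {
    collationKey(lhs).lexicographicallyPrecedes(collationKey(rhs))
}

print(collates("ab", before: "b"))   // true: "a" has a smaller primary weight than "b"
print(collates("a", before: "A"))    // true: keys are equal until the tertiary level
```

Step 1 (normalization) is omitted from the toy; it matters for the equality discussion that follows.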
@@ -249,7 +249,7 @@
 *However*, there are some bright spots to this story. First, as it turns out,
 string sorting (localized or not) should be done down to what's called
 the
-[“identical” level](http://unicode.org/reports/tr10/#Multi_Level_Comparison),
+["identical" level](http://unicode.org/reports/tr10/#Multi_Level_Comparison),
 which adds a step 3a: append the string's normalized form to the flattened
 collation key. At first blush this just adds work, but consider what it does
 for equality: two strings that normalize the same, naturally, will collate the
@@ -261,7 +261,7 @@ entirely skip the expensive part of collation for equality comparison.
 Next, naturally, anything that applies to equality also applies to hashing: it
 is sufficient to hash the string's normalized form, bypassing collation keys.
 This should provide significant speedups over the current implementation.
-Perhaps more importantly, since comparison down to the “identical” level applies
+Perhaps more importantly, since comparison down to the "identical" level applies
 even to localized strings, it means that hashing and equality can be implemented
 exactly the same way for localized and non-localized text, and hash tables with
 localized keys will remain valid across current-locale changes.
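As a minimal runnable sketch of this idea (the wrapper type and its name are illustrative, not part of the proposal; it uses Foundation's canonical decomposition as the normalization step):

```swift
import Foundation

// "é" as a single precomposed scalar vs. "e" + U+0301: canonically equivalent,
// so the two strings must be equal and must hash identically.
let precomposed = "caf\u{E9}"
let decomposed  = "cafe\u{301}"

// Equality and hashing down to the "identical" level only need the normalized
// form; no collation key is required.
struct NormalizedStringKey: Hashable {
    let normalized: String
    init(_ s: String) { normalized = s.decomposedStringWithCanonicalMapping }  // NFD
}

print(NormalizedStringKey(precomposed) == NormalizedStringKey(decomposed))   // true
print(NormalizedStringKey(precomposed).hashValue ==
      NormalizedStringKey(decomposed).hashValue)                             // true
```

Swift's `String` equality already compares canonical equivalence; the sketch only makes the normalization step explicit.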
@@ -279,14 +279,14 @@ implementation has apparently been very well optimized.

 Following this scheme everywhere would also allow us to make sorting behavior
 consistent across platforms. Currently, we sort `String` according to the UCA,
-except that—*only on Apple platforms*—pairs of ASCII characters are ordered by
+except that--*only on Apple platforms*--pairs of ASCII characters are ordered by
 unicode scalar value.

 #### Syntax

 Because the current `Comparable` protocol expresses all comparisons with binary
-operators, string comparisons—which may require
-additional [options](#operations-with-options)—do not fit smoothly into the
+operators, string comparisons--which may require
+additional [options](#operations-with-options)--do not fit smoothly into the
 existing syntax. At the same time, we'd like to solve other problems with
 comparison, as outlined
 in
@@ -342,17 +342,17 @@ strings.
 This quirk aside, every aspect of strings-as-collections-of-graphemes appears to
 comport perfectly with Unicode. We think the concatenation problem is tolerable,
 because the cases where it occurs all represent partially-formed constructs. The
-largest class—isolated combining characters such as ◌́ (U+0301 COMBINING ACUTE
-ACCENT)—are explicitly called out in the Unicode standard as
-“[degenerate](http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries)” or
-“[defective](http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf)”. The other
-cases—such as a string ending in a zero-width joiner or half of a regional
-indicator—appear to be equally transient and unlikely outside of a text editor.
+largest class--isolated combining characters such as ◌́ (U+0301 COMBINING ACUTE
+ACCENT)--are explicitly called out in the Unicode standard as
+"[degenerate](http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries)" or
+"[defective](http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf)". The other
+cases--such as a string ending in a zero-width joiner or half of a regional
+indicator--appear to be equally transient and unlikely outside of a text editor.

 Admitting these cases encourages exploration of grapheme composition and is
-consistent with what appears to be an overall Unicode philosophy that “no
-special provisions are made to get marginally better behavior for… cases that
-never occur in practice.”<sup id="a2">[2](#f2)</sup> Furthermore, it seems
+consistent with what appears to be an overall Unicode philosophy that "no
+special provisions are made to get marginally better behavior for... cases that
+never occur in practice."<sup id="a2">[2](#f2)</sup> Furthermore, it seems
 unlikely to disturb the semantics of any plausible algorithms. We can handle
 these cases by documenting them, explicitly stating that the elements of a
 `String` are an emergent property based on Unicode rules.
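The concatenation quirk and the degenerate isolated-combining-character case are easy to observe with today's `String`; a small runnable illustration:

```swift
let base = "cafe"                 // 4 Characters, ends in "e"
let accent = "\u{301}"            // isolated U+0301 COMBINING ACUTE ACCENT: a
                                  // "degenerate" one-Character string on its own
print(base.count, accent.count)   // 4 1

let joined = base + accent
print(joined)                     // café
print(joined.count)               // 4, not 5: the accent merged into the final "e",
                                  // so count is not additive under concatenation
```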
@@ -402,7 +402,7 @@ The benefits of restoring `Collection` conformance are substantial:
 Because of its collection-like behavior, users naturally think of `String`
 in collection terms, but run into frustrating limitations where it fails to
 conform and are left to wonder where all the differences lie. Many simply
-“correct” this limitation by declaring a trivial conformance:
+"correct" this limitation by declaring a trivial conformance:

 ```swift
 extension String : BidirectionalCollection {}
@@ -569,8 +569,8 @@ property) explicitly of type `String`, a type conversion will be performed, and
 at this point the substring buffer is copied and the original string's storage
 can be released.

-A `String` that was not its own `Substring` could be one word—a single tagged
-pointer—without requiring additional allocations. `Substring`s would be a view
+A `String` that was not its own `Substring` could be one word--a single tagged
+pointer--without requiring additional allocations. `Substring`s would be a view
 onto a `String`, so are 3 words - pointer to owner, pointer to start, and a
 length. The small string optimization for `Substring` would take advantage of
 the larger size, probably with a less compressed encoding for speed.
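For concreteness, here is an illustrative sketch of the word counts described above, assuming a 64-bit platform; these struct shapes are invented for illustration and are not the actual `String`/`Substring` implementation:

```swift
// Illustrative shapes only -- not the real standard-library layouts.
struct SketchString {            // "one word": a single tagged pointer
    var taggedStorage: UnsafeRawPointer
}

struct SketchSubstring {         // "three words": owner, start, length
    var owner: UnsafeRawPointer
    var start: UnsafeRawPointer
    var length: Int
}

print(MemoryLayout<SketchString>.size)     // 8  on a 64-bit platform
print(MemoryLayout<SketchSubstring>.size)  // 24 on a 64-bit platform
```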
@@ -596,7 +596,7 @@ standard library will traffic in generic models of

 In this model, **if a user is unsure about which type to use, `String` is always
 a reasonable default**. A `Substring` passed where `String` is expected will be
-implicitly copied. When compared to the “same type, copied storage” model, we
+implicitly copied. When compared to the "same type, copied storage" model, we
 have effectively deferred the cost of copying from the point where a substring
 is created until it must be converted to `String` for use with an API.

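A small sketch of this deferred-copy model using today's explicit `Substring`-to-`String` conversion (the manifesto proposes making that conversion implicit):

```swift
func takesString(_ s: String) -> Int { s.count }

let line = "key=value"
let value = line.dropFirst(4)       // Substring: shares `line`'s storage, no copy yet

// The copy is deferred until a String is actually required by an API:
let n = takesString(String(value))  // conversion point -- storage is copied here
print(n)                            // 5
```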
@@ -605,11 +605,11 @@ if for performance reasons you are tempted to add a `Range` argument to your
 method as well as a `String` to avoid unnecessary copies, you should instead
 use `Substring`.

-##### The “Empty Subscript”
+##### The "Empty Subscript"

 To make it easy to call such an optimized API when you only have a `String` (or
 to call any API that takes a `Collection`'s `SubSequence` when all you have is
-the `Collection`), we propose the following “empty subscript” operation,
+the `Collection`), we propose the following "empty subscript" operation,

 ```swift
 extension Collection {
@@ -638,7 +638,7 @@ takesAnArrayOfSubstring(arrayOfString.map { $0[] })

 As we have seen, all three options above have downsides, but it's possible
 these downsides could be eliminated/mitigated by the compiler. We are proposing
-one such mitigation—implicit conversion—as part of the the "different type,
+one such mitigation--implicit conversion--as part of the the "different type,
 shared storage" option, to help avoid the cognitive load on developers of
 having to deal with a separate `Substring` type.

@@ -743,7 +743,7 @@ let iToJ = Range(nsr, in: s) // Equivalent to i..<j
 With `Substring` and `String` being distinct types and sharing almost all
 interface and semantics, and with the highest-performance string processing
 requiring knowledge of encoding and layout that the currency types can't
-provide, it becomes important to capture the common “string API” in a protocol.
+provide, it becomes important to capture the common "string API" in a protocol.
 Since Unicode conformance is a key feature of string processing in Swift, we
 call that protocol `Unicode`:

@@ -812,8 +812,8 @@ protocols in protocols.
 #### Low-Level Textual Analysis

 We should provide convenient APIs for processing strings by character. For example,
-it should be easy to cleanly express, “if this string starts with `"f"`, process
-the rest of the string as follows…” Swift is well-suited to expressing this
+it should be easy to cleanly express, "if this string starts with `"f"`, process
+the rest of the string as follows..." Swift is well-suited to expressing this
 common pattern beautifully, but we need to add the APIs. Here are two examples
 of the sort of code that might be possible given such APIs:

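The manifesto's own examples rely on proposed APIs and are not shown in this hunk; as a rough sketch, the same "starts with `"f"`, then process the rest" pattern can be expressed with today's standard library like this:

```swift
func process(_ rest: Substring) {
    print("rest:", rest)
}

let input = "fandango"
if input.first == "f" {             // character-level test of the leading grapheme
    process(input.dropFirst())      // hand the remainder off without copying
}
```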
@@ -886,8 +886,8 @@ compile-time syntax checking and optimization.

 ### String Indices

-`String` currently has four views—`characters`, `unicodeScalars`, `utf8`, and
-`utf16`—each with its own opaque index type. The APIs used to translate indices
+`String` currently has four views--`characters`, `unicodeScalars`, `utf8`, and
+`utf16`--each with its own opaque index type. The APIs used to translate indices
 between views add needless complexity, and the opacity of indices makes them
 difficult to serialize.

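For reference, translating an index between views with the existing API looks like this (these are today's methods, e.g. `samePosition(in:)`, not new proposals):

```swift
let s = "café"
let i = s.firstIndex(of: "é")!                 // index in the Character view
let u8 = i.samePosition(in: s.utf8)!           // translate into the UTF-8 view
let u16 = i.samePosition(in: s.utf16)!         // ...and into the UTF-16 view

print(s.utf8.distance(from: s.utf8.startIndex, to: u8))    // 3: "caf" is 3 UTF-8 bytes
print(s.utf16.distance(from: s.utf16.startIndex, to: u16)) // 3: ...and 3 UTF-16 units
```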
@@ -924,7 +924,7 @@ let i = String.Index(codeUnitOffset: offset)
 Index interchange between `String` and its `unicodeScalars`, `codeUnits`,
 and [`extendedASCII`](#parsing-ascii-structure) views can be made entirely
 seamless by having them share an index type (semantics of indexing a `String`
-between grapheme cluster boundaries are TBD—it can either trap or be forgiving).
+between grapheme cluster boundaries are TBD--it can either trap or be forgiving).
 Having a common index allows easy traversal into the interior of graphemes,
 something that is often needed, without making it likely that someone will do it
 by accident.
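A short illustration of such an interior index using the shared index type that eventually shipped; whether indexing the `String` itself at that position traps or is forgiving is exactly the open question noted above:

```swift
let flag = "🇺🇸!"                 // "🇺🇸" is one Character made of two Unicode scalars
print(flag.count)                  // 2 Characters
print(flag.unicodeScalars.count)   // 3 scalars

// A valid unicodeScalars index that points into the interior of the flag grapheme:
let inner = flag.unicodeScalars.index(after: flag.unicodeScalars.startIndex)
print(flag.unicodeScalars[inner])  // 🇸 (U+1F1F8 REGIONAL INDICATOR SYMBOL LETTER S)
```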
@@ -1146,7 +1146,7 @@ on the top-level Swift namespace.

 - The ability to handle `UTF-8`-encoded strings (models of `Unicode`) is not in
   question here; this is about what encodings must be storable, without
-  transcoding, in the common currency type called “`String`”.
+  transcoding, in the common currency type called "`String`".
 - ASCII, Latin-1, UCS-2, and UTF-16 are UTF-16 subsets. UTF-8 is not.
 - If we have a way to get at a `String`'s code units, we need a concrete type in
   which to express them in the API of `String`, which is a concrete type
@@ -1161,10 +1161,10 @@ on the top-level Swift namespace.
 ### Do we need a type-erasable base protocol for UnicodeEncoding?

 UnicodeEncoding has an associated type, but it may be important to be able to
-traffic in completely dynamic encoding values, e.g. for “tell me the most
-efficient encoding for this string.”
+traffic in completely dynamic encoding values, e.g. for "tell me the most
+efficient encoding for this string."

-### Should there be a string “facade?”
+### Should there be a string "facade?"

 One possible design alternative makes `Unicode` a vehicle for expressing
 the storage and encoding of code units, but does not attempt to give it an API
@@ -1204,11 +1204,11 @@ struct String<U: Unicode = StringStorage>
 typealias Substring = String<StringStorage.SubSequence>
 ```

-One advantage of such a design is that naïve users will always extend “the right
-type” (`String`) without thinking, and the new APIs will show up on `Substring`,
+One advantage of such a design is that naïve users will always extend "the right
+type" (`String`) without thinking, and the new APIs will show up on `Substring`,
 `MyUTF8String`, etc. That said, it also has downsides that should not be
 overlooked, not least of which is the confusability of the meaning of the word
-“string.” Is it referring to the generic or the concrete type?
+"string." Is it referring to the generic or the concrete type?

 ### `TextOutputStream` and `TextOutputStreamable`

@@ -1268,8 +1268,8 @@ little benefit. [↩](#a2)

 <b id="f5">5</b> The queries supported by `NSCharacterSet` map directly onto
 properties in a table that's indexed by unicode scalar value. This table is
-part of the Unicode standard. Some of these queries (e.g., “is this an
-uppercase character?”) may have fairly obvious generalizations to grapheme
+part of the Unicode standard. Some of these queries (e.g., "is this an
+uppercase character?") may have fairly obvious generalizations to grapheme
 clusters, but exactly how to do it is a research topic and *ideally* we'd either
 establish the existing practice that the Unicode committee would standardize, or
 the Unicode committee would do the research and we'd implement their