
Commit 2401a58

Add a section describing 'find empty' behavior (#352)
1 parent 9ff87db commit 2401a58


1 file changed: +40 -1 lines changed


Documentation/Evolution/StringProcessingAlgorithms.md

Lines changed: 40 additions & 1 deletion
@@ -1066,11 +1066,50 @@ extension RegexComponent {
10661066
[SE-0346]: https://github.com/apple/swift-evolution/blob/main/proposals/0346-light-weight-same-type-syntax.md
10671067
[stdlib-pitch]: https://forums.swift.org/t/pitch-primary-associated-types-in-the-standard-library/56426
10681068

1069+
#### Searching for empty strings and matches
1070+
1071+
Empty matches and inputs are an important edge case for several of the algorithms proposed above. For example, what is the result of `"123".firstRange(of: /[a-z]*/)`? How do you split a collection separated by an empty collection, as in `"1234".split(separator: "")`? For the Swift standard library, this is a new consideration, as current algorithms are `Element`-based and cannot be passed an empty input.
1072+
1073+
Languages and libraries are nearly unanimous about finding the location of an empty string, with Ruby, Python, C#, Java, JavaScript, etc., finding an empty string at each index in the target. Notably, Foundation's `NSString.range(of:)` does _not_ find an empty string at all.
1074+
1075+
The methods proposed here follow the consensus behavior, which makes sense if you think of `a.firstRange(of: b)` as returning the first subrange `r` where `a[r] == b`. If a regex can match an empty substring, like `/[a-z]*/`, the behavior is the same.
1076+
1077+
```swift
1078+
let hello = "Hello"
1079+
let emptyRange = hello.firstRange(of: "")
1080+
// emptyRange is equivalent to '0..<0' (integer ranges shown for readability)
1081+
```
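
As a hedged illustration of the "first subrange `r` where `a[r] == b`" reading, the sketch below brute-forces that definition. The helper `firstRangeByDefinition(of:in:)` is hypothetical and is not the proposed implementation; it only shows why an empty needle is found at the start index.

```swift
// Hypothetical helper: a brute-force reading of the definition, not the
// proposal's actual (far more efficient) implementation.
func firstRangeByDefinition(of b: String, in a: String) -> Range<String.Index>? {
    // Every position a range bound may take, including endIndex.
    let positions = Array(a.indices) + [a.endIndex]
    for (k, lower) in positions.enumerated() {
        for upper in positions[k...] {
            // The first subrange whose contents equal `b` wins.
            if a[lower..<upper] == b {
                return lower..<upper
            }
        }
    }
    return nil
}

let greeting = "Hello"
let found = firstRangeByDefinition(of: "", in: greeting)
// `found` is the empty range at `greeting.startIndex`.
```

Because the very first candidate subrange is the empty one at `startIndex`, and the empty substring equals the empty needle, the search succeeds immediately.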
1082+
1083+
Because searching again at the same index would yield that same empty string, we advance one position after finding an empty string or matching an empty pattern when finding all ranges. This yields an empty range at every valid index in the string, including `endIndex`.
1084+
1085+
```swift
1086+
let allRanges = hello.ranges(of: "")
1087+
// allRanges is equivalent to '[0..<0, 1..<1, 2..<2, 3..<3, 4..<4, 5..<5]'
1088+
```
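
The "advance one position after an empty match" rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposal's implementation: `naiveRanges(of:in:)` is a hypothetical name, it handles only string needles (no regexes), and it uses the standard library's `starts(with:)`, which returns `true` for an empty prefix.

```swift
// Hypothetical helper illustrating the advance-after-empty-match rule.
func naiveRanges(of needle: String, in haystack: String) -> [Range<String.Index>] {
    var result: [Range<String.Index>] = []
    var i = haystack.startIndex
    while true {
        if haystack[i...].starts(with: needle) {
            let end = haystack.index(i, offsetBy: needle.count)
            result.append(i..<end)
            if end > i {
                // Non-empty match: resume searching right after it.
                i = end
                continue
            }
            // Empty match: fall through and step forward one position,
            // otherwise the same empty range would be found forever.
        }
        if i == haystack.endIndex { break }
        haystack.formIndex(after: &i)
    }
    return result
}

let word = "Hello"
let offsets = naiveRanges(of: "", in: word).map {
    word.distance(from: word.startIndex, to: $0.lowerBound)
}
// offsets == [0, 1, 2, 3, 4, 5]
```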
1089+
1090+
Splitting with an empty separator (or a pattern that matches an empty string) uses this same behavior, resulting in a collection of single-element substrings. Interestingly, a couple of languages make different choices here. C# returns the original string instead of its parts, and Python rejects an empty separator (though it permits regexes that match empty strings).
1091+
1092+
```swift
1093+
let parts = hello.split(separator: "")
1094+
// parts == ["H", "e", "l", "l", "o"]
1095+
1096+
let moreParts = hello.split(separator: "", omittingEmptySubsequences: false)
1097+
// moreParts == ["", "H", "e", "l", "l", "o", ""]
1098+
```
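
To show how the split results above follow from the empty ranges found at every index, here is a hedged sketch. `naiveSplit(_:separator:omittingEmptySubsequences:)` is a hypothetical helper written for illustration only; it supports string separators and is not the proposed implementation.

```swift
// Hypothetical helper: cut the string at every separator match, where an
// empty separator matches at each index (including endIndex).
func naiveSplit(
    _ s: String,
    separator: String,
    omittingEmptySubsequences omit: Bool = true
) -> [Substring] {
    var parts: [Substring] = []
    var pieceStart = s.startIndex
    var i = s.startIndex
    while true {
        if s[i...].starts(with: separator) {
            let end = s.index(i, offsetBy: separator.count)
            let piece = s[pieceStart..<i]
            if !(omit && piece.isEmpty) { parts.append(piece) }
            pieceStart = end
            if end > i {
                // Non-empty separator: continue after it.
                i = end
                continue
            }
        }
        if i == s.endIndex { break }
        s.formIndex(after: &i)
    }
    // Whatever follows the last separator match is the final piece.
    let last = s[pieceStart...]
    if !(omit && last.isEmpty) { parts.append(last) }
    return parts
}

let letters = naiveSplit("Hello", separator: "")
// letters == ["H", "e", "l", "l", "o"]
let padded = naiveSplit("Hello", separator: "", omittingEmptySubsequences: false)
// padded == ["", "H", "e", "l", "l", "o", ""]
```

The leading and trailing empty pieces in `padded` come from the empty matches at `startIndex` and `endIndex`; they are dropped when empty subsequences are omitted.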
1099+
1100+
Finally, searching for an empty string within an empty string yields, as you might imagine, the empty string:
1101+
1102+
```swift
1103+
let empty = ""
1104+
let range = empty.firstRange(of: empty)
1105+
// range is non-nil, and empty[range!] == empty
1106+
```
1107+
10691108
## Alternatives considered
10701109
10711110
### Extend `Sequence` instead of `Collection`
10721111
1073-
Most of the proposed algorithms are necessarily on `Collection` due to the use of indices or mutation. `Sequence` does not support multi-pass iteration, so even `trimPrefix` would problematic on `Sequence` because it needs to look 1 `Element` ahead to know when to stop trimming.
1112+
Most of the proposed algorithms are necessarily on `Collection` due to the use of indices or mutation. `Sequence` does not support multi-pass iteration, so even `trimmingPrefix` would be problematic on `Sequence` because it needs to look one `Element` ahead to know when to stop trimming and would need to return a wrapper for the in-progress iterator instead of a subsequence.
10741113
10751114
### Cross-proposal API naming consistency
10761115
