Commit 80f6c7e

- Restructure the pitch to this structure:
  - Motivation for adding algorithms
  - Motivation for `CustomRegexComponent`
  - Design for added algorithms
  - Design for `CustomRegexComponent`
- Add a few doc comments
1 parent fea786a commit 80f6c7e

1 file changed

+87
-76
lines changed

Documentation/Evolution/StringProcessingAlgorithms.md

Lines changed: 87 additions & 76 deletions
@@ -6,37 +6,12 @@ The standard library is currently missing a large number of `String` algorithms
66

77
## Motivation
88

9-
TODO
9+
TODO: Motivation for adding both generic `<R: RegexProtocol>` and non-generic algorithm functions.
1010

11-
## Proposed solution
12-
13-
We introduce internal infrastructure that allows groups of `Collection` algorithms that perform the same operations on different types to share their implementation, leading to a more coherent set of public APIs. This allows us to more easily provide algorithms that work with `RegexProtocol` values, such as
14-
15-
```swift
16-
extension BidirectionalCollection where SubSequence == Substring {
17-
public func ranges<R: RegexProtocol>(of regex: R) -> some Collection<Range<Index>>
18-
}
19-
```
2011

21-
We also introduce the `CustomRegexComponent` protocol that conveniently lets types from outside the standard library participate in regex builders and `RegexProtocol` algorithms:
12+
### Use custom parsers in regex builders and `RegexProtocol` algorithms
2213

23-
```swift
24-
public protocol CustomRegexComponent: RegexProtocol {
25-
/// Match the input string within the specified bounds, beginning at the given index, and return
26-
/// the end position (upper bound) of the match and the matched instance.
27-
/// - Parameters:
28-
/// - input: The string in which the match is performed.
29-
/// - index: An index of `input` at which to begin matching.
30-
/// - bounds: The bounds in `input` in which the match is performed.
31-
/// - Returns: The upper bound where the match terminates and a matched instance, or nil if
32-
/// there isn't a match.
33-
func match(
34-
_ input: String,
35-
startingAt index: String.Index,
36-
in bounds: Range<String.Index>
37-
) -> (upperBound: String.Index, match: Match)?
38-
}
39-
```
14+
It would be handy if you could use types from outside the standard library in regex builders and `RegexProtocol` algorithms.
4015

4116
Consider parsing an HTTP header to capture the date field as a `Date` type:
4217

@@ -52,6 +27,7 @@ Content-Language: en
5227
You are likely going to match a substring that looks like a date string (`16 Feb 2022`), and parse the substring as a `Date` with one of Foundation's date parsers:
5328

5429
```swift
30+
let dateParser = Date.ParseStrategy(format: "\(day: .twoDigits) \(month: .abbreviated) \(year: .padded(4))")
5531
let regex = Regex {
5632
capture {
5733
oneOrMore(.digit)
@@ -63,17 +39,17 @@ let regex = Regex {
6339
}
6440

6541
if let dateMatch = header.firstMatch(of: regex)?.0 {
66-
let date = try? Date(dateMatch, strategy: .fixed(format: "\(day: .twoDigits) \(month: .abbreviated) \(year: .padded(4))", timeZone: TimeZone(identifier: "GMT")!, locale: Locale(identifier: "en_US")))
42+
let date = try? Date(dateMatch, strategy: dateParser)
6743
}
6844
```
6945

7046
This works, but wouldn't it be much more approachable if you could use the date parser directly within the string match function?
7147

7248
```swift
49+
let dateParser = Date.ParseStrategy(format: "\(day: .twoDigits) \(month: .abbreviated) \(year: .padded(4))")
50+
7351
let regex = Regex {
74-
capture {
75-
.date(format: "\(day: .twoDigits) \(month: .abbreviated) \(year: .padded(4))", timeZone: TimeZone(identifier: "GMT")!, locale: Locale(identifier: "en_US"))
76-
}
52+
capture(dateParser)
7753
}
7854

7955
if let match = header.firstMatch(of: regex) {
@@ -82,32 +58,9 @@ if let match = header.firstMatch(of: regex) {
8258
}
8359
```
8460

85-
You can do this because Foundation framework's `Date.ParseStrategy` conforms to `CustomRegexComponent`, defined above. You can also conform your custom parser to `CustomRegexComponent`. Conformance is simple: implement the `match` function to return the upper bound of the matched substring, and the type represented by the matched range. It inherits from `RegexProtocol`, so you will be able to use it with all of the string algorithms that take a `RegexProtocol` type.
86-
87-
Foundation framework's `Date.ParseStrategy` conforms to `CustomRegexComponent` this way. It also adds a static function `date(format:timeZone:locale)` as a static member of `RegexProtocol`, so you can refer to it as `.date(format:...)` in the `Regex` result builder.
88-
89-
```swift
90-
extension Date.ParseStrategy : CustomRegexComponent {
91-
func match(
92-
_ input: String,
93-
startingAt index: String.Index,
94-
in bounds: Range<String.Index>
95-
) -> (upperBound: String.Index, match: Date)?
96-
}
97-
98-
extension RegexProtocol where Self == Date.ParseStrategy {
99-
public static func date(
100-
format: Date.FormatString,
101-
timeZone: TimeZone,
102-
locale: Locale? = nil
103-
) -> Self
104-
}
105-
```
106-
107-
Here's another example of how you can use `FloatingPointFormatStyle<Double>.Currency` to parse a bank statement and record all the monetary values:
61+
Here's another example of how you can use `Foundation.FloatingPointFormatStyle<Double>.Currency` to parse a bank statement and record all the monetary values:
10862

10963
```swift
110-
11164
let statement = """
11265
CREDIT 04/06/2020 Paypal transfer $4.99
11366
DSLIP 04/06/2020 REMOTE ONLINE DEPOSIT $3,020.85
@@ -118,39 +71,32 @@ DEBIT 03/24/2020 IRX tax payment ($52,249.98)
11871
"""
11972

12073
let regex = Regex {
121-
capture {
122-
.currency(code: "USD").sign(strategy: .accounting)
123-
}
74+
capture(.localizedCurrency(code: "USD").sign(strategy: .accounting))
12475
}
12576

12677
let amount = statement.matches(of: regex).map(\.1)
12778
// [4.99, 3020.85, 69.73, -38.25, -27.44, -52249.98]
12879
```
12980

130-
## Detailed design
81+
Parsing a currency string such as `$3,020.85` with a regex isn't trivial: it can contain grouping separators, a decimal separator, and a currency symbol, all of which can be localized. Delegating the parsing of such strings to a dedicated currency parser spares you from handling these details yourself.
13182

132-
### `CustomRegexComponent` protocol
83+
In the second part of the pitch, we introduce the `CustomRegexComponent` protocol that conveniently lets types from outside the standard library participate in regex builders and `RegexProtocol` algorithms.
13384

134-
The `CustomRegexComponent` protocol inherits from `RegexProtocol` and satisfies its sole requirement. This enables the usage of types that conform to `CustomRegexComponent` in regex builders and `RegexProtocol` algorithms.
85+
## Proposed solution
86+
87+
We introduce internal infrastructure that allows groups of `Collection` algorithms that perform the same operations on different types to share their implementation, leading to a more coherent set of public APIs. This allows us to more easily provide algorithms that work with `RegexProtocol` values, such as
13588

13689
```swift
137-
public protocol CustomRegexComponent: RegexProtocol {
138-
/// Match the input string within the specified bounds, beginning at the given index, and return
139-
/// the end position (upper bound) of the match and the matched instance.
140-
/// - Parameters:
141-
/// - input: The string in which the match is performed.
142-
/// - index: An index of `input` at which to begin matching.
143-
/// - bounds: The bounds in `input` in which the match is performed.
144-
/// - Returns: The upper bound where the match terminates and a matched instance, or nil if
145-
/// there isn't a match.
146-
func match(
147-
_ input: String,
148-
startingAt index: String.Index,
149-
in bounds: Range<String.Index>
150-
) -> (upperBound: String.Index, match: Match)?
90+
extension BidirectionalCollection where SubSequence == Substring {
91+
public func ranges<R: RegexProtocol>(of regex: R) -> some Collection<Range<Index>>
15192
}
15293
```
15394

95+
We also introduce the `CustomRegexComponent` protocol that conveniently lets types from outside the standard library participate in regex builders and `RegexProtocol` algorithms.
96+
97+
98+
## Detailed design
99+
154100
### Algorithms
155101

156102
The following algorithms are included in this pitch:
@@ -159,11 +105,17 @@ The following algorithms are included in this pitch:
159105

160106
```swift
161107
extension Collection where Element: Equatable {
108+
/// Returns a Boolean value indicating whether the collection contains the given sequence.
109+
/// - Parameter other: A sequence to search for within this collection.
110+
/// - Returns: `true` if the collection contains the specified sequence, otherwise `false`.
162111
public func contains<S: Sequence>(_ other: S) -> Bool
163112
where S.Element == Element
164113
}
165114

166115
extension BidirectionalCollection where SubSequence == Substring {
116+
/// Returns a Boolean value indicating whether the collection contains the given regex.
117+
/// - Parameter regex: A regex to search for within this collection.
118+
/// - Returns: `true` if the regex was found in the collection, otherwise `false`.
167119
public func contains<R: RegexProtocol>(_ regex: R) -> Bool
168120
}
169121
```
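
As a rough illustration of what the sequence-based `contains` above computes, here is a naive sketch written as a separate helper. The name `containsSubsequence` is hypothetical, the quadratic search is only for illustration, and treating an empty sequence as always contained is an assumption; the pitch does not specify any of these.

```swift
extension Collection where Element: Equatable {
    /// Hypothetical helper, not part of the pitch: naive O(n * m) search
    /// for `other` as a contiguous run of elements in this collection.
    func containsSubsequence<S: Sequence>(_ other: S) -> Bool
        where S.Element == Element
    {
        let needle = Array(other)
        // Assumption: an empty needle counts as contained.
        guard !needle.isEmpty else { return true }

        var start = startIndex
        while start != endIndex {
            // Compare the needle against the elements beginning at `start`.
            if self[start...].starts(with: needle) {
                return true
            }
            formIndex(after: &start)
        }
        return false
    }
}
```

Because the helper is constrained only on `Collection` and `Equatable` elements, the same code applies to arrays and strings alike, for example `[1, 2, 3, 4].containsSubsequence([2, 3])` or `"hello world".containsSubsequence("o w")`.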
@@ -172,6 +124,9 @@ extension BidirectionalCollection where SubSequence == Substring {
172124

173125
```swift
174126
extension BidirectionalCollection where SubSequence == Substring {
127+
/// Returns a Boolean value indicating whether the initial elements of the sequence match the specified regex.
128+
/// - Parameter regex: A regex to compare to this sequence.
129+
/// - Returns: `true` if the initial elements of the sequence match `regex`; otherwise, `false`.
175130
public func starts<R: RegexProtocol>(with regex: R) -> Bool
176131
}
177132
```
@@ -180,14 +135,21 @@ extension BidirectionalCollection where SubSequence == Substring {
180135

181136
```swift
182137
extension Collection {
138+
/// Returns a subsequence formed by removing the initial elements that satisfy the given predicate.
139+
/// - Parameter predicate: A closure that takes an element of the sequence as its argument and returns a Boolean value indicating whether the element should be removed from the collection.
140+
/// - Returns: A collection containing the elements of the receiver that are not removed by `predicate`.
183141
public func trimmingPrefix(while predicate: (Element) -> Bool) -> SubSequence
184142
}
185143

186144
extension Collection where SubSequence == Self {
145+
/// Removes the initial elements that satisfy the given predicate from the start of the sequence.
146+
/// - Parameter predicate: A closure that takes an element of the sequence as its argument and returns a Boolean value indicating whether the element should be removed from the collection.
187147
public mutating func trimPrefix(while predicate: (Element) -> Bool)
188148
}
189149

190150
extension RangeReplaceableCollection {
151+
/// Removes the initial elements that satisfy the given predicate from the start of the sequence.
152+
/// - Parameter predicate: A closure that takes an element of the sequence as its argument and returns a Boolean value indicating whether the element should be removed from the collection.
191153
public mutating func trimPrefix(while predicate: (Element) -> Bool)
192154
}
193155

@@ -342,6 +304,55 @@ extension BidirectionalCollection where SubSequence == Substring {
342304
}
343305
```
344306

307+
### `CustomRegexComponent` protocol
308+
309+
The `CustomRegexComponent` protocol inherits from `RegexProtocol` and satisfies its sole requirement. This enables the usage of types that conform to `CustomRegexComponent` in regex builders and `RegexProtocol` algorithms.
310+
311+
```swift
312+
public protocol CustomRegexComponent: RegexProtocol {
313+
/// Match the input string within the specified bounds, beginning at the given index, and return
314+
/// the end position (upper bound) of the match and the matched instance.
315+
/// - Parameters:
316+
/// - input: The string in which the match is performed.
317+
/// - index: An index of `input` at which to begin matching.
318+
/// - bounds: The bounds in `input` in which the match is performed.
319+
/// - Returns: The upper bound where the match terminates and a matched instance, or nil if
320+
/// there isn't a match.
321+
func match(
322+
_ input: String,
323+
startingAt index: String.Index,
324+
in bounds: Range<String.Index>
325+
) -> (upperBound: String.Index, match: Match)?
326+
}
327+
```
328+
329+
You can conform your custom parser to `CustomRegexComponent`. Conformance is simple: implement the `match` function to return the upper bound of the matched substring and an instance of the type that the matched range represents. Because `CustomRegexComponent` inherits from `RegexProtocol`, a conforming type can be used with all of the string algorithms that take a `RegexProtocol` type.
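
To make the shape of a conformance concrete, the sketch below defines a simplified stand-in protocol, `SketchRegexComponent`, that mirrors only the `match` requirement shown above. It does not inherit from `RegexProtocol`, so that it compiles on its own. `VersionParser` is a hypothetical parser conforming to it. Both names are illustrative assumptions and are not part of the pitch.

```swift
/// Stand-in for the pitched `CustomRegexComponent` (assumption, for illustration only).
protocol SketchRegexComponent {
    associatedtype Match
    func match(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) -> (upperBound: String.Index, match: Match)?
}

/// Hypothetical parser that matches a `major.minor` version such as `5.7`.
struct VersionParser: SketchRegexComponent {
    struct Version: Equatable {
        var major: Int
        var minor: Int
    }

    func match(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) -> (upperBound: String.Index, match: Version)? {
        var position = index

        // Consumes a run of ASCII digits starting at `position` and returns
        // its integer value, or nil if no digits were found.
        func number() -> Int? {
            let start = position
            while position < bounds.upperBound,
                  input[position].isASCII, input[position].isNumber {
                input.formIndex(after: &position)
            }
            return Int(input[start..<position])
        }

        guard let major = number(),
              position < bounds.upperBound, input[position] == "."
        else { return nil }
        input.formIndex(after: &position)  // Skip the "." separator.
        guard let minor = number() else { return nil }

        // Report where the match ended along with the parsed value.
        return (upperBound: position, match: Version(major: major, minor: minor))
    }
}
```

A caller passes the index at which to start matching and the bounds to stay within. The returned `upperBound` tells the caller where to resume, which is what would let such a parser be composed with other components.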
330+
331+
Here, we use Foundation framework's `FloatingPointFormatStyle<Double>.Currency` as an example. `FloatingPointFormatStyle<Double>.Currency` would conform to `CustomRegexComponent` by implementing the `match` function with `Match` being a `Double`. It could also add a static function `.localizedCurrency(code:)` as a member of `RegexProtocol`, so you can refer to it as `.localizedCurrency(code:)` in the `Regex` result builder.
332+
333+
```swift
334+
extension FloatingPointFormatStyle<Double>.Currency : CustomRegexComponent {
335+
func match(
336+
_ input: String,
337+
startingAt index: String.Index,
338+
in bounds: Range<String.Index>
339+
) -> (upperBound: String.Index, match: Double)?
340+
}
341+
342+
extension RegexProtocol where Self == FloatingPointFormatStyle<Double>.Currency {
343+
public static func localizedCurrency(code: Locale.Currency) -> Self
344+
}
345+
```
346+
347+
Users could specify a pattern to match a localized currency amount such as `"$3,020.85"` simply with the following, and use it in any of the string matching algorithms introduced above.
348+
349+
```swift
350+
let regex = Regex {
351+
capture(.localizedCurrency(code: "USD"))
352+
}
353+
```
354+
355+
345356
## Alternatives considered
346357

347358
### Extend `Sequence` instead of `Collection`
