Merge pull request #1 from itingliu/pr/string-processing-pitch-update
Update pitch:
Restructure the pitch to this structure:
- Motivation for adding algorithms
- Motivation for CustomRegexComponent
- Design for added algorithms
- Design for CustomRegexComponent
Add a few doc comments.
Update Foundation example.
The standard library is currently missing a large number of `String` algorithms that do exist in Foundation. We introduce a more coherent set of `Collection` algorithms with a focus on string processing, including support for regular expressions.
## Motivation
TODO: Motivation for adding both generic `<r: RegexProtocol>` and non-generic algorithm functions.
### Use custom parsers in regex builders and `RegexProtocol` algorithms
We want to extend string processing to types from outside the standard library, so that one can incorporate custom parsers in regex builders and `RegexProtocol` algorithms seamlessly.

Consider parsing an HTTP header to capture the date field as a `Date` type:

```swift
let header = """
HTTP/1.1 301 Redirect
Date: Wed, 16 Feb 2022 23:53:19 GMT
Connection: close
Location: https://www.apple.com/
Content-Type: text/html
Content-Language: en
"""
```

You will likely match a substring that looks like a date string (`16 Feb 2022`) and parse that substring as a `Date` with one of the date parsers in the Foundation framework:

```swift
let dateParser = Date.ParseStrategy(format: "\(day: .twoDigits) \(month: .abbreviated) \(year: .padded(4))")
let date = header.firstMatch(of: regex).map(\.result.1)
// A `Date` representing 2022-02-16 00:00:00 +0000
```

You can do this because the Foundation framework's `Date.ParseStrategy` conforms to `CustomRegexComponent`. You can also conform your own parser to `CustomRegexComponent`: implement the `match` function to return the upper bound of the matched substring and an instance of the type that the matched range represents. Because `CustomRegexComponent` inherits from `RegexProtocol`, a conforming type works with all of the string algorithms that take a `RegexProtocol` value. `Date.ParseStrategy` also adds a static function `date(format:timeZone:locale:)` as a static member of `RegexProtocol`, so you can refer to it as `.date(format:...)` in the `Regex` result builder.

We have already seen that parsing a date string can be tricky, since it can contain a localized month name (`"Feb"`, as seen above). Parsing a currency string such as `$3,020.85` with a regex is not trivial either: it can contain grouping separators, a decimal separator, and a currency symbol, all of which can be localized.

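To make the localization problem concrete, here is a minimal sketch of a hand-rolled approach. `naiveAmount` is a hypothetical helper written for this illustration; it is not part of the pitch or of Foundation. It keeps ASCII digits and `.`, drops everything else, and lets `Double` parse the remainder, which only works when `.` happens to be the decimal separator.

```swift
// Hypothetical helper for illustration only: keep ASCII digits and ".",
// drop everything else, and let `Double` parse what remains.
func naiveAmount(_ text: String) -> Double? {
    let digits: ClosedRange<Character> = "0"..."9"
    let cleaned = text.filter { digits.contains($0) || $0 == "." }
    return Double(cleaned)
}

// US-style input: "," groups thousands and "." marks the decimals.
let usAmount = naiveAmount("$3,020.85")      // 3020.85

// German-style input: the separator roles are swapped, so the same logic
// silently produces a wrong value instead of failing.
let deAmount = naiveAmount("3.020,85 €")     // 3.02085
```

The second call does not return `nil`; it returns a plausible but incorrect number, which is the kind of error a locale-aware parser is meant to prevent.
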
The Foundation framework has various parsers for localized strings like these. Delegating this task to dedicated parsers alleviates the need to handle it yourself. In the second part of the pitch, we introduce the `CustomRegexComponent` protocol that conveniently lets types from outside the standard library participate in regex builders and `RegexProtocol` algorithms.
## Proposed solution
We introduce internal infrastructure that allows groups of `Collection` algorithms that perform the same operations on different types to share their implementation, leading to a more coherent set of public APIs. This allows us to more easily provide algorithms that work with `RegexProtocol` values, such as

We also introduce the `CustomRegexComponent` protocol that conveniently lets types from outside the standard library participate in regex builders and `RegexProtocol` algorithms.

If Foundation's currency parser, `Foundation.FloatingPointFormatStyle<Double>.Currency`, conformed to `CustomRegexComponent`, you would be able to retrieve the currency from the bank statement above as a list of `Double` values this way:

### `CustomRegexComponent` protocol

The `CustomRegexComponent` protocol inherits from `RegexProtocol` and satisfies its sole requirement. This enables the usage of types that conform to `CustomRegexComponent` in regex builders and `RegexProtocol` algorithms.

```swift
public protocol CustomRegexComponent: RegexProtocol {
    /// Match the input string within the specified bounds, beginning at the given index, and return
    /// the end position (upper bound) of the match and the matched instance.
    /// - Parameters:
    ///   - input: The string in which the match is performed.
    ///   - index: An index of `input` at which to begin matching.
    ///   - bounds: The bounds in `input` in which the match is performed.
    /// - Returns: The upper bound where the match terminates and a matched instance, or nil if
    ///   there isn't a match.
    func match(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) -> (upperBound: String.Index, match: Match)?
}
```

You can conform your custom parser to `CustomRegexComponent`. Conformance is simple: implement the `match` function to return the upper bound of the matched substring and an instance of the type that the matched range represents. Because `CustomRegexComponent` inherits from `RegexProtocol`, a conforming type can be used with all of the string algorithms that take a `RegexProtocol` type.

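As an illustration of such a conformance, the following is a minimal sketch under stated assumptions. The `RegexProtocol` and `CustomRegexComponent` declarations are stand-ins that mirror only what this pitch describes, so that the example compiles on its own; the real `RegexProtocol` has requirements that are not modeled here. `DecimalNumber` and the free function `firstMatch(of:in:)` are hypothetical names invented for this example, and the scanning loop is only one plausible way an algorithm could drive `match`.

```swift
// Stand-ins so the sketch compiles on its own. The names mirror the pitch,
// but the real `RegexProtocol` has requirements that are not modeled here.
protocol RegexProtocol {
    associatedtype Match
}

protocol CustomRegexComponent: RegexProtocol {
    func match(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) -> (upperBound: String.Index, match: Match)?
}

// Hypothetical parser: matches a run of ASCII digits and produces an `Int`.
struct DecimalNumber: CustomRegexComponent {
    typealias Match = Int

    func match(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) -> (upperBound: String.Index, match: Int)? {
        let digits: ClosedRange<Character> = "0"..."9"
        var end = index
        // Advance over digits without leaving the permitted bounds.
        while end < bounds.upperBound, digits.contains(input[end]) {
            input.formIndex(after: &end)
        }
        // No digits at `index` (or a value too large for `Int`) means no match.
        guard end > index, let value = Int(input[index..<end]) else { return nil }
        return (upperBound: end, match: value)
    }
}

// Hypothetical driver showing how an algorithm could use any conforming type:
// try each position in turn and return the first successful match.
func firstMatch<C: CustomRegexComponent>(of component: C, in input: String) -> C.Match? {
    let bounds = input.startIndex..<input.endIndex
    var index = bounds.lowerBound
    while index < bounds.upperBound {
        if let found = component.match(input, startingAt: index, in: bounds) {
            return found.match
        }
        input.formIndex(after: &index)
    }
    return nil
}

let text = "total: 3020 USD"
let start = text.firstIndex(of: "3")!
let direct = DecimalNumber().match(text, startingAt: start, in: text.startIndex..<text.endIndex)
let scanned = firstMatch(of: DecimalNumber(), in: text)
```

The conforming type only has to answer "does a match start at this index, and where does it end?"; searching, splitting, and similar behavior can then be layered on top by generic algorithms.
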
As an example, consider the Foundation framework's `FloatingPointFormatStyle<Double>.Currency`. It would conform to `CustomRegexComponent` by implementing the `match` function with `Match` being `Double`. It could also add a static function `localizedCurrency(code:)` as a member of `RegexProtocol`, so you can refer to it as `.localizedCurrency(code:)` in the `Regex` result builder.

Users could then specify a pattern that matches a localized currency amount such as `"$3,020.85"` simply with the following, and use it in any of the string matching algorithms introduced above.