Hello, we want to issue an update to [Regular Expression Literals](https://forums.swift.org/t/pitch-regular-expression-literals/52820) and prepare for a formal proposal. The great delimiter deliberation continues to unfold, so in the meantime, we have a significant amount of surface area to present for review/feedback: the syntax _inside_ a regex literal. Additionally, this is the syntax accepted from a string used for run-time regex construction, so we're devoting an entire pitch/proposal to the topic of _regex syntax_, distinct from the result builder DSL or the choice of delimiters for literals.
@@ -16,21 +16,50 @@ The overall story is laid out in [Regex Type and Overview](https://github.com/ap
Swift aims to be a pragmatic programming language, striking a balance between familiarity, interoperability, and advancing the art. Swift's `String` presents a uniquely Unicode-forward model of string, but currently suffers from limited processing facilities.

-<!--
-... tools need run time construction
-... ns regular expression operates over a fundamentally different model and has limited syntactic and semantic support
-... we prpose a best-in-class treatment of familiar regex syntax
--->
+`NSRegularExpression` can construct a processing pipeline from a string containing [ICU regular expression syntax][icu-syntax]. However, it is inherently tied to ICU's engine and thus it operates over a fundamentally different model of string than Swift's `String`. It is also limited in features and carries a fair amount of Objective-C baggage, such as the need to translate between `NSRange` and `Range`.
+
+```swift
+let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"#
+let nsRegEx = try! NSRegularExpression(pattern: pattern)
+
+func processEntry(_ line: String) -> Transaction? {
+  let range = NSRange(line.startIndex..<line.endIndex, in: line)
+  guard let result = nsRegEx.firstMatch(in: line, range: range),
+        let kindRange = Range(result.range(at: 1), in: line),
+        let kind = Transaction.Kind(line[kindRange]),
+        let dateRange = Range(result.range(at: 2), in: line),
+        let date = try? Date(String(line[dateRange]), strategy: dateParser),
+        let accountRange = Range(result.range(at: 3), in: line),
+        let amountRange = Range(result.range(at: 4), in: line),
Fixing these fundamental limitations requires migrating to a completely different engine and type system representation. This is the path we're proposing with `Regex`, outlined in [Regex Type and Overview][overview]. Details on the semantic differences between ICU's string model and Swift's `String` are discussed in [Unicode for String Processing][pitches].

The full string processing effort includes a regex type with strongly typed captures, the ability to create a regex from a string at runtime, a compile-time literal, a result builder DSL, protocols for intermixing 3rd party industrial-strength parsers with regex declarations, and a slew of regex-powered algorithms over strings.

This proposal specifically homes in on the _familiarity_ aspect by providing a best-in-class treatment of familiar regex syntax.

## Proposed Solution

-<!--
-... regex compiling and existential match type
--->
+We propose run-time construction of `Regex` from a best-in-class treatment of familiar regular expression syntax. A `Regex` is generic over its `Output`, which includes capture information. This may be an existential `AnyRegexOutput`, or a concrete type provided by the user.
+
+```swift
+let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"#
+let regex = try! Regex(compiling: pattern)
+// regex: Regex<AnyRegexOutput>
+
+let regex: Regex<(Substring, Substring, Substring, Substring, Substring)> =
+  try! Regex(compiling: pattern)
+```

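As a rough illustration of the two forms above, the following sketch compiles the same pattern at run time and matches a single line, once with the existential output and once with a caller-supplied tuple type. It is written against the `Regex` API as it later shipped in Swift 5.7 (`Regex(_:)` and `Regex(_:as:)`), which is an assumption on our part: this pitch spells the initializer `Regex(compiling:)`. The sample `line` and all variable names are made up for the example.

```swift
// A sketch, assuming Swift 5.7+ (macOS 13+), where the initializers are
// spelled `Regex(_:)` / `Regex(_:as:)` rather than the pitch's `Regex(compiling:)`.
let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"#
let line = "DEBIT     03/05/2022    Doug's Dugout Dogs         $33.27"  // hypothetical input

// Dynamic form: the output is the existential `AnyRegexOutput`,
// so captures are looked up by position at run time.
let dynamicRegex = try Regex(pattern)
let dynamicMatch = try dynamicRegex.wholeMatch(in: line)
let dynamicKind = dynamicMatch?.output[1].substring

// Typed form: the caller states the capture types up front, and
// construction throws if the pattern's captures do not fit that type.
let typedRegex = try Regex(
  pattern, as: (Substring, Substring, Substring, Substring, Substring).self)
let typedMatch = try typedRegex.wholeMatch(in: line)
let typedKind = typedMatch?.output.1
```

The typed form moves the "does this pattern have four captures?" question to construction time, while the dynamic form defers it to each positional lookup.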
### Syntax
@@ -51,11 +80,87 @@ Regex syntax will be part of Swift's source-compatibility story as well as its b
## Detailed Design

-<!--
-... init, dynamic match, conversion to static
--->
+We propose initializers to declare and compile a regex from syntax. Upon failure, these initializers throw compilation errors, such as for syntax or type errors. API for retrieving error information is future work.
+
+```swift
+extension Regex {
+  /// Parse and compile `pattern`, resulting in a strongly-typed capture list.
+    /// The range over which a value was captured. `nil` for no-capture.
+    public var range: Range<String.Index>?
+
+    /// The slice of the input over which a value was captured. `nil` for no-capture.
+    public var substring: Substring?
+
+    /// The captured value. `nil` for no-capture.
+    public var value: Any?
+  }
+
+  // Trivial collection conformance requirements
+
+  public var startIndex: Int { get }
+
+  public var endIndex: Int { get }
+
+  public var count: Int { get }
+
+  public func index(after i: Int) -> Int
+
+  public func index(before i: Int) -> Int
+
+  public subscript(position: Int) -> Element
+}
+```
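
To make the collection shape of the type-erased output more concrete, here is a minimal sketch that walks every capture of a dynamic match and reads its `substring`. It assumes the API as it later shipped in Swift 5.7 (`Regex(_:)`, `firstMatch(in:)`, and an `AnyRegexOutput` whose element at position 0 is the whole match); the pattern, the input, and the `descriptions` array are hypothetical.

```swift
// A sketch, assuming Swift 5.7+ (macOS 13+) and the shipped spelling `Regex(_:)`.
let regex = try Regex(#"(\w+)@(\w+)(\.\w+)?"#)  // hypothetical pattern
let input = "user@host"                         // hypothetical input

var descriptions: [String] = []
if let match = try regex.firstMatch(in: input) {
  // `match.output` is an `AnyRegexOutput`, an Int-indexed collection of captures.
  for (index, capture) in match.output.enumerated() {
    if let substring = capture.substring {
      descriptions.append("\(index): \(substring)")
    } else {
      // A capture group that did not participate in the match.
      descriptions.append("\(index): <no capture>")
    }
  }
}
print(descriptions)
```

Because the optional third group does not participate for this input, its element is expected to report `nil` for `range`, `substring`, and `value` alike.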
+
+We propose adding an API to `Regex<AnyRegexOutput>.Match` to cast the output type to a concrete one. A regex match will lazily create a `Substring` on demand, so casting the match itself saves ARC traffic vs extracting and casting the output.
+
+```swift
+extension Regex.Match where Output == AnyRegexOutput {
+  /// Creates a type-erased regex match from an existing match.
+  ///
+  /// Use this initializer to fit a regex match with strongly typed captures into the
+  /// use site of a dynamic regex match, i.e. one that was created from a string.
+  public init<Output>(_ match: Regex<Output>.Match)
+
+  /// Returns a typed match by converting the underlying values to the specified
+  /// types.
+  ///
+  /// - Parameter type: The expected output type.
+  /// - Returns: A match generic over the output type if the underlying values can be converted to the
The rest of this proposal will be a detailed and exhaustive definition of our proposed regex syntax.

<details><summary>Grammar Notation</summary>
@@ -827,6 +932,12 @@ We are deferring runtime support for callouts from regex literals as future work
## Alternatives Considered

+### Failable inits
+
+There are many ways for compilation to fail, from syntactic errors to unsupported features to type mismatches. In the general case, run-time compilation errors are not recoverable by a tool without modifying the user's input. Even then, the thrown errors contain valuable information as to why compilation failed. For example, SwiftPM presents any errors directly to the user.
+
+As proposed, the errors thrown will be the same errors presented to the Swift compiler, tracking fine-grained source locations with specific reasons why compilation failed. Defining a rich error API is future work, as these errors are rapidly evolving and it is too early to lock in the ABI.
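
As a small sketch of why a throwing initializer suits tools, the helper below forwards the thrown error to the user instead of collapsing it into `nil` at the point of failure. The function name `compile(userPattern:)` is hypothetical, and the code assumes the initializer spelling that later shipped in Swift 5.7 (`Regex(_:)`) rather than this pitch's `Regex(compiling:)`. The error is only printed, since no structured error API is proposed here.

```swift
// A sketch, assuming Swift 5.7+ (macOS 13+) and the shipped spelling `Regex(_:)`.
// `compile(userPattern:)` is a hypothetical helper, not part of the proposal.
func compile(userPattern: String) -> Regex<AnyRegexOutput>? {
  do {
    return try Regex(userPattern)
  } catch {
    // The tool cannot repair the pattern itself, so it surfaces the reason
    // for the failure and lets the user edit their input.
    print("Could not compile '\(userPattern)': \(error)")
    return nil
  }
}

let rejected = compile(userPattern: "(unclosed")  // syntactically invalid: missing ')'
let accepted = compile(userPattern: #"\d+"#)
```

A failable initializer would have discarded the diagnostic before the tool had a chance to show it.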