SE-0351:
- Add future direction about conversion to textual regex.
- Move recursive subpatterns to future directions.
- Clarify `Regex.Match.subscript(_:)` precondition.
SE-0350:
- Rename `firstMatch(_: Substring)` to `firstMatch(in: Substring)` to be consistent with the `String` variant.
- Specify accessors on properties to clarify mutability.
proposals/0351-regex-builder.md
+70 -45 lines changed: 70 additions & 45 deletions
@@ -1008,12 +1008,16 @@ let regex = Regex {
1008
1008
Variants of `Capture` and `TryCapture` accept a `Reference` argument. References can be used to achieve named captures and named backreferences from textual regexes.
/// Returns the capture referenced by the given reference.
1019
+
///
1020
+
/// - Precondition: The reference must have been captured in the regex that produced this match.
1017
1021
public subscript<Capture>(_ reference: Reference<Capture>) -> Capture { get }
1018
1022
}
1019
1023
```
@@ -1036,7 +1040,7 @@ if let result = input.firstMatch(of: regex) {
1036
1040
}
1037
1041
```
1038
1042
1039
-
A regex is considered invalid when it contains a use of reference without it ever being captured in the regex. When this occurs in the regex builder DSL, an runtime error will be reported.
1043
+
A regex is considered invalid when it uses a reference that is never captured in the regex. When this occurs in the regex builder DSL, a runtime error will be reported. Similarly, a reference passed to `Regex.Match.subscript(_:)` must have been captured in the regex that produced the match.
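The precondition above can be illustrated with a minimal sketch. The names `amount`, `regex`, and `input` are invented for this example and do not appear in the proposal; the sketch assumes a toolchain that ships the `RegexBuilder` module (Swift 5.7 or later).

```swift
import RegexBuilder

// Hypothetical reference bound to an Int-typed capture.
let amount = Reference(Int.self)

// The reference is captured inside this regex, so it is valid to use
// it on matches produced by this regex.
let regex = Regex {
    "total: "
    TryCapture(as: amount) {
        OneOrMore(.digit)
    } transform: { Int($0) }
}

let input = "total: 42"
if let match = input.firstMatch(of: regex) {
    // `amount` was captured in `regex`, which produced `match`,
    // so the subscript precondition is satisfied.
    let value: Int = match[amount]
    print(value)
}
```

Subscripting `match` with a reference that was never captured in `regex` would violate the precondition, so the example only reads `amount`.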
1040
1044
1041
1045
### Subpattern
1042
1046
@@ -1056,54 +1060,21 @@ With regex builder, there is no special API required to reuse existing subpatter
1056
1060
1057
1061
```swift
1058
1062
Regex {
1059
-
let subject = ChoiceOf {
1060
-
"I"
1061
-
"you"
1062
-
}
1063
-
let object = ChoiceOf {
1064
-
"goodbye"
1065
-
"hello"
1066
-
}
1067
-
subject
1068
-
"say"
1069
-
object
1070
-
";"
1071
-
subject
1072
-
"say"
1073
-
object
1074
-
}
1075
-
```
1076
-
1077
-
Sometimes, a textual regex may also use `(?R)` or `(?0)` to recursively evaluate the entire regex. For example, the following textual regex matches "I say you say I say you say hello".
1078
-
1079
-
```
1080
-
(you|I) say (goodbye|hello|(?R))
1081
-
```
1082
-
1083
-
For this, `Regex` offers a special initializer that allows its pattern to recursively reference itself. This is somewhat akin to a fixed-point combinator.
1084
-
1085
-
```swift
1086
-
extension Regex {
1087
-
public init<R: RegexComponent>(
1088
-
@RegexComponentBuilder _ content: (Regex<Substring>) -> R
1089
-
) where R.Output == Match
1090
-
}
1091
-
```
1092
-
1093
-
With this initializer, the above regex can be expressed as the following using regex builder.
1094
-
1095
-
```swift
1096
-
Regex { wholeSentence in
1097
-
ChoiceOf {
1098
-
"I"
1099
-
"you"
1063
+
let subject = ChoiceOf {
1064
+
"I"
1065
+
"you"
1100
1066
}
1101
-
"say"
1102
-
ChoiceOf {
1067
+
let object = ChoiceOf {
1103
1068
"goodbye"
1104
1069
"hello"
1105
-
wholeSentence
1106
1070
}
1071
+
subject
1072
+
"say"
1073
+
object
1074
+
";"
1075
+
subject
1076
+
"say"
1077
+
object
1107
1078
}
1108
1079
```
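As a runnable variation on the subpattern example, the sketch below hoists the two `ChoiceOf` values out of the builder closure and reuses them twice. The spaces inside the string literals, the name `sentence`, and the sample input are assumptions added so the pattern matches ordinary text; they are not part of the proposal.

```swift
import RegexBuilder

// Reusable subpatterns are ordinary values; no special API is needed.
let subject = ChoiceOf {
    "I"
    "you"
}
let object = ChoiceOf {
    "goodbye"
    "hello"
}

// Each subpattern appears twice in the composed regex.
let sentence = Regex {
    subject
    " say "
    object
    "; "
    subject
    " say "
    object
}

let input = "you say goodbye; I say hello"
// `wholeMatch(of:)` succeeds only if the entire input matches.
let matched = input.wholeMatch(of: sentence) != nil
print(matched)
```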
1109
1080
@@ -1166,6 +1137,59 @@ The proposed feature does not change the ABI of existing features.
1166
1137
1167
1138
The proposed feature relies heavily upon overloads of `buildBlock` and `buildPartialBlock(accumulated:next:)` to work for different capture arities. In the fullness of time, we are hoping for variadic generics to supersede existing overloads. Such a change should not involve ABI-breaking modifications as it is merely a change of overload resolution.
1168
1139
1140
+
## Future directions
1141
+
1142
+
### Conversion to textual regex
1143
+
1144
+
Sometimes it may be useful to convert a regex created using regex builder to a textual regex. This may be achieved in the future by extending `RegexComponent` with a method.
1145
+
1146
+
```swift
1147
+
extension RegexComponent {
1148
+
public func makeTextualRegex() -> String?
1149
+
}
1150
+
```
1151
+
1152
+
It is worth noting that the internal representation of a `Regex` is _not_ textual regex, but an efficient pattern matching bytecode compiled from an abstract syntax tree. Moreover, not every `Regex` can be converted to textual regex. Regex builder supports arbitrary types that conform to the `RegexComponent` protocol, including `CustomMatchingRegexComponent` (pitched in [String Processing Algorithms]) which can be implemented with arbitrary code. If a `Regex` contains a `CustomMatchingRegexComponent`, it cannot be converted to textual regex.
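To make the intended correspondence concrete, here is a small sketch that pairs a builder regex with a textual counterpart written by hand. No conversion API is called, since `makeTextualRegex()` is only a possible future direction; the pattern string, the names `built` and `textual`, and the sample input are assumptions for illustration.

```swift
import RegexBuilder

// A regex written with the builder DSL.
let built = Regex {
    ChoiceOf {
        "I"
        "you"
    }
    " say "
    ChoiceOf {
        "goodbye"
        "hello"
    }
}

// A hand-written textual pattern intended to accept the same strings.
// This is an assumption about what a conversion might produce, not its output.
let textual = try! Regex("(?:I|you) say (?:goodbye|hello)")

let input = "you say goodbye"
let bothMatch = input.wholeMatch(of: built) != nil
    && input.wholeMatch(of: textual) != nil
print(bothMatch)
```

A builder regex that embeds a component with arbitrary matching code would have no such textual counterpart, which is why the sketched method returns an optional.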
1153
+
1154
+
### Recursive subpatterns
1155
+
1156
+
Sometimes, a textual regex may also use `(?R)` or `(?0)` to recursively evaluate the entire regex. For example, the following textual regex matches "I say you say I say you say hello".
1157
+
1158
+
```
1159
+
(you|I) say (goodbye|hello|(?R))
1160
+
```
1161
+
1162
+
For this, `Regex` offers a special initializer that allows its pattern to recursively reference itself. This is somewhat akin to a fixed-point combinator.
1163
+
1164
+
```swift
1165
+
extension Regex {
1166
+
public init<R: RegexComponent>(
1167
+
@RegexComponentBuilder _ content: (Regex<Substring>) -> R
1168
+
) where R.Output == Match
1169
+
}
1170
+
```
1171
+
1172
+
With this initializer, the above regex can be expressed as the following using regex builder.
1173
+
1174
+
```swift
1175
+
Regex { wholeSentence in
1176
+
ChoiceOf {
1177
+
"I"
1178
+
"you"
1179
+
}
1180
+
"say"
1181
+
ChoiceOf {
1182
+
"goodbye"
1183
+
"hello"
1184
+
wholeSentence
1185
+
}
1186
+
}
1187
+
```
1188
+
1189
+
There are some concerns with this design which we need to consider:
1190
+
- Due to the lack of labeling, the argument to the builder closure can be arbitrarily named and cause confusion.
1191
+
- When there is an initializer that accepts a result builder closure, overloading that initializer with the same argument labels could lead to bad error messages upon interior type errors.
1192
+
1169
1193
## Alternatives considered
1170
1194
1171
1195
### Operators for quantification and alternation
@@ -1385,3 +1409,4 @@ This is cool, but it adds extra complexity to regex builder and it isn't as clea