Commit 925255d

committed
Update DSL proposal.
- Removed `if` support. - Added `Local`. - Added alternatives.
1 parent 820ab38 commit 925255d

1 file changed

+129
-183
lines changed

Documentation/Evolution/RegexBuilderDSL.md

Lines changed: 129 additions & 183 deletions
@@ -17,6 +17,7 @@
1717
- [Quantification](#quantification)
1818
- [Capture and reference](#capture-and-reference)
1919
- [Subpattern](#subpattern)
20+
- [Scoping](#scoping)
2021
- [Source compatibility](#source-compatibility)
2122
- [Effect on ABI stability](#effect-on-abi-stability)
2223
- [Effect on API resilience](#effect-on-api-resilience)
@@ -400,95 +401,7 @@ extension RegexComponentBuilder {
400401
}
401402
```
402403

403-
To support `if` statements, `buildEither(first:)`, `buildEither(second:)` and `buildOptional(_:)` are defined with overloads to support up to 10 captures because each capture type needs to be transformed to an optional. The overload for non-capturing regexes, due to the lack of generic constraints, must be annotated with `@_disfavoredOverload` in order not shadow other overloads. We expect that a variadic-generic version of this method will eventually superseded all of these overloads.
404-
405-
```swift
406-
extension RegexComponentBuilder {
407-
// The following builder methods implement what would be possible with
408-
// variadic generics (using imaginary syntax) as a single method:
409-
//
410-
// public static func buildEither<
411-
// Component, WholeMatch, Capture...
412-
// >(
413-
// first component: Component
414-
// ) -> Regex<(Substring, Capture...)>
415-
// where Component.Output == (WholeMatch, Capture...)
416-
417-
public static func buildEither<R: RegexComponent>(
418-
first component: Component<R>
419-
) -> Regex<Substring> {
420-
component
421-
}
422-
423-
public static func buildEither<W, C0, R: RegexComponent>(
424-
first component: Component<R>
425-
) -> Regex<(Substring, C0)> where R.Output == (W, C0) {
426-
component
427-
}
428-
429-
public static func buildEither<W, C0, C1, R: RegexComponent>(
430-
first component: Component<R>
431-
) -> Regex<(Substring, C0, C1)> where R.Output == (W, C0, C1) {
432-
component
433-
}
434-
435-
// The following builder methods implement what would be possible with
436-
// variadic generics (using imaginary syntax) as a single method:
437-
//
438-
// public static func buildEither<
439-
// Component, WholeMatch, Capture...
440-
// >(
441-
// second component: Component
442-
// ) -> Regex<(Substring, Capture...)>
443-
// where Component.Output == (WholeMatch, Capture...)
444-
445-
public static func buildEither<R: RegexComponent>(
446-
second component: Component<R>
447-
) -> Regex<Substring> {
448-
component
449-
}
450-
451-
public static func buildEither<W, C0, R: RegexComponent>(
452-
second component: Component<R>
453-
) -> Regex<(Substring, C0)> where R.Output == (W, C0) {
454-
component
455-
}
456-
457-
public static func buildEither<W, C0, C1, R: RegexComponent>(
458-
second component: Component<R>
459-
) -> Regex<(Substring, C0, C1)> where R.Output == (W, C0, C1) {
460-
component
461-
}
462-
463-
// ... `O(arity)` overloads of `buildEither(_:)`
464-
465-
// The following builder methods implement what would be possible with
466-
// variadic generics (using imaginary syntax) as a single method:
467-
//
468-
// public static func buildOptional<
469-
// Component, WholeMatch, Capture...
470-
// >(
471-
// _ component: Component?
472-
// ) where Component.Output == (WholeMatch, Capture...)
473-
474-
@_disfavoredOverload
475-
public static func buildOptional<R: RegexComponent>(
476-
_ component: Component<R>?
477-
) -> Regex<Substring>
478-
479-
public static func buildOptional<W, C0, R: RegexComponent>(
480-
_ component: Component<R>?
481-
) -> Regex<(Substring, C0?)>
482-
483-
public static func buildOptional<W, C0, C1, R: RegexComponent>(
484-
_ component: Component<R>?
485-
) -> Regex<(Substring, C0?, C1?)>
486-
487-
// ... `O(arity)` overloads of `buildOptional(_:)`
488-
}
489-
```
490-
491-
To support `if #available(...)` statements, `buildLimitedAvailability(_:)` is defined with overloads to support up to 10 captures. Similar to `buildOptional`, the overload for non-capturing regexes must be annotated with `@_disfavoredOverload`.
404+
To support `if #available(...)` statements, `buildLimitedAvailability(_:)` is defined with overloads to support up to 10 captures. The overload for non-capturing regexes, due to the lack of generic constraints, must be annotated with `@_disfavoredOverload` in order not to shadow other overloads. We expect that a variadic-generic version of this method will eventually supersede all of these overloads.
492405

493406
```swift
494407
extension RegexComponentBuilder {
@@ -518,6 +431,8 @@ extension RegexComponentBuilder {
518431
}
519432
```
520433

434+
`buildOptional` and `buildEither` are intentionally not supported due to ergonomic issues and fundamental semantic differences between regex conditionals and result builder conditionals. Please refer to the [alternatives considered](#support-buildoptional-and-buildeither) section for detailed rationale.
435+
521436
### Alternation
522437

523438
Alternations are used to match one of multiple patterns. An alternation wraps its underlying patterns' capture types in an `Optional` and concatenates them together, first to last.
@@ -620,99 +535,6 @@ public enum AlternationBuilder {
620535
// ... `O(arity^2)` overloads of `buildPartialBlock(accumulated:next:)`
621536
}
622537

623-
extension AlternationBuilder {
624-
// The following builder methods implement what would be possible with
625-
// variadic generics (using imaginary syntax) as a single method:
626-
//
627-
// public static func buildEither<
628-
// R, WholeMatch, Capture...
629-
// >(
630-
// first component: Component<R>
631-
// ) -> Regex<(Substring, Component<R>?...)>
632-
// where R.Output == (WholeMatch, Capture...)
633-
634-
@_disfavoredOverload
635-
public static func buildEither<R: RegexComponent>(
636-
first component: Component<R>
637-
) -> Regex<Substring>
638-
639-
public static func buildEither<W, C0, R: RegexComponent>(
640-
first component: Component<R>
641-
) -> Regex<(Substring, C0?)>
642-
643-
public static func buildEither<W, C0, C1, R: RegexComponent>(
644-
first component: Component<R>
645-
) -> Regex<(Substring, C0?, C1?)>
646-
647-
// ... `O(arity)` overloads of `buildEither(_:)`
648-
649-
public static func buildEither<W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9, R: RegexComponent>(
650-
first component: Component<R>
651-
) -> Regex<(Substring, C0?, C1?, C2?, C3?, C4?, C5?, C6?, C7?, C8, C9?)> where R.Output == (W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9)
652-
}
653-
654-
extension AlternationBuilder {
655-
// The following builder methods implement what would be possible with
656-
// variadic generics (using imaginary syntax) as a single method:
657-
//
658-
// public static func buildEither<
659-
// R, WholeMatch, Capture...
660-
// >(
661-
// second component: Component<R>
662-
// ) -> Regex<(Substring, Capture?...)>
663-
// where R.Output == (WholeMatch, Capture...)
664-
665-
@_disfavoredOverload
666-
public static func buildEither<R: RegexComponent>(
667-
second component: Component<R>
668-
) -> Regex<Substring>
669-
670-
public static func buildEither<W, C0, R: RegexComponent>(
671-
second component: Component<R>
672-
) -> Regex<(Substring, C0?)>
673-
674-
public static func buildEither<W, C0, C1, R: RegexComponent>(
675-
second component: Component<R>
676-
) -> Regex<(Substring, C0?, C1?)>
677-
678-
// ... `O(arity)` overloads of `buildEither(_:)`
679-
680-
public static func buildEither<W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9, R: RegexComponent>(
681-
second component: Component<R>
682-
) -> Regex<(Substring, C0?, C1?, C2?, C3?, C4?, C5?, C6?, C7?, C8, C9?)> where R.Output == (W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9)
683-
}
684-
685-
extension AlternationBuilder {
686-
// The following builder methods implement what would be possible with
687-
// variadic generics (using imaginary syntax) as a single method:
688-
//
689-
// public static func buildOptional<
690-
// Component, WholeMatch, Capture...
691-
// >(
692-
// _ component: Component?
693-
// ) -> Regex<(Substring, Capture?...)>
694-
// where Component.Output == (WholeMatch, Capture...)
695-
696-
@_disfavoredOverload
697-
public static func buildOptional<Component: RegexComponent>(
698-
_ component: Component?
699-
) -> Regex<Substring>
700-
701-
public static func buildOptional<W, C0, R: RegexComponent>(
702-
_ component: Component<R>?
703-
) -> Regex<(Substring, C0?)>
704-
705-
public static func buildOptional<W, C0, C1, R: RegexComponent>(
706-
_ component: Component<R>?
707-
) -> Regex<(Substring, C0?, C1?)>
708-
709-
// ... `O(arity)` overloads of `buildOptional(_:)`
710-
711-
public static func buildOptional<W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9, R: RegexComponent>(
712-
_ component: Component<R>?
713-
) -> Regex<(Substring, C0?, C1?, C2?, C3?, C4?, C5?, C6?, C7?, C8, C9?)> where R.Output == (W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9)
714-
}
715-
716538
extension AlternationBuilder {
717539
// The following builder methods implement what would be possible with
718540
// variadic generics (using imaginary syntax) as a single method:
@@ -1290,6 +1112,53 @@ Regex { wholeSentence in
12901112
}
12911113
```
12921114

1115+
### Scoping
1116+
1117+
In textual regexes, atomic groups (`(?>...)`) can be used to define a backtracking scope. That is, when the regex engine exits from the scope successfully, it throws away all backtracking positions from the scope. In regex builder, the `Local` type serves this purpose.
1118+
1119+
```swift
1120+
public struct Local<Output>: RegexComponent {
1121+
public var regex: Regex<Output>
1122+
1123+
// The following builder methods implement what would be possible with
1124+
// variadic generics (using imaginary syntax) as a single set of methods:
1125+
//
1126+
// public init<WholeMatch, Capture..., Component: RegexComponent>(
1127+
// @RegexComponentBuilder _ component: () -> Component
1128+
// ) where Output == (Substring, Capture...), Component.Output == (WholeMatch, Capture...)
1129+
1130+
@_disfavoredOverload
1131+
public init<Component: RegexComponent>(
1132+
@RegexComponentBuilder _ component: () -> Component
1133+
) where Output == Substring
1134+
1135+
public init<W, C0, Component: RegexComponent>(
1136+
@RegexComponentBuilder _ component: () -> Component
1137+
) where Output == (Substring, C0), Component.Output == (W, C0)
1138+
1139+
public init<W, C0, C1, Component: RegexComponent>(
1140+
@RegexComponentBuilder _ component: () -> Component
1141+
) where Output == (Substring, C0, C1), Component.Output == (W, C0, C1)
1142+
1143+
// ... `O(arity)` overloads
1144+
}
1145+
```
1146+
1147+
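For example, the following regex matches the string `abcc` but not `abc`. Once `"bc"` matches inside `Local`, the engine cannot backtrack into the scope to try `"b"` when the trailing `"c"` fails.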
1148+
1149+
```swift
1150+
Regex {
1151+
"a"
1152+
Local {
1153+
ChoiceOf {
1154+
"bc"
1155+
"b"
1156+
}
1157+
}
1158+
"c"
1159+
}
1160+
```
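
The behavior described above can be checked with a small sketch. It assumes a Swift 5.7 or later toolchain on a platform where the `RegexBuilder` module is available, and it uses `wholeMatch(of:)` so that only a match starting at the first character counts. The names `scoped` and `unscoped` are illustrative and do not appear in the proposal; `unscoped` is the same pattern without `Local`, included only for contrast.

```swift
import RegexBuilder

// Pattern from the example above: the alternation sits in a backtracking scope.
let scoped = Regex {
  "a"
  Local {
    ChoiceOf {
      "bc"
      "b"
    }
  }
  "c"
}

// Same pattern without `Local` (illustrative contrast): the engine may
// backtrack into the alternation and retry with "b".
let unscoped = Regex {
  "a"
  ChoiceOf {
    "bc"
    "b"
  }
  "c"
}

let scopedMatchesABCC = "abcc".wholeMatch(of: scoped) != nil
let scopedMatchesABC = "abc".wholeMatch(of: scoped) != nil
let unscopedMatchesABC = "abc".wholeMatch(of: unscoped) != nil

print(scopedMatchesABCC, scopedMatchesABC, unscopedMatchesABC)
```

With `Local`, the first alternative `"bc"` consumes the only `c` in `abc` and the saved position for `"b"` is discarded, so the trailing `"c"` has nothing left to match.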
1161+
12931162
## Source compatibility
12941163

12951164
Regex builder will be shipped in a new module named `RegexBuilder`, and thus will not affect the source compatibility of the existing code.
@@ -1306,7 +1175,7 @@ The proposed feature relies heavily upon overloads of `buildBlock` and `buildPar
13061175

13071176
### Operators for quantification and alternation
13081177

1309-
While `ChoiceOf` and quantifier functions provide a general way of creating alternations and quantifications, we recognize that some synctactic sugar can be useful for creating one-liners like in textual regexes, e.g. infix operator `|`, postfix operator `*`, etc.
1178+
While `ChoiceOf` and quantifier types provide a general way of creating alternations and quantifications, we recognize that some syntactic sugar can be useful for creating one-liners like in textual regexes, e.g. infix operator `|`, postfix operator `*`, etc.
13101179

13111180
```swift
13121181
// The following functions implement what would be possible with variadic
@@ -1441,6 +1310,83 @@ One could argue that type such as `OneOrMore<Output>` could be defined as a top-
14411310

14421311
Another reason to use types instead of free functions is consistency with existing result-builder-based DSLs such as SwiftUI.
14431312

1313+
### Support `buildOptional` and `buildEither`
1314+
1315+
To support `if` statements, an earlier iteration of this proposal defined `buildEither(first:)`, `buildEither(second:)` and `buildOptional(_:)` as follows:
1316+
1317+
```swift
1318+
extension RegexComponentBuilder {
1319+
public static func buildEither<
1320+
Component, WholeMatch, Capture...
1321+
>(
1322+
first component: Component
1323+
) -> Regex<(Substring, Capture...)>
1324+
where Component.Output == (WholeMatch, Capture...)
1325+
1326+
public static func buildEither<
1327+
Component, WholeMatch, Capture...
1328+
>(
1329+
second component: Component
1330+
) -> Regex<(Substring, Capture...)>
1331+
where Component.Output == (WholeMatch, Capture...)
1332+
1333+
public static func buildOptional<
1334+
Component, WholeMatch, Capture...
1335+
>(
1336+
_ component: Component?
1337+
) where Component.Output == (WholeMatch, Capture...)
1338+
}
1339+
```
1340+
1341+
However, multiple-branch control flow statements (e.g. `if`-`else` and `switch`) would be required to produce either the same regex type, which is limiting, or an "either-like" type, which can be difficult to work with when nested. Unlike `ChoiceOf`, producing a tuple of optionals is not an option, because the branch taken would be decided when the builder closure is executed, and it would cause capture numbering to be inconsistent with conventional regex.
1342+
1343+
Moreover, result builder conditionals do not work the same way as regex conditionals. In regex conditionals, the conditions are themselves regexes and are evaluated by the regex engine during matching, whereas result builder conditionals are evaluated as part of the builder closure. We hope that a future result builder feature will support "lifting" control flow conditions into the DSL domain, e.g. supporting `Regex<Bool>` as a condition.
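
To illustrate the distinction, here is a minimal sketch of a condition that is decided when the regex is constructed and not during matching. It keeps the `if` outside the builder closures, so it does not rely on `buildOptional` or `buildEither`, and both branches return the same regex type. The function `makeNumberPattern(allowSign:)` is hypothetical and not part of the proposal; a Swift 5.7 or later toolchain with `RegexBuilder` is assumed.

```swift
import RegexBuilder

// Hypothetical helper: the branch is chosen once, when the pattern is built.
// The regex engine never sees `allowSign`.
func makeNumberPattern(allowSign: Bool) -> Regex<Substring> {
  if allowSign {
    return Regex {
      Optionally { "-" }
      OneOrMore(.digit)
    }
  } else {
    return Regex {
      OneOrMore(.digit)
    }
  }
}

let signed = makeNumberPattern(allowSign: true)
let unsigned = makeNumberPattern(allowSign: false)

let signedAcceptsNegative = "-42".wholeMatch(of: signed) != nil
let unsignedAcceptsNegative = "-42".wholeMatch(of: unsigned) != nil
let unsignedAcceptsPlain = "42".wholeMatch(of: unsigned) != nil
```

A regex conditional, by contrast, would test its condition against the input at each match attempt, which is why the two constructs are not interchangeable.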
1344+
1345+
### Flatten optionals
1346+
1347+
With the proposed design, `ChoiceOf` with `AlternationBuilder` wraps every component's capture type with an `Optional`. This means that any `ChoiceOf` with optional-capturing components would lead to doubly-nested optional captures. This could make the result of matching harder to use.
1348+
1349+
```swift
1350+
ChoiceOf {
1351+
OneOrMore(Capture(.digit)) // Output == (Substring, Substring)
1352+
Optionally {
1353+
ZeroOrMore(Capture(.word)) // Output == (Substring, Substring?)
1354+
"a"
1355+
} // Output == (Substring, Substring??)
1356+
} // Output == (Substring, Substring?, Substring???)
1357+
```
1358+
1359+
One way to improve this could be overloading quantifier initializers (e.g. `ZeroOrMore.init(_:)`) and `AlternationBuilder.buildPartialBlock` to flatten any optionals upon composition. However, this would be non-trivial. Quantifier initializers would need to be overloaded `O(2^arity)` times to account for all possible positions of `Optional` that may appear in the `Output` tuple. Even worse, `AlternationBuilder.buildPartialBlock` would need to be overloaded `O(arity!)` times to account for all possible combinations of two `Output` tuples with all possible positions of `Optional` that may appear in one of the `Output` tuples.
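
Without flattening in the library, a caller can still collapse a nested optional at the use site. The sketch below uses plain `Optional` values and no regex builder API; the helper `flatten(_:)` and the three sample values are illustrative assumptions, not something the proposal defines.

```swift
// Three states a doubly-nested optional capture can be in (illustrative values).
let branchNotTaken: Substring?? = .none          // the alternative did not participate
let branchTakenNoCapture: Substring?? = .some(.none) // it participated, inner capture absent
let captured: Substring?? = .some(.some("42"))   // it participated and captured text

// Hypothetical helper: collapse one level of optionality with `flatMap`.
func flatten(_ value: Substring??) -> Substring? {
  value.flatMap { $0 }
}

let a = flatten(branchNotTaken)
let b = flatten(branchTakenNoCapture)
let c = flatten(captured)
```

Flattening at the use site discards the difference between the first two states, which is the trade-off any library-level flattening would also make.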
1360+
1361+
### Structured rather than flat captures
1362+
1363+
We propose inferring capture types in such a way as to align with the traditional numbering of backreferences. This is because much of the motivation behind providing regex literals in Swift is their familiarity.
1364+
1365+
If we decided to deprioritize this motivation, there are opportunities to infer safer, more ergonomic, and arguably more intuitive types for captures. For example, to be consistent with traditional regex backreferences, quantifications of multiple or nested captures had to produce parallel arrays rather than an array of tuples.
1366+
1367+
```swift
1368+
OneOrMore {
1369+
Capture {
1370+
OneOrMore(.hexDigit)
1371+
}
1372+
".."
1373+
Capture {
1374+
OneOrMore(.hexDigit)
1375+
}
1376+
}
1377+
1378+
// Flat capture types:
1379+
// => `Output == (Substring, Substring, Substring)`
1380+
1381+
// Structured capture types:
1382+
// => `Output == (Substring, (Substring, Substring))`
1383+
```
1384+
1385+
Similarly, an alternation of multiple or nested captures could produce a structured alternation type (or an anonymous sum type) rather than flat optionals.
1386+
1387+
This is cool, but it adds extra complexity to regex builder and it isn't as clear because the generic type no longer aligns with the traditional regex backreference numbering. We think the consistency of the flat capture types trumps the added safety and ergonomics of the structured capture types.
1388+
1389+
14441390
[Declarative String Processing]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/DeclarativeStringProcessing.md
14451391
[Strongly Typed Regex Captures]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/StronglyTypedCaptures.md
14461392
[Regex Syntax]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/RegexSyntax.md
