Commit 6699667

Update DSL proposal.
- Removed `if` support.
- Added `Local`.
- Added alternatives.
1 parent 820ab38 commit 6699667

File tree

1 file changed

+128
-183
lines changed

Documentation/Evolution/RegexBuilderDSL.md

Lines changed: 128 additions & 183 deletions
@@ -400,95 +400,7 @@ extension RegexComponentBuilder {
400400
}
401401
```
402402

403-
To support `if` statements, `buildEither(first:)`, `buildEither(second:)` and `buildOptional(_:)` are defined with overloads to support up to 10 captures because each capture type needs to be transformed to an optional. The overload for non-capturing regexes, due to the lack of generic constraints, must be annotated with `@_disfavoredOverload` in order not to shadow other overloads. We expect that a variadic-generic version of this method will eventually supersede all of these overloads.
404-
405-
```swift
406-
extension RegexComponentBuilder {
407-
// The following builder methods implement what would be possible with
408-
// variadic generics (using imaginary syntax) as a single method:
409-
//
410-
// public static func buildEither<
411-
// Component, WholeMatch, Capture...
412-
// >(
413-
// first component: Component
414-
// ) -> Regex<(Substring, Capture...)>
415-
// where Component.Output == (WholeMatch, Capture...)
416-
417-
public static func buildEither<R: RegexComponent>(
418-
first component: Component<R>
419-
) -> Regex<Substring> {
420-
component
421-
}
422-
423-
public static func buildEither<W, C0, R: RegexComponent>(
424-
first component: Component<R>
425-
) -> Regex<(Substring, C0)> where R.Output == (W, C0) {
426-
component
427-
}
428-
429-
public static func buildEither<W, C0, C1, R: RegexComponent>(
430-
first component: Component<R>
431-
) -> Regex<(Substring, C0, C1)> where R.Output == (W, C0, C1) {
432-
component
433-
}
434-
435-
// The following builder methods implement what would be possible with
436-
// variadic generics (using imaginary syntax) as a single method:
437-
//
438-
// public static func buildEither<
439-
// Component, WholeMatch, Capture...
440-
// >(
441-
// second component: Component
442-
// ) -> Regex<(Substring, Capture...)>
443-
// where Component.Output == (WholeMatch, Capture...)
444-
445-
public static func buildEither<R: RegexComponent>(
446-
second component: Component<R>
447-
) -> Regex<Substring> {
448-
component
449-
}
450-
451-
public static func buildEither<W, C0, R: RegexComponent>(
452-
second component: Component<R>
453-
) -> Regex<(Substring, C0)> where R.Output == (W, C0) {
454-
component
455-
}
456-
457-
public static func buildEither<W, C0, C1, R: RegexComponent>(
458-
second component: Component<R>
459-
) -> Regex<(Substring, C0, C1)> where R.Output == (W, C0, C1) {
460-
component
461-
}
462-
463-
// ... `O(arity)` overloads of `buildEither(_:)`
464-
465-
// The following builder methods implement what would be possible with
466-
// variadic generics (using imaginary syntax) as a single method:
467-
//
468-
// public static func buildOptional<
469-
// Component, WholeMatch, Capture...
470-
// >(
471-
// _ component: Component?
472-
// ) where Component.Output == (WholeMatch, Capture...)
473-
474-
@_disfavoredOverload
475-
public static func buildOptional<R: RegexComponent>(
476-
_ component: Component<R>?
477-
) -> Regex<Substring>
478-
479-
public static func buildOptional<W, C0, R: RegexComponent>(
480-
_ component: Component<R>?
481-
) -> Regex<(Substring, C0?)>
482-
483-
public static func buildOptional<W, C0, C1, R: RegexComponent>(
484-
_ component: Component<R>?
485-
) -> Regex<(Substring, C0?, C1?)>
486-
487-
// ... `O(arity)` overloads of `buildOptional(_:)`
488-
}
489-
```
490-
491-
To support `if #available(...)` statements, `buildLimitedAvailability(_:)` is defined with overloads to support up to 10 captures. Similar to `buildOptional`, the overload for non-capturing regexes must be annotated with `@_disfavoredOverload`.
403+
To support `if #available(...)` statements, `buildLimitedAvailability(_:)` is defined with overloads to support up to 10 captures. The overload for non-capturing regexes, due to the lack of generic constraints, must be annotated with `@_disfavoredOverload` in order not to shadow other overloads. We expect that a variadic-generic version of this method will eventually supersede all of these overloads.
492404

493405
```swift
494406
extension RegexComponentBuilder {
@@ -518,6 +430,8 @@ extension RegexComponentBuilder {
518430
}
519431
```
520432

433+
`buildOptional` and `buildEither` are intentionally not supported due to ergonomic issues and fundamental semantic differences between regex conditionals and result builder conditionals. Please refer to the [alternatives considered](#support-buildoptional-and-buildeither) section for detailed rationale.
434+
521435
### Alternation
522436

523437
Alternations are used to match one of multiple patterns. An alternation wraps its underlying patterns' capture types in an `Optional` and concatenates them together, first to last.
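
As a rough illustration of the optional wrapping described above, here is a minimal sketch. It assumes a Swift 5.7 or later toolchain where the `RegexBuilder` module is available; the name `numberOrWord` is made up for this example and does not appear in the proposal.

```swift
import RegexBuilder

// Each branch captures a `Substring`. In the combined output each capture
// is wrapped in `Optional`, in first-to-last order:
// Output == (Substring, Substring?, Substring?)
let numberOrWord = ChoiceOf {
  Capture { OneOrMore(.digit) }
  Capture { OneOrMore(.word) }
}

// Only the capture of the branch that matched is non-nil.
let digits = "123".wholeMatch(of: numberOrWord)
let letters = "abc".wholeMatch(of: numberOrWord)
```

For `"123"` the first branch matches, so the first capture holds the text and the second is `nil`; for `"abc"` it is the other way around.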
@@ -620,99 +534,6 @@ public enum AlternationBuilder {
620534
// ... `O(arity^2)` overloads of `buildPartialBlock(accumulated:next:)`
621535
}
622536

623-
extension AlternationBuilder {
624-
// The following builder methods implement what would be possible with
625-
// variadic generics (using imaginary syntax) as a single method:
626-
//
627-
// public static func buildEither<
628-
// R, WholeMatch, Capture...
629-
// >(
630-
// first component: Component<R>
631-
// ) -> Regex<(Substring, Component<R>?...)>
632-
// where R.Output == (WholeMatch, Capture...)
633-
634-
@_disfavoredOverload
635-
public static func buildEither<R: RegexComponent>(
636-
first component: Component<R>
637-
) -> Regex<Substring>
638-
639-
public static func buildEither<W, C0, R: RegexComponent>(
640-
first component: Component<R>
641-
) -> Regex<(Substring, C0?)>
642-
643-
public static func buildEither<W, C0, C1, R: RegexComponent>(
644-
first component: Component<R>
645-
) -> Regex<(Substring, C0?, C1?)>
646-
647-
// ... `O(arity)` overloads of `buildEither(_:)`
648-
649-
public static func buildEither<W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9, R: RegexComponent>(
650-
first component: Component<R>
651-
) -> Regex<(Substring, C0?, C1?, C2?, C3?, C4?, C5?, C6?, C7?, C8?, C9?)> where R.Output == (W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9)
652-
}
653-
654-
extension AlternationBuilder {
655-
// The following builder methods implement what would be possible with
656-
// variadic generics (using imaginary syntax) as a single method:
657-
//
658-
// public static func buildEither<
659-
// R, WholeMatch, Capture...
660-
// >(
661-
// second component: Component<R>
662-
// ) -> Regex<(Substring, Capture?...)>
663-
// where R.Output == (WholeMatch, Capture...)
664-
665-
@_disfavoredOverload
666-
public static func buildEither<R: RegexComponent>(
667-
second component: Component<R>
668-
) -> Regex<Substring>
669-
670-
public static func buildEither<W, C0, R: RegexComponent>(
671-
second component: Component<R>
672-
) -> Regex<(Substring, C0?)>
673-
674-
public static func buildEither<W, C0, C1, R: RegexComponent>(
675-
second component: Component<R>
676-
) -> Regex<(Substring, C0?, C1?)>
677-
678-
// ... `O(arity)` overloads of `buildEither(_:)`
679-
680-
public static func buildEither<W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9, R: RegexComponent>(
681-
second component: Component<R>
682-
) -> Regex<(Substring, C0?, C1?, C2?, C3?, C4?, C5?, C6?, C7?, C8?, C9?)> where R.Output == (W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9)
683-
}
684-
685-
extension AlternationBuilder {
686-
// The following builder methods implement what would be possible with
687-
// variadic generics (using imaginary syntax) as a single method:
688-
//
689-
// public static func buildOptional<
690-
// Component, WholeMatch, Capture...
691-
// >(
692-
// _ component: Component?
693-
// ) -> Regex<(Substring, Capture?...)>
694-
// where Component.Output == (WholeMatch, Capture...)
695-
696-
@_disfavoredOverload
697-
public static func buildOptional<Component: RegexComponent>(
698-
_ component: Component?
699-
) -> Regex<Substring>
700-
701-
public static func buildOptional<W, C0, R: RegexComponent>(
702-
_ component: Component<R>?
703-
) -> Regex<(Substring, C0?)>
704-
705-
public static func buildOptional<W, C0, C1, R: RegexComponent>(
706-
_ component: Component<R>?
707-
) -> Regex<(Substring, C0?, C1?)>
708-
709-
// ... `O(arity)` overloads of `buildOptional(_:)`
710-
711-
public static func buildOptional<W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9, R: RegexComponent>(
712-
_ component: Component<R>?
713-
) -> Regex<(Substring, C0?, C1?, C2?, C3?, C4?, C5?, C6?, C7?, C8?, C9?)> where R.Output == (W, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9)
714-
}
715-
716537
extension AlternationBuilder {
717538
// The following builder methods implement what would be possible with
718539
// variadic generics (using imaginary syntax) as a single method:
@@ -1290,6 +1111,53 @@ Regex { wholeSentence in
12901111
}
12911112
```
12921113

1114+
### Scoping
1115+
1116+
In textual regexes, atomic groups (`(?>...)`) can be used to define a backtracking scope. That is, when the regex engine exits from the scope successfully, it throws away all backtracking positions from the scope. In regex builder, the `Local` type serves this purpose.
1117+
1118+
```swift
1119+
public struct Local<Output>: RegexComponent {
1120+
public var regex: Regex<Output>
1121+
1122+
// The following builder methods implement what would be possible with
1123+
// variadic generics (using imaginary syntax) as a single set of methods:
1124+
//
1125+
// public init<WholeMatch, Capture..., Component: RegexComponent>(
1126+
// @RegexComponentBuilder _ component: () -> Component
1127+
// ) where Output == (Substring, Capture...), Component.Output == (WholeMatch, Capture...)
1128+
1129+
@_disfavoredOverload
1130+
public init<Component: RegexComponent>(
1131+
@RegexComponentBuilder _ component: () -> Component
1132+
) where Output == Substring
1133+
1134+
public init<W, C0, Component: RegexComponent>(
1135+
@RegexComponentBuilder _ component: () -> Component
1136+
) where Output == (Substring, C0), Component.Output == (W, C0)
1137+
1138+
public init<W, C0, C1, Component: RegexComponent>(
1139+
@RegexComponentBuilder _ component: () -> Component
1140+
) where Output == (Substring, C0, C1), Component.Output == (W, C0, C1)
1141+
1142+
// ... `O(arity)` overloads
1143+
}
1144+
```
1145+
1146+
For example, the following regex matches the string `abcc` but not `abc`.
1147+
1148+
```swift
1149+
Regex {
1150+
"a"
1151+
Local {
1152+
ChoiceOf {
1153+
"bc"
1154+
"b"
1155+
}
1156+
}
1157+
"c"
1158+
}
1159+
```
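
To make the effect of the backtracking scope concrete, the sketch below compares the regex above with a variant that omits `Local`. It assumes a Swift 5.7 or later toolchain where `RegexBuilder` provides `Local` and `ChoiceOf`; the names `scoped` and `unscoped` are illustrative only.

```swift
import RegexBuilder

// With `Local`: once `"bc"` succeeds inside the scope, the engine discards
// the alternative `"b"` and cannot come back to it.
let scoped = Regex {
  "a"
  Local {
    ChoiceOf {
      "bc"
      "b"
    }
  }
  "c"
}

// Without `Local`: if the trailing `"c"` fails, the engine may backtrack
// into the alternation and try `"b"` instead.
let unscoped = Regex {
  "a"
  ChoiceOf {
    "bc"
    "b"
  }
  "c"
}

let scopedMatchesABCC = "abcc".wholeMatch(of: scoped) != nil
let scopedMatchesABC = "abc".wholeMatch(of: scoped) != nil
let unscopedMatchesABC = "abc".wholeMatch(of: unscoped) != nil
```

On `abc`, the scoped regex consumes `bc` inside `Local`, finds no input left for the final `"c"`, and fails; the unscoped regex recovers by retrying with `"b"`.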
1160+
12931161
## Source compatibility
12941162

12951163
Regex builder will be shipped in a new module named `RegexBuilder`, and thus will not affect the source compatibility of the existing code.
@@ -1306,7 +1174,7 @@ The proposed feature relies heavily upon overloads of `buildBlock` and `buildPar
13061174

13071175
### Operators for quantification and alternation
13081176

1309-
While `ChoiceOf` and quantifier functions provide a general way of creating alternations and quantifications, we recognize that some syntactic sugar can be useful for creating one-liners like in textual regexes, e.g. infix operator `|`, postfix operator `*`, etc.
1177+
While `ChoiceOf` and quantifier types provide a general way of creating alternations and quantifications, we recognize that some syntactic sugar can be useful for creating one-liners like in textual regexes, e.g. infix operator `|`, postfix operator `*`, etc.
13101178

13111179
```swift
13121180
// The following functions implement what would be possible with variadic
@@ -1441,6 +1309,83 @@ One could argue that type such as `OneOrMore<Output>` could be defined as a top-
14411309

14421310
Another reason to use types instead of free functions is consistency with existing result-builder-based DSLs such as SwiftUI.
14431311

1312+
### Support `buildOptional` and `buildEither`
1313+
1314+
To support `if` statements, an earlier iteration of this proposal defined `buildEither(first:)`, `buildEither(second:)` and `buildOptional(_:)` as the following:
1315+
1316+
```swift
1317+
extension RegexComponentBuilder {
1318+
public static func buildEither<
1319+
Component, WholeMatch, Capture...
1320+
>(
1321+
first component: Component
1322+
) -> Regex<(Substring, Capture...)>
1323+
where Component.Output == (WholeMatch, Capture...)
1324+
1325+
public static func buildEither<
1326+
Component, WholeMatch, Capture...
1327+
>(
1328+
second component: Component
1329+
) -> Regex<(Substring, Capture...)>
1330+
where Component.Output == (WholeMatch, Capture...)
1331+
1332+
public static func buildOptional<
1333+
Component, WholeMatch, Capture...
1334+
>(
1335+
_ component: Component?
1336+
) where Component.Output == (WholeMatch, Capture...)
1337+
}
1338+
```
1339+
1340+
However, multiple-branch control flow statements (e.g. `if`-`else` and `switch`) would be required to produce either the same regex type in every branch, which is limiting, or an "either-like" type, which can be difficult to work with when nested. Unlike `ChoiceOf`, producing a tuple of optionals is not an option, because the branch taken would be decided when the builder closure is executed, and it would cause capture numbering to be inconsistent with conventional regex.
1341+
1342+
Moreover, result builder conditionals do not work the same way as regex conditionals. In regex conditionals, the conditions are themselves regexes and are evaluated by the regex engine during matching, whereas result builder conditionals are evaluated as part of the builder closure. We hope that a future result builder feature will support "lifting" control flow conditions into the DSL domain, e.g. supporting `Regex<Bool>` as a condition.
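
The point that a result builder conditional is decided when the builder closure runs can be seen with a toy builder. The sketch below is plain Swift and deliberately unrelated to `RegexComponentBuilder`; `PatternBuilder`, `build`, and `numberPattern` are hypothetical names invented for this illustration, and the "patterns" are just strings.

```swift
// A toy result builder that concatenates strings (illustration only).
@resultBuilder
enum PatternBuilder {
  static func buildBlock(_ parts: String...) -> String { parts.joined() }
  static func buildEither(first part: String) -> String { part }
  static func buildEither(second part: String) -> String { part }
}

func build(@PatternBuilder _ body: () -> String) -> String { body() }

// The `if` below is ordinary Swift control flow: exactly one branch is
// chosen while the closure executes, before any matching could happen.
func numberPattern(hex: Bool) -> String {
  build {
    "#"
    if hex {
      "[0-9a-f]+"
    } else {
      "[0-9]+"
    }
  }
}

let hexPattern = numberPattern(hex: true)
let decimalPattern = numberPattern(hex: false)
```

The built value contains only the branch that was taken, so nothing about the condition survives into the result. A regex conditional, in contrast, would be evaluated by the engine against the input.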
1343+
1344+
### Flatten optionals
1345+
1346+
With the proposed design, `ChoiceOf` with `AlternationBuilder` wraps every component's capture type with an `Optional`. This means that any `ChoiceOf` with optional-capturing components would lead to doubly-nested optional captures. This could make the result of matching harder to use.
1347+
1348+
```swift
1349+
ChoiceOf {
1350+
OneOrMore(Capture(.digit)) // Output == (Substring, Substring)
1351+
Optionally {
1352+
ZeroOrMore(Capture(.word)) // Output == (Substring, Substring?)
1353+
"a"
1354+
} // Output == (Substring, Substring??)
1355+
} // Output == (Substring, Substring?, Substring???)
1356+
```
1357+
1358+
One way to improve this could be overloading quantifier initializers (e.g. `ZeroOrMore.init(_:)`) and `AlternationBuilder.buildPartialBlock` to flatten any optionals upon composition. However, this would be non-trivial. Quantifier initializers would need to be overloaded `O(2^arity)` times to account for all possible positions of `Optional` that may appear in the `Output` tuple. Even worse, `AlternationBuilder.buildPartialBlock` would need to be overloaded `O(arity!)` times to account for all possible combinations of two `Output` tuples with all possible positions of `Optional` that may appear in one of the `Output` tuples.
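
Without builder-level flattening, a caller can still collapse nested optionals where the capture is used. The following is a minimal sketch in plain Swift; the helper `flatten` and the sample values are hypothetical and not part of the proposal. The values stand in for a capture of type `Substring??`.

```swift
// Hypothetical stand-ins for a doubly-optional capture.
let text = "abc"
let matched: Substring?? = .some(.some(text[...]))
let innerMissing: Substring?? = .some(nil)
let outerMissing: Substring?? = nil

// Illustrative helper: collapse one level of optional nesting.
func flatten<T>(_ value: T??) -> T? {
  value.flatMap { $0 }
}

let flatMatched = flatten(matched)
let flatInnerMissing = flatten(innerMissing)
let flatOuterMissing = flatten(outerMissing)
```

This loses the distinction between "outer branch not taken" and "inner capture absent", which is the same trade-off flattening in the builder would make.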
1359+
1360+
### Structured rather than flat captures
1361+
1362+
We propose inferring capture types in such a way as to align with the traditional numbering of backreferences. This is because much of the motivation behind providing regex literals in Swift is their familiarity.
1363+
1364+
If we decided to deprioritize this motivation, there would be opportunities to infer safer, more ergonomic, and arguably more intuitive types for captures. For example, to be consistent with traditional regex backreferences, quantifications of multiple or nested captures have to produce parallel arrays rather than an array of tuples.
1365+
1366+
```swift
1367+
OneOrMore {
1368+
Capture {
1369+
OneOrMore(.hexDigit)
1370+
}
1371+
".."
1372+
Capture {
1373+
OneOrMore(.hexDigit)
1374+
}
1375+
}
1376+
1377+
// Flat capture types:
1378+
// => `Output == (Substring, Substring, Substring)`
1379+
1380+
// Structured capture types:
1381+
// => `Output == (Substring, (Substring, Substring))`
1382+
```
1383+
1384+
Similarly, an alternation of multiple or nested captures could produce a structured alternation type (or an anonymous sum type) rather than flat optionals.
1385+
1386+
This is cool, but it adds extra complexity to regex builder and it isn't as clear because the generic type no longer aligns with the traditional regex backreference numbering. We think the consistency of the flat capture types trumps the added safety and ergonomics of the structured capture types.
1387+
1388+
14441389
[Declarative String Processing]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/DeclarativeStringProcessing.md
14451390
[Strongly Typed Regex Captures]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/StronglyTypedCaptures.md
14461391
[Regex Syntax]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/RegexSyntax.md
