Documentation/Evolution/DelimiterSyntax.md
Lines changed: 28 additions & 21 deletions
@@ -4,11 +4,7 @@
 
 ## Introduction
 
-This proposal helps complete the story told in *[Regex Type and Overview][regex-type]* and [elsewhere][pitch-status]. We propose the introduction of regex literals to Swift source code. The proposed syntax mirrors literals in other programming languages such as Perl, JavaScript and Ruby. As in those languages, literals are delimited with the `/` character:
-
-```swift
-let re = /[0-9]+/
-```
+This proposal helps complete the story told in *[Regex Type and Overview][regex-type]* and [elsewhere][pitch-status]. We propose the introduction of regex literals to Swift source code, providing compile-time checks and typed-capture inference.
 
 ## Motivation
 
@@ -37,23 +33,26 @@ let regex = /(?<identifier>[[:alpha:]]\w*) = (?<hex>[0-9A-F]+)/
-Forward slashes are a regex term of art, and are used as the delimiters for regex literals in Perl, JavaScript and Ruby (though Perl and Ruby also provide alternatives). Their ubiquity and familiarity makes them a compelling choice for Swift.
+Forward slashes are a regex term of art. They are used as the delimiters for regex literals in, e.g., Perl, JavaScript and Ruby. Perl and Ruby additionally allow for [user-selected delimiters](https://perldoc.perl.org/perlop#Quote-and-Quote-like-Operators) to avoid having to escape any slashes inside a regex. For that purpose, we propose the extended literal `#/.../#`.
 
-A regex literal may also be spelled using an extended syntax `#/.../#`, which allows the placement of an arbitrary number of balanced `#` characters around the literal. This syntax may be used to avoid needing to escape forward slashes within the regex. Additionally, it allows for a multi-line mode when the opening delimiter is followed by a new line.
+An extended literal, `#/.../#`, avoids the need to escape forward slashes within the regex. It allows an arbitrary number of balanced `#` characters around the literal and escape. When the opening delimiter is followed by a new line, it supports a multi-line literal where whitespace is non-semantic and line-ending comments are ignored.
 
-Within a regex literal, the compiler will parse the regex syntax outlined in *[Regex Construction][internal-syntax]*, and diagnose any errors at compile time. The capture types and labels are automatically inferred based on the capture groups present in the regex. Using a literal allows editors to support features such as syntax coloring inside the literal, highlighting sub-structure of the regex, and conversion of the literal to an equivalent result builder DSL (see *[Regex builder DSL][regex-dsl]*).
+The compiler will parse the contents of a regex literal using regex syntax outlined in *[Regex Construction][internal-syntax]*, diagnosing any errors at compile time. The capture types and labels are automatically inferred based on the capture groups present in the regex. Regex literals allow editors and source tools to support features such as syntax coloring inside the literal, highlighting sub-structure of the regex, and conversion of the literal to an equivalent result builder DSL (see *[Regex builder DSL][regex-dsl]*).
 
 A regex literal also allows for seamless composition with the Regex DSL, enabling lightweight intermixing of a regex syntax with other elements of the builder:
 
 ```swift
-// A regex literal for parsing an amount of currency in dollars or pounds.
+// A regex for extracting a currency (dollars or pounds) and amount from input
+// with precisely the form /[$£]\d+\.\d{2}/
 let regex = Regex {
-  /([$£])/
+  Capture { /[$£]/ }
   TryCapture {
-    OneOrMore(.digit)
+    /\d+/
     "."
-    Repeat(.digit, count: 2)
-  } transform: { Amount(twoDecimalPlaces: $0) }
+    /\d{2}/
+  } transform: {
+    Amount(twoDecimalPlaces: $0)
+  }
 }
 ```
 
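Note on the hunk above: the mixed literal-and-builder form can be tried in isolation. The following is a minimal sketch, assuming Swift 5.7 or later with the `RegexBuilder` module available. It uses the extended `#/.../#` literal so that no compiler flag is required, and it substitutes `Double` for `Amount(twoDecimalPlaces:)`, which is not defined anywhere in this diff. The names `priceRegex` and `price` are hypothetical.

```swift
import RegexBuilder

// Assumption: `Double` stands in for the proposal's `Amount` type,
// which this diff does not define.
let priceRegex = Regex {
  // Extended regex literals are regex components, so they can sit
  // directly inside builder blocks.
  Capture { #/[$£]/# }
  TryCapture {
    #/\d+/#
    "."
    #/\d{2}/#
  } transform: { Double($0) }  // returning nil would make the match fail
}

// Inferred output type: (Substring, Substring, Double)
let price = "£12.50".wholeMatch(of: priceRegex)
if let price {
  print(price.1, price.2)
}
```

The literals carry the terse character-level syntax while the builder carries the capture and transform structure, which is the intermixing the revised example in the hunk is meant to show.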
@@ -65,7 +64,7 @@ Due to the existing use of `/` in comment syntax and operators, there are some s
 
 ### Upgrade path
 
-Due to the source breaking changes needed for the `/.../` syntax, it will be introduced in Swift 6 mode. However, projects will be able to adopt it earlier by using the compiler flag `-enable-regex-literals`. Note this does not affect the extended syntax `#/.../#`, which will be usable immediately.
+Due to the source breaking changes needed for the `/.../` syntax, it will be introduced in Swift 6 mode. However, projects will be able to adopt it earlier by using the compiler flag `-enable-regex-literals`. Note this does not affect the extended literal `#/.../#`, which will be usable immediately.
-This allows the captures to be referenced as `match.identifier` and `match.hex` instead of `match.1` and `match.2`, which would be the behavior for unnamed capture groups. This label inference behavior is not available in the DSL, however users are able to [bind captures to named variables instead][dsl-captures].
+This allows the captures to be referenced as `match.identifier` and `match.hex`, in addition to numerically (like unnamed capture groups) as `match.1` and `match.2`. This label inference behavior is not available in the DSL, however users are able to [bind captures to named variables instead][dsl-captures].
 
 ### Extended delimiters `#/.../#`, `##/.../##`
 
-Backslashes may be used to write forward slashes within the regex literal, e.g `/foo\/bar/`. However, this can be quite syntactically noisy and confusing. To avoid this, a regex literal may be surrounded by an arbitrary number of balanced pound characters. This changes the delimiter of the literal, and therefore allows the use of forward slashes without escaping. For example:
+Backslashes may be used to write forward slashes within the regex literal, e.g `/foo\/bar/`. However, this can be quite syntactically noisy and confusing. To avoid this, a regex literal may be surrounded by an arbitrary number of balanced octothorpes. This changes the delimiter of the literal, and therefore allows the use of forward slashes without escaping. For example:
 
 ```swift
 let regex = #/usr/lib/modules/([^/]+)/vmlinuz/#
 // regex: Regex<(Substring, Substring)>
 ```
 
-The number of pounds may be further increased to allow the use of e.g `/#` within the literal. This is similar in style to the raw string literal syntax introduced by [SE-0200], however it has a couple of key differences. The escaping rules for backslashes do not change, and a multi-line mode is entered when the opening delimiter is followed by a newline.
+The number of pounds may be further increased to allow the use of e.g `/#` within the literal. This is similar in style to the raw string literal syntax introduced by [SE-0200], however it has a couple of key differences. The escaping rules for backslashes do not change. Additionally, a multi-line mode, where whitespace and line-ending comments are ignored, is entered when the opening delimiter is followed by a newline.
+
+```swift
+let regex = #/
+  /usr/lib/modules/ # Prefix
+  (?<subpath>[^/]+)
+  /vmlinuz # The kernel
+/#
+// regex: Regex<(Substring, subpath: Substring)>
+```
 
 #### Escaping of backslashes
 
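Note on the hunk above: the statement that extended delimiters leave backslash escaping unchanged, unlike raw strings, can be illustrated with a small sketch. It assumes Swift 5.7 or later; the names `rawString`, `dateRegex` and `date` are illustrative and do not come from the proposal.

```swift
// Raw string literal (SE-0200): the backslash is an ordinary character,
// so this string holds the three characters `\`, `d` and `+`.
let rawString = #"\d+"#
print(rawString.count)

// Extended regex literal: `\d` is still the digit character class, and
// the `/` between the two groups needs no escaping. The plain-delimiter
// spelling of the same regex would be /(\d+)\/(\d+)/.
let dateRegex = #/(\d+)/(\d+)/#

// Inferred output type: (Substring, Substring, Substring)
let date = "12/25".wholeMatch(of: dateRegex)
if let date {
  print(date.1, date.2)
}
```

Only the delimiter changes between the plain and extended forms; the regex syntax inside the literal is interpreted the same way in both.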
@@ -158,11 +166,11 @@ Perhaps the most obvious parsing ambiguity with `/.../` delimiters is with comme
 
 ### Ambiguity with infix operators
 
-There would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required for regex literal interpretation, e.g `x + /y/`. Alternatively, extended syntax may be used, e.g `x+#/y/#`.
+There would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required for regex literal interpretation, e.g `x + /y/`. Alternatively, extended literals may be used, e.g `x+#/y/#`.
 
 ### Regex syntax limitations
 
-In order to help avoid further parsing ambiguities, a `/.../` regex literal will not be parsed if it starts with a space, tab, or `)` character. Though the latter is already invalid regex syntax. This restriction may be avoided by using extended `#/.../#` syntax.
+In order to help avoid further parsing ambiguities, a `/.../` regex literal will not be parsed if it starts with a space, tab, or `)` character. Though the latter is already invalid regex syntax. This restriction may be avoided by using the extended `#/.../#` literal.
 
 #### Rationale
 
@@ -194,7 +202,7 @@ let regex = Regex {
 }
 ```
 
-or extended syntax must be used, e.g:
+or an extended literal must be used, e.g:
 
 ```swift
 let regex = Regex {
@@ -378,4 +386,3 @@ We therefore feel this would be a much less compelling feature without first cla