Commit 9d0cf04

Update delimiter proposal
More details and word smithing.
1 parent 8da68d3 commit 9d0cf04

1 file changed, +28 -21 lines changed

Documentation/Evolution/DelimiterSyntax.md

Lines changed: 28 additions & 21 deletions

````diff
@@ -4,11 +4,7 @@
 
 ## Introduction
 
-This proposal helps complete the story told in *[Regex Type and Overview][regex-type]* and [elsewhere][pitch-status]. We propose the introduction of regex literals to Swift source code. The proposed syntax mirrors literals in other programing languages such as Perl, JavaScript and Ruby. As in those languages, literals are delimited with the `/` character:
-
-```swift
-let re = /[0-9]+/
-```
+This proposal helps complete the story told in *[Regex Type and Overview][regex-type]* and [elsewhere][pitch-status]. We propose the introduction of regex literals to Swift source code, providing compile-time checks and typed-capture inference.
 
 ## Motivation
 
````
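
The two benefits named in the rewritten introduction, compile-time checking and typed-capture inference, can be seen in a minimal sketch. It assumes Swift 5.7 or later (on Apple platforms, `Regex` also needs macOS 13, iOS 16, or later) and reuses the pattern from the proposal's Motivation section; the name `hexAssignment` and the input strings are illustrative. The extended `#/.../#` spelling is used so that the sketch does not depend on the opt-in for bare `/.../` literals described under Upgrade path.

```swift
// Assumes Swift 5.7+. The extended literal uses the same regex syntax as a
// bare `/.../` literal but needs no compiler flag.
let hexAssignment = #/(?<identifier>[[:alpha:]]\w*) = (?<hex>[0-9A-F]+)/#
// Inferred type: Regex<(Substring, identifier: Substring, hex: Substring)>

// A malformed pattern, such as one with an unbalanced `(`, would be rejected
// at compile time rather than when the program runs.
if let match = "count = 1F".wholeMatch(of: hexAssignment) {
    // Capture labels come straight from the named groups in the literal.
    print(match.identifier, match.hex)  // count 1F
}
```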
````diff
@@ -37,23 +33,26 @@ let regex = /(?<identifier>[[:alpha:]]\w*) = (?<hex>[0-9A-F]+)/
 // regex: Regex<(Substring, identifier: Substring, hex: Substring)>
 ```
 
-Forward slashes are a regex term of art, and are used as the delimiters for regex literals in Perl, JavaScript and Ruby (though Perl and Ruby also provide alternatives). Their ubiquity and familiarity makes them a compelling choice for Swift.
+Forward slashes are a regex term of art. They are used as the delimiters for regex literals in, e.g., Perl, JavaScript and Ruby. Perl and Ruby additionally allow for [user-selected delimiters](https://perldoc.perl.org/perlop#Quote-and-Quote-like-Operators) to avoid having to escape any slashes inside a regex. For that purpose, we propose the extended literal `#/.../#`.
 
-A regex literal may also be spelled using an extended syntax `#/.../#`, which allows the placement of an arbitrary number of balanced `#` characters around the literal. This syntax may be used to avoid needing to escape forward slashes within the regex. Additionally, it allows for a multi-line mode when the opening delimiter is followed by a new line.
+An extended literal, `#/.../#`, avoids the need to escape forward slashes within the regex. It allows an arbitrary number of balanced `#` characters around the literal. When the opening delimiter is followed by a new line, it supports a multi-line literal where whitespace is non-semantic and line-ending comments are ignored.
 
-Within a regex literal, the compiler will parse the regex syntax outlined in *[Regex Construction][internal-syntax]*, and diagnose any errors at compile time. The capture types and labels are automatically inferred based on the capture groups present in the regex. Using a literal allows editors to support features such as syntax coloring inside the literal, highlighting sub-structure of the regex, and conversion of the literal to an equivalent result builder DSL (see *[Regex builder DSL][regex-dsl]*).
+The compiler will parse the contents of a regex literal using the regex syntax outlined in *[Regex Construction][internal-syntax]*, diagnosing any errors at compile time. The capture types and labels are automatically inferred based on the capture groups present in the regex. Regex literals allow editors and source tools to support features such as syntax coloring inside the literal, highlighting the sub-structure of the regex, and conversion of the literal to an equivalent result builder DSL (see *[Regex builder DSL][regex-dsl]*).
 
 A regex literal also allows for seamless composition with the Regex DSL, enabling lightweight intermixing of a regex syntax with other elements of the builder:
 
 ```swift
-// A regex literal for parsing an amount of currency in dollars or pounds.
+// A regex for extracting a currency (dollars or pounds) and amount from input
+// with precisely the form /[$£]\d+\.\d{2}/
 let regex = Regex {
-  /([$£])/
+  Capture { /[$£]/ }
   TryCapture {
-    OneOrMore(.digit)
+    /\d+/
     "."
-    Repeat(.digit, count: 2)
-  } transform: { Amount(twoDecimalPlaces: $0) }
+    /\d{2}/
+  } transform: {
+    Amount(twoDecimalPlaces: $0)
+  }
 }
 ```
 
````
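
The composition example in the diff calls an `Amount(twoDecimalPlaces:)` initializer that the proposal does not define. A self-contained version of the same builder might look like the sketch below. It assumes Swift 5.7 or later with the `RegexBuilder` module, uses the extended `#/.../#` spelling so no flag is required, and substitutes a hypothetical `Amount` type; `Amount`, `hundredths`, and `currencyAmount` are illustrative names, not part of the proposal.

```swift
import RegexBuilder

// Hypothetical stand-in for the proposal's `Amount` type: it stores the value
// in hundredths (cents or pence), so no floating point is involved.
struct Amount {
    var hundredths: Int

    // Parses text such as "12.34"; returns nil for any other shape, which
    // makes the surrounding TryCapture fail the match.
    init?(twoDecimalPlaces text: Substring) {
        let parts = text.split(separator: ".")
        guard parts.count == 2,
              let whole = Int(parts[0]),
              let fraction = Int(parts[1]) else { return nil }
        hundredths = whole * 100 + fraction
    }
}

// Regex literals intermixed with builder components and a plain string.
let currencyAmount = Regex {
    Capture { #/[$£]/# }
    TryCapture {
        #/\d+/#
        "."
        #/\d{2}/#
    } transform: {
        Amount(twoDecimalPlaces: $0)
    }
}
// Inferred type: Regex<(Substring, Substring, Amount)>

if let match = "£12.34".wholeMatch(of: currencyAmount) {
    let (_, symbol, amount) = match.output
    print(symbol, amount.hundredths)  // £ 1234
}
```

Storing hundredths as an `Int` keeps the sketch exact; a production currency type might prefer `Decimal`.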
````diff
@@ -65,7 +64,7 @@ Due to the existing use of `/` in comment syntax and operators, there are some s
 
 ### Upgrade path
 
-Due to the source breaking changes needed for the `/.../` syntax, it will be introduced in Swift 6 mode. However, projects will be able to adopt it earlier by using the compiler flag `-enable-regex-literals`. Note this does not affect the extended syntax `#/.../#`, which will be usable immediately.
+Due to the source breaking changes needed for the `/.../` syntax, it will be introduced in Swift 6 mode. However, projects will be able to adopt it earlier by using the compiler flag `-enable-regex-literals`. Note this does not affect the extended literal `#/.../#`, which will be usable immediately.
 
 ### Named typed captures
 
````
````diff
@@ -84,18 +83,27 @@ func matchHexAssignment(_ input: String) -> (String, Int)? {
 }
 ```
 
-This allows the captures to be referenced as `match.identifier` and `match.hex` instead of `match.1` and `match.2`, which would be the behavior for unnamed capture groups. This label inference behavior is not available in the DSL, however users are able to [bind captures to named variables instead][dsl-captures].
+This allows the captures to be referenced as `match.identifier` and `match.hex`, in addition to numerically (like unnamed capture groups) as `match.1` and `match.2`. This label inference behavior is not available in the DSL; however, users are able to [bind captures to named variables instead][dsl-captures].
 
 ### Extended delimiters `#/.../#`, `##/.../##`
 
-Backslashes may be used to write forward slashes within the regex literal, e.g `/foo\/bar/`. However, this can be quite syntactically noisy and confusing. To avoid this, a regex literal may be surrounded by an arbitrary number of balanced pound characters. This changes the delimiter of the literal, and therefore allows the use of forward slashes without escaping. For example:
+Backslashes may be used to write forward slashes within the regex literal, e.g. `/foo\/bar/`. However, this can be quite syntactically noisy and confusing. To avoid this, a regex literal may be surrounded by an arbitrary number of balanced octothorpes. This changes the delimiter of the literal, and therefore allows the use of forward slashes without escaping. For example:
 
 ```swift
 let regex = #/usr/lib/modules/([^/]+)/vmlinuz/#
 // regex: Regex<(Substring, Substring)>
 ```
 
-The number of pounds may be further increased to allow the use of e.g `/#` within the literal. This is similar in style to the raw string literal syntax introduced by [SE-0200], however it has a couple of key differences. The escaping rules for backslashes do not change, and a multi-line mode is entered when the opening delimiter is followed by a newline.
+The number of pounds may be further increased to allow the use of, e.g., `/#` within the literal. This is similar in style to the raw string literal syntax introduced by [SE-0200]; however, it has a couple of key differences. The escaping rules for backslashes do not change. Additionally, a multi-line mode, where whitespace and line-ending comments are ignored, is entered when the opening delimiter is followed by a newline.
+
+```swift
+let regex = #/
+  /usr/lib/modules/ # Prefix
+  (?<subpath> [^/]+)
+  /vmlinuz # The kernel
+/#
+// regex: Regex<(Substring, subpath: Substring)>
+```
 
 #### Escaping of backslashes
 
````
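
A sketch of the multi-line mode, and of referring to a named capture both by label and by position, assuming Swift 5.7 or later. The pattern follows the proposal's kernel-path example; `kernelPath` and the sample path are illustrative. The closing delimiter mirrors the opening one, so `#/` is closed by `/#`.

```swift
// Assumes Swift 5.7+. The opening `#/` is followed by a newline, so this is a
// multi-line literal: whitespace is not significant and `#` starts a comment
// that runs to the end of the line. Slashes need no escaping.
let kernelPath = #/
  /usr/lib/modules/   # Fixed prefix
  (?<subpath>[^/]+)   # One path component, captured as subpath
  /vmlinuz            # The kernel image
/#
// Inferred type: Regex<(Substring, subpath: Substring)>

if let match = "/usr/lib/modules/6.1.0/vmlinuz".wholeMatch(of: kernelPath) {
    // A named capture is reachable by its label and by its position.
    print(match.subpath)  // 6.1.0
    print(match.1)        // 6.1.0
}
```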
````diff
@@ -158,11 +166,11 @@ Perhaps the most obvious parsing ambiguity with `/.../` delimiters is with comme
 
 ### Ambiguity with infix operators
 
-There would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required for regex literal interpretation, e.g `x + /y/`. Alternatively, extended syntax may be used, e.g `x+#/y/#`.
+There would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g. `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required for regex literal interpretation, e.g. `x + /y/`. Alternatively, extended literals may be used, e.g. `x+#/y/#`.
 
 ### Regex syntax limitations
 
-In order to help avoid further parsing ambiguities, a `/.../` regex literal will not be parsed if it starts with a space, tab, or `)` character. Though the latter is already invalid regex syntax. This restriction may be avoided by using extended `#/.../#` syntax.
+In order to help avoid further parsing ambiguities, a `/.../` regex literal will not be parsed if it starts with a space, tab, or `)` character, though the latter is already invalid regex syntax. This restriction may be avoided by using the extended `#/.../#` literal.
 
 #### Rationale
 
````
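
The leading-space restriction can be illustrated with a short sketch, assuming Swift 5.7 or later; the name `indentedNumber` is illustrative. Written bare, the pattern `/ (\d+)/` would fall under the rule above and not be parsed as a regex literal, whereas the extended literal accepts it. Outside multi-line mode, the space is matched literally.

```swift
// Assumes Swift 5.7+. The pattern must begin with a space, which a bare
// `/.../` literal does not allow, so the extended `#/.../#` literal is used.
// The opening delimiter is not followed by a newline, so the space counts.
let indentedNumber = #/ (\d+)/#
// Inferred type: Regex<(Substring, Substring)>

if let match = " 42".wholeMatch(of: indentedNumber) {
    print(match.1)  // 42
}
// Without the leading space, the whole-string match fails.
print("42".wholeMatch(of: indentedNumber) == nil)  // true
```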
````diff
@@ -194,7 +202,7 @@ let regex = Regex {
 }
 ```
 
-or extended syntax must be used, e.g:
+or an extended literal must be used, e.g.:
 
 ```swift
 let regex = Regex {
````
````diff
@@ -378,4 +386,3 @@ We therefore feel this would be a much less compelling feature without first cla
 
 [regex-dsl]: https://github.com/apple/swift-evolution/blob/main/proposals/0351-regex-builder.md
 [dsl-captures]: https://github.com/apple/swift-evolution/blob/main/proposals/0351-regex-builder.md#capture-and-reference
-
````