Commit 298092c

Flesh things out a bit more. Initial bits for the Intro and Motivation. Split out Proposed solution from Detailed design. Parallelize the structure a bit better.
1 parent c0e3bef commit 298092c

1 file changed

+42
-33
lines changed

Documentation/Evolution/DelimiterSyntax.md

Lines changed: 42 additions & 33 deletions
@@ -1,20 +1,27 @@
1-
# Regular Expression Literal Delimiters
1+
# Regex Literal Delimiters
22

33
- Authors: Hamish Knight, Michael Ilseman, David Ewing
44

55
## Introduction
66

7-
**TODO**
7+
This proposal introduces regex literals to Swift source code. The proposed syntax mirrors literals in other programming languages such as Perl, JavaScript, and Ruby. As in those languages, literals are delimited with the `/` character:
88

9-
**TODO: Motivation for regex literals in the first place? Or is that a given?**
9+
```swift
10+
let re = /[0-9]+/
11+
```
12+
13+
## Motivation
1014

11-
**TODO: Overview of regex literals in other languages?**
15+
This proposal helps complete the story told in [Regex Type and Overview][regex-type] and [elsewhere][pitch-status]. Literals are compiled directly, allowing errors to be found at compile time, rather than at run time. Using a literal also allows editors to support features such as syntax coloring inside the literal, highlighting sub-structure of the regex, and conversion of the literal to an equivalent result builder DSL (see [Regex builder DSL][regex-dsl]). It would be difficult to support all of this if regexes could only be defined inside a string.
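
As a rough illustration of the run-time side of that trade-off, the sketch below builds regexes from strings. It assumes the throwing `Regex(_:)` string initializer that later shipped in Swift 5.7, which is not defined by this pitch; the helper name `compiles` is hypothetical.

```swift
// Sketch only: assumes the throwing `Regex(_:)` string initializer from
// Swift 5.7. On Apple platforms this needs a deployment target that
// includes the Swift 5.7 runtime.
// `compiles` is a hypothetical helper, not part of any proposal.
func compiles(_ pattern: String) -> Bool {
    do {
        // The pattern is parsed here, while the program is running.
        let _: Regex<AnyRegexOutput> = try Regex(pattern)
        return true
    } catch {
        return false
    }
}

let wellFormed = compiles("[0-9]+")  // a valid pattern
let malformed = compiles("[0-9+")    // missing `]`, rejected only when this line runs
```

With a literal, the malformed pattern would instead be reported by the compiler before the program ever ran.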
1216

13-
## Detailed Design
17+
18+
## Proposed solution
1419

1520
**TODO: Say that this is Swift 6 syntax only, `#/.../#` would be 5.7 syntax**
1621

17-
A regular expression literal will be introduced using `/.../` delimiters, within which the compiler will parse a regular expression (the details of which are outlined in [the Regex Syntax pitch][internal-syntax]):
22+
**TODO: But is it?**
23+
24+
A regex literal will be introduced using `/.../` delimiters, within which the compiler will parse a regular expression (the details of which are outlined in [the Regex Syntax pitch][internal-syntax]):
1825

1926
```swift
2027
// Matches "<identifier> = <hexadecimal value>", extracting the identifier and hex number
@@ -25,29 +32,32 @@ Forward slashes are a regex term of art, and are used as the delimiters for rege
2532

2633
Due to the existing use of `/` in comment syntax and operators, there are some syntactic ambiguities to consider. While there are quite a few cases to consider, we do not feel that the impact of any individual case is sufficient to disqualify the syntax.
2734

28-
**TODO: Anything else we want to say here before segueing into the massive list?**
35+
## Detailed design
2936

30-
### Parsing ambiguities
37+
Choosing `/` as the regex literal delimiter requires a number of ambiguities to be resolved. It also requires some existing features of the language to be disallowed.
3138

32-
The obvious parsing ambiguity with `/.../` delimiters is with comment syntaxes.
39+
### Ambiguities with comment syntax
3340

34-
- An empty regex literal would conflict with line comment syntax `//`. But this isn't a particularly useful thing to express, and can therefore be disallowed without significant impact.
41+
Perhaps the most obvious parsing ambiguity with `/.../` delimiters is with comment syntax.
3542

36-
- The obvious choice for a multi-line regular expression literal would be to use `///` delimiters, in accordance with the precedent set by multi-line string literals `"""`. A different multi-line delimiter would be needed, with no obvious choice.
43+
- An empty regex literal would conflict with line comment syntax `//`. But an empty regex isn't a particularly useful thing to express, and can be disallowed without significant impact.
3744

3845
- There is a conflict with block comment syntax, when surrounding a regex literal ending with `*`, for example:
3946

4047
```swift
4148
/*
42-
let regex = /x*/
49+
let regex = /[0-9]*/
4350
*/
4451
```
4552

4653
In this case, the block comment would prematurely end on the second line, rather than extending all the way to the third line as the user would expect. This is already an issue today with `*/` in a string literal; however, it is much more likely to occur in a regular expression given the prevalence of the `*` quantifier.
4754

4855
- Block comment syntax also means that a regex literal would not be able to start with the `*` character; however, this is less of a concern, as it would not be valid regex syntax.
4956

50-
- Finally, there would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required `x + /y/` for regex literal interpretation.
57+
58+
### Ambiguity with infix operators
59+
60+
There would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g. `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required, as in `x + /y/`, for the expression to be interpreted as containing a regex literal.
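
The operator-lexing behavior behind this can be seen in today's Swift without any regex support. The following is a minimal sketch; the `+/` operator and its semantics are hypothetical and declared only for illustration.

```swift
// Hypothetical operator, declared only to show how `+/` lexes as one token.
infix operator +/ : AdditionPrecedence

func +/ (lhs: Int, rhs: Int) -> Int {
    // Arbitrary illustrative semantics: add, then halve.
    return (lhs + rhs) / 2
}

let x = 6
let y = 2

// Without whitespace, the lexer forms the single operator token `+/`.
let fused = x+/y

// With whitespace, `+` and `/` are lexed as separate tokens.
let spaced = x + y / 2
```

Because adjacent operator characters are merged into one token, a `/` that directly follows `+` can never begin a regex literal, which is why the whitespace is needed.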
5161

5262
### Regex syntax limitations
5363

@@ -163,12 +173,22 @@ This takes advantage of the fact that a regex literal will not be parsed if the
163173

164174
</details>
165175

166-
### Editor Considerations
167176

168-
**TODO: Rewrite now that `/.../` is the syntax being pitched?**
177+
## Future Directions
178+
179+
### Raw literals
180+
181+
The obvious choice here would follow string literals and use `#/.../#`.
182+
183+
### Multi-line literals
184+
185+
The obvious choice for a multi-line regex literal would be to use `///` delimiters, in accordance with the precedent set by multi-line string literals `"""`. But this signifies a (documentation) comment, so a different multi-line delimiter would be needed, with no obvious choice. However, it's not clear that we need multi-line regex literals. The existing literals can be used inside a regex builder DSL.
169186

170-
As described above, there would be a lot involved in handling the parsing ambiguities with `/.../` delimiters. It's one thing to do this in the compiler. But the language also has to be understood by a plethora of source code editors. Those editors either need encode all those ambiguities, or they need to provide a "best effort" at handling the most common cases. It's all too common for editors to take the "best effort" route. There's a long history of complaints with editors that don't completely support a language's features. And indeed, there's plenty of history of editors that don't correctly support regular expression literals in other languages. By choosing a literal that is easily parsed, we should avoid seeing those complaints regarding Swift.
187+
### Regex extended syntax
171188

189+
Allowing non-semantic whitespace and other features of the extended syntax would be highly desirable, but there is no obvious choice of delimiter for such a literal. Perhaps the need is also lessened by the ability to use regex literals inside the regex builder DSL.
190+
191+
## Alternatives Considered
172192

173193
### Pound slash `#/.../#`
174194

@@ -180,12 +200,6 @@ However this option would also have the same block comment issue as `/.../` wher
180200

181201
Additionally, introducing this syntax would introduce an inconsistency with raw string literal syntax, as `#/.../#` on its own would not treat backslashes as literal, unlike `#"..."#`. If raw regex syntax were implemented, it would start at `##/.../##`. With raw strings, escape sequences must use the same number of `#`s as the delimiter, e.g. `#"\#n"#` for a newline. However, for raw regex literals it would be one fewer `#` than the delimiter, e.g. `##/\#n/##`.
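
For reference, the existing raw string rule that this would be inconsistent with can be checked directly. This short sketch uses only string literals, so it does not depend on any regex syntax; the constant names are illustrative.

```swift
// In a raw string with one `#`, a bare backslash is literal text.
let literalBackslash = #"\n"#   // two characters: `\` and `n`

// An escape must repeat the delimiter's `#` count after the backslash.
let rawNewline = #"\#n"#        // one character: a newline

// With two `#`s, the escape needs two `#`s as well.
let doubleRawNewline = ##"\##n"##
```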
182202

183-
## Future Directions
184-
185-
**TODO: What do we want to say here? Talk about raw and multiline? Don't really have a good option for the latter tho**
186-
187-
## Alternatives Considered
188-
189203
### Prefixed quote `re'...'`
190204

191205
We could choose to use `re'...'` delimiters, for example:
@@ -195,17 +209,9 @@ We could choose to use `re'...'` delimiters, for example:
195209
let regex = re'([[:alpha:]]\w*) = ([0-9A-F]+)'
196210
```
197211

198-
The use of two letter prefix could potentially be used as a namespace for future literal types. However, it is unusual for a Swift literal to be prefixed in this way.
199-
200-
**TODO: Any other reasons why not to pick this?**
201-
202-
**TODO: Mention that it nicely extends to raw and multiline?**
203-
204-
#### Regex syntax limitations
205-
206-
There are a few items of regex grammar that use the single quote character as a metacharacter. These include named group definitions and references such as `(?'name')`, `(?('name'))`, `\g'name'`, `\k'name'`, as well as callout syntax `(?C'arg')`. The use of a single quote conflicts with the `re'...'` delimiter as it will be considered the end of the literal. Fortunately, alternative syntax exists for all of these constructs, e.g `(?<name>)`, `\k<name>`, and `(?C"arg")`.
212+
The use of a two-letter prefix could potentially be used as a namespace for future literal types. It would also have obvious extensions to raw and multi-line literals using `re#'...'#` and `re'''...'''` respectively. However, it is unusual for a Swift literal to be prefixed in this way. We also feel that its similarity to a string literal might lead users to confuse it with a raw string literal.
207213

208-
As such, the single quote variants of the syntax would be considered invalid in a `re'...'` literal, and users must use the alternative syntax instead. If a raw variant of the syntax `re#'...'#` of the syntax is later added, that may also be used. In order to improve diagnostic behavior, the compiler would attempt to scan ahead when encountering the ending sequences `(?`, `(?(`, `\g`, `\k` and `(?C`. This would enable a more accurate error to be emitted that suggests the alternative syntax.
214+
Also, there are a few items of regex grammar that use the single quote character as a metacharacter. These include named group definitions and references such as `(?'name')`, `(?('name'))`, `\g'name'`, `\k'name'`, as well as callout syntax `(?C'arg')`. The use of a single quote conflicts with the `re'...'` delimiter, as it will be considered the end of the literal. However, alternative syntax exists for all of these constructs, e.g. `(?<name>)`, `\k<name>`, and `(?C"arg")`. Those could be required instead. If a raw regex literal were later added, the single quote syntax could also be used within it.
209215

210216
### Prefixed double quote `re"...."`
211217

@@ -245,7 +251,7 @@ let regex: Regex = #"([[:alpha:]]\w*) = ([0-9A-F]+)"#
245251

246252
However we decided against this because:
247253

248-
- We would not be able to easily apply custom syntax highlighting for the regex syntax.
254+
- We would not be able to easily apply custom syntax highlighting and other editor features for the regex syntax.
249255
- It would require an `ExpressibleByRegexLiteral` contextual type to be treated as a regex, otherwise it would be defaulted to `String`, which may be undesired.
250256
- In an overloaded context it may be ambiguous or unclear whether a string literal is meant to be interpreted as a literal string or regex.
251257
- Regex-specific escape sequences such as `\w` would likely require the use of raw string syntax `#"..."#`, as they are otherwise invalid in a string literal.
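
Two of these points can be observed with plain string literals in current Swift. The sketch below is illustrative only and does not use the hypothetical `ExpressibleByRegexLiteral` protocol; the constant names are assumptions.

```swift
// With no contextual type, a string literal defaults to `String`.
let inferred = #"([[:alpha:]]\w*) = ([0-9A-F]+)"#
let defaultsToString = type(of: inferred) == String.self

// `\w` is not a valid escape in an ordinary string literal, so the
// backslash has to be doubled unless raw string syntax is used.
let escaped = "\\w+"
let raw = #"\w+"#
let sameContents = escaped == raw
```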
@@ -258,3 +264,6 @@ Instead of adding a custom regex literal, we could require users to explicitly w
258264
[SE-0168]: https://github.com/apple/swift-evolution/blob/main/proposals/0168-multi-line-string-literals.md
259265
[SE-0200]: https://github.com/apple/swift-evolution/blob/main/proposals/0200-raw-string-escaping.md
260266
[internal-syntax]: https://forums.swift.org/t/pitch-regex-syntax/55711
267+
[regex-type]: https://forums.swift.org/t/pitch-regex-type-and-overview/56029
268+
[pitch-status]: https://github.com/apple/swift-experimental-string-processing/issues/107
269+
[regex-dsl]: https://forums.swift.org/t/pitch-regex-builder-dsl/56007
