Flesh things out a bit more. Initial bits for the Intro and Motivation. Split out Proposed solution from Detailed design. Parallelize the structure a bit better.
Documentation/Evolution/DelimiterSyntax.md
42 additions & 33 deletions
@@ -1,20 +1,27 @@
1
-
# Regular Expression Literal Delimiters
1
+
# Regex Literal Delimiters
2
2
3
3
- Authors: Hamish Knight, Michael Ilseman, David Ewing
4
4
5
5
## Introduction
6
6
7
-
**TODO**
7
+
This proposal introduces regex literals to Swift source code. The proposed syntax mirrors literals in other programming languages such as Perl, JavaScript, and Ruby. As in those languages, literals are delimited with the `/` character:
8
8
9
-
**TODO: Motivation for regex literals in the first place? Or is that a given?**
9
+
```swift
10
+
let re = /[0-9]+/
11
+
```
12
+
13
+
## Motivation
10
14
11
-
**TODO: Overviewof regex literals in other languages?**
15
+
This proposal helps complete the story told in [Regex Type and Overview][regex-type] and [elsewhere][pitch-status]. Literals are compiled directly, allowing errors to be found at compile time, rather than at run time. Using a literal also allows editors to support features such as syntax coloring inside the literal, highlighting sub-structure of the regex, and conversion of the literal to an equivalent result builder DSL (see [Regex builder DSL][regex-dsl]). It would be difficult to support all of this if regexes could only be defined inside a string.
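To illustrate the run-time half of that comparison, here is a minimal sketch that uses Foundation's `NSRegularExpression` purely as a stand-in for any string-based regex API (it is not part of this proposal). The names `brokenPattern` and `runtimeError` are hypothetical. The compiler sees only a `String`, so the unbalanced parenthesis goes unnoticed until the initializer runs:

```swift
import Foundation

// A deliberately malformed pattern: the opening parenthesis is never closed.
// To the compiler this is just a String, so nothing is diagnosed here.
let brokenPattern = "([0-9]+"

// Assumption: NSRegularExpression stands in for any string-based regex API.
var runtimeError: Error? = nil
do {
    _ = try NSRegularExpression(pattern: brokenPattern)
} catch {
    // The mistake is only discovered when this code actually executes.
    runtimeError = error
}

print(runtimeError == nil ? "pattern accepted" : "pattern rejected at run time")
```

With a literal, the same mistake could instead be reported while compiling the source file.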
12
16
13
-
## Detailed Design
17
+
18
+
## Proposed solution
14
19
15
20
**TODO: Say that this is Swift 6 syntax only, `#/.../#` would be 5.7 syntax**
16
21
17
-
A regular expression literal will be introduced using `/.../` delimiters, within which the compiler will parse a regular expression (the details of which are outlined in [the Regex Syntax pitch][internal-syntax]):
22
+
**TODO: But is it?**
23
+
24
+
A regex literal will be introduced using `/.../` delimiters, within which the compiler will parse a regular expression (the details of which are outlined in [the Regex Syntax pitch][internal-syntax]):
18
25
19
26
```swift
20
27
// Matches "<identifier> = <hexadecimal value>", extracting the identifier and hex number
@@ -25,29 +32,32 @@ Forward slashes are a regex term of art, and are used as the delimiters for rege
25
32
26
33
Due to the existing use of `/` in comment syntax and operators, there are some syntactic ambiguities to consider. While there are quite a few cases to consider, we do not feel that the impact of any individual case is sufficient to disqualify the syntax.
27
34
28
-
**TODO: Anything else we want to say here before segueing into the massive list?**
35
+
## Detailed design
29
36
30
-
### Parsing ambiguities
37
+
Choosing `/` as the regex literal delimiter requires a number of ambiguities to be resolved, and it requires some existing features of the language to be disallowed.
31
38
32
-
The obvious parsing ambiguity with `/.../` delimiters is with comment syntaxes.
39
+
### Ambiguities with comment syntax
33
40
34
-
- An empty regex literal would conflict with line comment syntax `//`. But this isn't a particularly useful thing to express, and can therefore be disallowed without significant impact.
41
+
Perhaps the most obvious parsing ambiguity with `/.../` delimiters is with comment syntax.
35
42
36
-
- The obvious choice for a multi-line regular expression literal would be to use `///` delimiters, in accordance with the precedent set by multi-line string literals `"""`. A different multi-line delimiter would be needed, with no obvious choice.
43
+
- An empty regex literal would conflict with line comment syntax `//`. But an empty regex isn't a particularly useful thing to express, and can be disallowed without significant impact.
37
44
38
45
- There is a conflict with block comment syntax, when surrounding a regex literal ending with `*`, for example:
39
46
40
47
```swift
41
48
/*
42
-
let regex = /x*/
49
+
let regex = /[0-9]*/
43
50
*/
44
51
```
45
52
46
53
In this case, the block comment would prematurely end on the second line, rather than extending all the way to the third line as the user would expect. This is already an issue today with `*/` in a string literal; however, it is much more likely to occur in a regular expression given the prevalence of the `*` quantifier.
47
54
48
55
- Block comment syntax also means that a regex literal would not be able to start with the `*` character. However, this is less of a concern, as a leading `*` would not be valid regex syntax.
49
56
50
-
- Finally, there would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required `x + /y/` for regex literal interpretation.
57
+
58
+
### Ambiguity with infix operators
59
+
60
+
There would be a minor ambiguity with infix operators used with regex literals. When used without whitespace, e.g. `x+/y/`, the expression will be treated as using an infix operator `+/`. Whitespace is therefore required, as in `x + /y/`, for the regex literal interpretation.
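The reason `+/` wins is that it is already a legal operator name in Swift today. The following sketch declares a hypothetical `+/` operator with an arbitrary, made-up meaning, just to show that `x+/y` is lexed as a single infix operator applied to `x` and `y`:

```swift
// Hypothetical operator, declared only to show that `+/` is a legal operator name.
infix operator +/ : AdditionPrecedence

func +/ (lhs: Int, rhs: Int) -> Int {
    // Arbitrary behavior chosen for the illustration; it is neither addition nor division.
    return lhs * 100 + rhs
}

let x = 10
let y = 4

// Without whitespace, `+/` is read as one operator token, so this calls the function above.
let viaCustomOperator = x+/y

print(viaCustomOperator)
```

Because such operators can already exist in user code, the lexer cannot assume that a `/` following `+` begins a regex literal.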
51
61
52
62
### Regex syntax limitations
53
63
@@ -163,12 +173,22 @@ This takes advantage of the fact that a regex literal will not be parsed if the
163
173
164
174
</details>
165
175
166
-
### Editor Considerations
167
176
168
-
**TODO: Rewrite now that `/.../` is the syntax being pitched?**
177
+
## Future Directions
178
+
179
+
### Raw literals
180
+
181
+
The obvious choice for raw regex literals would be to follow raw string literals and use `#/.../#`.
182
+
183
+
### Multi-line literals
184
+
185
+
The obvious choice for a multi-line regex literal would be to use `///` delimiters, in accordance with the precedent set by multi-line string literals `"""`. But this signifies a (documentation) comment, so a different multi-line delimiter would be needed, with no obvious choice. However, it's not clear that we need multi-line regex literals. The existing literals can be used inside a regex builder DSL.
169
186
170
-
As described above, there would be a lot involved in handling the parsing ambiguities with `/.../` delimiters. It's one thing to do this in the compiler. But the language also has to be understood by a plethora of source code editors. Those editors either need encode all those ambiguities, or they need to provide a "best effort" at handling the most common cases. It's all too common for editors to take the "best effort" route. There's a long history of complaints with editors that don't completely support a language's features. And indeed, there's plenty of history of editors that don't correctly support regular expression literals in other languages. By choosing a literal that is easily parsed, we should avoid seeing those complaints regarding Swift.
187
+
### Regex extended syntax
171
188
189
+
Allowing non-semantic whitespace and other features of the extended syntax would be highly desirable, but there is no obvious choice of delimiter for such a literal. Perhaps the need is also lessened by the ability to use regex literals inside the regex builder DSL.
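For readers unfamiliar with extended syntax, here is a rough sketch of what non-semantic whitespace buys. It uses Foundation's `NSRegularExpression` with its `.allowCommentsAndWhitespace` option as a stand-in, since no literal form is being proposed here. The pattern and the names `extendedPattern`, `input`, and `matched` are illustrative assumptions:

```swift
import Foundation

// Extended syntax: whitespace and `#` comments inside the pattern are ignored,
// so the pattern can be laid out and annotated piece by piece.
let extendedPattern = #"""
    ([A-Za-z_]\w*)   # identifier
    \s*=\s*          # equals sign with optional surrounding whitespace
    ([0-9A-F]+)      # hexadecimal value
    """#

// Assumption: NSRegularExpression stands in for a future extended regex literal.
let regex = try! NSRegularExpression(pattern: extendedPattern,
                                     options: [.allowCommentsAndWhitespace])

let input = "count = 1F"
let range = NSRange(input.startIndex..<input.endIndex, in: input)
let matched = regex.firstMatch(in: input, options: [], range: range) != nil

print(matched)
```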
190
+
191
+
## Alternatives Considered
172
192
173
193
### Pound slash `#/.../#`
174
194
@@ -180,12 +200,6 @@ However this option would also have the same block comment issue as `/.../` wher
180
200
181
201
Additionally, this syntax would introduce an inconsistency with raw string literal syntax, as `#/.../#` on its own would not treat backslashes as literal, unlike `#"..."#`. If raw regex syntax were implemented, it would start at `##/.../##`. With raw strings, escape sequences must use the same number of `#`s as the delimiter, e.g. `#"\#n"#` for a newline. For raw regex literals, however, it would be one fewer `#` than the delimiter, e.g. `##/\#n/##`.
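The raw string half of that comparison can be checked in Swift as it exists today. This small sketch (variable names are illustrative) shows that a bare backslash is literal in a raw string, and that an escape needs as many `#`s as the delimiter:

```swift
// In a raw string, a bare backslash is literal: this is "\" followed by "n".
let literalBackslashN = #"\n"#

// An escape sequence must repeat the delimiter's `#`s: this is a single newline.
let newline = #"\#n"#

// The same rule with two `#`s in the delimiter.
let doubleRaw = ##"\##n"##

print(literalBackslashN.count, newline == "\n", doubleRaw == "\n")
```

Under the `#/.../#` option, a hypothetical `##/\#n/##` would break this symmetry by using one fewer `#` in the escape than in the delimiter.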
182
202
183
-
## Future Directions
184
-
185
-
**TODO: What do we want to say here? Talk about raw and multiline? Don't really have a good option for the latter tho**
186
-
187
-
## Alternatives Considered
188
-
189
203
### Prefixed quote `re'...'`
190
204
191
205
We could choose to use `re'...'` delimiters, for example:
@@ -195,17 +209,9 @@ We could choose to use `re'...'` delimiters, for example:
195
209
let regex = re'([[:alpha:]]\w*) = ([0-9A-F]+)'
196
210
```
197
211
198
-
The use of two letter prefix could potentially be used as a namespace for future literal types. However, it is unusual for a Swift literal to be prefixed in this way.
199
-
200
-
**TODO: Any other reasons why not to pick this?**
201
-
202
-
**TODO: Mention that it nicely extends to raw and multiline?**
203
-
204
-
#### Regex syntax limitations
205
-
206
-
There are a few items of regex grammar that use the single quote character as a metacharacter. These include named group definitions and references such as `(?'name')`, `(?('name'))`, `\g'name'`, `\k'name'`, as well as callout syntax `(?C'arg')`. The use of a single quote conflicts with the `re'...'` delimiter as it will be considered the end of the literal. Fortunately, alternative syntax exists for all of these constructs, e.g `(?<name>)`, `\k<name>`, and `(?C"arg")`.
212
+
The use of a two-letter prefix could potentially be used as a namespace for future literal types. It would also have obvious extensions to raw and multi-line literals using `re#'...'#` and `re'''...'''` respectively. However, it is unusual for a Swift literal to be prefixed in this way. We also feel that its similarity to a string literal might lead users to confuse it with a raw string literal.
207
213
208
-
As such, the single quote variants of the syntax would be considered invalid in a `re'...'` literal, and users must use the alternative syntax instead. If a raw variant of the syntax `re#'...'#` of the syntax is later added, that may also be used. In order to improve diagnostic behavior, the compiler would attempt to scan ahead when encountering the ending sequences `(?`, `(?(`, `\g`, `\k`and `(?C`. This would enable a more accurate error to be emitted that suggests the alternative syntax.
214
+
Also, there are a few items of regex grammar that use the single quote character as a metacharacter. These include named group definitions and references such as `(?'name')`, `(?('name'))`, `\g'name'`, `\k'name'`, as well as callout syntax `(?C'arg')`. The use of a single quote conflicts with the `re'...'` delimiter, as it would be considered the end of the literal. However, alternative syntax exists for all of these constructs, e.g. `(?<name>)`, `\k<name>`, and `(?C"arg")`. Those could be required instead. If a raw regex literal were later added, the single quote syntax could also be used within it.
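As a sketch of the angle-bracket alternatives in use, the following uses Foundation's `NSRegularExpression` as a stand-in engine (an assumption for illustration, not the engine this pitch describes). It defines a named group with `(?<word>...)` and refers back to it with `\k<word>`, so no single quote appears in the pattern. The helper `matches` is hypothetical:

```swift
import Foundation

// A named group `word`, followed by a space and a back-reference to the same group.
// Neither construct needs a single quote, so neither would clash with `re'...'`.
let pattern = #"(?<word>\w+) \k<word>"#
let regex = try! NSRegularExpression(pattern: pattern)

// Hypothetical helper: reports whether the pattern matches anywhere in `text`.
func matches(_ text: String) -> Bool {
    let range = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

let repeated = matches("hello hello")   // a word immediately repeated
let distinct = matches("hello world")   // no repeated word

print(repeated, distinct)
```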
- We would not be able to easily apply custom syntax highlighting for the regex syntax.
254
+
- We would not be able to easily apply custom syntax highlighting and other editor features for the regex syntax.
249
255
- It would require an `ExpressibleByRegexLiteral` contextual type to be treated as a regex, otherwise it would be defaulted to `String`, which may be undesired.
250
256
- In an overloaded context it may be ambiguous or unclear whether a string literal is meant to be interpreted as a literal string or regex.
251
257
- Regex-specific escape sequences such as `\w` would likely require the use of raw string syntax `#"..."#`, as they are otherwise invalid in a string literal.
@@ -258,3 +264,6 @@ Instead of adding a custom regex literal, we could require users to explicitly w