Commit 141faf8

milseman, JacobHearst, and stephentyrone authored
[Pitch] Regex Lookbehind Assertions (#2525)
* Add regex reverse matching proposal
* Resolve Regex builder section TODO (#5)
* Resolve Regex builder section TODO
* List APIs
* Small spelling, documentation fixups
* Lookbehind assertions only
* Adjust proposal name
* Update and rename nnnn-regex-lookbehind-assertions.md to 0448-regex-lookbehind-assertions.md

  Prepare 0448: regex lookbehind for review.
* Update 0448-regex-lookbehind-assertions.md

---------

Co-authored-by: Jacob Hearst <[email protected]>
Co-authored-by: Stephen Canon <[email protected]>
1 parent d780651 commit 141faf8

1 file changed: 132 additions, 0 deletions

@@ -0,0 +1,132 @@
# Regex lookbehind assertions

* Proposal: [SE-0448](0448-regex-lookbehind-assertions.md)
* Authors: [Jacob Hearst](https://github.com/JacobHearst), [Michael Ilseman](https://github.com/milseman)
* Review Manager: [Steve Canon](https://github.com/stephentyrone)
* Status: **Active review (September 17...October 1, 2024)**
* Implementation: https://github.com/swiftlang/swift-experimental-string-processing/pull/760
* Review: ([pitch](https://github.com/swiftlang/swift-evolution/pull/2525))([review])

## Introduction

Regex supports lookahead assertions but does not currently support lookbehind assertions. We propose adding them.

## Motivation

Modern regular expression engines support lookbehind assertions, whether fixed-length (Perl, PCRE2, Python, Java) or arbitrary-length (.NET, JavaScript).

## Proposed solution

We propose supporting arbitrary-length lookbehind assertions, which can be implemented by performing matching in reverse.

Like lookahead assertions, lookbehind assertions are _zero-width_, meaning they do not affect the current match position.

Examples:

```swift
"abc".firstMatch(of: /a(?<=a)bc/) // matches "abc"
"abc".firstMatch(of: /a(?<=b)c/) // no match
"abc".firstMatch(of: /a(?<=.)./) // matches "ab"
"abc".firstMatch(of: /ab(?<=a)c/) // no match
"abc".firstMatch(of: /ab(?<=.a)c/) // no match
"abc".firstMatch(of: /ab(?<=a.)c/) // matches "abc"
```
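
To make the last three examples concrete, the following is a minimal toy model of matching in reverse. It is only a sketch and not the engine's implementation: `LookbehindElement` and `lookbehindMatches` are hypothetical names introduced here, and the model handles only literals and `.` with no quantifiers.

```swift
// Hypothetical toy model; not the Regex engine's implementation.
enum LookbehindElement {
    case literal(Character)   // matches exactly this character
    case any                  // matches any single character, like `.`
}

/// Returns true if `pattern` matches the text that ends at `position`.
/// Elements are consumed last-to-first while stepping backwards through `input`.
func lookbehindMatches(
    _ pattern: [LookbehindElement],
    in input: String,
    endingAt position: String.Index
) -> Bool {
    var index = position
    for element in pattern.reversed() {
        // No character to the left: the assertion fails.
        guard index > input.startIndex else { return false }
        index = input.index(before: index)
        if case .literal(let expected) = element, input[index] != expected {
            return false
        }
    }
    // `position` itself is never modified, mirroring the zero-width behavior.
    return true
}

let sample = "abc"
// The position between "b" and "c", i.e. where the engine stands after matching "ab".
let afterAB = sample.index(sample.startIndex, offsetBy: 2)

// Like (?<=a): the character before the position is "b", so this is false.
let matchesA = lookbehindMatches([.literal("a")], in: sample, endingAt: afterAB)
// Like (?<=.a): false for the same reason.
let matchesDotA = lookbehindMatches([.any, .literal("a")], in: sample, endingAt: afterAB)
// Like (?<=a.): "ab" ends at the position, so this is true.
let matchesADot = lookbehindMatches([.literal("a"), .any], in: sample, endingAt: afterAB)
```

Because the model walks leftwards from the current position, it never needs to know the pattern's length in advance, which is the property that lets reverse matching extend to arbitrary-length lookbehinds.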

Lookbehind assertions run in reverse, i.e. right-to-left, so the right-most eager quantification gets the first opportunity to consume input, ahead of those to its left. This does not affect whether an input matches, but it can affect the values of captures inside a lookbehind assertion:

```swift
"abcdefg".wholeMatch(of: /(.+)(.+)/)
// Produces ("abcdefg", "abcdef", "g")

"abcdefg".wholeMatch(of: /.*(?<=(.+)(.+))/)
// Produces ("abcdefg", "a", "bcdefg")
```
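
The comparison can also be tried with regexes constructed at run time, which is one way to experiment on a toolchain that may or may not include this proposal. The sketch below makes two assumptions: `wholeMatchCaptures` is a hypothetical helper, not proposed API, and an engine without lookbehind support is assumed to reject the pattern by throwing from `Regex.init(_:)`.

```swift
/// Hypothetical helper: returns the whole match followed by each capture,
/// or nil if the pattern is rejected or does not match the entire input.
func wholeMatchCaptures(of pattern: String, in input: String) -> [String]? {
    // Run-time construction throws for patterns the engine cannot run.
    guard let regex = try? Regex(pattern),
          let match = input.wholeMatch(of: regex) else { return nil }
    // Element 0 is the whole match; the rest are the captures in order.
    return match.output.map { element in String(element.substring ?? "") }
}

// Forward matching: the left-most (.+) consumes first.
let forwardCaptures = wholeMatchCaptures(of: "(.+)(.+)", in: "abcdefg")
// ["abcdefg", "abcdef", "g"]

// Inside a lookbehind the right-most (.+) consumes first.
// This is nil where lookbehind is unsupported; with this proposal,
// the text above gives ("abcdefg", "a", "bcdefg").
let reverseCaptures = wholeMatchCaptures(of: ".*(?<=(.+)(.+))", in: "abcdefg")
```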

## Detailed design

### Syntax

Lookbehind assertion syntax is already supported in the existing [Regex syntax](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0355-regex-syntax-run-time-construction.md#lookahead-and-lookbehind).

The engine cannot currently run them, so a regex that contains one is rejected with a compilation error:

```swift
let regex = /(?<=a)b/
// error: Cannot parse regular expression: lookbehind is not currently supported
```

With this proposal, this restriction is lifted and the following syntactic forms will be accepted:

```swift
// Positive lookbehind
/a(?<=b)c/
/a(*plb:b)c/
/a(*positive_lookbehind:b)c/

// Negative lookbehind
/a(?<!b)c/
/a(*nlb:b)c/
/a(*negative_lookbehind:b)c/
```
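
One way to see which of these forms a given toolchain accepts is to construct them from strings at run time. This is a sketch under the assumption that a rejected pattern surfaces as a thrown error from `Regex.init(_:)`; `lookbehindForms` and `isAccepted` are hypothetical names used only for illustration.

```swift
// The six spellings listed above, as plain strings for run-time construction.
let lookbehindForms = [
    "a(?<=b)c", "a(*plb:b)c", "a(*positive_lookbehind:b)c",
    "a(?<!b)c", "a(*nlb:b)c", "a(*negative_lookbehind:b)c",
]

/// Hypothetical helper: true if the running engine accepts `pattern`.
func isAccepted(_ pattern: String) -> Bool {
    (try? Regex(pattern)) != nil
}

// Prints one line per form; the result depends on the toolchain in use.
for form in lookbehindForms {
    print(form, isAccepted(form) ? "accepted" : "rejected")
}
```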

### Regex builders

This proposal adds support for both positive and negative lookbehind assertions when using the Regex builder, for example:

```swift
// Positive lookbehind
Regex {
  "a"
  Lookbehind { "b" }
  "c"
}

// Negative lookbehind
Regex {
  "a"
  NegativeLookbehind { "b" }
  "c"
}
```

## Source compatibility

This proposal is additive and source-compatible with existing code.

## ABI compatibility

This proposal is additive and ABI-compatible with existing code.

## Implications on adoption

The additions described in this proposal require a new version of the standard library and runtime.
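
For code that builds regexes from strings and may run against an older runtime, one possible pattern is to try the lookbehind form and fall back to a capture group. This is only a sketch: `amountAfterDollar` is a hypothetical function, and it assumes that a runtime without lookbehind support throws from `Regex.init(_:)` instead of trapping. Code that uses regex literals or the builder types would instead depend on whatever availability annotations the new API ships with.

```swift
/// Hypothetical example: returns the digits that directly follow a "$".
func amountAfterDollar(in text: String) -> String? {
    // Preferred form: the "$" is asserted but is not part of the match.
    if let lookbehind = try? Regex(#"(?<=\$)[0-9]+"#) {
        return text.firstMatch(of: lookbehind).map { match in
            String(text[match.range])
        }
    }
    // Fallback for runtimes that reject lookbehind:
    // match the "$" as well and extract the digits through a capture.
    guard let fallback = try? Regex(#"\$([0-9]+)"#),
          let match = text.firstMatch(of: fallback),
          let digits = match.output[1].substring else { return nil }
    return String(digits)
}

let amount = amountAfterDollar(in: "Cost: $42")    // "42" on either path
let missing = amountAfterDollar(in: "Cost: 42")    // nil, no "$" before the digits
```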

## Future directions

### Support PCRE's `\K`

Future work includes supporting PCRE's `\K`, which resets the start of the currently produced match so that previously consumed input is excluded from the final match.

### Reverse matching API

Earlier versions of this pitch added API to run a regex in reverse from the end of the string. However, we had difficulty communicating the nuances of reverse matching in API, and this is an obscure feature that isn't supported by mainstream languages.

## Alternatives considered

### Fixed length lookbehind assertions only

Fixed-length lookbehind assertions are easier to implement and retrofit onto existing engines. Python only supports a single fixed-width concatenation sequence, PCRE2 additionally supports alternations of fixed-width concatenations, and Java additionally supports bounded quantifications within them.

However, this would limit Swift's expressivity compared to JavaScript and .NET, and it would be insufficient for a reverse matching API.
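
To illustrate why the fixed-length case is the easier one, here is a minimal sketch for a literal of known width: step back exactly that many characters, then compare forwards as usual. `fixedWidthLookbehind` is a hypothetical function written for this illustration and does not describe how any of the engines named above are implemented.

```swift
/// Hypothetical sketch: a lookbehind for a literal whose width is known up front.
func fixedWidthLookbehind(
    _ literal: String,
    in input: String,
    at position: String.Index
) -> Bool {
    // Step back exactly `literal.count` characters; fail if the input is too short.
    guard let start = input.index(
        position, offsetBy: -literal.count, limitedBy: input.startIndex
    ) else {
        return false
    }
    // Ordinary forward comparison over the window that ends at `position`.
    return String(input[start..<position]) == literal
}

let text = "abc"
let afterB = text.index(text.startIndex, offsetBy: 2)   // between "b" and "c"

let seesAB = fixedWidthLookbehind("ab", in: text, at: afterB)       // true
let seesA = fixedWidthLookbehind("a", in: text, at: afterB)         // false, the window is "b"
let tooLong = fixedWidthLookbehind("xabc", in: text, at: afterB)    // false, not enough input
```

A pattern such as `(?<=a+)` has no single width to step back by, so this shortcut does not apply to it; that is the case the arbitrary-length design addresses.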

## Acknowledgments

cherrycoke, bjhomer, Simulacroton, and rnantes provided use cases and rationale for lookbehind assertions. xwu provided feedback on the difficulties of communicating reverse matching in API. ksluder, nikolai.ruhe, and pyrtsa surfaced interesting examples and documentation needs.
