1
1
Rust's lexical grammar is not context-free. Raw string literals are the source
2
2
of the problem. Informally, a raw string literal is an ` r ` , followed by ` N `
3
3
hashes (where N can be zero), a quote, any characters, then a quote followed
4
- by ` N ` hashes. Critically, once inside the first pair of quotes,
5
- another quote cannot be followed by ` N ` consecutive hashes. e.g.
6
- ` r###""###"### ` is invalid.
7
-
8
- This grammar describes this as best possible:
4
+ by ` N ` hashes. This grammar describes this as best possible:
9
5
10
6
R -> 'r' S
11
7
S -> '"' B '"'
@@ -26,39 +22,8 @@ accepted as one by the above grammar, using the derivation:
26
22
(Where ` T : U ` means the rule ` T ` is applied, and ` U ` is the remainder of the
27
23
string.) The difficulty arises from the fact that it is fundamentally
28
24
context-sensitive. In particular, the context needed is the number of hashes.
29
-
30
- To prove that Rust's string literals are not context-free, we will use
31
- the fact that context-free languages are closed under intersection with
32
- regular languages, and the
33
- [ pumping lemma for context-free languages] ( https://en.wikipedia.org/wiki/Pumping_lemma_for_context-free_languages ) .
34
-
35
- Consider the regular language ` R = r#+""#*"#+ ` . If Rust's raw string literals are
36
- context-free, then their intersection with ` R ` , ` R' ` , should also be context-free.
37
- Therefore, to prove that raw string literals are not context-free,
38
- it is sufficient to prove that ` R' ` is not context-free.
39
-
40
- The language ` R' ` is ` {r#^n""#^m"#^n | m < n} ` .
41
-
42
- Assume ` R' ` * is* context-free. Then ` R' ` has some pumping length ` p > 0 ` for which
43
- the pumping lemma applies. Consider the following string ` s ` in ` R' ` :
44
-
45
- ` r#^p""#^{p-1}"#^p `
46
-
47
- e.g. for ` p = 2 ` : ` s = r##""#"## `
48
-
49
- Then ` s = uvwxy ` for some choice of ` uvwxy ` such that ` vx ` is non-empty,
50
- ` |vwx| < p+1 ` , and ` uv^iwx^iy ` is in ` R' ` for all ` i >= 0 ` .
51
-
52
- Neither ` v ` nor ` x ` can contain a ` " ` or ` r ` , as the number of these characters
53
- in any string in ` R' ` is fixed. So ` v ` and ` x ` contain only hashes.
54
- Consequently, of the three sequences of hashes, ` v ` and ` x ` combined
55
- can only pump two of them.
56
- If we ever choose the central sequence of hashes, then one of the outer sequences
57
- will not grow when we pump, leading to an imbalance between the outer sequences.
58
- Therefore, we must pump both outer sequences of hashes. However,
59
- there are ` p+2 ` characters between these two sequences of hashes, and ` |vwx| ` must
60
- be less than ` p+1 ` . Therefore we have a contradiction, and ` R' ` must not be
61
- context-free.
62
-
63
- Since ` R' ` is not context-free, it follows that the Rust's raw string literals
64
- must not be context-free.
25
+ I know of no way to resolve this, but also have not come up with a proof that
26
+ it is not context sensitive. Such a proof would probably use the pumping lemma
27
+ for context-free languages, but I (cmr) could not come up with a proof after
28
+ spending a few hours on it, and decided my time best spent elsewhere. Pull
29
+ request welcome!
0 commit comments