Commit 1a1a9d5

Add raw string literal ambiguity document
1 parent 19e1f5c commit 1a1a9d5

1 file changed, 29 additions and 0 deletions

@@ -0,0 +1,29 @@
Rust's lexical grammar is not context-free. Raw string literals are the source
of the problem. Informally, a raw string literal is an `r`, followed by `N`
hashes (where `N` can be zero), a quote, any characters, then a quote followed
by `N` hashes. The following grammar describes this as closely as possible:

    R -> 'r' S
    S -> '"' B '"'
    S -> '#' S '#'
    B -> . B
    B -> ε

Here `.` represents any character and `ε` the empty string. Consider the
string `r#""#"#`. This string is not a valid raw string literal, because the
literal `r#""#` ends at the first quote followed by one hash, leaving a
trailing `"#`. It is nevertheless accepted as one by the above grammar, using
the derivation:

    R : #""#"#
    S : ""#"
    S : "#
    B : #
    B : ε

(Here `T : U` means that the rule `T` is applied, and `U` is the remainder of
the string.) The difficulty arises from the fact that raw string literals are
fundamentally context-sensitive. In particular, the context needed is the
number of hashes. I know of no way to resolve this, but also have not come up
with a proof that the language is not context-free. Such a proof would probably
use the pumping lemma for context-free languages, but I (cmr) could not come up
with a proof after spending a few hours on it, and decided my time was best
spent elsewhere. Pull requests welcome!
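The over-acceptance described above can be checked mechanically. The following is a minimal Rust sketch, not the compiler's lexer: the function names `grammar_accepts` and `is_raw_string_literal` are hypothetical and chosen only for illustration. The first function recognizes exactly the language of the grammar above. The second assumes the informal rule that a literal ends at the first quote followed by `N` hashes, and carries `N` as explicit context.

```rust
/// Recognizer for the language of the grammar above (hypothetical helper).
/// `S -> '#' S '#'` strips one hash from each end; `S -> '"' B '"'` then only
/// needs a quote at each end, because `B` matches any characters at all.
fn grammar_accepts(input: &str) -> bool {
    let bytes = input.as_bytes();
    // R -> 'r' S
    if bytes.first() != Some(&b'r') {
        return false;
    }
    let mut s = &bytes[1..];
    // S -> '#' S '#': forced whenever the remainder starts with a hash.
    while s.len() >= 2 && s[0] == b'#' && s[s.len() - 1] == b'#' {
        s = &s[1..s.len() - 1];
    }
    // S -> '"' B '"'
    s.len() >= 2 && s[0] == b'"' && s[s.len() - 1] == b'"'
}

/// Hand-written check that carries the hash count as context (hypothetical
/// helper). Assumption: the literal ends at the first quote followed by `n`
/// hashes, and the whole input must be exactly one literal.
fn is_raw_string_literal(input: &str) -> bool {
    let bytes = input.as_bytes();
    if bytes.first() != Some(&b'r') {
        return false;
    }
    let rest = &bytes[1..];
    // Count the opening hashes; this is the context the grammar cannot track.
    let n = rest.iter().take_while(|&&b| b == b'#').count();
    let rest = &rest[n..];
    if rest.first() != Some(&b'"') {
        return false;
    }
    let body = &rest[1..];
    // Look for the first quote that is followed by at least `n` hashes.
    for (i, &b) in body.iter().enumerate() {
        if b == b'"' {
            let after = &body[i + 1..];
            if after.len() >= n && after[..n].iter().all(|&h| h == b'#') {
                // The literal closes here; valid only if nothing is left over.
                return after.len() == n;
            }
        }
    }
    false
}
```

Under these assumptions, `grammar_accepts` returns `true` for `r#""#"#`, matching the derivation above, while `is_raw_string_literal` returns `false` for it, because the literal closes after `r#""#` and `"#` is left over. Both functions return `true` for `r#""#`.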
