Commit e54acbf

Document the macro parser a little more.
1 parent 9297d1f commit e54acbf

1 file changed: +60 -2 lines changed

src/libsyntax/ext/tt/earley_parser.rs

Lines changed: 60 additions & 2 deletions
@@ -13,14 +13,72 @@ import ast_util::mk_sp;
 import std::map::{hashmap, uint_hash};
 
 /* This is an Earley-like parser, without support for in-grammar nonterminals,
-onlyl calling out to the main rust parser for named nonterminals (which it
+only by calling out to the main rust parser for named nonterminals (which it
 commits to fully when it hits one in a grammar). This means that there are no
 completer or predictor rules, and therefore no need to store one column per
 token: instead, there's a set of current Earley items and a set of next
 ones. Instead of NTs, we have a special case for Kleene star. The big-O, in
 pathological cases, is worse than traditional Earley parsing, but it's an
 easier fit for Macro-by-Example-style rules, and I think the overhead is
-lower. */
+lower. (In order to prevent the pathological case, we'd need to lazily
+construct the resulting `named_match`es at the very end. It'd be a pain,
+and require more memory to keep around old items, but it would also save
+overhead)*/
+
+/* Quick intro to how the parser works:
+
+A 'position' is a location within a matcher, usually represented as a
+dot. For example `· a $( a )* a b` is a position, as is `a $( · a )* a b`.
+
+The parser walks through the input a token at a time, maintaining a list
+of items consistent with the current position in the input string: `cur_eis`.
+
+As it processes them, it fills up `eof_eis` with items that would be valid if
+the macro invocation is now over, `bb_eis` with items that are waiting on
+a Rust nonterminal like `$e:expr`, and `next_eis` with items that are waiting
+on a particular token. Most of the logic concerns moving the · through the
+repetitions indicated by Kleene stars. It only advances or calls out to the
+real Rust parser when no `cur_eis` items remain.
+
+Example: Start parsing `a a a a b` against [· a $( a )* a b].
+
+Remaining input: `a a a a b`
+next_eis: [· a $( a )* a b]
+
+- - - Advance over an `a`. - - -
+
+Remaining input: `a a a b`
+cur: [a · $( a )* a b]
+Descend/Skip (first item).
+next: [a $( · a )* a b] [a $( a )* · a b].
+
+- - - Advance over an `a`. - - -
+
+Remaining input: `a a b`
+cur: [a $( a · )* a b] next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
+
+- - - Advance over an `a`. - - - (this looks exactly like the last step)
+
+Remaining input: `a b`
+cur: [a $( a · )* a b] next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
+
+- - - Advance over an `a`. - - - (this looks exactly like the last step)
+
+Remaining input: `b`
+cur: [a $( a · )* a b] next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b] [a $( · a )* a b]
+
+- - - Advance over a `b`. - - -
+
+Remaining input: ``
+eof: [a $( a )* a b ·]
+
+*/
 
 
 /* to avoid costly uniqueness checks, we require that `match_seq` always has a
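
The walkthrough in the new comment can be modelled in a few lines of present-day Rust. This is a minimal sketch under stated assumptions, not the code in `earley_parser.rs`: `Matcher`, `Item`, `matches_input`, and `example_matcher` are hypothetical names, tokens are plain strings, and nonterminals (`bb_eis`), separators, and `named_match` construction are left out. It only illustrates the Descend/Skip, Finish/Repeat, and Advance steps working on a current and a next set of items.

```rust
// Hypothetical, simplified model of the matcher described in the comment.
// None of these names come from earley_parser.rs: tokens are plain strings,
// and there are no `$e:expr` nonterminals, separators, or captured matches.
enum Matcher {
    Tok(&'static str),  // a literal token
    Star(Vec<Matcher>), // `$( ... )*`
}

// An item is a position: one (sequence, dot index) frame per nesting level,
// with the innermost repetition last.
#[derive(Clone)]
struct Item<'m> {
    frames: Vec<(&'m [Matcher], usize)>,
}

// Assumes every repetition body consumes at least one token; an empty body
// would make the Finish/Repeat step loop forever.
fn matches_input(pattern: &[Matcher], input: &[&str]) -> bool {
    let mut cur = vec![Item { frames: vec![(pattern, 0)] }];
    let mut pos = 0;
    loop {
        let mut next = Vec::new(); // items waiting on a particular token
        let mut eof = false; // is some item valid if the input ends here?
        while let Some(mut item) = cur.pop() {
            let depth = item.frames.len();
            let (seq, dot) = item.frames[depth - 1];
            match seq.get(dot) {
                // Dot at the end of the whole matcher.
                None if depth == 1 => eof = true,
                // Dot at the end of a repetition body: Finish or Repeat.
                None => {
                    let mut finish = item.clone();
                    finish.frames.pop();
                    finish.frames[depth - 2].1 += 1; // step past the `$( )*`
                    cur.push(finish);
                    item.frames[depth - 1].1 = 0; // back to the body's start
                    cur.push(item);
                }
                // Dot before a repetition: Skip it or Descend into it.
                Some(Matcher::Star(body)) => {
                    let mut skip = item.clone();
                    skip.frames[depth - 1].1 += 1;
                    cur.push(skip);
                    item.frames.push((body.as_slice(), 0));
                    cur.push(item);
                }
                // Dot before a token: wait for the next Advance.
                Some(Matcher::Tok(_)) => next.push(item),
            }
        }
        if pos == input.len() {
            return eof;
        }
        // Advance over one token, keeping only the items that expected it.
        for mut item in next {
            let (seq, dot) = *item.frames.last().unwrap();
            if matches!(&seq[dot], Matcher::Tok(t) if *t == input[pos]) {
                item.frames.last_mut().unwrap().1 += 1;
                cur.push(item);
            }
        }
        pos += 1;
    }
}

// The matcher from the walkthrough: `a $( a )* a b`.
fn example_matcher() -> Vec<Matcher> {
    vec![
        Matcher::Tok("a"),
        Matcher::Star(vec![Matcher::Tok("a")]),
        Matcher::Tok("a"),
        Matcher::Tok("b"),
    ]
}
```

With these definitions, `matches_input(&example_matcher(), &["a", "a", "a", "a", "b"])` returns `true`, following the same steps as the trace. The input `&["a", "b"]` is rejected because the matcher needs two `a` tokens before the `b`. The sketch keeps duplicate items if they arise and does no uniqueness checks, so it says nothing about the performance trade-offs discussed in the commit.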
