Commit 4d0d7a8

---
yaml --- r: 23473 b: refs/heads/master c: e54acbf h: refs/heads/master i: 23471: ecf354f v: v3
1 parent a9d1757 commit 4d0d7a8

2 files changed

+61
-3
lines changed

[refs]

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-refs/heads/master: 9297d1f00a27ac6bb272d5b2b75535697f1e2e4b
+refs/heads/master: e54acbf8488c877eeca264948e7e94f3c3434d41
 refs/heads/snap-stage1: e33de59e47c5076a89eadeb38f4934f58a3618a6
 refs/heads/snap-stage3: cd6f24f9d14ac90d167386a56e7a6ac1f0318195
 refs/heads/try: ffbe0e0e00374358b789b0037bcb3a577cd218be

trunk/src/libsyntax/ext/tt/earley_parser.rs

Lines changed: 60 additions & 2 deletions
@@ -13,14 +13,72 @@ import ast_util::mk_sp;
 import std::map::{hashmap, uint_hash};
 
 /* This is an Earley-like parser, without support for in-grammar nonterminals,
-onlyl calling out to the main rust parser for named nonterminals (which it
+only by calling out to the main rust parser for named nonterminals (which it
 commits to fully when it hits one in a grammar). This means that there are no
 completer or predictor rules, and therefore no need to store one column per
 token: instead, there's a set of current Earley items and a set of next
 ones. Instead of NTs, we have a special case for Kleene star. The big-O, in
 pathological cases, is worse than traditional Earley parsing, but it's an
 easier fit for Macro-by-Example-style rules, and I think the overhead is
-lower. */
+lower. (In order to prevent the pathological case, we'd need to lazily
+construct the resulting `named_match`es at the very end. It'd be a pain,
+and require more memory to keep around old items, but it would also save
+overhead)*/
+
+/* Quick intro to how the parser works:
+
+A 'position' is a dot in the middle of a matcher, usually represented as a
+dot. For example `· a $( a )* a b` is a position, as is `a $( · a )* a b`.
+
+The parser walks through the input a token at a time, maintaining a list
+of items consistent with the current position in the input string: `cur_eis`.
+
+As it processes them, it fills up `eof_eis` with items that would be valid if
+the macro invocation is now over, `bb_eis` with items that are waiting on
+a Rust nonterminal like `$e:expr`, and `next_eis` with items that are waiting
+on a particular token. Most of the logic concerns moving the · through the
+repetitions indicated by Kleene stars. It only advances or calls out to the
+real Rust parser when no `cur_eis` items remain.
+
+Example: Start parsing `a a a a b` against [· a $( a )* a b].
+
+Remaining input: `a a a a b`
+next_eis: [· a $( a )* a b]
+
+- - - Advance over an `a`. - - -
+
+Remaining input: `a a a b`
+cur: [a · $( a )* a b]
+Descend/Skip (first item).
+next: [a $( · a )* a b] [a $( a )* · a b].
+
+- - - Advance over an `a`. - - -
+
+Remaining input: `a a b`
+cur: [a $( a · )* a b] next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
+
+- - - Advance over an `a`. - - - (this looks exactly like the last step)
+
+Remaining input: `a b`
+cur: [a $( a · )* a b] next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
+
+- - - Advance over an `a`. - - - (this looks exactly like the last step)
+
+Remaining input: `b`
+cur: [a $( a · )* a b] next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
+
+- - - Advance over a `b`. - - -
+
+Remaining input: ``
+eof: [a $( a )* a b ·]
+
+ */
 
 
 /* to avoid costly uniqueness checks, we require that `match_seq` always has a
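
The walk traced in the "Quick intro" comment can be illustrated with a minimal sketch in present-day Rust (not the 2012 dialect used in this commit). This is not the code from `earley_parser.rs`, and several things are assumptions made for illustration: the names `Elem`, `example_matcher`, and `accepts` are hypothetical; the matcher is flattened into a list so that a position (the ·) is just an index; only plain tokens and Kleene stars are handled, with no `bb_eis` call-outs to the Rust parser and no `named_match` construction; and a `seen` vector deduplicates positions, which the real parser deliberately avoids.

```rust
/// One element of a flattened matcher (hypothetical representation).
#[derive(Debug)]
enum Elem {
    Tok(&'static str), // a literal token
    Open(usize),       // `$(`, holding the index of the matching `)*`
    Close(usize),      // `)*`, holding the index of the matching `$(`
}

/// The matcher `a $( a )* a b` from the comment, flattened.
fn example_matcher() -> Vec<Elem> {
    vec![
        Elem::Tok("a"),
        Elem::Open(3),
        Elem::Tok("a"),
        Elem::Close(1),
        Elem::Tok("a"),
        Elem::Tok("b"),
    ]
}

/// Returns true if `input` matches `matcher`, keeping only a current
/// and a next set of positions instead of one column per token.
fn accepts(matcher: &[Elem], input: &[&str]) -> bool {
    let mut next: Vec<usize> = vec![0]; // start with the dot at the far left
    let mut tokens = input.iter();
    loop {
        let mut cur = std::mem::take(&mut next);
        let mut eof = false; // is some position at the end of the matcher?
        let mut waiting: Vec<usize> = Vec::new(); // positions just before a token
        let mut seen = vec![false; matcher.len() + 1];
        // Move the dot through repetitions until no current items remain.
        while let Some(pos) = cur.pop() {
            if seen[pos] {
                continue;
            }
            seen[pos] = true;
            match matcher.get(pos) {
                None => eof = true,
                Some(Elem::Tok(_)) => waiting.push(pos),
                // Descend into the repetition, or skip it entirely.
                Some(Elem::Open(close)) => {
                    cur.push(pos + 1);
                    cur.push(close + 1);
                }
                // Repeat the body, or finish the repetition.
                Some(Elem::Close(open)) => {
                    cur.push(open + 1);
                    cur.push(pos + 1);
                }
            }
        }
        match tokens.next() {
            // Input is over: succeed only if an item reached the end.
            None => return eof,
            // Advance every waiting item that expects exactly this token.
            Some(tok) => {
                for pos in waiting {
                    if let Some(Elem::Tok(t)) = matcher.get(pos) {
                        if *t == *tok {
                            next.push(pos + 1);
                        }
                    }
                }
                if next.is_empty() {
                    return false; // no item can consume this token
                }
            }
        }
    }
}
```

Under these assumptions, `accepts(&example_matcher(), &["a", "a", "a", "a", "b"])` returns `true` and follows the same Descend/Skip and Finish/Repeat steps as the trace, while `accepts(&example_matcher(), &["a", "b"])` returns `false` because the matcher needs at least two `a` tokens before the `b`.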
