Commit 3e655ce

---
yaml --- r: 32135 b: refs/heads/dist-snap c: e54acbf h: refs/heads/master i: 32133: 6f8513a 32131: 0760691 32127: 9020d16 v: v3
1 parent aeda1af commit 3e655ce

2 files changed: +61 -3 lines changed

[refs]

Lines changed: 1 addition & 1 deletion
@@ -7,6 +7,6 @@ refs/tags/release-0.1: 1f5c5126e96c79d22cb7862f75304136e204f105
 refs/heads/ndm: f3868061cd7988080c30d6d5bf352a5a5fe2460b
 refs/heads/try2: d0c6ce338884ee21843f4b40bf6bf18d222ce5df
 refs/heads/incoming: d9317a174e434d4c99fc1a37fd7dc0d2f5328d37
-refs/heads/dist-snap: 9297d1f00a27ac6bb272d5b2b75535697f1e2e4b
+refs/heads/dist-snap: e54acbf8488c877eeca264948e7e94f3c3434d41
 refs/tags/release-0.2: c870d2dffb391e14efb05aa27898f1f6333a9596
 refs/tags/release-0.3: b5f0d0f648d9a6153664837026ba1be43d3e2503

branches/dist-snap/src/libsyntax/ext/tt/earley_parser.rs

Lines changed: 60 additions & 2 deletions
@@ -13,14 +13,72 @@ import ast_util::mk_sp;
 import std::map::{hashmap, uint_hash};
 
 /* This is an Earley-like parser, without support for in-grammar nonterminals,
-onlyl calling out to the main rust parser for named nonterminals (which it
+only by calling out to the main rust parser for named nonterminals (which it
 commits to fully when it hits one in a grammar). This means that there are no
 completer or predictor rules, and therefore no need to store one column per
 token: instead, there's a set of current Earley items and a set of next
 ones. Instead of NTs, we have a special case for Kleene star. The big-O, in
 pathological cases, is worse than traditional Earley parsing, but it's an
 easier fit for Macro-by-Example-style rules, and I think the overhead is
-lower. */
+lower. (In order to prevent the pathological case, we'd need to lazily
+construct the resulting `named_match`es at the very end. It'd be a pain,
+and require more memory to keep around old items, but it would also save
+overhead)*/
+
+/* Quick intro to how the parser works:
+
+A 'position' is a dot in the middle of a matcher, usually represented as a
+dot. For example `· a $( a )* a b` is a position, as is `a $( · a )* a b`.
+
+The parser walks through the input a character at a time, maintaining a list
+of items consistent with the current position in the input string: `cur_eis`.
+
+As it processes them, it fills up `eof_eis` with items that would be valid if
+the macro invocation is now over, `bb_eis` with items that are waiting on
+a Rust nonterminal like `$e:expr`, and `next_eis` with items that are waiting
+on the a particular token. Most of the logic concerns moving the · through the
+repetitions indicated by Kleene stars. It only advances or calls out to the
+real Rust parser when no `cur_eis` items remain
+
+Example: Start parsing `a a a a b` against [· a $( a )* a b].
+
+Remaining input: `a a a a b`
+next_eis: [· a $( a )* a b]
+
+- - - Advance over an `a`. - - -
+
+Remaining input: `a a a b`
+cur: [a · $( a )* a b]
+Descend/Skip (first item).
+next: [a $( · a )* a b] [a $( a )* · a b].
+
+- - - Advance over an `a`. - - -
+
+Remaining input: `a a b`
+cur: [a $( a · )* a b] next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
+
+- - - Advance over an `a`. - - - (this looks exactly like the last step)
+
+Remaining input: `a b`
+cur: [a $( a · )* a b] next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
+
+- - - Advance over an `a`. - - - (this looks exactly like the last step)
+
+Remaining input: `b`
+cur: [a $( a · )* a b] next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b] [a $( · a )* a b]
+
+- - - Advance over a `b`. - - -
+
+Remaining input: ``
+eof: [a $( a )* a b ·]
+
+*/
 
 
 /* to avoid costly uniqueness checks, we require that `match_seq` always has a
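
The item-set walk described in the "Quick intro" comment of this hunk can be sketched in present-day Rust, which is not the 2012 dialect the commit is written in. This is an illustrative reconstruction under stated assumptions, not the code in `earley_parser.rs`: the `Matcher` and `Item` types and the `accepts` and `example_matcher` functions are hypothetical names, named nonterminals and `bb_eis` are left out entirely, no `named_match`es are built, and a `HashSet` is used to skip items that were already processed (the real parser, per the comment at the end of the hunk, is designed to avoid uniqueness checks).

```rust
use std::collections::HashSet;

/// One element of a matcher: a literal token or a Kleene-star group.
/// (Hypothetical type; named nonterminals such as `$e:expr` are omitted.)
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Matcher {
    Tok(&'static str),
    Star(Vec<Matcher>),
}

/// An item is a "position": a stack of (sequence, dot index) pairs,
/// outermost sequence first, innermost repetition body last.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Item<'m> {
    frames: Vec<(&'m [Matcher], usize)>,
}

/// Returns true if `input` is matched in full by `matcher`.
fn accepts(matcher: &[Matcher], input: &[&str]) -> bool {
    let mut cur_eis = vec![Item { frames: vec![(matcher, 0)] }];
    let mut consumed = 0;
    loop {
        let mut next_eis = Vec::new(); // items waiting on a specific token
        let mut eof = false; // true if some item has its dot at the very end
        let mut seen = HashSet::new(); // assumption: dedup to guarantee termination
        while let Some(item) = cur_eis.pop() {
            if !seen.insert(item.clone()) {
                continue;
            }
            let (seq, idx) = *item.frames.last().unwrap();
            if idx == seq.len() {
                if item.frames.len() == 1 {
                    eof = true; // dot is past the last element of the matcher
                } else {
                    // Finish: leave the repetition and step past the star.
                    let mut finish = item.clone();
                    finish.frames.pop();
                    finish.frames.last_mut().unwrap().1 += 1;
                    cur_eis.push(finish);
                    // Repeat: move the dot back to the start of the body.
                    let mut repeat = item;
                    repeat.frames.last_mut().unwrap().1 = 0;
                    cur_eis.push(repeat);
                }
            } else {
                match &seq[idx] {
                    Matcher::Tok(_) => next_eis.push(item),
                    Matcher::Star(body) => {
                        // Skip: zero repetitions, step past the star.
                        let mut skip = item.clone();
                        skip.frames.last_mut().unwrap().1 += 1;
                        cur_eis.push(skip);
                        // Descend: move the dot into the repetition body.
                        let mut descend = item;
                        descend.frames.push((body.as_slice(), 0));
                        cur_eis.push(descend);
                    }
                }
            }
        }
        if consumed == input.len() {
            return eof;
        }
        // Advance: keep only the items whose awaited token is the next input token.
        let tok = input[consumed];
        consumed += 1;
        cur_eis = next_eis
            .into_iter()
            .filter_map(|mut it| {
                let (seq, idx) = *it.frames.last().unwrap();
                match &seq[idx] {
                    Matcher::Tok(t) if *t == tok => {
                        it.frames.last_mut().unwrap().1 += 1;
                        Some(it)
                    }
                    _ => None,
                }
            })
            .collect();
        if cur_eis.is_empty() {
            return false;
        }
    }
}

/// The matcher from the commit's worked example: `a $( a )* a b`.
fn example_matcher() -> Vec<Matcher> {
    use Matcher::{Star, Tok};
    vec![Tok("a"), Star(vec![Tok("a")]), Tok("a"), Tok("b")]
}
```

With these definitions, `accepts(&example_matcher(), &["a", "a", "a", "a", "b"])` follows the same Descend/Skip and Finish/Repeat steps as the worked example and returns `true`, while the input `a b` is rejected because the matcher needs at least two `a` tokens before the `b`.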
