@@ -13,14 +13,72 @@ import ast_util::mk_sp;
import std::map::{hashmap, uint_hash};

/* This is an Earley-like parser, without support for in-grammar nonterminals,
- onlyl calling out to the main rust parser for named nonterminals (which it
+ only by calling out to the main rust parser for named nonterminals (which it
commits to fully when it hits one in a grammar). This means that there are no
completer or predictor rules, and therefore no need to store one column per
token: instead, there's a set of current Earley items and a set of next
ones. Instead of NTs, we have a special case for Kleene star. The big-O, in
pathological cases, is worse than traditional Earley parsing, but it's an
easier fit for Macro-by-Example-style rules, and I think the overhead is
- lower. */
+ lower. (In order to prevent the pathological case, we'd need to lazily
+ construct the resulting `named_match`es at the very end. It'd be a pain,
+ and require more memory to keep around old items, but it would also save
+ overhead.) */
+
+ /* Quick intro to how the parser works:
+
+ A 'position' is a point in the middle of a matcher, usually represented as a
+ dot (`·`). For example, `· a $( a )* a b` is a position, as is `a $( · a )* a b`.
+
+ The parser walks through the input one token at a time, maintaining a list
+ of items consistent with the current position in the input string: `cur_eis`.
+
+ As it processes them, it fills up `eof_eis` with items that would be valid if
+ the macro invocation is now over, `bb_eis` with items that are waiting on
+ a Rust nonterminal like `$e:expr`, and `next_eis` with items that are waiting
+ on a particular token. Most of the logic concerns moving the · through the
+ repetitions indicated by Kleene stars. It only advances or calls out to the
+ real Rust parser when no `cur_eis` items remain.
+
+ Example: Start parsing `a a a a b` against [· a $( a )* a b].
+
+ Remaining input: `a a a a b`
+ next_eis: [· a $( a )* a b]
+
+ - - - Advance over an `a`. - - -
+
+ Remaining input: `a a a b`
+ cur: [a · $( a )* a b]
+ Descend/Skip (first item).
+ next: [a $( · a )* a b] [a $( a )* · a b].
+
+ - - - Advance over an `a`. - - -
+
+ Remaining input: `a a b`
+ cur: [a $( a · )* a b] next: [a $( a )* a · b]
+ Finish/Repeat (first item)
+ next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
+
+ - - - Advance over an `a`. - - - (this looks exactly like the last step)
+
+ Remaining input: `a b`
+ cur: [a $( a · )* a b] next: [a $( a )* a · b]
+ Finish/Repeat (first item)
+ next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
+
+ - - - Advance over an `a`. - - - (this looks exactly like the last step)
+
+ Remaining input: `b`
+ cur: [a $( a · )* a b] next: [a $( a )* a · b]
+ Finish/Repeat (first item)
+ next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
+
+ - - - Advance over a `b`. - - -
+
+ Remaining input: ``
+ eof: [a $( a )* a b ·]
+
+ */


/* to avoid costly uniqueness checks, we require that `match_seq` always has a
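To watch the `cur`/`next`/`eof` bookkeeping from the new comment run on its worked example, here is a rough sketch in modern Rust (not the 2012 dialect of this commit, and not the code in this file). It assumes a toy matcher language with single-character tokens and separator-less `$( ... )*` only. It leaves out `bb_eis` (Rust nonterminals) and the construction of `named_match`es, and every name in it (`M`, `Item`, `elems_at`, `close`, `recognize`) is hypothetical.

```rust
// Hypothetical toy model of the item-set bookkeeping described in the comment.
// `M`, `Item`, `elems_at`, `close` and `recognize` are illustrative names only.

/// A matcher element: a single-character token or a `$( ... )*` with no separator.
enum M {
    Tok(char),
    Star(Vec<M>),
}

/// An item is a `·` position: one index per nesting level, outermost first.
type Item = Vec<usize>;

/// The element list that the innermost index of `item` points into.
fn elems_at<'a>(top: &'a [M], item: &[usize]) -> &'a [M] {
    let mut elems = top;
    for &i in &item[..item.len() - 1] {
        elems = match &elems[i] {
            M::Star(body) => body.as_slice(),
            M::Tok(_) => unreachable!("only a `$( ... )*` can be descended into"),
        };
    }
    elems
}

/// Drains `cur_eis`, moving each item through Descend/Skip and Finish/Repeat
/// until it is waiting on a token (`next_eis`) or complete (`eof_eis`).
fn close(top: &[M], start: Vec<Item>) -> (Vec<Item>, Vec<Item>) {
    let mut cur = start;
    let (mut next, mut eof) = (Vec::new(), Vec::new());
    // Simplification: deduplicate items. This also stops nested stars whose
    // bodies can match nothing from looping forever.
    let mut seen: Vec<Item> = Vec::new();
    while let Some(item) = cur.pop() {
        if seen.contains(&item) {
            continue;
        }
        seen.push(item.clone());
        let elems = elems_at(top, &item);
        let idx = item[item.len() - 1];
        if idx == elems.len() {
            if item.len() == 1 {
                eof.push(item); // `·` at the very end of the whole matcher
            } else {
                // Finish: step past the `$( ... )*` in the enclosing level.
                let mut finish = item.clone();
                finish.pop();
                *finish.last_mut().unwrap() += 1;
                cur.push(finish);
                // Repeat: jump back to the start of the repetition body.
                let mut repeat = item;
                *repeat.last_mut().unwrap() = 0;
                cur.push(repeat);
            }
        } else {
            match &elems[idx] {
                M::Star(_) => {
                    // Descend into the repetition, or Skip over it entirely.
                    let mut descend = item.clone();
                    descend.push(0);
                    cur.push(descend);
                    let mut skip = item;
                    *skip.last_mut().unwrap() += 1;
                    cur.push(skip);
                }
                M::Tok(_) => next.push(item), // waits for the next token
            }
        }
    }
    (next, eof)
}

/// True if the whitespace-separated single-character tokens of `input` match `top`.
fn recognize(top: &[M], input: &str) -> bool {
    let mut items: Vec<Item> = vec![vec![0]]; // `·` before the first element
    for tok in input.chars().filter(|c| !c.is_whitespace()) {
        let (next, _eof) = close(top, items);
        // Advance over `tok`: keep the items whose `·` sits right before it.
        items = next
            .into_iter()
            .filter(|it| matches!(&elems_at(top, it)[it[it.len() - 1]], M::Tok(c) if *c == tok))
            .map(|mut it| {
                *it.last_mut().unwrap() += 1;
                it
            })
            .collect();
    }
    let (_next, eof) = close(top, items);
    !eof.is_empty()
}

fn main() {
    // `a $( a )* a b`, the matcher from the worked example.
    let matcher = vec![M::Tok('a'), M::Star(vec![M::Tok('a')]), M::Tok('a'), M::Tok('b')];
    for input in ["a a a a b", "a a b", "a b", "a a a"] {
        println!("{:>10} -> {}", input, recognize(&matcher, input));
    }
}
```

For `a $( a )* a b`, this accepts `a a a a b` and `a a b` and rejects `a b` and `a a a`. The `seen` list is a deliberate simplification: per the comment the hunk ends on, the real parser avoids costly uniqueness checks by restricting what `match_seq` may contain.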