---

paulstansifer · paulstansifer · commit 4d0d7a817d2d · 2012-08-24T18:20:17.000-07:00
yaml --- r: 23473 b: refs/heads/master c: e54acbf h: refs/heads/master i: 23471: ecf354f v: v3
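
Reading aid for the walkthrough that this commit adds to earley_parser.rs: a minimal sketch of the same "current items / next items" loop, written in present-day Rust rather than the 2012 dialect of the patch. It is not the code from the commit. It makes three assumptions. It handles only literal tokens and Kleene stars, so there are no named nonterminals and no `bb_eis`. It does not build `named_match`es. It flattens the matcher into an instruction list (in the style of an NFA simulation) instead of keeping nested positions. The names `Matcher`, `Inst`, `compile`, `matches`, and `example_matcher` are hypothetical. `cur_eis`, `next_eis`, and `eof_eis` are borrowed from the comment.

```rust
// Simplified matcher: literal tokens and Kleene-star groups only.
enum Matcher {
    Tok(&'static str),
    Star(Vec<Matcher>), // $( ... )*
}

// Flattened form: every instruction index is a possible dot position.
enum Inst {
    Tok(&'static str), // the dot is waiting on this token
    Split(usize),      // start of a star: descend (pc + 1) or skip (target)
    Jump(usize),       // end of a star body: go back to its Split
}

fn compile(ms: &[Matcher], prog: &mut Vec<Inst>) {
    for m in ms {
        match m {
            Matcher::Tok(t) => prog.push(Inst::Tok(*t)),
            Matcher::Star(body) => {
                let split = prog.len();
                prog.push(Inst::Split(0)); // placeholder, patched below
                compile(body, prog);
                prog.push(Inst::Jump(split));
                let after = prog.len();
                prog[split] = Inst::Split(after);
            }
        }
    }
}

// Returns true if the whitespace-separated tokens of `input` match `ms`.
fn matches(ms: &[Matcher], input: &str) -> bool {
    let mut prog = Vec::new();
    compile(ms, &mut prog);
    let mut cur_eis = vec![0usize];
    let mut tokens = input.split_whitespace();
    loop {
        let mut next_eis = Vec::new();
        let mut eof_eis = Vec::new();
        // Uniqueness check so that an empty star body cannot loop forever.
        let mut seen = vec![false; prog.len() + 1];
        // Move the dot through repetitions until every item is waiting
        // on a token or sits at the end of the matcher.
        while let Some(pc) = cur_eis.pop() {
            if seen[pc] {
                continue;
            }
            seen[pc] = true;
            match prog.get(pc) {
                None => eof_eis.push(pc),
                Some(Inst::Tok(_)) => next_eis.push(pc),
                Some(Inst::Split(skip)) => {
                    cur_eis.push(pc + 1); // descend into the repetition
                    cur_eis.push(*skip);  // or skip it entirely
                }
                Some(Inst::Jump(back)) => cur_eis.push(*back), // finish/repeat
            }
        }
        // Only now advance over one input token.
        match tokens.next() {
            None => return !eof_eis.is_empty(),
            Some(tok) => {
                for pc in next_eis {
                    if let Inst::Tok(t) = &prog[pc] {
                        if *t == tok {
                            cur_eis.push(pc + 1);
                        }
                    }
                }
                if cur_eis.is_empty() {
                    return false; // no item was waiting on this token
                }
            }
        }
    }
}

// The matcher from the walkthrough: a $( a )* a b
fn example_matcher() -> Vec<Matcher> {
    vec![
        Matcher::Tok("a"),
        Matcher::Star(vec![Matcher::Tok("a")]),
        Matcher::Tok("a"),
        Matcher::Tok("b"),
    ]
}
```

Calling `matches(&example_matcher(), "a a a a b")` follows the same five advances as the walkthrough. The `seen` vector is a convenience of this sketch only.
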
diff --git a/[refs] b/[refs]
@@ -1,5 +1,5 @@
 ---
-refs/heads/master: 9297d1f00a27ac6bb272d5b2b75535697f1e2e4b
+refs/heads/master: e54acbf8488c877eeca264948e7e94f3c3434d41
 refs/heads/snap-stage1: e33de59e47c5076a89eadeb38f4934f58a3618a6
 refs/heads/snap-stage3: cd6f24f9d14ac90d167386a56e7a6ac1f0318195
 refs/heads/try: ffbe0e0e00374358b789b0037bcb3a577cd218be
diff --git a/trunk/src/libsyntax/ext/tt/earley_parser.rs b/trunk/src/libsyntax/ext/tt/earley_parser.rs
@@ -13,14 +13,72 @@ import ast_util::mk_sp;
 import std::map::{hashmap, uint_hash};
 
 /* This is an Earley-like parser, without support for in-grammar nonterminals,
-onlyl calling out to the main rust parser for named nonterminals (which it
+only by calling out to the main rust parser for named nonterminals (which it
 commits to fully when it hits one in a grammar). This means that there are no
 completer or predictor rules, and therefore no need to store one column per
 token: instead, there's a set of current Earley items and a set of next
 ones. Instead of NTs, we have a special case for Kleene star. The big-O, in
 pathological cases, is worse than traditional Earley parsing, but it's an
 easier fit for Macro-by-Example-style rules, and I think the overhead is
-lower. */
+lower. (In order to prevent the pathological case, we'd need to lazily
+construct the resulting `named_match`es at the very end. It'd be a pain,
+and require more memory to keep around old items, but it would also save
+overhead.) */
+
+/* Quick intro to how the parser works:
+
+A 'position' is a location in the middle of a matcher, usually represented as
+a dot. For example `· a $( a )* a b` is a position, as is `a $( · a )* a b`.
+
+The parser walks through the input a token at a time, maintaining a list
+of items consistent with the current position in the input string: `cur_eis`.
+
+As it processes them, it fills up `eof_eis` with items that would be valid if
+the macro invocation is now over, `bb_eis` with items that are waiting on
+a Rust nonterminal like `$e:expr`, and `next_eis` with items that are waiting
+on a particular token. Most of the logic concerns moving the · through the
+repetitions indicated by Kleene stars. It only advances or calls out to the
+real Rust parser when no `cur_eis` items remain.
+
+Example: Start parsing `a a a a b` against [· a $( a )* a b].
+
+Remaining input: `a a a a b`
+next_eis: [· a $( a )* a b]
+
+- - - Advance over an `a`. - - -
+
+Remaining input: `a a a b`
+cur: [a · $( a )* a b]
+Descend/Skip (first item).
+next: [a $( · a )* a b]  [a $( a )* · a b].
+
+- - - Advance over an `a`. - - -
+
+Remaining input: `a a b`
+cur: [a $( a · )* a b]  next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b]  [a $( · a )* a b]  [a $( a )* a · b]
+
+- - - Advance over an `a`. - - - (this looks exactly like the last step)
+
+Remaining input: `a b`
+cur: [a $( a · )* a b]  next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b]  [a $( · a )* a b]  [a $( a )* a · b]
+
+- - - Advance over an `a`. - - - (this looks exactly like the last step)
+
+Remaining input: `b`
+cur: [a $( a · )* a b]  next: [a $( a )* a · b]
+Finish/Repeat (first item)
+next: [a $( a )* · a b]  [a $( · a )* a b]  [a $( a )* a · b]
+
+- - - Advance over a `b`. - - -
+
+Remaining input: ``
+eof: [a $( a )* a b ·]
+
+ */
 
 
 /* to avoid costly uniqueness checks, we require that `match_seq` always has a