This repository was archived by the owner on Feb 19, 2018. It is now read-only.

Parser: jashkenas CoffeeScript

Andre Lewis edited this page Sep 10, 2016 · 10 revisions

jashkenas/coffeescript

A breakdown of the original CoffeeScript parser, meant to help people understand it and eventually to serve as a guide to extending it. Based on the thread here:

The original CoffeeScript Parser

Pros

  • It gets the job done.
  • There’s coffeelint.
  • It is the "standard" that any alternative must stay compatible with.

Cons

  • Codebase is crufty and convoluted
  • Many pull requests are stalled and may never be merged
  • Adding new features is complicated
  • It is too permissive: it accepts more input as legal than was intended, while still emitting valid JavaScript. This has led to disagreements about what is and is not legal CoffeeScript.

Parser Overview

Parser Details:

Excerpt from CS Pull Request by JimPanic

lexer.coffee

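identifierToken takes one word at a time from the front of @chunk, assigns it a tag (its name or type) and creates a token. Tokens are created by the token method, which takes the arguments (tag, value, offsetInChunk, length, origin) and calls makeToken. makeToken turns the offset and length into location data and returns the token tuple [tag, value, locationData]. token then attaches origin, if one was given, and pushes the token onto the token stream.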
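The split between token and makeToken is easier to see in code. The following is a minimal TypeScript sketch. The class name MiniLexer, the chunkLine/chunkColumn fields and the single-line location math are assumptions made for illustration. The real lexer.coffee also handles tokens that span lines and does more bookkeeping.

```typescript
// Simplified sketch of how a CoffeeScript-style lexer builds token tuples.
// token/makeToken mirror names in lexer.coffee; MiniLexer, chunkLine and
// chunkColumn are hypothetical, and the bodies are simplified assumptions.

interface LocationData {
  first_line: number;
  first_column: number;
  last_line: number;
  last_column: number; // inclusive: equals first_column for a 1-char token
}

// A token is a [tag, value, locationData] tuple, optionally tagged with an origin.
type Token = [string, string, LocationData] & { origin?: Token };

class MiniLexer {
  tokens: Token[] = [];
  chunkLine = 0;   // line where the current chunk starts (hypothetical field)
  chunkColumn = 0; // column where the current chunk starts (hypothetical field)

  // Convert an offset inside the current chunk into location data.
  // Assumption: the token does not contain a newline.
  makeToken(tag: string, value: string, offsetInChunk = 0, length = value.length): Token {
    const column = this.chunkColumn + offsetInChunk;
    const locationData: LocationData = {
      first_line: this.chunkLine,
      first_column: column,
      last_line: this.chunkLine,
      last_column: column + Math.max(length - 1, 0),
    };
    const tuple: [string, string, LocationData] = [tag, value, locationData];
    return tuple as Token;
  }

  // Build the token, record its origin if given, and append it to the stream.
  token(tag: string, value: string, offsetInChunk?: number, length?: number, origin?: Token): Token {
    const tok = this.makeToken(tag, value, offsetInChunk, length);
    if (origin) tok.origin = origin;
    this.tokens.push(tok);
    return tok;
  }
}
```

The inclusive last_column (length - 1) means a one-character token starts and ends on the same column.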

identifierToken relies on a few key variables and functions:

  • @chunk: the current string to handle, i.e. the rest of the source from the current position. It is split into [input, id, colon] with the IDENTIFIER regular expression defined at the bottom of lexer.coffee.
  • id: the identifier that was just matched. For an import, this is literally 'import'.
  • @tag(): gets the tag (the first entry of the token tuple) of the last processed token. When processing foo (as in the second chunk of import 'foo'), @tag() returns 'IMPORT'.
  • @value(): gets the value (the second entry of the token tuple) of the last processed token. When processing foo (as in the second chunk of import 'foo'), @value() returns 'import', the very string that id held while the previous chunk was handled.

So what I added to identifierToken were the tags IMPORT, IMPORT_AS and IMPORT_FROM, plus the variable @seenImport. With @seenImport, the lexer knows that an as or a from it encounters belongs to an import and not to a yield or similar. In theory this also means that from can still be used as an identifier elsewhere. We have to test that, though. :)
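The chunk splitting and the @tag()/@value() lookbacks can be sketched as below. This is a minimal sketch under stated assumptions. The IDENTIFIER regex is a simplified ASCII-only version (the real one accepts more characters), the keyword set is a small hypothetical sample, and IdentifierLexer is an illustrative name. Upper-casing a keyword to get its tag follows the pattern described above but is not a copy of lexer.coffee.

```typescript
// Sketch: split the chunk with an IDENTIFIER regex into [input, id, colon]
// and look back at the previous token with tag()/value().
// The regex and KEYWORDS are simplified assumptions, not lexer.coffee's.

type Token = [string, string]; // [tag, value]

// Group 1: the identifier itself; group 2: an optional trailing colon (object key).
const IDENTIFIER = /^([$A-Za-z_][$\w]*)([^\n\S]*:(?!:))?/;
const KEYWORDS = new Set(["import", "export", "yield"]); // hypothetical sample

class IdentifierLexer {
  tokens: Token[] = [];
  chunk = "";

  // Tag (first tuple entry) of the last processed token.
  tag(): string | undefined {
    return this.tokens[this.tokens.length - 1]?.[0];
  }

  // Value (second tuple entry) of the last processed token.
  value(): string | undefined {
    return this.tokens[this.tokens.length - 1]?.[1];
  }

  // Returns the number of characters consumed (0 = no identifier here).
  identifierToken(): number {
    const match = IDENTIFIER.exec(this.chunk);
    if (!match) return 0;
    const [input, id, colon] = match;
    // A keyword followed by a colon is an object key, so treat it as a plain identifier.
    const tag = KEYWORDS.has(id) && !colon ? id.toUpperCase() : "IDENTIFIER";
    this.tokens.push([tag, id]);
    if (colon) this.tokens.push([":", ":"]);
    return input.length;
  }

  // Feed the source one chunk at a time, skipping whitespace between words.
  tokenize(code: string): Token[] {
    let rest = code;
    while (rest.length > 0) {
      const ws = /^\s+/.exec(rest);
      if (ws) {
        rest = rest.slice(ws[0].length);
        continue;
      }
      this.chunk = rest;
      const consumed = this.identifierToken();
      if (consumed === 0) throw new Error(`unexpected input: ${rest}`);
      rest = rest.slice(consumed);
    }
    return this.tokens;
  }
}
```

After lexing import, tag() returns "IMPORT" and value() returns "import". That lookback is the kind of context identifierToken can use when it decides how to tag the next word.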

These three tags are then used in grammar.coffee.

There's also code to reset @seenImport when the statement is terminated (in lineToken, if I recall correctly).
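The lifecycle of the @seenImport flag can be sketched as follows. The flag is set by import, consulted for as/from, and cleared when the statement ends. This is a minimal TypeScript sketch. Splitting on whitespace, treating quoted words as STRING tokens, and handling TERMINATOR in a lineToken method are simplifying assumptions, and ImportAwareLexer is a hypothetical name.

```typescript
// Sketch of the seenImport idea: "as"/"from" are tagged IMPORT_AS/IMPORT_FROM
// only while an import statement is open, and a newline clears the flag.
// Word splitting and TERMINATOR handling are simplified assumptions.

type Token = [string, string]; // [tag, value]

class ImportAwareLexer {
  tokens: Token[] = [];
  seenImport = false;

  private classify(word: string): string {
    if (/^['"]/.test(word)) return "STRING"; // crude string detection
    if (word === "import") {
      this.seenImport = true;
      return "IMPORT";
    }
    if (this.seenImport && word === "as") return "IMPORT_AS";
    if (this.seenImport && word === "from") return "IMPORT_FROM";
    return "IDENTIFIER"; // outside an import, "from" stays an identifier
  }

  // End of line ends the statement: emit TERMINATOR and reset the flag.
  private lineToken(): void {
    this.tokens.push(["TERMINATOR", "\n"]);
    this.seenImport = false;
  }

  tokenize(code: string): Token[] {
    for (const line of code.split("\n")) {
      for (const word of line.split(/\s+/).filter((w) => w.length > 0)) {
        this.tokens.push([this.classify(word), word]);
      }
      this.lineToken();
    }
    return this.tokens;
  }
}
```

For the input `import foo from 'foo'` followed by a line containing just `from`, the first from is tagged IMPORT_FROM and the second is an ordinary IDENTIFIER. That is the "from can still be used as an identifier" behaviour that still needs testing.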

grammar.coffee

For this part I looked at the spec for imports and basically copied its structure.

The DSL used here mixes and matches tags and named grammar forms. In this case the tags are 'IMPORT', 'IMPORT_AS' and 'IMPORT_FROM', as emitted by identifierToken in lexer.coffee. The other parts of those pattern strings are just other named grammar forms (ImportsList, OptComma, Identifier, etc.).

The structure is built up from references to other grammar forms and from actions, which are functions that create and return data structures, such as -> new Import $2. The $n variables refer to the value of the nth word (tag or grammar form) in the pattern string, counting from 1.
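The $n convention can be illustrated with a hypothetical miniature of the DSL. Each rule pairs a pattern string with an action, and when the parser reduces a rule, the action receives the matched values with $[n] being the nth word of the pattern. Rule, o() and reduce() here are illustrative names. This is not the real grammar.coffee or Jison API, and plain strings stand in for the node values a real parser would pass.

```typescript
// Hypothetical mini version of the grammar DSL: a rule is a pattern string
// plus an action; the action sees the matched values 1-indexed, like $1, $2.
// Rule, o() and reduce() are illustrative, not grammar.coffee's real API.

type Action = ($: unknown[]) => unknown;

interface Rule {
  pattern: string[];
  action: Action;
}

const o = (patternString: string, action: Action): Rule => ({
  pattern: patternString.split(" "),
  action,
});

class Import {
  constructor(public source: unknown, public clause?: unknown) {}
}

const importRules: Rule[] = [
  o("IMPORT String", ($) => new Import($[2])),
  o("IMPORT ImportClause IMPORT_FROM String", ($) => new Import($[4], $[2])),
];

// Called once every word of a rule has been matched ("reduce").
// Slot 0 is left empty so that $[n] is the nth word of the pattern.
function reduce(rule: Rule, values: unknown[]): unknown {
  if (values.length !== rule.pattern.length) {
    throw new Error(`expected ${rule.pattern.length} values, got ${values.length}`);
  }
  return rule.action([undefined, ...values]);
}
```

In the second rule, $4 is the String and $2 is the ImportClause, which is why the action reads new Import $4, $2 even though the clause appears first in the source.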

Through this process the parsed pieces are passed to the Import class defined in nodes.coffee, and the resulting Import instance becomes a node of the AST.

Off the top of my head this should look as follows:

# import 'foo' will yield something like:

new Import(Value { value: 'foo' })

# import { foo } from 'foo' will yield something like:

new Import(Value { value: 'foo' }, ImportsList { .... })

You can inspect this AST easily by adding a console.log before the call to new Import:

Import: [
  o 'IMPORT String',                          -> console.log($2); new Import $2
  o 'IMPORT ImportClause IMPORT_FROM String', -> console.log($4, $2); new Import $4, $2
]

nodes.coffee

Taking the AST from grammar.coffee, the classes in nodes.coffee produce fragments of "code" through their @makeCode and compileNode methods. I'm not entirely clear on this part yet, but each node is compiled by calling compileNode or compileToFragments, which return arrays of code fragments that are later joined into the output string. Import.compileNode basically looks at its part of the AST and either returns an array of code built directly with @makeCode OR calls compileNode on a child node.
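The fragment-based compilation described above can be sketched in simplified form. A node emits its own keywords with makeCode, delegates to its children's compileNode, and the fragments are joined at the end. CodeFragment, makeCode, compileNode, compileToFragments and fragmentsToText mirror names used in nodes.coffee. The bodies, the StringLiteral class and the names array on Import are assumptions for illustration only. The real versions also thread compile options and indentation through every call.

```typescript
// Simplified sketch of fragment-based compilation in the style of nodes.coffee.
// The names mirror nodes.coffee, but these bodies are assumptions;
// StringLiteral and Import's `names` field are hypothetical.

class CodeFragment {
  constructor(public node: Base, public code: string) {}
}

abstract class Base {
  // Wrap a piece of output text together with the node that produced it.
  makeCode(code: string): CodeFragment {
    return new CodeFragment(this, code);
  }

  // Entry point: compile this node (and its children) to fragments.
  compileToFragments(): CodeFragment[] {
    return this.compileNode();
  }

  abstract compileNode(): CodeFragment[];
}

class StringLiteral extends Base {
  constructor(public value: string) {
    super();
  }
  compileNode(): CodeFragment[] {
    return [this.makeCode(this.value)];
  }
}

class Import extends Base {
  constructor(public source: StringLiteral, public names?: string[]) {
    super();
  }
  compileNode(): CodeFragment[] {
    // The node's own keywords become fragments directly via makeCode...
    const code = [this.makeCode("import ")];
    if (this.names) code.push(this.makeCode(`{ ${this.names.join(", ")} } from `));
    // ...while child nodes are asked to compile themselves.
    code.push(...this.source.compileNode());
    code.push(this.makeCode(";"));
    return code;
  }
}

// Joining the fragments yields the final JavaScript text.
const fragmentsToText = (fragments: CodeFragment[]): string =>
  fragments.map((f) => f.code).join("");
```

Keeping fragments rather than plain strings lets each piece of output remember which node produced it, which is useful for things like source maps. In this sketch, compiling new Import(new StringLiteral("'foo'"), ["foo"]) and joining the fragments gives `import { foo } from 'foo';`.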

This part is still a bit of magic to me, as the function names and processes don't seem to line up with my way of thinking.

ES2013 Compliance:

ES2015 compliance:
