
allow whitespace to be a _terminator where possible? #24

Open
@the-mikedavis


I've been fiddling with the (H)EEx queries lately trying to find something that works really well. There are some tough cases that I think might be solvable with changes to this grammar.

The main issue is combined injections. Consider some EEx like so:

<%= Enum.map(@charges, fn charge -> %>
  <%= humanize_money(charge) %>
<% end) %>

Here lines 1 and 3 need injection.combined (like with tree-sitter-iex) so that the end is parsed as part of the same document as the fn .. ->. Line 2 can be parsed regularly (no combined injections). For the combined parts, this looks like so to tree-sitter-elixir:

Enum.map(@charges, fn charge ->  end) 

That case works pretty well with the current grammar and queries in tree-sitter-eex, but injection.combined introduces some problems. Consider this EEx:

<%= Enum.map(@charges, fn charge -> %><% end) %>
<%= Enum.map(@charges, fn charge -> %><% end) %>

With the current grammar and queries, this produces an (ERROR) on the second Enum. This happens because the EEx grammar is consuming the newline. To tree-sitter-elixir, this example looks like:

Enum.map(@charges, fn charge -> end)  Enum.map(@charges, fn charge -> end) 
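A rough way to see why the newline disappears is to mimic what a combined injection hands to the injected parser: only the text inside the tags, with everything between tags (including the line break) skipped. The sketch below is an approximation, not how tree-sitter is implemented. The `TAG` regex and the `combined_view` helper are hypothetical names, the regex ignores EEx comments and escapes, and it assumes every tag in the template is marked as combined.

```python
import re

# Simplified tag matcher (assumption): `<%` or `<%=`, then code, then `%>`.
# Real EEx also has comment and escape forms, which are ignored here.
TAG = re.compile(r"<%=?(.*?)%>", re.DOTALL)


def combined_view(template: str) -> str:
    """Join the code inside every tag, dropping all text between tags."""
    return "".join(m.group(1) for m in TAG.finditer(template))


template = (
    "<%= Enum.map(@charges, fn charge -> %><% end) %>\n"
    "<%= Enum.map(@charges, fn charge -> %><% end) %>\n"
)

view = combined_view(template)
print(repr(view))
```

The newline that separates the two expressions in the template sits outside the tags, so it never reaches the joined string, and the two `Enum.map` calls end up on one line with no terminator between them.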

How does tree-sitter-embedded-template fix this?

In this example of an ejs template from the expressjs repo,

  <% posts.forEach(function(post) { %>
    <dt><%= post.title %></dt>
    <dd><%= post.body %></dd>
  <% }) %>

All of the inner JavaScript contents are injection.combined, so this comes out like so for tree-sitter-javascript:

posts.forEach(function(post) { post.title post.body })

Which tree-sitter-javascript parses as if it were valid JavaScript (which, to my knowledge, it is not). This happens in tree-sitter-ruby as well with erb templates: the grammars are intentionally more permissive around newlines and whitespace than the language actually allows.

Possible solutions

I see a few ways around this to make the EEx injections work as well as ejs/erb.

The first is to try to allow spaces to be $._terminators whenever it doesn't introduce ambiguity. I tried some minor edits to the grammar and it seems like this approach is easier said than done because of things like function calls without parens. This would also sacrifice the current parity between this parser and the parser in the Elixir compiler, which seems a bit distasteful to me.

The second would be to propose a change to injections upstream in tree-sitter to try to introduce a way to limit the scope of a combined injection. Currently all injections which are marked as combined globally in a document are parsed together in one "combined" sub-document. If the EEx grammar were rewritten as in this branch to group EEx tags based on start/middle-expressions and end-expressions, and then if we could combine only the tags within each group, then we wouldn't need to care about $._terminators at all. In the example above, tags one and two would form one group and tags three and four another, and tags would only be combined with the other tags in their own group. This is nice because it mimics the EEx.Tokenizer, but it's unclear if a change like this would be accepted upstream in tree-sitter since no other language has needed anything like it.
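The grouping idea can be sketched outside of tree-sitter. The snippet below is a minimal illustration under stated assumptions, not the branch's grammar and not the EEx.Tokenizer: `classify` and `group_tags` are hypothetical helpers, and the start/middle/end detection is a crude heuristic (a tag ending in `do` or in `fn ... ->` opens a group, a tag starting with `end` closes it).

```python
import re

# Simplified tag matcher (assumption): ignores EEx comments and escapes.
TAG = re.compile(r"<%=?(.*?)%>", re.DOTALL)


def classify(code: str) -> str:
    """Heuristic stand-in for start/middle/end expression detection."""
    stripped = code.strip()
    if re.match(r"(else|after|catch|rescue)\b", stripped):
        return "middle"
    if re.match(r"end\b", stripped):
        return "end"
    if re.search(r"\bdo$", stripped):
        return "start"
    if stripped.endswith("->"):
        # `fn ... ->` opens a block; a bare `pattern ->` is a clause.
        return "start" if re.search(r"\bfn\b", stripped) else "middle"
    return "expr"


def group_tags(template: str) -> list[str]:
    """Return one combined string per group, in order of completion."""
    groups = []  # finished groups
    stack = []   # open groups, innermost last
    for m in TAG.finditer(template):
        code = m.group(1)
        kind = classify(code)
        if kind == "start":
            stack.append([code])
        elif kind == "middle" and stack:
            stack[-1].append(code)
        elif kind == "end" and stack:
            stack[-1].append(code)
            groups.append(stack.pop())
        else:
            groups.append([code])  # plain expression: parsed on its own
    groups.extend(stack)  # unterminated groups, kept as-is
    return ["".join(g) for g in groups]


template = (
    "<%= Enum.map(@charges, fn charge -> %><% end) %>\n"
    "<%= Enum.map(@charges, fn charge -> %><% end) %>\n"
)
result = group_tags(template)
print(result)
```

With this scoping, each `Enum.map(...)` becomes its own sub-document, so whether a newline or a space separates the two groups no longer matters to the Elixir parser.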

Thirdly, there's a way to fix this particular case by parsing newlines in the EEx grammar and handing them over to the Elixir grammar when injecting (see here), but it's not general enough to handle some odd cases, such as the example above with all four tags on the same line.
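To make the limitation concrete, here is a small sketch of the newline hand-over, assuming the newline that directly follows a closing `%>` is included in the injected text. The regex and the `combined_view_with_newlines` helper are hypothetical and only approximate what the linked change does.

```python
import re

# Simplified matcher (assumption): a tag plus an optional newline right after it.
TAG_WITH_NEWLINE = re.compile(r"<%=?(.*?)%>(\n?)", re.DOTALL)


def combined_view_with_newlines(template: str) -> str:
    """Join tag contents, keeping a newline that directly follows a tag."""
    return "".join(
        m.group(1) + m.group(2) for m in TAG_WITH_NEWLINE.finditer(template)
    )


pair = "<%= Enum.map(@charges, fn charge -> %><% end) %>"

two_lines = pair + "\n" + pair  # tags split across two lines
one_line = pair + pair          # all four tags on the same line

view_two = combined_view_with_newlines(two_lines)
view_one = combined_view_with_newlines(one_line)
print(repr(view_two))
print(repr(view_one))
```

In the two-line case the preserved newline acts as a terminator between the two calls. In the one-line case there is no newline to hand over, so the injected text is the same as before and the second call still lacks a terminator.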

The last resort as I see it would be to rewrite the EEx grammar to depend on the Elixir grammar, so it can have fine-grained control over how expressions within EEx tags are terminated. This seems potentially brittle and certainly more of a maintenance burden, though, so I think this should be avoided if possible.

What do you think @jonatanklosko?

see also connorlay/tree-sitter-eex#1
connects #2
