Skip to content

Format loop statements #103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Format loop statements #103

wants to merge 4 commits into from

Conversation

chellmuth
Copy link
Contributor

Implements #102

I thought loop would be a simple control statement to start with. I ended up hitting a few issues that I am not sure the best way to handle:

  1. It seems expressions rewrite themselves by returning a String that is pushed onto the ChangeSet after the entire expression is formatted. Everything else seems to push onto the ChangeSet as it goes.

loop causes the two techniques to mix. I match ast::Expr_::ExprLoop in rewrite_expr, which wants to create a String, but ExprLoop needs to format its block, which is done via direct ChangeSet manipulation (visit::walk_block(self, block) -> FmtVisitor::visit_stmt).

I started with the simplest thing I could think of, which was writing directly to the ChangeSet inside rewrite_loop. Maybe I should instead be handling the ExprLoop inside visit_stmt (matching inside a StmtExpr) so I don't have to pretend to build a String?

  1. The format_missing* methods confuse me, and so does what the formatting rules should be when there are comments in awkward places. I wouldn't be surprised if I made some mistakes here.

I put a bunch of ugly comments in tests/source/loops.rs. The only one this implementation misses is comment1. What's the best way to keep track that we're currently dropping a comment here? Should I remove it from the test?

  1. I had to manually add the semi-colon to the last statement in the loop's block. I'm not sure if there is a better way to handle that.

@marcusklaas
Copy link
Contributor

I struggled a bit with this split between returning a String and pushing to ChangeSet directly myself when I tried formatting if/else statements. I personally don't think pushing to ChangeSet here is a good idea, because it's not what rewrite_* promises to do: build a String without side effects. Ideally, we'd move the code from walk_block to rewrite_block, which would return a String. walk_block could then just call rewrite_block and push that string onto the change set. It's a bit more work, but it would keep things tidy.

That's what I think anyway. @nrc probably has much better insight.

@chellmuth
Copy link
Contributor Author

Thanks for the feedback @marcusklaas !

Yep, pushing to the ChangeSet from a rewrite_* method is dumb. I agree, changing walk_block() to rewrite_block() seems like the right move, but then rewrite_block needs to return a String, which means everything touchable by rewrite_block needs to be changed from visit_* to rewrite_* as well.

What is the reason all these methods are visit_* to begin with, instead of rewrite_*? My guess is that it's because anything called as a visit::Visitor callback has no caller to return a String to and is forced to push onto the ChangeSet. You said you hit this issue with if/else, if I started on this would I be duplicating your efforts?

@nrc
Copy link
Member

nrc commented Jun 11, 2015

I agree with @marcusklaas, the rewrite_* methods should not push to the ChangeSet.

@chellmuth the reason for the visti_* methods pushing to the changeset is that it was simply easier to make it work that way, especially with the interaction with comments and other 'missing' code. It seemed when I wrote this that there was a nice split between methods which worked inline, and those that worked in blocks, but that does not seem to be the case.

This is, I think, a hard question and needs some thought and experimentation. I'll describe my initial thoughts, but they might be totally impractical, so take this with a grain of salt:

We continue to work in these two modes - which I call inline and block, but we no longer require that block items are 'outside' inline items. We add a data structure which is something like a ChangeSet, but a bit more lightweight - it should abstract out the stuff that deals with maintaining a buffer and the associated code (currently scattered all over the place) for dealing with comments, etc. At the top level we have one of these buffers backed by the ChangeSet (or the bits which are left over, which should be just the connection to actual files, iirc). When we hit inline mode we move to the immutable, returning a string approach. When we hit an expression which is a block kind, then we create a new buffer thing and start working in block mode, appending to the buffer (this could include moving back to inline mode and then more nested buffers). At the end of all this we return the buffer to the method which is operating in inline mode and append that to the current String it is operating on.

Now, in order to avoid massive copies, we probably want the inline methods to start using a custom buffer too which can accept the block buffers as well as Strings (this will hopefully give us better perf as well because we can control allocation and in the future be easier to combine with some kind of specialised arena allocation). So in both cases we actually end up writing to buffers and returning them. However, the mode in which we operate is quite different between the two kinds of methods.

@chellmuth, @marcusklaas does this architecture change make sense? Do you think it will work? Does the work with the work either of you have already done? Or does it break any of it? Do either of you want to implement something like this?

@marcusklaas
Copy link
Contributor

@nrc Would this mean that all the functions that currently write to ChangeSet would write to the current light weight block buffer (which we would track in the Visitor)?

And what kind of data structure do you have in mind for the block mode? I suppose something that is allows for efficient appends and decent (possibly destructive) start-to-end reads. Is this something ropes are good at? I think a String with a decent initial capacity (say 1000 bytes) could be good enough as a start.

Edit: I've only briefly tried my hand at this problem and gave up when I encountered this problem. Currently working on some other things, so if you want to give this a try, @chellmuth, feel free!

@chellmuth
Copy link
Contributor Author

I think the broad strokes of the architecture make sense, and it would definitely fix the problem with my loop implementation. I think the nestable side-buffer is the key piece.

I still don't understand long-term why there is a need for two modes. I probably need to build more intuition with the codebase, as you say it was just easier to build most of the formatters that way but I can't see why yet. It feels to me that once we got rid of the visit:: calls and walk the entire AST ourselves, everything could be done in inline-mode.

Here's my understanding of what you're describing, using loop for an example,

loop { // inline mode, written to inline buffer

    // rewrite_loop calls visit::block()
    // enter visit_stmt callback, switch to block mode, write to side-buffer
    let x = 1; // written to side-buffer
    // return to rewrite_loop from visit::block() call, 
    //   append side-buffer content to inline buffer

} // append to inline buffer
// We're done, return the inline buffer (String) to caller

If my understanding is right, I wouldn't mind taking a crack at a proof-of-concept/prototype.

@nrc
Copy link
Member

nrc commented Jun 12, 2015

Long term, I hope we don't need to use the libsyntax visitor at all, and then perhaps we don't need two modes. That is a long way off though. We might also get better performance by amending a buffer rather than returning strings where possible. I'm not sure how much difference makes.

For a data structure, there is a StringBuffer that we already use in ChangeSet (I think), part of the strings crate, which should have the right performance characteristics. Ropes are better for when you need to replace or insert substrings.

The example looks almost right to me, I think. I think I would start writing to the side buffer once I enter the block, rather than in visit_stmt. I.e., if there are 10 statements, I make a single side buffer, not 10 of them. It's not clear if that is what you mean since there is only one stmt.

@nrc nrc mentioned this pull request Jun 15, 2015
@nrc
Copy link
Member

nrc commented Jun 23, 2015

@chellmuth ping - did you get anywhere with an implementation of the refactoring? Are there commits in this PR that we can land in the meantime? Or should I close it for now?

@marcusklaas marcusklaas mentioned this pull request Jul 13, 2015
@nrc
Copy link
Member

nrc commented Jul 15, 2015

Closing in favour of #128

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants