-
Notifications
You must be signed in to change notification settings - Fork 931
Format loop statements #103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I struggled a bit with this split between returning a That's what I think anyway. @nrc probably has much better insight. |
Thanks for the feedback @marcusklaas ! Yep, pushing to the What is the reason all these methods are |
I agree with @marcusklaas, the rewrite_* methods should not push to the ChangeSet. @chellmuth the reason for the visti_* methods pushing to the changeset is that it was simply easier to make it work that way, especially with the interaction with comments and other 'missing' code. It seemed when I wrote this that there was a nice split between methods which worked inline, and those that worked in blocks, but that does not seem to be the case. This is, I think, a hard question and needs some thought and experimentation. I'll describe my initial thoughts, but they might be totally impractical, so take this with a grain of salt: We continue to work in these two modes - which I call inline and block, but we no longer require that block items are 'outside' inline items. We add a data structure which is something like a ChangeSet, but a bit more lightweight - it should abstract out the stuff that deals with maintaining a buffer and the associated code (currently scattered all over the place) for dealing with comments, etc. At the top level we have one of these buffers backed by the ChangeSet (or the bits which are left over, which should be just the connection to actual files, iirc). When we hit inline mode we move to the immutable, returning a string approach. When we hit an expression which is a block kind, then we create a new buffer thing and start working in block mode, appending to the buffer (this could include moving back to inline mode and then more nested buffers). At the end of all this we return the buffer to the method which is operating in inline mode and append that to the current String it is operating on. Now, in order to avoid massive copies, we probably want the inline methods to start using a custom buffer too which can accept the block buffers as well as Strings (this will hopefully give us better perf as well because we can control allocation and in the future be easier to combine with some kind of specialised arena allocation). So in both cases we actually end up writing to buffers and returning them. However, the mode in which we operate is quite different between the two kinds of methods. @chellmuth, @marcusklaas does this architecture change make sense? Do you think it will work? Does the work with the work either of you have already done? Or does it break any of it? Do either of you want to implement something like this? |
@nrc Would this mean that all the functions that currently write to And what kind of data structure do you have in mind for the block mode? I suppose something that is allows for efficient appends and decent (possibly destructive) start-to-end reads. Is this something ropes are good at? Edit: I've only briefly tried my hand at this problem and gave up when I encountered this problem. Currently working on some other things, so if you want to give this a try, @chellmuth, feel free! |
I think the broad strokes of the architecture make sense, and it would definitely fix the problem with my loop implementation. I think the nestable side-buffer is the key piece. I still don't understand long-term why there is a need for two modes. I probably need to build more intuition with the codebase, as you say it was just easier to build most of the formatters that way but I can't see why yet. It feels to me that once we got rid of the Here's my understanding of what you're describing, using loop for an example,
If my understanding is right, I wouldn't mind taking a crack at a proof-of-concept/prototype. |
Long term, I hope we don't need to use the libsyntax visitor at all, and then perhaps we don't need two modes. That is a long way off though. We might also get better performance by amending a buffer rather than returning strings where possible. I'm not sure how much difference makes. For a data structure, there is a StringBuffer that we already use in ChangeSet (I think), part of the strings crate, which should have the right performance characteristics. Ropes are better for when you need to replace or insert substrings. The example looks almost right to me, I think. I think I would start writing to the side buffer once I enter the block, rather than in visit_stmt. I.e., if there are 10 statements, I make a single side buffer, not 10 of them. It's not clear if that is what you mean since there is only one stmt. |
@chellmuth ping - did you get anywhere with an implementation of the refactoring? Are there commits in this PR that we can land in the meantime? Or should I close it for now? |
Closing in favour of #128 |
Implements #102
I thought loop would be a simple control statement to start with. I ended up hitting a few issues that I am not sure the best way to handle:
String
that is pushed onto theChangeSet
after the entire expression is formatted. Everything else seems to push onto theChangeSet
as it goes.loop causes the two techniques to mix. I match
ast::Expr_::ExprLoop
inrewrite_expr
, which wants to create aString
, butExprLoop
needs to format its block, which is done via directChangeSet
manipulation (visit::walk_block(self, block)
->FmtVisitor::visit_stmt
).I started with the simplest thing I could think of, which was writing directly to the
ChangeSet
insiderewrite_loop
. Maybe I should instead be handling theExprLoop
insidevisit_stmt
(matching inside aStmtExpr
) so I don't have to pretend to build aString
?format_missing*
methods confuse me, and so does what the formatting rules should be when there are comments in awkward places. I wouldn't be surprised if I made some mistakes here.I put a bunch of ugly comments in
tests/source/loops.rs
. The only one this implementation misses iscomment1
. What's the best way to keep track that we're currently dropping a comment here? Should I remove it from the test?