-
Notifications
You must be signed in to change notification settings - Fork 931
Add a Rewrite trait #107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a Rewrite trait #107
Conversation
I have been thinking that the current architecture - a billion methods on one big context object is very non-optimal and I would love to improve it. There might well be a place for a Some details on this proposal:
|
|
||
pub enum Comment { | ||
Simple(String), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you envisage this enum growing in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least adding a DocComment
and InnerDocComment
. We need to know it if we handle any reformating of the comments.
@marcusklaas thoughts? |
|
btw, cc #103 where there was also some discussion of re-architecting. |
CommentsThere is no formal statement on how comments work with the AST. Currently doc comments are in the AST and regular comments are not. But this is an implementation detail and subject to change (long term, I hope all comments will be in the AST, but who knows(it's not entirely clear how that would work)). So, currently comments are 'between' AST nodes. That makes it kind of hard to decide where they should be semantically attached. E.g.:
At the moment there are some slightly odd heuristics about working this out in lists and we guess or fail else where. I think that at the moment we have to be pretty crude with comments. We can probably have better heuristics, but we're not going to be perfect. I think that is OK - comments are kind of an edge case and people won't expect us to get them perfect (as long as we don't drop any, which we do in a few places atm). |
re the Codemap, we should probably keep some kind of context struct which includes a reference to the Codemap. I expect we will want other info in there in the future. That should be one of the arguments to rewrite, I think. |
I think the discussion in #103 is somewhat orhogonal to the discussion here. We're only speaking of inline mode here. Still two points:
|
CommentsWhile the parsing is based on add heurisics, maybe we should define clearly what's the goal of these heuristics, ie what meaning and relation to the AST we will have after the comments have been parsed. But a precise definition of this is though not needed to implement struct AAAAAAAAAAAAAAA; // a long comment, the line goes over 100 chars
fn a(x: X /* comment */, y: Y /* comment */, z: Z) to // a long comment, the line goes over 100 chars
struct AAAAAAAAAAAAAAA;
fn a(x: X, // comment
y: Y, // comment
z: Z) with some not so complicated heuristics. I think it's worth the cost, while it will not be perfect. Another fact the pushes me in this direction is that we already use this kind of pattern in The exact form of the comment information can be anything, from a simple string, to a big struct, |
Interesting PR! I definitely believe that pulling logic from the Visitor is the way to go. The idea of returning comments separately to the caller also appeals to me. Doing transformations as in your last example would be very nice! I do have some concerns regarding the implementation, however. Currently, most comments are pulled from the 'space' between the current span and the last span. I don't see how we could do this without an context. Also, what do with items with multiple comments? For example: // Comment 1
fn some_func() /* Comment 2 */ {
...
} The |
I think the context strcture suggested by @nrc is a good solution for that. It could include a Span going from the end of the preceding AST element to the following one. EDIT: In fact, not a good idea, see my next comment. The handling of multiple comments is an interesting question. They could be represented as a variant for Do you have a reason for which you want to avoid implementing I will update this PR to implement things we discussed, but it may take a few days. |
I changed the content of the PR, removing the impl for list (which was not very representative and a bit messy) in favor of a impl for The diff, ignoring whitespaces (there is a lot of reindentation) is essentially made of trivial fixes. The biggest difference being that all I think this shows that the In this commit, I completely avoided the tricky part of the question: comments. |
CommentsUpon reflection, I think I've been wrong with that. It's probably better to leave comments outside of the result of
So, I suggest to keeping the current approach of comments, taking convention for
We could though to have a type representing a comment, which could implement Rewrite. This would allow to keep a uniform formatting of comments while leaving room for ad-hoc heuristics in the parsing and placement. |
I'm a bit irresolute how to make this PR move forward: should I make a big move to the use of |
@cassiersg sorry I haven't had a chance to look at the revised PR. I'll try and do so over the weekend. I would advise against a big all over change. Is it possible to do it incrementally - just try it on one kind of item and see how it feels? I would hold off on doing anything major before I get a chance to read over the current stuff. |
@nrc All proposed changes may be made incrementally, and they are very simple for a basic "lifting only" implementation, which doesn't change anything in the logic of the code. However, in many cases, this will lead to a code that doesn't take advantage of the possibility to return None, and some functions will not respect the width constraint, because they didn't had such thing implemented. So, I would prefer an approach where we implement I implemented Rewrite for the Take you time to review this PR and if you feel you need other implementations to evaluate the benefits and drawbacks, don't hesitate to tell it to me. |
I have thought a bit about this. I think returning an (optional) I now quite like the solution that was proposed by @nrc here. If I understand correctly, we would give expression rewriters a generic buffer which it can append. This could be either a A rewrite function's signature could look something like this: fn rewrite_expr<B: Buf>(buf: &mut B, ex: ast::Expr, codemap: &Codemap, width: usize, offset: usize) -> Option<()> where What do you think? |
I totally agree with you that the current proposal is very bad in terms of performance. My idea was first to make something simple with I think @nrc has also thought to use better datastructures to represent strings, as his strings.rs crate is a dependency of rustfmt. The option you suggest is interesting, and is probably not far of the optimum performance while we don't make too much try/discard. I'm not clear how we do such discards: does it need to allocate a new I think there will be lots of places in the code where we might want to discard some formatting already done: when we need to rewrite some sub-items when we rewrite an item, if the formatting of the second sub-item fails, we must cancel the first. Another point is out-of-order reformatting: how would fit the current rewriting of lists (major place where we don't rewrite things in a "streaming" way) ? Would that also need to allocate new buffers ? If all these things need new buffers, we won't probably gain a lot of performance over returning String, since these are omnipresent. All these concerns are valid mainly in deeply nested contexts. In more linear context such as blocks, where we append in-order, without the need of "discard point" between each instruction, a buffer is probably well suited. If there isn't major perfomance problems, I think it is simpler to work with code that returns something than with code that modifies a shared buffer, especially when discard is needed. As with most performance-related questions, it's hard to evaluate the exact impact without significant benchmarks, but i don't know if the code is sufficiently mature to make that. Does it make sense or I am missing a point here ? |
@cassiersg I think you are on point with all your questions/assumptions, and this is why we have the two different modes of operation - buffers vs returning strings. Furthermore, currently, whenever we are in buffer mode, formatting is infallible, so we never need to 'undo' formatting. Earlier, I had a version which did do modification, and that is why there is a rope in strings.rs. I'm not sure exactly how to make that efficient, but I think it can be done. |
Some(result) | ||
} | ||
|
||
fn rewrite_call(context: &RewriteContext, | ||
callee: &ast::Expr, | ||
args: &[ptr::P<ast::Expr>], | ||
width: usize, | ||
offset: usize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the indentation here (this should make us fail a test, I wonder why travis passes?)
@cassiersg I'd like to keep thinking about the long term and an efficient solution for the buffered case, but this PR looks good as is (modulo the small comments), so I'd like to land it. Thanks for working on it! |
@nrc I fixed style issues, and added a |
#[macro_export] | ||
macro_rules! try_opt_collect { | ||
($expr:expr) => ({ | ||
let mut res = Vec::with_capacity($expr.size_hint().0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could using $expr
twice have unintended consequences? When it alters mutable state, for example? It think we're evaluating the expression twice anyway, which is not optimal.
Maybe we should let e = $expr;
first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such a thing is already in the std libaries, by the way. See the first implementor of: http://doc.rust-lang.org/std/iter/trait.FromIterator.html.
let vec: Vec<_> = try_opt_collect!(option_iter)
should be equivalent to
let vec: Vec<_> = try_opt!(option_iter.collect())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't found this impl of IntoIterator for Option. That's exactly what I need, thanks for the pointer.
Combined with try_opt!, this avoid an explicit for loop or another macro.
This PR is just a proposal of a new trait, and is primarily intended for discussion. It's a draft of implementation of an idea I wanted to share and discuss. It's just an idea, that will maybe not well work.
Because it might change a lot of code, I prefer to discuss it before a complete implementation.
Motivation
We have a recurrent pattern in the code: rewrite some code with a given offset and width. (It's very common in
expr.rs
, for example).I wanted to abstract "something that can be rewritten into offset and width" when working on list formatting. Format a list (of args, types parameters...) is also something useful. With the current
lists
module, it needs some recurrent code, essentially : rewrite all items of the list within a maximal width, but the last one within the given width.I found a lot of these patterns when I tried to format types (
ast::Ty
).Design
The idea is quite simple: add a new trait
Rewrite
to represent therewrite
returns anOption
to express the fact that it's sometime not possible to not excess the given width. (Should it be aResult
instead ?) The first element of the tuple (String
) is the code formatted, and the second is a possible comment. Getting the comment separated of the code gives more degrees of freedom to the caller : it can put the comment on an own line before the code, of at the end of the line. The enumComment
is intended to be able to make a difference between a doc comment and another kind of comment (we will eventually replace bloc comments with line comments), but it's maybe not useful and a simple string would be sufficient.The current implementation of this trait to format lists is only used by a place-holder for the
write_list
function.At the very end, all
rewrite_*
functions could be impls ofRewrite
.When it comes to implement and use this trait, things are getting wrong : it makes sense to implement it for
ast::Expr
, and usingrewrite_expr
as therewrite
method, but it doesn't work sincerewrite_expr
is itself a method ofFmtVisitor
.I see two solutions to this problem : in the longterm, we might have a self-sufficient AST, and
FmtVisitor
would not more be needed. Currently, it's not possible though. The second solution is to create structs containing the item we want to format and a ref to theFmtVisitor
. I don't know if this approach would be sustainable if we use it extensively. There is maybe another solution. (?)I tried also an alternative, which is a more ad-hoc solution : pass the rewriting function as argument of
rewrite_list
. It looked me a bit "hackish" and less extensible, but maybe it's only because i'm not used to that.So, please let me know what you think about it.
About the list formatting algorithm, and tuple formatting, the changes are not very relevant, it's just an example of impl of
Rewrite
. The tuple formatting is a good example of use case forRewrite
: the same algorithm may be used for tuple litterals and for tuple types.