|
| 1 | +# Some thoughts on the design of rustfmt |
| 2 | + |
| 3 | +## Use cases |
| 4 | + |
| 5 | +A formatting tool can be used in different ways and the different use cases can |
| 6 | +affect the design of the tool. The use cases I'm particularly concerned with are: |
| 7 | + |
| 8 | +* running on a whole repo before check-in |
| 9 | + - in particular, to replace the `make tidy` pass on the Rust distro |
| 10 | +* running on code from another project that you are adding to your own |
| 11 | +* using for mass changes in code style over a project |
| 12 | + |
| 13 | +Some valid use cases for a formatting tool which I am explicitly not trying to |
| 14 | +address (although it would be nice, if possible): |
| 15 | + |
| 16 | +* running 'as you type' in an IDE |
| 17 | +* running on arbitrary snippets of code |
| 18 | +* running on Rust-like code, specifically code which doesn't parse |
| 19 | +* use as a pretty printer inside the compiler |
| 20 | +* refactoring |
| 21 | +* formatting totally unformatted source code |
| 22 | + |
| 23 | + |
| 24 | +## Scope and vision |
| 25 | + |
| 26 | +I do not subscribe to the notion that a formatting tool should only change |
| 27 | +whitespace. I believe that we should semantics preserving, but not necessarily |
| 28 | +syntax preserving, i.e., we can change the AST of a program. |
| 29 | + |
| 30 | +I.e., we might change glob imports to list or single imports, re-order imports, |
| 31 | +move bounds to where clauses, combine multiple impls into a single impl, etc. |
| 32 | + |
| 33 | +However, we will not change the names of variables or make any changes which |
| 34 | +*could* change the semantics. To be ever so slightly formal, we might imagine |
| 35 | +a compilers high level intermediate representation, we should strive to only |
| 36 | +make changes which do not change the HIR, even if they do change the AST. |
| 37 | + |
| 38 | +I would like to be able to output refactoring scripts for making deeper changes |
| 39 | +though. (E.g., renaming variables to satisfy our style guidelines). |
| 40 | + |
| 41 | +My long term goal is that all style lints can be moved from the compiler to |
| 42 | +rustfmt and, as well as warning, can either fix problems or emit refactoring |
| 43 | +scripts to do so. |
| 44 | + |
| 45 | +### Configurability |
| 46 | + |
| 47 | +I believe reformatting should be configurable to some extent. We should read in |
| 48 | +options from a configuration file and reformat accordingly. We should supply at |
| 49 | +least a config file which matches the Rust style guidelines. |
| 50 | + |
| 51 | +There should be multiple modes for running the tool. As well as simply replacing |
| 52 | +each file, we should be able to show the user a list of the changes we would |
| 53 | +make, or show a list of violations without corrections (the difference being |
| 54 | +that there are multiple ways to satisfy a given set of style guidelines, and we |
| 55 | +should distinguish violations from deviations from our own model). |
| 56 | + |
| 57 | + |
| 58 | +## Implementation philosophy |
| 59 | + |
| 60 | +Some details of the philosophy behind the implementation. |
| 61 | + |
| 62 | + |
| 63 | +### Operate on the AST |
| 64 | + |
| 65 | +A reformatting tool can be based on either the AST or a token stream (in Rust |
| 66 | +this is actually a stream of token trees, but its not a fundamental difference). |
| 67 | +There are pros and cons to the two approaches. I have chosen to use the AST |
| 68 | +approach. The primary reasons are that it allows us to do more sophisticated |
| 69 | +manipulations, rather than just change whitespace, and it gives us more context |
| 70 | +when making those changes. |
| 71 | + |
| 72 | +The advantage of the tokens approach are that you can operate on non-parsable |
| 73 | +code. I don't care too much about that, it would be nice, but I think being able |
| 74 | +to perform sophisticated transformations is more important. In the future I hope to |
| 75 | +(optionally) be able to use type information for informing reformatting too. One |
| 76 | +specific case of unparsable code is macros. Using tokens is certainly easier |
| 77 | +here, but I believe it is perfectly solvable with the AST approach. At the limit, |
| 78 | +we can operate on just tokens in the macro case. |
| 79 | + |
| 80 | +I believe that there is not in fact that much difference between the two |
| 81 | +approaches. Due to imperfect span information, under the AST approach, we |
| 82 | +sometimes are reduced to examining tokens or do some re-lexing of our own. Under |
| 83 | +the tokens approach you need to implement your own (much simpler) parser. I |
| 84 | +believe that as the tool gets more sophisticated, you end up doing more at the |
| 85 | +token-level, or having an increasingly sophisticated parser, until at the limit |
| 86 | +you have the same tool. |
| 87 | + |
| 88 | +However, I believe starting from the AST gets you more quickly to a usable and |
| 89 | +useful tool. |
| 90 | + |
| 91 | + |
| 92 | +### Heuristic rather than algorithmic |
| 93 | + |
| 94 | +Many formatting tools use a very general algorithmic or even algebraic tool for |
| 95 | +pretty printing. This results in very elegant code, but I believe does not give |
| 96 | +the best results. I prefer a more ad hoc approach where each expression/item is |
| 97 | +formatted using custom rules. We hopefully don't end up with too much code due |
| 98 | +to good old fashioned abstraction and code sharing. This will give a bigger code |
| 99 | +base, but hopefully a better result. |
| 100 | + |
| 101 | +It also means that there will be some cases we can't format and we have to give |
| 102 | +up. I think that is OK. Hopefully they are rare enough that manually fixing them |
| 103 | +is not painful. Better to have a tool that gives great code in 99% of cases and |
| 104 | +fails in 1% than a tool which gives 50% great code and 50% ugly code, but never |
| 105 | +fails. |
| 106 | + |
| 107 | + |
| 108 | +### Incremental development |
| 109 | + |
| 110 | +I want rustfmt to be useful as soon as possible and to always be useful. I |
| 111 | +specifically don't want to have to wait for a feature (or worse, the whole tool) |
| 112 | +to be perfect before it is useful. The main ways this is achieved is to output |
| 113 | +the source code where we can't yet reformat, be able to turn off new features |
| 114 | +until they are ready, and the 'do no harm' principle (see next section). |
| 115 | + |
| 116 | + |
| 117 | +### First, do no harm |
| 118 | + |
| 119 | +Until rustfmt it perfect, there will always be a trade-off between doing more and |
| 120 | +doing existing things well. I want to err on the side of the latter. |
| 121 | +Specifically, rustfmt should never take OK code and make it look worse. If we |
| 122 | +can't make it better, we should leave it as is. That might mean being less |
| 123 | +aggressive than we like or using configurability. |
| 124 | + |
| 125 | + |
| 126 | +### Use the source code as guidance |
| 127 | + |
| 128 | +There are often multiple ways to format code and satisfy standards. Where this |
| 129 | +is the case, we should use the source code as a hint for reformatting. |
| 130 | +Furthermore, where the code has been formatted in a particular way that satisfies |
| 131 | +the coding standard, it should not be changed (this is sometimes not possible or |
| 132 | +not worthwhile due to uniformity being desirable, but it is a useful goal). |
| 133 | + |
| 134 | + |
| 135 | +### Architecture details |
| 136 | + |
| 137 | +We use the AST from libsyntax. We use libsyntax's visit module to walk the AST |
| 138 | +to find starting points for reformatting. Eventually, we should reformat everything |
| 139 | +and we shouldn't need the visit module. We keep track of the last formatted |
| 140 | +position in the code, and when we reformat the next piece of code we make sure |
| 141 | +to output the span for all the code in between (handled by missed_spans.rs). |
| 142 | + |
| 143 | +Our visitor keeps track of the desired current indent due to blocks ( |
| 144 | +`block_indent`). Each `visit_*` method reformats code according to this indent |
| 145 | +and `IDEAL_WIDTH` and `MAX_WIDTH` (which should one day be supplied from a |
| 146 | +config file). Most reformatting done in the `visit_*` methods is a bit hackey |
| 147 | +and is meant to be temporary until it can be done properly. |
| 148 | + |
| 149 | +There are a bunch of methods called `rewrite_*`. There do the bulk of the |
| 150 | +reformatting. These take the AST node to be reformatted (this may not literally |
| 151 | +be an AST node from libsyntax, there might be multiple parameters describing a |
| 152 | +logical node), the current indent, and the current width budget. They return a |
| 153 | +`String` (or sometimes an `Option<String>`) which formats the code in the box |
| 154 | +given by the indent and width budget. If the method fails, it returns `None` and |
| 155 | +the calling method then has to fallback in some way to give the callee more space. |
| 156 | + |
| 157 | +So, in summary to format a node, we calculate the width budget and then walk down |
| 158 | +the tree from the node. At a leaf, we generate an actual string and then unwind, |
| 159 | +combining these strings as we go back up the tree. |
| 160 | + |
| 161 | +For example, consider a method definition: |
| 162 | + |
| 163 | +``` |
| 164 | + fn foo(a: A, b: B) { |
| 165 | + ... |
| 166 | + } |
| 167 | +``` |
| 168 | + |
| 169 | +We start at indent 4, the rewrite function for the whole function knows it must |
| 170 | +write `fn foo(` before the arguments and `) {` after them, assuming the max width |
| 171 | +is 100, it thus asks the rewrite argument list function to rewrite with an indent |
| 172 | +of 11 and in a width of 86. Assuming that is possible (obviously in this case), |
| 173 | +it returns a string for the arguments and it can make a string for the function |
| 174 | +header. If the arguments couldn't be fitted in that space, we might try to |
| 175 | +fallback to a hanging indent, so we try again with indent 8 and width 89. |
0 commit comments