|
| 1 | +- Start Date: (fill me in with today's date, YYYY-MM-DD) |
| 2 | +- RFC PR #: (leave this empty) |
| 3 | +- Rust Issue #: (leave this empty) |
| 4 | + |
| 5 | +# Summary |
| 6 | + |
| 7 | +- Convert function call `a(b, ..., z)` into an overloadable operator |
| 8 | + via the traits `Fn<A,R>`, `FnShare<A,R>`, and `FnOnce<A,R>`, where `A` |
| 9 | + is a tuple `(B, ..., Z)` of the types `B...Z` of the arguments |
| 10 | + `b...z`, and `R` is the return type. The three traits differ in |
| 11 | + their self argument (`&mut self` vs `&self` vs `self`). |
| 12 | +- Remove the `proc` expression form and type. |
| 13 | +- Remove the closure types (though the form lives on as syntactic |
| 14 | + sugar, see below). |
| 15 | +- Modify closure expressions to permit specifying by-reference vs |
| 16 | + by-value capture and the receiver type: |
| 17 | + - Specifying by-reference vs by-value closures: |
| 18 | + - `ref |...| expr` indicates a closure that captures upvars from the |
| 19 | + environment by reference. This is what closures do today and the |
| 20 | + behavior will remain unchanged, other than requiring an explicit |
| 21 | + keyword. |
| 22 | + - `|...| expr` will therefore indicate a closure that captures upvars |
| 23 | + from the environment by value. As usual, this is either a copy or |
| 24 | + move depending on whether the type of the upvar implements `Copy`. |
| 25 | + - Specifying receiver mode (orthogonal to capture mode above): |
| 26 | + - `|a, b, c| expr` is equivalent to `|&mut: a, b, c| expr` |
| 27 | + - `|&mut: ...| expr` indicates that the closure implements `Fn` |
| 28 | + - `|&: ...| expr` indicates that the closure implements `FnShare` |
| 29 | + - `|: a, b, c| expr` indicates that the closure implements `FnOnce`. |
| 30 | +- Add syntactic sugar where `|T1, T2| -> R1` is translated to |
| 31 | + a reference to one of the fn traits as follows: |
| 32 | + - `|T1, ..., Tn| -> R` is translated to `Fn<(T1, ..., Tn), R>` |
| 33 | + - `|&mut: T1, ..., Tn| -> R` is translated to `Fn<(T1, ..., Tn), R>` |
| 34 | + - `|&: T1, ..., Tn| -> R` is translated to `FnShare<(T1, ..., Tn), R>` |
| 35 | + - `|: T1, ..., Tn| -> R` is translated to `FnOnce<(T1, ..., Tn), R>` |
| 36 | + |
| 37 | +One aspect of closures that this RFC does *not* describe is that we |
| 38 | +must permit trait references to be universally quantified over regions |
| 39 | +as closures are today. This change is sketched below |
| 40 | +under *Unresolved questions*; the details will come in a |
| 41 | +forthcoming RFC. |
| 42 | + |
| 43 | +# Motivation |
| 44 | + |
| 45 | +Over time we have observed a very large number of possible use cases |
| 46 | +for closures. The goal of this RFC is to create a unified closure |
| 47 | +model that encompasses all of these use cases. |
| 48 | + |
| 49 | +Specific goals (explained in more detail below): |
| 50 | + |
| 51 | +1. Give control over inlining to users. |
| 52 | +2. Support closures that bind by reference and closures that bind by value. |
| 53 | +3. Support different means of accessing the closure environment, |
| 54 | + corresponding to `self`, `&self`, and `&mut self` methods. |
| 55 | + |
| 56 | +As a side benefit, though not a direct goal, the RFC reduces the |
| 57 | +size/complexity of the language's core type system by unifying |
| 58 | +closures and traits. |
| 59 | + |
| 60 | +## The core idea: unifying closures and traits |
| 61 | + |
| 62 | +The core idea of the RFC is to unify closures, procs, and |
| 63 | +traits. There are a number of reasons to do this. First, it simplifies |
| 64 | +the language, because closures, procs, and traits already serve |
| 65 | +similar roles and there was sometimes a lack of clarity about which |
| 66 | +would be the appropriate choice. In addition, the unification |
| 67 | +offers increased expressiveness and power, because traits are a more |
| 68 | +generic model that gives users more control over optimization. |
| 69 | + |
| 70 | +The basic idea is that function calls become an overloadable operator. |
| 71 | +Therefore, an expression like `a(...)` will be desugared into an |
| 72 | +invocation of one of the following traits: |
| 73 | + |
| 74 | + trait Fn<A,R> { |
| 75 | + fn call(&mut self, args: A) -> R; |
| 76 | + } |
| 77 | + |
| 78 | + trait FnShare<A,R> { |
| 79 | + fn call(&self, args: A) -> R; |
| 80 | + } |
| 81 | + |
| 82 | + trait FnOnce<A,R> { |
| 83 | +        fn call(self, args: A) -> R; |
| 84 | + } |
| 85 | + |
| 86 | +Essentially, `a(b, c, d)` becomes sugar for one of the following: |
| 87 | + |
| 88 | + Fn::call(&mut a, (b, c, d)) |
| 89 | + FnShare::call(&a, (b, c, d)) |
| 90 | + FnOnce::call(a, (b, c, d)) |
| 91 | + |
| 92 | +To integrate with this, closure expressions are then translated into a |
| 93 | +fresh struct that implements one of those three traits. The precise |
| 94 | +trait is currently indicated using explicit syntax but may eventually |
| 95 | +be inferred. |
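To make the desugaring concrete, here is a minimal sketch in current stable Rust. Implementing the real call-operator traits by hand is not possible on stable Rust, so the sketch declares three stand-in traits, `RfcFn`, `RfcFnShare`, and `RfcFnOnce` (hypothetical names mirroring `Fn`, `FnShare`, and `FnOnce` above), and hand-translates a closure such as `|x| x + offset` into a fresh struct `AddOffset` (also hypothetical). Because only built-in closures get `a(b)` call syntax on stable Rust, the calls are written in the desugared `Trait::call(receiver, (args,))` form.

```rust
// Stand-ins for the RFC's three traits; `A` is always a tuple of argument types.
trait RfcFn<A, R> {
    fn call(&mut self, args: A) -> R;
}

trait RfcFnShare<A, R> {
    fn call(&self, args: A) -> R;
}

trait RfcFnOnce<A, R> {
    fn call(self, args: A) -> R;
}

// Hand translation of a closure like `|x| x + offset`:
// a fresh struct with one field per captured upvar.
struct AddOffset {
    offset: i32,
}

impl RfcFn<(i32,), i32> for AddOffset {
    fn call(&mut self, (x,): (i32,)) -> i32 {
        x + self.offset
    }
}

impl RfcFnShare<(i32,), i32> for AddOffset {
    fn call(&self, (x,): (i32,)) -> i32 {
        x + self.offset
    }
}

impl RfcFnOnce<(i32,), i32> for AddOffset {
    fn call(self, (x,): (i32,)) -> i32 {
        x + self.offset
    }
}

// Each `let` mirrors one of the three desugarings of `a(b)`.
fn desugared_calls() -> (i32, i32, i32) {
    let mut a = AddOffset { offset: 10 };
    let by_mut = RfcFn::call(&mut a, (1,)); // Fn::call(&mut a, (b,))
    let by_ref = RfcFnShare::call(&a, (2,)); // FnShare::call(&a, (b,))
    let by_val = RfcFnOnce::call(a, (3,)); // FnOnce::call(a, (b,)), consumes `a`
    (by_mut, by_ref, by_val)
}
```

Note that a single argument is passed as the one-element tuple type `(i32,)`, so one trait definition covers every arity.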
| 96 | + |
| 97 | +This change gives users control over virtual vs static dispatch. This |
| 98 | +works in the same way as generic types today: |
| 99 | + |
| 100 | +    fn foo(x: &mut Fn<(int,),int>) -> int { |
| 101 | +        x(2) // virtual dispatch |
| 102 | +    } |
| 103 | + |
| 104 | +    fn foo<F:Fn<(int,),int>>(x: &mut F) -> int { |
| 105 | + x(2) // static dispatch |
| 106 | + } |
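This design was later adopted in Rust with different names: the `&mut self` trait described here is today's `FnMut`, the `&self` trait is `Fn`, and argument types use the parenthesized sugar `FnMut(i32) -> i32` rather than an explicit tuple. Under that mapping, a sketch of the two dispatch styles in current stable Rust (the names `foo_dyn`, `foo_static`, and `dispatch_demo` are illustrative):

```rust
// Virtual dispatch: `x` is a trait object, called through a vtable.
fn foo_dyn(x: &mut dyn FnMut(i32) -> i32) -> i32 {
    x(2)
}

// Static dispatch: monomorphized for each concrete closure type `F`.
fn foo_static<F: FnMut(i32) -> i32>(x: &mut F) -> i32 {
    x(2)
}

// Returns (dynamic result, static result, number of calls made).
fn dispatch_demo() -> (i32, i32, i32) {
    let mut calls = 0;
    let mut double = |v: i32| {
        calls += 1; // mutable state is why the `&mut self` trait is needed
        v * 2
    };
    let a = foo_dyn(&mut double);
    let b = foo_static(&mut double);
    (a, b, calls)
}
```

The same closure value works with both functions; only the calling convention differs.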
| 107 | + |
| 108 | +The change also permits returning closures, which is not currently |
| 109 | +possible (the example relies on the proposed `impl` syntax from |
| 110 | +rust-lang/rfcs#105): |
| 111 | + |
| 112 | +    fn foo(x: impl Fn<(int,),int>) -> impl Fn<(int,),int> { |
| 113 | + |v| x(v * 2) |
| 114 | + } |
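The `impl` syntax from rust-lang/rfcs#105 was later adopted, so an adaptation of this example compiles on current stable Rust (with `i32` in place of `int`, and `Fn(i32) -> i32` as today's spelling of the trait reference). One difference worth noting: the returned closure must be marked `move`, today's spelling of by-value capture, so that it owns `x` instead of borrowing a parameter that dies when `foo` returns.

```rust
// Takes any closure and returns a new closure that doubles its input first.
fn foo(x: impl Fn(i32) -> i32) -> impl Fn(i32) -> i32 {
    // `move` transfers ownership of `x` into the returned closure.
    move |v| x(v * 2)
}
```

For example, `foo(|n| n + 1)` returns a closure that maps `5` to `11`.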
| 115 | + |
| 116 | +Basically, in this design there is nothing special about a closure. |
| 117 | +Closure expressions are simply a convenient way to generate a struct |
| 118 | +that implements a suitable `Fn` trait. |
| 119 | + |
| 120 | +## Bind by reference vs bind by value |
| 121 | + |
| 122 | +When creating a closure, it is now possible to specify whether the |
| 123 | +closure should capture variables from its environment ("upvars") by |
| 124 | +reference or by value. The distinction is indicated using the leading |
| 125 | +keyword `ref`: |
| 126 | + |
| 127 | + || foo(a, b) // captures `a` and `b` by value |
| 128 | + |
| 129 | + ref || foo(a, b) // captures `a` and `b` by reference, as today |
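Current stable Rust draws the same distinction with the opposite default: a plain `|| ...` closure captures by reference, and the `move` keyword requests capture by value. A minimal sketch of the observable difference for a `Copy` upvar (`capture_modes` is a hypothetical name):

```rust
// Returns (what the by-value closure computes, final value of `a`).
fn capture_modes() -> (i32, i32) {
    let mut a = 1;

    // By value (`move`): the closure stores its own copy of `a` (i32 is `Copy`).
    let by_value = move || a * 10;

    // By reference (today's default): the closure mutably borrows `a`.
    let mut by_ref = || a += 1;
    by_ref();
    by_ref();

    // The copy inside `by_value` is unaffected by the two increments.
    (by_value(), a)
}
```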
| 130 | + |
| 131 | +### Reasons to bind by value |
| 132 | + |
| 133 | +Bind by value is useful when creating closures that will escape from |
| 134 | +the stack frame that created them, such as task bodies |
| 135 | +(`spawn(|| ...)`) or combinators. It is also useful for moving values out of a |
| 136 | +closure, though it should be possible to enable that with bind by |
| 137 | +reference as well in the future. |
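For instance, a thread body must not borrow from the spawning stack frame, because the thread may outlive it. A sketch using `std::thread::spawn` in current Rust, where by-value capture is spelled `move` (`sum_in_thread` is a hypothetical name):

```rust
use std::thread;

// The vector is moved into the closure, so the spawned thread owns it.
fn sum_in_thread(data: Vec<i32>) -> i32 {
    let handle = thread::spawn(move || data.iter().sum::<i32>());
    handle.join().expect("worker thread panicked")
}
```

Dropping `move` here is rejected by the compiler, because the closure would then borrow `data`, which `spawn` does not allow.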
| 138 | + |
| 139 | +### Reasons to bind by reference |
| 140 | + |
| 141 | +Bind by reference is useful for any case where the closure is known |
| 142 | +not to escape the creating stack frame. This frequently occurs |
| 143 | +when using closures to encapsulate common control-flow patterns: |
| 144 | + |
| 145 | + map.insert_or_update_with(key, value, || ...) |
| 146 | + opt_val.unwrap_or_else(|| ...) |
| 147 | + |
| 148 | +In such cases, the closure frequently wishes to read or modify local |
| 149 | +variables on the enclosing stack frame. Generally speaking, then, such |
| 150 | +closures should capture variables by-reference -- that is, they should |
| 151 | +store a reference to the variable in the creating stack frame, rather |
| 152 | +than copying the value out. Using a reference allows the closure to |
| 153 | +mutate the variables in place and also avoids moving values that are |
| 154 | +simply read temporarily. |
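A sketch of such a control-flow closure in current Rust, where the default closure already captures by reference: the fallback closure passed to `unwrap_or_else` increments a counter that lives in the enclosing frame. The names `lookup_or_default` and `fallbacks` are illustrative.

```rust
// Replaces each `None` with 0 and counts how often the fallback ran.
fn lookup_or_default(values: &[Option<i32>]) -> (Vec<i32>, u32) {
    let mut fallbacks = 0;
    let mut out = Vec::new();
    for &v in values {
        // The closure borrows `fallbacks` from this frame and bumps it in place.
        out.push(v.unwrap_or_else(|| {
            fallbacks += 1;
            0
        }));
    }
    (out, fallbacks)
}
```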
| 155 | + |
| 156 | +The vast majority of closures in use today are should be "by |
| 157 | +reference" closures. The only exceptions are those closures that wish |
| 158 | +to "move out" from an upvar (where we commonly use the so-called |
| 159 | +"option dance" today). In fact, even those closures could be "by |
| 160 | +reference" closures, but we will have to extend the inference to |
| 161 | +selectively identify those variables that must be moved and take those |
| 162 | +"by value". |
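For readers unfamiliar with the "option dance": a closure that must move a value out of an upvar, but is called through `&mut self`, can keep the value in an `Option` and `take()` it. A sketch in current Rust (`option_dance` is a hypothetical name):

```rust
fn option_dance() -> (String, String) {
    let mut slot = Some(String::from("payload"));

    // The closure only has `&mut` access to `slot`, so it cannot move it;
    // `take()` moves the String out and leaves `None` behind.
    let mut grab = || slot.take().unwrap_or_default();

    let first = grab(); // "payload"
    let second = grab(); // "" because `slot` is now `None`
    (first, second)
}
```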
| 163 | + |
| 164 | +# Detailed design |
| 165 | + |
| 166 | +## Closure expression syntax |
| 167 | + |
| 168 | +Closure expressions will have the following form (using EBNF notation, |
| 169 | +where `[]` denotes optional things and `{}` denotes a comma-separated |
| 170 | +list): |
| 171 | + |
| 172 | + CLOSURE = ['ref'] '|' [SELF] {ARG} '|' ['->' TYPE] EXPR |
| 173 | + SELF = ':' | '&' ':' | '&' 'mut' ':' |
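As a rough sketch (not the compiler's actual parser), the following Rust code recognizes the header portion `['ref'] '|' [SELF] {ARG} '|'` of this grammar and reports which of the RFC's traits the closure would implement. `Receiver`, `ClosureHeader`, `parse_self`, and `parse_header` are hypothetical names, and argument types containing commas are out of scope. It also shows why the `:` after `&` or `&mut` matters: without it, a leading `&` belongs to an argument pattern such as the `&b` used in this RFC's examples.

```rust
/// Receiver mode selected by the optional SELF prefix.
#[derive(Debug, PartialEq)]
enum Receiver {
    Once,  // `|: ...|`
    Share, // `|&: ...|`
    Mut,   // `|&mut: ...|`, or no SELF prefix (the default)
}

impl Receiver {
    /// Name of the trait from this RFC that the closure would implement.
    fn trait_name(&self) -> &'static str {
        match *self {
            Receiver::Once => "FnOnce",
            Receiver::Share => "FnShare",
            Receiver::Mut => "Fn",
        }
    }
}

#[derive(Debug, PartialEq)]
struct ClosureHeader {
    by_ref: bool,       // leading `ref` keyword present
    receiver: Receiver, // from the SELF production
    args: Vec<String>,  // raw ARG text, e.g. "b: int"
}

/// Tries to read a SELF prefix (`:`, `&:`, or `&mut:`); returns the rest.
fn parse_self(inner: &str) -> Option<(Receiver, &str)> {
    if let Some(rest) = inner.strip_prefix(':') {
        return Some((Receiver::Once, rest));
    }
    let rest = inner.strip_prefix('&')?.trim_start();
    let (receiver, rest) = match rest.strip_prefix("mut") {
        Some(r) => (Receiver::Mut, r.trim_start()),
        None => (Receiver::Share, rest),
    };
    // Without the trailing `:`, the `&` starts an argument pattern instead.
    rest.strip_prefix(':').map(|r| (receiver, r))
}

/// Parses `['ref'] '|' [SELF] {ARG} '|'` (the part before `->` and EXPR).
fn parse_header(src: &str) -> Option<ClosureHeader> {
    let s = src.trim();
    let (by_ref, rest) = match s.strip_prefix("ref") {
        Some(r) => (true, r.trim_start()),
        None => (false, s),
    };
    let inner = rest.strip_prefix('|')?.strip_suffix('|')?.trim();
    let (receiver, arg_src) = parse_self(inner).unwrap_or((Receiver::Mut, inner));
    let args = arg_src
        .split(',')
        .map(|a| a.trim().to_string())
        .filter(|a| !a.is_empty())
        .collect();
    Some(ClosureHeader { by_ref, receiver, args })
}
```

For example, `parse_header("ref |&: a, b: int|")` yields a by-reference header whose receiver maps to `FnShare`, with arguments `a` and `b: int`.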
| 174 | + ARG = ID [ ':' TYPE ] |
| 175 | + |
| 176 | +The optional keyword `ref` is used to indicate whether this closure |
| 177 | +captures *by reference* or *by value*. |
| 178 | + |
| 179 | +Closures are always translated into a fresh struct type with one field |
| 180 | +per upvar. In a by-value closure, the types of these fields will be |
| 181 | +the same as the types of the corresponding upvars (modulo `&mut` |
| 182 | +reborrows, see below). In a by-reference closure, the types of these |
| 183 | +fields will be a suitable reference (`&`, `&mut`, etc) to the |
| 184 | +variables being borrowed. |
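As an illustration, here are hand-written approximations of the two environment structs for a closure whose body is `count += 1; names.len()`, in current Rust syntax. The compiler's generated types are anonymous, so `ByValueEnv`, `ByRefEnv`, and `layouts_demo` are hypothetical names. In the by-value struct each field has the upvar's own type; in the by-reference struct each field is a borrow whose lifetime `'a` ties it to the creating frame, with `&mut` chosen for the mutated upvar and `&` for the one that is only read.

```rust
// By-value environment: fields have the same types as the upvars.
struct ByValueEnv {
    count: u32,
    names: Vec<String>,
}

// By-reference environment: each field borrows the corresponding variable.
struct ByRefEnv<'a> {
    count: &'a mut u32,      // mutated by the body, so `&mut`
    names: &'a Vec<String>,  // only read, so `&` suffices
}

// Both `call` methods run the same closure body against their own layout.
impl ByValueEnv {
    fn call(&mut self) -> usize {
        self.count += 1;
        self.names.len()
    }
}

impl<'a> ByRefEnv<'a> {
    fn call(&mut self) -> usize {
        *self.count += 1;
        self.names.len()
    }
}

// Returns (local `count` after both closures ran, count stored in `owned`).
fn layouts_demo() -> (u32, u32) {
    let mut count = 0;
    let names = vec![String::from("a"), String::from("b")];

    {
        let mut env = ByRefEnv { count: &mut count, names: &names };
        env.call();
        env.call();
    } // borrows end here; `count` was updated in place

    // `count` is copied and `names` is moved into the by-value environment.
    let mut owned = ByValueEnv { count, names };
    owned.call();
    (count, owned.count)
}
```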
| 185 | + |
| 186 | +### By-value closures |
| 187 | + |
| 188 | +The default form for a closure is by-value. This implies that all |
| 189 | +upvars which are referenced are copied/moved into the closure as |
| 190 | +appropriate. There is one special case: if the type of the value to be |
| 191 | +moved is `&mut`, we will "reborrow" the value when it is copied into |
| 192 | +the closure. That is, given an upvar `x` of type `&'a mut T`, the |
| 193 | +value which is actually captured will have type `&'b mut T` where |
| 194 | +`'b <= 'a`. This rule is consistent with our general treatment of `&mut`, |
| 195 | +which is to aggressively reborrow wherever possible; moreover, this |
| 196 | +rule cannot introduce additional compilation errors: it can only make |
| 197 | +more programs successfully typecheck. |
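Current Rust does not apply this rule automatically: a `move` closure that mentions a `&mut` variable moves the reference itself rather than reborrowing it. The effect can be approximated by reborrowing explicitly before building a by-value closure, as in this sketch (`reborrow_demo` is a hypothetical name). The closure owns the shorter-lived `&mut` (the `'b` above), and the original reference (`'a`) is usable again once the closure is gone.

```rust
fn reborrow_demo(v: &mut Vec<i32>) {
    {
        // Explicit reborrow: `r` lives for a shorter lifetime than `v`.
        let r = &mut *v;
        // By-value closure: moves `r` (not `v`) into its environment.
        let mut push = move |x: i32| r.push(x);
        push(1);
        push(2);
    } // `push`, and with it the reborrow, is dropped here

    // `v` itself was never moved, so it is still usable.
    v.push(3);
}
```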
| 198 | + |
| 199 | +### By-reference closures |
| 200 | + |
| 201 | +A *by-reference* closure is a convenience form in which values used in |
| 202 | +the closure are converted into references before being captured. |
| 203 | +By-reference closures can always be rewritten as by-value closures if |
| 204 | +desired, but the rewrite can often be cumbersome and annoying. |
| 205 | + |
| 206 | +Here is a (rather artificial) example of a by-reference closure in |
| 207 | +use: |
| 208 | + |
| 209 | + let in_vec: Vec<int> = ...; |
| 210 | + let mut out_vec: Vec<int> = Vec::new(); |
| 211 | + let opt_int: Option<int> = ...; |
| 212 | + |
| 213 | + opt_int.map(ref |v| { |
| 214 | + out_vec.push(v); |
| 215 | +        in_vec.iter().fold(v, |a, &b| a + b) |
| 216 | + }); |
| 217 | + |
| 218 | +This could be rewritten into a by-value closure as follows: |
| 219 | + |
| 220 | + let in_vec: Vec<int> = ...; |
| 221 | + let mut out_vec: Vec<int> = Vec::new(); |
| 222 | + let opt_int: Option<int> = ...; |
| 223 | + |
| 224 | + opt_int.map({ |
| 225 | + let in_vec = &in_vec; |
| 226 | +        let out_vec = &mut out_vec; |
| 227 | +        |v| { |
| 228 | +            out_vec.push(v); |
| 229 | +            in_vec.iter().fold(v, |a, &b| a + b) |
| 230 | +        } |
| 231 | +    }); |
| 232 | + |
| 233 | +In this case, the closure closes over two variables, `in_vec` and |
| 234 | +`out_vec`. In the `ref` form, the compiler automatically infers, for each |
| 235 | +variable, how it should be borrowed and inserts the appropriate |
| 236 | +capture; the by-value rewrite spells those borrows out by hand. |
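The by-value rewrite translates almost directly into current Rust, where the by-value closure is written with `move`. In this sketch (`process` is a hypothetical wrapper, and `i32` stands in for `int`), `move` copies the two references, not the vectors, into the closure; it returns the mapped result together with `out_vec` so the effect of the push is visible.

```rust
// By-value closure with explicit borrows, mirroring the rewrite above.
fn process(in_vec: Vec<i32>, opt_int: Option<i32>) -> (Option<i32>, Vec<i32>) {
    let mut out_vec: Vec<i32> = Vec::new();
    let result = opt_int.map({
        let in_vec = &in_vec; // shared borrow: only read
        let out_vec = &mut out_vec; // unique borrow: pushed to
        move |v| {
            out_vec.push(v);
            in_vec.iter().fold(v, |a, &b| a + b)
        }
    });
    // The closure has been consumed by `map`, so both borrows have ended.
    (result, out_vec)
}
```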
| 237 | + |
| 238 | +In the body of a `ref` closure, the upvars continue to have the same |
| 239 | +type as they did in the outer environment. For example, the type of a |
| 240 | +reference to `in_vec` in the above example is always `Vec<int>`, |
| 241 | +whether or not it appears as part of a `ref` closure. This is not only |
| 242 | +convenient, it is required to make it possible to infer whether each |
| 243 | +variable is borrowed as an `&T` or `&mut T` borrow. |
| 244 | + |
| 245 | +Note that there are some cases where the compiler internally employs a |
| 246 | +form of borrow that is not available in the core language, |
| 247 | +`&uniq`. This borrow does not permit aliasing (like `&mut`) but does |
| 248 | +not require mutability (like `&`). This is required to allow |
| 249 | +transparent closing over of `&mut` pointers as |
| 250 | +[described in this blog post][p]. |
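Current Rust still uses this kind of borrow for closure captures (the Rust Reference calls it a unique immutable borrow). In the sketch below (`bump_twice` is a hypothetical name), `r` is a non-`mut` binding of type `&mut i32`. The closure cannot take `&mut r` because `r` is not declared `mut`, and a shared `&r` would not allow writing through it, yet the code compiles because the compiler captures `r` uniquely.

```rust
fn bump_twice(x: &mut i32) {
    let r = x; // `r: &mut i32`, but the binding itself is not `mut`
    // Captured by a unique (non-aliasing) borrow that needs no `mut` binding.
    let mut inc = || *r += 1;
    inc();
    inc();
}
```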
| 251 | + |
| 252 | +**Evolutionary note:** It is possible to evolve by-reference |
| 253 | +closures in the future in a backwards compatible way. The goal would |
| 254 | +be to cause more programs to type-check by default. Two possible |
| 255 | +extensions follow: |
| 256 | + |
| 257 | +- Detect when values are *moved* and hence should be taken by value |
| 258 | +  rather than by reference. (This is only applicable to `FnOnce` |
| 259 | +  closures.) |
| 260 | +- Detect when it is only necessary to borrow a sub-path. Imagine a |
| 261 | + closure like `ref || use(&context.variable_map)`. Currently, this |
| 262 | + closure will borrow `context`, even though it only *uses* the field |
| 263 | + `variable_map`. As a result, it is sometimes necessary to rewrite |
| 264 | +  the closure to have the form |
| 265 | +  `{let v = &context.variable_map; || use(v)}`. In the future, however, we could extend the inference so |
| 266 | + that rather than borrowing `context` to create the closure, we would |
| 267 | + borrow `context.variable_map` directly. |
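The manual workaround from the second bullet, written in current Rust syntax, looks roughly like this; `Context`, its fields, and `count_and_bump` are hypothetical. Borrowing only the field before building a by-value (`move`) closure leaves the other fields of `context` free to be mutated while the closure is alive. (The 2021 edition of Rust later adopted this kind of field-precise capture for ordinary closures.)

```rust
struct Context {
    variable_map: Vec<String>,
    counter: u32,
}

fn count_and_bump(context: &mut Context, name: &str) -> usize {
    // Borrow only the field the closure needs, then capture that reference by value.
    let lookup = {
        let v = &context.variable_map;
        move |n: &str| v.iter().filter(|s| s.as_str() == n).count()
    };
    // Allowed: `counter` is disjoint from the borrowed `variable_map`.
    context.counter += 1;
    lookup(name)
}
```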
| 268 | + |
| 269 | +## Closure sugar in trait references |
| 270 | + |
| 271 | +The current type for closures, `|T1, T2| -> R`, will be repurposed as |
| 272 | +syntactic sugar for a reference to the appropriate `Fn` trait. This |
| 273 | +shorthand can be used anywhere a trait reference is appropriate. The |
| 274 | +full type will be written as one of the following: |
| 275 | + |
| 276 | + <'a...'z> |T1...Tn|: K -> R |
| 277 | + <'a...'z> |&mut: T1...Tn|: K -> R |
| 278 | + <'a...'z> |&: T1...Tn|: K -> R |
| 279 | + <'a...'z> |: T1...Tn|: K -> R |
| 280 | + |
| 281 | +Each of which would then be translated into the following trait |
| 282 | +references, respectively: |
| 283 | + |
| 284 | + <'a...'z> Fn<(T1...Tn), R> + K |
| 285 | + <'a...'z> Fn<(T1...Tn), R> + K |
| 286 | + <'a...'z> FnShare<(T1...Tn), R> + K |
| 287 | + <'a...'z> FnOnce<(T1...Tn), R> + K |
| 288 | + |
| 289 | +Note that the bound lifetimes `'a...'z` are not in scope for the bound |
| 290 | +`K`. |
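Current Rust ended up with a close relative of this sugar: `FnMut(u32) -> u32` names the `&mut self` trait (this RFC's `Fn`), and extra bounds are attached with `+`, playing the role of `K`. A sketch, where the `Job` alias and `run_all` are hypothetical names, that stores stateful closures as trait objects additionally bounded by `Send`:

```rust
// Trait reference `FnMut(u32) -> u32` plus the extra bound `Send` (the `K` above).
type Job = Box<dyn FnMut(u32) -> u32 + Send>;

// Calls every job once with the same input, collecting the results.
fn run_all(jobs: &mut [Job], input: u32) -> Vec<u32> {
    jobs.iter_mut().map(|job| job(input)).collect()
}
```

Usage: a stateless job and a stateful one can share the same vector, because both satisfy the full bound.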
| 291 | + |
| 292 | +# Drawbacks |
| 293 | + |
| 294 | +This model is more complex than the existing model in some respects |
| 295 | +(but the existing model does not serve the full set of desired use cases). |
| 296 | + |
| 297 | +# Alternatives |
| 298 | + |
| 299 | +There is one aspect of the design that is still under active |
| 300 | +discussion: |
| 301 | + |
| 302 | +**Introduce a more generic sugar.** It was proposed that we could |
| 303 | +introduce `Trait(A, B) -> C` as syntactic sugar for `Trait<(A,B),C>` |
| 304 | +rather than retaining the form `|A,B| -> C`. This is appealing but |
| 305 | +removes the correspondence between the expression form and the |
| 306 | +corresponding type. One (somewhat open) question is whether there will |
| 307 | +be additional traits that mirror fn types that might benefit from this |
| 308 | +more general sugar. |
| 309 | + |
| 310 | +**Tweak trait names.** In conjunction with the above, there is some |
| 311 | +concern that the type name `fn(A) -> B` for a bare function with no |
| 312 | +environment is too similar to `Fn(A) -> B` for a closure. To remedy |
| 313 | +that, we could change the name of the trait to something like |
| 314 | +`Closure(A) -> B` (naturally the other traits would be renamed to |
| 315 | +match). |
| 316 | + |
| 317 | +Then there are a large number of permutations and options that were |
| 318 | +largely rejected: |
| 319 | + |
| 320 | +**Only offer by-value closures.** We tried this and found it |
| 321 | +required a lot of painful rewrites of perfectly reasonable code. |
| 322 | + |
| 323 | +**Make by-reference closures the default.** We felt this was |
| 324 | +inconsistent with the language as a whole, which tends to make "by |
| 325 | +value" the default (e.g., `x` vs `ref x` in patterns, `x` vs `&x` in |
| 326 | +expressions, etc.). |
| 327 | + |
| 328 | +**Use a capture clause syntax that borrows individual variables.** "By |
| 329 | +value" closures combined with `let` statements already serve this |
| 330 | +role. Simply specifying "by-reference closure" also gives us room to |
| 331 | +continue improving inference in the future in a backwards compatible |
| 332 | +way. Moreover, the syntactic space around closure expressions is |
| 333 | +extremely constrained and we were unable to find a satisfactory |
| 334 | +syntax, particularly when combined with self-type annotations. |
| 335 | +Finally, if we decide we *do* want the ability to have "mostly |
| 336 | +by-value" closures, we can easily extend the current syntax by writing |
| 337 | +something like `(ref x, ref mut y) || ...` etc. |
| 338 | + |
| 339 | +**Retain the proc expression form.** It was proposed that we could |
| 340 | +retain the `proc` expression form to specify a by-value closure and |
| 341 | +have `||` expressions be by-reference. Frankly, the main objection to |
| 342 | +this is that nobody likes the `proc` keyword. |
| 343 | + |
| 344 | +**Use variadic generics in place of tuple arguments.** While variadic |
| 345 | +generics are an interesting addition in their own right, we'd prefer |
| 346 | +not to introduce a dependency between closures and variadic |
| 347 | +generics. Having all arguments be placed into a tuple is also a |
| 348 | +simpler model overall. Moreover, native ABIs on platforms of interest |
| 349 | +treat a structure passed by value identically to distinct |
| 350 | +arguments. Finally, given that trait calls have the "Rust" ABI, which |
| 351 | +is not specified, we can always tweak the rules if necessary (though |
| 352 | +there are advantages for tooling when the Rust ABI closely matches the |
| 353 | +native ABI). |
| 354 | + |
| 355 | +**Use inference to determine the self type of a closure rather than an |
| 356 | +annotation.** We retain this option for future expansion, but it is |
| 357 | +not clear whether we can always infer the self type of a |
| 358 | +closure. Moreover, using inference rather than a default raises the |
| 359 | +question of what to do for a type like `|int| -> uint`, where |
| 360 | +inference is not possible. |
| 361 | + |
| 362 | +**Default to something other than `&mut self`.** It is our belief that |
| 363 | +`&mut self` is the most common use case for closures. |
| 364 | + |
| 365 | +# Transition plan |
| 366 | + |
| 367 | +TBD. pcwalton is working furiously as we speak. |
| 368 | + |
| 369 | +# Unresolved questions |
| 370 | + |
| 371 | +## Closures that are quantified over lifetimes |
| 372 | + |
| 373 | +A separate RFC is needed to describe bound lifetimes in trait |
| 374 | +references. For example, today one can write a type like |
| 375 | +`<'a> |&'a A| -> &'a B`, which indicates a closure that takes and returns a |
| 376 | +reference with the same lifetime, specified by the caller at each |
| 377 | +call site. Note that a trait reference like `Fn<(&'a A,), &'a B>`, |
| 378 | +while syntactically similar, does *not* have the same meaning because |
| 379 | +it lacks the universal quantifier `<'a>`. Therefore, in the second |
| 380 | +case, `'a` refers to one specific lifetime rather than being a |
| 381 | +lifetime parameter that is specified at each call site. The high-level |
| 382 | +summary of the change therefore is to permit trait references like |
| 383 | +`<'a> Fn<(&'a A,), &'a B>`; in this case, the value of `'a` will be |
| 384 | +specified each time a method or other member of the trait is accessed. |
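Current Rust settled on the spelling `for<'a> Fn(&'a A) -> &'a B` for such a quantified trait reference (a higher-ranked trait bound). The sketch below (the names `apply_to_all` and `first_word` are hypothetical) shows why the quantifier matters: the same function value is applied to borrowed strings with unrelated lifetimes, including one local to the callee, which no single fixed `'a` could cover.

```rust
// `F` must work for *every* lifetime `'a` chosen at each call site.
fn apply_to_all<F>(f: F, x: &str, y: &str) -> Vec<String>
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    // `local` lives only inside this function, unlike `x` and `y`.
    let local = String::from("local words");
    vec![f(x).to_string(), f(&local).to_string(), f(y).to_string()]
}

// Returns a slice of its input, so its output lifetime is tied to the input's.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
```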
| 385 | + |
| 386 | +[p]: http://smallcultfollowing.com/babysteps/blog/2014/05/13/focusing-on-ownership/ |