Commit 620a173

committed
Initial commit
1 parent 69a0f68 commit 620a173

1 file changed

+386
-0
lines changed

active/0000-closures.md

Lines changed: 386 additions & 0 deletions
@@ -0,0 +1,386 @@
1+
- Start Date: (fill me in with today's date, YYYY-MM-DD)
2+
- RFC PR #: (leave this empty)
3+
- Rust Issue #: (leave this empty)
4+
5+
# Summary
6+
7+
- Convert function call `a(b, ..., z)` into an overloadable operator
8+
via the traits `Fn<A,R>`, `FnShare<A,R>`, and `FnOnce<A,R>`, where `A`
9+
is a tuple `(B, ..., Z)` of the types `B...Z` of the arguments
10+
`b...z`, and `R` is the return type. The three traits differ in
11+
their self argument (`&mut self` vs `&self` vs `self`).
12+
- Remove the `proc` expression form and type.
13+
- Remove the closure types (though the form lives on as syntactic
14+
sugar, see below).
15+
- Modify closure expressions to permit specifying by-reference vs
16+
by-value capture and the receiver type:
17+
- Specifying by-reference vs by-value closures:
18+
- `ref |...| expr` indicates a closure that captures upvars from the
19+
environment by reference. This is what closures do today and the
20+
behavior will remain unchanged, other than requiring an explicit
21+
keyword.
22+
- `|...| expr` will therefore indicate a closure that captures upvars
23+
from the environment by value. As usual, this is either a copy or
24+
move depending on whether the type of the upvar implements `Copy`.
25+
- Specifying receiver mode (orthogonal to capture mode above):
26+
- `|a, b, c| expr` is equivalent to `|&mut: a, b, c| expr`
27+
- `|&mut: ...| expr` indicates that the closure implements `Fn`
28+
- `|&: ...| expr` indicates that the closure implements `FnShare`
29+
- `|: a, b, c| expr` indicates that the closure implements `FnOnce`.
30+
- Add syntactic sugar where `|T1, T2| -> R1` is translated to
31+
a reference to one of the fn traits as follows:
32+
- `|T1, ..., Tn| -> R` is translated to `Fn<(T1, ..., Tn), R>`
33+
- `|&mut: T1, ..., Tn| -> R` is translated to `Fn<(T1, ..., Tn), R>`
34+
- `|&: T1, ..., Tn| -> R` is translated to `FnShare<(T1, ..., Tn), R>`
35+
- `|: T1, ..., Tn| -> R` is translated to `FnOnce<(T1, ..., Tn), R>`
36+
37+
One aspect of closures that this RFC does *not* describe is that we
38+
must permit trait references to be universally quantified over regions
39+
as closures are today. This change is described below
40+
under *Unresolved questions* and the details will come in a
41+
forthcoming RFC.
42+
43+
# Motivation
44+
45+
Over time we have observed a very large number of possible use cases
46+
for closures. The goal of this RFC is to create a unified closure
47+
model that encompasses all of these use cases.
48+
49+
Specific goals (explained in more detail below):
50+
51+
1. Give control over inlining to users.
52+
2. Support closures that bind by reference and closures that bind by value.
53+
3. Support different means of accessing the closure environment,
54+
corresponding to `self`, `&self`, and `&mut self` methods.
55+
56+
As a side benefit, though not a direct goal, the RFC reduces the
57+
size/complexity of the language's core type system by unifying
58+
closures and traits.
59+
60+
## The core idea: unifying closures and traits
61+
62+
The core idea of the RFC is to unify closures, procs, and
63+
traits. There are a number of reasons to do this. First, it simplifies
64+
the language, because closures, procs, and traits already served
65+
similar roles and there was sometimes a lack of clarity about which
66+
would be the appropriate choice. Second, the unification
67+
offers increased expressiveness and power, because traits are a more
68+
generic model that gives users more control over optimization.
69+
70+
The basic idea is that function calls become an overridable operator.
71+
Therefore, an expression like `a(...)` will be desugared into an
72+
invocation of one of the following traits:
73+
74+
trait Fn<A,R> {
75+
fn call(&mut self, args: A) -> R;
76+
}
77+
78+
trait FnShare<A,R> {
79+
fn call(&self, args: A) -> R;
80+
}
81+
82+
trait FnOnce<A,R> {
83+
fn call(self, args: A) -> R;
84+
}
85+
86+
Essentially, `a(b, c, d)` becomes sugar for one of the following:
87+
88+
Fn::call(&mut a, (b, c, d))
89+
FnShare::call(&a, (b, c, d))
90+
FnOnce::call(a, (b, c, d))
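The desugaring above can be sketched in present-day Rust by writing the three traits by hand. This is only an illustration under stated assumptions: the trait names `CallMut`, `CallShare`, and `CallOnce` are hypothetical stand-ins for the RFC's `Fn`, `FnShare`, and `FnOnce` (renamed to avoid clashing with the standard library), and `Counter`, `Adder`, and `Greeter` are made-up types playing the role of a closure's environment struct.

```rust
// Hypothetical stand-ins for the RFC's `Fn`, `FnShare`, and `FnOnce`.
// They differ only in how the receiver is taken.
trait CallMut<A, R> {
    fn call(&mut self, args: A) -> R;
}

trait CallShare<A, R> {
    fn call(&self, args: A) -> R;
}

trait CallOnce<A, R> {
    fn call(self, args: A) -> R;
}

// Mutates its own state on every call, so it needs `&mut self`.
struct Counter {
    total: i32,
}

impl CallMut<(i32,), i32> for Counter {
    fn call(&mut self, (n,): (i32,)) -> i32 {
        self.total += n;
        self.total
    }
}

// Only reads its state, so `&self` is enough.
struct Adder {
    offset: i32,
}

impl CallShare<(i32, i32), i32> for Adder {
    fn call(&self, (a, b): (i32, i32)) -> i32 {
        a + b + self.offset
    }
}

// Consumes its state, so it can be called at most once.
struct Greeter {
    name: String,
}

impl CallOnce<(), String> for Greeter {
    fn call(self, _args: ()) -> String {
        format!("hello, {}", self.name)
    }
}
```

With these definitions, a call such as `counter(2)` would correspond to `CallMut::call(&mut counter, (2,))`: the arguments are packed into a tuple and the receiver mode is chosen by the trait.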
91+
92+
To integrate with this, closure expressions are then translated into a
93+
fresh struct that implements one of those three traits. The precise
94+
trait is currently indicated using explicit syntax but may eventually
95+
be inferred.
96+
97+
This change gives the user control over virtual vs static dispatch. This
98+
works in the same way as generic types today:
99+
100+
fn foo(x: &mut Fn<int,int>) -> int {
101+
x(2) // virtual dispatch
102+
}
103+
104+
fn foo<F:Fn<int,int>>(x: &mut F) -> int {
105+
x(2) // static dispatch
106+
}
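For comparison, here is roughly how the same two functions look in present-day Rust. This is a sketch, not the RFC's syntax: it assumes today's standard library, where the trait with a `&mut self` receiver is spelled `FnMut` and trait objects are written with `dyn`. The function names `call_dyn` and `call_static` are made up for the example.

```rust
// Virtual dispatch: `f` is a trait object, so the call goes through a vtable.
fn call_dyn(f: &mut dyn FnMut(i32) -> i32) -> i32 {
    f(2)
}

// Static dispatch: the function is monomorphized for each concrete `F`,
// so the call can be inlined.
fn call_static<F: FnMut(i32) -> i32>(f: &mut F) -> i32 {
    f(2)
}
```

The caller picks the trade-off: the trait-object form compiles once and accepts any closure behind a pointer, while the generic form produces one copy per closure type.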
107+
108+
The change also permits returning closures, which is not currently
109+
possible (the example relies on the proposed `impl` syntax from
110+
rust-lang/rfcs#105):
111+
112+
fn foo(x: impl Fn<int,int>) -> impl Fn<int,int> {
113+
|v| x(v * 2)
114+
}
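A runnable approximation in present-day Rust, assuming today's `impl Trait` in return position and the parenthesized `Fn(i32) -> i32` notation rather than the RFC's `Fn<int,int>`; the name `double_then` is hypothetical:

```rust
// Returns a new closure that doubles its argument and then applies `x`.
// The returned type is an anonymous struct that owns `x`.
fn double_then<F: Fn(i32) -> i32>(x: F) -> impl Fn(i32) -> i32 {
    move |v| x(v * 2)
}
```

The `move` keyword here plays the role of the by-value capture this RFC proposes as the default: the returned closure must own `x` because it outlives the call to `double_then`.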
115+
116+
Basically, in this design there is nothing special about a closure.
117+
Closure expressions are simply a convenient way to generate a struct
118+
that implements a suitable `Fn` trait.
119+
120+
## Bind by reference vs bind by value
121+
122+
When creating a closure, it is now possible to specify whether the
123+
closure should capture variables from its environment ("upvars") by
124+
reference or by value. The distinction is indicated using the leading
125+
keyword `ref`:
126+
127+
|| foo(a, b) // captures `a` and `b` by value
128+
129+
ref || foo(a, b) // captures `a` and `b` by reference, as today
130+
131+
### Reasons to bind by value
132+
133+
Bind by value is useful when creating closures that will escape from
134+
the stack frame that created them, such as task bodies (`spawn(||
135+
...)`) or combinators. It is also useful for moving values out of a
136+
closure, though it should be possible to enable that with bind by
137+
reference as well in the future.
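As an illustration of an escaping closure, the following sketch uses present-day Rust, where by-value capture is requested with `move` (the RFC instead proposes making it the default). The helper `spawn_sum` is hypothetical.

```rust
use std::thread;

// The closure outlives this stack frame because it runs on another thread,
// so it must own `values` rather than borrow it.
fn spawn_sum(values: Vec<i32>) -> thread::JoinHandle<i32> {
    thread::spawn(move || values.iter().sum::<i32>())
}
```

Without by-value capture the closure would hold a reference into a frame that may already have returned, which the borrow checker rejects.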
138+
139+
### Reasons to bind by reference
140+
141+
Bind by reference is useful for any case where the closure is known
142+
not to escape the creating stack frame. This frequently occurs
143+
when using closures to encapsulate common control-flow patterns:
144+
145+
map.insert_or_update_with(key, value, || ...)
146+
opt_val.unwrap_or_else(|| ...)
147+
148+
In such cases, the closure frequently wishes to read or modify local
149+
variables on the enclosing stack frame. Generally speaking, then, such
150+
closures should capture variables by-reference -- that is, they should
151+
store a reference to the variable in the creating stack frame, rather
152+
than copying the value out. Using a reference allows the closure to
153+
mutate the variables in place and also avoids moving values that are
154+
simply read temporarily.
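A small sketch of this control-flow pattern in present-day Rust, where a plain `||` closure borrows the variables it uses (what the RFC would spell `ref ||`). The function `count_missing` and its variables are made up for illustration.

```rust
// Sums the present values, substituting `fallback` for missing ones,
// and counts how many substitutions were needed.
fn count_missing(items: &[Option<i32>]) -> (i32, u32) {
    let mut missing = 0u32;
    let fallback = 10;
    let mut sum = 0;
    for item in items.iter().copied() {
        // The closure mutates `missing` in place and only reads `fallback`;
        // neither is moved or copied out of this stack frame.
        sum += item.unwrap_or_else(|| {
            missing += 1;
            fallback
        });
    }
    (sum, missing)
}
```

Because the closure never escapes `unwrap_or_else`, borrowing is sufficient, and `missing` remains usable after the loop.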
155+
156+
The vast majority of closures in use today should be "by
157+
reference" closures. The only exceptions are those closures that wish
158+
to "move out" from an upvar (where we commonly use the so-called
159+
"option dance" today). In fact, even those closures could be "by
160+
reference" closures, but we will have to extend the inference to
161+
selectively identify those variables that must be moved and take those
162+
"by value".
163+
164+
# Detailed design
165+
166+
## Closure expression syntax
167+
168+
Closure expressions will have the following form (using EBNF notation,
169+
where `[]` denotes optional things and `{}` denotes a comma-separated
170+
list):
171+
172+
CLOSURE = ['ref'] '|' [SELF] {ARG} '|' ['->' TYPE] EXPR
173+
SELF = ':' | '&' ':' | '&' 'mut' ':'
174+
ARG = ID [ ':' TYPE ]
175+
176+
The optional keyword `ref` is used to indicate whether this closure
177+
captures *by reference* or *by value*.
178+
179+
Closures are always translated into a fresh struct type with one field
180+
per upvar. In a by-value closure, the types of these fields will be
181+
the same as the types of the corresponding upvars (modulo `&mut`
182+
reborrows, see below). In a by-reference closure, the types of these
183+
fields will be a suitable reference (`&`, `&mut`, etc) to the
184+
variables being borrowed.
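To make the translation concrete, here is a hand-written sketch of the two kinds of environment struct. The struct names, fields, and inherent `call` methods are hypothetical; the compiler-generated struct would be anonymous and would implement one of the fn traits instead.

```rust
// By-value capture: the field has the same type as the upvar (`String`).
struct ByValueEnv {
    prefix: String,
}

impl ByValueEnv {
    fn call(&self, name: &str) -> String {
        format!("{}{}", self.prefix, name)
    }
}

// By-reference capture: the field is a reference to the upvar,
// here `&mut` because the body mutates it.
struct ByRefEnv<'a> {
    log: &'a mut Vec<String>,
}

impl<'a> ByRefEnv<'a> {
    fn call(&mut self, line: &str) {
        self.log.push(line.to_string());
    }
}
```

The lifetime parameter on `ByRefEnv` is what prevents a by-reference closure from escaping the stack frame whose variables it borrows.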
185+
186+
### By-value closures
187+
188+
The default form for a closure is by-value. This implies that all
189+
upvars which are referenced are copied/moved into the closure as
190+
appropriate. There is one special case: if the type of the value to be
191+
moved is `&mut`, we will "reborrow" the value when it is copied into
192+
the closure. That is, given an upvar `x` of type `&'a mut T`, the
193+
value which is actually captured will have type `&'b mut T` where `'b
194+
<= 'a`. This rule is consistent with our general treatment of `&mut`,
195+
which is to aggressively reborrow wherever possible; moreover, this
196+
rule cannot introduce additional compilation errors, it can only make
197+
more programs successfully typecheck.
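The effect of the reborrow rule can be imitated by hand. In the sketch below (present-day Rust, with a hypothetical function `bump_twice`), the reborrow is written explicitly to show what the RFC proposes the compiler do automatically; it is not a claim about how any particular compiler version captures `&mut` values.

```rust
fn bump_twice(x: &mut i32) {
    {
        // Explicit reborrow: `reborrowed` has a shorter lifetime than `x`.
        let reborrowed: &mut i32 = &mut *x;
        // The by-value closure takes ownership of the reborrow, not of `x`.
        let mut inc = move || *reborrowed += 1;
        inc();
    }
    // The reborrow has ended, so the original `x` is usable again.
    *x += 1;
}
```

Had the closure taken `x` itself by value, the final `*x += 1` would be a use after move; capturing a reborrow is what lets this program typecheck.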
198+
199+
### By-reference closures
200+
201+
A *by-reference* closure is a convenience form in which values used in
202+
the closure are converted into references before being captured. By
203+
reference closures are always rewritable into by value closures if
204+
desired, but the rewrite can often be cumbersome and annoying.
205+
206+
Here is a (rather artificial) example of a by-reference closure in
207+
use:
208+
209+
let in_vec: Vec<int> = ...;
210+
let mut out_vec: Vec<int> = Vec::new();
211+
let opt_int: Option<int> = ...;
212+
213+
opt_int.map(ref |v| {
214+
out_vec.push(v);
215+
in_vec.iter().fold(v, |a, &b| a + b)
216+
});
217+
218+
This could be rewritten into a by-value closure as follows:
219+
220+
let in_vec: Vec<int> = ...;
221+
let mut out_vec: Vec<int> = Vec::new();
222+
let opt_int: Option<int> = ...;
223+
224+
opt_int.map({
225+
let in_vec = &in_vec;
226+
let out_vec = &mut out_vec;
227+
|v| {
228+
out_vec.push(v);
229+
in_vec.iter().fold(v, |a, &b| a + b)
230+
}
231+
})
232+
233+
In this case, the closure captured two variables, `in_vec` and
234+
`out_vec`. As you can see, the compiler automatically infers, for each
235+
variable, how it should be borrowed and inserts the appropriate
236+
capture.
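The by-value rewrite can be tried in present-day Rust, with two assumptions: `move` stands in for the RFC's default by-value capture, and `i32` replaces the historical `int`. The wrapper function `demo` exists only to make the fragment self-contained.

```rust
fn demo(opt_int: Option<i32>) -> (Option<i32>, Vec<i32>) {
    let in_vec: Vec<i32> = vec![1, 2, 3];
    let mut out_vec: Vec<i32> = Vec::new();

    let result = opt_int.map({
        // Shadow each upvar with the reference the closure should capture.
        let in_vec = &in_vec;
        let out_vec = &mut out_vec;
        // The closure now captures the two references by value.
        move |v| {
            out_vec.push(v);
            in_vec.iter().fold(v, |a, &b| a + b)
        }
    });

    (result, out_vec)
}
```

The shadowing `let` bindings are exactly the boilerplate that a by-reference closure would generate on the programmer's behalf.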
237+
238+
In the body of a `ref` closure, the upvars continue to have the same
239+
type as they did in the outer environment. For example, the type of a
240+
reference to `in_vec` in the above example is always `Vec<int>`,
241+
whether or not it appears as part of a `ref` closure. This is not only
242+
convenient, it is required to make it possible to infer whether each
243+
variable is borrowed as an `&T` or `&mut T` borrow.
244+
245+
Note that there are some cases where the compiler internally employs a
246+
form of borrow that is not available in the core language,
247+
`&uniq`. This borrow does not permit aliasing (like `&mut`) but does
248+
not require mutability (like `&`). This is required to allow
249+
transparent closing over of `&mut` pointers as
250+
[described in this blog post][p].
251+
252+
**Evolutionary note:** It is possible to evolve by-reference
253+
closures in the future in a backwards compatible way. The goal would
254+
be to cause more programs to type-check by default. Two possible
255+
extensions follow:
256+
257+
- Detect when values are *moved* and hence should be taken by value
258+
rather than by reference. (This is only applicable to once
259+
closures.)
260+
- Detect when it is only necessary to borrow a sub-path. Imagine a
261+
closure like `ref || use(&context.variable_map)`. Currently, this
262+
closure will borrow `context`, even though it only *uses* the field
263+
`variable_map`. As a result, it is sometimes necessary to rewrite
264+
the closure to have the form `{let v = &context.variable_map; ||
265+
use(v)}`. In the future, however, we could extend the inference so
266+
that rather than borrowing `context` to create the closure, we would
267+
borrow `context.variable_map` directly.
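The manual sub-path rewrite described in the second bullet can be sketched as follows in present-day Rust. `Context`, its fields, and `lookup_while_counting` are hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

struct Context {
    variable_map: HashMap<String, i32>,
    counter: u32,
}

fn lookup_while_counting(context: &mut Context, key: &str) -> Option<i32> {
    let lookup = {
        // Borrow only the field that the closure actually uses.
        let v = &context.variable_map;
        move |k: &str| v.get(k).copied()
    };
    // Allowed while `lookup` is alive: `counter` is a different field
    // from the one the closure borrowed.
    context.counter += 1;
    lookup(key)
}
```

If the closure borrowed all of `context` instead, the update to `context.counter` would conflict with that borrow.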
268+
269+
## Closure sugar in trait references
270+
271+
The current type for closures, `|T1, T2| -> R`, will be repurposed as
272+
syntactic sugar for a reference to the appropriate `Fn` trait. This
273+
shorthand can be used any place that a trait reference is appropriate. The
274+
full type will be written as one of the following:
275+
276+
<'a...'z> |T1...Tn|: K -> R
277+
<'a...'z> |&mut: T1...Tn|: K -> R
278+
<'a...'z> |&: T1...Tn|: K -> R
279+
<'a...'z> |: T1...Tn|: K -> R
280+
281+
Each of which would then be translated into the following trait
282+
references, respectively:
283+
284+
<'a...'z> Fn<(T1...Tn), R> + K
285+
<'a...'z> Fn<(T1...Tn), R> + K
286+
<'a...'z> FnShare<(T1...Tn), R> + K
287+
<'a...'z> FnOnce<(T1...Tn), R> + K
288+
289+
Note that the bound lifetimes `'a...'z` are not in scope for the bound
290+
`K`.
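For a feel of how a sugared trait reference combines with a bound `K`, here is a sketch in present-day Rust. Note the assumptions: today's Rust uses the parenthesized form `FnMut(i32) -> i32` rather than the `|T| -> R` form proposed here, and `Send` is picked arbitrarily as the bound. The function `boxed_counter` is hypothetical.

```rust
// `dyn FnMut(i32) -> i32 + Send` is a trait reference plus a bound,
// analogous to `Fn<(T1...Tn), R> + K` above.
fn boxed_counter() -> Box<dyn FnMut(i32) -> i32 + Send> {
    let mut total = 0;
    Box::new(move |n| {
        total += n;
        total
    })
}
```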
291+
292+
# Drawbacks
293+
294+
This model is more complex than the existing model in some respects
295+
(but the existing model does not serve the full set of desired use cases).
296+
297+
# Alternatives
298+
299+
There is one aspect of the design that is still under active
300+
discussion:
301+
302+
**Introduce a more generic sugar.** It was proposed that we could
303+
introduce `Trait(A, B) -> C` as syntactic sugar for `Trait<(A,B),C>`
304+
rather than retaining the form `|A,B| -> C`. This is appealing but
305+
removes the correspondence between the expression form and the
306+
corresponding type. One (somewhat open) question is whether there will
307+
be additional traits that mirror fn types that might benefit from this
308+
more general sugar.
309+
310+
**Tweak trait names.** In conjunction with the above, there is some
311+
concern that the type name `fn(A) -> B` for a bare function with no
312+
environment is too similar to `Fn(A) -> B` for a closure. To remedy
313+
that, we could change the name of the trait to something like
314+
`Closure(A) -> B` (naturally the other traits would be renamed to
315+
match).
316+
317+
Then there are a large number of permutations and options that were
318+
largely rejected:
319+
320+
**Only offer by-value closures.** We tried this and found it
321+
required a lot of painful rewrites of perfectly reasonable code.
322+
323+
**Make by-reference closures the default.** We felt this was
324+
inconsistent with the language as a whole, which tends to make "by
325+
value" the default (e.g., `x` vs `ref x` in patterns, `x` vs `&x` in
326+
expressions, etc.).
327+
328+
**Use a capture clause syntax that borrows individual variables.** "By
329+
value" closures combined with `let` statements already serve this
330+
role. Simply specifying "by-reference closure" also gives us room to
331+
continue improving inference in the future in a backwards compatible
332+
way. Moreover, the syntactic space around closure expressions is
333+
extremely constrained and we were unable to find a satisfactory
334+
syntax, particularly when combined with self-type annotations.
335+
Finally, if we decide we *do* want the ability to have "mostly
336+
by-value" closures, we can easily extend the current syntax by writing
337+
something like `(ref x, ref mut y) || ...` etc.
338+
339+
**Retain the proc expression form.** It was proposed that we could
340+
retain the `proc` expression form to specify a by-value closure and
341+
have `||` expressions be by-reference. Frankly, the main objection to
342+
this is that nobody likes the `proc` keyword.
343+
344+
**Use variadic generics in place of tuple arguments.** While variadic
345+
generics are an interesting addition in their own right, we'd prefer
346+
not to introduce a dependency between closures and variadic
347+
generics. Having all arguments be placed into a tuple is also a
348+
simpler model overall. Moreover, native ABIs on platforms of interest
349+
treat a structure passed by value identically to distinct
350+
arguments. Finally, given that trait calls have the "Rust" ABI, which
351+
is not specified, we can always tweak the rules if necessary (though
352+
there are advantages for tooling when the Rust ABI closely matches the
353+
native ABI).
354+
355+
**Use inference to determine the self type of a closure rather than an
356+
annotation.** We retain this option for future expansion, but it is
357+
not clear whether we can always infer the self type of a
358+
closure. Moreover, using inference rather than a default raises the
359+
question of what to do for a type like `|int| -> uint`, where
360+
inference is not possible.
361+
362+
**Default to something other than `&mut self`.** It is our belief that
363+
this is the most common use case for closures.
364+
365+
# Transition plan
366+
367+
TBD. pcwalton is working furiously as we speak.
368+
369+
# Unresolved questions
370+
371+
## Closures that are quantified over lifetimes
372+
373+
A separate RFC is needed to describe bound lifetimes in trait
374+
references. For example, today one can write a type like `<'a> |&'a A|
375+
-> &'a B`, which indicates a closure that takes and returns a
376+
reference with the same lifetime specified by the caller at each
377+
call-site. Note that a trait reference like `Fn<(&'a A), &'a B>`,
378+
while syntactically similar, does *not* have the same meaning because
379+
it lacks the universal quantifier `<'a>`. Therefore, in the second
380+
case, `'a` refers to some specific lifetime `'a`, rather than being a
381+
lifetime parameter that is specified at each callsite. The high-level
382+
summary of the change therefore is to permit trait references like
383+
`<'a> Fn<(&'a A), &'a B>`; in this case, the value of `<'a>` will be
384+
specified each time a method or other member of the trait is accessed.
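The distinction can be illustrated in present-day Rust, which writes the universal quantifier as `for<'a>` instead of the `<'a>` prefix discussed here. This is a sketch under that assumption; `apply_to_local` is a hypothetical function.

```rust
// `f` must work for every lifetime `'a`, including the lifetime of a
// value that exists only inside this function body.
fn apply_to_local<F>(f: F) -> usize
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    let owned = String::from("  padded  ");
    // `'a` is chosen here, at the call site, as the lifetime of `owned`.
    f(&owned).len()
}
```

If `'a` were instead a single specific lifetime fixed by the caller of `apply_to_local`, the borrow of the local `owned` could not satisfy it, which is why the quantifier matters.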
385+
386+
[p]: http://smallcultfollowing.com/babysteps/blog/2014/05/13/focusing-on-ownership/
