Commit d8ca9d1

committed
Revamp with three dimensions
1 parent 455c134 commit d8ca9d1

1 file changed

+92
-93
lines changed

_posts/2017-03-02-lang-ergonomics.md

Lines changed: 92 additions & 93 deletions
@@ -72,54 +72,45 @@ but relevant or surprising information is kept front and center.
7272

7373
### How to analyze and manage the reasoning footprint
7474

75-
There are basically two dimensions of the reasoning footprint for implicitness:
75+
There are three dimensions of the reasoning footprint for implicitness:
7676

77-
- **Where can you elide?** In other words, how do you know when implicitness may be in play?
78-
- **How are the gaps filled in?** In other words, how do you know what is being implied?
77+
- **Applicability**. Where are you allowed to elide implied information? Is
78+
there any heads-up that this might be happening?
7979

80-
A well-designed feature will tailor both of these dimensions to match (or inform)
81-
the programmer's mental model, and to make the data you have to keep in your
82-
head manageable. Again, that doesn't necessarily mean *minimizing* the
83-
dimensions. Often what's most important is *clarity*, so that you know what to
84-
expect, and how to easily find the information that can influence something
85-
implicit.
80+
- **Power**. What influence does the elided information have? Can it radically
81+
change program behavior or its types?
8682

87-
Perhaps the most important insight in breaking apart these two dimensions is
88-
that you can play one against the other:
83+
- **Context-dependence**. How much do you have to know about the rest of the
84+
code to know what is being implied, i.e. how elided details will be filled in?
85+
Is there always a clear place to look?
8986

90-
- **Guideline: Trade power against precision**. If the information being elided
91-
is very well-known, or trivial, then it's fine to allow it to be inferred in a
92-
wide range of contexts. On the other hand, if the elided information is
93-
complex or embodies powerful behavior, you should be precise about where
94-
elision is allowed: only in narrow, clear-cut, or explicitly marked contexts.
87+
**The basic thesis of this post is that implicit features should balance these
88+
three dimensions**. If a feature is large in one of the dimensions, it's best to
89+
strongly limit it in the other two.
9590

9691
The [`?` operator](https://blog.rust-lang.org/2016/11/10/Rust-1.13.html) in Rust
97-
is a good example of this trade. It explicitly (but concisely) marks a point
98-
where you will bail out of the current context on an error, possibly doing an
99-
implicit conversion on the way. It's powerful, but marked, which is one way that
100-
error handling in Rust feels as ergonomic as working with exceptions while
101-
avoiding some of their well-known downsides.
102-
103-
When it comes to filling in the gaps left implicit, there are two typical
104-
options: context or convention.
105-
106-
- **Context**. The compiler infers something missing from what it already
107-
knows. For example, inferring the type of a variable from the expression it's
108-
bound to.
109-
110-
- **Convention**. The compiler assumes a default unless told otherwise. For
111-
example, the fact that `mod foo;` looks for `foo.rs` (or `foo/mod.rs`) by default.
112-
113-
Both options come with techniques to manage the reasoning footprint:
114-
115-
- **Guideline: Limit context**. When inferring from context, keep the context limited and
116-
well-known, so that it's (1) easy to find the needed contextual information
117-
and (2) more likely that it will already be in your mental cache.
118-
119-
- **Guideline: Make defaults boring**. When providing defaults, strive to make them simple,
120-
intuitive and nearly universal. That will cement them as part of the "usual
121-
way of things" and hence something you hardly have to think about in the vast
122-
majority of cases, once you've internalized the rules.
92+
is a good example of this kind of tradeoff. It explicitly (but concisely) marks
93+
a point where you will bail out of the current context on an error, possibly
94+
doing an implicit conversion on the way. The fact that it's marked means the
95+
feature has strongly limited applicability: you'll never be surprised that it's
96+
coming into play. On the other hand, it's fairly powerful, and somewhat
97+
context-dependent, since the conversion can depend on the type where `?` is
98+
used, and the type expected in the scope it's jumping to. Altogether, this
99+
careful balance makes error handling in Rust feel as ergonomic as working with
100+
exceptions while avoiding some of their well-known downsides.
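
As a rough illustration of the marked bail-out and the implicit conversion, here is a minimal sketch. The `ConfigError` type and the `parse_port` function are hypothetical names invented for this example; only `?`, `From`, and `str::parse` come from Rust itself.

```rust
use std::num::ParseIntError;

// Hypothetical application-level error type, introduced only for this sketch.
#[derive(Debug, PartialEq)]
enum ConfigError {
    BadNumber(String),
}

// This `From` impl is the conversion that `?` applies implicitly when the
// error produced inside the function differs from the one it returns.
impl From<ParseIntError> for ConfigError {
    fn from(err: ParseIntError) -> Self {
        ConfigError::BadNumber(err.to_string())
    }
}

// Hypothetical helper: the `?` is the only place an early return can happen.
fn parse_port(text: &str) -> Result<u16, ConfigError> {
    // On failure, the `ParseIntError` is converted to `ConfigError` and returned.
    let port: u16 = text.trim().parse()?;
    Ok(port)
}
```

Which conversion runs depends on two types: the error produced at the `?` and the error type in the enclosing signature. That is the context-dependence described above, while the visible `?` keeps applicability narrow.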
101+
102+
By contrast, a feature like unrestricted implicit conversion rightfully has a
103+
bad reputation, because it's universally applicable, quite powerful, *and*
104+
context-dependent. If we were to expand implicit conversions in Rust, we would
105+
likely limit their power (by, say, restricting them to `AsRef`-style coercions,
106+
which can do very little).
107+
108+
One route for strongly limiting context-dependence is employing *conventions*,
109+
in which the compiler is simply assuming a default unless told otherwise. Often
110+
such conventions are universal and well-known, meaning that you don't need to
111+
know anything about the rest of the code to know what they are. A good example
112+
of this technique in Rust is the fact that `mod foo;` looks for `foo.rs` (or
113+
`foo/mod.rs`) by default.
123114

124115
One final point. "Implicitness" is often relative to where the language is
125116
today, something that seems radical at first—like type inference!—but then
@@ -137,17 +128,16 @@ inference*. In the days of yore, you'd have to annotate every local variable
137128
with its type, a practice that seems wildly verbose now—but at the time, type
138129
inference seemed wildly implicit.
139130

140-
Rust's approach to type annotations follows the guidelines I laid out above. In
141-
particular:
131+
Type inference in Rust is quite powerful, but we limit the other two dimensions:
142132

143-
- Trade power/precision: type inference happens only for variable bindings; data
144-
types and functions must include complete, explicit signatures. This choice
145-
gives you the bulk of the ergonomic benefits, allowing for very powerful
146-
inference, but ensuring that the scope of the inference is kept local.
133+
- Applicability: type inference happens only for variable bindings; data
134+
types and functions must include complete, explicit signatures.
147135

148-
- Limit context: similarly, because data types and functions are annotated, it's
149-
easy to determine the information that's influencing the outcome of
150-
inference.
136+
- Context-dependence: because data types and functions are annotated, it's easy
137+
to determine the information that's influencing the outcome of inference. You
138+
only need to look *shallowly* at code outside of the current function. Another
139+
way of saying this is that type inference is performed modularly, one function
140+
body at a time.
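
A small sketch of that split between explicit signatures and inferred locals might look as follows. The function names are hypothetical and exist only for this illustration.

```rust
// The signature is fully explicit: it is all a caller needs to read.
fn word_lengths(text: &str) -> Vec<usize> {
    // Local bindings are inferred from this function body alone.
    let words = text.split_whitespace(); // iterator type is never written out
    let lengths = words.map(|w| w.len()); // closure parameter type is inferred too
    lengths.collect() // the target collection comes from the return type above
}

fn longest_word_length(text: &str) -> usize {
    // Inference here consults only the signature of `word_lengths`, not its body.
    let lengths = word_lengths(text);
    lengths.into_iter().max().unwrap_or(0)
}
```

Changing the body of `word_lengths` cannot change what is inferred inside `longest_word_length` as long as the signature stays the same, which is one way to read "one function body at a time".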
151141

152142
By and large, the amount of type inference we do in Rust seems to be a good
153143
match for what you can hold in your head.
@@ -157,20 +147,23 @@ ergonomics: [lifetime elision]. That feature allows you to leave off lifetimes
157147
from function signatures in the vast majority of cases (check out the RFC—we
158148
measured!). **Lifetime elision greatly aids learnability, because it allows you
159149
to work at an intuitive level with borrowing before you grapple with explicit
160-
lifetimes.** Again, the approach follows the rules above:
150+
lifetimes.**
151+
152+
- Applicability: lifetime elision applies to a broad class of locations—any
153+
function signature—but is limited to those cases for which the lifetimes are
154+
*strongly* implied.
161155

162-
- Trade power/precision: currently, elision works solely in function signatures,
163-
and only when there is an "obvious" choice of lifetimes. It's also *usually*
164-
obvious when elision is coming into play. However, we overshot in one respect:
165-
the fact that elision applies to types other than `&` and `&mut`, which means
166-
that to even know whether reborrowing is happening in a signature like `fn
167-
lookup(&self) -> Ref<T>`, you need to know that `Ref` has a lifetime parameter
168-
that's being left out. We've been considering pushing in the direction of a
169-
small but explicit marker to say that a lifetime is being elided for `Ref`, a
170-
strategy similar to the one for `?` mentioned earlier.
156+
- Power: limited; elision is just a shorthand for a use of lifetime parameters,
157+
and if you get this wrong, the compiler will complain.
171158

172-
- Make defaults boring: the elision rules are very simple, and in most cases in
173-
which they apply, there is really no choice about how to set up the lifetimes.
159+
- Context-dependence: here, we overshot. The fact that elision applies to types
160+
other than `&` and `&mut` means that to even know whether reborrowing is
161+
happening in a signature like `fn lookup(&self) -> Ref<T>`, you need to know
162+
whether `Ref` has a lifetime parameter that's being left out. For something as
163+
common as function signatures, this is too much context. We've been considering
164+
pushing in the direction of a small but explicit marker to say that a lifetime
165+
is being elided for `Ref`, a strategy similar to the one for `?` mentioned
166+
earlier.
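
To make the `Ref` concern concrete, here is a minimal sketch. The `Registry` type and its methods are hypothetical. The `'_` placeholder lifetime shown is one form such a marker can take in later Rust releases; it is not necessarily the exact design under discussion when this was written.

```rust
use std::cell::{Ref, RefCell};

// Hypothetical container, used only to mirror the `lookup` signature in the text.
struct Registry {
    names: RefCell<Vec<String>>,
}

impl Registry {
    fn new(names: Vec<String>) -> Self {
        Registry { names: RefCell::new(names) }
    }

    // Fully elided form, for comparison:
    //     fn lookup(&self) -> Ref<Vec<String>>
    // Nothing in that signature hints that the result borrows from `self`.

    // Marked form: `'_` shows that a lifetime is being elided, so the borrow
    // is visible without looking up the definition of `Ref`.
    fn lookup(&self) -> Ref<'_, Vec<String>> {
        self.names.borrow()
    }
}
```

The marker adds no new information for the compiler. It only moves the fact that `Ref` carries a lifetime into the signature, where a reader can see it.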
174167

175168
There have also been some extensions to the original elision proposal, again
176169
carefully crafted to follow these rules, like the [lifetimes in statics] RFC.
@@ -220,16 +213,18 @@ you're applying to a type variable like `K`. So in particular, if you're trying
220213
to invoke `use_map`, you need to know that there are some unstated constraints
221214
on `K`.
222215

223-
* Trade power/precision: this is a feature
224-
that allows for quite widespread elision. But in practice, what's being elided
225-
tends to be well-known, and even when it's not, it tends not to cause too much
226-
trouble. After all, when *using* a function like `use_map`, you're generally
227-
going to be passing in an existing `HashMap`, which by construction will ensure
228-
that the bounds already hold.
216+
- Applicability: very broad; applies to any use of generics.
217+
218+
- Power: very limited; the bounds will almost always be needed anyway, and in
219+
any case adding bounds is not very risky.
229220

230-
* Limit context: it's straightforward to discover how the bounds are being
231-
imposed by examining the type definitions, and the compiler can reliably produce
232-
an error pointing directly to the type(s) imposing unfulfilled bounds.
221+
- Context-dependence: fairly limited; it draws from the bounds on all type
222+
constructors that are applied to type variables (like `HashMap<K,
223+
V>`). Usually you will be well aware of these bounds anyway, and when *using*
224+
a function like `use_map`, you're generally going to be passing in an existing
225+
`HashMap`, which by construction will ensure that the bounds already hold.
226+
The compiler can also reliably produce an error pointing directly to the
227+
type(s) imposing unfulfilled bounds.
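
For reference, this is roughly what restating the bounds looks like as the language stands. It is a sketch with a hypothetical `Index` wrapper that declares its bounds on the type definition; `use_index` is likewise an invented name, standing in for a function like `use_map`.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical wrapper that declares its bounds on the type itself.
struct Index<K: Hash + Eq, V> {
    entries: HashMap<K, V>,
}

// The bounds on `K` are restated here even though `Index<K, V>` already
// requires them. Under implied bounds they would be picked up from the
// appearance of `Index<K, V>` in the signature.
fn use_index<K: Hash + Eq, V>(index: &Index<K, V>, key: &K) -> bool {
    index.entries.contains_key(key)
}
```

A caller can only obtain an `Index<K, V>` whose `K` already satisfies the bounds, which is why eliding them has so little power to surprise.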
233228

234229
## Example: ownership
235230

@@ -240,26 +235,25 @@ look at the places where borrowing is explicit, and places where it's not:
240235
- Borrowing is implicit for the receiver when invoking a method.
241236
- Borrowing is explicit for normal function arguments and in other expressions.
242237

243-
This design was arrived at after many iterations with alternatives, and it's
244-
turned out quite nicely. It's another example of the power/precision
245-
tradeoff. We provide powerful borrowing inference but only for a narrowly
246-
limited location.
238+
Ownership is important in Rust, and reasoning locally about it is vital. So why
239+
did we end up with this particular mix of implicit and explicit ownership
240+
tracking?
247241

248-
Ownership is important in Rust, and reasoning locally about it is
249-
vital. However:
242+
- Applicability: common, but narrowly-described: it applies only to the receiver
243+
of method calls.
250244

251-
- The vast majority of methods borrow `self`, rather than taking it by value.
252-
- Usually the name of a method makes obvious whether the `self` borrow will be
253-
mutable or not (e.g. `push` versus `len`).
254-
- The borrow checker will prevent any incorrect borrowing, though if borrowing
255-
were too implicit, it could make debugging borrow check errors harder.
245+
- Power: moderately powerful, since it can determine whether the receiver can be
246+
mutated (by mutably borrowing it). That's mitigated to some degree by borrow
247+
checking, which will at least ensure that it's *permitted* to do such a borrow.
256248

257-
Neither of the first two points apply as strongly to method/function
258-
arguments. So we ended up with a system that is neither fully explicit nor fully
259-
implicit, but rather one that balances good ergonomics with a compact reasoning
260-
footprint. **This design also aids learnability, by often just doing "the
261-
obvious thing" for borrowing, and thereby limiting the situations in which
262-
newcomers have to grapple with choices about it**.
249+
- Context-dependence: in principle, you need to know how the method is resolved,
250+
and then its signature. In practice, the style of `self` borrowing is almost
251+
always implied by the method name (e.g. `push` versus `len`). Notably, this
252+
point does *not* apply to function arguments.
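
The contrast between receivers and ordinary arguments can be sketched like this. The `total` and `demo` functions are hypothetical; `push` and `len` are the standard `Vec` methods mentioned above.

```rust
// A free function: the caller must write the borrow explicitly.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn demo() -> (usize, i32) {
    let mut values = vec![1, 2];
    values.push(3); // receiver is implicitly borrowed mutably (`&mut values`)
    let count = values.len(); // receiver is implicitly borrowed shared (`&values`)
    let sum = total(&values); // ordinary argument: the `&` must be written out
    (count, sum)
}
```

In both method calls the kind of borrow is suggested by the method name, whereas nothing about the name `total` says whether it borrows or consumes its argument, so the explicit `&` carries real information there.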
253+
254+
**This design also aids learnability, by often just doing "the obvious thing"
255+
for borrowing, and thereby limiting the situations in which newcomers have to
256+
grapple with choices about it**.
263257

264258
### Ideas: implied borrows
265259

@@ -288,8 +282,9 @@ buffer is destroyed at the end of the call to `read_config`). But it allows you
288282
to gloss over the unimportant detail that the callee happened to only need a
289283
borrow. And again, if you just forgot to borrow, and try to use `path`
290284
afterward, the compiler will catch it, just as it does today. This is an example
291-
of a not terribly powerful bit of inference that we'd allow to occur virtually
292-
everywhere (power/precision tradeoff).
285+
of a not terribly powerful bit of inference (it's only introducing a shared
286+
borrow for an object about to be dropped) that we'd allow to occur virtually
287+
everywhere.
293288

294289
**Borrowing in match patterns**. One stumbling block when learning Rust is the
295290
interaction between pattern matching and borrowing. In particular, when you're
@@ -321,6 +316,10 @@ doing*. Thus, we could consider inferring these markers from context:
321316
do. And in any case, it's still quite *local* context. As usual, if you get
322317
this wrong, the borrow checker will catch it.
323318

319+
In addition to that story for context-dependence, the feature would be only
320+
narrowly applicable (only to `match`) and only moderately powerful (since,
321+
again, the borrow checker will catch mistakes).
322+
324323
Both of these changes would expand the reasoning footprint slightly, but in a
325324
very controlled way. They remove the need to write down annotations which are
326325
essentially already forced by nearby code. And that in turn lowers the learning
@@ -357,7 +356,7 @@ the `Cargo.toml`, which becomes the sole source of truth for this
357356
concern. That's a pretty limited context: it's a single place to look, and in many
358357
cases you already need some level of awareness of its contents, to know *which
359358
version* of the crate is being assumed. Inferring `extern crate` also fares well
360-
on the power/precision front: only root modules are affected, so it's easy to
359+
on the applicability front: only root modules are affected, so it's easy to
361360
know precisely when you need to consult `Cargo.toml`.
362361

363362
Thinking along similar, but more radical lines, an argument could be made about
@@ -366,7 +365,7 @@ some_module` to tell Rust to pull in a file at a canonical location with the
366365
same name, we're being forced to duplicate information that was already readily
367366
available. You could instead imagine the filesystem hierarchy directly informing
368367
the module system hierarchy. The concerns about limited context and
369-
power/precision work out pretty much the same way as with `Cargo.toml`, and the
368+
applicability work out pretty much the same way as with `Cargo.toml`, and the
370369
learnability and ergonomic gains are significant.
371370

372371
Now, both of these proposals assume your code follows the *typical* patterns,
