Skip to content

Commit dc08b56

Browse files
committed
---
yaml --- r: 15834 b: refs/heads/try c: fa2ac66 h: refs/heads/master v: v3
1 parent 0f68534 commit dc08b56

File tree

2 files changed

+180
-17
lines changed

2 files changed

+180
-17
lines changed

[refs]

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,5 +2,5 @@
22
refs/heads/master: 61b1875c16de39c166b0f4d54bba19f9c6777d1a
33
refs/heads/snap-stage1: e33de59e47c5076a89eadeb38f4934f58a3618a6
44
refs/heads/snap-stage3: 4a81779abd786ff22d71434c6d9a5917ea4cdfff
5-
refs/heads/try: 558be3a70f20072b415f7436b3130462caefa63a
5+
refs/heads/try: fa2ac6646233384deb6ff694da4e3b674a187cd0
66
refs/tags/release-0.1: 1f5c5126e96c79d22cb7862f75304136e204f105

branches/try/src/rustc/middle/typeck/infer.rs

Lines changed: 179 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,147 @@
1+
/*
2+
3+
# Type inference engine
4+
5+
This is loosely based on standard HM-type inference, but with an
6+
extension to try and accommodate subtyping. There is nothing
7+
principled about this extension; it's sound---I hope!---but it's a
8+
heuristic, ultimately, and does not guarantee that it finds a valid
9+
typing even if one exists (in fact, there are known scenarios where it
10+
fails, some of which may eventually become problematic).
11+
12+
## Key idea
13+
14+
The main change is that each type variable T is associated with a
15+
lower-bound L and an upper-bound U. L and U begin as bottom and top,
16+
respectively, but gradually narrow in response to new constraints
17+
being introduced. When a variable is finally resolved to a concrete
18+
type, it can (theoretically) select any type that is a supertype of L
19+
and a subtype of U.
20+
21+
There are several critical invariants which we maintain:
22+
23+
- the upper-bound of a variable only becomes lower and the lower-bound
24+
only becomes higher over time;
25+
- the lower-bound L is always a subtype of the upper bound U;
26+
- the lower-bound L and upper-bound U never refer to other type variables,
27+
but only to types (though those types may contain type variables).
28+
29+
> An aside: if the terms upper- and lower-bound confuse you, think of
30+
> "supertype" and "subtype". The upper-bound is a "supertype"
31+
> (super=upper in Latin, or something like that anyway) and the lower-bound
32+
> is a "subtype" (sub=lower in Latin). I find it helps to visualize
33+
> a simple class hierarchy, like Java minus interfaces and
34+
> primitive types. The class Object is at the root (top) and other
35+
> types lie in between. The bottom type is then the Null type.
36+
> So the tree looks like:
37+
>
38+
> Object
39+
> / \
40+
> String Other
41+
> \ /
42+
> (null)
43+
>
44+
> So the upper bound type is the "supertype" and the lower bound is the
45+
> "subtype" (also, super and sub mean upper and lower in Latin, or something
46+
> like that anyway).
47+
48+
## Satisfying constraints
49+
50+
At a primitive level, there is only one form of constraint that the
51+
inference understands: a subtype relation. So the outside world can
52+
say "make type A a subtype of type B". If there are variables
53+
involved, the inferencer will adjust their upper- and lower-bounds as
54+
needed to ensure that this relation is satisfied. (We also allow "make
55+
type A equal to type B", but this is translated into "A <: B" and "B
56+
<: A")
57+
58+
As stated above, we always maintain the invariant that type bounds
59+
never refer to other variables. This keeps the inference relatively
60+
simple, avoiding the scenario of having a kind of graph where we have
61+
to pump constraints along and reach a fixed point, but it does impose
62+
some heuristics in the case where the user is relating two type
63+
variables A <: B.
64+
65+
The key point when relating type variables is that we do not know whta
66+
type the variable represents, but we must make some change that will
67+
ensure that, whatever types A and B are resolved to, they are resolved
68+
to types which have a subtype relation.
69+
70+
There are basically two options here:
71+
72+
- we can merge A and B. Basically we make them the same variable.
73+
The lower bound of this new variable is LUB(LB(A), LB(B)) and
74+
the upper bound is GLB(UB(A), UB(B)).
75+
76+
- we can adjust the bounds of A and B. Because we do not allow
77+
type variables to appear in each other's bounds, this only works if A
78+
and B have appropriate bounds. But if we can ensure that UB(A) <: LB(B),
79+
then we know that whatever happens A and B will be resolved to types with
80+
the appropriate relation.
81+
82+
Our current technique is to *try* (transactionally) to relate the
83+
existing bounds of A and B, if there are any (i.e., if `UB(A) != top
84+
&& LB(B) != bot`). If that succeeds, we're done. If it fails, then
85+
we merge A and B into same variable.
86+
87+
This is not clearly the correct course. For example, if `UB(A) !=
88+
top` but `LB(B) == bot`, we could conceivably set `LB(B)` to `UB(A)`
89+
and leave the variables unmerged. This is sometimes the better
90+
course, it depends on the program.
91+
92+
The main case which fails today that I would like to support is:
93+
94+
fn foo<T>(x: T, y: T) { ... }
95+
96+
fn bar() {
97+
let x: @mut int = @mut 3;
98+
let y: @int = @3;
99+
foo(x, y);
100+
}
101+
102+
In principle, the inferencer ought to find that the parameter `T` to
103+
`foo(x, y)` is `@const int`. Today, however, it does not; this is
104+
because the type variable `T` is merged with the type variable for
105+
`X`, and thus inherits its UB/LB of `@mut int`. This leaves no
106+
flexibility for `T` to later adjust to accommodate `@int`.
107+
108+
## Transactional support
109+
110+
Whenever we adjust merge variables or adjust their bounds, we always
111+
keep a record of the old value. This allows the changes to be undone.
112+
113+
## Regions
114+
115+
I've only talked about type variables here, but region variables
116+
follow the same principle. They have upper- and lower-bounds. A
117+
region A is a subregion of a region B if A being valid implies that B
118+
is valid. This basically corresponds to the block nesting structure:
119+
the regions for outer block scopes are superregions of those for inner
120+
block scopes.
121+
122+
## GLB/LUB
123+
124+
Computing the greatest-lower-bound and least-upper-bound of two
125+
types/regions is generally straightforward except when type variables
126+
are involved. In that case, we follow a similar "try to use the bounds
127+
when possible but otherwise merge the variables" strategy. In other
128+
words, `GLB(A, B)` where `A` and `B` are variables will often result
129+
in `A` and `B` being merged and the result being `A`.
130+
131+
## Type assignment
132+
133+
We have a notion of assignability which differs somewhat from
134+
subtyping; in particular it may cause region borrowing to occur. See
135+
the big comment later in this file on Type Assignment for specifics.
136+
137+
# Implementation details
138+
139+
We make use of a trait-like impementation strategy to consolidate
140+
duplicated code between subtypes, GLB, and LUB computations. See the
141+
section on "Type Combining" below for details.
142+
143+
*/
144+
1145
import std::smallintmap;
2146
import std::smallintmap::smallintmap;
3147
import std::smallintmap::map;
@@ -914,27 +1058,46 @@ impl assignment for infer_ctxt {
9141058
// ______________________________________________________________________
9151059
// Type combining
9161060
//
917-
// There are three type combiners, sub, lub, and gub. `sub` is a bit
918-
// different from the other two: it takes two types and makes the first
919-
// a subtype of the other, or fails if this cannot be done. It does
920-
// return a new type but its return value is meaningless---it is only
921-
// there to allow for greater code reuse.
922-
//
923-
// `lub` and `glb` compute the Least Upper Bound and Greatest Lower
924-
// Bound respectively of two types `a` and `b`. The LUB is a mutual
925-
// supertype type `c` where `a <: c` and `b <: c`. As the name
926-
// implies, it tries to pick the most precise `c` possible. The GLB
927-
// is a mutual subtype, aiming for the most general such type
928-
// possible. Both computations may fail.
1061+
// There are three type combiners, sub, lub, and gub. Each implements
1062+
// the interface `combine` contains methods for combining two
1063+
// instances of various things and yielding a new instance. These
1064+
// combiner methods always yield a `result<T>`---failure is propagated
1065+
// upward using `chain()` methods.
9291066
//
9301067
// There is a lot of common code for these operations, which is
9311068
// abstracted out into functions named `super_X()` which take a combiner
9321069
// instance as the first parameter. This would be better implemented
933-
// using traits.
1070+
// using traits. For this system to work properly, you should not
1071+
// call the `super_X(foo, ...)` functions directly, but rather call
1072+
// `foo.X(...)`. The implemtation of `X()` can then choose to delegate
1073+
// to the `super` routine or to do other things.
1074+
//
1075+
// In reality, the sub operation is rather different from lub/glb, but
1076+
// they are combined into one interface to avoid duplication (they
1077+
// used to be separate but there were many bugs because there were two
1078+
// copies of most routines.
1079+
//
1080+
// The differences are:
1081+
//
1082+
// - when making two things have a sub relationship, the order of the
1083+
// arguments is significant (a <: b) and the return value of the
1084+
// combine functions is largely irrelevant. The important thing is
1085+
// whether the action succeeds or fails. If it succeeds, then side
1086+
// effects have been committed into the type variables.
1087+
//
1088+
// - for GLB/LUB, the order of arguments is not significant (GLB(a,b) ==
1089+
// GLB(b,a)) and the return value is important (it is the GLB). Of
1090+
// course GLB/LUB may also have side effects.
1091+
//
1092+
// Contravariance
9341093
//
935-
// The `super_X()` top-level items work for *sub, lub, and glb*: any
936-
// operation which varies will be dynamically dispatched using a
937-
// `self.Y()` operation.
1094+
// When you are relating two things which have a contravariant
1095+
// relationship, you should use `contratys()` or `contraregions()`,
1096+
// rather than inversing the order of arguments! This is necessary
1097+
// because the order of arguments is not relevant for LUB and GLB. It
1098+
// is also useful to track which value is the "expected" value in
1099+
// terms of error reporting, although we do not do that properly right
1100+
// now.
9381101

9391102
type cres<T> = result<T,ty::type_err>;
9401103

0 commit comments

Comments
 (0)