|
11 | 11 | /// (without being so rigorous).
|
12 | 12 | ///
|
13 | 13 | /// The core of the algorithm revolves about a "usefulness" check. In particular, we
|
14 |
| -/// are trying to compute a predicate `U(P, p_{m + 1})` where `P` is a list of patterns |
15 |
| -/// of length `m` for a compound (product) type with `n` components (we refer to this as |
16 |
| -/// a matrix). `U(P, p_{m + 1})` represents whether, given an existing list of patterns |
17 |
| -/// `p_1 ..= p_m`, adding a new pattern will be "useful" (that is, cover previously- |
| 14 | +/// are trying to compute a predicate `U(P, p)` where `P` is a list of patterns (we refer to this as |
| 15 | +/// a matrix). `U(P, p)` represents whether, given an existing list of patterns |
| 16 | +/// `P_1 ..= P_m`, adding a new pattern `p` will be "useful" (that is, cover previously- |
18 | 17 | /// uncovered values of the type).
|
19 | 18 | ///
|
20 | 19 | /// If we have this predicate, then we can easily compute both exhaustiveness of an
|
21 | 20 | /// entire set of patterns and the individual usefulness of each one.
|
22 | 21 | /// (a) the set of patterns is exhaustive iff `U(P, _)` is false (i.e., adding a wildcard
|
23 | 22 | /// match doesn't increase the number of values we're matching)
|
24 |
| -/// (b) a pattern `p_i` is not useful if `U(P[0..=(i-1), p_i)` is false (i.e., adding a |
| 23 | +/// (b) a pattern `P_i` is not useful if `U(P[0..=(i-1), P_i)` is false (i.e., adding a |
25 | 24 | /// pattern to those that have come before it doesn't increase the number of values
|
26 | 25 | /// we're matching).
|
27 | 26 | ///
|
| 27 | +/// During the course of the algorithm, the rows of the matrix won't just be individual patterns, |
| 28 | +/// but rather partially-deconstructed patterns in the form of a list of patterns. The paper |
| 29 | +/// calls those pattern-vectors, and we will call them pattern-stacks. The same holds for the |
| 30 | +/// new pattern `p`. |
| 31 | +/// |
28 | 32 | /// For example, say we have the following:
|
29 | 33 | /// ```
|
30 | 34 | /// // x: (Option<bool>, Result<()>)
|
|
34 | 38 | /// (None, Err(_)) => {}
|
35 | 39 | /// }
|
36 | 40 | /// ```
|
37 |
| -/// Here, the matrix `P` is 3 x 2 (rows x columns). |
| 41 | +/// Here, the matrix `P` starts as: |
38 | 42 | /// [
|
39 |
| -/// [Some(true), _], |
40 |
| -/// [None, Err(())], |
41 |
| -/// [None, Err(_)], |
| 43 | +/// [(Some(true), _)], |
| 44 | +/// [(None, Err(()))], |
| 45 | +/// [(None, Err(_))], |
42 | 46 | /// ]
|
43 | 47 | /// We can tell it's not exhaustive, because `U(P, _)` is true (we're not covering
|
44 |
| -/// `[Some(false), _]`, for instance). In addition, row 3 is not useful, because |
| 48 | +/// `[(Some(false), _)]`, for instance). In addition, row 3 is not useful, because |
45 | 49 | /// all the values it covers are already covered by row 2.
|
46 | 50 | ///
|
47 |
| -/// To compute `U`, we must have two other concepts. |
48 |
| -/// 1. `S(c, P)` is a "specialized matrix", where `c` is a constructor (like `Some` or |
49 |
| -/// `None`). You can think of it as filtering `P` to just the rows whose *first* pattern |
50 |
| -/// can cover `c` (and expanding OR-patterns into distinct patterns), and then expanding |
51 |
| -/// the constructor into all of its components. |
52 |
| -/// The specialization of a row vector is computed by `specialize`. |
| 51 | +/// A list of patterns can be thought of as a stack, because we are mainly interested in the top of |
| 52 | +/// the stack at any given point, and we can pop or apply constructors to get new pattern-stacks. |
| 53 | +/// To match the paper, the top of the stack is at the beginning / on the left. |
| 54 | +/// |
| 55 | +/// There are two important operations on pattern-stacks necessary to understand the algorithm: |
| 56 | +/// 1. We can pop a given constructor off the top of a stack. This operation is called |
| 57 | +/// `specialize`, and is denoted `S(c, p)` where `c` is a constructor (like `Some` or |
| 58 | +/// `None`) and `p` a pattern-stack. |
| 59 | +/// If the pattern on top of the stack can cover `c`, this removes the constructor and |
| 60 | +/// pushes its arguments onto the stack. It also expands OR-patterns into distinct patterns. |
| 61 | +/// Otherwise the pattern-stack is discarded. |
| 62 | +/// This essentially filters those pattern-stacks whose top covers the constructor `c` and |
| 63 | +/// discards the others. |
| 64 | +/// |
| 65 | +/// For example, the first pattern above initially gives a stack `[(Some(true), _)]`. If we |
| 66 | +/// pop the tuple constructor, we are left with `[Some(true), _]`, and if we then pop the |
| 67 | +/// `Some` constructor we get `[true, _]`. If we had popped `None` instead, we would get |
| 68 | +/// nothing back. |
53 | 69 | ///
|
54 |
| -/// It is computed as follows. For each row `p_i` of P, we have four cases: |
55 |
| -/// 1.1. `p_(i,1) = c(r_1, .., r_a)`. Then `S(c, P)` has a corresponding row: |
56 |
| -/// r_1, .., r_a, p_(i,2), .., p_(i,n) |
57 |
| -/// 1.2. `p_(i,1) = c'(r_1, .., r_a')` where `c ≠ c'`. Then `S(c, P)` has no |
58 |
| -/// corresponding row. |
59 |
| -/// 1.3. `p_(i,1) = _`. Then `S(c, P)` has a corresponding row: |
60 |
| -/// _, .., _, p_(i,2), .., p_(i,n) |
61 |
| -/// 1.4. `p_(i,1) = r_1 | r_2`. Then `S(c, P)` has corresponding rows inlined from: |
62 |
| -/// S(c, (r_1, p_(i,2), .., p_(i,n))) |
63 |
| -/// S(c, (r_2, p_(i,2), .., p_(i,n))) |
| 70 | +/// This returns zero or more new pattern-stacks, as follows. We look at the pattern `p_1` |
| 71 | +/// on top of the stack, and we have four cases: |
| 72 | +/// 1.1. `p_1 = c(r_1, .., r_a)`, i.e. the top of the stack has constructor `c`. We |
| 73 | +/// push onto the stack the arguments of this constructor, and return the result: |
| 74 | +/// r_1, .., r_a, p_2, .., p_n |
| 75 | +/// 1.2. `p_1 = c'(r_1, .., r_a')` where `c ≠ c'`. We discard the current stack and |
| 76 | +/// return nothing. |
| 77 | +/// 1.3. `p_1 = _`. We push onto the stack as many wildcards as the constructor `c` has |
| 78 | +/// arguments (its arity), and return the resulting stack: |
| 79 | +/// _, .., _, p_2, .., p_n |
| 80 | +/// 1.4. `p_1 = r_1 | r_2`. We expand the OR-pattern and then recurse on each resulting |
| 81 | +/// stack: |
| 82 | +/// S(c, (r_1, p_2, .., p_n)) |
| 83 | +/// S(c, (r_2, p_2, .., p_n)) |
64 | 84 | ///
|
65 |
| -/// 2. `D(P)` is a "default matrix". This is used when we know there are missing |
66 |
| -/// constructor cases, but there might be existing wildcard patterns, so to check the |
67 |
| -/// usefulness of the matrix, we have to check all its *other* components. |
68 |
| -/// The default matrix is computed inline in `is_useful`. |
| 85 | +/// 2. We can pop a wildcard off the top of the stack. This is called `D(p)`, where `p` is |
| 86 | +/// a pattern-stack. |
| 87 | +/// This is used when we know there are missing constructor cases, but there might be |
| 88 | +/// existing wildcard patterns, so to check the usefulness of the matrix, we have to check |
| 89 | +/// all its *other* components. |
69 | 90 | ///
|
70 |
| -/// It is computed as follows. For each row `p_i` of P, we have three cases: |
71 |
| -/// 1.1. `p_(i,1) = c(r_1, .., r_a)`. Then `D(P)` has no corresponding row. |
72 |
| -/// 1.2. `p_(i,1) = _`. Then `D(P)` has a corresponding row: |
73 |
| -/// p_(i,2), .., p_(i,n) |
74 |
| -/// 1.3. `p_(i,1) = r_1 | r_2`. Then `D(P)` has corresponding rows inlined from: |
75 |
| -/// D((r_1, p_(i,2), .., p_(i,n))) |
76 |
| -/// D((r_2, p_(i,2), .., p_(i,n))) |
| 91 | +/// It is computed as follows. We look at the pattern `p_1` on top of the stack, |
| 92 | +/// and we have three cases: |
| 93 | +/// 1.1. `p_1 = c(r_1, .., r_a)`. We discard the current stack and return nothing. |
| 94 | +/// 1.2. `p_1 = _`. We return the rest of the stack: |
| 95 | +/// p_2, .., p_n |
| 96 | +/// 1.3. `p_1 = r_1 | r_2`. We expand the OR-pattern and then recurse on each resulting |
| 97 | +/// stack. |
| 98 | +/// D((r_1, p_2, .., p_n)) |
| 99 | +/// D((r_2, p_2, .., p_n)) |
| 100 | +/// |
| 101 | +/// Note that the OR-patterns are not always used directly in Rust, but are used to derive the |
| 102 | +/// exhaustive integer matching rules, so they're written here for posterity. |
| 103 | +/// |
| 104 | +/// Both those operations extend straightforwardly to a list or pattern-stacks, i.e. a matrix, by |
| 105 | +/// working row-by-row. Popping a constructor ends up keeping only the matrix rows that start with |
| 106 | +/// the given constructor, and popping a wildcard keeps those rows that start with a wildcard. |
77 | 107 | ///
|
78 |
| -/// Note that the OR-patterns are not always used directly in Rust, but are used to derive |
79 |
| -/// the exhaustive integer matching rules, so they're written here for posterity. |
80 | 108 | ///
|
81 | 109 | /// The algorithm for computing `U`
|
82 | 110 | /// -------------------------------
|
83 | 111 | /// The algorithm is inductive (on the number of columns: i.e., components of tuple patterns).
|
84 | 112 | /// That means we're going to check the components from left-to-right, so the algorithm
|
85 |
| -/// operates principally on the first component of the matrix and new pattern `p_{m + 1}`. |
| 113 | +/// operates principally on the first component of the matrix and new pattern-stack `p`. |
86 | 114 | /// This algorithm is realised in the `is_useful` function.
|
87 | 115 | ///
|
88 | 116 | /// Base case. (`n = 0`, i.e., an empty tuple pattern)
|
89 | 117 | /// - If `P` already contains an empty pattern (i.e., if the number of patterns `m > 0`),
|
90 |
| -/// then `U(P, p_{m + 1})` is false. |
91 |
| -/// - Otherwise, `P` must be empty, so `U(P, p_{m + 1})` is true. |
| 118 | +/// then `U(P, p)` is false. |
| 119 | +/// - Otherwise, `P` must be empty, so `U(P, p)` is true. |
92 | 120 | ///
|
93 | 121 | /// Inductive step. (`n > 0`, i.e., whether there's at least one column
|
94 | 122 | /// [which may then be expanded into further columns later])
|
95 |
| -/// We're going to match on the new pattern, `p_{m + 1}`. |
96 |
| -/// - If `p_{m + 1} == c(r_1, .., r_a)`, then we have a constructor pattern. |
97 |
| -/// Thus, the usefulness of `p_{m + 1}` can be reduced to whether it is useful when |
98 |
| -/// we ignore all the patterns in `P` that involve other constructors. This is where |
99 |
| -/// `S(c, P)` comes in: |
100 |
| -/// `U(P, p_{m + 1}) := U(S(c, P), S(c, p_{m + 1}))` |
| 123 | +/// We're going to match on the top of the new pattern-stack, `p_1`. |
| 124 | +/// - If `p_1 == c(r_1, .., r_a)`, i.e. we have a constructor pattern. |
| 125 | +/// Then, the usefulness of `p_1` can be reduced to whether it is useful when |
| 126 | +/// we ignore all the patterns in the first column of `P` that involve other constructors. |
| 127 | +/// This is where `S(c, P)` comes in: |
| 128 | +/// `U(P, p) := U(S(c, P), S(c, p))` |
101 | 129 | /// This special case is handled in `is_useful_specialized`.
|
102 |
| -/// - If `p_{m + 1} == _`, then we have two more cases: |
103 |
| -/// + All the constructors of the first component of the type exist within |
104 |
| -/// all the rows (after having expanded OR-patterns). In this case: |
105 |
| -/// `U(P, p_{m + 1}) := ∨(k ϵ constructors) U(S(k, P), S(k, p_{m + 1}))` |
106 |
| -/// I.e., the pattern `p_{m + 1}` is only useful when all the constructors are |
107 |
| -/// present *if* its later components are useful for the respective constructors |
108 |
| -/// covered by `p_{m + 1}` (usually a single constructor, but all in the case of `_`). |
109 |
| -/// + Some constructors are not present in the existing rows (after having expanded |
110 |
| -/// OR-patterns). However, there might be wildcard patterns (`_`) present. Thus, we |
111 |
| -/// are only really concerned with the other patterns leading with wildcards. This is |
112 |
| -/// where `D` comes in: |
113 |
| -/// `U(P, p_{m + 1}) := U(D(P), p_({m + 1},2), .., p_({m + 1},n))` |
114 |
| -/// - If `p_{m + 1} == r_1 | r_2`, then the usefulness depends on each separately: |
115 |
| -/// `U(P, p_{m + 1}) := U(P, (r_1, p_({m + 1},2), .., p_({m + 1},n))) |
116 |
| -/// || U(P, (r_2, p_({m + 1},2), .., p_({m + 1},n)))` |
| 130 | +/// |
| 131 | +/// For example, if `P` is: |
| 132 | +/// [ |
| 133 | +/// [Some(true), _], |
| 134 | +/// [None, 0], |
| 135 | +/// ] |
| 136 | +/// and `p` is [Some(false), 0], then we don't care about row 2 since we know `p` only |
| 137 | +/// matches values that row 2 doesn't. For row 1 however, we need to dig into the |
| 138 | +/// arguments of `Some` to know whether some new value is covered. So we compute |
| 139 | +/// `U([[true, _]], [false, 0])`. |
| 140 | +/// |
| 141 | +/// - If `p_1 == _`, then we look at the list of constructors that appear in the first |
| 142 | +/// component of the rows of `P`: |
| 143 | +/// + If there are some constructors that aren't present, then we might think that the |
| 144 | +/// wildcard `_` is useful, since it covers those constructors that weren't covered |
| 145 | +/// before. |
| 146 | +/// That's almost correct, but only works if there were no wildcards in those first |
| 147 | +/// components. So we need to check that `p` is useful with respect to the rows that |
| 148 | +/// start with a wildcard, if there are any. This is where `D` comes in: |
| 149 | +/// `U(P, p) := U(D(P), D(p))` |
| 150 | +/// |
| 151 | +/// For example, if `P` is: |
| 152 | +/// [ |
| 153 | +/// [_, true, _], |
| 154 | +/// [None, false, 1], |
| 155 | +/// ] |
| 156 | +/// and `p` is [_, false, _], the `Some` constructor doesn't appear in `P`. So if we |
| 157 | +/// only had row 2, we'd know that `p` is useful. However row 1 starts with a |
| 158 | +/// wildcard, so we need to check whether `U([[true, _]], [false, 1])`. |
| 159 | +/// |
| 160 | +/// + Otherwise, all possible constructors (for the relevant type) are present. In this |
| 161 | +/// case we must check whether the wildcard pattern covers any unmatched value. For |
| 162 | +/// that, we can think of the `_` pattern as a big OR-pattern that covers all |
| 163 | +/// possible constructors. For `Option`, that would mean `_ = None | Some(_)` for |
| 164 | +/// example. The wildcard pattern is useful in this case if it is useful when |
| 165 | +/// specialized to one of the possible constructors. So we compute: |
| 166 | +/// `U(P, p) := ∃(k ϵ constructors) U(S(k, P), S(k, p))` |
| 167 | +/// |
| 168 | +/// For example, if `P` is: |
| 169 | +/// [ |
| 170 | +/// [Some(true), _], |
| 171 | +/// [None, false], |
| 172 | +/// ] |
| 173 | +/// and `p` is [_, false], both `None` and `Some` constructors appear in the first |
| 174 | +/// components of `P`. We will therefore try popping both constructors in turn: we |
| 175 | +/// compute U([[true, _]], [_, false]) for the `Some` constructor, and U([[false]], |
| 176 | +/// [false]) for the `None` constructor. The first case returns true, so we know that |
| 177 | +/// `p` is useful for `P`. Indeed, it matches `[Some(false), _]` that wasn't matched |
| 178 | +/// before. |
| 179 | +/// |
| 180 | +/// - If `p_1 == r_1 | r_2`, then the usefulness depends on each `r_i` separately: |
| 181 | +/// `U(P, p) := U(P, (r_1, p_2, .., p_n)) |
| 182 | +/// || U(P, (r_2, p_2, .., p_n))` |
117 | 183 | ///
|
118 | 184 | /// Modifications to the algorithm
|
119 | 185 | /// ------------------------------
|
120 | 186 | /// The algorithm in the paper doesn't cover some of the special cases that arise in Rust, for
|
121 | 187 | /// example uninhabited types and variable-length slice patterns. These are drawn attention to
|
122 |
| -/// throughout the code below. I'll make a quick note here about how exhaustive integer matching |
123 |
| -/// is accounted for, though. |
| 188 | +/// throughout the code below. I'll make a quick note here about how exhaustive integer matching is |
| 189 | +/// accounted for, though. |
124 | 190 | ///
|
125 | 191 | /// Exhaustive integer matching
|
126 | 192 | /// ---------------------------
|
|
150 | 216 | /// invalid, because we want a disjunction over every *integer* in each range, not just a
|
151 | 217 | /// disjunction over every range. This is a bit more tricky to deal with: essentially we need
|
152 | 218 | /// to form equivalence classes of subranges of the constructor range for which the behaviour
|
153 |
| -/// of the matrix `P` and new pattern `p_{m + 1}` are the same. This is described in more |
| 219 | +/// of the matrix `P` and new pattern `p` are the same. This is described in more |
154 | 220 | /// detail in `split_grouped_constructors`.
|
155 | 221 | /// + If some constructors are missing from the matrix, it turns out we don't need to do
|
156 | 222 | /// anything special (because we know none of the integers are actually wildcards: i.e., we
|
|
0 commit comments