You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: lib/elixir/pages/references/gradual-set-theoretic-types.md
+58-19Lines changed: 58 additions & 19 deletions
Original file line number
Diff line number
Diff line change
@@ -8,31 +8,72 @@ Elixir is in the process of incorporating set-theoretic types into the compiler.
8
8
9
9
***developer friendly** - the types are described, implemented, and composed using basic set operations: unions, intersections, and negation (hence it is a set-theoretic type system)
10
10
11
-
The current milestone aims to infer types from patterns and guards and use them to type check programs, enabling the Elixir compiler to find faults and bugs in codebases without requiring changes to existing software. User provided type signatures are planned for future releases. The underlying principles, theory, and roadmap of our work have been outlined in ["The Design Principles of the Elixir Type System" by Giuseppe Castagna, Guillaume Duboc, José Valim](https://arxiv.org/abs/2306.06391).
11
+
The current milestone aims to infer types from existing programs and use them for type checking, enabling the Elixir compiler to find faults and bugs in codebases without requiring changes to existing software. User provided type signatures are planned for future releases. The underlying principles, theory, and roadmap of our work have been outlined in ["The Design Principles of the Elixir Type System" by Giuseppe Castagna, Guillaume Duboc, José Valim](https://arxiv.org/abs/2306.06391).
12
12
13
-
## Supported types
13
+
## A gentle introduction
14
14
15
-
At the moment, Elixir developers interact with set-theoretic types through warnings found by the type system. These warnings will represent types using the following notation:
15
+
Types in Elixir are written using the type named followed by parentheses, such as `integer()` or `list(integer())`. The basic types in the language are: `atom()`, `binary()`, `integer()`, `float()`, `function()`, `list()` (and `improper_list()`), `map()`, `pid()`, `port()`, `reference()`, and `tuple()`.
16
16
17
-
*`binary()`, `integer()`, `float()`, `pid()`, `port()`, `reference()` - these types are indivisible. This means both `1` and `13` get the same `integer()` type.
17
+
Many of the types above can also be written more precisely. We will discuss their syntax in detail later, but here are some examples:
18
18
19
-
*`atom()`- it represents all atoms and it is divisible. For instance, the atom `:foo` and `:hello_world` are also valid (distinct) types.
19
+
*While `atom()` represents all atoms, the atom `:ok` can also be represented in the type system as `:ok`
20
20
21
-
*`tuple()`- it represents all tuples. Tuples may also be written using the curly brackets syntax, such as `{:ok, binary()}`. A `...` at the end of the tuple means the overall size of the tuple is unknown. For example, the following tuple has at least two elements: `{:ok, binary(), ...}`.
21
+
*While `tuple()` represents all tuples, you can specify the type of a two-element tuple where the first element is the atom `:ok` and the second is an integer as `{:ok, integer()}`
22
22
23
-
*`list(type)`- it represents a list of `type`. More precisely, it can be written as `empty_list() or non_empty_list(type, empty_list())`. Improper lists, which are lists which do not end with an empty list, such as `[1, 2 | 3]`, can be written as `list(integer(), integer())`.
23
+
*While `function()` represents all functions, you can specify a function that receives an integer and returns a boolean as `(integer() -> boolean())`
24
24
25
-
*`map()`and structs - maps can be "closed" or "open". Closed maps only allow the specified keys, such as `%{key: atom(), value: integer()}`. Open maps support any other keys in addition to the ones listed and their definition starts with `...`, such as `%{..., key: atom(), value: integer()}`. Structs are closed maps with the `__struct__` key.
25
+
There are also three special types: `none()`(represents an empty set), `term()` (represents all types), `dynamic()` (represents a range of the given types).
26
26
27
-
*`function()` - it represents anonymous functions (which may be closures)
27
+
Given the types are set-theoretic, we can compose them using unions (`or`), intersections (`and`), and negations (`not`). For example, to say a function returns either atoms or integers, one could write: `atom() or integer()`.
28
28
29
-
## Set operations
29
+
Intersections will find the elements in common between the operands. For example, `atom() and integer()`, which in this case it becomes the empty set `none()`. You can combine intersections and negations to perform difference, for example, to say that a function expects all atoms, except `nil` (which is an atom), you could write: `atom() and not nil`.
30
30
31
-
We compose types by using set operations. For example, to say a function returns either atoms or integers, one could write: `atom() or integer()`.
31
+
## The syntax of data types
32
32
33
-
Intersections are available via the `and` operator, such as `atom() and integer()`, which in this case it becomes the empty set `none()`. `term()` is the union of all types, also known as the "top" type.
33
+
In this section we will cover the syntax of all data types.
34
34
35
-
Intersections are useful when modelling functions. For example, imagine the following function:
35
+
### Indivisible types
36
+
37
+
These types are indivisibile and have no further representation. They are: `binary()`, `integer()`, `float()`, `pid()`, `port()`, `reference()`. For example, the numbers `1` and `42` are both represented by the type `integer()`.
38
+
39
+
### Atoms
40
+
41
+
You can represent all atoms as `atom()`. You can also represent each individual atom using their literal syntax. For instance, the atom `:foo` and `:hello_world` are also valid (distinct) types.
42
+
43
+
`nil`, `true`, and `false` are also atoms and can be written as is. `boolean()` is a convenience type alias that represents `true or false`.
44
+
45
+
### Tuples
46
+
47
+
You can represent all tuples as `tuple()`. Tuples may also be written using the curly brackets syntax, such as `{:ok, binary()}`.
48
+
49
+
You may use `...` at the end of the tuple to imply the overall size of the tuple is unknown. For example, the following tuple has at least two elements: `{:ok, binary(), ...}`.
50
+
51
+
### Lists
52
+
53
+
You can represent all _proper_ lists as `list()`, which also includes the empty list.
54
+
55
+
You can also specify the type of the list element as argument. For example, `list(integer())` represents the values `[]` and `[1, 2, 3]`, but not `[1, "two", 3]`.
56
+
57
+
Internally, Elixir represents the type `list(a)` as the union two distinct types, `empty_list()` and `not_empty_list(a)`. In other words, `list(integer())` is equivalent to `empty_list() or non_empty_list(integer())`.
58
+
59
+
Elixir also supports improper lists, where the last element is not an empty list, via `non_empty_list(elem_type, tail_type)`. For example, the value `[1, 2 | 3]` would have the type `non_empty_list(integer(), integer())`.
60
+
61
+
While most developers will simply use `list(a)`, the type system can express all different representations of lists in Elixir. At the end of the day, `list()` and `improper_list()` are translations to the following constructs:
62
+
63
+
list() == empty_list() or non_empty_list(term())
64
+
improper_list() == non_empty_list(term(), term() and not empty_list())
65
+
66
+
### Maps
67
+
68
+
You can represent all maps as `map()`. Maps may also be written using their literal syntax, such as `%{name: binary(), age: integer()}`, which outlines a map with exactly two keys, `:name` and `:age`, and values of type `binary()` and `integer()` respectively.
69
+
70
+
We say the map above is a "closed" map: it only supports the two keys explicitly defined. We can also mark a map as "open", by including `...` as its last element. For example, the type `%{name: binary(), age: integer(), ...}` means the keys `:name` and `:age` must exist, with their respective types, but any other key may also be present. Structs are closed maps with the `__struct__` key pointing to the struct name.
71
+
72
+
### Functions
73
+
74
+
You can represent all functions as `function()`. However, in practice, most functions are represented as arrows. For example, a function that receives an integer and return boolean would be written as `(integer() -> boolean())`. A function that receives two integers and return a string (i.e. a binary) would be written as `(integer(), integer() -> binary())`.
75
+
76
+
When representing functions with multiple clauses, which may take different input types, we use intersections. For example, imagine the following function:
36
77
37
78
```elixir
38
79
defnegate(x) whenis_integer(x), do:-x
@@ -53,8 +94,6 @@ At this point, you may ask, why not a union? As a real-world example, take a t-s
53
94
54
95
Since the t-shirt has both colors, we say it belongs to the intersection of both sets. The same way that a function that goes from `(integer() -> integer())` and `(boolean() -> boolean())` is also an intersection. In practice, it does not make sense to define the union of two functions in Elixir, so the compiler will always point to the right direction.
55
96
56
-
Finally, we can also negate types by using `not`. For example, to express all atoms, except the atoms `:foo` and `:bar`, one can write: `atom() and not (:foo or :bar)`.
57
-
58
97
## The `dynamic()` type
59
98
60
99
Existing Elixir programs do not have type declarations, but we still want to be able to type check them. This is done with the introduction of the `dynamic()` type.
@@ -90,15 +129,15 @@ Inferring type signatures comes with a series of trade-offs:
90
129
91
130
* Cascading errors - when a user accidentally makes type errors or the code has conflicting assumptions, type inference may lead to less clear error messages as the type system tries to reconcile diverging type assumptions across code paths.
92
131
93
-
On the other hand, type inference offers the benefit of enabling type checking for functions and codebases without requiring the user to add type annotations. To balance these trade-offs, Elixir has a two-steps system, where we first perform module-local inference on functions without a type signature, and then we type check all modules.
132
+
On the other hand, type inference offers the benefit of enabling type checking for functions and codebases without requiring the user to add type annotations. To balance these trade-offs, Elixir has a two-steps system, where we first perform module-local inference on functions without type signatures, and then we type check all modules. Module-local inference means the types of the arguments, return values, and all variables are computed considering all of the function calls to the same module and to Elixir's standard library. Any call to a function in another module is conservatively assumed to return `dynamic()` during inference.
94
133
95
-
Module-local inference means the types of the arguments, return values, and all variables are computed considering all of the function calls to the same module and to Elixir's standard library. Any call to a function in another module is conservatively assumed to return `dynamic()` during inference. Our goal is to provide an efficient type reconstruction algorithm that can detect definite bugs in dynamic codebases, where all combinations of a type *will* fail, even in the absence of explicit type annotations.
134
+
Type inference in Elixir is best-effort: it doesn't guarantee it will find all possible type incompatibilities, only that it may find bugs where all combinations of a type *will* fail, even in the absence of explicit type annotations. It is meant to be an efficient routine that brings developers some benefits of static typing without requiring any effort from them.
96
135
97
-
Once Elixir introduces typed function signatures (see "Roadmap"), any function with an explicit type signature will be checked against the user-provided type, as in other statically typed languages, without performing type inference. In summary, type checking will rely on type signatures and only fallback to inferred types when no signature is available.
136
+
In the long term, Elixir developers who want typing guarantees must explicitly add type signatures to their functions (see "Roadmap"). Any function with an explicit type signature will be typed checked against the user-provided annotations, as in other statically typed languages, without performing type inference. In summary, type checking will rely on type signatures and only fallback to inferred types when no signature is available.
98
137
99
138
## Roadmap
100
139
101
-
The current milestone is to implement type inference of patterns and guards, as well as type checking of all language constructs, without changes to the Elixir language. At this stage, we want to collect feedback on the quality of error messages and performance, and therefore the type system has no user facing API. Full type inference of patterns was released in Elixir v1.18, and inference of guards is expected as part of Elixir v1.19.
140
+
The current milestone is to implement type inference of existing codebases, as well as type checking of all language constructs, without changes to the Elixir language. At this stage, we want to collect feedback on the quality of error messages and performance, and therefore the type system has no user facing API. Full type inference of patterns was released in Elixir v1.18, and complete inference is expected as part of Elixir v1.19.
102
141
103
142
If the results are satisfactory, the next milestone will include a mechanism for defining typed structs. Elixir programs frequently pattern match on structs, which reveals information about the struct fields, but it knows nothing about their respective types. By propagating types from structs and their fields throughout the program, we will increase the type system’s ability to find errors while further straining our type system implementation. Proposals including the required changes to the language surface will be sent to the community once we reach this stage.
0 commit comments