Commit 39c0473: progress

1 parent f2a37fc, 2 files changed, +324 -0 lines changed

conversions.md

Lines changed: 70 additions & 0 deletions

% Type Conversions

At the end of the day, everything is just a pile of bits somewhere, and type systems
are just there to help us use those bits right. Needing to reinterpret those piles
of bits as different types is a common problem, and Rust consequently gives you
several ways to do that.

# Safe Rust

First we'll look at the ways that *Safe Rust* gives you to reinterpret values. The
most trivial way to do this is to just destructure a value into its constituent
parts and then build a new type out of them. For example:

```rust
struct Foo {
    x: u32,
    y: u16,
}

struct Bar {
    a: u32,
    b: u16,
}

fn reinterpret(foo: Foo) -> Bar {
    // Move each field out of `foo`, then use them to build a `Bar`.
    let Foo { x, y } = foo;
    Bar { a: x, b: y }
}
```

But this is, at best, annoying to do. For common conversions, Rust provides
more ergonomic alternatives.

## Auto-Deref

`Deref` is a trait that allows you to overload the unary `*` operator to specify the type
you dereference to. It is largely only intended to be implemented by pointer
types like `&`, `Box`, and `Rc`. The dot operator will automatically dereference its
receiver as many times as needed, so that `foo.bar()` will work uniformly on `Foo`, `&Foo`,
`&&Foo`, `&Rc<Box<&mut &Box<Foo>>>`, and so on. The search stops at the *first* match,
so implementing methods on pointers is generally to be avoided, as they will shadow
the "actual" methods of the pointee.
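
A minimal sketch of auto-deref and method shadowing, assuming hypothetical `Foo`, `Wrapper`, and `Shadowing` types invented for illustration (none of them come from the standard library). `call_through_pointers` calls `bar` through several layers of pointers, while `Shadowing` defines its own `bar` to show how a method on the pointer hides the pointee's method:

```rust
use std::ops::Deref;
use std::rc::Rc;

struct Foo {
    value: u32,
}

impl Foo {
    fn bar(&self) -> u32 {
        self.value
    }
}

// Hypothetical smart pointer that only forwards to its contents via `Deref`.
struct Wrapper<T> {
    inner: T,
}

impl<T> Deref for Wrapper<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.inner
    }
}

// Hypothetical smart pointer that also defines its own `bar`.
struct Shadowing<T> {
    inner: T,
}

impl<T> Deref for Shadowing<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.inner
    }
}

impl<T> Shadowing<T> {
    // Method lookup stops at the first match, so this hides `Foo::bar`
    // when called as `shadowing.bar()`.
    fn bar(&self) -> u32 {
        0
    }
}

// The dot operator dereferences each receiver as many times as needed
// until it finds `Foo::bar`.
fn call_through_pointers(foo: Foo) -> [u32; 4] {
    let by_ref: &Foo = &foo;
    let by_ref_ref: &&Foo = &by_ref;
    let nested: Rc<Box<Wrapper<&Foo>>> = Rc::new(Box::new(Wrapper { inner: &foo }));
    [foo.bar(), by_ref.bar(), by_ref_ref.bar(), nested.bar()]
}
```

Here `call_through_pointers(Foo { value: 7 })` yields `[7, 7, 7, 7]`, whereas calling `bar` on a `Shadowing` value returns `0`; reaching the pointee's method then requires an explicit `(*s).bar()`.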

## Coercions

Types can be implicitly coerced to change in certain contexts. These changes are generally
just *weakenings* of types, largely focused around pointers. They mostly exist to make
Rust "just work" in more cases. For instance,
`&mut T` coerces to `&T`, and `&T` coerces to `*const T`. The coercion you are most likely to
actually think about is the general *Deref Coercion*: `&T` coerces to `&U` when
`T: Deref<Target = U>`. This enables us to pass a `&String` where a `&str` is expected, for instance.
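
The coercions listed above can be sketched with three small functions; `read`, `as_raw`, `length`, and `coercions` are hypothetical names. Each call site passes a type that differs from the parameter type and relies on an implicit coercion:

```rust
// Accepts a shared reference.
fn read(x: &u32) -> u32 {
    *x
}

// Accepts a raw pointer.
fn as_raw(x: *const u32) -> *const u32 {
    x
}

// Accepts a string slice.
fn length(s: &str) -> usize {
    s.len()
}

fn coercions() -> (u32, bool, usize) {
    let mut n: u32 = 5;
    let value = read(&mut n); // `&mut u32` coerces to `&u32`
    let ptr = as_raw(&n); // `&u32` coerces to `*const u32`
    let same = std::ptr::eq(ptr, &n); // both point at `n`
    let owned = String::from("hello");
    let len = length(&owned); // `&String` deref-coerces to `&str`
    (value, same, len)
}
```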

## Casts

Casts are a superset of coercions: every coercion can be explicitly invoked via a cast,
but some changes require a cast. These "true casts" are generally regarded as dangerous or
problematic actions. The set of true casts is actually quite small, and once again revolves
largely around pointers. However, it also introduces the primary mechanism to convert between
numeric types.

* rawptr -> rawptr (e.g. `*mut T as *const T` or `*mut T as *mut U`)
* rawptr <-> usize (e.g. `*mut T as usize` or `usize as *mut T`)
* primitive -> primitive (e.g. `u32 as u8` or `u8 as u32`)
* c-like enum -> integer (e.g. `DaysOfWeek as u8`)
* `u8` -> `char`
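
The casts in the list above can be exercised with a short sketch. `DaysOfWeek` is a hypothetical C-like enum standing in for the one the list mentions, and the helper function names are likewise illustrative:

```rust
// Hypothetical C-like enum; discriminants default to 0, 1, 2.
#[allow(dead_code)]
enum DaysOfWeek {
    Sunday,
    Monday,
    Tuesday,
}

// primitive -> primitive
fn numeric_casts() -> (u8, u32, u32, u8) {
    (
        300u32 as u8,   // keeps the low 8 bits: 300 - 256 = 44
        200u8 as u32,   // zero-extends
        (-1i32) as u32, // reinterprets the two's-complement bits
        3.9f64 as u8,   // float -> int truncates toward zero
    )
}

// c-like enum -> integer, and u8 -> char
fn enum_and_char_casts() -> (u8, char) {
    (DaysOfWeek::Tuesday as u8, 97u8 as char)
}

// rawptr -> rawptr and rawptr <-> usize
fn pointer_casts() -> bool {
    let mut x: u32 = 7;
    let p: *mut u32 = &mut x;
    let c = p as *const u32; // change mutability
    let bytes = p as *mut u8; // change pointee type
    let addr = c as usize; // pointer -> address
    let back = addr as *const u32; // address -> pointer
    back == c && bytes as usize == addr
}
```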

## Conversion Traits

For the full formal specification of all the kinds of coercions and coercion sites, see:
https://github.com/rust-lang/rfcs/blob/master/text/0401-coercions.md

* Coercions
* Casts
* Conversion Traits (Into/As/...)

data.md

Lines changed: 254 additions & 0 deletions

% Data Representation in Rust

Low-level programming cares a lot about data layout. It's a big deal. It also pervasively
influences the rest of the language, so we're going to start by digging into how data is
represented in Rust.

# The `rust` repr

Rust gives you the following ways to lay out composite data:

* structs (named product types)
* tuples (anonymous product types)
* arrays (homogeneous product types)
* enums (named sum types -- tagged unions)

For all these, individual fields are aligned to their preferred alignment. For primitives
this is usually equal to their size. For instance, a `u32` will be aligned to a multiple
of 32 bits, and a `u16` will be aligned to a multiple of 16 bits. Composite structures will
have their size rounded up to a multiple of the highest alignment required by their fields,
and an alignment requirement equal to the highest alignment required by their fields.
So, for instance,

```rust
struct A {
    a: u8,
    c: u64,
    b: u32,
}
```

will have a size that is a multiple of 64 bits, and 64-bit alignment (on platforms where
`u64` is 64-bit aligned).
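
The layout rules above can be checked with `std::mem::size_of` and `std::mem::align_of`. This sketch reuses the struct `A` from the text; the helper names are hypothetical. It only asserts what the rules promise (the alignment is at least the largest field alignment, and the size is a multiple of the alignment) rather than exact byte counts, which vary by target:

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
struct A {
    a: u8,
    c: u64,
    b: u32,
}

// Largest alignment required by any field of `A`.
fn max_field_align() -> usize {
    align_of::<u8>().max(align_of::<u64>()).max(align_of::<u32>())
}

// (size, alignment) of `A` as chosen by the compiler.
fn layout_of_a() -> (usize, usize) {
    (size_of::<A>(), align_of::<A>())
}
```

On a typical 64-bit target the alignment comes out as 8 bytes and the size as a multiple of 8; the exact size depends on how the compiler orders the fields.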

There is *no indirection* for these types; all data is stored contiguously, as you would
expect in C. However, with the exception of arrays, the layout of data is not specified
by default in Rust. Given the following two struct definitions:

```rust
struct A {
    a: i32,
    b: u64,
}

struct B {
    x: i32,
    b: u64,
}
```

Rust *does* guarantee that two instances of A have their data laid out in exactly
the same way. However, Rust *does not* guarantee that an instance of A has the same
field ordering or padding as an instance of B (in practice there's no *particular*
reason why they wouldn't, other than that it's not currently guaranteed).

With A and B as written, this distinction is basically nonsensical, but several other
features of Rust make it desirable for the language to play with data layout in complex ways.

For instance, consider this struct:

```rust
struct Foo<T, U> {
    count: u16,
    data1: T,
    data2: U,
}
```

Now consider the monomorphizations of `Foo<u32, u16>` and `Foo<u16, u32>`. If Rust lays out
the fields in the order specified, we expect it to *pad* the values in the struct to satisfy
their *alignment* requirements. So if Rust didn't reorder fields, we would expect it to
produce the following:

```rust
// Foo<u16, u32>, laid out in declaration order:
struct FooU16U32 {
    count: u16,
    data1: u16,
    data2: u32,
}

// Foo<u32, u16>, laid out in declaration order:
struct FooU32U16 {
    count: u16,
    _pad1: u16,
    data1: u32,
    data2: u16,
    _pad2: u16,
}
```

The latter case quite simply wastes space. An optimal use of space therefore requires
different monomorphizations to *have different field orderings*.

**Note: this is a hypothetical optimization that is not yet implemented in Rust 1.0.0**
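
The cost of the in-order layout can be measured by pinning the field order with `#[repr(C)]` (an alternative representation that lays fields out in declaration order). The struct names are hypothetical: `InOrder` mirrors the padded `Foo<u32, u16>` layout, and `Reordered` places the `u32` first. The sketch assumes `u32` has 4-byte alignment, as on mainstream targets:

```rust
use std::mem::size_of;

// Foo<u32, u16> in declaration order: count, 2 bytes padding, data1, data2, 2 bytes padding.
#[allow(dead_code)]
#[repr(C)]
struct InOrder {
    count: u16,
    data1: u32,
    data2: u16,
}

// The same fields with the largest one first: no padding needed.
#[allow(dead_code)]
#[repr(C)]
struct Reordered {
    data1: u32,
    count: u16,
    data2: u16,
}

fn sizes() -> (usize, usize) {
    (size_of::<InOrder>(), size_of::<Reordered>())
}
```

Under that assumption `sizes()` returns `(12, 8)`, so a third of `InOrder` is padding.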

Enums make this consideration even more complicated. Naively, an enum such as:

```rust
enum Foo {
    A(u32),
    B(u64),
    C(u8),
}
```

would be laid out as:

```rust
struct FooRepr {
    data: u64, // this is *really* either a u64, u32, or u8 based on `tag`
    tag: u8,   // 0 = A, 1 = B, 2 = C
}
```

And indeed this is approximately how it would be laid out in general
(modulo the size and position of `tag`). However, there are several cases where
such a representation is inefficient. The classic case of this is Rust's
"null pointer optimization". Given a pointer that is known to not be null
(e.g. `&u32`), an enum can *store* a discriminant bit *inside* the pointer
by using null as a special value. The net result is that
`size_of::<Option<&T>>() == size_of::<&T>()`.
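
A small sketch of the null pointer optimization, using the `Foo` enum from above; the helper function names are hypothetical. The standard library documents that `Option<&T>` and `Option<Box<T>>` have the same size as the bare pointer, whereas an enum whose payloads have no invalid bit patterns, like `Foo` or `Option<u32>`, needs extra space for its tag:

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Foo {
    A(u32),
    B(u64),
    C(u8),
}

// Null is never a valid `&u32` or `Box<u32>`, so `None` can be encoded as null.
fn npo_applies() -> bool {
    size_of::<Option<&u32>>() == size_of::<&u32>()
        && size_of::<Option<Box<u32>>>() == size_of::<Box<u32>>()
}

// Every bit pattern of `u32`/`u64` is valid, so the tag needs its own space.
fn needs_separate_tag() -> bool {
    size_of::<Option<u32>>() > size_of::<u32>() && size_of::<Foo>() > size_of::<u64>()
}
```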

There are many types in Rust that are, or contain, "not null" pointers, such as `Box<T>`,
`Vec<T>`, `String`, `&T`, and `&mut T`. Similarly, one can imagine nested enums pooling their
tags into a single discriminant, as they are by definition known to have a limited range of
valid values. In principle, enums can use fairly elaborate algorithms to cache bits throughout
nested types with special constrained representations. As such, it is *especially* desirable
that we leave enum layout unspecified today.

# Dynamically Sized Types (DSTs)

Rust also supports types without a statically known size. On the surface,
this is a bit nonsensical: Rust must know the size of something in order to
work with it. DSTs are generally produced as views, or through type erasure
of types that *do* have a known size. Due to their lack of a statically known
size, these types can only exist *behind* some kind of pointer. Pointers to them
are consequently *fat* pointers, consisting of the pointer and the information that
*completes* them.

For instance, the slice type, `[T]`, is some statically unknown number of elements
stored contiguously. `&[T]` consequently consists of a `(&T, usize)` pair that specifies
where the slice starts and how many elements it contains. Similarly, trait objects
support interface-oriented type erasure through a `(data_ptr, vtable_ptr)` pair.

Structs can actually store a single DST directly as their last field, but this
makes them a DST as well:

```rust
// Can't be stored on the stack directly
struct Foo {
    info: u32,
    data: [u8],
}
```
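
The fat-pointer sizes described above, and a struct with a DST as its last field, can be observed in a short sketch. The `Foo` here is a hypothetical generic variant of the struct above: making its last field a `?Sized` type parameter lets a sized `Foo<[u8; 3]>` be unsize-coerced to `Foo<[u8]>` behind a `Box`, a common safe way to construct such a value. The helper names are also illustrative:

```rust
use std::fmt::Debug;
use std::mem::size_of;

// Hypothetical generic version of `Foo`: the last field may be unsized.
struct Foo<T: ?Sized> {
    info: u32,
    data: T,
}

// Sizes of a thin pointer, a slice pointer, a trait object pointer, and a boxed DST struct.
fn pointer_sizes() -> [usize; 4] {
    [
        size_of::<&u8>(),
        size_of::<&[u8]>(),          // data pointer + length
        size_of::<&dyn Debug>(),     // data pointer + vtable pointer
        size_of::<Box<Foo<[u8]>>>(), // data pointer + length of `data`
    ]
}

fn build_dst() -> (u32, usize) {
    let sized: Box<Foo<[u8; 3]>> = Box::new(Foo { info: 7, data: [1, 2, 3] });
    // Unsizing coercion: the array length moves into the (now fat) pointer.
    let unsized_foo: Box<Foo<[u8]>> = sized;
    (unsized_foo.info, unsized_foo.data.len())
}
```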

# Zero Sized Types (ZSTs)

Rust actually allows types to be specified that occupy *no* space:

```rust
struct Foo; // No fields = no size
enum Bar {} // No variants = no size

// All fields have no size = no size
struct Baz {
    foo: Foo,
    bar: Bar,
    qux: (), // empty tuple has no size
}
```

On their own, ZSTs are, for obvious reasons, pretty useless. However,
as with many curious layout choices in Rust, their potential is realized in a generic
context.

Rust largely understands that any operation that produces or stores a ZST
can be reduced to a no-op. For instance, a `HashSet<T>` can be efficiently implemented
as a thin wrapper around `HashMap<T, ()>`, because all the operations `HashMap` normally
does to store and retrieve values will be completely stripped in monomorphization.

Similarly, `Result<(), ()>` and `Option<()>` are effectively just fancy `bool`s.
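
A sketch of the ZST points above. `Foo` and `Bar` repeat the examples from the text; `UnitSet` is a hypothetical, minimal set that stores `()` as the map value, in the same spirit as `HashSet<T>` wrapping `HashMap<T, ()>` (it is not the standard library's implementation):

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::mem::size_of;

#[allow(dead_code)]
struct Foo; // No fields = no size
enum Bar {} // No variants = no size

// Hypothetical set built on a map whose values are the zero-sized `()`.
struct UnitSet<T> {
    map: HashMap<T, ()>,
}

impl<T: Hash + Eq> UnitSet<T> {
    fn new() -> Self {
        UnitSet { map: HashMap::new() }
    }

    // Returns `true` if the key was not already present.
    fn insert(&mut self, key: T) -> bool {
        self.map.insert(key, ()).is_none()
    }

    fn contains(&self, key: &T) -> bool {
        self.map.contains_key(key)
    }
}

// Sizes of the ZSTs and of the "fancy bools".
fn zst_sizes() -> [usize; 5] {
    [
        size_of::<Foo>(),
        size_of::<Bar>(),
        size_of::<()>(),
        size_of::<Option<()>>(),
        size_of::<Result<(), ()>>(),
    ]
}
```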

Safe code need not worry about ZSTs, but *unsafe* code must be careful about the
consequences of types with no size. In particular, pointer offsets are no-ops, and
standard allocators (including jemalloc, the one used by Rust) generally consider
passing in a size of `0` to be Undefined Behaviour.
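
A minimal sketch of these unsafe-code concerns, using `std::alloc` directly. `allocate_one` and `deallocate_one` are hypothetical helpers: they skip the allocator for zero-sized layouts (the `std::alloc::alloc` documentation lists a zero-size layout as undefined behaviour) and hand out a dangling, well-aligned pointer instead. `zst_offset_is_noop` shows that offsetting a `*const ()` moves it by zero bytes:

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Allocates uninitialized memory for one `T`, never calling the allocator for ZSTs.
fn allocate_one<T>() -> NonNull<T> {
    let layout = Layout::new::<T>();
    if layout.size() == 0 {
        // A dangling pointer is non-null and aligned, which is all a ZST needs.
        return NonNull::dangling();
    }
    // SAFETY: `layout` has a non-zero size.
    let raw = unsafe { alloc(layout) as *mut T };
    NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout))
}

/// Frees memory obtained from `allocate_one::<T>`.
///
/// # Safety
/// `ptr` must come from `allocate_one::<T>` and must not be freed twice.
unsafe fn deallocate_one<T>(ptr: NonNull<T>) {
    let layout = Layout::new::<T>();
    if layout.size() != 0 {
        // SAFETY: guaranteed by the caller; ZSTs were never allocated.
        unsafe { dealloc(ptr.as_ptr() as *mut u8, layout) };
    }
}

/// Offsetting a pointer to a ZST moves it by `count * 0` bytes.
fn zst_offset_is_noop() -> bool {
    let p: *const () = NonNull::<()>::dangling().as_ptr();
    p.wrapping_add(5) == p
}
```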

# Drop Flags

For unfortunate legacy implementation reasons, Rust as of 1.0.0 will do a nasty trick to
any type that implements the `Drop` trait (has a destructor): it will insert a secret field
in the type. That is,

```rust
struct Foo {
    a: u32,
    b: u32,
}

impl Drop for Foo {
    fn drop(&mut self) {}
}
```

will cause Foo to secretly become:

```rust
struct Foo {
    a: u32,
    b: u32,
    _drop_flag: u8,
}
```

For details as to *why* this is done, and how to make it not happen, check out
[SOME OTHER SECTION].

# Alternative representations

Rust allows you to specify data layout strategies other than the default Rust one.

# repr(C)

This is the most important `repr`. It has fairly simple intent: do what C does.
The order, size, and alignment of fields is exactly what you would expect from
C or C++. Any type you expect to pass through an FFI boundary should have `repr(C)`,
as C is the lingua franca of the programming world. However, this is also necessary
to soundly do more elaborate tricks with data layout, such as reinterpreting values
as a different type.

However, the interaction with Rust's more exotic data layout features must be kept
in mind. Due to its dual purpose as "for FFI" and "for layout control", `repr(C)`
can be applied to types that will be nonsensical or problematic if passed through
the FFI boundary.

* ZSTs are still zero-sized, even though this is not a standard behaviour
in C, and is explicitly contrary to the behaviour of an empty type in C++, which
still consumes a byte of space.

* DSTs are not a concept in C.

* **The drop flag will still be added.**

* This is equivalent to `repr(u32)` for enums (see below).
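
A short sketch of what `repr(C)` guarantees. `std::mem::offset_of!` (stable since Rust 1.77) confirms that fields keep their declaration order. `Pair` is a hypothetical type whose `repr(C)` layout (two `u32`s, no padding) matches `[u32; 2]`, which is what makes the `transmute` sound, and the hypothetical `Empty` shows that a `repr(C)` ZST is still zero-sized:

```rust
use std::mem::transmute;

#[allow(dead_code)]
#[repr(C)]
struct A {
    a: u8,
    c: u64,
    b: u32,
}

#[allow(dead_code)]
#[repr(C)]
struct Pair {
    x: u32,
    y: u32,
}

#[allow(dead_code)]
#[repr(C)]
struct Empty;

// Field offsets of `A`, which `repr(C)` keeps in declaration order.
fn offsets() -> [usize; 3] {
    [
        std::mem::offset_of!(A, a),
        std::mem::offset_of!(A, c),
        std::mem::offset_of!(A, b),
    ]
}

// Reinterpret a `Pair` as an array of two `u32`s.
fn pair_as_array(p: Pair) -> [u32; 2] {
    // SAFETY: `Pair` is `repr(C)` with two `u32` fields and no padding,
    // so it has the same size, alignment, and field order as `[u32; 2]`.
    unsafe { transmute::<Pair, [u32; 2]>(p) }
}
```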

# repr(packed)

`repr(packed)` forces Rust to strip any padding it would normally apply.
This may improve the memory footprint of a type, but can have negative
side effects ranging from "field access is heavily penalized" to "completely breaks
everything", depending on the target platform.
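
A small sketch contrasting `repr(packed)` with an ordinary `repr(C)` struct; both types and both helpers are hypothetical. Packing removes the three padding bytes, but it also means `b` may sit at an unaligned address, so the compiler rejects taking a reference like `&p.b`; copying the field out by value is allowed:

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

#[allow(dead_code)]
#[repr(packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn sizes() -> (usize, usize) {
    (size_of::<Padded>(), size_of::<Packed>())
}

// Reads `b` by copying it; no (possibly misaligned) reference is created.
fn read_b(p: &Packed) -> u32 {
    p.b
}
```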

# repr(u8), repr(u16), repr(u32), repr(u64)

These specify the size of a C-like enum (one whose variants carry no data).
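
A sketch of the integer `repr`s on two hypothetical C-like enums. The attribute fixes the size of the enum, and variants without an explicit discriminant continue counting upward from the previous one:

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(u8)]
enum Small {
    A,
    B = 7,
    C, // continues from 7, so this is 8
}

#[allow(dead_code)]
#[repr(u32)]
enum Wide {
    X,
    Y,
}

fn enum_reprs() -> (usize, usize, u8, u32) {
    (size_of::<Small>(), size_of::<Wide>(), Small::C as u8, Wide::Y as u32)
}
```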
