Commit af2fd1d

committed
rustbook support
1 parent 3287372 commit af2fd1d

12 files changed

+672
-271
lines changed

FiraSans-Medium.woff

87.8 KB
Binary file not shown.

FiraSans-Regular.woff

89.8 KB
Binary file not shown.

Heuristica-Italic.woff

117 KB
Binary file not shown.

README.md

Lines changed: 253 additions & 2 deletions
@@ -1,3 +1,254 @@
1-
# The Unsafe Rust Programming Language (Book)
1+
% The Unsafe Rust Programming Language
2+
3+
**This document is about advanced functionality and low-level development practices
4+
in the Rust Programming Language. Most of the things discussed won't matter
5+
to the average Rust programmer. However if you wish to correctly write unsafe
6+
code in Rust, this text contains invaluable information.**
7+
8+
This document seeks to complement [The Rust Programming Language Book][trpl] (TRPL).
9+
Where TRPL introduces the language and teaches the basics, TURPL dives deep into
10+
the specification of the language, and all the nasty bits necessary to write
11+
Unsafe Rust. TURPL does not assume you have read TRPL, but does assume you know
12+
the basics of the language and systems programming. We will not explain the
13+
stack or heap, and we will not explain the syntax.
14+
15+
16+
# A Tale Of Two Languages
17+
18+
Rust can be thought of as two different languages: Safe Rust, and Unsafe Rust.
19+
Any time someone opines on the guarantees of Rust, they are almost surely talking about
20+
Safe Rust. However Safe Rust is not sufficient to write every program. For that,
21+
we need the Unsafe Rust superset.
22+
23+
Most fundamentally, writing bindings to other languages
24+
(such as the C exposed by your operating system) is never going to be safe. Rust
25+
can't control what other languages do to program execution! However Unsafe Rust is
26+
also necessary to construct fundamental abstractions where the type system is not
27+
sufficient to automatically prove what you're doing is sound.
28+
29+
Indeed, the Rust standard library is implemented in Rust, and it makes substantial
30+
use of Unsafe Rust for implementing IO, memory allocation, collections,
31+
synchronization, and other low-level computational primitives.
32+
33+
Upon hearing this, many wonder why they would not simply just use C or C++ in place of
34+
Rust (or just use a "real" safe language). If we're going to do unsafe things, why not
35+
lean on these much more established languages?
36+
37+
The most important difference between C++ and Rust is a matter of defaults:
38+
Rust is 100% safe by default. Even when you *opt out* of safety in Rust, it is a modular
39+
action. Deciding to work with unchecked uninitialized memory does not
40+
suddenly make dangling or null pointers a problem. When using unchecked indexing on `x`,
41+
one does not have to suddenly worry about indexing out of bounds on `y`.
42+
C and C++, by contrast, have pervasive unsafety baked into the language. Even the
43+
modern best practices like `unique_ptr` have various safety pitfalls.
44+
45+
It should also be noted that writing Unsafe Rust should be regarded as an exceptional
46+
action. Unsafe Rust is often the domain of *fundamental libraries*: anything that needs
47+
to make FFI bindings or define core abstractions. These fundamental libraries then expose
48+
a *safe* interface for intermediate libraries and applications to build upon. And these
49+
safe interfaces make an important promise: if your application segfaults, it's not your
50+
fault. *They* have a bug.
51+
52+
And really, how is that different from *any* safe language? Python, Ruby, and Java libraries
53+
can internally do all sorts of nasty things. The languages themselves are no
54+
different. Safe languages regularly have bugs that cause critical vulnerabilities.
55+
The fact that Rust is written with a healthy spoonful of Unsafe Rust is no different.
56+
However it *does* mean that Rust doesn't need to fall back to the pervasive unsafety of
57+
C to do the nasty things that need to get done.
58+
59+
60+
61+
62+
# What does `unsafe` mean?
63+
64+
Rust tries to model memory safety through the `unsafe` keyword. Interestingly,
65+
the meaning of `unsafe` largely revolves around what
66+
its *absence* means. If the `unsafe` keyword is absent from a program, it should
67+
not be possible to violate memory safety under *any* conditions. The presence
68+
of `unsafe` means that there are conditions under which this code *could*
69+
violate memory safety.
70+
71+
To be more concrete, Rust cares about preventing the following things:
72+
73+
* Dereferencing null/dangling pointers
74+
* Reading uninitialized memory
75+
* Breaking the pointer aliasing rules (TBD) (LLVM's rules, plus `noalias` on `&mut` and on `&` without `UnsafeCell`)
76+
* Invoking Undefined Behaviour (in e.g. compiler intrinsics)
77+
* Producing invalid primitive values:
78+
    * dangling/null references
79+
    * a `bool` that isn't 0 or 1
80+
    * an undefined `enum` discriminant
81+
    * a `char` larger than `char::MAX`
82+
    * a non-UTF-8 `str`
83+
* Unwinding into an FFI function
84+
* Causing a data race
85+
86+
That's it. That's all the Undefined Behaviour in Rust. Libraries are free to
87+
declare arbitrary requirements if they could transitively cause memory safety
88+
issues, but it all boils down to the above actions. Rust is otherwise
89+
quite permissive with respect to other dubious operations. Rust considers it
90+
"safe" to:
91+
92+
* Deadlock
93+
* Leak memory
94+
* Fail to call destructors
95+
* Access private fields
96+
* Overflow integers
97+
* Delete the production database
98+
99+
However any program that does such a thing is *probably* incorrect. Rust just isn't
100+
interested in modeling these problems, as they are much harder to prevent in general,
101+
and it's literally impossible to prevent incorrect programs from getting written.
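
As a rough illustration of two items from the list above, the following sketch skips a destructor and overflows an integer without writing `unsafe` anywhere. The `Noisy` type and the `forget_then_drop` function are hypothetical names made up for this example; only `mem::forget`, `wrapping_add`, and `checked_add` come from the standard library.

```rust
use std::cell::Cell;
use std::mem;

// Hypothetical type that counts how many times its destructor runs.
struct Noisy<'a>(&'a Cell<u32>);

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns (drops seen after `mem::forget`, drops seen after `drop`).
fn forget_then_drop() -> (u32, u32) {
    let drops = Cell::new(0);
    // Safe: the value is leaked and its destructor is never called.
    mem::forget(Noisy(&drops));
    let after_forget = drops.get();
    // The destructor runs here, as usual.
    drop(Noisy(&drops));
    (after_forget, drops.get())
}

fn main() {
    assert_eq!(forget_then_drop(), (0, 1));

    // Integer overflow is a logic problem, not a memory-safety problem.
    // The explicit methods make the intended behaviour visible.
    assert_eq!(255u8.wrapping_add(1), 0);
    assert_eq!(255u8.checked_add(1), None);
}
```

Nothing here can corrupt memory, which is why Rust allows it, but a program that forgets a value it meant to clean up is still probably wrong.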
102+
103+
There are several places `unsafe` can appear in Rust today, which can largely be
104+
grouped into two categories:
105+
106+
* There are unchecked contracts here. To declare you understand this, I require
107+
you to write `unsafe` elsewhere:
108+
    * On functions, `unsafe` is declaring the function to be unsafe to call. Users
109+
      of the function must check the documentation to determine what this means,
110+
      and then have to write `unsafe` somewhere to identify that they're aware of
111+
      the danger.
112+
    * On trait declarations, `unsafe` is declaring that *implementing* the trait
113+
      is an unsafe operation, as it has contracts that other unsafe code is free to
114+
      trust blindly.
115+
116+
* I am declaring that I have, to the best of my knowledge, adhered to the
117+
unchecked contracts:
118+
    * On trait implementations, `unsafe` is declaring that the contract of the
119+
      `unsafe` trait has been upheld.
120+
    * On blocks, `unsafe` is declaring any unsafety from an unsafe
121+
      operation within to be handled, and therefore the parent function is safe.
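
The four positions described above can be seen side by side in a small sketch. Everything named here (`Zeroable`, `read_at`, `zeroed`, `first_or_zero`) is hypothetical and exists only for illustration; the contracts are the ones written in the comments, not anything the compiler checks.

```rust
use std::mem;

// Unsafe trait declaration (hypothetical): implementors promise that the
// all-zero bit pattern is a valid value of the type.
unsafe trait Zeroable {}

// Unsafe trait implementation: we assert the contract holds for `u32`.
unsafe impl Zeroable for u32 {}

// Unsafe function: the caller must guarantee `idx < slice.len()`.
unsafe fn read_at(slice: &[u32], idx: usize) -> u32 {
    // SAFETY: the caller promised that `idx` is in bounds.
    unsafe { *slice.get_unchecked(idx) }
}

// Safe function whose unsafe block trusts the `Zeroable` contract blindly.
fn zeroed<T: Zeroable>() -> T {
    // SAFETY: `Zeroable` implementors promise that all-zero bytes are valid.
    unsafe { mem::zeroed() }
}

// Safe function: the unsafe block declares the contract of `read_at` handled.
fn first_or_zero(slice: &[u32]) -> u32 {
    if slice.is_empty() {
        zeroed()
    } else {
        // SAFETY: the slice is non-empty, so index 0 is in bounds.
        unsafe { read_at(slice, 0) }
    }
}

fn main() {
    assert_eq!(first_or_zero(&[7, 8]), 7);
    assert_eq!(first_or_zero(&[]), 0);
}
```

Callers of `first_or_zero` never write `unsafe`; all of the unchecked contracts are discharged inside it.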
122+
123+
There is also `#[unsafe_no_drop_flag]`, which is a special case that exists for
124+
historical reasons and is in the process of being phased out. See the section on
125+
destructors for details.
126+
127+
Some examples of unsafe functions:
128+
129+
* `slice::get_unchecked` will perform unchecked indexing, allowing memory
130+
safety to be freely violated.
131+
* `ptr::offset` is an intrinsic that invokes Undefined Behaviour if it is
132+
not "in bounds" as defined by LLVM (see the lifetimes section for details).
133+
* `mem::transmute` reinterprets some value as having the given type,
134+
bypassing type safety in arbitrary ways. (see the conversions section for details)
135+
* All FFI functions are `unsafe` because they can do arbitrary things.
136+
C is an obvious culprit, but generally any language can do something
137+
that Rust isn't happy about. (see the FFI section for details)
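
A minimal sketch of calling three of these functions with their contracts satisfied. The wrapper names (`second_unchecked`, `third_via_offset`, `float_bits`) are hypothetical; each `SAFETY` comment states the condition that makes the call sound in this particular case.

```rust
use std::mem;

fn second_unchecked(data: &[u8; 3]) -> u8 {
    // SAFETY: the array has exactly 3 elements, so index 1 is in bounds.
    unsafe { *data.get_unchecked(1) }
}

fn third_via_offset(data: &[u8; 3]) -> u8 {
    let p = data.as_ptr();
    // SAFETY: `p.offset(2)` stays inside the same 3-element array.
    unsafe { *p.offset(2) }
}

fn float_bits(x: f32) -> u32 {
    // SAFETY: `f32` and `u32` have the same size, and every bit pattern
    // is a valid `u32`.
    unsafe { mem::transmute::<f32, u32>(x) }
}

fn main() {
    let data = [10u8, 20, 30];
    assert_eq!(second_unchecked(&data), 20);
    assert_eq!(third_via_offset(&data), 30);
    // Same result as the safe `f32::to_bits`.
    assert_eq!(float_bits(1.0), 1.0f32.to_bits());
}
```

Change any of the indices to 3 and the code still compiles, but it is then Undefined Behaviour; the compiler takes the author's word for it.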
138+
139+
As of Rust 1.0 there are exactly two unsafe traits:
140+
141+
* `Send` is a marker trait (it has no actual API) that promises implementors
142+
are safe to send to another thread.
143+
* `Sync` is a marker trait that promises that threads can safely share
144+
implementors through a shared reference.
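
To make the `Send` promise concrete, here is a sketch of a type that the compiler will not treat as `Send` automatically because it holds a raw pointer. `OwnedPtr` and `increment_on_other_thread` are hypothetical names, and the justification for the `unsafe impl` is an assumption stated in the comments: the wrapper is the sole owner of its allocation.

```rust
use std::thread;

// Hypothetical wrapper that owns a heap-allocated `u32` through a raw pointer.
// Raw pointers are not `Send`, so neither is this type by default.
struct OwnedPtr(*mut u32);

// Our promise: `OwnedPtr` uniquely owns its allocation, so moving it to
// another thread cannot create unsynchronized shared access.
unsafe impl Send for OwnedPtr {}

impl OwnedPtr {
    fn new(value: u32) -> OwnedPtr {
        OwnedPtr(Box::into_raw(Box::new(value)))
    }

    fn into_box(self) -> Box<u32> {
        // SAFETY: the pointer came from `Box::into_raw` in `new`, and `self`
        // is consumed, so the box is reconstructed at most once.
        unsafe { Box::from_raw(self.0) }
    }
}

fn increment_on_other_thread(value: u32) -> u32 {
    let p = OwnedPtr::new(value);
    // `thread::spawn` requires the closure, and so `p`, to be `Send`.
    let handle = thread::spawn(move || {
        let mut boxed = p.into_box();
        *boxed += 1;
        *boxed
    });
    handle.join().unwrap()
}

fn main() {
    assert_eq!(increment_on_other_thread(41), 42);
}
```

If the promise were false (say, a second copy of the pointer stayed behind on the original thread), the compiler would not notice; that is what makes the trait unsafe to implement.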
145+
146+
All other traits that declare any kind of contract *really* can't be trusted
147+
to adhere to their contract when memory-safety is at stake. For instance Rust has
148+
`PartialOrd` and `Ord` to differentiate between types which can "just" be
149+
compared and those that implement a total ordering. However you can't actually
150+
trust an implementor of `Ord` to provide a total ordering if failing to
151+
do so causes you to e.g. index out of bounds. But if it just makes your program
152+
do a stupid thing, then it's "fine" to rely on `Ord`.
153+
154+
The reason this is the case is that `Ord` is safe to implement, and it should be
155+
impossible for bad *safe* code to violate memory safety. Rust has traditionally
156+
avoided making traits unsafe because it makes `unsafe` pervasive in the language,
157+
which is not desirable. The only reason `Send` and `Sync` are unsafe is because
158+
thread safety is a sort of fundamental thing that a program can't really guard
159+
against locally (even by-value message passing still requires a notion of `Send`).
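
A small sketch of why `Ord` cannot be trusted: the hypothetical `Liar` type below implements it in entirely safe code while ignoring the total-ordering contract. The `pick` function, also hypothetical, shows the defensive posture the text describes: it may return a silly answer, but it uses checked indexing so that a lying `Ord` can never cause an out-of-bounds read.

```rust
use std::cmp::Ordering;

// Hypothetical type whose `Ord` claims every value is less than every other.
#[derive(PartialEq, Eq)]
struct Liar(u32);

impl Ord for Liar {
    fn cmp(&self, _other: &Liar) -> Ordering {
        Ordering::Less
    }
}

impl PartialOrd for Liar {
    fn partial_cmp(&self, other: &Liar) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Uses the "smaller" value as an index. Because `Ord` may lie, the index is
// checked with `get` rather than trusted.
fn pick(items: &[u32], a: &Liar, b: &Liar) -> Option<u32> {
    let idx = if a < b { a.0 } else { b.0 };
    items.get(idx as usize).copied()
}

fn main() {
    let (a, b) = (Liar(1), Liar(5));
    // Both comparisons hold at once, which no total ordering allows.
    assert!(a < b && b < a);
    assert_eq!(pick(&[10, 20], &a, &b), Some(20));
    // The "smaller" value is 5 here: a stupid answer, but not an unsafe one.
    assert_eq!(pick(&[10, 20], &b, &a), None);
}
```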
160+
161+
162+
163+
164+
# Working with unsafe
165+
166+
Rust generally only gives us the tools to talk about safety in a scoped and
167+
binary manner. Unfortunately reality is significantly more complicated than that.
168+
For instance, consider the following toy function:
169+
170+
```rust
fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
    if idx < arr.len() {
        // SAFETY: `idx < arr.len()` was checked just above.
        unsafe {
            Some(*arr.get_unchecked(idx))
        }
    } else {
        None
    }
}
```
181+
182+
Clearly, this function is safe. We check that the index is in bounds, and if it
183+
is, index into the array in an unchecked manner. But even in such a trivial
184+
function, the scope of the unsafe block is questionable. Consider changing the
185+
`<` to a `<=`:
186+
187+
```rust
fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
    // BUG: `idx == arr.len()` passes this check but is out of bounds.
    if idx <= arr.len() {
        unsafe {
            Some(*arr.get_unchecked(idx))
        }
    } else {
        None
    }
}
```
198+
199+
This program is now unsound, and yet *we only modified safe code*. This is the
200+
fundamental problem of safety: it's non-local. The soundness of our unsafe
201+
operations necessarily depends on the state established by "safe" operations.
202+
Although safety *is* modular (we *still* don't need to worry about
203+
unrelated safety issues like uninitialized memory), it quickly contaminates the
204+
surrounding code.
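
For comparison, a sketch of the same toy function written without `unsafe`, using the standard `slice::get`. Here the bounds check and the access are a single operation, so there is no separate comparison in safe code whose correctness the access depends on.

```rust
// Safe variant of the toy function: `get` performs the bounds check itself.
fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
    arr.get(idx).copied()
}

fn main() {
    let arr = [1u8, 2, 3];
    assert_eq!(do_idx(1, &arr), Some(2));
    // The off-by-one case that broke the unchecked version is simply `None`.
    assert_eq!(do_idx(3, &arr), None);
}
```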
205+
206+
Trickier than that is when we get into actual statefulness. Consider a simple
207+
implementation of `Vec`:
208+
209+
```rust
use std::ptr;

// Note this definition is insufficient. See the section on lifetimes.
struct Vec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
}

// Note this implementation does not correctly handle zero-sized types.
// We currently live in a nice imaginary world of only positive fixed-size
// types.
impl<T> Vec<T> {
    fn push(&mut self, elem: T) {
        if self.len == self.cap {
            // not important for this example (`reallocate` is elided here)
            self.reallocate();
        }
        unsafe {
            // Write into the first unused slot; this relies on
            // `self.len < self.cap` holding after the check above.
            ptr::write(self.ptr.offset(self.len as isize), elem);
            self.len += 1;
        }
    }
}
```
233+
234+
This code is simple enough to reasonably audit and verify. Now consider
235+
adding the following method:
236+
237+
```rust
fn make_room(&mut self) {
    // grow the capacity
    self.cap += 1;
}
```
243+
244+
This code is safe, but it is also completely unsound. Changing the capacity
245+
violates the invariants of Vec (that `cap` reflects the allocated space in the
246+
Vec). This is not something the rest of `Vec` can guard against. It *has* to
247+
trust the capacity field because there's no way to verify it.
248+
249+
`unsafe` does more than pollute a whole function: it pollutes a whole *module*.
250+
Generally, the only bullet-proof way to limit the scope of unsafe code is at the
251+
module boundary with privacy.
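
A minimal sketch of that idea, using a hypothetical fixed-capacity `Stack` rather than a real `Vec`. The fields are private to the `stack` module, so the invariant `len <= buf.len()` can only be broken by code inside the module, and that is the only code that needs auditing.

```rust
mod stack {
    // Hypothetical fixed-capacity stack. Invariant: `len <= buf.len()`.
    pub struct Stack {
        buf: [u8; 4], // private: code outside this module cannot touch it
        len: usize,   // private: code outside this module cannot touch it
    }

    impl Stack {
        pub fn new() -> Stack {
            Stack { buf: [0; 4], len: 0 }
        }

        // Returns `false` when the stack is full; this preserves the invariant.
        pub fn push(&mut self, value: u8) -> bool {
            if self.len == self.buf.len() {
                return false;
            }
            self.buf[self.len] = value;
            self.len += 1;
            true
        }

        pub fn last(&self) -> Option<u8> {
            if self.len == 0 {
                None
            } else {
                // SAFETY: `0 < len <= buf.len()` by the module's invariant,
                // so `len - 1` is in bounds.
                Some(unsafe { *self.buf.get_unchecked(self.len - 1) })
            }
        }
    }
}

fn main() {
    let mut s = stack::Stack::new();
    assert_eq!(s.last(), None);
    assert!(s.push(9));
    assert_eq!(s.last(), Some(9));
    // `s.len = 100;` would not compile here: the field is private.
}
```

A `make_room`-style method could still be added inside `mod stack` and break everything, but no code outside the module can.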
252+
253+
[trpl]: https://doc.rust-lang.org/book/
2254

3-
[Start at the intro](http://www.cglab.ca/~abeinges/blah/turpl/intro.html)

SUMMARY.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
# Summary
2+
3+
* [Data Layout](data.md)
4+
* [Ownership and Lifetimes](lifetimes.md)
5+
* [Conversions](conversions.md)
6+
* [Uninitialized Memory](uninitialized.md)
7+
* [Ownership-oriented resource management (RAII)](raii.md)
8+
* [Concurrency](concurrency.md)
9+
* [Example: Implementing Vec](vec.md)

SourceCodePro-Regular.woff

54.2 KB
Binary file not shown.

SourceCodePro-Semibold.woff

54.1 KB
Binary file not shown.

SourceSerifPro-Bold.woff

47.6 KB
Binary file not shown.

SourceSerifPro-Regular.woff

48.8 KB
Binary file not shown.
