% The Unsafe Rust Programming Language

**This document is about advanced functionality and low-level development
practices in the Rust Programming Language. Most of the things discussed won't
matter to the average Rust programmer. However, if you wish to correctly write
unsafe code in Rust, this text contains invaluable information.**

This document seeks to complement [The Rust Programming Language Book][trpl]
(TRPL). Where TRPL introduces the language and teaches the basics, TURPL dives
deep into the specification of the language and all the nasty bits necessary to
write Unsafe Rust. TURPL does not assume you have read TRPL, but does assume
you know the basics of the language and systems programming. We will not
explain the stack or heap, nor will we cover the basic syntax.


# A Tale Of Two Languages

Rust can be thought of as two different languages: Safe Rust and Unsafe Rust.
Any time someone extols the guarantees of Rust, they are almost surely talking
about Safe Rust. However Safe Rust is not sufficient to write every program.
For that, we need the Unsafe Rust superset.

Most fundamentally, writing bindings to other languages (such as the C exposed
by your operating system) is never going to be safe. Rust can't control what
other languages do to program execution! However Unsafe Rust is also necessary
to construct fundamental abstractions where the type system is not sufficient
to automatically prove that what you're doing is sound.

Indeed, the Rust standard library is implemented in Rust, and it makes
substantial use of Unsafe Rust for implementing IO, memory allocation,
collections, synchronization, and other low-level computational primitives.

Upon hearing this, many wonder why they would not simply use C or C++ in place
of Rust (or just use a "real" safe language). If we're going to do unsafe
things, why not lean on these much more established languages?

The most important difference between C++ and Rust is a matter of defaults:
Rust is 100% safe by default. Even when you *opt out* of safety in Rust, it is
a modular action. Deciding to work with unchecked uninitialized memory does not
suddenly make dangling or null pointers a problem. Using unchecked indexing on
`x` does not mean you suddenly have to worry about indexing out of bounds on
`y`. C and C++, by contrast, have pervasive unsafety baked into the language.
Even modern best practices like `unique_ptr` have various safety pitfalls.

It should also be noted that writing Unsafe Rust should be regarded as an
exceptional action. Unsafe Rust is often the domain of *fundamental libraries*:
anything that needs to make FFI bindings or define core abstractions. These
fundamental libraries then expose a *safe* interface for intermediate libraries
and applications to build upon. And these safe interfaces make an important
promise: if your application segfaults, it's not your fault. *They* have a bug.

And really, how is that different from *any* safe language? Python, Ruby, and
Java libraries can internally do all sorts of nasty things. The languages
themselves are no different. Safe languages regularly have bugs that cause
critical vulnerabilities. The fact that Rust is written with a healthy spoonful
of Unsafe Rust is no different. However it *does* mean that Rust doesn't need
to fall back to the pervasive unsafety of C to do the nasty things that need
to get done.




# What does `unsafe` mean?

Rust tries to model memory safety through the `unsafe` keyword. Interestingly,
the meaning of `unsafe` largely revolves around what its *absence* means. If
the `unsafe` keyword is absent from a program, it should not be possible to
violate memory safety under *any* conditions. The presence of `unsafe` means
that there are conditions under which this code *could* violate memory safety.

To be more concrete, Rust cares about preventing the following things:

* Dereferencing null or dangling pointers
* Reading uninitialized memory
* Breaking the pointer aliasing rules (TBD) (llvm rules + noalias on &mut and & w/o UnsafeCell)
* Invoking Undefined Behaviour (in e.g. compiler intrinsics)
* Producing invalid primitive values:
  * a dangling or null reference
  * a `bool` that isn't 0 or 1
  * an undefined `enum` discriminant
  * a `char` larger than `char::MAX`
  * a non-UTF-8 `str`
* Unwinding into an FFI function
* Causing a data race

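As a small illustration of how `unsafe` gates these operations: merely
*creating* a raw pointer is safe, while *dereferencing* one (which could hit a
null or dangling pointer in general) requires an `unsafe` block. This is a
minimal sketch with an invented function name:

```rust
/// Reads a value through a raw pointer. Creating the raw pointer is
/// perfectly safe; only the dereference must be marked `unsafe`.
fn read_via_raw(x: &u8) -> u8 {
    let p = x as *const u8;
    // Sound here: `p` was just derived from a valid reference,
    // so it is neither null nor dangling.
    unsafe { *p }
}

fn main() {
    assert_eq!(read_via_raw(&42), 42);
}
```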
That's it. That's all the Undefined Behaviour in Rust. Libraries are free to
declare arbitrary requirements if they could transitively cause memory safety
issues, but it all boils down to the above actions. Rust is otherwise quite
permissive with respect to other dubious operations. Rust considers it "safe"
to:

* Deadlock
* Leak memory
* Fail to call destructors
* Access private fields
* Overflow integers
* Delete the production database

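For instance, leaking memory requires no `unsafe` at all. A minimal sketch
using a reference cycle between two `Rc`s, a classic way to leak in entirely
safe Rust:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    // Two nodes that point at each other form a reference cycle.
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(a.clone())) });
    *a.next.borrow_mut() = Some(b.clone());
    // `a` is kept alive by this binding *and* by `b`, so when both
    // bindings go out of scope the nodes keep each other alive forever:
    // the memory is leaked and no destructor ever runs. All "safe".
    assert_eq!(Rc::strong_count(&a), 2);
}
```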
However any program that does such a thing is *probably* incorrect. Rust just
isn't interested in modeling these problems, as they are much harder to prevent
in general, and it's literally impossible to prevent incorrect programs from
getting written.

There are several places `unsafe` can appear in Rust today, which can largely
be grouped into two categories:

* There are unchecked contracts here. To declare you understand this, I require
  you to write `unsafe` elsewhere:
  * On functions, `unsafe` declares the function to be unsafe to call. Users
    of the function must check the documentation to determine what this means,
    and then have to write `unsafe` somewhere to identify that they're aware
    of the danger.
  * On trait declarations, `unsafe` declares that *implementing* the trait is
    an unsafe operation, as it has contracts that other unsafe code is free to
    blindly trust.

* I am declaring that I have, to the best of my knowledge, adhered to the
  unchecked contracts:
  * On trait implementations, `unsafe` declares that the contract of the
    `unsafe` trait has been upheld.
  * On blocks, `unsafe` declares that any unsafety from the unsafe operations
    within has been handled, and that the parent function is therefore safe.

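To make the two categories concrete, here is a sketch that touches all four
sites. `TrustedId`, `Token`, and `first_byte_unchecked` are hypothetical names
invented for this example:

```rust
// An unsafe trait: implementors promise that `id` returns a value other
// unsafe code may blindly trust. (A made-up contract, for illustration.)
unsafe trait TrustedId {
    fn id(&self) -> usize;
}

struct Token(usize);

// An unsafe impl: we declare the contract above has been upheld.
unsafe impl TrustedId for Token {
    fn id(&self) -> usize {
        self.0
    }
}

// An unsafe fn: callers must ensure `bytes` is non-empty.
unsafe fn first_byte_unchecked(bytes: &[u8]) -> u8 {
    *bytes.get_unchecked(0)
}

fn main() {
    let token = Token(7);
    assert_eq!(token.id(), 7);

    let data = [1u8, 2, 3];
    // An unsafe block: we take responsibility for the contract
    // (`data` is non-empty, so index 0 is in bounds).
    let first = unsafe { first_byte_unchecked(&data) };
    assert_eq!(first, 1);
}
```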
There is also `#[unsafe_no_drop_flag]`, which is a special case that exists for
historical reasons and is in the process of being phased out. See the section
on destructors for details.

Some examples of unsafe functions:

* `slice::get_unchecked` performs unchecked indexing, allowing memory safety
  to be freely violated.
* `ptr::offset` is an intrinsic that invokes Undefined Behaviour if the result
  is not "in bounds" as defined by LLVM (see the lifetimes section for
  details).
* `mem::transmute` reinterprets some value as having the given type, bypassing
  type safety in arbitrary ways (see the conversions section for details).
* All FFI functions are `unsafe` because they can do arbitrary things. C is an
  obvious culprit, but generally any language can do something that Rust isn't
  happy about (see the FFI section for details).

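As a sketch of the last kind of pitfall, here is a `mem::transmute` that
happens to be sound (both types are four bytes, and every bit pattern is a
valid `u32`), though nothing checks any of that for us:

```rust
use std::mem;

fn main() {
    // Reinterpret the bits of an f32 as a u32. IEEE 754 encodes 1.0
    // as 0x3f800000, and every bit pattern is valid for u32, so this
    // particular transmute is sound -- the compiler simply takes our word.
    let bits: u32 = unsafe { mem::transmute(1.0f32) };
    assert_eq!(bits, 0x3f800000);
}
```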
As of Rust 1.0 there are exactly two unsafe traits:

* `Send` is a marker trait (it has no actual API) that promises implementors
  are safe to send to another thread.
* `Sync` is a marker trait that promises that threads can safely share
  implementors through a shared reference.

All other traits that declare any kind of contract *really* can't be trusted
to adhere to their contract when memory safety is at stake. For instance, Rust
has `PartialOrd` and `Ord` to differentiate between types which can "just" be
compared and those that implement a total ordering. However you can't actually
trust an implementor of `Ord` to provide a total ordering if failing to do so
causes you to e.g. index out of bounds. But if it just makes your program do a
stupid thing, then it's "fine" to rely on `Ord`.

The reason this is the case is that `Ord` is safe to implement, and it should
be impossible for bad *safe* code to violate memory safety. Rust has
traditionally avoided making traits unsafe because it makes `unsafe` pervasive
in the language, which is not desirable. The only reason `Send` and `Sync` are
unsafe is that thread safety is a sort of fundamental thing that a program
can't really guard against locally (even by-value message passing still
requires a notion of `Send`).

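For example, safe code such as `sort` must remain memory-safe even when handed
an ordering that lies. A sketch with an invented `Evil` type whose `Ord`
claims everything is equal to everything else:

```rust
use std::cmp::Ordering;

// An "evil" ordering that is not a total order: it claims every value
// equals every other value, contradicting the derived PartialEq.
#[derive(PartialEq, Eq)]
struct Evil(u32);

impl PartialOrd for Evil {
    fn partial_cmp(&self, _other: &Evil) -> Option<Ordering> {
        Some(Ordering::Equal)
    }
}

impl Ord for Evil {
    fn cmp(&self, _other: &Evil) -> Ordering {
        Ordering::Equal
    }
}

fn main() {
    let mut v = vec![Evil(3), Evil(1), Evil(2)];
    // sort() may produce a nonsensical order with this bogus Ord (a
    // stable sort that sees only "equal" moves nothing), but it must
    // never violate memory safety.
    v.sort();
    assert_eq!(v.len(), 3);
    assert_eq!(v[0].0, 3);
}
```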



# Working with unsafe

Rust generally only gives us the tools to talk about safety in a scoped and
binary manner. Unfortunately, reality is significantly more complicated than
that. For instance, consider the following toy function:

```rust
fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
    if idx < arr.len() {
        unsafe {
            Some(*arr.get_unchecked(idx))
        }
    } else {
        None
    }
}
```

Clearly, this function is safe. We check that the index is in bounds, and if
it is, index into the array in an unchecked manner. But even in such a trivial
function, the scope of the unsafe block is questionable. Consider changing the
`<` to a `<=`:

```rust
fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
    if idx <= arr.len() {
        unsafe {
            Some(*arr.get_unchecked(idx))
        }
    } else {
        None
    }
}
```

This program is now unsound, and yet *we only modified safe code*. This is the
fundamental problem of safety: it's non-local. The soundness of our unsafe
operations necessarily depends on the state established by "safe" operations.
Although safety *is* modular (we *still* don't need to worry about unrelated
safety issues like uninitialized memory), it quickly contaminates the
surrounding code.

Trickier than that is when we get into actual statefulness. Consider a simple
implementation of `Vec`:

```rust
use std::ptr;

// Note this definition is insufficient. See the section on lifetimes.
struct Vec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
}

// Note this implementation does not correctly handle zero-sized types.
// We currently live in a nice imaginary world of only positive fixed-size
// types.
impl<T> Vec<T> {
    fn push(&mut self, elem: T) {
        if self.len == self.cap {
            // not important for this example
            self.reallocate();
        }
        unsafe {
            ptr::write(self.ptr.offset(self.len as isize), elem);
            self.len += 1;
        }
    }
}
```

This code is simple enough to reasonably audit and verify. Now consider
adding the following method:

```rust
    fn make_room(&mut self) {
        // grow the capacity
        self.cap += 1;
    }
```

This code is safe, but it is also completely unsound. Changing the capacity
violates the invariants of `Vec` (that `cap` reflects the allocated space in
the `Vec`). This is not something the rest of `Vec` can guard against. It
*has* to trust the capacity field because there's no way to verify it.

`unsafe` does more than pollute a whole function: it pollutes a whole *module*.
Generally, the only bullet-proof way to limit the scope of unsafe code is at
the module boundary, with privacy.

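A sketch of that technique with an invented `Even` type: because the field is
private, no code outside the module can break the invariant, so the module is
the entire unit that must be audited:

```rust
// The invariant: the wrapped value is always even. Unsafe code could
// rely on this the same way Vec relies on `cap` being accurate.
mod even {
    pub struct Even(u64);

    impl Even {
        pub fn new(x: u64) -> Even {
            Even(x * 2) // only this module can construct the field
        }

        pub fn get(&self) -> u64 {
            self.0
        }
    }
}

fn main() {
    let e = even::Even::new(3);
    // e.0 = 5; // ERROR: the field is private. Outside code cannot
    //          // violate the invariant, no matter what it does.
    assert_eq!(e.get(), 6);
}
```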
[trpl]: https://doc.rust-lang.org/book/