get into the weeds over GEP and allocations

Gankra · Gankra · commit 0a36ea7db130 · 2015-07-20T15:32:52.000-07:00
diff --git a/src/doc/tarpl/vec-alloc.md b/src/doc/tarpl/vec-alloc.md
@@ -1,5 +1,22 @@
 % Allocating Memory
 
+Using Unique throws a wrench in an important feature of Vec (and indeed all of
+the std collections): an empty Vec doesn't actually allocate at all. So if we
+can't allocate, but also can't put a null pointer in `ptr`, what do we do in
+`Vec::new`? Well, we just put some other garbage in there!
+
+This is perfectly fine because we already have `cap == 0` as our sentinel for no
+allocation. We don't even need to handle it specially in almost any code because
+we usually need to check if `cap > len` or `len > 0` anyway. The traditional
+Rust value to put here is `0x01`. The standard library actually exposes this
+as `std::rt::heap::EMPTY`. There are quite a few places where we'll
+want to use `heap::EMPTY` because there's no real allocation to talk about but
+`null` would make the compiler do bad things.
+
+All of the `heap` API is totally unstable under the `heap_api` feature, though.
+We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of
+the `heap` API anyway, so let's just get that dependency over with.
+
 So:
 
 ```rust,ignore
@@ -24,15 +41,29 @@ I slipped in that assert there because zero-sized types will require some
 special handling throughout our code, and I want to defer the issue for now.
 Without this assert, some of our early drafts will do some Very Bad Things.
 
-Next we need to figure out what to actually do when we *do* want space. For that,
-we'll need to use the rest of the heap APIs. These basically allow us to
-talk directly to Rust's instance of jemalloc.
-
-We'll also need a way to handle out-of-memory conditions. The standard library
-calls the `abort` intrinsic, but calling intrinsics from normal Rust code is a
-pretty bad idea. Unfortunately, the `abort` exposed by the standard library
-allocates. Not something we want to do during `oom`! Instead, we'll call
-`std::process::exit`.
+Next we need to figure out what to actually do when we *do* want space. For
+that, we'll need to use the rest of the heap APIs. These basically allow us to
+talk directly to Rust's allocator (jemalloc by default).
+
+We'll also need a way to handle out-of-memory (OOM) conditions. The standard
+library calls the `abort` intrinsic, which just executes an illegal instruction
+to crash the whole program. The reason we abort and don't panic is that
+unwinding can cause allocations to happen, and that seems like a bad thing to do
+when your allocator just came back with "hey I don't have any more memory".
+
+Of course, this is a bit silly since most platforms don't actually run out of
+memory in a conventional way. Your operating system will probably kill the
+application by another means if you legitimately start using up all the memory.
+The most likely way we'll trigger OOM is by just asking for ludicrous quantities
+of memory at once (e.g. half the theoretical address space). As such it's
+*probably* fine to panic and nothing bad will happen. Still, we're trying to be
+like the standard library as much as possible, so we'll just kill the whole
+program.
+
+We said we don't want to use intrinsics, so doing *exactly* what `std` does is
+out. `std::rt::util::abort` actually exists, but it takes a message to print,
+which will probably allocate. Also it's still unstable. Instead, we'll call
+`std::process::exit` with some random number.
 
 ```rust
 fn oom() {
@@ -51,29 +82,104 @@ else:
     cap *= 2
 ```
 
-But Rust's only supported allocator API is so low level that we'll need to
-do a fair bit of extra work, though. We also need to guard against some special
-conditions that can occur with really large allocations. In particular, we index
-into arrays using unsigned integers, but `ptr::offset` takes signed integers. This
-means Bad Things will happen if we ever manage to grow to contain more than
-`isize::MAX` elements. Thankfully, this isn't something we need to worry about
-in most cases.
+But Rust's only supported allocator API is so low level that we'll need to do a
+fair bit of extra work. We also need to guard against some special
+conditions that can occur with really large allocations or empty allocations.
+
+In particular, `ptr::offset` will cause us *a lot* of trouble, because it has
+the semantics of LLVM's GEP inbounds instruction. If you're fortunate enough to
+not have dealt with this instruction, here's the basic story with GEP: alias
+analysis, alias analysis, alias analysis. It's super important to an optimizing
+compiler to be able to reason about data dependencies and aliasing.
 
-On 64-bit targets we're artifically limited to only 48-bits, so we'll run out
-of memory far before we reach that point. However on 32-bit targets, particularly
-those with extensions to use more of the address space, it's theoretically possible
-to successfully allocate more than `isize::MAX` bytes of memory. Still, we only
-really need to worry about that if we're allocating elements that are a byte large.
-Anything else will use up too much space.
+As a simple example, consider the following fragment of code:
+
+```rust
+# let x = &mut 0;
+# let y = &mut 0;
+*x *= 7;
+*y *= 3;
+```
 
-However since this is a tutorial, we're not going to be particularly optimal here,
-and just unconditionally check, rather than use clever platform-specific `cfg`s.
+If the compiler can prove that `x` and `y` point to different locations in
+memory, the two operations can in theory be executed in parallel (by e.g.
+loading them into different registers and working on them independently).
+However, in *general* the compiler can't do this, because if `x` and `y` point
+to the same location in memory, the operations need to be done to the same
+value, and they can't just be merged afterwards.
+
+When you use GEP inbounds, you are specifically telling LLVM that the offsets
+you're about to do are within the bounds of a single allocated entity. The
+ultimate payoff is that LLVM can assume that if two pointers are known to
+point to two disjoint objects, all the offsets of those pointers are *also*
+known to not alias (because you won't just end up in some random place in
+memory). LLVM is heavily optimized to work with GEP offsets, and inbounds
+offsets are the best of all, so it's important that we use them as much as
+possible.
+
+So that's what GEP's about. How can it cause us trouble?
+
+The first problem is that we index into arrays with unsigned integers, but
+GEP (and as a consequence `ptr::offset`) takes a *signed integer*. This means
+that half of the seemingly valid indices into an array will overflow GEP and
+actually go in the wrong direction! As such we must limit all allocations to
+`isize::MAX` elements. This actually means we only need to worry about
+byte-sized objects, because e.g. `> isize::MAX` `u16`s will truly exhaust all of
+the system's memory. However in order to avoid subtle corner cases where someone
+reinterprets some array of `< isize::MAX` objects as bytes, std limits all
+allocations to `isize::MAX` bytes.
+
+On all 64-bit targets that Rust currently supports we're artificially limited
+to significantly less than all 64 bits of the address space (modern x64
+platforms only expose 48-bit addressing), so we can rely on just running out of
+memory first. However on 32-bit targets, particularly those with extensions to
+use more of the address space (PAE x86 or x32), it's theoretically possible to
+successfully allocate more than `isize::MAX` bytes of memory.
+
+However since this is a tutorial, we're not going to be particularly optimal
+here, and just unconditionally check, rather than use clever platform-specific
+`cfg`s.
+
+The other corner-case we need to worry about is *empty* allocations. There will
+be two kinds of empty allocations we need to worry about: `cap = 0` for all T,
+and `cap > 0` for zero-sized types.
+
+These cases are tricky because they come
+down to what LLVM means by "allocated". LLVM's notion of an
+allocation is significantly more abstract than how we usually use it. Because
+LLVM needs to work with different languages' semantics and custom allocators,
+it can't really intimately understand allocation. Instead, the main idea behind
+allocation is "doesn't overlap with other stuff". That is, heap allocations,
+stack allocations, and globals don't randomly overlap. Yep, it's about alias
+analysis. As such, Rust can technically play a bit fast and loose with the
+notion of an allocation as long as it's *consistent*.
+
+Getting back to the empty allocation case, there are a couple of places where
+we want to offset by 0 as a consequence of generic code. The question is then:
+is it consistent to do so? For zero-sized types, we have concluded that it is
+indeed consistent to do a GEP inbounds offset by an arbitrary number of
+elements. This is a runtime no-op because every element takes up no space,
+and it's fine to pretend that there are infinitely many zero-sized types
+allocated at `0x01`. No allocator will ever allocate that address, because
+they won't allocate `0x00` and they generally allocate to some minimal
+alignment higher than a byte.
+
+But what about positive-sized types? That one's a bit trickier. In
+principle, you can argue that offsetting by 0 gives LLVM no information: either
+there's an element before the address, or after it, but it can't know which.
+However we've chosen to conservatively assume that it may do bad things. As
+such we *will* guard against this case explicitly.
+
+*Phew*
+
+Ok, with all the nonsense out of the way, let's actually allocate some memory:
 
 ```rust,ignore
 fn grow(&mut self) {
     // this is all pretty delicate, so let's say it's all unsafe
     unsafe {
-        let align = mem::min_align_of::<T>();
+        // current API requires us to specify size and alignment manually.
+        let align = mem::align_of::<T>();
         let elem_size = mem::size_of::<T>();
 
         let (new_cap, ptr) = if self.cap == 0 {
diff --git a/src/doc/tarpl/vec-layout.md b/src/doc/tarpl/vec-layout.md
@@ -13,15 +13,64 @@ pub struct Vec<T> {
 # fn main() {}
 ```
 
-And indeed this would compile. Unfortunately, it would be incorrect. The
-compiler will give us too strict variance, so e.g. an `&Vec<&'static str>`
+And indeed this would compile. Unfortunately, it would be incorrect. First, the
+compiler will give us too strict variance. So a `&Vec<&'static str>`
 couldn't be used where an `&Vec<&'a str>` was expected. More importantly, it
-will give incorrect ownership information to dropck, as it will conservatively
-assume we don't own any values of type `T`. See [the chapter on ownership and
-lifetimes] (lifetimes.html) for details.
+will give incorrect ownership information to the drop checker, as it will
+conservatively assume we don't own any values of type `T`. See [the chapter
+on ownership and lifetimes][ownership] for all the details on variance and
+drop check.
 
-As we saw in the lifetimes chapter, we should use `Unique<T>` in place of
-`*mut T` when we have a raw pointer to an allocation we own:
+As we saw in the ownership chapter, we should use `Unique<T>` in place of
+`*mut T` when we have a raw pointer to an allocation we own. Unique is unstable,
+so we'd like to not use it if possible, though.
+
+As a recap, Unique is a wrapper around a raw pointer that declares that:
+
+* We are covariant over `T`
+* We may own a value of type `T` (for drop check)
+* We are Send/Sync if `T` is Send/Sync
+* We deref to `*mut T` (so it largely acts like a `*mut` in our code)
+* Our pointer is never null (so `Option<Vec<T>>` is null-pointer-optimized)
+
+We can implement all of the above requirements except for the last
+one in stable Rust:
+
+```rust
+use std::marker::PhantomData;
+use std::ops::Deref;
+use std::mem;
+
+struct Unique<T> {
+    ptr: *const T,              // *const for variance
+    _marker: PhantomData<T>,    // For the drop checker
+}
+
+// Implementing Send and Sync is safe because we are the unique owners
+// of this data. It's like Unique<T> is "just" T.
+unsafe impl<T: Send> Send for Unique<T> {}
+unsafe impl<T: Sync> Sync for Unique<T> {}
+
+impl<T> Unique<T> {
+    pub fn new(ptr: *mut T) -> Self {
+        Unique { ptr: ptr, _marker: PhantomData }
+    }
+}
+
+impl<T> Deref for Unique<T> {
+    type Target = *mut T;
+    fn deref(&self) -> &*mut T {
+        // There's no way to cast the *const to a *mut
+        // while also taking a reference. So we just
+        // transmute it since it's all "just pointers".
+        unsafe { mem::transmute(&self.ptr) }
+    }
+}
+```
+
+Unfortunately the mechanism for stating that your value is non-zero is
+unstable and unlikely to be stabilized soon. As such we're just going to
+take the hit and use std's Unique:
 
 
 ```rust
@@ -38,29 +87,11 @@ pub struct Vec<T> {
 # fn main() {}
 ```
 
-As a recap, Unique is a wrapper around a raw pointer that declares that:
-
-* We may own a value of type `T`
-* We are Send/Sync iff `T` is Send/Sync
-* Our pointer is never null (and therefore `Option<Vec>` is
-  null-pointer-optimized)
-
-That last point is subtle. First, it makes `Unique::new` unsafe to call, because
-putting `null` inside of it is Undefined Behaviour. It also throws a
-wrench in an important feature of Vec (and indeed all of the std collections):
-an empty Vec doesn't actually allocate at all. So if we can't allocate,
-but also can't put a null pointer in `ptr`, what do we do in
-`Vec::new`? Well, we just put some other garbage in there!
-
-This is perfectly fine because we already have `cap == 0` as our sentinel for no
-allocation. We don't even need to handle it specially in almost any code because
-we usually need to check if `cap > len` or `len > 0` anyway. The traditional
-Rust value to put here is `0x01`. The standard library actually exposes this
-as `std::rt::heap::EMPTY`. There are quite a few places where we'll want to use
-`heap::EMPTY` because there's no real allocation to talk about but `null` would
-make the compiler angry.
-
-All of the `heap` API is totally unstable under the `heap_api` feature, though.
-We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of
-the `heap` API anyway, so let's just get that dependency over with.
+If you don't care about the null-pointer optimization, then you can use the
+stable code. However we will be designing the rest of the code around enabling
+the optimization. In particular, `Unique::new` is unsafe to call, because
+putting `null` inside of it is Undefined Behaviour. Our stable Unique doesn't
+need `new` to be unsafe because it doesn't make any interesting guarantees about
+its contents.
 
+[ownership]: ownership.html
diff --git a/src/doc/tarpl/vec.md b/src/doc/tarpl/vec.md
@@ -2,5 +2,19 @@
 
 To bring everything together, we're going to write `std::Vec` from scratch.
 Because all the best tools for writing unsafe code are unstable, this
-project will only work on nightly (as of Rust 1.2.0).
+project will only work on nightly (as of Rust 1.2.0). With the exception of the
+allocator API, much of the unstable code we'll use is expected to be stabilized
+in a similar form as it is today.
 
+However we will generally try to avoid unstable code where possible. In
+particular we won't use any intrinsics that could make the code a little
+bit nicer or more efficient, because intrinsics are permanently unstable
+(although many intrinsics *do* become stabilized elsewhere: `std::ptr` and
+`std::mem` consist of many intrinsics).
+
+Ultimately this means our implementation may not take advantage of all
+possible optimizations, though it will be by no means *naive*. We will
+definitely get into the weeds over nitty-gritty details, even
+when the problem doesn't *really* merit it.
+
+You wanted advanced. We're gonna go advanced.