---
layout: post
title: "Rust Once, Run Everywhere"
author: Alex Crichton
description: "Zero-cost and safe FFI in Rust"
---

Rust's quest for world domination was never destined to happen overnight, so
Rust needs to be able to interoperate with the existing world just as easily as
it talks to itself. For this reason, **Rust makes it easy to communicate with C
APIs without overhead, and to leverage its ownership system to provide much
stronger safety guarantees for those APIs at the same time**.

To communicate with other languages, Rust provides a *foreign function
interface* (FFI). Following Rust's design principles, the FFI provides a
**zero-cost abstraction** where function calls between Rust and C have identical
performance to C function calls. FFI bindings can also leverage language
features such as ownership and borrowing to provide a **safe interface** that
enforces protocols around pointers and other resources. These protocols usually
appear only in the documentation for C APIs -- at best -- but Rust makes them
explicit.

In this post we'll explore how to encapsulate unsafe FFI calls to C in safe,
zero-cost abstractions. Working with C is, however, just an example; we'll also
see how Rust can talk to languages like Python and Ruby just as seamlessly as it
talks to C.

### Rust talking to C

Let's start with a simple example of calling C code from Rust and then
demonstrate that Rust imposes no additional overhead. Here's a C function which
simply doubles whatever input it's given:

```c
int double_input(int input) {
    return input * 2;
}
```

To call this from Rust, you might write a program like this:

```rust
extern crate libc;

extern {
    fn double_input(input: libc::c_int) -> libc::c_int;
}

fn main() {
    let input = 4;
    let output = unsafe { double_input(input) };
    println!("{} * 2 = {}", input, output);
}
```

And that's it! You can try this out for yourself by
[checking out the code on GitHub][rust2c] and running `cargo run` from that
directory. **At the source level we can see that there's no burden in calling an
external function beyond stating its signature, and we'll see soon that the
generated code indeed has no overhead, either.** There are, however, a few
subtle aspects of this Rust program, so let's cover each piece in detail.

[rust2c]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/rust-to-c

First up we see `extern crate libc`. [The libc crate][libc] provides many useful
type definitions for FFI bindings when talking with C, and it makes it easy to
ensure that both C and Rust agree on the types crossing the language boundary.

[libc]: https://crates.io/crates/libc

This leads us nicely into the next part of the program:

```rust
extern {
    fn double_input(input: libc::c_int) -> libc::c_int;
}
```

In Rust this is a **declaration** of an externally available function. You can
think of this along the lines of a C header file. Here's where the compiler
learns about the inputs and outputs of the function, and you can see above that
this matches our definition in C. Next up we have the main body of the program:

```rust
fn main() {
    let input = 4;
    let output = unsafe { double_input(input) };
    println!("{} * 2 = {}", input, output);
}
```

Here we see one of the crucial aspects of FFI in Rust: the `unsafe` block. The
compiler knows nothing about the implementation of `double_input`, so it must
assume that memory unsafety *could* happen whenever you call a foreign function.
The `unsafe` block is how the programmer takes responsibility for ensuring
safety -- you are promising that the actual call you make will not, in fact,
violate memory safety, and thus that Rust's basic guarantees are upheld. This
may seem limiting, but Rust has just the right set of tools to allow consumers
to not worry about `unsafe` (more on this in a moment).

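You can try this pattern without compiling any C yourself. The sketch below applies it to `abs` from the C standard library. It assumes that, as on common platforms, Rust's standard library already links the C runtime. It differs from the code above in two ways:

* It uses the `std::ffi::c_int` alias rather than the libc crate.
* It spells the block `unsafe extern "C"`. Rust 1.82 and later accept that form, and the 2024 edition requires it.

The hypothetical `c_abs` wrapper also previews the safe-interface idea. It rejects `c_int::MIN`, whose absolute value overflows (undefined behavior in C), so every call it makes upholds the contract of `abs`.

```rust
use std::ffi::c_int;

// Declaration of a function provided by the C standard library.
// `unsafe extern` is accepted by Rust 1.82+ and required by the 2024 edition.
unsafe extern "C" {
    fn abs(input: c_int) -> c_int;
}

/// Hypothetical safe wrapper: returns `None` for `c_int::MIN`, whose absolute
/// value overflows (undefined behavior in C), so the call is always sound.
fn c_abs(input: c_int) -> Option<c_int> {
    if input == c_int::MIN {
        return None;
    }
    // SAFETY: `abs` takes no pointers, and the overflow case was ruled out.
    Some(unsafe { abs(input) })
}

fn main() {
    println!("{:?}", c_abs(-4)); // prints Some(4)
    println!("{:?}", c_abs(c_int::MIN)); // prints None
}
```
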
Now that we've seen how to call a C function from Rust, let's see if we can
verify this claim of zero overhead. Almost all programming languages can call
into C one way or another, but this often comes at a cost, such as runtime type
conversions or some language-runtime juggling. To get a handle on what Rust is
doing, let's go straight to the assembly code of the above `main` function's
call to `double_input`:

```
mov $0x4,%edi
callq 3bc30 <double_input>
```

And as before, that's it! Here we can see that calling a C function from Rust
involves precisely one call instruction after moving the arguments into place,
exactly the same cost as it would be in C.

### Safe Abstractions

Most features in Rust tie into its core concept of ownership, and the FFI is no
exception. When binding a C library in Rust you not only get the benefit of
zero overhead, but you can also make the library *safer* to use than it is from
C! **Bindings can leverage the ownership and borrowing principles in Rust to
codify comments typically found in a C header about how its API should be
used.**

For example, consider a C library for parsing a tarball. This library will
expose functions to read the contents of each file in the tarball, probably
something along the lines of:

```c
// Gets the data for a file in the tarball at the given index, returning NULL if
// it does not exist. The `size` pointer is filled in with the size of the file
// if successful.
const char *tarball_file_data(tarball_t *tarball, unsigned index, size_t *size);
```

This function implicitly assumes, however, that the returned `char*` pointer
cannot outlive the input tarball. When bound in Rust, this API might look like
this instead:

```rust
extern crate libc;

use std::slice;

// Opaque stand-in for the C library's `tarball_t`; Rust never looks inside.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct tarball_t { _private: [u8; 0] }

extern {
    fn tarball_file_data(tarball: *mut tarball_t, index: libc::c_uint,
                         size: *mut libc::size_t) -> *const libc::c_char;
}

pub struct Tarball { raw: *mut tarball_t }

impl Tarball {
    pub fn file(&self, index: u32) -> Option<&[u8]> {
        unsafe {
            let mut size = 0;
            let data = tarball_file_data(self.raw, index as libc::c_uint,
                                         &mut size);
            if data.is_null() {
                None
            } else {
                Some(slice::from_raw_parts(data as *const u8, size as usize))
            }
        }
    }
}
```

Here the `*mut tarball_t` pointer is *owned by* a `Tarball`, which is
responsible for any destruction and cleanup, so we already have rich knowledge
about the lifetime of the tarball's memory. Additionally, the `file` method
returns a **borrowed slice** whose lifetime is implicitly connected to the
lifetime of the source tarball itself (the `&self` argument). This is Rust's way
of indicating that the returned slice can only be used within the lifetime of
the tarball, statically preventing dangling pointer bugs that are easy to
make when working directly with C. (If you're not familiar with this kind of
borrowing in Rust, have a look at Yehuda Katz's [blog post on ownership][blog post].)

[blog post]: http://blog.skylight.io/rust-means-never-having-to-close-a-socket/

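The following self-contained sketch shows ownership and borrowing working together. The real C library isn't available here, so `tarball_t`, `tarball_new`, `tarball_file_data`, and `tarball_free` are hypothetical stand-ins written in Rust with C-style signatures. Only the `Tarball` wrapper follows the binding above. It adds an assumed constructor and a `Drop` impl that frees the handle exactly once.

```rust
use std::ffi::{c_char, c_uint};
use std::slice;

// ---- Hypothetical stand-in for the C library (not a real API) ----
#[allow(non_camel_case_types)]
pub struct tarball_t {
    files: Vec<Vec<u8>>,
}

extern "C" fn tarball_new() -> *mut tarball_t {
    let files = vec![b"hello".to_vec(), b"world!".to_vec()];
    Box::into_raw(Box::new(tarball_t { files }))
}

unsafe extern "C" fn tarball_file_data(
    tarball: *mut tarball_t,
    index: c_uint,
    size: *mut usize,
) -> *const c_char {
    let tarball = unsafe { &*tarball };
    match tarball.files.get(index as usize) {
        Some(file) => {
            unsafe { *size = file.len() };
            file.as_ptr() as *const c_char
        }
        None => std::ptr::null(), // C convention: NULL means "no such file"
    }
}

unsafe extern "C" fn tarball_free(tarball: *mut tarball_t) {
    drop(unsafe { Box::from_raw(tarball) });
}

// ---- Safe binding: owns the handle, lends out borrowed slices ----
pub struct Tarball {
    raw: *mut tarball_t,
}

impl Tarball {
    pub fn new() -> Tarball {
        Tarball { raw: tarball_new() }
    }

    // The returned slice borrows `self`, so it cannot outlive the Tarball.
    pub fn file(&self, index: u32) -> Option<&[u8]> {
        let mut size: usize = 0;
        let data = unsafe { tarball_file_data(self.raw, index as c_uint, &mut size) };
        if data.is_null() {
            None
        } else {
            Some(unsafe { slice::from_raw_parts(data as *const u8, size) })
        }
    }
}

impl Drop for Tarball {
    // The owner releases the C handle exactly once, when it goes out of scope.
    fn drop(&mut self) {
        unsafe { tarball_free(self.raw) }
    }
}

fn main() {
    let tarball = Tarball::new();
    if let Some(data) = tarball.file(1) {
        // Calling `drop(tarball)` here would not compile: `data` borrows it.
        println!("{}", String::from_utf8_lossy(data)); // prints world!
    }
    assert!(tarball.file(2).is_none());
}
```
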
A key aspect of the Rust binding here is that it is a safe function, meaning
that callers do not have to use `unsafe` blocks to invoke it! Although it has an
`unsafe` *implementation* (due to calling an FFI function), the *interface* uses
borrowing to guarantee that no memory unsafety can occur in any Rust code that
uses it. That is, due to Rust's static checking, it's simply not possible to
cause a segfault using the API on the Rust side. And don't forget, all of this
is coming at zero cost: the raw types in C are representable in Rust with no
extra allocations or overhead.

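The same technique applies to any C function whose contract fits Rust's types. A `&CStr` is always a valid, NUL-terminated pointer, so a wrapper that takes one can be entirely safe. The sketch below makes two assumptions:

* The platform C library provides `strlen`.
* C's `size_t` has the same width as `usize`, which holds on mainstream targets.

The name `c_strlen` is made up for illustration.

```rust
use std::ffi::{c_char, CStr};

unsafe extern "C" {
    // Assumes C's size_t has the same width as Rust's usize.
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper: a `&CStr` is always non-null and
/// NUL-terminated, which is exactly what `strlen` requires.
pub fn c_strlen(s: &CStr) -> usize {
    // SAFETY: `s.as_ptr()` points to a valid NUL-terminated string that
    // stays borrowed for the duration of the call.
    unsafe { strlen(s.as_ptr()) }
}

fn main() {
    let s = CStr::from_bytes_with_nul(b"tarball\0").unwrap();
    println!("{}", c_strlen(s)); // prints 7
}
```
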
Rust's amazing community has already built some substantial safe bindings around
existing C libraries, including [OpenSSL][rust-openssl], [libgit2][git2-rs],
[libdispatch][dispatch], [libcurl][curl-rust], [sdl2][sdl2], [Unix APIs][nix],
and [libsodium][sodiumoxide]. This list is also growing quite rapidly on
[crates.io][crates-io], so your favorite C library may already be bound or will
be bound soon!

[rust-openssl]: https://crates.io/crates/openssl
[git2-rs]: https://crates.io/crates/git2
[curl-rust]: https://crates.io/crates/curl
[dispatch]: https://crates.io/crates/dispatch
[sdl2]: https://crates.io/crates/sdl2
[nix]: https://crates.io/crates/nix
[sodiumoxide]: https://crates.io/crates/sodiumoxide
[crates-io]: https://crates.io

### C talking to Rust

**Despite guaranteeing memory safety, Rust does not have a garbage collector or
runtime, and one of the benefits of this is that Rust code can be called from C
with no setup at all.** This means that the zero-overhead FFI applies not only
when Rust calls into C, but also when C calls into Rust!

Let's take the example above, but reverse the roles of each language. As before,
all the code below is [available on GitHub][c2rust]. First we'll start off with
our Rust code:

[c2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/c-to-rust

```rust
#[no_mangle]
pub extern fn double_input(input: i32) -> i32 {
    input * 2
}
```

As with the Rust code before, there's not a whole lot here, but there are some
subtle aspects in play. First off, we've labeled our function definition with a
`#[no_mangle]` attribute. This instructs the compiler not to mangle the symbol
name for the function `double_input`. Rust employs name mangling similar to C++
to ensure that libraries do not clash with one another, and this attribute
means that you don't have to guess a symbol name like
`double_input::h485dee7f568bebafeaa` from C.

Next we've got our function definition, and the most interesting part about
this is the keyword `extern`. This is shorthand for `extern "C"`, a specialized
form of specifying the [ABI for a function][abi-fn]. It makes the function follow
C's calling convention so that C code can call it directly.

[abi-fn]: http://doc.rust-lang.org/reference.html#extern-functions

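One way to see that `extern` really changes the ABI is that the ABI becomes part of the function's type. In the illustrative sketch below, `double_input` coerces to an `extern "C" fn(i32) -> i32` pointer, the shape C code expects for a callback. An ordinary Rust function with the same signature does not coerce. The `apply` function is a hypothetical stand-in for a C API that takes such a callback. It is written in Rust only so the example runs on its own.

```rust
// `extern fn` is shorthand for `extern "C" fn`: same body, C calling convention.
extern "C" fn double_input(input: i32) -> i32 {
    input * 2
}

// Same signature, but the default Rust ABI.
fn double_rust(input: i32) -> i32 {
    input * 2
}

// Hypothetical stand-in for a C API taking a callback,
// i.e. `int32_t apply(int32_t (*f)(int32_t), int32_t x);` in C.
extern "C" fn apply(f: extern "C" fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn main() {
    // The ABI is part of the type, so this coercion type-checks...
    let callback: extern "C" fn(i32) -> i32 = double_input;
    println!("{}", apply(callback, 4)); // prints 8

    // ...while a Rust-ABI function has a different type and is rejected:
    // let bad: extern "C" fn(i32) -> i32 = double_rust;
    println!("{}", double_rust(4)); // prints 8
}
```
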
Finally, if you [take a look at the `Cargo.toml`][cargo-toml] you'll see that
this library is not compiled as a normal Rust library (rlib) but instead as a
static archive which Rust calls a 'staticlib'. This enables all the relevant
Rust code to be linked statically into the C program we're about to produce.

[cargo-toml]: https://github.com/alexcrichton/rust-ffi-examples/blob/master/c-to-rust/Cargo.toml#L8

Now that we've got our Rust library squared away, let's write our C program
which will call Rust:

```c
#include <stdint.h>
#include <stdio.h>

extern int32_t double_input(int32_t input);

int main() {
    int input = 4;
    int output = double_input(input);
    printf("%d * 2 = %d\n", input, output);
    return 0;
}
```

Here we can see that C, like Rust, needs to declare the `double_input` function
that Rust defined. Other than that, though, everything is ready to go! If you
run `make` from the [directory on GitHub][c2rust] you'll see these examples
getting compiled and linked together, and the final executable should run and
print `4 * 2 = 8`.

Rust's lack of a garbage collector and runtime enables this seamless transition
from C to Rust. The external C code does not need to perform any setup on Rust's
behalf, making the transition that much cheaper.

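Nothing needs initializing, so any exported C-ABI function is callable as soon as the program links. Here is a slightly richer sketch: a hypothetical `sum_i32` export that takes a pointer and a length, the usual way C passes an array. It checks for null and then turns the pair into an ordinary Rust slice. It assumes `size_t` matches `usize`. It writes the attribute as `#[unsafe(no_mangle)]`, which Rust 1.82 and later accept and the 2024 edition requires. On the C side it would be declared as `int64_t sum_i32(const int32_t *data, size_t len);`.

```rust
use std::slice;

/// Hypothetical export for C callers:
/// `int64_t sum_i32(const int32_t *data, size_t len);`
///
/// # Safety
/// `data` must be null or point to `len` readable, initialized `i32`s.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn sum_i32(data: *const i32, len: usize) -> i64 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: the caller guarantees `data` points to `len` valid elements.
    let values = unsafe { slice::from_raw_parts(data, len) };
    // From here on this is ordinary safe Rust; widening avoids i32 overflow.
    values.iter().map(|&v| v as i64).sum()
}

fn main() {
    let numbers = [1, 2, 3, 4];
    let total = unsafe { sum_i32(numbers.as_ptr(), numbers.len()) };
    println!("{}", total); // prints 10
}
```
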
### Beyond C

Up to now we've seen how FFI in Rust has zero overhead and how we can use Rust's
concept of ownership to write safe bindings to C libraries. If you're not using
C, however, you're still in luck! These features of Rust enable it to also be
called from [Python][py2rust], [Ruby][rb2rust], [JavaScript][js2rust], and many
more languages.

[py2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/python-to-rust
[rb2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/ruby-to-rust
[js2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/node-to-rust

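From these languages, Rust is typically loaded as a shared library, and its C-ABI functions are called through an FFI module such as Python's `ctypes`. The Rust side therefore looks much like the C example above. One wrinkle comes up quickly: the caller can't free memory that Rust allocated. A common pattern is to hand out a pointer with `CString::into_raw` and export a matching function that hands it back to `CString::from_raw`. The sketch below uses hypothetical `greeting_new` and `greeting_free` exports. As before, `#[unsafe(no_mangle)]` assumes Rust 1.82 or later.

```rust
use std::ffi::{c_char, CStr, CString};

/// Hypothetical export: returns a newly allocated, NUL-terminated greeting.
/// Ownership passes to the caller, who must hand it back to `greeting_free`.
#[unsafe(no_mangle)]
pub extern "C" fn greeting_new(times: u32) -> *mut c_char {
    let text = "hi ".repeat(times as usize);
    // `text` contains no interior NUL bytes, so this cannot fail.
    CString::new(text).unwrap().into_raw()
}

/// Frees a string previously returned by `greeting_new`; null is ignored.
///
/// # Safety
/// `s` must come from `greeting_new` and must not be freed twice.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn greeting_free(s: *mut c_char) {
    if !s.is_null() {
        // Rebuild the CString so Rust's allocator releases the memory.
        drop(unsafe { CString::from_raw(s) });
    }
}

fn main() {
    let raw = greeting_new(2);
    // A foreign caller would read the bytes up to the NUL terminator.
    let text = unsafe { CStr::from_ptr(raw) }.to_str().unwrap().to_owned();
    unsafe { greeting_free(raw) };
    println!("{:?}", text); // prints "hi hi "
}
```
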
When writing code in these languages, you sometimes want to speed up some
component that's performance-critical, but in the past this often required
dropping all the way down to C, thereby giving up the memory safety, high-level
abstractions, and ergonomics of these languages.

The fact that Rust can talk so easily with C, however, means that it is also
viable for this sort of usage. One of Rust's first production users,
[Skylight](https://www.skylight.io), was able to improve the performance and
memory usage of their data collection agent almost instantly by just using Rust,
and the Rust code is all published as a Ruby gem.

Moving from a language like Python or Ruby down to C to optimize performance is
often quite hard, as it's tough to ensure that the program won't crash in a
difficult-to-debug way. Rust, however, not only brings zero-cost FFI, but *also*
makes it possible to retain the same safety guarantees as the original source
language. In the long run, this should make it much easier for programmers in
these languages to drop down and do some systems programming to squeeze out
critical performance when they need it.

FFI is just one of many tools in Rust's toolbox, but it's a key component of
Rust's adoption, as it allows Rust to integrate seamlessly with existing code
bases today. I'm personally quite excited to see the benefits of Rust reach as
many projects as possible!