Commit a2074c0

Merge pull request #51 from alexcrichton/ffi-and-rust
Rust Once, Run Everywhere
2 parents 078517d + 939bc43 commit a2074c0

1 file changed: 299 additions, 0 deletions
@@ -0,0 +1,299 @@
---
layout: post
title: "Rust Once, Run Everywhere"
author: Alex Crichton
description: "Zero-cost and safe FFI in Rust"
---
7+
8+
Rust's quest for world domination was never destined to happen overnight, so
9+
Rust needs to be able to interoperate with the existing world just as easily as
10+
it talks to itself. For this reason, **Rust makes it easy to communicate with C
11+
APIs without overhead, and to leverage its ownership system to provide much
12+
stronger safety guarantees for those APIs at the same time**.
13+
14+
To communicate with other languages, Rust provides a *foreign function
15+
interface* (FFI). Following Rust's design principles, the FFI provides a
16+
**zero-cost abstraction** where function calls between Rust and C have identical
17+
performance to C function calls. FFI bindings can also leverage language
18+
features such as ownership and borrowing to provide a **safe interface** that
19+
enforces protocols around pointers and other resources. These protocols usually
20+
appear only in the documentation for C APIs -- at best -- but Rust makes them
21+
explicit.
22+
23+
In this post we'll explore how to encapsulate unsafe FFI calls to C in safe,
24+
zero-cost abstractions. Working with C is, however, just an example; we'll also
25+
see how Rust can easily talk to languages like Python and Ruby just as
26+
seamlessly as with C.
27+
28+
### Rust talking to C
29+
30+
Let's start with a simple example of calling C code from Rust and then
31+
demonstrate that Rust imposes no additional overhead. Here's a C function which
32+
will simply double all the input it's given:
33+
34+
```c
int double_input(int input) {
    return input * 2;
}
```
39+
40+
To call this from Rust, you might write a program like this:
41+
42+
```rust
extern crate libc;

extern {
    fn double_input(input: libc::c_int) -> libc::c_int;
}

fn main() {
    let input = 4;
    let output = unsafe { double_input(input) };
    println!("{} * 2 = {}", input, output);
}
```
55+
56+
And that's it! You can try this out for yourself by
57+
[checking out the code on GitHub][rust2c] and running `cargo run` from that
58+
directory. **At the source level we can see that there's no burden in calling an
59+
external function beyond stating its signature, and we'll see soon that the
60+
generated code indeed has no overhead, either.** There are, however, a few
61+
subtle aspects of this Rust program, so let's cover each piece in detail.
62+
63+
[rust2c]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/rust-to-c
64+
65+
First up we see `extern crate libc`. [The libc crate][libc] provides many useful
66+
type definitions for FFI bindings when talking with C, and it makes it easy to
67+
ensure that both C and Rust agree on the types crossing the language boundary.
68+
69+
[libc]: https://crates.io/crates/libc
70+
71+
This leads us nicely into the next part of the program:
72+
73+
```rust
extern {
    fn double_input(input: libc::c_int) -> libc::c_int;
}
```
78+
79+
In Rust this is a **declaration** of an externally available function. You can
80+
think of this along the lines of a C header file. Here's where the compiler
81+
learns about the inputs and outputs of the function, and you can see above that
82+
this matches our definition in C. Next up we have the main body of the program:
83+
84+
```rust
fn main() {
    let input = 4;
    let output = unsafe { double_input(input) };
    println!("{} * 2 = {}", input, output);
}
```
91+
92+
We see one of the crucial aspects of FFI in Rust here, the `unsafe` block. The
93+
compiler knows nothing about the implementation of `double_input`, so it must
94+
assume that memory unsafety *could* happen whenever you call a foreign function.
95+
The `unsafe` block is how the programmer takes responsibility for ensuring
96+
safety -- you are promising that the actual call you make will not, in fact,
97+
violate memory safety, and thus that Rust's basic guarantees are upheld. This
98+
may seem limiting, but Rust has just the right set of tools to allow consumers
99+
to not worry about `unsafe` (more on this in a moment).
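
As a preview of how consumers can avoid writing `unsafe` themselves, here is a minimal sketch of a safe wrapper. The wrapper name `double` is hypothetical, and `double_input` is defined here as a Rust stand-in so the snippet compiles without a C toolchain; in the real program it would be the `extern` declaration shown above.

```rust
use std::os::raw::c_int;

// Stand-in for the C function (an assumption for this sketch). It is marked
// `unsafe` so that calling it requires an `unsafe` block, just as calling a
// foreign function would. Unlike C, it wraps on overflow instead of invoking
// undefined behavior.
unsafe extern "C" fn double_input(input: c_int) -> c_int {
    input.wrapping_mul(2)
}

/// Safe wrapper (hypothetical name): callers never need to write `unsafe`.
pub fn double(input: i32) -> i32 {
    // SAFETY: the callee only reads and returns plain integers, so no
    // pointers or shared state can be misused by this call.
    unsafe { double_input(input as c_int) as i32 }
}
```

Code elsewhere in the program can then call `double(4)` like any other Rust function; the `unsafe` block, and the reasoning that justifies it, stays in one place.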
100+
101+
Now that we've seen how to call a C function from Rust, let's see if we can
102+
verify this claim of zero overhead. Almost all programming languages can call
103+
into C one way or another, but it often comes at a cost with runtime type
104+
conversions or perhaps some language-runtime juggling. To get a handle on what
105+
Rust is doing, let's go straight to the assembly code of the above `main`
106+
function's call to `double_input`:
107+
108+
```
mov    $0x4,%edi
callq  3bc30 <double_input>
```
112+
113+
And as before, that's it! Here we can see that calling a C function from Rust
114+
involves precisely one call instruction after moving the arguments into place,
115+
exactly the same cost as it would be in C.
116+
117+
### Safe Abstractions
118+
119+
Most features in Rust tie into its core concept of ownership, and the FFI is no
120+
exception. When binding a C library in Rust you not only have the benefit of zero
121+
overhead, but you are also able to make it *safer* than C can! **Bindings can
122+
leverage the ownership and borrowing principles in Rust to codify comments
123+
typically found in a C header about how its API should be used.**
124+
125+
For example, consider a C library for parsing a tarball. This library will
126+
expose functions to read the contents of each file in the tarball, probably
127+
something along the lines of:
128+
129+
```c
// Gets the data for a file in the tarball at the given index, returning NULL if
// it does not exist. The `size` pointer is filled in with the size of the file
// if successful.
const char *tarball_file_data(tarball_t *tarball, unsigned index, size_t *size);
```
135+
136+
This function implicitly makes an assumption about how it can be used,
137+
however: the returned `char*` pointer must not outlive the input
138+
tarball. When bound in Rust, this API might look like this instead:
139+
140+
```rust
pub struct Tarball { raw: *mut tarball_t }

impl Tarball {
    pub fn file(&self, index: u32) -> Option<&[u8]> {
        unsafe {
            let mut size = 0;
            let data = tarball_file_data(self.raw, index as libc::c_uint,
                                         &mut size);
            if data.is_null() {
                None
            } else {
                Some(slice::from_raw_parts(data as *const u8, size as usize))
            }
        }
    }
}
```
158+
159+
Here the `*mut tarball_t` pointer is *owned by* a `Tarball`, which is
160+
responsible for any destruction and cleanup, so we already have rich knowledge
161+
about the lifetime of the tarball's memory. Additionally, the `file` method
162+
returns a **borrowed slice** whose lifetime is implicitly connected to the
163+
lifetime of the source tarball itself (the `&self` argument). This is Rust's way
164+
of indicating that the returned slice can only be used within the lifetime of
165+
the tarball, statically preventing dangling pointer bugs that are easy to
166+
make when working directly with C. (If you're not familiar with this kind of
167+
borrowing in Rust, have a look at Yehuda Katz's [blog post] on ownership.)
168+
169+
[blog post]: http://blog.skylight.io/rust-means-never-having-to-close-a-socket/
170+
171+
A key aspect of the Rust binding here is that it is a safe function, meaning
172+
that callers do not have to use `unsafe` blocks to invoke it! Although it has an
173+
`unsafe` *implementation* (due to calling an FFI function), the *interface* uses
174+
borrowing to guarantee that no memory unsafety can occur in any Rust code that
175+
uses it. That is, due to Rust's static checking, it's simply not possible to
176+
cause a segfault using the API on the Rust side. And don't forget, all of this
177+
is coming at zero cost: the raw types in C are representable in Rust with no
178+
extra allocations or overhead.
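
To make the ownership half of this concrete, here is a self-contained sketch of the same pattern that also shows the cleanup step. Everything prefixed `raw_` and the `RawTarball` type are hypothetical stand-ins written in Rust for the C library (which is not shown in this post), so the names and signatures are assumptions rather than a real API.

```rust
use std::slice;

// Stand-in for the opaque C `tarball_t` (an assumption for this sketch).
pub struct RawTarball {
    files: Vec<Vec<u8>>,
}

// Stand-ins for the C API: open, look up a file, and close.
unsafe fn raw_open(files: Vec<Vec<u8>>) -> *mut RawTarball {
    Box::into_raw(Box::new(RawTarball { files }))
}

unsafe fn raw_file_data(t: *const RawTarball, index: u32, size: *mut usize) -> *const u8 {
    unsafe {
        let tarball = &*t;
        match tarball.files.get(index as usize) {
            Some(file) => {
                *size = file.len();
                file.as_ptr()
            }
            None => std::ptr::null(),
        }
    }
}

unsafe fn raw_close(t: *mut RawTarball) {
    unsafe { drop(Box::from_raw(t)) }
}

/// Safe wrapper that owns the raw pointer.
pub struct Tarball {
    raw: *mut RawTarball,
}

impl Tarball {
    pub fn open(files: Vec<Vec<u8>>) -> Tarball {
        Tarball { raw: unsafe { raw_open(files) } }
    }

    /// The returned slice borrows from `&self`, so it cannot outlive the tarball.
    pub fn file(&self, index: u32) -> Option<&[u8]> {
        unsafe {
            let mut size: usize = 0;
            let data = raw_file_data(self.raw, index, &mut size);
            if data.is_null() {
                None
            } else {
                Some(slice::from_raw_parts(data, size))
            }
        }
    }
}

impl Drop for Tarball {
    // Cleanup runs exactly once, when the owning `Tarball` goes out of scope.
    fn drop(&mut self) {
        unsafe { raw_close(self.raw) }
    }
}
```

With this wrapper, an attempt to keep a slice past its tarball, such as `let data = { let t = Tarball::open(files); t.file(0) };`, is rejected at compile time because `t` does not live long enough.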
179+
180+
Rust's amazing community has already built some substantial safe bindings around
181+
existing C libraries, including [OpenSSL][rust-openssl], [libgit2][git2-rs],
182+
[libdispatch][dispatch], [libcurl][curl-rust], [sdl2][sdl2], [Unix APIs][nix],
183+
and [libsodium][sodiumoxide]. This list is also growing quite rapidly on
184+
[crates.io][crates-io], so your favorite C library may already be bound or will
185+
be bound soon!
186+
187+
[rust-openssl]: https://crates.io/crates/openssl
188+
[git2-rs]: https://crates.io/crates/git2
189+
[curl-rust]: https://crates.io/crates/curl
190+
[dispatch]: https://crates.io/crates/dispatch
191+
[sdl2]: https://crates.io/crates/sdl2
192+
[nix]: https://crates.io/crates/nix
193+
[sodiumoxide]: https://crates.io/crates/sodiumoxide
194+
[crates-io]: https://crates.io
195+
196+
### C talking to Rust
197+
198+
**Despite guaranteeing memory safety, Rust does not have a garbage collector or
199+
runtime, and one of the benefits of this is that Rust code can be called from C
200+
with no setup at all.** This means that the zero overhead FFI not only applies
201+
when Rust calls into C, but also when C calls into Rust!
202+
203+
Let's take the example above, but reverse the roles of each language. As before,
204+
all the code below is [available on GitHub][c2rust]. First we'll start off with
205+
our Rust code:
206+
207+
[c2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/c-to-rust
208+
209+
```rust
#[no_mangle]
pub extern fn double_input(input: i32) -> i32 {
    input * 2
}
```
215+
216+
As with the Rust code before, there's not a whole lot here but there are some
217+
subtle aspects in play. First off, we've labeled our function definition with a
218+
`#[no_mangle]` attribute. This instructs the compiler to not mangle the symbol
219+
name for the function `double_input`. Rust employs name mangling similar to C++
220+
to ensure that libraries do not clash with one another, and this attribute
221+
means that you don't have to guess a symbol name like
222+
`double_input::h485dee7f568bebafeaa` from C.
223+
224+
Next we've got our function definition, and the most interesting part about
225+
this is the keyword `extern`. This is a specialized form of specifying the [ABI
226+
for a function][abi-fn] which enables the function to be compatible with a C
227+
function call.
228+
229+
[abi-fn]: http://doc.rust-lang.org/reference.html#extern-functions
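
A bare `extern` is shorthand for `extern "C"`, and the ABI is part of the function's type. The sketch below spells the ABI out and shows that the function can be used wherever a C-ABI function pointer is expected. The helper name `c_abi_pointer` is hypothetical, `#[no_mangle]` is left out because only the ABI matters here, and the body uses `wrapping_mul` as a simplification so overflow cannot panic.

```rust
// Same function as above with the ABI written explicitly.
pub extern "C" fn double_input(input: i32) -> i32 {
    input.wrapping_mul(2)
}

// Hypothetical helper: returns the function as a C-ABI function pointer.
// This compiles only because `double_input` was declared with the "C" ABI;
// a plain `fn double_input` would be a type mismatch.
pub fn c_abi_pointer() -> extern "C" fn(i32) -> i32 {
    double_input
}
```

A pointer of this type is what a C caller effectively works with, for example when Rust hands a callback to a C library.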
230+
231+
Finally, if you [take a look at the `Cargo.toml`][cargo-toml] you'll see that
232+
this library is not compiled as a normal Rust library (rlib) but instead as a
233+
static archive which Rust calls a 'staticlib'. This enables all the relevant
234+
Rust code to be linked statically into the C program we're about to produce.
235+
236+
[cargo-toml]: https://github.com/alexcrichton/rust-ffi-examples/blob/master/c-to-rust/Cargo.toml#L8
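
The linked manifest is not reproduced here, but a `Cargo.toml` that requests a static archive typically contains a `[lib]` section along these lines. This is a sketch of the relevant fragment only, not a copy of the file in the repository:

```toml
[lib]
# Build a static archive (e.g. a `.a` file on Unix) instead of the default rlib,
# so the result can be linked into a C program.
crate-type = ["staticlib"]
```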
237+
238+
Now that we've got our Rust library squared away, let's write our C program
239+
which will call Rust.
240+
241+
```c
#include <stdint.h>
#include <stdio.h>

extern int32_t double_input(int32_t input);

int main() {
    int input = 4;
    int output = double_input(input);
    printf("%d * 2 = %d\n", input, output);
    return 0;
}
```
254+
255+
Here we can see that C, like Rust, needs to declare the `double_input` function
256+
that Rust defined. Other than that, though, everything is ready to go! If you run
257+
`make` from the [directory on GitHub][c2rust] you'll see these examples getting
258+
compiled and linked together and the final executable should run and print
259+
`4 * 2 = 8`.
260+
261+
Rust's lack of a garbage collector and runtime enables this seamless transition
262+
from C to Rust. The external C code does not need to perform any setup on Rust's
263+
behalf, making the transition that much cheaper.
264+
265+
### Beyond C
266+
267+
Up to now we've seen how FFI in Rust has zero overhead and how we can use Rust's
268+
concept of ownership to write safe bindings to C libraries. If you're not using
269+
C, however, you're still in luck! These features of Rust enable it to also be
270+
called from [Python][py2rust], [Ruby][rb2rust], [JavaScript][js2rust], and many
271+
more languages.
272+
273+
[py2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/python-to-rust
274+
[rb2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/ruby-to-rust
275+
[js2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/node-to-rust
276+
277+
When writing code in these languages, you sometimes want to speed up some
278+
component that's performance critical, but in the past this often required
279+
dropping all the way to C, and thereby giving up the memory safety, high-level
280+
abstractions, and ergonomics of these languages.
281+
282+
The fact that Rust can talk so easily with C, however, means that it is also
283+
viable for this sort of usage. One of Rust's first production users,
284+
[Skylight](https://www.skylight.io), was able to improve the performance and
285+
memory usage of their data collection agent almost instantly by just using Rust,
286+
and the Rust code is all published as a Ruby gem.
287+
288+
Moving from a language like Python or Ruby down to C to optimize performance is
289+
often quite difficult as it's tough to ensure that the program won't crash in a
290+
difficult-to-debug way. Rust, however, not only brings zero cost FFI, but *also*
291+
makes it possible to retain the same safety guarantees as the original source
292+
language. In the long run, this should make it much easier for programmers in
293+
these languages to drop down and do some systems programming to squeeze out
294+
critical performance when they need it.
295+
296+
FFI is just one of many tools in the toolbox of Rust, but it's a key component
297+
of Rust's adoption, as it allows Rust to seamlessly integrate with existing code
298+
bases today. I'm personally quite excited to see the benefits of Rust reach as
299+
many projects as possible!
