Commit cc7249e
---
yaml --- r: 229019 b: refs/heads/try c: 58f6f2d h: refs/heads/master i: 229017: 36493e5 229015: 3873b83 v: v3
1 parent 9cc546f commit cc7249e

16 files changed: +502 −482 lines

[refs]

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,7 +1,7 @@
 ---
 refs/heads/master: aca2057ed5fb7af3f8905b2bc01f72fa001c35c8
 refs/heads/snap-stage3: 1af31d4974e33027a68126fa5a5a3c2c6491824f
-refs/heads/try: dba548d3634d1f69b6210b642e700c2c41e69ce9
+refs/heads/try: 58f6f2d57a4d0a62f17003facd0d2406da75a035
 refs/tags/release-0.1: 1f5c5126e96c79d22cb7862f75304136e204f105
 refs/tags/release-0.2: c870d2dffb391e14efb05aa27898f1f6333a9596
 refs/tags/release-0.3: b5f0d0f648d9a6153664837026ba1be43d3e2503
````

branches/try/src/doc/tarpl/README.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -34,6 +34,6 @@ Due to the nature of advanced Rust programming, we will be spending a lot of tim
 talking about *safety* and *guarantees*. In particular, a significant portion of
 the book will be dedicated to correctly writing and understanding Unsafe Rust.
 
-[trpl]: https://doc.rust-lang.org/book/
-[The stack and heap]: https://doc.rust-lang.org/book/the-stack-and-the-heap.html
-[Basic Rust]: https://doc.rust-lang.org/book/syntax-and-semantics.html
+[trpl]: ../book/
+[The stack and heap]: ../book/the-stack-and-the-heap.html
+[Basic Rust]: ../book/syntax-and-semantics.html
````

branches/try/src/doc/tarpl/arc-and-mutex.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,6 +1,6 @@
 % Implementing Arc and Mutex
 
-Knowing the theory is all fine and good, but the *best* was to understand
+Knowing the theory is all fine and good, but the *best* way to understand
 something is to use it. To better understand atomics and interior mutability,
 we'll be implementing versions of the standard library's Arc and Mutex types.
 
````
branches/try/src/doc/tarpl/atomics.md

Lines changed: 80 additions & 81 deletions
````diff
@@ -2,21 +2,22 @@
 
 Rust pretty blatantly just inherits C11's memory model for atomics. This is not
 due this model being particularly excellent or easy to understand. Indeed, this
-model is quite complex and known to have [several flaws][C11-busted]. Rather,
-it is a pragmatic concession to the fact that *everyone* is pretty bad at modeling
+model is quite complex and known to have [several flaws][C11-busted]. Rather, it
+is a pragmatic concession to the fact that *everyone* is pretty bad at modeling
 atomics. At very least, we can benefit from existing tooling and research around
 C.
 
 Trying to fully explain the model in this book is fairly hopeless. It's defined
-in terms of madness-inducing causality graphs that require a full book to properly
-understand in a practical way. If you want all the nitty-gritty details, you
-should check out [C's specification (Section 7.17)][C11-model]. Still, we'll try
-to cover the basics and some of the problems Rust developers face.
+in terms of madness-inducing causality graphs that require a full book to
+properly understand in a practical way. If you want all the nitty-gritty
+details, you should check out [C's specification (Section 7.17)][C11-model].
+Still, we'll try to cover the basics and some of the problems Rust developers
+face.
 
-The C11 memory model is fundamentally about trying to bridge the gap between
-the semantics we want, the optimizations compilers want, and the inconsistent
-chaos our hardware wants. *We* would like to just write programs and have them
-do exactly what we said but, you know, *fast*. Wouldn't that be great?
+The C11 memory model is fundamentally about trying to bridge the gap between the
+semantics we want, the optimizations compilers want, and the inconsistent chaos
+our hardware wants. *We* would like to just write programs and have them do
+exactly what we said but, you know, *fast*. Wouldn't that be great?
 
 
 
````
````diff
@@ -41,33 +42,35 @@ x = 2;
 y = 3;
 ```
 
-This has inverted the order of events *and* completely eliminated one event. From
-a single-threaded perspective this is completely unobservable: after all the
-statements have executed we are in exactly the same state. But if our program is
-multi-threaded, we may have been relying on `x` to *actually* be assigned to 1 before
-`y` was assigned. We would *really* like the compiler to be able to make these kinds
-of optimizations, because they can seriously improve performance. On the other hand,
-we'd really like to be able to depend on our program *doing the thing we said*.
+This has inverted the order of events *and* completely eliminated one event.
+From a single-threaded perspective this is completely unobservable: after all
+the statements have executed we are in exactly the same state. But if our
+program is multi-threaded, we may have been relying on `x` to *actually* be
+assigned to 1 before `y` was assigned. We would *really* like the compiler to be
+able to make these kinds of optimizations, because they can seriously improve
+performance. On the other hand, we'd really like to be able to depend on our
+program *doing the thing we said*.
 
 
 
 
 # Hardware Reordering
 
 On the other hand, even if the compiler totally understood what we wanted and
-respected our wishes, our *hardware* might instead get us in trouble. Trouble comes
-from CPUs in the form of memory hierarchies. There is indeed a global shared memory
-space somewhere in your hardware, but from the perspective of each CPU core it is
-*so very far away* and *so very slow*. Each CPU would rather work with its local
-cache of the data and only go through all the *anguish* of talking to shared
-memory *only* when it doesn't actually have that memory in cache.
+respected our wishes, our *hardware* might instead get us in trouble. Trouble
+comes from CPUs in the form of memory hierarchies. There is indeed a global
+shared memory space somewhere in your hardware, but from the perspective of each
+CPU core it is *so very far away* and *so very slow*. Each CPU would rather work
+with its local cache of the data and only go through all the *anguish* of
+talking to shared memory *only* when it doesn't actually have that memory in
+cache.
 
 After all, that's the whole *point* of the cache, right? If every read from the
 cache had to run back to shared memory to double check that it hadn't changed,
 what would the point be? The end result is that the hardware doesn't guarantee
-that events that occur in the same order on *one* thread, occur in the same order
-on *another* thread. To guarantee this, we must issue special instructions to
-the CPU telling it to be a bit less smart.
+that events that occur in the same order on *one* thread, occur in the same
+order on *another* thread. To guarantee this, we must issue special instructions
+to the CPU telling it to be a bit less smart.
 
 For instance, say we convince the compiler to emit this logic:
 
````
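As an editorial aside (not part of the commit), the two-thread example this hunk discusses can be sketched with today's `std::sync::atomic` API; the helper name `run_experiment` is ours. Note that because thread 2's check-then-update is a separate load and store rather than one atomic read-modify-write, plain interleaving (not just hardware reordering) already allows every outcome the text lists:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Sketch of the book's example: initially x = 0, y = 1.
// Thread 1 performs `x = 1; y = 3;`.
// Thread 2 performs `if x == 1 { y *= 2; }`.
// Returns the final value of y.
fn run_experiment() -> usize {
    let x = Arc::new(AtomicUsize::new(0));
    let y = Arc::new(AtomicUsize::new(1));

    let (x1, y1) = (Arc::clone(&x), Arc::clone(&y));
    let t1 = thread::spawn(move || {
        x1.store(1, Ordering::SeqCst);
        y1.store(3, Ordering::SeqCst);
    });

    let (x2, y2) = (Arc::clone(&x), Arc::clone(&y));
    let t2 = thread::spawn(move || {
        if x2.load(Ordering::SeqCst) == 1 {
            // A separate load and store, not an atomic RMW, so
            // thread 1's `y = 3` can land between them.
            let v = y2.load(Ordering::SeqCst);
            y2.store(v * 2, Ordering::SeqCst);
        }
    });

    t1.join().unwrap();
    t2.join().unwrap();
    y.load(Ordering::SeqCst)
}

fn main() {
    // Depending on scheduling, y ends up as 2, 3, or 6.
    let final_y = run_experiment();
    assert!(matches!(final_y, 2 | 3 | 6));
    println!("final y = {}", final_y);
}
```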
````diff
@@ -82,86 +85,82 @@ x = 1; y *= 2;
 
 Ideally this program has 2 possible final states:
 
-* `y = 3`: (thread 2 did the check before thread 1 completed)
-* `y = 6`: (thread 2 did the check after thread 1 completed)
+* `y = 3`: (thread 2 did the check before thread 1 completed) y = 6`: (thread 2
+* `did the check after thread 1 completed)
 
 However there's a third potential state that the hardware enables:
 
 * `y = 2`: (thread 2 saw `x = 2`, but not `y = 3`, and then overwrote `y = 3`)
 
 It's worth noting that different kinds of CPU provide different guarantees. It
-is common to seperate hardware into two categories: strongly-ordered and weakly-
-ordered. Most notably x86/64 provides strong ordering guarantees, while ARM and
-provides weak ordering guarantees. This has two consequences for
-concurrent programming:
+is common to separate hardware into two categories: strongly-ordered and weakly-
+ordered. Most notably x86/64 provides strong ordering guarantees, while ARM
+provides weak ordering guarantees. This has two consequences for concurrent
+programming:
 
 * Asking for stronger guarantees on strongly-ordered hardware may be cheap or
   even *free* because they already provide strong guarantees unconditionally.
   Weaker guarantees may only yield performance wins on weakly-ordered hardware.
 
-* Asking for guarantees that are *too* weak on strongly-ordered hardware
-  is more likely to *happen* to work, even though your program is strictly
-  incorrect. If possible, concurrent algorithms should be tested on
-  weakly-ordered hardware.
+* Asking for guarantees that are *too* weak on strongly-ordered hardware is
+  more likely to *happen* to work, even though your program is strictly
+  incorrect. If possible, concurrent algorithms should be tested on weakly-
+  ordered hardware.
 
 
 
 
 
 # Data Accesses
 
-The C11 memory model attempts to bridge the gap by allowing us to talk about
-the *causality* of our program. Generally, this is by establishing a
-*happens before* relationships between parts of the program and the threads
-that are running them. This gives the hardware and compiler room to optimize the
-program more aggressively where a strict happens-before relationship isn't
-established, but forces them to be more careful where one *is* established.
-The way we communicate these relationships are through *data accesses* and
-*atomic accesses*.
+The C11 memory model attempts to bridge the gap by allowing us to talk about the
+*causality* of our program. Generally, this is by establishing a *happens
+before* relationships between parts of the program and the threads that are
+running them. This gives the hardware and compiler room to optimize the program
+more aggressively where a strict happens-before relationship isn't established,
+but forces them to be more careful where one *is* established. The way we
+communicate these relationships are through *data accesses* and *atomic
+accesses*.
 
 Data accesses are the bread-and-butter of the programming world. They are
 fundamentally unsynchronized and compilers are free to aggressively optimize
-them. In particular, data accesses are free to be reordered by the compiler
-on the assumption that the program is single-threaded. The hardware is also free
-to propagate the changes made in data accesses to other threads
-as lazily and inconsistently as it wants. Mostly critically, data accesses are
-how data races happen. Data accesses are very friendly to the hardware and
-compiler, but as we've seen they offer *awful* semantics to try to
-write synchronized code with. Actually, that's too weak. *It is literally
-impossible to write correct synchronized code using only data accesses*.
+them. In particular, data accesses are free to be reordered by the compiler on
+the assumption that the program is single-threaded. The hardware is also free to
+propagate the changes made in data accesses to other threads as lazily and
+inconsistently as it wants. Mostly critically, data accesses are how data races
+happen. Data accesses are very friendly to the hardware and compiler, but as
+we've seen they offer *awful* semantics to try to write synchronized code with.
+Actually, that's too weak. *It is literally impossible to write correct
+synchronized code using only data accesses*.
 
 Atomic accesses are how we tell the hardware and compiler that our program is
-multi-threaded. Each atomic access can be marked with
-an *ordering* that specifies what kind of relationship it establishes with
-other accesses. In practice, this boils down to telling the compiler and hardware
-certain things they *can't* do. For the compiler, this largely revolves
-around re-ordering of instructions. For the hardware, this largely revolves
-around how writes are propagated to other threads. The set of orderings Rust
-exposes are:
-
-* Sequentially Consistent (SeqCst)
-* Release
-* Acquire
-* Relaxed
+multi-threaded. Each atomic access can be marked with an *ordering* that
+specifies what kind of relationship it establishes with other accesses. In
+practice, this boils down to telling the compiler and hardware certain things
+they *can't* do. For the compiler, this largely revolves around re-ordering of
+instructions. For the hardware, this largely revolves around how writes are
+propagated to other threads. The set of orderings Rust exposes are:
+
+* Sequentially Consistent (SeqCst) Release Acquire Relaxed
 
 (Note: We explicitly do not expose the C11 *consume* ordering)
 
-TODO: negative reasoning vs positive reasoning?
-TODO: "can't forget to synchronize"
+TODO: negative reasoning vs positive reasoning? TODO: "can't forget to
+synchronize"
 
 
 
 # Sequentially Consistent
 
 Sequentially Consistent is the most powerful of all, implying the restrictions
-of all other orderings. Intuitively, a sequentially consistent operation *cannot*
-be reordered: all accesses on one thread that happen before and after it *stay*
-before and after it. A data-race-free program that uses only sequentially consistent
-atomics and data accesses has the very nice property that there is a single global
-execution of the program's instructions that all threads agree on. This execution
-is also particularly nice to reason about: it's just an interleaving of each thread's
-individual executions. This *does not* hold if you start using the weaker atomic
-orderings.
+of all other orderings. Intuitively, a sequentially consistent operation
+*cannot* be reordered: all accesses on one thread that happen before and after a
+SeqCst access *stay* before and after it. A data-race-free program that uses
+only sequentially consistent atomics and data accesses has the very nice
+property that there is a single global execution of the program's instructions
+that all threads agree on. This execution is also particularly nice to reason
+about: it's just an interleaving of each thread's individual executions. This
+*does not* hold if you start using the weaker atomic orderings.
 
 The relative developer-friendliness of sequential consistency doesn't come for
 free. Even on strongly-ordered platforms sequential consistency involves
````
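As an editorial aside (not part of the commit), the "single global execution" property of SeqCst is easiest to see with atomic read-modify-writes; the helper name `count_to` is ours. Every thread agrees on one total order of the increments, so no increment can be lost:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Spawn `threads` workers that each bump a shared counter
// `per_thread` times with a SeqCst read-modify-write. Atomic RMWs
// make the program data-race-free, and sequential consistency means
// every thread observes a single global order of the increments.
fn count_to(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    counter.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    // Deterministic: no increment can be lost.
    assert_eq!(count_to(4, 1000), 4000);
}
```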
````diff
@@ -173,26 +172,26 @@ confident about the other memory orders. Having your program run a bit slower
 than it needs to is certainly better than it running incorrectly! It's also
 *mechanically* trivial to downgrade atomic operations to have a weaker
 consistency later on. Just change `SeqCst` to e.g. `Relaxed` and you're done! Of
-course, proving that this transformation is *correct* is whole other matter.
+course, proving that this transformation is *correct* is a whole other matter.
 
 
 
 
 # Acquire-Release
 
-Acquire and Release are largely intended to be paired. Their names hint at
-their use case: they're perfectly suited for acquiring and releasing locks,
-and ensuring that critical sections don't overlap.
+Acquire and Release are largely intended to be paired. Their names hint at their
+use case: they're perfectly suited for acquiring and releasing locks, and
+ensuring that critical sections don't overlap.
 
 Intuitively, an acquire access ensures that every access after it *stays* after
 it. However operations that occur before an acquire are free to be reordered to
 occur after it. Similarly, a release access ensures that every access before it
-*stays* before it. However operations that occur after a release are free to
-be reordered to occur before it.
+*stays* before it. However operations that occur after a release are free to be
+reordered to occur before it.
 
 When thread A releases a location in memory and then thread B subsequently
 acquires *the same* location in memory, causality is established. Every write
-that happened *before* A's release will be observed by B *after* it's release.
+that happened *before* A's release will be observed by B *after* its release.
 However no causality is established with any other threads. Similarly, no
 causality is established if A and B access *different* locations in memory.
 
````
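As an editorial aside (not part of the commit), the release/acquire causality this hunk describes is exactly the classic message-passing pattern; the helper name `send_message` is ours. The Acquire load of the flag synchronizes with the Release store, so the consumer is guaranteed to see the payload written before the release:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// The producer writes the payload and then *releases* the flag;
// the consumer *acquires* the flag and is then guaranteed to
// observe every write that happened before the release.
fn send_message() -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let producer = thread::spawn(move || {
        d.store(42, Ordering::Relaxed);   // the payload
        r.store(true, Ordering::Release); // publish it
    });

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let consumer = thread::spawn(move || {
        // Spin until the flag is set; this Acquire load
        // synchronizes with the Release store above.
        while !r.load(Ordering::Acquire) {}
        d.load(Ordering::Relaxed)
    });

    producer.join().unwrap();
    consumer.join().unwrap()
}

fn main() {
    // The acquire/release pairing guarantees the consumer sees 42.
    assert_eq!(send_message(), 42);
}
```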
branches/try/src/doc/tarpl/casts.md

Lines changed: 15 additions & 12 deletions
````diff
@@ -1,12 +1,13 @@
 % Casts
 
-Casts are a superset of coercions: every coercion can be explicitly invoked via a
-cast, but some conversions *require* a cast. These "true casts" are generally regarded
-as dangerous or problematic actions. True casts revolve around raw pointers and
-the primitive numeric types. True casts aren't checked.
+Casts are a superset of coercions: every coercion can be explicitly invoked via
+a cast, but some conversions *require* a cast. These "true casts" are generally
+regarded as dangerous or problematic actions. True casts revolve around raw
+pointers and the primitive numeric types. True casts aren't checked.
 
 Here's an exhaustive list of all the true casts. For brevity, we will use `*`
-to denote either a `*const` or `*mut`, and `integer` to denote any integral primitive:
+to denote either a `*const` or `*mut`, and `integer` to denote any integral
+primitive:
 
 * `*T as *U` where `T, U: Sized`
 * `*T as *U` TODO: explain unsized situation
````
````diff
@@ -37,19 +38,21 @@ expression, `e as U2` is not necessarily so (in fact it will only be valid if
 For numeric casts, there are quite a few cases to consider:
 
 * casting between two integers of the same size (e.g. i32 -> u32) is a no-op
-* casting from a larger integer to a smaller integer (e.g. u32 -> u8) will truncate
+* casting from a larger integer to a smaller integer (e.g. u32 -> u8) will
+  truncate
 * casting from a smaller integer to a larger integer (e.g. u8 -> u32) will
     * zero-extend if the source is unsigned
     * sign-extend if the source is signed
 * casting from a float to an integer will round the float towards zero
     * **NOTE: currently this will cause Undefined Behaviour if the rounded
-      value cannot be represented by the target integer type**. This is a bug
-      and will be fixed. (TODO: figure out what Inf and NaN do)
-* casting from an integer to float will produce the floating point representation
-  of the integer, rounded if necessary (rounding strategy unspecified).
-* casting from an f32 to an f64 is perfect and lossless.
+      value cannot be represented by the target integer type**. This includes
+      Inf and NaN. This is a bug and will be fixed.
+* casting from an integer to float will produce the floating point
+  representation of the integer, rounded if necessary (rounding strategy
+  unspecified)
+* casting from an f32 to an f64 is perfect and lossless
 * casting from an f64 to an f32 will produce the closest possible value
-  (rounding strategy unspecified).
+  (rounding strategy unspecified)
     * **NOTE: currently this will cause Undefined Behaviour if the value
       is finite but larger or smaller than the largest or smallest finite
       value representable by f32**. This is a bug and will be fixed.
````

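As an editorial aside (not part of the commit), the numeric-cast rules listed in this file can be checked directly. A quick sketch using in-range values; note the UB for out-of-range float-to-int casts described above was later fixed, and modern Rust saturates instead:

```rust
fn main() {
    // Same-size integer casts are a bit-for-bit no-op.
    assert_eq!(-1i32 as u32, u32::MAX);

    // Larger -> smaller truncates (keeps the low bits).
    assert_eq!(300u32 as u8, 44); // 300 = 256 + 44

    // Smaller -> larger zero-extends when unsigned...
    assert_eq!(200u8 as u32, 200);
    // ...and sign-extends when signed.
    assert_eq!(-1i8 as i32, -1);

    // Float -> integer rounds toward zero.
    assert_eq!(3.9f32 as i32, 3);
    assert_eq!(-3.9f32 as i32, -3);

    // f32 -> f64 is lossless; f64 -> f32 rounds to the
    // closest representable value.
    assert_eq!(1.5f32 as f64, 1.5);
    assert_eq!((1.0f64 / 3.0f64) as f32, 1.0f32 / 3.0f32);
}
```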