The C11 memory model is fundamentally about trying to bridge the gap between the
semantics we want, the optimizations compilers want, and the inconsistent chaos
our hardware wants. *We* would like to just write programs and have them do
exactly what we said but, you know, *fast*. Wouldn't that be great?

# Compiler Reordering

Compilers fundamentally want to be able to do all sorts of complicated
transformations to reduce data dependencies and eliminate dead code. In
particular, they may radically change the actual order of events, or make events
never occur! If we write something like

```rust,ignore
x = 1;
y = 3;
x = 2;
```

The compiler may conclude that it would *really* be best if your program did

```rust,ignore
x = 2;
y = 3;
```

This has inverted the order of events *and* completely eliminated one event.
From a single-threaded perspective this is completely unobservable: after all
the statements have executed we are in exactly the same state. But if our
program is multi-threaded, we may have been relying on `x` to *actually* be
assigned to 1 before `y` was assigned. We would *really* like the compiler to be
able to make these kinds of optimizations, because they can seriously improve
performance. On the other hand, we'd really like to be able to depend on our
program *doing the thing we said*.

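To see what "depending on the program doing the thing we said" might look like
in real code, here's a minimal sketch of those same three writes expressed with
the atomic types covered later in this chapter; the statics `X` and `Y` and the
two functions are made up for illustration:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Hypothetical shared state for this sketch.
static X: AtomicU32 = AtomicU32::new(0);
static Y: AtomicU32 = AtomicU32::new(0);

fn writer() {
    // SeqCst atomic stores cannot be reordered with one another by the
    // compiler or the hardware, so every thread watching X and Y sees these
    // three writes take effect in exactly this order.
    X.store(1, Ordering::SeqCst);
    Y.store(3, Ordering::SeqCst);
    X.store(2, Ordering::SeqCst);
}

fn reader() {
    // Observing X == 1 means the final store of 2 has not become visible to
    // us yet; with plain non-atomic writes the compiler could have removed
    // the intermediate value of 1 entirely.
    if X.load(Ordering::SeqCst) == 1 {
        println!("x was 1; y is currently {}", Y.load(Ordering::SeqCst));
    }
}

fn main() {
    let w = thread::spawn(writer);
    let r = thread::spawn(reader);
    w.join().unwrap();
    r.join().unwrap();
}
```
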
# Hardware Reordering

On the other hand, even if the compiler totally understood what we wanted and
respected our wishes, our *hardware* might instead get us in trouble. Trouble
comes from CPUs in the form of memory hierarchies. There is indeed a global
shared memory space somewhere in your hardware, but from the perspective of each
CPU core it is *so very far away* and *so very slow*. Each CPU would rather work
with its local cache of the data and only go through the *anguish* of talking to
shared memory when it doesn't actually have that memory in cache.

After all, that's the whole *point* of the cache, right? If every read from the
cache had to run back to shared memory to double check that it hadn't changed,
what would the point be? The end result is that the hardware doesn't guarantee
that events that occur in the same order on *one* thread occur in the same order
on *other* threads.

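The classic way to observe this is a store-buffering experiment. Here's a
minimal sketch, leaning on the atomic types introduced later in this chapter
(racing plain data accesses would be a data race); the statics and thread
structure are made up for illustration:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

static X: AtomicU32 = AtomicU32::new(0);
static Y: AtomicU32 = AtomicU32::new(0);

fn main() {
    let t1 = thread::spawn(|| {
        X.store(1, Ordering::Relaxed);
        Y.load(Ordering::Relaxed) // what thread 1 saw of Y
    });
    let t2 = thread::spawn(|| {
        Y.store(1, Ordering::Relaxed);
        X.load(Ordering::Relaxed) // what thread 2 saw of X
    });
    let (r1, r2) = (t1.join().unwrap(), t2.join().unwrap());

    // In any simple interleaving of the two threads, at least one store
    // happens before both loads, so (r1, r2) == (0, 0) "should" be
    // impossible. Real hardware (and these relaxed atomics) permit exactly
    // that outcome.
    println!("thread 1 read Y = {r1}, thread 2 read X = {r2}");
}
```
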
Which reorderings the hardware may perform depends on the architecture: x86/64
provides strong ordering guarantees, while ARM provides weak ordering
guarantees. This has two consequences for concurrent programming:

* Asking for stronger guarantees on strongly-ordered hardware may be cheap or
  even *free* because they already provide strong guarantees unconditionally.
  Weaker guarantees may only yield performance wins on weakly-ordered hardware.

* Asking for guarantees that are *too* weak on strongly-ordered hardware is
  more likely to *happen* to work, even though your program is strictly
  incorrect (see the sketch below). If possible, concurrent algorithms should
  be tested on weakly-ordered hardware.

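As an illustration of "too weak, but it happens to work", here's a sketch of a
flag-and-payload handoff done entirely with relaxed atomics; the names are made
up for this example:

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

static DATA: AtomicU32 = AtomicU32::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);
        // BUG: Relaxed creates no happens-before relationship, so the reader
        // is allowed to see READY == true while still seeing DATA == 0.
        READY.store(true, Ordering::Relaxed);
    });
    let consumer = thread::spawn(|| {
        while !READY.load(Ordering::Relaxed) {}
        // On strongly-ordered hardware this is likely to print 42 anyway,
        // which is exactly how incorrect code slips through testing; on
        // weakly-ordered hardware it can legitimately print 0.
        println!("data = {}", DATA.load(Ordering::Relaxed));
    });
    producer.join().unwrap();
    consumer.join().unwrap();
}
```

The fix is to upgrade the flag to a Release store paired with an Acquire load,
which is exactly what the later sections build up to.
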
# Data Accesses

The C11 memory model attempts to bridge the gap by allowing us to talk about the
*causality* of our program. Generally, this is by establishing a *happens
before* relationship between parts of the program and the threads that are
running them. This gives the hardware and compiler room to optimize the program
more aggressively where a strict happens-before relationship isn't established,
but forces them to be more careful where one *is* established. The way we
communicate these relationships is through *data accesses* and *atomic
accesses*.

Data accesses are fundamentally unsynchronized: the compiler is free to
aggressively optimize them on the assumption that the program is
single-threaded, and the hardware is free to propagate the changes made in data
accesses to other threads as lazily and inconsistently as it wants. Most
critically, data accesses are how data races happen. Data accesses are very
friendly to the hardware and compiler, but as we've seen they offer *awful*
semantics to try to write synchronized code with. Actually, that's too weak.

**It is literally impossible to write correct synchronized code using only data
accesses.**

Atomic accesses are how we tell the hardware and compiler that our program is
multi-threaded. Each atomic access can be marked with an *ordering* that
specifies what kind of relationship it establishes with other accesses. In
practice, this boils down to telling the compiler and hardware certain things
they *can't* do. For the compiler, this largely revolves around re-ordering of
instructions. For the hardware, this largely revolves around how writes are
propagated to other threads. The set of orderings Rust exposes are:

* Sequentially Consistent (SeqCst)
* Release
* Acquire
* Relaxed

(Note: We explicitly do not expose the C11 *consume* ordering)

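For orientation, every operation on the `std::sync::atomic` types takes one of
these orderings as an explicit argument. A quick illustrative sketch:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let counter = AtomicUsize::new(0);

    // Stores may be Relaxed, Release, or SeqCst.
    counter.store(1, Ordering::Release);
    // Loads may be Relaxed, Acquire, or SeqCst.
    let seen = counter.load(Ordering::Acquire);
    // Read-modify-write operations accept any ordering.
    let previous = counter.fetch_add(1, Ordering::Relaxed);

    println!("seen = {seen}, previous = {previous}");
}
```
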
# Sequentially Consistent

Sequentially Consistent is the most powerful of all, implying the restrictions
of all other orderings. Intuitively, a sequentially consistent operation
*cannot* be reordered: all accesses on one thread that happen before and after a
SeqCst access *stay* before and after it. A data-race-free program that uses
only sequentially consistent atomics and data accesses has the very nice
property that there is a single global execution of the program's instructions
that all threads agree on. This execution is also particularly nice to reason
about: it's just an interleaving of each thread's individual executions. This
*does not* hold if you start using the weaker atomic orderings.

The relative developer-friendliness of sequential consistency doesn't come for
free. Even on strongly-ordered platforms sequential consistency involves
emitting memory fences.

In practice, sequential consistency is rarely necessary for program correctness.
However, sequential consistency is definitely the right choice if you're not
confident about the other memory orders. Having your program run a bit slower
than it needs to is certainly better than it running incorrectly! It's also
*mechanically* trivial to downgrade atomic operations to have a weaker
consistency later on. Just change `SeqCst` to e.g. `Relaxed` and you're done! Of
course, proving that this transformation is *correct* is a whole other matter.

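One situation where sequential consistency genuinely earns its keep is the
store-buffering pattern from the hardware section above. Here's an illustrative
sketch (the statics are made up for this example):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

static A: AtomicBool = AtomicBool::new(false);
static B: AtomicBool = AtomicBool::new(false);

fn main() {
    let t1 = thread::spawn(|| {
        A.store(true, Ordering::SeqCst);
        B.load(Ordering::SeqCst)
    });
    let t2 = thread::spawn(|| {
        B.store(true, Ordering::SeqCst);
        A.load(Ordering::SeqCst)
    });
    let (saw_b, saw_a) = (t1.join().unwrap(), t2.join().unwrap());

    // Because every SeqCst operation sits in one global order that all
    // threads agree on, at least one of the loads must observe the other
    // thread's store. Downgrading these to Acquire/Release (or Relaxed)
    // re-allows the "both saw false" outcome.
    assert!(saw_a || saw_b);
}
```
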
# Acquire-Release

Acquire and Release are largely intended to be paired. Their names hint at their
use case: they're perfectly suited for acquiring and releasing locks, and
ensuring that critical sections don't overlap.

Intuitively, an acquire access ensures that every access after it *stays* after
it. However, operations that occur before an acquire are free to be reordered to
occur after it. Similarly, a release access ensures that every access before it
*stays* before it. However, operations that occur after a release are free to be
reordered to occur before it.

When thread A releases a location in memory and then thread B subsequently
acquires *the same* location in memory, causality is established. Every write
that happened *before* A's release will be observed by B *after* its acquire.
However no causality is established with any other threads. Similarly, no
causality is established if A and B access *different* locations in memory.

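Tying this back to the broken relaxed handoff sketched earlier, here's a
minimal release/acquire version (again made up for illustration):

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

static DATA: AtomicU32 = AtomicU32::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);
        // The Release store "publishes" every write made before it...
        READY.store(true, Ordering::Release);
    });
    let consumer = thread::spawn(|| {
        // ...and the Acquire load that reads `true` synchronizes with that
        // store, so the write to DATA is guaranteed to be visible here.
        while !READY.load(Ordering::Acquire) {}
        assert_eq!(DATA.load(Ordering::Relaxed), 42);
    });
    producer.join().unwrap();
    consumer.join().unwrap();
}
```

`DATA` is kept atomic here only so the sketch stays in safe Rust; it's the
Release/Acquire pair on `READY` that makes the 42 visible.
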
# Relaxed

Relaxed accesses are the absolute weakest. They can be freely re-ordered and
provide no happens-before relationship. However, relaxed operations *are* still
atomic. That is, they don't count as data accesses and any read-modify-write
operations done to them occur atomically. Relaxed operations are appropriate for
things that you definitely want to happen, but don't particularly otherwise care
about. For example, incrementing a counter can be safely done by multiple
threads using a relaxed `fetch_add` if you're not using the counter to
synchronize any other accesses.

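Here's an illustrative sketch of that counter case (the names are made up for
this example):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

static HITS: AtomicUsize = AtomicUsize::new(0);

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..1_000 {
                    // fetch_add is a single atomic read-modify-write, so no
                    // increments are lost even though the ordering is Relaxed.
                    HITS.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // The count is exact; what Relaxed gives up is any ability to use HITS
    // to reason about the ordering of other memory operations.
    assert_eq!(HITS.load(Ordering::Relaxed), 4_000);
}
```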