Commit 8a45cec

[LangRef] adjust IR atomics specification following C++20 model tweaks. (#77263)
C++20 accepted two papers, [P0668](https://wg21.link/P0668) and [P0982](https://wg21.link/P0982), which changed the atomics memory model slightly in order to reflect the realities of the existing implementations. The rationale for these changes applies equally to the LLVM IR atomics model. No code changes are expected to be required from this change: it is primarily a matter of more correctly documenting the existing state of the world.

There are three changes: two of them weaken guarantees, and one strengthens them:

1. The memory ordering guaranteed by some backends/CPUs when seq_cst operations are mixed with acquire/release operations on the same location was weaker than the spec guaranteed. Therefore, the specification is changed to remove the requirement that seq_cst ordering is consistent with happens-before, and to replace it with a slightly weaker requirement of consistency with a new relation named strongly-happens-before.
2. The rules for a "release sequence" were weakened. Previously, an acquire synchronized with a release even if it observed a later monotonic store from the same thread as the release store. That has now been removed: now, only read-modify-write operations can extend a release sequence.
3. The model for a seq_cst fence is strengthened, such that placing a seq_cst fence between monotonic accesses now _is_ sufficient to guarantee sequential consistency in the model (as it always has been on existing implementations).

Note that I've directly referenced the C++ standard's atomics.order section for the precise semantics of seq_cst, instead of fully describing them. They are quite complex, and a lot of work has gone into refining the words in the standard. I'm afraid that if I attempted to reiterate them, I would only introduce errors.
1 parent 3942027 commit 8a45cec
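
Change 2 in the commit message (only read-modify-write operations extend a release sequence) can be illustrated with a small C++ program. This is a sketch under stated assumptions, not code from the commit: the function name `run_release_sequence_demo`, the variable names, and the three-thread layout are all made up for illustration. The consumer's acquire load reads the value written by a relaxed `fetch_add`, yet it still synchronizes with the producer's earlier release store, because the RMW continues the release sequence headed by that store. Had the value `2` been written by a plain relaxed store instead, the C++20 rules described above would no longer provide that synchronization.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: returns the payload value observed by the consumer.
int run_release_sequence_demo() {
  int payload = 0;            // plain, non-atomic data being published
  std::atomic<int> flag{0};   // 0 = not ready, 1 = published, 2 = bumped
  int seen = 0;

  std::thread producer([&] {
    payload = 42;
    // Release store: heads the release sequence on `flag`.
    flag.store(1, std::memory_order_release);
  });

  std::thread bumper([&] {
    // Wait until the release store is visible, then extend its release
    // sequence with a relaxed read-modify-write (1 -> 2).
    while (flag.load(std::memory_order_relaxed) != 1)
      std::this_thread::yield();
    flag.fetch_add(1, std::memory_order_relaxed);
  });

  std::thread consumer([&] {
    // The acquire load that reads 2 takes its value from the RMW, which
    // is part of the release sequence headed by the producer's store, so
    // the write to `payload` happens-before the read below.
    while (flag.load(std::memory_order_acquire) != 2)
      std::this_thread::yield();
    seen = payload;
  });

  producer.join();
  bumper.join();
  consumer.join();
  return seen;
}
```

Calling `run_release_sequence_demo()` from a `main` function and checking that the result equals the published payload is enough to exercise the sketch; it needs a C++11 or later compiler with thread support.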

3 files changed, +63 -59 lines changed


llvm/docs/Atomics.rst

Lines changed: 24 additions & 16 deletions
@@ -14,9 +14,16 @@ asynchronous signals.
 The atomic instructions are designed specifically to provide readable IR and
 optimized code generation for the following:
 
-* The C++11 ``<atomic>`` header. (`C++11 draft available here
-<http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
-<http://www.open-std.org/jtc1/sc22/wg14/>`_.)
+* The C++ ``<atomic>`` header and C ``<stdatomic.h>`` headers. These
+were originally added in C++11 and C11. The memory model has been
+subsequently adjusted to correct errors in the initial
+specification, so LLVM currently intends to implement the version
+specified by C++20. (See the `C++20 draft standard
+<https://isocpp.org/files/papers/N4860.pdf>`_ or the unofficial
+`latest C++ draft <https://eel.is/c++draft/>`_. A `C2x draft
+<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3047.pdf>`_ is
+also available, though the text has not yet been updated with the
+errata corrected by C++20.)
 
 * Proper semantics for Java-style memory, for both ``volatile`` and regular
 shared variables. (`Java Specification
@@ -110,13 +117,14 @@ where threads and signals are involved.
 atomic store (where the store is conditional for ``cmpxchg``), but no other
 memory operation can happen on any thread between the load and store.
 
-A ``fence`` provides Acquire and/or Release ordering which is not part of
-another operation; it is normally used along with Monotonic memory operations.
-A Monotonic load followed by an Acquire fence is roughly equivalent to an
-Acquire load, and a Monotonic store following a Release fence is roughly
-equivalent to a Release store. SequentiallyConsistent fences behave as both
-an Acquire and a Release fence, and offer some additional complicated
-guarantees, see the C++11 standard for details.
+A ``fence`` provides Acquire and/or Release ordering which is not part
+of another operation; it is normally used along with Monotonic memory
+operations. A Monotonic load followed by an Acquire fence is roughly
+equivalent to an Acquire load, and a Monotonic store following a
+Release fence is roughly equivalent to a Release
+store. SequentiallyConsistent fences behave as both an Acquire and a
+Release fence, and additionally provide a total ordering with some
+complicated guarantees, see the C++ standard for details.
 
 Frontends generating atomic instructions generally need to be aware of the
 target to some degree; atomic instructions are guaranteed to be lock-free, and
@@ -222,7 +230,7 @@ essentially guarantees that if you take all the operations affecting a specific
 address, a consistent ordering exists.
 
 Relevant standard
-This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
+This corresponds to the C++/C ``memory_order_relaxed``; see those
 standards for the exact definition.
 
 Notes for frontends
@@ -252,8 +260,8 @@ Acquire provides a barrier of the sort necessary to acquire a lock to access
 other memory with normal loads and stores.
 
 Relevant standard
-This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
-used for C++11/C11 ``memory_order_consume``.
+This corresponds to the C++/C ``memory_order_acquire``. It should also be
+used for C++/C ``memory_order_consume``.
 
 Notes for frontends
 If you are writing a frontend which uses this directly, use with caution.
@@ -282,7 +290,7 @@ Release is similar to Acquire, but with a barrier of the sort necessary to
 release a lock.
 
 Relevant standard
-This corresponds to the C++11/C11 ``memory_order_release``.
+This corresponds to the C++/C ``memory_order_release``.
 
 Notes for frontends
 If you are writing a frontend which uses this directly, use with caution.
@@ -308,7 +316,7 @@ AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
 barrier (for fences and operations which both read and write memory).
 
 Relevant standard
-This corresponds to the C++11/C11 ``memory_order_acq_rel``.
+This corresponds to the C++/C ``memory_order_acq_rel``.
 
 Notes for frontends
 If you are writing a frontend which uses this directly, use with caution.
@@ -331,7 +339,7 @@ and Release semantics for stores. Additionally, it guarantees that a total
 ordering exists between all SequentiallyConsistent operations.
 
 Relevant standard
-This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
+This corresponds to the C++/C ``memory_order_seq_cst``, Java volatile, and
 the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.
 
 Notes for frontends

llvm/docs/LangRef.rst

Lines changed: 32 additions & 22 deletions
@@ -3312,15 +3312,15 @@ Memory Model for Concurrent Operations
 The LLVM IR does not define any way to start parallel threads of
 execution or to register signal handlers. Nonetheless, there are
 platform-specific ways to create them, and we define LLVM IR's behavior
-in their presence. This model is inspired by the C++0x memory model.
+in their presence. This model is inspired by the C++ memory model.
 
 For a more informal introduction to this model, see the :doc:`Atomics`.
 
 We define a *happens-before* partial order as the least partial order
 that
 
 - Is a superset of single-thread program order, and
-- When a *synchronizes-with* ``b``, includes an edge from ``a`` to
+- When ``a`` *synchronizes-with* ``b``, includes an edge from ``a`` to
 ``b``. *Synchronizes-with* pairs are introduced by platform-specific
 techniques, like pthread locks, thread creation, thread joining,
 etc., and by atomic instructions. (See also :ref:`Atomic Memory Ordering
@@ -3384,13 +3384,12 @@ Atomic instructions (:ref:`cmpxchg <i_cmpxchg>`,
 :ref:`atomicrmw <i_atomicrmw>`, :ref:`fence <i_fence>`,
 :ref:`atomic load <i_load>`, and :ref:`atomic store <i_store>`) take
 ordering parameters that determine which other atomic instructions on
-the same address they *synchronize with*. These semantics are borrowed
-from Java and C++0x, but are somewhat more colloquial. If these
-descriptions aren't precise enough, check those specs (see spec
-references in the :doc:`atomics guide <Atomics>`).
-:ref:`fence <i_fence>` instructions treat these orderings somewhat
-differently since they don't take an address. See that instruction's
-documentation for details.
+the same address they *synchronize with*. These semantics implement
+the Java or C++ memory models; if these descriptions aren't precise
+enough, check those specs (see spec references in the
+:doc:`atomics guide <Atomics>`). :ref:`fence <i_fence>` instructions
+treat these orderings somewhat differently since they don't take an
+address. See that instruction's documentation for details.
 
 For a simpler introduction to the ordering constraints, see the
 :doc:`Atomics`.
@@ -3418,32 +3417,37 @@ For a simpler introduction to the ordering constraints, see the
 stronger) operations on the same address. If an address is written
 ``monotonic``-ally by one thread, and other threads ``monotonic``-ally
 read that address repeatedly, the other threads must eventually see
-the write. This corresponds to the C++0x/C1x
-``memory_order_relaxed``.
+the write. This corresponds to the C/C++ ``memory_order_relaxed``.
 ``acquire``
 In addition to the guarantees of ``monotonic``, a
 *synchronizes-with* edge may be formed with a ``release`` operation.
-This is intended to model C++'s ``memory_order_acquire``.
+This is intended to model C/C++'s ``memory_order_acquire``.
 ``release``
 In addition to the guarantees of ``monotonic``, if this operation
 writes a value which is subsequently read by an ``acquire``
-operation, it *synchronizes-with* that operation. (This isn't a
-complete description; see the C++0x definition of a release
-sequence.) This corresponds to the C++0x/C1x
+operation, it *synchronizes-with* that operation. Furthermore,
+this occurs even if the value written by a ``release`` operation
+has been modified by a read-modify-write operation before being
+read. (Such a set of operations comprises a *release
+sequence*). This corresponds to the C/C++
 ``memory_order_release``.
 ``acq_rel`` (acquire+release)
 Acts as both an ``acquire`` and ``release`` operation on its
-address. This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
+address. This corresponds to the C/C++ ``memory_order_acq_rel``.
 ``seq_cst`` (sequentially consistent)
 In addition to the guarantees of ``acq_rel`` (``acquire`` for an
 operation that only reads, ``release`` for an operation that only
 writes), there is a global total order on all
-sequentially-consistent operations on all addresses, which is
-consistent with the *happens-before* partial order and with the
-modification orders of all the affected addresses. Each
+sequentially-consistent operations on all addresses. Each
 sequentially-consistent read sees the last preceding write to the
-same address in this global order. This corresponds to the C++0x/C1x
-``memory_order_seq_cst`` and Java volatile.
+same address in this global order. This corresponds to the C/C++
+``memory_order_seq_cst`` and Java ``volatile``.
+
+Note: this global total order is *not* guaranteed to be fully
+consistent with the *happens-before* partial order if
+non-``seq_cst`` accesses are involved. See the C++ standard
+`[atomics.order] <https://wg21.link/atomics.order>`_ section
+for more details on the exact guarantees.
 
 .. _syncscope:
 
@@ -10762,7 +10766,13 @@ still *synchronize-with* the explicit ``fence`` and establish the
 
 A ``fence`` which has ``seq_cst`` ordering, in addition to having both
 ``acquire`` and ``release`` semantics specified above, participates in
-the global program order of other ``seq_cst`` operations and/or fences.
+the global program order of other ``seq_cst`` operations and/or
+fences. Furthermore, the global ordering created by a ``seq_cst``
+fence must be compatible with the individual total orders of
+``monotonic`` (or stronger) memory accesses occurring before and after
+such a fence. The exact semantics of this interaction are somewhat
+complicated, see the C++ standard's `[atomics.order]
+<https://wg21.link/atomics.order>`_ section for more details.
 
 A ``fence`` instruction can also take an optional
 ":ref:`syncscope <syncscope>`" argument.

llvm/include/llvm/CodeGen/TargetLowering.h

Lines changed: 7 additions & 21 deletions
@@ -2166,27 +2166,13 @@ class TargetLoweringBase {
 /// This function should either return a nullptr, or a pointer to an IR-level
 /// Instruction*. Even complex fence sequences can be represented by a
 /// single Instruction* through an intrinsic to be lowered later.
-/// Backends should override this method to produce target-specific intrinsic
-/// for their fences.
-/// FIXME: Please note that the default implementation here in terms of
-/// IR-level fences exists for historical/compatibility reasons and is
-/// *unsound* ! Fences cannot, in general, be used to restore sequential
-/// consistency. For example, consider the following example:
-/// atomic<int> x = y = 0;
-/// int r1, r2, r3, r4;
-/// Thread 0:
-/// x.store(1);
-/// Thread 1:
-/// y.store(1);
-/// Thread 2:
-/// r1 = x.load();
-/// r2 = y.load();
-/// Thread 3:
-/// r3 = y.load();
-/// r4 = x.load();
-/// r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all
-/// seq_cst. But if they are lowered to monotonic accesses, no amount of
-/// IR-level fences can prevent it.
+///
+/// The default implementation emits an IR fence before any release (or
+/// stronger) operation that stores, and after any acquire (or stronger)
+/// operation. This is generally a correct implementation, but backends may
+/// override if they wish to use alternative schemes (e.g. the PowerPC
+/// standard ABI uses a fence before a seq_cst load instead of after a
+/// seq_cst store).
 /// @{
 virtual Instruction *emitLeadingFence(IRBuilderBase &Builder,
 Instruction *Inst,
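
The four-thread example from the comment removed in TargetLowering.h can be written out in C++ with relaxed (monotonic) accesses and a seq_cst fence between the two loads of each reader. This is a sketch under stated assumptions, not code from the commit: the names `IriwResult`, `run_iriw_once`, and `is_forbidden` are hypothetical, and the fence placement is our reading of "placing a seq_cst fence between monotonic accesses". As we understand the strengthened fence rules in C++20 [atomics.order], the outcome r1 = r3 = 1 and r2 = r4 = 0 is not allowed for this program, which is why the old FIXME no longer applies.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical holder for the four observed values.
struct IriwResult {
  int r1, r2, r3, r4;
};

// Runs the four-thread litmus test once with relaxed accesses and
// seq_cst fences between the loads of each reader thread.
IriwResult run_iriw_once() {
  std::atomic<int> x{0}, y{0};
  IriwResult r{0, 0, 0, 0};

  std::thread t0([&] { x.store(1, std::memory_order_relaxed); });
  std::thread t1([&] { y.store(1, std::memory_order_relaxed); });
  std::thread t2([&] {
    r.r1 = x.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r.r2 = y.load(std::memory_order_relaxed);
  });
  std::thread t3([&] {
    r.r3 = y.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r.r4 = x.load(std::memory_order_relaxed);
  });

  t0.join();
  t1.join();
  t2.join();
  t3.join();
  return r;
}

// True if the two readers disagree on the order of the two stores.
bool is_forbidden(const IriwResult &r) {
  return r.r1 == 1 && r.r2 == 0 && r.r3 == 1 && r.r4 == 0;
}
```

Running `run_iriw_once()` in a loop and checking `is_forbidden` on each result is a cheap smoke test; it cannot prove the guarantee, only fail to find a counterexample on the machine at hand.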
