
Commit 256d76f

[Docs] Improve the description of convergence (#89038)
- Clarify convergence of threads v/s convergence of operations.
- Explicitly address operations that are not in any cycle.

This was inspired by a discussion on Discourse:
https://discourse.llvm.org/t/llvm-convergence-semantics/77642
1 parent 4cec3b3 commit 256d76f

2 files changed: 71 additions, 54 deletions

llvm/docs/ConvergenceAndUniformity.rst

Lines changed: 69 additions & 53 deletions
@@ -10,34 +10,61 @@ Convergence And Uniformity
 Introduction
 ============
 
-Some parallel environments execute threads in groups that allow
-communication within the group using special primitives called
-*convergent* operations. The outcome of a convergent operation is
-sensitive to the set of threads that executes it "together", i.e.,
-convergently.
-
-A value is said to be *uniform* across a set of threads if it is the
-same across those threads, and *divergent* otherwise. Correspondingly,
-a branch is said to be a uniform branch if its condition is uniform,
-and it is a divergent branch otherwise.
-
-Whether threads are *converged* or not depends on the paths they take
-through the control flow graph. Threads take different outgoing edges
-at a *divergent branch*. Divergent branches constrain
+In some environments, groups of threads execute the same program in parallel,
+where efficient communication within a group is established using special
+primitives called :ref:`convergent operations<convergent_operations>`. The
+outcome of a convergent operation is sensitive to the set of threads that
+participate in it.
+
+The intuitive picture of *convergence* is built around threads executing in
+"lock step" --- a set of threads is thought of as *converged* if they are all
+executing "the same sequence of instructions together". Such threads may
+*diverge* at a *divergent branch*, and they may later *reconverge* at some
+common program point.
+
+In this intuitive picture, when converged threads execute an instruction, the
+resulting value is said to be *uniform* if it is the same in those threads, and
+*divergent* otherwise. Correspondingly, a branch is said to be a uniform branch
+if its condition is uniform, and it is a divergent branch otherwise.
+
+But the assumption of lock-step execution is not necessary for describing
+communication at convergent operations. It also constrains the implementation
+(compiler as well as hardware) by overspecifying how threads execute in such a
+parallel environment. To eliminate this assumption:
+
+- We define convergence as a relation between the execution of each instruction
+  by different threads and not as a relation between the threads themselves.
+  This definition is reasonable for known targets and is compatible with the
+  semantics of :ref:`convergent operations<convergent_operations>` in LLVM IR.
+- We also define uniformity in terms of this convergence. The output of an
+  instruction can be examined for uniformity across multiple threads only if the
+  corresponding executions of that instruction are converged.
+
+This document describes a static analysis for determining convergence at each
+instruction in a function. The analysis extends previous work on divergence
+analysis [DivergenceSPMD]_ to cover irreducible control-flow. The described
+analysis is used in LLVM to implement a UniformityAnalysis that determines the
+uniformity of value(s) computed at each instruction in an LLVM IR or MIR
+function.
+
+.. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian
+   Hack. 2021. An Abstract Interpretation for SPMD Divergence on
+   Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL,
+   Article 31 (January 2021), 35 pages.
+   https://doi.org/10.1145/3434312
+
+Motivation
+==========
+
+Divergent branches constrain
 program transforms such as changing the CFG or moving a convergent
 operation to a different point of the CFG. Performing these
 transformations across a divergent branch can change the sets of
 threads that execute convergent operations convergently. While these
-constraints are out of scope for this document, the described
-*uniformity analysis* allows these transformations to identify
+constraints are out of scope for this document,
+uniformity analysis allows these transformations to identify
 uniform branches where these constraints do not hold.
 
-Convergence and
-uniformity are inter-dependent: When threads diverge at a divergent
-branch, they may later *reconverge* at a common program point.
-Subsequent operations are performed convergently, but the inputs may
-be non-uniform, thus producing divergent outputs.
-
 Uniformity is also useful by itself on targets that execute threads in
 groups with shared execution resources (e.g. waves, warps, or
 subgroups):
@@ -50,18 +77,6 @@ subgroups):
   branches, since the whole group of threads follows either one side
   of the branch or the other.
 
-This document presents a definition of convergence that is reasonable
-for real targets and is compatible with the currently implicit
-semantics of convergent operations in LLVM IR. This is accompanied by
-a *uniformity analysis* that extends previous work on divergence analysis
-[DivergenceSPMD]_ to cover irreducible control-flow.
-
-.. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian
-   Hack. 2021. An Abstract Interpretation for SPMD Divergence on
-   Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL,
-   Article 31 (January 2021), 35 pages.
-   https://doi.org/10.1145/3434312
-
 Terminology
 ===========
 
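The rewritten introduction defines uniformity only across converged executions of an instruction. The following is a minimal Python sketch of that idea, not LLVM code, under stated assumptions: dynamic instances are encoded as a hypothetical `DynInst(thread, pos, static)` tuple, and the converged pairs are given as input rather than computed. Because *converged-with* is symmetric and transitive, adding reflexivity turns it into an equivalence relation, so a union-find over the given pairs recovers the groups of converged executions; uniformity is then checked within each group only.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, NamedTuple, Tuple


class DynInst(NamedTuple):
    """Hypothetical encoding of a dynamic instance (an assumption of this sketch)."""
    thread: int  # thread that executed the instruction
    pos: int     # position of this execution in the thread's trace
    static: str  # the static instruction being executed


def converged_classes(instances: Iterable[DynInst],
                      converged_pairs: Iterable[Tuple[DynInst, DynInst]]
                      ) -> List[List[DynInst]]:
    """Group dynamic instances into classes of mutually converged executions."""
    parent: Dict[DynInst, DynInst] = {i: i for i in instances}

    def find(x: DynInst) -> DynInst:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in converged_pairs:
        # Only executions of the same static instance by different threads converge.
        assert a.static == b.static and a.thread != b.thread
        parent[find(a)] = find(b)

    groups: Dict[DynInst, List[DynInst]] = defaultdict(list)
    for i in parent:
        groups[find(i)].append(i)
    return list(groups.values())


def is_uniform(values: Dict[DynInst, Hashable], cls: List[DynInst]) -> bool:
    """The output is uniform in a class if every converged execution saw the same value."""
    return len({values[i] for i in cls}) <= 1


# Threads 0 and 1 execute %v convergently; thread 2 executes it separately.
x0, x1, x2 = DynInst(0, 5, "%v"), DynInst(1, 5, "%v"), DynInst(2, 8, "%v")
classes = converged_classes([x0, x1, x2], [(x0, x1)])
print([is_uniform({x0: 7, x1: 7, x2: 9}, c) for c in classes])  # [True, True]
```

Thread 2's execution of `%v` is not converged with the other two, so its different value does not make `%v` divergent within any class of converged executions; under the definition above, comparing it against threads 0 and 1 is simply not meaningful.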
@@ -133,12 +148,6 @@ meaning. Dynamic instances listed in the same column are converged.
 Convergence
 ===========
 
-*Converged-with* is a transitive symmetric relation over dynamic
-instances produced by *different threads* for the *same static
-instance*. Informally, two threads that produce converged dynamic
-instances are said to be *converged*, and they are said to execute
-that static instance *convergently*, at that point in the execution.
-
 *Convergence-before* is a strict partial order over dynamic instances
 that is defined as the transitive closure of:
 
@@ -171,11 +180,16 @@ to be converged (i.e., related to each other in the converged-with
 relation). The resulting convergence order includes the edges ``P ->
 Q2``, ``Q1 -> R``, ``P -> R``, ``P -> T``, etc.
 
-The fact that *convergence-before* is a strict partial order is a
-constraint on the *converged-with* relation. It is trivially satisfied
-if different dynamic instances are never converged. It is also
-trivially satisfied for all known implementations for which
-convergence plays some role.
+*Converged-with* is a transitive symmetric relation over dynamic instances
+produced by *different threads* for the *same static instance*.
+
+It is impractical to provide any one definition for the *converged-with*
+relation, since different environments may wish to relate dynamic instances in
+different ways. The fact that *convergence-before* is a strict partial order is
+a constraint on the *converged-with* relation. It is trivially satisfied if
+different dynamic instances are never converged. Below, we provide a relation
+called :ref:`maximal converged-with<convergence-maximal>`, which satisfies
+*convergence-before* and is suitable for known targets.
 
 .. _convergence-note-convergence:
 
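The constraint stated in this hunk (a choice of *converged-with* is acceptable only if *convergence-before* remains a strict partial order) can be checked mechanically once the base edges of the relation are known. The following is a minimal Python sketch, not LLVM code: `transitive_closure` and `is_strict_partial_order` are hypothetical helpers, and the base edges are written out by hand rather than derived from the rules listed in the full document, which this diff does not show.

```python
from itertools import product
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def transitive_closure(edges: Iterable[Edge]) -> Set[Edge]:
    """Warshall's algorithm: add (i, j) whenever (i, k) and (k, j) are present."""
    closure = set(edges)
    nodes = {n for edge in closure for n in edge}
    for k in nodes:  # k is the intermediate node
        for i, j in product(nodes, nodes):
            if (i, k) in closure and (k, j) in closure:
                closure.add((i, j))
    return closure


def is_strict_partial_order(base_edges: Iterable[Edge]) -> bool:
    """The closure is transitive by construction; it is a strict partial order
    iff no dynamic instance ends up convergence-before itself."""
    return all(a != b for a, b in transitive_closure(base_edges))


# Hypothetical scenario: thread t0 executes A then B, thread t1 executes B then A.
program_order = [("A@t0", "B@t0"), ("B@t1", "A@t1")]
print(is_strict_partial_order(program_order))  # True

# Edges that would arise if A@t0/A@t1 and B@t0/B@t1 were declared converged
# (written out by hand here; an assumption of this sketch).
forced = [("A@t0", "B@t1"), ("B@t1", "A@t0")]
print(is_strict_partial_order(program_order + forced))  # False
```

Checking only for self-loops is enough: a transitive relation that is irreflexive is automatically asymmetric. The second call shows how an over-eager choice of converged pairs would violate the constraint, while the first mirrors the trivially satisfied case in which different dynamic instances are never converged.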
@@ -217,14 +231,16 @@ iterations of parent cycles as well.
 
 Dynamic instances ``X1`` and ``X2`` produced by different threads
 for the same static instance ``X`` are converged in the maximal
-converged-with relation if and only if for every cycle ``C`` with
-header ``H`` that contains ``X``:
-
-- every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in
-  the respective thread is convergence-before ``X2``, and,
-- every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in
-  the respective thread is convergence-before ``X1``,
-- without assuming that ``X1`` is converged with ``X2``.
+converged-with relation if and only if:
+
+- ``X`` is not contained in any cycle, or,
+- For every cycle ``C`` with header ``H`` that contains ``X``:
+
+  - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in
+    the respective thread is convergence-before ``X2``, and,
+  - every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in
+    the respective thread is convergence-before ``X1``,
+  - without assuming that ``X1`` is converged with ``X2``.
 
 .. note::
 
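The rewritten definition adds an explicit case for static instances that are not contained in any cycle. The condition can be encoded as a predicate; the sketch below is illustrative Python, not LLVM's implementation. Its names (`DynInst`, `preceding`, `maximally_converged`, `toy_convergence_before`) are hypothetical, the caller supplies the headers of all cycles that contain `X`, and *convergence-before* is passed in as an oracle. The oracle used in the demo is a toy model, assumed for illustration only, in which the k-th execution of a cycle header in every thread is treated as converged.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence


class DynInst(NamedTuple):
    """Hypothetical encoding of a dynamic instance (an assumption of this sketch)."""
    thread: int  # thread that produced this dynamic instance
    pos: int     # index of the instance in that thread's execution trace
    static: str  # name of the static instance


Traces = Dict[int, Sequence[str]]
ConvBefore = Callable[[DynInst, DynInst], bool]


def preceding(traces: Traces, x: DynInst, header: str) -> List[DynInst]:
    """Dynamic instances of `header` that precede `x` in x's own thread."""
    return [DynInst(x.thread, i, s)
            for i, s in enumerate(traces[x.thread][: x.pos]) if s == header]


def maximally_converged(x1: DynInst, x2: DynInst, traces: Traces,
                        headers: Sequence[str], conv_before: ConvBefore) -> bool:
    """`headers` lists the header of every cycle that contains X. `conv_before`
    must be computed without assuming that x1 is converged with x2."""
    assert x1.static == x2.static and x1.thread != x2.thread
    if not headers:  # X is not contained in any cycle
        return True
    for h in headers:
        if not all(conv_before(h1, x2) for h1 in preceding(traces, x1, h)):
            return False
        if not all(conv_before(h2, x1) for h2 in preceding(traces, x2, h)):
            return False
    return True


def toy_convergence_before(traces: Traces) -> ConvBefore:
    """Toy oracle (assumed, not LLVM's definition): header instance h is
    convergence-before x when x's thread reached the same iteration of h before x."""
    def conv_before(h: DynInst, x: DynInst) -> bool:
        k = traces[h.thread][: h.pos + 1].count(h.static)  # iteration number of h
        return traces[x.thread][: x.pos].count(h.static) >= k
    return conv_before


traces = {0: ["entry", "H", "X", "H", "X"], 1: ["entry", "H", "X"]}
oracle = toy_convergence_before(traces)
print(maximally_converged(DynInst(0, 2, "X"), DynInst(1, 2, "X"), traces, ["H"], oracle))  # True
print(maximally_converged(DynInst(0, 4, "X"), DynInst(1, 2, "X"), traces, ["H"], oracle))  # False
```

The early return for an empty `headers` list mirrors the disjunct added in this commit. The loop alone would also return `True` in that case, because a condition required of every cycle containing `X` holds vacuously when there is no such cycle.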
llvm/docs/ConvergentOperations.rst

Lines changed: 2 additions & 1 deletion
@@ -936,7 +936,8 @@ property <uniformity-analysis>` of static instances in the convergence region of
 1. Both threads executed converged dynamic instances of every token
    definition ``D`` such that ``X`` is in the convergence region of ``D``,
    and,
-2. For every cycle ``C`` with header ``H`` that contains ``X``:
+2. Either ``X`` is not contained in any cycle, or, for every cycle ``C``
+   with header ``H`` that contains ``X``:
 
    - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in the
      respective thread is convergence-before ``X2``, and,
