@@ -10,34 +10,61 @@ Convergence And Uniformity
Introduction
============

- Some parallel environments execute threads in groups that allow
- communication within the group using special primitives called
- *convergent* operations. The outcome of a convergent operation is
- sensitive to the set of threads that executes it "together", i.e.,
- convergently.
-
- A value is said to be *uniform* across a set of threads if it is the
- same across those threads, and *divergent* otherwise. Correspondingly,
- a branch is said to be a uniform branch if its condition is uniform,
- and it is a divergent branch otherwise.
-
- Whether threads are *converged* or not depends on the paths they take
- through the control flow graph. Threads take different outgoing edges
- at a *divergent branch*. Divergent branches constrain
+ In some environments, groups of threads execute the same program in parallel,
+ where efficient communication within a group is established using special
+ primitives called :ref:`convergent operations<convergent_operations>`. The
+ outcome of a convergent operation is sensitive to the set of threads that
+ participate in it.
+
+ The intuitive picture of *convergence* is built around threads executing in
+ "lock step" --- a set of threads is thought of as *converged* if they are all
+ executing "the same sequence of instructions together". Such threads may
+ *diverge* at a *divergent branch*, and they may later *reconverge* at some
+ common program point.
+
+ In this intuitive picture, when converged threads execute an instruction, the
+ resulting value is said to be *uniform* if it is the same in those threads, and
+ *divergent* otherwise. Correspondingly, a branch is said to be a uniform branch
+ if its condition is uniform, and it is a divergent branch otherwise.
+
+ But the assumption of lock-step execution is not necessary for describing
+ communication at convergent operations. It also constrains the implementation
+ (compiler as well as hardware) by overspecifying how threads execute in such a
+ parallel environment. To eliminate this assumption:
+
+ - We define convergence as a relation between the execution of each instruction
+   by different threads and not as a relation between the threads themselves.
+   This definition is reasonable for known targets and is compatible with the
+   semantics of :ref:`convergent operations<convergent_operations>` in LLVM IR.
+ - We also define uniformity in terms of this convergence. The output of an
+   instruction can be examined for uniformity across multiple threads only if the
+   corresponding executions of that instruction are converged.
+
+ This document describes a static analysis for determining convergence at each
+ instruction in a function. The analysis extends previous work on divergence
+ analysis [DivergenceSPMD]_ to cover irreducible control-flow. The described
+ analysis is used in LLVM to implement a UniformityAnalysis that determines the
+ uniformity of value(s) computed at each instruction in an LLVM IR or MIR
+ function.
+
+ .. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian
+    Hack. 2021. An Abstract Interpretation for SPMD Divergence on
+    Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL,
+    Article 31 (January 2021), 35 pages.
+    https://doi.org/10.1145/3434312
+
+ Motivation
+ ==========
+
+ Divergent branches constrain
program transforms such as changing the CFG or moving a convergent
operation to a different point of the CFG. Performing these
transformations across a divergent branch can change the sets of
threads that execute convergent operations convergently. While these
- constraints are out of scope for this document, the described
- *uniformity analysis* allows these transformations to identify
+ constraints are out of scope for this document,
+ uniformity analysis allows these transformations to identify
uniform branches where these constraints do not hold.

- Convergence and
- uniformity are inter-dependent: When threads diverge at a divergent
- branch, they may later *reconverge* at a common program point.
- Subsequent operations are performed convergently, but the inputs may
- be non-uniform, thus producing divergent outputs.
-
Uniformity is also useful by itself on targets that execute threads in
groups with shared execution resources (e.g. waves, warps, or
subgroups):
@@ -50,18 +77,6 @@ subgroups):
branches, since the whole group of threads follows either one side
of the branch or the other.

- This document presents a definition of convergence that is reasonable
- for real targets and is compatible with the currently implicit
- semantics of convergent operations in LLVM IR. This is accompanied by
- a *uniformity analysis* that extends previous work on divergence analysis
- [DivergenceSPMD]_ to cover irreducible control-flow.
-
- .. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian
-    Hack. 2021. An Abstract Interpretation for SPMD Divergence on
-    Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL,
-    Article 31 (January 2021), 35 pages.
-    https://doi.org/10.1145/3434312
-
Terminology
===========
@@ -133,12 +148,6 @@ meaning. Dynamic instances listed in the same column are converged.
Convergence
===========

- *Converged-with* is a transitive symmetric relation over dynamic
- instances produced by *different threads* for the *same static
- instance*. Informally, two threads that produce converged dynamic
- instances are said to be *converged*, and they are said to execute
- that static instance *convergently*, at that point in the execution.
-
*Convergence-before* is a strict partial order over dynamic instances
that is defined as the transitive closure of:
@@ -171,11 +180,16 @@ to be converged (i.e., related to each other in the converged-with
relation). The resulting convergence order includes the edges ``P ->
Q2``, ``Q1 -> R``, ``P -> R``, ``P -> T``, etc.

- The fact that *convergence-before* is a strict partial order is a
- constraint on the *converged-with* relation. It is trivially satisfied
- if different dynamic instances are never converged. It is also
- trivially satisfied for all known implementations for which
- convergence plays some role.
+ *Converged-with* is a transitive symmetric relation over dynamic instances
+ produced by *different threads* for the *same static instance*.
+
+ It is impractical to provide any one definition for the *converged-with*
+ relation, since different environments may wish to relate dynamic instances in
+ different ways. The fact that *convergence-before* is a strict partial order is
+ a constraint on the *converged-with* relation. It is trivially satisfied if
+ different dynamic instances are never converged. Below, we provide a relation
+ called :ref:`maximal converged-with<convergence-maximal>`, which satisfies
+ *convergence-before* and is suitable for known targets.

.. _convergence-note-convergence:
@@ -217,14 +231,16 @@ iterations of parent cycles as well.
Dynamic instances ``X1`` and ``X2`` produced by different threads
for the same static instance ``X`` are converged in the maximal
- converged-with relation if and only if for every cycle ``C`` with
- header ``H`` that contains ``X``:
-
- - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in
-   the respective thread is convergence-before ``X2``, and,
- - every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in
-   the respective thread is convergence-before ``X1``,
- - without assuming that ``X1`` is converged with ``X2``.
+ converged-with relation if and only if:
+
+ - ``X`` is not contained in any cycle, or,
+ - For every cycle ``C`` with header ``H`` that contains ``X``:
+
+   - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in
+     the respective thread is convergence-before ``X2``, and,
+   - every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in
+     the respective thread is convergence-before ``X1``,
+   - without assuming that ``X1`` is converged with ``X2``.

.. note::