Skip to content

Commit 706f745

Browse files
committed
[InstCombine] Add contributor guide
1 parent 628a79d commit 706f745

File tree

2 files changed

+329
-0
lines changed

2 files changed

+329
-0
lines changed
Lines changed: 324 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,324 @@
1+
# InstCombine contributor guide
2+
3+
This guide lays out a series of rules that contributions to InstCombine should
4+
follow. **Following these rules will results in much faster PR approvals.**
5+
6+
## Tests
7+
8+
### Precommit tests
9+
10+
Tests for new optimizations or miscompilation fixes should be pre-committed.
11+
This means that you first commit the test with CHECK lines prior to your fix.
12+
Your actual change will then only contain CHECK line diffs relative to that
13+
baseline.
14+
15+
This means that pull requests should generally contain two commits: First,
16+
one commit adding new tests with baseline check lines. Second, a commit with
17+
functional changes and test diffs.
18+
19+
If the second commit in your PR does not contain test diffs, you did something
20+
wrong. Either you made a mistake when generating CHECK lines, or your tests are
21+
not actually affected by your patch.
22+
23+
Exceptions: When fixing assertion failures or infinite loops, do not pre-commit
24+
tests.
25+
26+
### Use `update_test_checks.py`
27+
28+
CHECK lines should be generated using the `update_test_checks.py` script. Do
29+
**not** manually edit check lines after using it.
30+
31+
Be sure to use the correct opt binary when using the script. For example, if
32+
your build directory is `build`, then you'll want to run:
33+
34+
```sh
35+
llvm/utils/update_test_checks.py --opt-binary build/bin/opt \
36+
llvm/test/Transforms/InstCombine/the_test.ll
37+
```
38+
39+
Exceptions: Hand-written CHECK lines are allowed for debuginfo tests.
40+
41+
### General testing considerations
42+
43+
Place all tests relating to a transform into a single file. If you are adding
44+
a regression test for a crash/miscompile in an existing transform, find the
45+
file where the existing tests are located. A good way to do that is to comment
46+
out the transform and see which tests fail.
47+
48+
Make tests minimal. Only test exactly the pattern being transformed. If your
49+
original motivating case is a larger pattern that your fold enables to
50+
optimize in some non-trivial way, you may add it as well -- however, the bulk
51+
of the test coverage should be minimal.
52+
53+
Give tests short, but meaningful names. Don't call them `@test1`, `@test2` etc.
54+
For example, a test checking multi-use behavior of a fold involving the
55+
addition of two selects might be called `@add_of_selects_multi_use`.
56+
57+
Add representative tests for each test category (discussed below), but don't
58+
test all combinations of everything. If you have multi-use tests, and you have
59+
commuted tests, you shouldn't also add commuted multi-use tests.
60+
61+
Prefer to keep bit-widths for tests low to improve performance of proof checking using alive2. Using `i8` is better than `i128` where possible.
62+
63+
### Add negative tests
64+
65+
Make sure to add tests for which your transform does **not** apply. Start with
66+
one of the test cases that succeeds and then create a sequence of negative
67+
tests, such that **exactly one** different pre-condition of your transform is
68+
not satisfied in each test.
69+
70+
### Add multi-use tests
71+
72+
Add multi-use tests that ensures your transform does not increase instruction
73+
count if some instructions have additional uses. The standard pattern is to
74+
introduce extra uses with function calls:
75+
76+
```llvm
77+
declare void @use(i8)
78+
79+
define i8 @add_mul_const_multi_use(i8 %x) {
80+
%add = add i8 %x, 1
81+
call void @use(i8 %add)
82+
%mul = mul i8 %add, 3
83+
ret i8 %mul
84+
}
85+
```
86+
87+
Exceptions: For transform that only produce one instruction, multi-use tests
88+
may be omitted.
89+
90+
### Add commuted tests
91+
92+
If the transform involves commutative operations, add tests with commuted
93+
(swapped) operands.
94+
95+
Make sure that that the operand order stays intact in the CHECK lines of your
96+
pre-commited tests. You should not see something like this:
97+
98+
```llvm
99+
; CHECK-NEXT: [[OR:%.*]] = or i8 [[X]], [[Y]]
100+
; ...
101+
%or = or i8 %y, %x
102+
```
103+
104+
If this happens, you may need to change one of the operands to have higher
105+
complexity (include the "thwart" comment in that case):
106+
107+
```llvm
108+
%y2 = mul i8 %y, %y ; thwart complexity-based canonicalization
109+
%or = or i8 %y, %x
110+
```
111+
112+
### Add vector tests
113+
114+
When possible, it is recommended to add at least one test that uses vectors
115+
instead of scalars.
116+
117+
For patterns that include constants, we distinguish three kinds of tests.
118+
The first are "splat" vectors, where all the vector elements are the same.
119+
These tests *should* usually fold without additional effort.
120+
121+
```llvm
122+
define <2 x i8> @add_mul_const_vec_splat(<2 x i8> %x) {
123+
%add = add <2 x i8> %x, <i8 1, i8 1>
124+
%mul = mul <2 x i8> %add, <i8 3, i8 3>
125+
ret <2 x i8> %mul
126+
}
127+
```
128+
129+
A minor variant is to replace some of the splat elements with poison. These
130+
will often also fold without additional effort.
131+
132+
```llvm
133+
define <2 x i8> @add_mul_const_vec_splat_poison(<2 x i8> %x) {
134+
%add = add <2 x i8> %x, <i8 1, i8 poison>
135+
%mul = mul <2 x i8> %add, <i8 3, i8 poison>
136+
ret <2 x i8> %mul
137+
}
138+
```
139+
140+
Finally, you can have non-splat vectors, where the vector elements are not
141+
the same:
142+
143+
```llvm
144+
define <2 x i8> @add_mul_const_vec_non_splat(<2 x i8> %x) {
145+
%add = add <2 x i8> %x, <i8 1, i8 5>
146+
%mul = mul <2 x i8> %add, <i8 3, i8 6>
147+
ret <2 x i8> %mul
148+
}
149+
```
150+
151+
Non-splat vectors will often not fold by default. You should **not** try to
152+
make them fold, unless doing so does not add **any** additional complexity.
153+
You should still add the test though, even if it does not fold.
154+
155+
### Poison flag tests
156+
157+
If your transform involves instructions that can have poison-generating flags,
158+
such as `nuw` and `nsw` on `add`, you should test how these interact with the
159+
transform.
160+
161+
If your transform *requires* a certain flag for correctness, make sure to add
162+
negative tests missing the required flag.
163+
164+
If your transform doesn't require flags for correctness, you should have tests
165+
for preservation behavior. If the input instructions have certain flags, are
166+
they preserved in the output instructions, if it is valid to preserve them?
167+
(This depends on the transform. Check with alive2.)
168+
169+
### Other tests
170+
171+
The test categories mentioned above are non-exhaustive. There may be more tests
172+
to be added, depending on the instructions involved in the transform. Some
173+
examples:
174+
175+
* For folds involving memory accesses like load/store, check that scalable vectors and non-byte-size types (like i3) are handled correctly. Also check that volatile/atomic are handled.
176+
* For folds that interact with the bitwidth in some non-trivial way, check an illegal type like i13. Also confirm that the transform is correct for i1.
177+
* For folds that involve phis, you may want to check that the case of multiple incoming values from one block is handled correctly.
178+
179+
## Proofs
180+
181+
Your pull request description should contain one or more
182+
[alive2 proofs](https://alive2.llvm.org/ce/) for the correctness of the
183+
proposed transform.
184+
185+
### Basics
186+
187+
Proofs are written using LLVM IR, by specifying a `@src` and `@tgt` function.
188+
It is possible to include multiple proofs in a single file by giving the src
189+
and tgt functions matching suffixes.
190+
191+
For example, here is a pair of proofs that both `(x-y)+y` and `(x+y)-y` can
192+
be simplified to `x` ([online](https://alive2.llvm.org/ce/z/MsPPGz)):
193+
194+
```llvm
195+
define i8 @src_add_sub(i8 %x, i8 %y) {
196+
%add = add i8 %x, %y
197+
%sub = sub i8 %add, %y
198+
ret i8 %sub
199+
}
200+
201+
define i8 @tgt_add_sub(i8 %x, i8 %y) {
202+
ret i8 %x
203+
}
204+
205+
206+
define i8 @src_sub_add(i8 %x, i8 %y) {
207+
%sub = sub i8 %x, %y
208+
%add = add i8 %sub, %y
209+
ret i8 %add
210+
}
211+
212+
define i8 @tgt_sub_add(i8 %x, i8 %y) {
213+
ret i8 %x
214+
}
215+
```
216+
217+
### Use generic values in proofs
218+
219+
Proofs should operate on generic values, rather than specific constants, to the degree that this is possible.
220+
221+
For example, this is a **bad** proof:
222+
223+
```llvm
224+
; Don't do this!
225+
define i8 @src(i8 %x) {
226+
%smax = call i8 @llvm.smax.i8(i8 %x, i8 5)
227+
%umax = call i8 @llvm.umax.i8(i8 %smax, i8 7)
228+
ret i8 %umax
229+
}
230+
231+
define i8 @tgt(i8 %x) {
232+
%smax = call i8 @llvm.smax.i8(i8 %x, i8 7)
233+
ret i8 %smax
234+
}
235+
```
236+
237+
The correct way to write this proof is as follows
238+
([online](https://alive2.llvm.org/ce/z/Sgfq37)):
239+
240+
```
241+
define i8 @src(i8 %x, i8 %c1, i8 %c2) {
242+
%cond1 = icmp sgt i8 %c1, -1
243+
call void @llvm.assume(i1 %cond1)
244+
%cond2 = icmp sgt i8 %c2, -1
245+
call void @llvm.assume(i1 %cond2)
246+
247+
%smax = call i8 @llvm.smax.i8(i8 %x, i8 %c1)
248+
%umax = call i8 @llvm.umax.i8(i8 %smax, i8 %c2)
249+
ret i8 %umax
250+
}
251+
252+
define i8 @tgt(i8 %x, i8 %c1, i8 %c2) {
253+
%umax = call i8 @llvm.umax.i8(i8 %c1, i8 %c2)
254+
%smax = call i8 @llvm.smax.i8(i8 %x, i8 %umax)
255+
ret i8 %smax
256+
}
257+
```
258+
259+
Note that the `@llvm.assume` intrinsic is used to specify pre-conditions for
260+
the transform. In this case, it specifies that the constants `%c1` and `%c2`
261+
must be non-negative.
262+
263+
It should be emphasized that there is, in general, no expectation that the
264+
IR in the proofs will be transformed by the implemented fold. In the above
265+
example, the transform would only actually apply if `%c1` and `%c2` are
266+
actually constants, but we need to use non-constants in the proof.
267+
268+
### Common pre-conditions
269+
270+
Here are some examples of common preconditions.
271+
272+
```llvm
273+
; %x is non-negative:
274+
%nonneg = icmp sgt i8 %x, -1
275+
call void @llvm.assume(i1 %nonneg)
276+
277+
; %x is a power of two:
278+
%ctpop = call i8 @llvm.ctpop.i8(i8 %x)
279+
%pow2 = icmp eq i8 %x, 1
280+
call void @llvm.assume(i1 %pow2)
281+
282+
; %x is a power of two or zero:
283+
%ctpop = call i8 @llvm.ctpop.i8(i8 %x)
284+
%pow2orzero = icmp ult i8 %x, 2
285+
call void @llvm.assume(i1 %pow2orzero)
286+
287+
; Adding %x and %y does not overflow in a signed sense:
288+
%wo = call { i8, i1 } @llvm.sadd.with.overflow(i8 %x, i8 %y)
289+
%ov = extractvalue { i8, i1 } %wo, 1
290+
%ov.not = xor i1 %ov, true
291+
call void @llvm.assume(i1 %ov.not)
292+
```
293+
294+
### Timeouts
295+
296+
Alive2 proofs will sometimes produce a timeout with the following message:
297+
298+
```
299+
Alive2 timed out while processing your query.
300+
There are a few things you can try:
301+
302+
- remove extraneous instructions, if any
303+
304+
- reduce variable widths, for example to i16, i8, or i4
305+
306+
- add the --disable-undef-input command line flag, which
307+
allows Alive2 to assume that arguments to your IR are not
308+
undef. This is, in general, unsound: it can cause Alive2
309+
to miss bugs.
310+
```
311+
312+
This is good advice, follow it!
313+
314+
Reducing the bitwidth usually helps. For floating point numbers, you can use
315+
the `half` type for bitwidth reduction purposes. For pointers, you can reduce
316+
the bitwidth by specifying a custom data layout:
317+
318+
```
319+
; For 16-bit pointers
320+
target datalayout = "p:16:16"
321+
```
322+
323+
If reducing the bitwidth does not help, try `-disable-undef-input`. If this
324+
still does not work, submit your proof despite the timeout.

llvm/docs/UserGuides.rst

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,7 @@ intermediate LLVM representation.
4343
HowToCrossCompileBuiltinsOnArm
4444
HowToCrossCompileLLVM
4545
HowToUpdateDebugInfo
46+
InstCombineContributorGuide
4647
InstrProfileFormat
4748
InstrRefDebugInfo
4849
LinkTimeOptimization
@@ -186,6 +187,10 @@ Optimizations
186187
:doc:`InstrProfileFormat`
187188
This document explains two binary formats of instrumentation-based profiles.
188189

190+
:doc:`InstCombineContributorGuide`
191+
This document specifies guidelines for contributions for InstCombine and
192+
related passes.
193+
189194
Code Generation
190195
---------------
191196

0 commit comments

Comments
 (0)