|
| 1 | +# InstCombine contributor guide |
| 2 | + |
| 3 | +This guide lays out a series of rules that contributions to InstCombine should |
| 4 | +follow. **Following these rules will results in much faster PR approvals.** |
| 5 | + |
| 6 | +## Tests |
| 7 | + |
| 8 | +### Precommit tests |
| 9 | + |
| 10 | +Tests for new optimizations or miscompilation fixes should be pre-committed. |
| 11 | +This means that you first commit the test with CHECK lines prior to your fix. |
| 12 | +Your actual change will then only contain CHECK line diffs relative to that |
| 13 | +baseline. |
| 14 | + |
| 15 | +This means that pull requests should generally contain two commits: First, |
| 16 | +one commit adding new tests with baseline check lines. Second, a commit with |
| 17 | +functional changes and test diffs. |
| 18 | + |
| 19 | +If the second commit in your PR does not contain test diffs, you did something |
| 20 | +wrong. Either you made a mistake when generating CHECK lines, or your tests are |
| 21 | +not actually affected by your patch. |
| 22 | + |
| 23 | +Exceptions: When fixing assertion failures or infinite loops, do not pre-commit |
| 24 | +tests. |
| 25 | + |
| 26 | +### Use `update_test_checks.py` |
| 27 | + |
| 28 | +CHECK lines should be generated using the `update_test_checks.py` script. Do |
| 29 | +**not** manually edit check lines after using it. |
| 30 | + |
| 31 | +Be sure to use the correct opt binary when using the script. For example, if |
| 32 | +your build directory is `build`, then you'll want to run: |
| 33 | + |
| 34 | +```sh |
| 35 | +llvm/utils/update_test_checks.py --opt-binary build/bin/opt \ |
| 36 | + llvm/test/Transforms/InstCombine/the_test.ll |
| 37 | +``` |
| 38 | + |
| 39 | +Exceptions: Hand-written CHECK lines are allowed for debuginfo tests. |
| 40 | + |
| 41 | +### General testing considerations |
| 42 | + |
| 43 | +Place all tests relating to a transform into a single file. If you are adding |
| 44 | +a regression test for a crash/miscompile in an existing transform, find the |
| 45 | +file where the existing tests are located. A good way to do that is to comment |
| 46 | +out the transform and see which tests fail. |
| 47 | + |
| 48 | +Make tests minimal. Only test exactly the pattern being transformed. If your |
| 49 | +original motivating case is a larger pattern that your fold enables to |
| 50 | +optimize in some non-trivial way, you may add it as well -- however, the bulk |
| 51 | +of the test coverage should be minimal. |
| 52 | + |
| 53 | +Give tests short, but meaningful names. Don't call them `@test1`, `@test2` etc. |
| 54 | +For example, a test checking multi-use behavior of a fold involving the |
| 55 | +addition of two selects might be called `@add_of_selects_multi_use`. |
| 56 | + |
| 57 | +Add representative tests for each test category (discussed below), but don't |
| 58 | +test all combinations of everything. If you have multi-use tests, and you have |
| 59 | +commuted tests, you shouldn't also add commuted multi-use tests. |
| 60 | + |
| 61 | +Prefer to keep bit-widths for tests low to improve performance of proof checking using alive2. Using `i8` is better than `i128` where possible. |
| 62 | + |
| 63 | +### Add negative tests |
| 64 | + |
| 65 | +Make sure to add tests for which your transform does **not** apply. Start with |
| 66 | +one of the test cases that succeeds and then create a sequence of negative |
| 67 | +tests, such that **exactly one** different pre-condition of your transform is |
| 68 | +not satisfied in each test. |
| 69 | + |
| 70 | +### Add multi-use tests |
| 71 | + |
| 72 | +Add multi-use tests that ensures your transform does not increase instruction |
| 73 | +count if some instructions have additional uses. The standard pattern is to |
| 74 | +introduce extra uses with function calls: |
| 75 | + |
| 76 | +```llvm |
| 77 | +declare void @use(i8) |
| 78 | +
|
| 79 | +define i8 @add_mul_const_multi_use(i8 %x) { |
| 80 | + %add = add i8 %x, 1 |
| 81 | + call void @use(i8 %add) |
| 82 | + %mul = mul i8 %add, 3 |
| 83 | + ret i8 %mul |
| 84 | +} |
| 85 | +``` |
| 86 | + |
| 87 | +Exceptions: For transform that only produce one instruction, multi-use tests |
| 88 | +may be omitted. |
| 89 | + |
| 90 | +### Add commuted tests |
| 91 | + |
| 92 | +If the transform involves commutative operations, add tests with commuted |
| 93 | +(swapped) operands. |
| 94 | + |
| 95 | +Make sure that that the operand order stays intact in the CHECK lines of your |
| 96 | +pre-commited tests. You should not see something like this: |
| 97 | + |
| 98 | +```llvm |
| 99 | +; CHECK-NEXT: [[OR:%.*]] = or i8 [[X]], [[Y]] |
| 100 | +; ... |
| 101 | +%or = or i8 %y, %x |
| 102 | +``` |
| 103 | + |
| 104 | +If this happens, you may need to change one of the operands to have higher |
| 105 | +complexity (include the "thwart" comment in that case): |
| 106 | + |
| 107 | +```llvm |
| 108 | +%y2 = mul i8 %y, %y ; thwart complexity-based canonicalization |
| 109 | +%or = or i8 %y, %x |
| 110 | +``` |
| 111 | + |
| 112 | +### Add vector tests |
| 113 | + |
| 114 | +When possible, it is recommended to add at least one test that uses vectors |
| 115 | +instead of scalars. |
| 116 | + |
| 117 | +For patterns that include constants, we distinguish three kinds of tests. |
| 118 | +The first are "splat" vectors, where all the vector elements are the same. |
| 119 | +These tests *should* usually fold without additional effort. |
| 120 | + |
| 121 | +```llvm |
| 122 | +define <2 x i8> @add_mul_const_vec_splat(<2 x i8> %x) { |
| 123 | + %add = add <2 x i8> %x, <i8 1, i8 1> |
| 124 | + %mul = mul <2 x i8> %add, <i8 3, i8 3> |
| 125 | + ret <2 x i8> %mul |
| 126 | +} |
| 127 | +``` |
| 128 | + |
| 129 | +A minor variant is to replace some of the splat elements with poison. These |
| 130 | +will often also fold without additional effort. |
| 131 | + |
| 132 | +```llvm |
| 133 | +define <2 x i8> @add_mul_const_vec_splat_poison(<2 x i8> %x) { |
| 134 | + %add = add <2 x i8> %x, <i8 1, i8 poison> |
| 135 | + %mul = mul <2 x i8> %add, <i8 3, i8 poison> |
| 136 | + ret <2 x i8> %mul |
| 137 | +} |
| 138 | +``` |
| 139 | + |
| 140 | +Finally, you can have non-splat vectors, where the vector elements are not |
| 141 | +the same: |
| 142 | + |
| 143 | +```llvm |
| 144 | +define <2 x i8> @add_mul_const_vec_non_splat(<2 x i8> %x) { |
| 145 | + %add = add <2 x i8> %x, <i8 1, i8 5> |
| 146 | + %mul = mul <2 x i8> %add, <i8 3, i8 6> |
| 147 | + ret <2 x i8> %mul |
| 148 | +} |
| 149 | +``` |
| 150 | + |
| 151 | +Non-splat vectors will often not fold by default. You should **not** try to |
| 152 | +make them fold, unless doing so does not add **any** additional complexity. |
| 153 | +You should still add the test though, even if it does not fold. |
| 154 | + |
| 155 | +### Poison flag tests |
| 156 | + |
| 157 | +If your transform involves instructions that can have poison-generating flags, |
| 158 | +such as `nuw` and `nsw` on `add`, you should test how these interact with the |
| 159 | +transform. |
| 160 | + |
| 161 | +If your transform *requires* a certain flag for correctness, make sure to add |
| 162 | +negative tests missing the required flag. |
| 163 | + |
| 164 | +If your transform doesn't require flags for correctness, you should have tests |
| 165 | +for preservation behavior. If the input instructions have certain flags, are |
| 166 | +they preserved in the output instructions, if it is valid to preserve them? |
| 167 | +(This depends on the transform. Check with alive2.) |
| 168 | + |
| 169 | +### Other tests |
| 170 | + |
| 171 | +The test categories mentioned above are non-exhaustive. There may be more tests |
| 172 | +to be added, depending on the instructions involved in the transform. Some |
| 173 | +examples: |
| 174 | + |
| 175 | + * For folds involving memory accesses like load/store, check that scalable vectors and non-byte-size types (like i3) are handled correctly. Also check that volatile/atomic are handled. |
| 176 | + * For folds that interact with the bitwidth in some non-trivial way, check an illegal type like i13. Also confirm that the transform is correct for i1. |
| 177 | + * For folds that involve phis, you may want to check that the case of multiple incoming values from one block is handled correctly. |
| 178 | + |
| 179 | +## Proofs |
| 180 | + |
| 181 | +Your pull request description should contain one or more |
| 182 | +[alive2 proofs](https://alive2.llvm.org/ce/) for the correctness of the |
| 183 | +proposed transform. |
| 184 | + |
| 185 | +### Basics |
| 186 | + |
| 187 | +Proofs are written using LLVM IR, by specifying a `@src` and `@tgt` function. |
| 188 | +It is possible to include multiple proofs in a single file by giving the src |
| 189 | +and tgt functions matching suffixes. |
| 190 | + |
| 191 | +For example, here is a pair of proofs that both `(x-y)+y` and `(x+y)-y` can |
| 192 | +be simplified to `x` ([online](https://alive2.llvm.org/ce/z/MsPPGz)): |
| 193 | + |
| 194 | +```llvm |
| 195 | +define i8 @src_add_sub(i8 %x, i8 %y) { |
| 196 | + %add = add i8 %x, %y |
| 197 | + %sub = sub i8 %add, %y |
| 198 | + ret i8 %sub |
| 199 | +} |
| 200 | +
|
| 201 | +define i8 @tgt_add_sub(i8 %x, i8 %y) { |
| 202 | + ret i8 %x |
| 203 | +} |
| 204 | +
|
| 205 | +
|
| 206 | +define i8 @src_sub_add(i8 %x, i8 %y) { |
| 207 | + %sub = sub i8 %x, %y |
| 208 | + %add = add i8 %sub, %y |
| 209 | + ret i8 %add |
| 210 | +} |
| 211 | +
|
| 212 | +define i8 @tgt_sub_add(i8 %x, i8 %y) { |
| 213 | + ret i8 %x |
| 214 | +} |
| 215 | +``` |
| 216 | + |
| 217 | +### Use generic values in proofs |
| 218 | + |
| 219 | +Proofs should operate on generic values, rather than specific constants, to the degree that this is possible. |
| 220 | + |
| 221 | +For example, this is a **bad** proof: |
| 222 | + |
| 223 | +```llvm |
| 224 | +; Don't do this! |
| 225 | +define i8 @src(i8 %x) { |
| 226 | + %smax = call i8 @llvm.smax.i8(i8 %x, i8 5) |
| 227 | + %umax = call i8 @llvm.umax.i8(i8 %smax, i8 7) |
| 228 | + ret i8 %umax |
| 229 | +} |
| 230 | +
|
| 231 | +define i8 @tgt(i8 %x) { |
| 232 | + %smax = call i8 @llvm.smax.i8(i8 %x, i8 7) |
| 233 | + ret i8 %smax |
| 234 | +} |
| 235 | +``` |
| 236 | + |
| 237 | +The correct way to write this proof is as follows |
| 238 | +([online](https://alive2.llvm.org/ce/z/Sgfq37)): |
| 239 | + |
| 240 | +``` |
| 241 | +define i8 @src(i8 %x, i8 %c1, i8 %c2) { |
| 242 | + %cond1 = icmp sgt i8 %c1, -1 |
| 243 | + call void @llvm.assume(i1 %cond1) |
| 244 | + %cond2 = icmp sgt i8 %c2, -1 |
| 245 | + call void @llvm.assume(i1 %cond2) |
| 246 | +
|
| 247 | + %smax = call i8 @llvm.smax.i8(i8 %x, i8 %c1) |
| 248 | + %umax = call i8 @llvm.umax.i8(i8 %smax, i8 %c2) |
| 249 | + ret i8 %umax |
| 250 | +} |
| 251 | +
|
| 252 | +define i8 @tgt(i8 %x, i8 %c1, i8 %c2) { |
| 253 | + %umax = call i8 @llvm.umax.i8(i8 %c1, i8 %c2) |
| 254 | + %smax = call i8 @llvm.smax.i8(i8 %x, i8 %umax) |
| 255 | + ret i8 %smax |
| 256 | +} |
| 257 | +``` |
| 258 | + |
| 259 | +Note that the `@llvm.assume` intrinsic is used to specify pre-conditions for |
| 260 | +the transform. In this case, it specifies that the constants `%c1` and `%c2` |
| 261 | +must be non-negative. |
| 262 | + |
| 263 | +It should be emphasized that there is, in general, no expectation that the |
| 264 | +IR in the proofs will be transformed by the implemented fold. In the above |
| 265 | +example, the transform would only actually apply if `%c1` and `%c2` are |
| 266 | +actually constants, but we need to use non-constants in the proof. |
| 267 | + |
| 268 | +### Common pre-conditions |
| 269 | + |
| 270 | +Here are some examples of common preconditions. |
| 271 | + |
| 272 | +```llvm |
| 273 | +; %x is non-negative: |
| 274 | +%nonneg = icmp sgt i8 %x, -1 |
| 275 | +call void @llvm.assume(i1 %nonneg) |
| 276 | +
|
| 277 | +; %x is a power of two: |
| 278 | +%ctpop = call i8 @llvm.ctpop.i8(i8 %x) |
| 279 | +%pow2 = icmp eq i8 %x, 1 |
| 280 | +call void @llvm.assume(i1 %pow2) |
| 281 | +
|
| 282 | +; %x is a power of two or zero: |
| 283 | +%ctpop = call i8 @llvm.ctpop.i8(i8 %x) |
| 284 | +%pow2orzero = icmp ult i8 %x, 2 |
| 285 | +call void @llvm.assume(i1 %pow2orzero) |
| 286 | +
|
| 287 | +; Adding %x and %y does not overflow in a signed sense: |
| 288 | +%wo = call { i8, i1 } @llvm.sadd.with.overflow(i8 %x, i8 %y) |
| 289 | +%ov = extractvalue { i8, i1 } %wo, 1 |
| 290 | +%ov.not = xor i1 %ov, true |
| 291 | +call void @llvm.assume(i1 %ov.not) |
| 292 | +``` |
| 293 | + |
| 294 | +### Timeouts |
| 295 | + |
| 296 | +Alive2 proofs will sometimes produce a timeout with the following message: |
| 297 | + |
| 298 | +``` |
| 299 | +Alive2 timed out while processing your query. |
| 300 | +There are a few things you can try: |
| 301 | +
|
| 302 | +- remove extraneous instructions, if any |
| 303 | +
|
| 304 | +- reduce variable widths, for example to i16, i8, or i4 |
| 305 | +
|
| 306 | +- add the --disable-undef-input command line flag, which |
| 307 | + allows Alive2 to assume that arguments to your IR are not |
| 308 | + undef. This is, in general, unsound: it can cause Alive2 |
| 309 | + to miss bugs. |
| 310 | +``` |
| 311 | + |
| 312 | +This is good advice, follow it! |
| 313 | + |
| 314 | +Reducing the bitwidth usually helps. For floating point numbers, you can use |
| 315 | +the `half` type for bitwidth reduction purposes. For pointers, you can reduce |
| 316 | +the bitwidth by specifying a custom data layout: |
| 317 | + |
| 318 | +``` |
| 319 | +; For 16-bit pointers |
| 320 | +target datalayout = "p:16:16" |
| 321 | +``` |
| 322 | + |
| 323 | +If reducing the bitwidth does not help, try `-disable-undef-input`. If this |
| 324 | +still does not work, submit your proof despite the timeout. |
0 commit comments