|
1 |
| -## How to update Debug Info in the Swift Compiler |
| 1 | +# How to update Debug Info in the Swift Compiler |
2 | 2 |
|
3 |
| -### Introduction |
| 3 | +## Introduction |
4 | 4 |
|
5 | 5 | This document describes how debug info works at the SIL level and how to
|
6 | 6 | correctly update debug info in SIL optimization passes. This document is
|
7 | 7 | inspired by its LLVM analog, [How to Update Debug Info: A Guide for LLVM Pass
|
8 | 8 | Authors](https://llvm.org/docs/HowToUpdateDebugInfo.html), which is recommended
|
9 | 9 | reading, since all of the concepts discussed there also apply to SIL.
|
10 | 10 |
|
11 |
| -### Source Locations |
| 11 | +## Source Locations |
12 | 12 |
|
13 | 13 | Contrary to LLVM IR, SIL makes source locations and lexical scopes mandatory on
|
14 | 14 | all instructions. SIL transformations should follow the LLVM guide for when to
|
15 | 15 | merge drop and copy locations, since all the same considerations apply. Helpers
|
16 | 16 | like `SILBuilderWithScope` make it easy to copy source locations when expanding
|
17 | 17 | SIL instructions.
|
18 | 18 |
|
19 |
| -### Variables, Variable Locations |
| 19 | +## Variables |
| 20 | + |
| 21 | +Each `debug_value` (and variable-carrying instruction) defines an update point |
| 22 | +for the location of (part of) that source variable. A variable location is an |
| 23 | +SSA value, modified by a debug expression that can transform that value, |
| 24 | +yielding the value of that variable. Optimizations like SROA may split a source |
| 25 | +variable into multiple smaller fragments; other optimizations such as Mem2Reg |
| 26 | +may split a debug value describing an address into multiple debug values |
| 27 | +describing different SSA values. Each variable (fragment) location is valid |
| 28 | +until the end of the current basic block, or until another `debug_value` |
| 29 | +describes another location for a variable fragment for the same unique variable |
| 30 | +that overlaps with that (fragment of the) variable. |
| 31 | + |
| 32 | +### Debug variable-carrying instructions |
20 | 33 |
|
21 | 34 | Source variables are represented by `debug_value` instructions, and may also be
|
22 |
| -described in variable-carrying instructions (`alloc_stack`, `alloc_box`). There |
23 |
| -is no semantic difference between describing a variable in an allocation |
| 35 | +described in debug variable-carrying instructions (`alloc_stack`, `alloc_box`). |
| 36 | +There is no semantic difference between describing a variable in an allocation |
24 | 37 | instruction directly or describing it in a `debug_value` following the
|
25 |
| -allocation instruction. Variables are uniquely identified via their lexical |
26 |
| -scope, which also includes inline information, and their name and binding kind. |
| 38 | +allocation instruction. |
27 | 39 |
|
28 |
| -Each `debug_value` (and variable-carrying instruction) defines an update point |
29 |
| -for the location of (part of) that source variable. A variable location is an |
30 |
| -SSA value or constant, modified by a debug expression that can transform that |
31 |
| -value, yielding the value of that variable. The debug expressions get lowered |
32 |
| -into LLVM [DIExpressions](https://llvm.org/docs/LangRef.html#diexpression) which |
33 |
| -get lowered into [DWARF](https://dwarfstd.org) expressions. Optimizations like |
34 |
| -SROA may split a source variable into multiple smaller fragments. An |
35 |
| -`op_fragment` is used to denote a location of a partial variable. Each variable |
36 |
| -(fragment) location is valid until the end of the current basic block, or until |
37 |
| -another `debug_value` describes another location for a variable fragment for the |
38 |
| -same unique variable that overlaps with that (fragment of the) variable. |
39 |
| -Variables may be undefined, in which case the SSA value is `undef`. |
40 |
| - |
41 |
| -### Rules of thumb |
| 40 | +These two forms are equivalent, and should be optimized similarly: |
| 41 | +``` |
| 42 | +%0 = alloc_stack $T, var, name "value", loc "a.swift":4:2, scope 1 |
| 43 | +// equivalent to: |
| 44 | +%0 = alloc_stack $T, loc "a.swift":4:2, scope 1 |
| 45 | +debug_value %0 : $*T, var, name "value", expr op_deref, loc "a.swift":4:2, scope 1 |
| 46 | +``` |
| 47 | + |
| 48 | +> [!Note] |
| 49 | +> In the future, we may want to remove the debug variable from the `alloc_stack` |
| 50 | +> to only use the second form, in order to simplify SIL. Additionally, we could |
| 51 | +> then move the `debug_value` instruction to the point where the variable is |
| 52 | +> initialized to avoid showing uninitialized memory in the debugger. This would |
| 53 | +> be a change in SILGen, which should not affect the optimizer. |
| 54 | +
|
| 55 | +For now, the `DebugVarCarryingInst` type can be used to handle both cases. |
| 56 | + |
| 57 | +### Variable identity, location and scope |
| 58 | + |
| 59 | +Variables are uniquely identified via their debug scope, their location, and |
| 60 | +their name. |
| 61 | + |
| 62 | +The debug scope is the range in which the variable is declared and available. |
| 63 | +More information about debug scopes is available on |
| 64 | +[the Swift blog](https://www.swift.org/blog/whats-new-swift-debugging-5.9/#fine-grained-scope-information). |
| 65 | +For arguments, this will be the function's scope; otherwise, this will be a |
| 66 | +subscope within a function. When a function is inlined, a new scope is created, |
| 67 | +including information about the inlined function, and in which function it was |
| 68 | +inlined (inlined_at). |
| 69 | + |
| 70 | +The location of the variable is the source location where the variable was |
| 71 | +declared. |
| 72 | + |
| 73 | +If the location and scope of a debug variable aren't set, it will use the scope |
| 74 | +and location of the instruction, which is correct in most cases. However, if a |
| 75 | +`debug_value` describes a modification of a variable, the instruction should |
| 76 | +have the location of the update point, and the variable must keep the location |
| 77 | +of the variable declaration: |
| 78 | + |
| 79 | +``` |
| 80 | +%0 = integer_literal $Int, 2 |
| 81 | +debug_value %0 : $Int, var, name "a", loc "a.swift":2:5, scope 2 |
| 82 | +%2 = integer_literal $Int, 3 |
| 83 | +debug_value %2 : $Int, var, (name "a", loc "a.swift":2:5, scope 2), loc "a.swift":3:3, scope 2 |
| 84 | +``` |
| 85 | +For this code: |
| 86 | +```swift |
| 87 | +var a = 2 |
| 88 | +a = 3 |
| 89 | +``` |
| 90 | + |
| 91 | +### Variable types |
| 92 | + |
| 93 | +By default the type of the variable will be the object type of the SSA value. |
| 94 | +If this is not the correct type, a type must be attached to the debug variable |
| 95 | +to override it. |
| 96 | + |
| 97 | +Example: |
| 98 | + |
| 99 | +``` |
| 100 | +debug_value %0 : $*T, let, name "address", type $UnsafeRawPointer |
| 101 | +``` |
| 102 | + |
| 103 | +The variable will usually have an associated expression yielding the correct |
| 104 | +type. |
| 105 | + |
| 106 | +### Variable expressions |
| 107 | + |
| 108 | +A variable can have an associated expression if the value needs computation. |
| 109 | +This can be for dereferencing a pointer, arithmetic, or for splitting structs. |
| 110 | +An expression is a sequence of operations to be executed left to right. Debug |
| 111 | +expressions get lowered into LLVM |
| 112 | +[DIExpressions](https://llvm.org/docs/LangRef.html#diexpression) which get |
| 113 | +lowered into [DWARF](https://dwarfstd.org) expressions. |
| 114 | + |
| 115 | +#### Address types and op_deref |
| 116 | + |
| 117 | +A variable's expression may include an `op_deref`, usually at the beginning, in |
| 118 | +which case the SSA value is a pointer that must be dereferenced to access the |
| 119 | +value of the variable. |
| 120 | + |
| 121 | +In this example, the value returned by the `alloc_stack` is an address that must |
| 122 | +be dereferenced. |
| 123 | +``` |
| 124 | +%0 = alloc_stack $T |
| 125 | +debug_value %0 : $*T, var, name "value", expr op_deref |
| 126 | +``` |
| 127 | + |
| 128 | +SILGen can use `SILBuilder::createDebugValue` and |
| 129 | +`SILBuilder::createDebugValueAddr` to create debug values, respectively without |
| 130 | +and with an op_deref, or use `SILBuilder::emitDebugDescription` which will |
| 131 | +automatically choose the correct one depending on the type of the SSA value. As |
| 132 | +there are no pointers in Swift, this should always do the right thing. |
| 133 | + |
| 134 | +> [!Warning] |
| 135 | +> At the optimizer level, Swift `Unsafe*Pointer` types can be simplified |
| 136 | +> to address types. As such, a `debug_value` with an address type without an |
| 137 | +> `op_deref` can be valid. SIL passes must not assume that `op_deref` and |
| 138 | +> address types correlate. |
| 139 | +
|
| 140 | +Even if `op_deref` is usually at the beginning, it doesn't have to be: |
| 141 | +``` |
| 142 | +debug_value %0 : $*UInt8, let, name "hello", expr op_constu:3:op_plus:op_deref |
| 143 | +``` |
| 144 | +This will add `3` to the pointer contained in `%0`, then dereference the result. |
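The left-to-right evaluation described above can be illustrated with a small stack machine. This is only a sketch of the evaluation order, not the compiler's implementation: the `Op`, `Memory`, and `evaluate` names are hypothetical, and memory is modeled as a simple address-to-value map.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical toy model; none of these names exist in the Swift compiler.
struct Op {
  enum Kind { ConstU, Plus, Minus, Deref } kind;
  std::uint64_t operand; // only meaningful for ConstU
};

// Toy memory: maps an address to the value stored there.
using Memory = std::map<std::uint64_t, std::uint64_t>;

// Evaluates `ops` left to right, with the SSA value as the initial stack entry.
std::uint64_t evaluate(std::uint64_t ssaValue, const std::vector<Op> &ops,
                       const Memory &memory) {
  std::vector<std::uint64_t> stack{ssaValue};
  for (const Op &op : ops) {
    switch (op.kind) {
    case Op::ConstU:
      stack.push_back(op.operand);
      break;
    case Op::Plus:
    case Op::Minus: {
      if (stack.size() < 2)
        throw std::runtime_error("stack underflow");
      std::uint64_t rhs = stack.back();
      stack.pop_back();
      std::uint64_t lhs = stack.back();
      stack.pop_back();
      stack.push_back(op.kind == Op::Plus ? lhs + rhs : lhs - rhs);
      break;
    }
    case Op::Deref: {
      if (stack.empty())
        throw std::runtime_error("stack underflow");
      std::uint64_t address = stack.back();
      stack.pop_back();
      stack.push_back(memory.at(address)); // throws if the address is unmapped
      break;
    }
    }
  }
  if (stack.empty())
    throw std::runtime_error("empty result");
  return stack.back();
}
```

In this model, `op_constu:3:op_plus:op_deref` applied to a pointer `p` pushes `3`, replaces `p` and `3` with `p + 3`, then loads the value stored at that address.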
| 145 | + |
| 146 | +#### Fragments |
| 147 | + |
| 148 | +If a variable is partially updated, a fragment can be used to specify that this |
| 149 | +update refers to an element of an aggregate type. |
| 150 | + |
| 151 | +> [!Tip] |
| 152 | +> When using fragments, always specify the type of the variable, as it will be |
| 153 | +> different from the SSA value. |
| 154 | +
|
| 155 | +When SROA is splitting a struct or tuple, it will also split the debug values, |
| 156 | +and add a fragment to specify which field is being updated. |
| 157 | + |
| 158 | +``` |
| 159 | +struct Pair { var a, b: Int } |
| 160 | +
|
| 161 | +alloc_stack $Pair, var, name "pair" |
| 162 | +// --> |
| 163 | +alloc_stack $Int, var, name "pair", type $Pair, expr op_fragment:#Pair.a |
| 164 | +alloc_stack $Int, var, name "pair", type $Pair, expr op_fragment:#Pair.b |
| 165 | +// --> |
| 166 | +alloc_stack $Builtin.Int64, var, name "pair", type $Pair, expr op_fragment:#Pair.a:op_fragment:#Int._value |
| 167 | +alloc_stack $Builtin.Int64, var, name "pair", type $Pair, expr op_fragment:#Pair.b:op_fragment:#Int._value |
| 168 | +``` |
| 169 | + |
| 170 | +Here, Pair is a struct containing two Ints, so each `alloc_stack` will receive a |
| 171 | +fragment with the field it is describing. Int, in Swift, is itself a struct |
| 172 | +containing one Builtin.Int64 (on 64-bit systems), so it can itself be SROA'ed. |
| 173 | +Fragments can be chained to describe this. |
| 174 | + |
| 175 | +Tuple fragments use a different syntax, but work similarly: |
| 176 | + |
| 177 | +``` |
| 178 | +alloc_stack $(Int, Int), var, name "pair" |
| 179 | +// --> |
| 180 | +alloc_stack $Int, var, name "pair", type $(Int, Int), expr op_tuple_fragment:$(Int, Int):0 |
| 181 | +alloc_stack $Int, var, name "pair", type $(Int, Int), expr op_tuple_fragment:$(Int, Int):1 |
| 182 | +// --> |
| 183 | +alloc_stack $Builtin.Int64, var, name "pair", type $(Int, Int), expr op_tuple_fragment:$(Int, Int):0:op_fragment:#Int._value |
| 184 | +alloc_stack $Builtin.Int64, var, name "pair", type $(Int, Int), expr op_tuple_fragment:$(Int, Int):1:op_fragment:#Int._value |
| 185 | +``` |
| 186 | + |
| 187 | +Tuple fragments and struct fragments can be mixed freely; however, they must all |
| 188 | +be at the end of the expression. That is because the fragment operator can be |
| 189 | +seen as returning a struct containing a single element, with the rest undefined, |
| 190 | +and, apart from fragments, no debug expression operator takes a struct as input. |
| 191 | + |
| 192 | +> [!Note] |
| 193 | +> When multiple fragments are present, they are evaluated in reverse order: |
| 194 | +> from the field within the variable first, to the SSA value's type at the end. |
| 195 | +
|
| 196 | +#### Arithmetic |
| 197 | + |
| 198 | +An expression can add a constant offset to a value or subtract one from it. To do so, an |
| 199 | +`op_constu` or `op_consts` can be used to push a constant integer to the stack, |
| 200 | +respectively unsigned and signed. Then, the `op_plus` and `op_minus` operators |
| 201 | +can be used to sum or subtract the two values on the stack. |
| 202 | + |
| 203 | +``` |
| 204 | +debug_value %0 : $Builtin.Int64, var, name "previous", type $Int, expr op_consts:1:op_minus:op_fragment:#Int._value |
| 205 | +debug_value %0 : $Builtin.Int64, var, name "next", type $Int, expr op_consts:1:op_plus:op_fragment:#Int._value |
| 206 | +``` |
| 207 | + |
| 208 | +> [!Caution] |
| 209 | +> This currently doesn't work if a fragment is present. |
| 210 | +
|
| 211 | +#### Constants |
| 212 | + |
| 213 | +If a `debug_value` is describing a constant, such as in `let x = 1`, and the |
| 214 | +value is optimized out, we can keep it, using a constant expression, and no SSA |
| 215 | +value. |
| 216 | + |
| 217 | +``` |
| 218 | +debug_value undef : $Int, let, name "x", expr op_consts:1:op_fragment:#Int._value |
| 219 | +``` |
| 220 | + |
| 221 | +### Undef variables |
| 222 | + |
| 223 | +If the value of the variable cannot be recovered as the value is entirely |
| 224 | +optimized away, an undef debug value should still be kept: |
| 225 | + |
| 226 | +``` |
| 227 | +debug_value undef : $Int, let, name "x" |
| 228 | +``` |
| 229 | + |
| 230 | +Additionally, if a previous `debug_value` exists for the variable, a debug value |
| 231 | +of undef invalidates the previous value, in case the value of the variable isn't |
| 232 | +known anymore: |
| 233 | + |
| 234 | +``` |
| 235 | +debug_value %0 : $Int, var, name "x" // var x = a |
| 236 | +... |
| 237 | +debug_value undef : $Int, var, name "x" // x = <optimized out> |
| 238 | +``` |
| 239 | + |
| 240 | +Combined with fragments, some parts of the variable can be undefined and some |
| 241 | +not: |
| 242 | + |
| 243 | +``` |
| 244 | +... // pair = ? |
| 245 | +debug_value %0 : $Int, var, name "pair", type $Pair, expr op_fragment:#Pair.a // pair.a = x |
| 246 | +debug_value %0 : $Int, var, name "pair", type $Pair, expr op_fragment:#Pair.b // pair.b = x |
| 247 | +... // pair = (x, x) |
| 248 | +debug_value undef : $Pair, var, name "pair", expr op_fragment:#Pair.a // pair.a = <optimized out> |
| 249 | +... // pair = (?, x) |
| 250 | +debug_value undef : $Pair, var, name "pair" // pair = <optimized out> |
| 251 | +... // pair = ? |
| 252 | +debug_value %1 : $Int, var, name "pair", type $Pair, expr op_fragment:#Pair.a // pair.a = y |
| 253 | +... // pair = (y, ?) |
| 254 | +``` |
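The sequence above can be read as updates to a per-field view of the variable. The following is a minimal sketch of that bookkeeping, assuming a hypothetical `VariableState` type in which a missing entry means the field is unknown (`?`); it does not correspond to any type in the compiler or debugger.

```cpp
#include <map>
#include <string>

// Hypothetical model of what is known about each field of one variable.
// A field with no entry is unknown and is shown as "?".
struct VariableState {
  std::map<std::string, std::string> knownFields;

  // Models a debug_value with a fragment and a real SSA value.
  void updateFragment(const std::string &field, const std::string &value) {
    knownFields[field] = value;
  }

  // Models a debug_value of undef with a fragment: only that field is lost.
  void invalidateFragment(const std::string &field) {
    knownFields.erase(field);
  }

  // Models a debug_value of undef without a fragment: every field is lost.
  void invalidateAll() { knownFields.clear(); }

  std::string describe(const std::string &field) const {
    auto it = knownFields.find(field);
    return it == knownFields.end() ? "?" : it->second;
  }
};
```

Replaying the example, setting `a` and `b` to `x`, invalidating `a`, invalidating everything, then setting `a` to `y`, leaves `a` known and `b` unknown, matching the final `pair = (y, ?)` comment.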
| 255 | + |
| 256 | +## Rules of thumb |
42 | 257 | - Optimization passes may never drop a variable entirely. If a variable is
|
43 | 258 | entirely optimized away, an `undef` debug value should still be kept.
|
44 | 259 | - A `debug_value` must always describe a correct value for that source variable
|
45 | 260 | at that source location. If a value is only correct on some paths through that
|
46 | 261 | instruction, it must be replaced with `undef`. Debug info never speculates.
|
47 |
| -- When a SIL instruction referenced by a `debug_value` is (really, any |
48 |
| - instruction) deleted, call salvageDebugInfo(). It will try to capture the |
49 |
| - effect of the deleted instruction in a debug expression, so the location can |
50 |
| - be preserved. |
| 262 | +- When a SIL instruction is deleted, call `salvageDebugInfo()`. It will try to |
| 263 | + capture the effect of the deleted instruction in a debug expression, so the |
| 264 | + location can be preserved. You can also use an `InstructionDeleter` which will |
| 265 | + automatically call `salvageDebugInfo`. |
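The "never speculates" rule can be pictured as a merge over the locations that reach a point along each path. This is a hedged sketch with a hypothetical `mergeLocations` helper that represents locations as plain strings; it is not how any SIL pass is written.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: picks the location to describe where several paths meet.
// If every incoming path agrees, that location is kept; otherwise the result
// is "undef", because the value would only be correct on some paths.
std::string mergeLocations(const std::vector<std::string> &incoming) {
  if (incoming.empty())
    return "undef";
  for (const std::string &location : incoming) {
    if (location != incoming.front())
      return "undef";
  }
  return incoming.front();
}
```

For example, merging `%0` from both predecessors keeps `%0`, while merging `%0` and `%1` yields `undef`.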