=========
MemorySSA
=========

.. contents::
   :local:

Introduction
============

``MemorySSA`` is an analysis that allows us to cheaply reason about the
interactions between various memory operations. Its goal is to replace
``MemoryDependenceAnalysis`` for most (if not all) use-cases. This is because,
unless you're very careful, use of ``MemoryDependenceAnalysis`` can easily
result in quadratic-time algorithms in LLVM. Additionally, ``MemorySSA`` doesn't
have as many arbitrary limits as ``MemoryDependenceAnalysis``, so you should get
better results, too.

At a high level, one of the goals of ``MemorySSA`` is to provide an SSA-based
form for memory, complete with def-use and use-def chains, which
enables users to quickly find may-defs and may-uses of memory operations.
It can also be thought of as a way to cheaply give versions to the complete
state of heap memory, and associate memory operations with those versions.

This document goes over how ``MemorySSA`` is structured, and some basic
intuition on how ``MemorySSA`` works.

A paper on MemorySSA (with notes about how it's implemented in GCC) `can be
found here <http://www.airs.com/dnovillo/Papers/mem-ssa.pdf>`_. It's
relatively out-of-date, though; the paper references multiple heap partitions,
but GCC eventually swapped to just using one, like we now have in LLVM. Like
GCC's, LLVM's MemorySSA is intraprocedural.


MemorySSA Structure
===================

MemorySSA is a virtual IR. After it's built, ``MemorySSA`` will contain a
structure that maps ``Instruction`` s to ``MemoryAccess`` es, which are
``MemorySSA``'s parallel to LLVM ``Instruction`` s.

Each ``MemoryAccess`` can be one of three types:

- ``MemoryPhi``
- ``MemoryUse``
- ``MemoryDef``

``MemoryPhi`` s are ``PhiNode`` s, but for memory operations. If at any
point we have two (or more) ``MemoryDef`` s that could flow into a
``BasicBlock``, the block's top ``MemoryAccess`` will be a
``MemoryPhi``. As in LLVM IR, ``MemoryPhi`` s don't correspond to any
concrete operation. As such, you can't look up a ``MemoryPhi`` with an
``Instruction`` (though we do allow you to do so with a
``BasicBlock``).

Note also that in SSA, Phi nodes merge must-reach definitions (that
is, definite new versions of variables). In MemorySSA, Phi nodes merge
may-reach definitions (that is, until disambiguated, the versions that
reach a phi node may or may not clobber a given variable).

``MemoryUse`` s are operations which use but don't modify memory. An example of
a ``MemoryUse`` is a ``load``, or a ``readonly`` function call.

``MemoryDef`` s are operations which may either modify memory, or which
otherwise clobber memory in unquantifiable ways. Examples of ``MemoryDef`` s
include ``store`` s, function calls, ``load`` s with ``acquire`` (or higher)
ordering, volatile operations, memory fences, etc.

Every function has a special ``MemoryDef`` called ``liveOnEntry``.
It dominates every ``MemoryAccess`` in the function that ``MemorySSA`` is being
run on, and implies that we've hit the top of the function. It's the only
``MemoryDef`` that maps to no ``Instruction`` in LLVM IR. Use of
``liveOnEntry`` implies that the memory being used is either undefined or
defined before the function begins.

An example of all of this overlaid on LLVM IR (obtained by running ``opt
-passes='print<memoryssa>' -disable-output`` on an ``.ll`` file) is below. When
viewing this example, it may be helpful to view it in terms of clobbers. The
operands of a given ``MemoryAccess`` are all (potential) clobbers of said
``MemoryAccess``, and the value produced by a ``MemoryAccess`` can act as a
clobber for other ``MemoryAccess`` es. Another useful way of looking at it is in
terms of heap versions. In that view, the operands of a given
``MemoryAccess`` are the version of the heap before the operation, and,
if the access produces a value, the value is the new version of the heap
after the operation.

.. code-block:: llvm

  define void @foo() {
  entry:
    %p1 = alloca i8
    %p2 = alloca i8
    %p3 = alloca i8
    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %p3
    br label %while.cond

  while.cond:
    ; 6 = MemoryPhi({%0,1},{if.end,4})
    br i1 undef, label %if.then, label %if.else

  if.then:
    ; 2 = MemoryDef(6)
    store i8 0, i8* %p1
    br label %if.end

  if.else:
    ; 3 = MemoryDef(6)
    store i8 1, i8* %p2
    br label %if.end

  if.end:
    ; 5 = MemoryPhi({if.then,2},{if.else,3})
    ; MemoryUse(5)
    %1 = load i8, i8* %p1
    ; 4 = MemoryDef(5)
    store i8 2, i8* %p2
    ; MemoryUse(1)
    %2 = load i8, i8* %p3
    br label %while.cond
  }

The ``MemorySSA`` IR is shown in comments that precede the instructions they map
to (if such an instruction exists). For example, ``1 = MemoryDef(liveOnEntry)``
is a ``MemoryAccess`` (specifically, a ``MemoryDef``), and it describes the LLVM
instruction ``store i8 0, i8* %p3``. Other places in ``MemorySSA`` refer to this
particular ``MemoryDef`` as ``1`` (much like how one can refer to ``load i8, i8*
%p1`` in LLVM with ``%1``). Again, ``MemoryPhi`` s don't correspond to any LLVM
Instruction, so the line directly below a ``MemoryPhi`` isn't special.

Going from the top down:

- ``6 = MemoryPhi({%0,1},{if.end,4})`` notes that, when entering ``while.cond``,
  the reaching definition for it is either ``1`` or ``4``. This ``MemoryPhi`` is
  referred to in the textual IR by the number ``6``.
- ``2 = MemoryDef(6)`` notes that ``store i8 0, i8* %p1`` is a definition,
  and that its reaching definition before it is ``6``, or the ``MemoryPhi`` after
  ``while.cond``.
- ``3 = MemoryDef(6)`` notes that ``store i8 1, i8* %p2`` is a definition; its
  reaching definition is also ``6``.
- ``5 = MemoryPhi({if.then,2},{if.else,3})`` notes that the clobber before
  this block could either be ``2`` or ``3``.
- ``MemoryUse(5)`` notes that ``load i8, i8* %p1`` is a use of memory, and that
  it's clobbered by ``5``.
- ``4 = MemoryDef(5)`` notes that ``store i8 2, i8* %p2`` is a definition; its
  reaching definition is ``5``.
- ``MemoryUse(1)`` notes that ``load i8, i8* %p3`` is just a user of memory,
  and the last thing that could clobber this use is above ``while.cond`` (i.e.
  the store to ``%p3``). In heap versioning parlance, it really
  only depends on heap version ``1``, and is unaffected by the new
  heap versions generated since then.

As an aside, ``MemoryAccess`` is a ``Value`` mostly for convenience; it's not
meant to interact with LLVM IR.

Design of MemorySSA
===================

``MemorySSA`` is an analysis that can be built for any arbitrary function. When
it's built, it does a pass over the function's IR in order to build up its
mapping of ``MemoryAccess`` es. You can then query ``MemorySSA`` for things like
the dominance relation between ``MemoryAccess`` es, and get the ``MemoryAccess``
for any given ``Instruction``.

When ``MemorySSA`` is done building, it also hands you a ``MemorySSAWalker``
that you can use (see below).


The walker
----------

A structure that helps ``MemorySSA`` do its job is the ``MemorySSAWalker``, or
the walker, for short. The goal of the walker is to provide answers to clobber
queries beyond what's represented directly by ``MemoryAccess`` es. For example,
given:

.. code-block:: llvm

  define void @foo() {
    %a = alloca i8
    %b = alloca i8

    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %a
    ; 2 = MemoryDef(1)
    store i8 0, i8* %b
  }

The store to ``%a`` is clearly not a clobber for the store to ``%b``. It would
be the walker's goal to figure this out, and return ``liveOnEntry`` when queried
for the clobber of ``MemoryAccess`` ``2``.

By default, ``MemorySSA`` provides a walker that can optimize ``MemoryDef`` s
and ``MemoryUse`` s by consulting alias analysis. Walkers were built to be
flexible, though, so it's entirely reasonable (and expected) to create more
specialized walkers (e.g. one that queries ``GlobalsAA``).
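
As a rough illustration of what such a walker does, here is a small
self-contained toy model. The names and types below are made up for this sketch
(they are not the LLVM API): each toy def records the single location it
writes, and the walker climbs the def chain, skipping defs that a (here,
trivially exact) aliasing check proves irrelevant.

.. code-block:: cpp

  #include <string>

  // Toy model, not the LLVM API: each def records the one location it
  // writes and the access defining the memory state it saw.
  // A null `defining` pointer models liveOnEntry.
  struct ToyDef {
    std::string name;
    std::string loc;
    ToyDef *defining;
  };

  // Walk up the def chain, skipping defs that write a provably different
  // location. The real walker consults alias analysis at the marked line
  // instead of comparing location strings.
  std::string clobberFor(const ToyDef &access, const std::string &loc) {
    for (ToyDef *d = access.defining; d; d = d->defining)
      if (d->loc == loc) // stand-in for an alias-analysis query
        return d->name;
    return "liveOnEntry";
  }

With ``1`` writing ``%a`` and ``2`` writing ``%b`` as in the example above,
querying the clobber of access ``2`` for location ``%b`` walks past ``1`` and
reports ``liveOnEntry``, matching the walker's answer.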


Locating clobbers yourself
^^^^^^^^^^^^^^^^^^^^^^^^^^

If you choose to make your own walker, you can find the clobber for a
``MemoryAccess`` by walking every ``MemoryDef`` that dominates said
``MemoryAccess``. The structure of ``MemoryDef`` s makes this relatively simple;
they ultimately form a linked list of every clobber that dominates the
``MemoryAccess`` that you're trying to optimize. In other words, the
``definingAccess`` of a ``MemoryDef`` is always the nearest dominating
``MemoryDef`` or ``MemoryPhi`` of said ``MemoryDef``.
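
The shape of that walk can be sketched with a self-contained toy model (the
struct and names below are illustrative, not the LLVM API): following defining
access links upward from any access enumerates, in order, every potential
clobber that dominates it, ending at ``liveOnEntry``.

.. code-block:: cpp

  #include <string>
  #include <vector>

  // Toy model, not the LLVM API: an access and the access defining the
  // memory state it observes. A null pointer models liveOnEntry.
  struct ChainNode {
    std::string name;
    ChainNode *definingAccess;
  };

  // Collect the names of every access on the def chain above `start`,
  // nearest first. A custom walker would then disambiguate these
  // candidate clobbers (e.g. with alias analysis).
  std::vector<std::string> defChain(const ChainNode &start) {
    std::vector<std::string> chain;
    for (ChainNode *a = start.definingAccess; a; a = a->definingAccess)
      chain.push_back(a->name);
    chain.push_back("liveOnEntry");
    return chain;
  }

For a chain ``3 -> 2 -> 1 -> liveOnEntry``, querying ``3`` yields ``2``, then
``1``, then ``liveOnEntry`` — exactly the linked list of dominating clobbers
described above.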


Use optimization
----------------

``MemorySSA`` will optimize some ``MemoryAccess`` es at build-time.
Specifically, we optimize the operand of every ``MemoryUse`` to point to the
actual clobber of said ``MemoryUse``. This can be seen in the above example; the
second ``MemoryUse`` in ``if.end`` has an operand of ``1``, which is a
``MemoryDef`` from the entry block. This is done to make walking,
value numbering, etc., faster and easier.
It is not possible to optimize ``MemoryDef`` s in the same way, as we
restrict ``MemorySSA`` to one heap variable and, thus, one Phi node
per block.


Invalidation and updating
-------------------------

Because ``MemorySSA`` keeps track of LLVM IR, it needs to be updated whenever
the IR is updated. "Update", in this case, includes the addition, deletion, and
motion of IR instructions. The update API is being made on an as-needed basis.


Phi placement
^^^^^^^^^^^^^

``MemorySSA`` only places ``MemoryPhi`` s where they're actually
needed. That is, it is a pruned SSA form, like LLVM's SSA form. For
example, consider:

.. code-block:: llvm

  define void @foo() {
  entry:
    %p1 = alloca i8
    %p2 = alloca i8
    %p3 = alloca i8
    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %p3
    br label %while.cond

  while.cond:
    ; 3 = MemoryPhi({%0,1},{if.end,2})
    br i1 undef, label %if.then, label %if.else

  if.then:
    br label %if.end

  if.else:
    br label %if.end

  if.end:
    ; MemoryUse(1)
    %1 = load i8, i8* %p1
    ; 2 = MemoryDef(3)
    store i8 2, i8* %p2
    ; MemoryUse(1)
    %2 = load i8, i8* %p3
    br label %while.cond
  }

Because we removed the stores from ``if.then`` and ``if.else``, a ``MemoryPhi``
for ``if.end`` would be pointless, so we don't place one. If you later need to
place a ``MemoryDef`` in ``if.then`` or ``if.else``, you'll also need to create
a ``MemoryPhi`` for ``if.end``.

If it turns out that this is a large burden, we can just place ``MemoryPhi`` s
everywhere. Because we have walkers that are capable of optimizing above said
phis, doing so shouldn't prohibit optimizations.


Non-Goals
---------

``MemorySSA`` is meant to reason about the relation between memory
operations, and enable quicker querying.
It isn't meant to be the single source of truth for all potential memory-related
optimizations. Specifically, care must be taken when trying to use ``MemorySSA``
to reason about atomic or volatile operations, as in:

.. code-block:: llvm

  define i8 @foo(i8* %a) {
  entry:
    br i1 undef, label %if.then, label %if.end

  if.then:
    ; 1 = MemoryDef(liveOnEntry)
    %0 = load volatile i8, i8* %a
    br label %if.end

  if.end:
    %av = phi i8 [0, %entry], [%0, %if.then]
    ret i8 %av
  }

Going solely by ``MemorySSA``'s analysis, hoisting the ``load`` to ``entry`` may
seem legal. Because it's a volatile load, though, it's not.


Design tradeoffs
----------------

Precision
^^^^^^^^^

``MemorySSA`` in LLVM deliberately trades off precision for speed.
Let us think about memory variables as if they were disjoint partitions of the
heap (that is, if you have one variable, as above, it represents the entire
heap, and if you have multiple variables, each one represents some
disjoint portion of the heap).

First, because alias analysis results conflict with each other, and
each result may be what an analysis wants (i.e.,
TBAA may say no-alias, and something else may say must-alias), it is
not possible to partition the heap the way every optimization wants.
Second, some alias analysis results are not transitive (i.e., A noalias B
and B noalias C does not mean A noalias C), so it is not possible to
come up with a precise partitioning in all cases without variables to
represent every pair of possible aliases. Thus, partitioning
precisely may require introducing at least N^2 new virtual variables,
phi nodes, etc.

Each of these variables may be clobbered at multiple def sites.

To give an example, if you were to split up struct fields into
individual variables, all aliasing operations that may-def multiple struct
fields will may-def more than one of them. This is pretty common (calls,
copies, field stores, etc.).

Experience with SSA forms for memory in other compilers has shown that
it is simply not possible to do this precisely, and in fact, doing it
precisely is not worth it, because now all the optimizations have to
walk tons and tons of virtual variables and phi nodes.

So we partition. At the point at which you partition, again,
experience has shown us there is no point in partitioning to more than
one variable. It simply generates more IR, and optimizations still
have to query something to disambiguate further anyway.

As a result, LLVM partitions to one variable.

Use Optimization
^^^^^^^^^^^^^^^^

Unlike other partitioned forms, LLVM's ``MemorySSA`` does make one
useful guarantee - all loads are optimized to point at the thing that
actually clobbers them. This gives some nice properties. For example,
for a given store, you can find all loads actually clobbered by that
store by walking the immediate uses of the store.
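
A self-contained sketch of that property (using made-up names, not the LLVM
API): because each optimized load points directly at its actual clobber, the
loads clobbered by a given def are exactly the use-type accesses among that
def's immediate users.

.. code-block:: cpp

  #include <string>
  #include <vector>

  // Toy model, not the LLVM API: a def and the accesses that use the
  // memory version it produces.
  struct UserNode {
    std::string name;
    bool isUse; // true for a MemoryUse (e.g. a load)
    std::vector<UserNode *> users;
  };

  // Thanks to build-time use optimization, every use among the def's
  // immediate users is a load actually clobbered by it; other defs
  // among the users are merely the next versions of the heap.
  std::vector<std::string> clobberedLoads(const UserNode &def) {
    std::vector<std::string> loads;
    for (const UserNode *u : def.users)
      if (u->isUse)
        loads.push_back(u->name);
    return loads;
  }

In an unoptimized form, the same query would instead require walking down the
def-use graph and re-checking aliasing at every step.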