=============================================
Machine Learning - Guided Optimization (MLGO)
=============================================

Introduction
============

MLGO refers to integrating ML techniques in LLVM, primarily to replace
heuristics with machine learned models.

Currently, the following heuristics feature such integration:

* Inlining for size
* Register allocation (LLVM greedy eviction heuristic) for performance

This document is an outline of the tooling and APIs facilitating MLGO.

Note that tools for orchestrating ML training are not part of LLVM, as they are
dependency-heavy, both on the choice of ML infrastructure and on the choice of
distributed computing. For the training scenario, LLVM only contains facilities
enabling it, such as corpus extraction, training data extraction, and evaluation
of models during training.

.. contents::

Corpus Tooling
==============

..
   TODO(boomanaiden154): Write this section.

Interacting with ML models
==========================

We interact with ML models in two primary scenarios: one is training such a
model; the other, inference, is using a model during compilation to make
optimization decisions.

For a specific optimization problem - e.g. inlining, or regalloc eviction - we
first separate correctness-preserving decisions from optimization decisions.
For example, not inlining functions marked "no inline" is an example of the
former; so is not evicting an unevictable live range. An example of the latter
is deciding to inline a function that will bloat the caller size, just because
we have reason to believe that later, the effect will be some constant
propagation that will actually reduce the size (or dynamic instruction count).

ML models can be understood as functions. Their inputs are tensors - buffers of
scalars. The output (in our case, singular) is a scalar. For example, for
inlining, the inputs are properties of the caller, callee, and the callsite
being analyzed for inlining. The output is a boolean.

Inputs and outputs are named, have a scalar type (e.g. int32_t), and a shape
(e.g. 3x4). These are the elements we use to bind to an ML model.

In both training and inference, we want to expose to ML (training algorithms or
trained model, respectively) the features we want to make optimization
decisions on. In that regard, the interface from the compiler side to the ML
side is the same: pass features, and get a decision. It's essentially a function
call, where the parameters and result are bound by name and are described by
name, scalar type, and shape tuples.
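
For illustration, the "signature" of an inlining decision might look like the
following sketch - the feature names and shapes here are purely illustrative,
not the actual MLGO inlining features (``TensorSpec`` is introduced below):

.. code-block:: c++

   #include "llvm/Analysis/TensorSpec.h"

   #include <cstdint>
   #include <vector>

   using namespace llvm;

   // The "parameters": named, typed, shaped inputs. Names are illustrative.
   std::vector<TensorSpec> InputSpecs{
       TensorSpec::createSpec<int64_t>("caller_basic_block_count", {1}),
       TensorSpec::createSpec<int64_t>("callee_basic_block_count", {1}),
       TensorSpec::createSpec<int64_t>("callsite_height", {1})};

   // ...and the "result": a single scalar, interpreted as a boolean.
   TensorSpec AdviceSpec =
       TensorSpec::createSpec<int64_t>("inlining_decision", {1});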

The main types in LLVM are:

- ``MLModelRunner`` - an abstraction for the decision making mechanism
- ``TensorSpec``, which describes a tensor.

TensorSpec
----------

See ``llvm/Analysis/TensorSpec.h``. This is a simple data bag, identifying a
tensor by name (a string), scalar type, and shape (a vector of ints). The scalar
type can only be int (8, 16, 32, or 64), signed or unsigned; float; or double.
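
As a sketch, constructing the "3x4 of float" example above - assuming the
``createSpec`` factory and ``getTotalTensorBufferSize`` accessor declared in
``llvm/Analysis/TensorSpec.h``, with an illustrative tensor name:

.. code-block:: c++

   #include "llvm/Analysis/TensorSpec.h"

   using namespace llvm;

   // A float tensor named "some_matrix" with shape 3x4.
   TensorSpec Matrix = TensorSpec::createSpec<float>("some_matrix", {3, 4});
   // 3 * 4 elements * sizeof(float) = 48 bytes of backing buffer.
   size_t BufferSize = Matrix.getTotalTensorBufferSize();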

MLModelRunner
-------------

See ``llvm/Analysis/MLModelRunner.h``. The abstraction has a pure virtual,
``evaluateUntyped``, but the contract with implementers is a bit more involved:

Implementers
^^^^^^^^^^^^

At construction, the implementer is expected to receive a list of ``TensorSpec``
for input features and the ``TensorSpec`` of the output (e.g.
``std::vector<TensorSpec>``). The list type is not contractual, but it must be
a 0-based indexing, array-like container. Given a ``TensorSpec`` at index "I" in
the input list, with a name "N", a shape "D1 x D2 x ... x Dn", and a scalar type
"T", the implementer must (see the sketch after this list):

- set up a contiguous buffer sized ``sizeof(T) * D1 * D2 * ... * Dn``. This
  buffer's lifetime must be the same as the lifetime of the implementer object.
- call ``MLModelRunner::setUpBufferForTensor`` passing I, the ``TensorSpec``,
  and the buffer above.
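
A minimal sketch of an implementer honoring that contract - the class, the
exact base-class constructor arguments, and the trivial "model" are
illustrative, not an actual LLVM implementation:

.. code-block:: c++

   #include "llvm/Analysis/MLModelRunner.h"
   #include "llvm/Analysis/TensorSpec.h"

   #include <cstdint>
   #include <vector>

   using namespace llvm;

   class ToyModelRunner : public MLModelRunner {
   public:
     ToyModelRunner(LLVMContext &Ctx, const std::vector<TensorSpec> &Inputs)
         : MLModelRunner(Ctx, MLModelRunner::Kind::Development,
                         /*NrOutputs=*/1) {
       // Reserve up front so inner buffers are set up exactly once.
       Storage.reserve(Inputs.size());
       for (size_t I = 0; I < Inputs.size(); ++I) {
         // One contiguous buffer per input, sized sizeof(T) * D1 * ... * Dn;
         // owned by this object, so its lifetime matches the runner's.
         Storage.emplace_back(Inputs[I].getTotalTensorBufferSize());
         setUpBufferForTensor(I, Inputs[I], Storage.back().data());
       }
     }

   private:
     void *evaluateUntyped() override {
       // A real implementation would bind the buffers (by TensorSpec name)
       // to an underlying model, evaluate it, and return the output buffer.
       // Note: it must *not* reset the input buffers (see below).
       Output = 1;
       return &Output;
     }

     std::vector<std::vector<char>> Storage;
     int64_t Output = 0;
   };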

Internally, the expectation is that the implementer uses the name (and maybe
shape) of a ``TensorSpec`` for binding (e.g. lookup in an underlying ML model).

``MLModelRunner::setUpBufferForTensor`` stores each buffer at the corresponding
index (i.e. its position in the list used at construction). The expectation is
that the user will use that position when calling ``MLModelRunner::getTensor``
to retrieve the underlying buffer (more on that in a bit).

The implementation of ``evaluateUntyped`` is expected to use the values in the
buffers described above, carry out whatever computation (e.g. evaluate an ML
model), and then place the outcome in an output buffer which will be returned
to the caller. Importantly, ``evaluateUntyped`` must not reset the input
buffers. This is because during training we may want to log the features and
decisions, and since the data is already buffered, there's no reason to force
backing it up elsewhere.

Users
^^^^^

The users must pass the input ``TensorSpec`` list at the construction of a
specific ``MLModelRunner`` object. After that, users can be agnostic of the
specific implementation, and would typically follow this workflow (sketched in
code after the list):

- call ``getTensor`` or ``getTensorUntyped``, for each input tensor, identified
  by its index (i.e. the index of the corresponding ``TensorSpec`` in the list
  used at construction).
- populate the tensor buffer of each input tensor with values. Users can take
  advantage of the stability of the tensor buffers, e.g. by setting only once
  the values that don't change, or by caching the buffer address.
- call ``evaluate`` and use its result.
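
For example, a sketch of that workflow, assuming a runner constructed with a
single scalar ``int64_t`` feature at index 0 (the index and the interpretation
of the output are illustrative):

.. code-block:: c++

   #include "llvm/Analysis/MLModelRunner.h"

   #include <cstdint>

   using namespace llvm;

   bool shouldOptimize(MLModelRunner &Runner, int64_t FeatureValue) {
     // Retrieve (by position) the buffer the implementer set up, and fill it.
     // The buffer is stable, so its address could also be cached.
     *Runner.getTensor<int64_t>(0) = FeatureValue;
     // Evaluate the model and interpret its single output as an int64_t.
     return Runner.evaluate<int64_t>() > 0;
   }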

Versioning
^^^^^^^^^^

We support a model "knowing" fewer inputs than the compiler. This is supported
by ``MLModelRunner::setUpBufferForTensor``. If a ``TensorSpec`` requested by the
compiler is not supported by the underlying model, the ``MLModelRunner``
implementer must still call ``setUpBufferForTensor`` with a ``nullptr`` value
for the buffer. In turn, ``MLModelRunner`` will allocate an appropriately-sized
buffer and track its lifetime. The user can safely populate that buffer. Since
the rest of the inputs are still provided, this allows an evolution model where
we first add features to the compiler and continue using older models without
regressing. Then, the new compiler can be used to train new models. Deprecating
features in the compiler involves, then, first training a model without those
features.
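
In an implementer's constructor, that might look like the following fragment -
``modelKnowsInput`` and ``getModelBuffer`` are hypothetical helpers querying
the underlying model, not LLVM APIs:

.. code-block:: c++

   // Inside a hypothetical implementer's constructor; Inputs is the
   // std::vector<TensorSpec> received at construction.
   for (size_t I = 0; I < Inputs.size(); ++I) {
     if (modelKnowsInput(Inputs[I].name()))
       setUpBufferForTensor(I, Inputs[I], getModelBuffer(Inputs[I].name()));
     else
       // Feature unknown to the (older) model: pass nullptr so the base
       // class allocates and owns an appropriately-sized buffer.
       setUpBufferForTensor(I, Inputs[I], nullptr);
   }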

``MLModelRunner`` implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We currently feature three implementations:

- ``ModelUnderTrainingRunner``. This requires the compiler be built with TFLite
  support. It allows loading a TFLite model dynamically and is primarily
  intended for training scenarios, but it can be used relatively easily in
  production build environments, as it does not change how the compiler
  operates (why this remark is necessary will become clear in a few
  paragraphs).

- ``ReleaseModeModelRunner``. This is intended for inference scenarios. This
  uses the rules defined in ``llvm/cmake/modules/TensorFlowCompile.cmake`` to
  convert, at the time the compiler is built, TensorFlow Saved Models into a
  header (.h) and native object (.o). The latter is a CPU-based implementation
  of the neural network, together with its weights (essentially, loops
  performing matrix multiplications).

  NOTE: we are actively working on replacing this with an EmitC implementation
  requiring no out-of-tree build-time dependencies.

- ``InteractiveModelRunner``. This is intended for training scenarios where the
  training algorithm drives compilation. This model runner has no special
  dependencies, and relies on I/O pipes to communicate with a separate process,
  presumably a Python training algorithm. We do not envision using this in a
  production environment.
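
As a sketch, wiring up an ``InteractiveModelRunner`` might look like this -
assuming the constructor shape declared in
``llvm/Analysis/InteractiveModelRunner.h`` (input specs, advice spec, and the
two pipe file names); the feature spec and pipe paths are illustrative:

.. code-block:: c++

   #include "llvm/Analysis/InteractiveModelRunner.h"
   #include "llvm/Analysis/TensorSpec.h"

   #include <cstdint>
   #include <memory>
   #include <vector>

   using namespace llvm;

   std::unique_ptr<MLModelRunner> makeInteractiveRunner(LLVMContext &Ctx) {
     std::vector<TensorSpec> Inputs{
         TensorSpec::createSpec<int64_t>("some_feature", {1})};
     TensorSpec Advice = TensorSpec::createSpec<int64_t>("advice", {1});
     // The compiler writes features to the outbound pipe and reads the
     // decision back from the inbound one; the training process sits at
     // the other end of both.
     return std::make_unique<InteractiveModelRunner>(
         Ctx, Inputs, Advice, "/tmp/mlgo.out", "/tmp/mlgo.in");
   }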

Note that training leaves it to the training infrastructure to handle
distributed computing. The assumed architecture has Python processes
communicating remotely among themselves, but managing local communication with
clang.

..
   TODO(mtrofin):
   - logging, and the use in interactive mode.
   - discuss an example (like the inliner)