Commit 77d1db6

[docs][mlgo] Document MLModelRunner (#139205)
1 parent 802d8d9 commit 77d1db6

llvm/docs/MLGO.rst

Lines changed: 159 additions & 11 deletions

=============================================
Machine Learning - Guided Optimization (MLGO)
=============================================

Introduction
============

MLGO refers to integrating ML techniques into LLVM, primarily to replace
heuristics with machine-learned models.

Currently, the following heuristics feature such integration:

* Inlining for size
* Register allocation (LLVM greedy eviction heuristic) for performance

This document is an outline of the tooling and APIs facilitating MLGO.

Note that tools for orchestrating ML training are not part of LLVM, as they are
dependency-heavy, both on the choice of ML infrastructure and on choices of
distributed computing. For the training scenario, LLVM only contains facilities
enabling it, such as corpus extraction, training data extraction, and evaluation
of models during training.

.. contents::

Corpus Tooling
==============

..
  TODO(boomanaiden154): Write this section.

Interacting with ML models
==========================

We interact with ML models in two primary scenarios: training, where we produce
such a model, and inference, where we use the model during compilation to make
optimization decisions.

For a specific optimization problem - e.g. inlining, or regalloc eviction - we
first separate correctness-preserving decisions from optimization decisions.
Not inlining functions marked "no inline" is an example of the former; so is
not evicting an unevictable live range. An example of the latter is deciding to
inline a function that will bloat the caller size, just because we have reason
to believe that later, the effect will be some constant propagation that will
actually reduce the size (or dynamic instruction count).

ML models can be understood as functions. Their inputs are tensors - buffers of
scalars. The output (in our case, singular) is a scalar. For example, for
inlining, the inputs are properties of the caller, callee, and the callsite
being analyzed for inlining. The output is a boolean.

Inputs and outputs are named, have a scalar type (e.g. ``int32_t``), and a
shape (e.g. 3x4). These are the elements we use to bind to an ML model.

In both training and inference, we want to expose to the ML side (training
algorithms or trained model, respectively) the features on which we want to
base optimization decisions. In that regard, the interface from the compiler
side to the ML side is the same: pass features, and get a decision. It's
essentially a function call, where the parameters and result are bound by
(name, scalar type, shape) tuples.

The main types in LLVM are:

- ``MLModelRunner`` - an abstraction for the decision-making mechanism
- ``TensorSpec``, which describes a tensor.

TensorSpec
----------

See ``llvm/Analysis/TensorSpec.h``. This is a simple data bag, identifying a
tensor by name (a string), scalar type, and shape (a vector of ints). The
scalar type can only be int (8, 16, 32, or 64), signed or unsigned; float; or
double.
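
For illustration, a spec could be created with the ``createSpec`` factory; a
minimal sketch (the feature name here is made up):

.. code-block:: c++

  #include "llvm/Analysis/TensorSpec.h"

  using namespace llvm;

  // A 3x4 tensor of int32_t named "some_feature" (hypothetical name).
  TensorSpec Spec = TensorSpec::createSpec<int32_t>("some_feature", {3, 4});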

MLModelRunner
-------------

See ``llvm/Analysis/MLModelRunner.h``. The abstraction has a pure virtual
method, ``evaluateUntyped``, but the contract with implementers is a bit more
involved:

Implementers
^^^^^^^^^^^^

At construction, the implementer is expected to receive a list of ``TensorSpec``
for input features and the ``TensorSpec`` of the output (e.g.
``std::vector<TensorSpec>``). The list type is not contractual, but it must be
a 0-based, array-like indexed container. Given a ``TensorSpec`` at index "I" in
the input list, that has a name "N", shape "D1 x D2 x ... x Dn", and scalar
type "T", the implementer must:

- set up a contiguous buffer sized ``sizeof(T) * D1 * D2 * ... * Dn``. This
  buffer's lifetime must be the same as the lifetime of the implementer object.
- call ``MLModelRunner::setUpBufferForTensor`` passing I, the ``TensorSpec``,
  and the buffer above.

Internally, the expectation is that the implementer uses the name (and maybe
shape) of a ``TensorSpec`` for binding (e.g. lookup in an underlying ML model).

``MLModelRunner::setUpBufferForTensor`` stores each buffer at the corresponding
index (i.e. its position in the list used at construction). The expectation is
that the user will use that position when calling ``MLModelRunner::getTensor``
to retrieve the underlying buffer (more on that in a bit).

The implementation of ``evaluateUntyped`` is expected to use the values in the
buffers described above, carry out whatever computation (e.g. evaluate an ML
model), and then place the outcome in an output buffer which will be returned
to the caller. Importantly, ``evaluateUntyped`` must not reset the input
buffers. This is because during training we may want to log the features and
decisions, and since the data is already buffered, there's no reason to force
copying it elsewhere.
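
To make the contract concrete, here is a minimal sketch of an implementer that
owns one buffer per input and always returns the same decision. It assumes the
constructor and helper signatures shown; consult
``llvm/Analysis/MLModelRunner.h`` and ``llvm/Analysis/TensorSpec.h`` for the
exact interfaces:

.. code-block:: c++

  #include "llvm/Analysis/MLModelRunner.h"
  #include "llvm/Analysis/TensorSpec.h"

  #include <memory>
  #include <vector>

  using namespace llvm;

  // Hypothetical implementer: one owned buffer per input, fixed decision.
  class TrivialModelRunner : public MLModelRunner {
  public:
    TrivialModelRunner(LLVMContext &Ctx, const std::vector<TensorSpec> &Inputs)
        : MLModelRunner(Ctx, MLModelRunner::Kind::NoOp, /*NumOutputs=*/1) {
      for (size_t I = 0; I < Inputs.size(); ++I) {
        // Contiguous buffer of sizeof(T) * D1 * D2 * ... * Dn bytes, living
        // as long as this object.
        Buffers.emplace_back(
            std::make_unique<char[]>(Inputs[I].getTotalTensorBufferSize()));
        // Bind the buffer at index I; users retrieve it via getTensor(I).
        setUpBufferForTensor(I, Inputs[I], Buffers.back().get());
      }
    }

  private:
    // Reads the input buffers (without resetting them) and produces the
    // decision - here, trivially constant.
    void *evaluateUntyped() override { return &Decision; }

    std::vector<std::unique_ptr<char[]>> Buffers;
    int64_t Decision = 0;
  };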

Users
^^^^^

Users must pass the input ``TensorSpec`` list at the construction of a
specific ``MLModelRunner`` object. After that, users can be agnostic of the
specific implementation, and would typically follow this workflow (sketched in
code after the list):

- call ``getTensor`` or ``getTensorUntyped`` for each input tensor, identified
  by its index (i.e. the index of the corresponding ``TensorSpec`` in the list
  used at construction).
- populate the tensor buffer of each input tensor with values. Users can take
  advantage of the stability of the tensor buffers, e.g. by setting only once
  those that don't change, or by caching the buffer address.
- call ``evaluate`` and use its result.
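
A minimal usage sketch, reusing the hypothetical ``TrivialModelRunner`` from
above, with made-up feature names, and assuming an ``LLVMContext`` ``Ctx`` and
feature values in scope:

.. code-block:: c++

  // The indices used below are positions in this construction-time list.
  std::vector<TensorSpec> Inputs{
      TensorSpec::createSpec<int64_t>("callee_size", {1}),  // index 0
      TensorSpec::createSpec<int64_t>("caller_size", {1})}; // index 1
  TrivialModelRunner Runner(Ctx, Inputs);

  // Populate the input buffers. The buffer addresses are stable, so they
  // could also be cached, or set only when values change.
  *Runner.getTensor<int64_t>(0) = CalleeSize;
  *Runner.getTensor<int64_t>(1) = CallerSize;

  // Evaluate and use the result.
  int64_t ShouldInline = Runner.evaluate<int64_t>();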

Versioning
^^^^^^^^^^

We support a model "knowing" fewer inputs than the compiler. This is supported
by ``MLModelRunner::setUpBufferForTensor``. If a ``TensorSpec`` requested by
the compiler is not supported by the underlying model, the ``MLModelRunner``
implementer must still call ``setUpBufferForTensor``, with a ``nullptr`` value
for the buffer. In turn, ``MLModelRunner`` will allocate an appropriately sized
buffer and track its lifetime. The user can safely populate that buffer. Since
the rest of the inputs are still provided, this allows an evolution model
where we first add features to the compiler and continue using older models
without regressing. Then, the new compiler can be used to train new models.
Deprecating features in the compiler then involves first training a model
without those features.
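
For example (a sketch, inside a hypothetical implementer's constructor), an
input the underlying model does not know about would still be registered:

.. code-block:: c++

  // No binding for Inputs[I] in the model: pass nullptr and let
  // MLModelRunner allocate and own an appropriately sized buffer. The user
  // can still safely populate it; evaluation simply ignores it.
  setUpBufferForTensor(I, Inputs[I], /*Buffer=*/nullptr);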

``MLModelRunner`` implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We currently feature three implementations:

- ``ModelUnderTrainingRunner``. This requires the compiler be built with TFLite
  support. It allows loading a TFLite model dynamically and is primarily
  intended for training scenarios, but it can be used relatively easily in
  production build environments, as it does not change how the compiler
  operates (why this remark is necessary will become clear in a few
  paragraphs).

- ``ReleaseModeModelRunner``. This is intended for inference scenarios. This
  uses the rules defined in ``llvm/cmake/modules/TensorFlowCompile.cmake`` to
  convert, at the time the compiler is built, TensorFlow Saved Models into a
  header (.h) and native object (.o). The latter is a CPU-based implementation
  of the neural network, together with its weights (essentially, loops
  performing matrix multiplications).

  NOTE: we are actively working on replacing this with an EmitC implementation
  requiring no out-of-tree build-time dependencies.

- ``InteractiveModelRunner``. This is intended for training scenarios where the
  training algorithm drives compilation. This model runner has no special
  dependencies, and relies on I/O pipes to communicate with a separate process,
  presumably a Python training algorithm. We do not envision using this in a
  production environment.

Note that training leaves it to the training infrastructure to handle
distributed computing. The assumed architecture has Python processes
communicating remotely among themselves, but managing local communication with
clang.

..
  TODO(mtrofin):
  - logging, and the use in interactive mode.
  - discuss an example (like the inliner)
