
Commit c4b4c0c

[mlir] Expand shape functions in ShapeInference doc
Summary: Start filling in some requirements for the shape function
descriptions that will be used to derive shape computations. This requirements
part may later be reworked to be part of the "context" section of the shape
dialect. Without examples this may be a bit too abstract, but I hope not
(given the mappings to existing shape functions).

Differential Revision: https://reviews.llvm.org/D73572
1 parent 5932f7b

1 file changed: mlir/docs/ShapeInference.md (+223, −0)
@@ -51,6 +51,229 @@ Shape inference is currently tested alongside type inference by
`testCreateFunctions` as both operands, and 2) using combinations of input
operands of the function.
## Shape dialect

This section details the shape type inference dialect (`shape`). The initial
focus will be on shape functions that could be used in both the runtime and
the compiler (for construction of ops, refinement of shapes, and reification
of dynamic allocations) for dialects including TF, TFLite, XLA and the tensor
compute dialect under discussion.

This will focus on the shape functions (e.g., determining the rank and
dimensions of the output shape). As shown in the shaped container type, shape
will be one of 3 components, the others being elemental type and attribute
(which is currently left open with the intention of supporting extensions such
as layouts or bounded shapes). This allows for decoupling of these:

* Not all the information is needed for every analysis;
* Not all shape functions need to provide all the information (e.g., one
  could define a base class function that only populates element type but
  composes with the others);
* It allows reusing the constraints between, say, the Tensor and MemRef
  representations of an operation;

An argument could be made that these are metadata functions instead of shape
functions, with some considering shape and elemental type different and some
considering elemental type as part of shape. But `shape function` is IMHO
descriptive, and `metadata` can span too large a range of potential
uses/values.
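To make this concrete, below is a minimal sketch of what a shape function
could look like in MLIR textual form. The op and type names
(`shape.get_extent`, `shape.make_shape`, `!shape.shape`, `!shape.size`) are
hypothetical placeholders for illustration, not a settled dialect:

```mlir
// Hypothetical shape function for a 2-D matmul: the result shape is
// [lhs[0], rhs[1]], independent of whether the extents are static.
func @matmul_shape(%lhs: !shape.shape, %rhs: !shape.shape) -> !shape.shape {
  // Extract the relevant extents from the operand shapes.
  %m = "shape.get_extent"(%lhs) {dim = 0 : i64} : (!shape.shape) -> !shape.size
  %n = "shape.get_extent"(%rhs) {dim = 1 : i64} : (!shape.shape) -> !shape.size
  // Assemble the result shape from the computed extents.
  %result = "shape.make_shape"(%m, %n) : (!shape.size, !shape.size) -> !shape.shape
  return %result : !shape.shape
}
```

Note that the function takes only the operand shapes (local information) and
is a pure function of them, in line with the requirements below.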
### Requirements

The requirements for the shape inference functions are shaped by the
requirements of shape inference, but we believe the requirements below still
allow freedom to consider different shape inference approaches, and so we
don't constrain to a particular shape inference approach here.

#### Shape inference functions

* **Expressiveness**: shape functions need to support programs where tensors
  have shapes that are not known statically (for example, `tensor<16x?xf32>`
  or `tensor<*xf32>`);
* **Shape error detection**: Many operations will have constraints on their
  operands. If the constraints are not satisfied, or it cannot be determined
  statically whether they are satisfied, then a runtime check/assertion could
  be generated (see the sketch after this list).

  * This also aligns with the requirement that the shape function description
    should be usable by both the compiler and runtime.
  * Shape error functions should be easy to understand, at least as far as
    which constraint of the operation is violated. This also requires that
    shape function error messages be configurable by the author of the shape
    function (e.g., the author would be able to give the semantic constraint
    invalidated rather than the low-level check that failed).
  * Static analysis may be used to eliminate runtime checks that are
    guaranteed to pass.
    * Ideally all would eventually be elided (see the section on
      [inlining shape checking](#inline)).
  * Only report errors that are guaranteed to occur at runtime; if an error
    is merely possible, instead use a runtime assertion that fails and
    produces an error message with the invariant violated.

* Shape functions usable by compiler and runtime.

  * This does not mean the exact same C++ function, but rather that the
    description should be consumable by either.
  * The shape function description should not be constrained by either the
    runtime's or the compiler's type system to handle types only used for
    analysis. That is, these two type systems differ and both should be
    supported, but the intersection of the two should not be required. As a
    particular example, if a compiler only wants to differentiate exact
    shapes vs. dynamic shapes, then it need not consider a more generic shape
    lattice even though the shape description supports it.

* Declarative (e.g., analyzable at compile time, possible to generate
  different versions for different use cases)

  * This may not strictly be a requirement, but it is a way to handle the
    former: a declarative specification could be reused by both while
    avoiding the need to map to or from a third representation, given that
    these two systems have (and will have) different types.

* Shape inference functions are expressible at runtime

  * A user can define a shape function for a new op dynamically at runtime;
    this allows vendors to describe an operation and its shape function
    dynamically.

  This requirement is on the wishlist.

* Doesn't require graph-wide shape information (e.g., requires only local
  information)

  * Shape functions should be cheap to invoke on each kernel launch.
  * A shape function is dictated by its arguments (operands, attributes and
    regions) only (e.g., it could be constructed and invoked with the same
    operands as the corresponding operation).
  * Shape information that needs higher-level/graph information should use
    richer types (e.g., `TensorList<F32>`);
  * The function should be invocable before/while constructing an op (e.g.,
    it can't rely on the op already being constructed).

* Shape functions should be pure functions.

* Should support ops whose type is only known dynamically (e.g., a
  `read_from_file` op)

  * Without needing to invoke the op (e.g., reading a file once to determine
    the shape, and then later to actually consume the output of the file).

* The shape function op dialect should interoperate with non-shape-dialect
  ops.

  * There may be a common set of ops that satisfy most uses (e.g., merge,
    equal_type, arithmetic expressions, slice, concat, pattern matching on
    attributes such as padding, etc.) that will be discovered and could cover
    a large percentage of the use cases. Among these there will be some that
    carry extra semantic info that could be used for symbolic constraints
    (e.g., checking equality of two dimensions resulting in setting an
    equality constraint) and higher-order interpretation for constraint
    solving.

    It is therefore beneficial, but not required, to reuse operations,
    especially as arbitrary arithmetic computations could still be performed
    for statically known shapes. This means that the computations performed
    statically may or may not be supported by an arbitrary solver, but would
    still be allowed.

* The shape function should be expandable such that (say) symbolic equality
  and upper-bound constraints could be represented and propagated by shape
  inference.

  * E.g., the shape functions may contain information that is only useful
    when used from shape inference;

* Shape functions are allowed to fail and report an error. The error
  reporting should report the location of the operation that failed and,
  where possible, a user-actionable error message.

  * These failures could be inlined and become runtime failures with runtime
    values and error messages.
  * Reporting errors should be optional. E.g., the same function may be used
    to query validity without reporting an error.
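As a sketch of how the error-detection and purity requirements combine (again
with hypothetical op names), a shape function for a broadcasting elementwise
op could check its constraint explicitly, with an author-provided message that
survives into any retained runtime check:

```mlir
// Hypothetical shape function for a broadcasting elementwise add.
func @add_shape(%lhs: !shape.shape, %rhs: !shape.shape) -> !shape.shape {
  // If broadcast compatibility cannot be proven statically, this check is
  // retained and becomes a runtime assertion carrying the author's message.
  "shape.assert_broadcastable"(%lhs, %rhs) {msg = "operands are not broadcast-compatible"} : (!shape.shape, !shape.shape) -> ()
  // Compute the broadcasted result shape.
  %result = "shape.broadcast"(%lhs, %rhs) : (!shape.shape, !shape.shape) -> !shape.shape
  return %result : !shape.shape
}
```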
#### Non-goals

1.  The shape dialect is an IR representation and not a programming language;
    * While the functions should be readable, they don't carry the
      conveniences of a programming language. Deciding how people write these
      things, e.g., a mini DSL, a C++ API that generates them, extracting
      them programmatically from `SetShapeFn` calls, etc., is still TBD.
1.  Describing the shape inference approach that will use the shape
    functions;
    * The goal is that the shape functions, and the constraints one could
      obtain from them, are general enough to be useful for various analyses.
      But whether we follow something very simple (e.g., only fully static
      information is used for the shape output, unranked for everything else)
      or very advanced (e.g., expression trees of symbolic constants) can be
      evaluated independently of this proposal and with concrete benefit
      analysis.
1.  Describing the approach whereby error messages will be generated;
    * While the shape functions will optionally be able to emit errors, it
      will be possible to dictate when they emit an error. This enables
      deciding whether or which error to emit: there have been proposals in
      the literature that the iteration order of shape inference affects the
      quality of the error messages produced, and the shape functions do not
      mandate an order.
1.  Flow-sensitive shape functions;
    * To enable scalable/cheap shape inference, the shape functions do not
      intend to provide flow-sensitive information. This facility could
      potentially be built as part of some higher-order analysis that reuses
      the shape functions/constraints produced by the shape functions.
1.  All static functions are usable for dynamic/unknown shapes;
    * More involved computations can be performed with statically known
      shapes than can be sensibly analyzed with unknown/symbolic variables.
### Discussion

#### Inline shape inference checks {#inline}

Shape functions should be lowerable to runtime checks for validity. E.g.,
verify as much as possible statically, but enable generating instructions to
compute the shape dynamically and/or fall back to runtime checks for
attributes not verifiable at compile time. The checks inserted should ideally
only check that which could not have been verified statically.

These inlined calls could interfere with optimization patterns/passes (e.g.,
shape inference should not insert constructs that interfere with optimization
patterns), and so they could be delayed until later (with another round of
optimizations, constant folding, CSE, etc., that should remove the redundant
runtime operations).
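A hedged sketch of the intended effect, using standard-dialect-style syntax
with `shape.assert` as an assumed assertion op: for two operands of type
`tensor<?xf32>`, the extent equality cannot be verified statically, so only
that check is inlined, and a later constant-folding/CSE round could remove it
once an extent becomes known:

```mlir
// After lowering: the statically unverifiable extent equality remains as an
// inline runtime check; everything provable at compile time is elided.
func @add(%a: tensor<?xf32>, %b: tensor<?xf32>) -> tensor<?xf32> {
  %da = dim %a, 0 : tensor<?xf32>
  %db = dim %b, 0 : tensor<?xf32>
  %ok = cmpi "eq", %da, %db : index
  // Assumed assertion op; on failure it reports the violated invariant.
  "shape.assert"(%ok) {msg = "operand extents must match"} : (i1) -> ()
  %sum = addf %a, %b : tensor<?xf32>
  return %sum : tensor<?xf32>
}
```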
### Possibly Asked Questions

#### What about ODS specifications of ops?

In ODS we have been recording the constraints for the operands and attributes
of an operation. Where these are sufficient to constrain the output shape
(e.g., `SameOperandsAndResultType` or broadcastable), we should generate the
shape function from them. Where they are not, an explicit shape function
should be specified (spelling TBD, but currently considering using the MLIR
textual form as the serialization approach).
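For example (an assumed derivation, not something generated today), the ODS
trait `SameOperandsAndResultType` by itself pins the result shape to the
operand shape, so the generated shape function would reduce to the identity:

```mlir
// Hypothetical generated shape function for an op carrying the
// SameOperandsAndResultType trait: the result shape is the operand shape.
func @same_operands_and_result_shape(%operand: !shape.shape) -> !shape.shape {
  return %operand : !shape.shape
}
```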
#### Why not extract the shape function from the reference implementation?

This could be done in the future! The extracted shape function would use the
shape inference dialect, so we are starting there. Especially for ops
described in a structured way, one could autogenerate the shape function.

#### How/in what language will the shape functions be authored?

TBD. We are open to many approaches and suggestions; the priority of this
proposal is the IR produced, by whatever authoring language ends up being
used.

#### What shape inference approach is being suggested here?

None. There are multiple different shape inference approaches that we could
layer on top of these, from the most basic (always return unranked), to the
more useful (return a fixed shape for constant inputs/arguments), to the more
advanced (create logical conjunctions of algebraic statements between symbolic
named values).

### Open points

1.  Should shape functions that produce dynamic outputs given all statically
    shaped inputs be marked specially? E.g., read from file.

TODO: Add examples here.
## WIP/Future considerations

Shape functions are determined by attributes and could be arbitrarily
