# MLIR Bytecode Format

- This documents describes the MLIR bytecode format and its encoding.
+ This document describes the MLIR bytecode format and its encoding.
This format is versioned and stable: we don't plan to ever break
- compatibility, that is a dialect should be able to deserialize and
- older bytecode. Similarly, we support back-deployment we an older
- version of the format can be targetted.
+ compatibility, that is a dialect should be able to deserialize any
+ older bytecode. Similarly, we support back-deployment so that an
+ older version of the format can be targetted.

That said, it is important to realize that the promises of the
bytecode format are made assuming immutable dialects: the format
@@ -19,7 +19,7 @@ information while decoding the input IR, and gives an opportunity
to each dialect for which a version is present to perform IR
upgrades post-parsing through the `upgradeFromVersion` method.
There is no restriction on what kind of information a dialect
- is allowed to encode to model its versioning
+ is allowed to encode to model its versioning.

[TOC]

@@ -172,31 +172,37 @@ dialects that were also referenced.
```
dialect_section {
  numDialects: varint,
-   dialectNames: varint[],
-   numTotalOpNames: varint,
-   opNames: op_name_group[]
+   dialectNames: dialect_name_group[],
+   opNames: dialect_ops_group[] // ops grouped by dialect
}

- op_name_group {
-   dialect: varint // (dialectID << 1) | (hasVersion),
-   version : dialect_version_section
-   numOpNames: varint,
-   opNames: varint[]
+ dialect_name_group {
+   nameAndIsVersioned: varint // (dialectID << 1) | (hasVersion),
+   version: dialect_version_section // only if versioned
}

dialect_version_section {
  size: varint,
  version: byte[]
}

+ dialect_ops_group {
+   dialect: varint,
+   numOpNames: varint,
+   opNames: op_name_group[]
+ }
+
+ op_name_group {
+   nameAndIsRegistered: varint // (nameID << 1) | (isRegisteredOp)
+ }
```

Dialects are encoded as a `varint` containing the index to the name string
within the string section, plus a flag indicating whether the dialect is
versioned. Operation names are encoded in groups by dialect, with each group
containing the dialect, the number of operation names, and the array of indexes
to each name within the string section. The version is encoded as a nested
- section.
+ section for each dialect.

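The `(dialectID << 1) | (hasVersion)` and `(nameID << 1) | (isRegisteredOp)` packings in the grammar above follow the same pattern: an index shifted left by one bit, with a flag in the low bit. A minimal Python sketch of that packing is given here for illustration. The helper names `pack_id_and_flag` and `unpack_id_and_flag` are hypothetical and are not part of MLIR, and the sketch works on plain integers only; it does not show how the resulting value is serialized as a `varint`.

```python
from typing import Tuple


def pack_id_and_flag(index: int, flag: bool) -> int:
    """Combine an index and a one-bit flag as (index << 1) | flag."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return (index << 1) | int(flag)


def unpack_id_and_flag(value: int) -> Tuple[int, bool]:
    """Split a packed value back into (index, flag)."""
    return value >> 1, bool(value & 1)


# Hypothetical example: a dialect whose name sits at string-section index 5
# and which carries a version section.
packed = pack_id_and_flag(5, True)
index, has_version = unpack_id_and_flag(packed)
```

Keeping the flag in the low bit means a reader can test `value & 1` to decide whether a `dialect_version_section` follows before interpreting the remaining bits as the index.
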
### Attribute/Type Sections

@@ -249,19 +255,19 @@ its assembly format, or via a custom dialect defined encoding.

In the case where a dialect does not define a method for encoding the attribute
or type, the textual assembly format of that attribute or type is used as a
- fallback. For example, a type of `!bytecode.type` would be encoded as the null
- terminated string "!bytecode.type". This ensures that every attribute and type
- may be encoded, even if the owning dialect has not yet opted in to a more
+ fallback. For example, a type `!bytecode.type<42>` would be encoded as the null
+ terminated string "!bytecode.type<42>". This ensures that every attribute and
+ type can be encoded, even if the owning dialect has not yet opted in to a more
efficient serialization.

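As a rough illustration of the fallback described above, the following Python sketch writes a textual assembly string as a null-terminated byte string and reads it back. The function names are hypothetical, UTF-8 is an assumption made for the example, and the sketch covers only the string payload, not any surrounding fields of the attribute/type section.

```python
from typing import Tuple


def encode_asm_fallback(asm: str) -> bytes:
    """Encode textual assembly as a null-terminated byte string."""
    data = asm.encode("utf-8")  # assumption: text is stored as UTF-8
    if b"\x00" in data:
        raise ValueError("assembly text must not contain a null byte")
    return data + b"\x00"


def decode_asm_fallback(buffer: bytes, offset: int = 0) -> Tuple[str, int]:
    """Read a null-terminated string starting at offset.

    Returns the decoded text and the offset just past the terminator.
    """
    end = buffer.index(b"\x00", offset)  # raises ValueError if unterminated
    return buffer[offset:end].decode("utf-8"), end + 1


encoded = encode_asm_fallback("!bytecode.type<42>")
text, next_offset = decode_asm_fallback(encoded)
```

The terminator lets a reader find the end of the string without a separate length field, at the cost of scanning the bytes and re-parsing the assembly text on load, which is why a custom dialect encoding is usually more efficient.
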
TODO: We shouldn't redundantly encode the dialect name here, we should use a
reference to the parent dialect instead.

##### Dialect Defined Encoding

- In addition to the assembly format fallback, dialects may also provide a custom
- encoding for their attributes and types. Custom encodings are very beneficial in
- that they are significantly smaller and faster to read and write.
+ As an alternative to the assembly format fallback, dialects may also provide a
+ custom encoding for their attributes and types. Custom encodings are very
+ beneficial in that they are significantly smaller and faster to read and write.

Dialects can opt-in to providing custom encodings by implementing the
`BytecodeDialectInterface`. This interface provides hooks, namely
@@ -377,7 +383,7 @@ uselist {

The encoding of an operation is important because this is generally the most
commonly appearing structure in the bytecode. A single encoding is used for
- every type of operation. Given this prevelance, many of the fields of an
+ every type of operation. Given this prevalence, many of the fields of an
operation are optional. The `encodingMask` field is a bitmask which indicates
which of the components of the operation are present.

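The idea of a presence bitmask can be sketched in Python as follows. The flag names and bit positions below are illustrative assumptions chosen for the example; they are not taken from the text above and should not be read as the actual `encodingMask` layout.

```python
from enum import IntFlag
from typing import List


class EncodingMask(IntFlag):
    # Illustrative bit assignments only (assumed for this sketch).
    HAS_ATTRS = 0b0001
    HAS_RESULTS = 0b0010
    HAS_OPERANDS = 0b0100
    HAS_SUCCESSORS = 0b1000


def build_mask(has_attrs: bool, has_results: bool,
               has_operands: bool, has_successors: bool) -> EncodingMask:
    """Set one bit for each optional component that is present."""
    mask = EncodingMask(0)
    if has_attrs:
        mask |= EncodingMask.HAS_ATTRS
    if has_results:
        mask |= EncodingMask.HAS_RESULTS
    if has_operands:
        mask |= EncodingMask.HAS_OPERANDS
    if has_successors:
        mask |= EncodingMask.HAS_SUCCESSORS
    return mask


def present_components(mask: EncodingMask) -> List[str]:
    """List the names of the components whose bit is set in the mask."""
    return [flag.name for flag in EncodingMask if flag & mask]


# An operation with attributes and operands but no results or successors.
mask = build_mask(True, False, True, False)
names = present_components(mask)
```

A reader would consult the mask first and then decode only the fields whose bits are set, so an operation with few components costs few bytes.
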