Commit 5795f9e

[mlir][docs] Update Bytecode documentation (#99854)
There were some discrepancies between the dialect section documentation and the implementation.
1 parent 83879f4 commit 5795f9e

1 file changed: +27 -21 lines changed

mlir/docs/BytecodeFormat.md

Lines changed: 27 additions & 21 deletions
@@ -1,10 +1,10 @@
 # MLIR Bytecode Format
 
-This documents describes the MLIR bytecode format and its encoding.
+This document describes the MLIR bytecode format and its encoding.
 This format is versioned and stable: we don't plan to ever break
-compatibility, that is a dialect should be able to deserialize and
-older bytecode. Similarly, we support back-deployment we an older
-version of the format can be targetted.
+compatibility, that is a dialect should be able to deserialize any
+older bytecode. Similarly, we support back-deployment so that an
+older version of the format can be targetted.
 
 That said, it is important to realize that the promises of the
 bytecode format are made assuming immutable dialects: the format
@@ -19,7 +19,7 @@ information while decoding the input IR, and gives an opportunity
 to each dialect for which a version is present to perform IR
 upgrades post-parsing through the `upgradeFromVersion` method.
 There is no restriction on what kind of information a dialect
-is allowed to encode to model its versioning
+is allowed to encode to model its versioning.
 
 [TOC]

@@ -172,31 +172,37 @@ dialects that were also referenced.
 ```
 dialect_section {
   numDialects: varint,
-  dialectNames: varint[],
-  numTotalOpNames: varint,
-  opNames: op_name_group[]
+  dialectNames: dialect_name_group[],
+  opNames: dialect_ops_group[] // ops grouped by dialect
 }
 
-op_name_group {
-  dialect: varint // (dialectID << 1) | (hasVersion),
-  version : dialect_version_section
-  numOpNames: varint,
-  opNames: varint[]
+dialect_name_group {
+  nameAndIsVersioned: varint // (dialectID << 1) | (hasVersion),
+  version: dialect_version_section // only if versioned
 }
 
 dialect_version_section {
   size: varint,
   version: byte[]
 }
 
+dialect_ops_group {
+  dialect: varint,
+  numOpNames: varint,
+  opNames: op_name_group[]
+}
+
+op_name_group {
+  nameAndIsRegistered: varint // (nameID << 1) | (isRegisteredOp)
+}
 ```
 
 Dialects are encoded as a `varint` containing the index to the name string
 within the string section, plus a flag indicating whether the dialect is
 versioned. Operation names are encoded in groups by dialect, with each group
 containing the dialect, the number of operation names, and the array of indexes
 to each name within the string section. The version is encoded as a nested
-section.
+section for each dialect.
 
 ### Attribute/Type Sections
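
Reviewer note on the hunk above: both `nameAndIsVersioned` and `nameAndIsRegistered` pack a string-section index and a one-bit flag into a single value as `(id << 1) | flag`. The sketch below illustrates only that bit packing on plain integers. It does not implement the byte-level `varint` encoding, and the function names are hypothetical, not part of MLIR.

```python
from typing import Tuple


def pack_flagged_id(index: int, flag: bool) -> int:
    """Pack an index and a one-bit flag as (index << 1) | flag."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return (index << 1) | int(flag)


def unpack_flagged_id(value: int) -> Tuple[int, bool]:
    """Split a packed value back into (index, flag)."""
    return value >> 1, bool(value & 1)


# Example: a versioned dialect whose name is string-section entry 5.
packed = pack_flagged_id(5, True)
print(packed)                     # 11
print(unpack_flagged_id(packed))  # (5, True)
```

Keeping the flag in the low bit means a reader can test `value & 1` before deciding whether a `dialect_version_section` follows, which matches the "only if versioned" comment in the new layout.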

@@ -249,19 +255,19 @@ its assembly format, or via a custom dialect defined encoding.
 
 In the case where a dialect does not define a method for encoding the attribute
 or type, the textual assembly format of that attribute or type is used as a
-fallback. For example, a type of `!bytecode.type` would be encoded as the null
-terminated string "!bytecode.type". This ensures that every attribute and type
-may be encoded, even if the owning dialect has not yet opted in to a more
+fallback. For example, a type `!bytecode.type<42>` would be encoded as the null
+terminated string "!bytecode.type<42>". This ensures that every attribute and
+type can be encoded, even if the owning dialect has not yet opted in to a more
 efficient serialization.
 
 TODO: We shouldn't redundantly encode the dialect name here, we should use a
 reference to the parent dialect instead.
 
 ##### Dialect Defined Encoding
 
-In addition to the assembly format fallback, dialects may also provide a custom
-encoding for their attributes and types. Custom encodings are very beneficial in
-that they are significantly smaller and faster to read and write.
+As an alternative to the assembly format fallback, dialects may also provide a
+custom encoding for their attributes and types. Custom encodings are very
+beneficial in that they are significantly smaller and faster to read and write.
 
 Dialects can opt-in to providing custom encodings by implementing the
 `BytecodeDialectInterface`. This interface provides hooks, namely
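
Reviewer note on the assembly-format fallback in the hunk above: the updated example says `!bytecode.type<42>` is stored as a null-terminated string. A minimal sketch of that idea follows. It assumes UTF-8 text and a flat byte buffer, and the helper names are hypothetical. How the real reader locates an entry inside the attribute/type section is not shown in this diff and is not modelled here.

```python
from typing import Tuple


def encode_asm_fallback(asm: str) -> bytes:
    """Encode textual assembly as a null-terminated byte string."""
    data = asm.encode("utf-8")  # assumption: the text is stored as UTF-8
    if b"\x00" in data:
        raise ValueError("text must not contain an embedded null byte")
    return data + b"\x00"


def decode_asm_fallback(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Read one null-terminated string; return (text, offset past the terminator)."""
    end = buf.index(b"\x00", offset)  # raises ValueError if no terminator exists
    return buf[offset:end].decode("utf-8"), end + 1


encoded = encode_asm_fallback("!bytecode.type<42>")
print(encoded)                       # b'!bytecode.type<42>\x00'
print(decode_asm_fallback(encoded))  # ('!bytecode.type<42>', 19)
```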
@@ -377,7 +383,7 @@ uselist {
 
 The encoding of an operation is important because this is generally the most
 commonly appearing structure in the bytecode. A single encoding is used for
-every type of operation. Given this prevelance, many of the fields of an
+every type of operation. Given this prevalence, many of the fields of an
 operation are optional. The `encodingMask` field is a bitmask which indicates
 which of the components of the operation are present.
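
Reviewer note on `encodingMask` in the hunk above: the context lines describe a bitmask that records which optional components of an operation are present. The sketch below shows the general technique. The component names and bit positions in `FIELD_BITS` are hypothetical placeholders chosen for illustration. This diff does not list the real assignments, so they should be checked against the implementation.

```python
# Hypothetical component-to-bit mapping; not taken from the MLIR sources.
FIELD_BITS = {
    "attrs": 0x01,
    "results": 0x02,
    "operands": 0x04,
    "successors": 0x08,
    "regions": 0x10,
}


def build_mask(present):
    """OR together the bits of every component that is present."""
    mask = 0
    for name in present:
        mask |= FIELD_BITS[name]
    return mask


def fields_present(mask):
    """List the components whose bit is set, in declaration order."""
    return [name for name, bit in FIELD_BITS.items() if mask & bit]


mask = build_mask(["results", "operands"])
print(hex(mask))             # 0x6
print(fields_present(mask))  # ['results', 'operands']
```

A reader following this scheme would consult the mask first and then decode only the components whose bit is set, which is how a single encoding can serve every kind of operation while absent components cost no space.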
