Commit eee8c61

[LangRef] Try to clarify some Metadata semantics (#81948)
General cleanup in LangRef (and two outdated comments in LLParser.cpp) with the aim of making it easier to understand some of the terminology and subtle idiosyncrasies related to metadata in the IR.

I'm still not happy with the fact that "node" is used both informally and with a particular category of metadata in mind, depending on the context. This also bleeds into the type names in the implementation.

There are also several places where names from the implementation appear in the document with no other context or definition. In some cases I added a parenthetical to section titles to tie the two together, but I don't think this is ideal.

I also think it might be useful to define the "abstract" metadata classes like "DIScope" in the document, so the hierarchy of metadata node kinds is direct, and so we can avoid repetitive descriptions of all of the members of one part of the hierarchy. This inheritance doesn't have to be in terms of C++ classes, but using the same names as the implementation seems helpful, and we already do it for many other things.

Finally, I added sections for the specialized nodes which are implemented in the IR but didn't have documentation in LangRef yet. These could use some work, and I admit I didn't dig too deep into the specifics beyond enumerating the fields, but I think we would ideally always have a LangRef section for every kind of node.
1 parent 6b149f7 commit eee8c61

2 files changed: +141 -18 lines changed
llvm/docs/LangRef.rst

Lines changed: 140 additions & 16 deletions
@@ -5622,34 +5622,106 @@ occurs on.
 Metadata
 ========
 
-LLVM IR allows metadata to be attached to instructions and global objects in the
-program that can convey extra information about the code to the optimizers and
-code generator. One example application of metadata is source-level
-debug information. There are two metadata primitives: strings and nodes.
+LLVM IR allows metadata to be attached to instructions and global objects in
+the program that can convey extra information about the code to the optimizers
+and code generator.
 
-Metadata does not have a type, and is not a value. If referenced from a
-``call`` instruction, it uses the ``metadata`` type.
+There are two metadata primitives: strings and nodes. There are
+also specialized nodes which have a distinguished name and a set of named
+arguments.
+
+.. note::
+
+   One example application of metadata is source-level debug information,
+   which is currently the only user of specialized nodes.
+
+Metadata does not have a type, and is not a value.
+
+A value of non-\ ``metadata`` type can be used in a metadata context using the
+syntax '``<type> <value>``'.
+
+All other metadata is identified in syntax as starting with an exclamation
+point ('``!``').
+
+Metadata may be used in the following value contexts by using the ``metadata``
+type:
+
+- Arguments to certain intrinsic functions, as described in their specification.
+- Arguments to the ``catchpad``/``cleanuppad`` instructions.
+
+.. note::
+
+   Metadata can be "wrapped" in a ``MetadataAsValue`` so it can be referenced
+   in a value context: ``MetadataAsValue`` is-a ``Value``.
+
+   A typed value can be "wrapped" in ``ValueAsMetadata`` so it can be
+   referenced in a metadata context: ``ValueAsMetadata`` is-a ``Metadata``.
+
+   There is no explicit syntax for a ``ValueAsMetadata``, and instead
+   the fact that a type identifier cannot begin with an exclamation point
+   is used to resolve ambiguity.
+
+   A ``metadata`` type implies a ``MetadataAsValue``, and when followed with a
+   '``<type> <value>``' pair it wraps the typed value in a ``ValueAsMetadata``.
 
-All metadata are identified in syntax by an exclamation point ('``!``').
+For example, the first argument
+to this call is a ``MetadataAsValue(ValueAsMetadata(Value))``:
+
+.. code-block:: llvm
+
+    call void @llvm.foo(metadata i32 1)
+
+Whereas the first argument to this call is a ``MetadataAsValue(MDNode)``:
+
+.. code-block:: llvm
+
+    call void @llvm.foo(metadata !0)
+
+The first element of this ``MDTuple`` is a ``MDNode``:
+
+.. code-block:: llvm
+
+    !{!0}
+
+And the first element of this ``MDTuple`` is a ``ValueAsMetadata(Value)``:
+
+.. code-block:: llvm
+
+    !{i32 1}
 
 .. _metadata-string:
 
-Metadata Nodes and Metadata Strings
------------------------------------
+Metadata Strings (``MDString``)
+-------------------------------
+
+.. FIXME Either fix all references to "MDString" in the docs, or make that
+   identifier a formal part of the document.
 
 A metadata string is a string surrounded by double quotes. It can
 contain any character by escaping non-printable characters with
 "``\xx``" where "``xx``" is the two digit hex code. For example:
 "``!"test\00"``".
 
-Metadata nodes are represented with notation similar to structure
-constants (a comma separated list of elements, surrounded by braces and
-preceded by an exclamation point). Metadata nodes can have any values as
+.. note::
+
+   A metadata string is metadata, but is not a metadata node.
+
+.. _metadata-node:
+
+Metadata Nodes (``MDNode``)
+---------------------------
+
+.. FIXME Either fix all references to "MDNode" in the docs, or make that
+   identifier a formal part of the document.
+
+Metadata tuples are represented with notation similar to structure
+constants: a comma separated list of elements, surrounded by braces and
+preceded by an exclamation point. Metadata nodes can have any values as
 their operand. For example:
 
 .. code-block:: llvm
 
-    !{ !"test\00", i32 10}
+    !{!"test\00", i32 10}
 
 Metadata nodes that aren't uniqued use the ``distinct`` keyword. For example:
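
Illustration only, not part of the commit: a minimal sketch of how the terms introduced in this hunk might map onto module-level metadata. The node numbers and the named metadata `!example` are hypothetical and chosen purely for this note.

```llvm
; Hypothetical module-level metadata; names and numbers are illustrative only.
!0 = !{!"test\00", i32 10}  ; MDTuple: an MDString, then a ValueAsMetadata wrapping `i32 10`
!1 = !{!0}                  ; MDTuple whose first element is an MDNode
!2 = distinct !{!0}         ; same operand as !1, but `distinct` nodes are not uniqued
!example = !{!0, !1, !2}    ; named metadata; its operands are metadata nodes
```

Only the `!"test\00"` operand is an `MDString`; per the new note, it is metadata but not a metadata node.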

@@ -5676,6 +5748,12 @@ intrinsic is using three metadata arguments:
 
     call void @llvm.dbg.value(metadata !24, metadata !25, metadata !26)
 
+
+.. FIXME Attachments cannot be ValueAsMetadata, but we don't have a
+   particularly clear way to refer to ValueAsMetadata without getting into
+   implementation details. Ideally the restriction would be explicit somewhere,
+   though?
+
 Metadata can be attached to an instruction. Here metadata ``!21`` is attached
 to the ``add`` instruction using the ``!dbg`` identifier:
 
@@ -6309,7 +6387,7 @@ valid debug intrinsic.
     !5 = !DIExpression(DW_OP_constu, 42, DW_OP_stack_value)
 
 DIAssignID
-""""""""""""
+""""""""""
 
 ``DIAssignID`` nodes have no operands and are always distinct. They are used to
 link together `@llvm.dbg.assign` intrinsics (:ref:`debug
@@ -6324,7 +6402,13 @@ Assignment Tracking <AssignmentTracking.html>`_ for more info.
     !2 = distinct !DIAssignID()
 
 DIArgList
-""""""""""""
+"""""""""
+
+.. FIXME In the implementation this is not a "node", but as it can only appear
+   inline in a function context that distinction isn't observable anyway. Even
+   if it is not required, it would be nice to be more clear about what is a
+   "node", and what that actually means. The names in the implementation could
+   also be updated to mirror whatever we decide here.
 
 ``DIArgList`` nodes hold a list of constant or SSA value references. These are
 used in :ref:`debug intrinsics<dbg_intrinsics>` (currently only in
@@ -6340,7 +6424,7 @@ inlined, and cannot appear in named metadata.
     metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus))
 
 DIFlags
-"""""""""""""""
+"""""""
 
 These flags encode various properties of DINodes.
 
@@ -6416,6 +6500,46 @@ within the file where the label is declared.
 
     !2 = !DILabel(scope: !0, name: "foo", file: !1, line: 7)
 
+DICommonBlock
+"""""""""""""
+
+``DICommonBlock`` nodes represent Fortran common blocks. The ``scope:`` field
+is mandatory and points to a :ref:`DILexicalBlockFile`, a
+:ref:`DILexicalBlock`, or a :ref:`DISubprogram`. The ``declaration:``,
+``name:``, ``file:``, and ``line:`` fields are optional.
+
+DIModule
+""""""""
+
+``DIModule`` nodes represent a source language module, for example, a Clang
+module, or a Fortran module. The ``scope:`` field is mandatory and points to a
+:ref:`DILexicalBlockFile`, a :ref:`DILexicalBlock`, or a :ref:`DISubprogram`.
+The ``name:`` field is mandatory. The ``configMacros:``, ``includePath:``,
+``apinotes:``, ``file:``, ``line:``, and ``isDecl:`` fields are optional.
+
+DIStringType
+""""""""""""
+
+``DIStringType`` nodes represent a Fortran ``CHARACTER(n)`` type, with a
+dynamic length and location encoded as an expression.
+The ``tag:`` field is optional and defaults to ``DW_TAG_string_type``. The ``name:``,
+``stringLength:``, ``stringLengthExpression``, ``stringLocationExpression:``,
+``size:``, ``align:``, and ``encoding:`` fields are optional.
+
+If not present, the ``size:`` and ``align:`` fields default to the value zero.
+
+The length in bits of the string is specified by the first of the following
+fields present:
+
+- ``stringLength:``, which points to a ``DIVariable`` whose value is the string
+  length in bits.
+- ``stringLengthExpression:``, which points to a ``DIExpression`` which
+  computes the length in bits.
+- ``size``, which contains the literal length in bits.
+
+The ``stringLocationExpression:`` points to a ``DIExpression`` which describes
+the "data location" of the string object, if present.
+
 '``tbaa``' Metadata
 ^^^^^^^^^^^^^^^^^^^
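
Illustration only, not part of the commit: the three new sections enumerate fields but give no example, so the following is a rough sketch of what such nodes might look like. Every name, number, and value here is hypothetical; only the field names and the mandatory/optional split are taken from the added text.

```llvm
; Hypothetical debug-info nodes; file names, numbering, and values are made up.
!0 = !DIFile(filename: "example.f90", directory: "/tmp")
!1 = distinct !DICompileUnit(language: DW_LANG_Fortran95, file: !0, emissionKind: FullDebug)
!2 = distinct !DISubprogram(name: "sub", scope: !0, file: !0, line: 1, unit: !1, spFlags: DISPFlagDefinition)

; DICommonBlock: `scope:` is the only mandatory field.
!3 = !DICommonBlock(scope: !2, name: "blk", file: !0, line: 3)

; DIModule: `scope:` and `name:` are mandatory.
!4 = !DIModule(scope: !2, name: "mymod", file: !0, line: 1)

; DIStringType with a literal length: only `size:` (in bits) is present.
!5 = !DIStringType(name: "character(5)", size: 40)

; DIStringType whose length comes from a variable: `stringLength:` is listed
; first in the text above, so it is the field that specifies the length here.
!6 = !DIBasicType(name: "integer", size: 64, encoding: DW_ATE_signed)
!7 = !DILocalVariable(name: "len", scope: !2, file: !0, type: !6, flags: DIFlagArtificial)
!8 = !DIStringType(name: "character(*)", stringLength: !7, size: 32)
```

The scopes point at a `DISubprogram` only because that is one of the kinds the new text lists; this sketch makes no claim about what front ends actually emit.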

llvm/lib/AsmParser/LLParser.cpp

Lines changed: 1 addition & 2 deletions
@@ -8391,7 +8391,7 @@ int LLParser::parseInsertValue(Instruction *&Inst, PerFunctionState &PFS) {
 /// parseMDNodeVector
 /// ::= { Element (',' Element)* }
 /// Element
-/// ::= 'null' | TypeAndValue
+/// ::= 'null' | Metadata
 bool LLParser::parseMDNodeVector(SmallVectorImpl<Metadata *> &Elts) {
   if (parseToken(lltok::lbrace, "expected '{' here"))
     return true;
@@ -8401,7 +8401,6 @@ bool LLParser::parseMDNodeVector(SmallVectorImpl<Metadata *> &Elts) {
     return false;
 
   do {
-    // Null is a special case since it is typeless.
     if (EatIfPresent(lltok::kw_null)) {
       Elts.push_back(nullptr);
       continue;
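
Illustration only, not part of the commit: a small sketch of element lists that the updated grammar comment (`Element ::= 'null' | Metadata`) is meant to describe. The node numbers are hypothetical.

```llvm
; Hypothetical tuples; numbering is illustrative only.
!0 = !{}                          ; empty element list
!1 = !{null, !0, !"str", i32 7}   ; `null`, then three Metadata elements:
                                  ; a node reference, a string, and a typed value
```

The last three elements are exactly the cases the old `TypeAndValue` wording did not cover well, since only `i32 7` is a typed value.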
