Commit ee76525

[DebugInfo] Enforce implicit constraints on distinct MDNodes

Add UNIQUED and DISTINCT properties in Metadata.def and use them to
implement restrictions on the `distinct` property of MDNodes:

* DIExpression can currently be parsed from IR or read from bitcode
  as `distinct`, but this property is silently dropped when printing
  to IR. This causes accepted IR to fail to round-trip. As DIExpression
  appears inline at each use in the canonical form of IR, it cannot
  actually be `distinct` anyway, as there is no syntax to describe it.
* Similarly, DIArgList is conceptually always uniqued. It is currently
  restricted to only appearing in contexts where there is no syntax
  for `distinct`, but for consistency it is treated equivalently to
  DIExpression in this patch.
* DICompileUnit is already restricted to always being `distinct`, but
  along with adding general support for the inverse restriction I went
  ahead and described this in Metadata.def and updated the parser to be
  general. Future nodes which have this restriction can share this
  support.

The new UNIQUED property applies to DIExpression and DIArgList, and
forbids them to be `distinct`. It also implies they are canonically
printed inline at each use, rather than via MDNode ID.

The new DISTINCT property applies to DICompileUnit, and requires it to
be `distinct`.

A potential alternative change is to forbid the non-inline syntax for
DIExpression entirely, as is done with DIArgList implicitly by requiring
it appear in the context of a function. For example, we would forbid:

    !named = !{!0}
    !0 = !DIExpression()

Instead we would only accept the equivalent inlined version:

    !named = !{!DIExpression()}

This essentially removes the ability to create a `distinct` DIExpression
by construction, as there is no syntax for `distinct` inline.

If this patch is accepted as-is, the result would be that the
non-canonical version is accepted, but the following would be an error
and produce a diagnostic:

    !named = !{!0}
    ; error: 'distinct' not allowed for !DIExpression()
    !0 = distinct !DIExpression()

Also update some documentation to consistently use the inline syntax for
DIExpression, and to describe the restrictions on `distinct` for nodes
where applicable.

Reviewed By: StephenTozer, t-tye

Differential Revision: https://reviews.llvm.org/D104827

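
The idea of driving the `distinct` checks from per-node properties can be illustrated with a small, self-contained C++ sketch. It is not the code from this patch: the macro `SKETCH_MDNODE_KINDS`, the enum `DistinctRule`, and the functions `ruleFor` and `checkDistinct` are hypothetical names standing in for Metadata.def and the parser. The first diagnostic string is the one quoted above; the wording of the "missing" diagnostic is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for Metadata.def: each entry pairs a node class with
// the constraint it places on the `distinct` property.
#define SKETCH_MDNODE_KINDS(X)                                                 \
  X(DIExpression, MustBeUniqued)                                               \
  X(DIArgList, MustBeUniqued)                                                  \
  X(DICompileUnit, MustBeDistinct)                                             \
  X(DILocation, Unconstrained)

enum class DistinctRule { Unconstrained, MustBeUniqued, MustBeDistinct };

// Expand the table into a name-to-rule lookup. Unknown names are treated as
// unconstrained in this sketch.
inline DistinctRule ruleFor(std::string_view Name) {
#define X(CLASS, RULE)                                                         \
  if (Name == #CLASS)                                                          \
    return DistinctRule::RULE;
  SKETCH_MDNODE_KINDS(X)
#undef X
  return DistinctRule::Unconstrained;
}

// Return a diagnostic message if the node violates its rule, or std::nullopt
// if the combination of class and `distinct` is acceptable.
inline std::optional<std::string> checkDistinct(std::string_view Name,
                                                bool IsDistinct) {
  switch (ruleFor(Name)) {
  case DistinctRule::MustBeUniqued:
    if (IsDistinct)
      return "'distinct' not allowed for !" + std::string(Name) + "()";
    break;
  case DistinctRule::MustBeDistinct:
    if (!IsDistinct) // Message wording here is an assumption.
      return "missing 'distinct', required for !" + std::string(Name);
    break;
  case DistinctRule::Unconstrained:
    break;
  }
  return std::nullopt;
}
```

With this table, `checkDistinct("DIExpression", true)` yields the diagnostic shown above, while `checkDistinct("DICompileUnit", true)` yields no diagnostic. A node class added later would only need a new table entry to share the same check.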
1 parent 181763d commit ee76525

18 files changed, +585 -394 lines changed

llvm/docs/LangRef.rst

Lines changed: 32 additions & 28 deletions
@@ -5200,21 +5200,22 @@ metadata nodes are related to debug info.
 DICompileUnit
 """""""""""""
 
-``DICompileUnit`` nodes represent a compile unit. The ``enums:``,
-``retainedTypes:``, ``globals:``, ``imports:`` and ``macros:`` fields are tuples
-containing the debug info to be emitted along with the compile unit, regardless
-of code optimizations (some nodes are only emitted if there are references to
-them from instructions). The ``debugInfoForProfiling:`` field is a boolean
-indicating whether or not line-table discriminators are updated to provide
-more-accurate debug info for profiling results.
+``DICompileUnit`` nodes represent a compile unit. ``DICompileUnit`` nodes must
+be ``distinct``. The ``enums:``, ``retainedTypes:``, ``globals:``, ``imports:``
+and ``macros:`` fields are tuples containing the debug info to be emitted along
+with the compile unit, regardless of code optimizations (some nodes are only
+emitted if there are references to them from instructions). The
+``debugInfoForProfiling:`` field is a boolean indicating whether or not
+line-table discriminators are updated to provide more-accurate debug info for
+profiling results.
 
 .. code-block:: text
 
-    !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang",
-                        isOptimized: true, flags: "-O2", runtimeVersion: 2,
-                        splitDebugFilename: "abc.debug", emissionKind: FullDebug,
-                        enums: !2, retainedTypes: !3, globals: !4, imports: !5,
-                        macros: !6, dwoId: 0x0abcd)
+    !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang",
+                                 isOptimized: true, flags: "-O2", runtimeVersion: 2,
+                                 splitDebugFilename: "abc.debug", emissionKind: FullDebug,
+                                 enums: !2, retainedTypes: !3, globals: !4, imports: !5,
+                                 macros: !6, dwoId: 0x0abcd)
 
 Compile unit descriptors provide the root scope for objects declared in a
 specific compilation unit. File descriptors are defined using this scope. These
@@ -5625,12 +5626,14 @@ DIExpression
 """"""""""""
 
 ``DIExpression`` nodes represent expressions that are inspired by the DWARF
-expression language. They are used in :ref:`debug intrinsics<dbg_intrinsics>`
-(such as ``llvm.dbg.declare`` and ``llvm.dbg.value``) to describe how the
-referenced LLVM variable relates to the source language variable. Debug
-intrinsics are interpreted left-to-right: start by pushing the value/address
-operand of the intrinsic onto a stack, then repeatedly push and evaluate
-opcodes from the DIExpression until the final variable description is produced.
+expression language. ``DIExpression`` nodes must not be ``distinct``, and are
+canonically printed inline at each use. They are used in :ref:`debug
+intrinsics<dbg_intrinsics>` (such as ``llvm.dbg.declare`` and
+``llvm.dbg.value``) to describe how the referenced LLVM variable relates to the
+source language variable. Debug intrinsics are interpreted left-to-right: start
+by pushing the value/address operand of the intrinsic onto a stack, then
+repeatedly push and evaluate opcodes from the DIExpression until the final
+variable description is produced.
 
 The current supported opcode vocabulary is limited:
 
@@ -5708,23 +5711,23 @@ The current supported opcode vocabulary is limited:
 
     IR for "*ptr = 4;"
     --------------
-      call void @llvm.dbg.value(metadata i32 4, metadata !17, metadata !20)
+      call void @llvm.dbg.value(metadata i32 4, metadata !17,
+                                metadata !DIExpression(DW_OP_LLVM_implicit_pointer)))
       !17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
                              type: !18)
       !18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
       !19 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
-      !20 = !DIExpression(DW_OP_LLVM_implicit_pointer))
 
     IR for "**ptr = 4;"
     --------------
-      call void @llvm.dbg.value(metadata i32 4, metadata !17, metadata !21)
+      call void @llvm.dbg.value(metadata i32 4, metadata !17,
+                                metadata !DIExpression(DW_OP_LLVM_implicit_pointer,
+                                                       DW_OP_LLVM_implicit_pointer)))
       !17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
                              type: !18)
       !18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
       !19 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !20, size: 64)
       !20 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
-      !21 = !DIExpression(DW_OP_LLVM_implicit_pointer,
-                          DW_OP_LLVM_implicit_pointer))
 
 DWARF specifies three kinds of simple location descriptions: Register, memory,
 and implicit location descriptions. Note that a location description is
@@ -5765,12 +5768,13 @@ valid debug intrinsic.
 DIArgList
 """"""""""""
 
-``DIArgList`` nodes hold a list of constant or SSA value references. These are
-used in :ref:`debug intrinsics<dbg_intrinsics>` (currently only in
+``DIArgList`` nodes hold a list of constant or SSA value references.
+``DIArgList`` must not be ``distinct``, must only be used as an argument to a
+function call, and must appear inline at each use. ``DIArgList`` may refer to
+function-local values of the containing function. ``DIArgList`` nodes are used
+in :ref:`debug intrinsics<dbg_intrinsics>` (currently only in
 ``llvm.dbg.value``) in combination with a ``DIExpression`` that uses the
-``DW_OP_LLVM_arg`` operator. Because a DIArgList may refer to local values
-within a function, it must only be used as a function argument, must always be
-inlined, and cannot appear in named metadata.
+``DW_OP_LLVM_arg`` operator.
 
 .. code-block:: text
 
llvm/docs/SourceLevelDebugging.rst

Lines changed: 40 additions & 41 deletions
@@ -291,17 +291,17 @@ Compiled to LLVM, this function would be represented like this:
     %X = alloca i32, align 4
     %Y = alloca i32, align 4
     %Z = alloca i32, align 4
-    call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
-    store i32 21, i32* %X, align 4, !dbg !14
-    call void @llvm.dbg.declare(metadata i32* %Y, metadata !15, metadata !13), !dbg !16
-    store i32 22, i32* %Y, align 4, !dbg !16
-    call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
-    store i32 23, i32* %Z, align 4, !dbg !19
-    %0 = load i32, i32* %X, align 4, !dbg !20
-    store i32 %0, i32* %Z, align 4, !dbg !21
-    %1 = load i32, i32* %Y, align 4, !dbg !22
-    store i32 %1, i32* %X, align 4, !dbg !23
-    ret void, !dbg !24
+    call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !DIExpression()), !dbg !13
+    store i32 21, i32* %X, align 4, !dbg !13
+    call void @llvm.dbg.declare(metadata i32* %Y, metadata !14, metadata !DIExpression()), !dbg !15
+    store i32 22, i32* %Y, align 4, !dbg !15
+    call void @llvm.dbg.declare(metadata i32* %Z, metadata !16, metadata !DIExpression()), !dbg !18
+    store i32 23, i32* %Z, align 4, !dbg !18
+    %0 = load i32, i32* %X, align 4, !dbg !19
+    store i32 %0, i32* %Z, align 4, !dbg !20
+    %1 = load i32, i32* %Y, align 4, !dbg !21
+    store i32 %1, i32* %X, align 4, !dbg !22
+    ret void, !dbg !23
   }
 
   ; Function Attrs: nounwind readnone
@@ -327,18 +327,17 @@ Compiled to LLVM, this function would be represented like this:
   !10 = !{!"clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)"}
   !11 = !DILocalVariable(name: "X", scope: !4, file: !1, line: 2, type: !12)
   !12 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
-  !13 = !DIExpression()
-  !14 = !DILocation(line: 2, column: 9, scope: !4)
-  !15 = !DILocalVariable(name: "Y", scope: !4, file: !1, line: 3, type: !12)
-  !16 = !DILocation(line: 3, column: 9, scope: !4)
-  !17 = !DILocalVariable(name: "Z", scope: !18, file: !1, line: 5, type: !12)
-  !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
-  !19 = !DILocation(line: 5, column: 11, scope: !18)
-  !20 = !DILocation(line: 6, column: 11, scope: !18)
-  !21 = !DILocation(line: 6, column: 9, scope: !18)
-  !22 = !DILocation(line: 8, column: 9, scope: !4)
-  !23 = !DILocation(line: 8, column: 7, scope: !4)
-  !24 = !DILocation(line: 9, column: 3, scope: !4)
+  !13 = !DILocation(line: 2, column: 9, scope: !4)
+  !14 = !DILocalVariable(name: "Y", scope: !4, file: !1, line: 3, type: !12)
+  !15 = !DILocation(line: 3, column: 9, scope: !4)
+  !16 = !DILocalVariable(name: "Z", scope: !17, file: !1, line: 5, type: !12)
+  !17 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
+  !18 = !DILocation(line: 5, column: 11, scope: !17)
+  !19 = !DILocation(line: 6, column: 11, scope: !17)
+  !20 = !DILocation(line: 6, column: 9, scope: !17)
+  !21 = !DILocation(line: 8, column: 9, scope: !4)
+  !22 = !DILocation(line: 8, column: 7, scope: !4)
+  !23 = !DILocation(line: 9, column: 3, scope: !4)
 
 
 This example illustrates a few important details about LLVM debugging
@@ -349,21 +348,21 @@ variable definitions, and the code used to implement the function.
 
 .. code-block:: llvm
 
-  call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
+  call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !DIExpression()), !dbg !13
     ; [debug line = 2:7] [debug variable = X]
 
 The first intrinsic ``%llvm.dbg.declare`` encodes debugging information for the
-variable ``X``. The metadata ``!dbg !14`` attached to the intrinsic provides
+variable ``X``. The metadata ``!dbg !13`` attached to the intrinsic provides
 scope information for the variable ``X``.
 
 .. code-block:: text
 
-  !14 = !DILocation(line: 2, column: 9, scope: !4)
+  !13 = !DILocation(line: 2, column: 9, scope: !4)
   !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5,
                               isLocal: false, isDefinition: true, scopeLine: 1,
                               isOptimized: false, retainedNodes: !2)
 
-Here ``!14`` is metadata providing `location information
+Here ``!13`` is metadata providing `location information
 <LangRef.html#dilocation>`_. In this example, scope is encoded by ``!4``, a
 `subprogram descriptor <LangRef.html#disubprogram>`_. This way the location
 information attached to the intrinsics indicates that the variable ``X`` is
@@ -373,20 +372,20 @@ Now lets take another example.
 
 .. code-block:: llvm
 
-  call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
+  call void @llvm.dbg.declare(metadata i32* %Z, metadata !16, metadata !DIExpression()), !dbg !18
     ; [debug line = 5:9] [debug variable = Z]
 
 The third intrinsic ``%llvm.dbg.declare`` encodes debugging information for
-variable ``Z``. The metadata ``!dbg !19`` attached to the intrinsic provides
+variable ``Z``. The metadata ``!dbg !18`` attached to the intrinsic provides
 scope information for the variable ``Z``.
 
 .. code-block:: text
 
-  !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
-  !19 = !DILocation(line: 5, column: 11, scope: !18)
+  !17 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
+  !18 = !DILocation(line: 5, column: 11, scope: !17)
 
-Here ``!19`` indicates that ``Z`` is declared at line number 5 and column
-number 11 inside of lexical scope ``!18``. The lexical scope itself resides
+Here ``!18`` indicates that ``Z`` is declared at line number 5 and column
+number 11 inside of lexical scope ``!17``. The lexical scope itself resides
 inside of subprogram ``!4`` described above.
 
 The scope information attached with each instruction provides a straightforward
@@ -802,14 +801,14 @@ presents several difficulties:
     br label %exit, !dbg !26
 
   truebr:
-    call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !24
-    call void @llvm.dbg.value(metadata i32 1, metadata !23, metadata !DIExpression()), !dbg !24
+    call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !23
+    call void @llvm.dbg.value(metadata i32 1, metadata !22, metadata !DIExpression()), !dbg !23
     %value1 = add i32 %input, 1
     br label %bb1
 
   falsebr:
-    call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !24
-    call void @llvm.dbg.value(metadata i32 2, metadata !23, metadata !DIExpression()), !dbg !24
+    call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !23
+    call void @llvm.dbg.value(metadata i32 2, metadata !22, metadata !DIExpression()), !dbg !23
     %value = add i32 %input, 2
     br label %bb1
 
@@ -820,7 +819,7 @@ presents several difficulties:
 Here the difficulties are:
 
 * The control flow is roughly the opposite of basic block order
-* The value of the ``!23`` variable merges into ``%bb1``, but there is no PHI
+* The value of the ``!22`` variable merges into ``%bb1``, but there is no PHI
   node
 
 As mentioned above, the ``llvm.dbg.value`` intrinsics essentially form an
@@ -833,9 +832,9 @@ location, which would lead to a large number of debugging intrinsics being
 generated.
 
 Examining the example above, variable ``!30`` is assigned ``%input`` on both
-conditional paths through the function, while ``!23`` is assigned differing
+conditional paths through the function, while ``!22`` is assigned differing
 constant values on either path. Where control flow merges in ``%bb1`` we would
-want ``!30`` to keep its location (``%input``), but ``!23`` to become undefined
+want ``!30`` to keep its location (``%input``), but ``!22`` to become undefined
 as we cannot determine at runtime what value it should have in %bb1 without
 inserting a PHI node. mem2reg does not insert the PHI node to avoid changing
 codegen when debugging is enabled, and does not insert the other dbg.values
@@ -854,7 +853,7 @@ DbgEntityHistoryCalculator) to build a map of each instruction to every
 valid variable location, without the need to consider control flow. From
 the example above, it is otherwise difficult to determine that the location
 of variable ``!30`` should flow "up" into block ``%bb1``, but that the location
-of variable ``!23`` should not flow "down" into the ``%exit`` block.
+of variable ``!22`` should not flow "down" into the ``%exit`` block.
 
 .. _ccxx_frontend:
 
llvm/include/llvm/AsmParser/LLParser.h

Lines changed: 2 additions & 1 deletion
@@ -520,7 +520,8 @@ namespace llvm {
     template <class ParserTy> bool parseMDFieldsImplBody(ParserTy ParseField);
     template <class ParserTy>
     bool parseMDFieldsImpl(ParserTy ParseField, LocTy &ClosingLoc);
-    bool parseSpecializedMDNode(MDNode *&N, bool IsDistinct = false);
+    bool parseSpecializedMDNode(MDNode *&N, bool IsDistinct = false,
+                                LocTy DistinctLoc = LocTy());
 
 #define HANDLE_SPECIALIZED_MDNODE_LEAF(CLASS)                                  \
   bool parse##CLASS(MDNode *&Result, bool IsDistinct);
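
The new `DistinctLoc` parameter suggests that the parser remembers where the `distinct` keyword was seen, so that a diagnostic can point at it. The following standalone C++ sketch shows that idea on a toy input; it does not use LLParser, and `Diagnostic` and `checkNodeDefinition` are hypothetical names introduced only for illustration. It handles only the `!DIExpression(` case from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical diagnostic: a column offset into the input plus a message.
struct Diagnostic {
  std::size_t Column;
  std::string Message;
};

// Toy check for the right-hand side of a metadata definition, e.g. the text
// after "!0 = ". Records the position of `distinct` and, if the node that
// follows is a DIExpression, reports the error at that position.
inline std::optional<Diagnostic> checkNodeDefinition(std::string_view Rhs) {
  constexpr std::string_view Keyword = "distinct ";
  constexpr std::string_view Uniqued = "!DIExpression(";

  std::size_t Pos = Rhs.find_first_not_of(' ');
  if (Pos == std::string_view::npos)
    return std::nullopt; // Empty input: nothing to check in this sketch.

  bool IsDistinct = false;
  std::size_t DistinctLoc = std::string_view::npos;
  if (Rhs.substr(Pos, Keyword.size()) == Keyword) {
    IsDistinct = true;
    DistinctLoc = Pos; // Remember where the keyword started.
    Pos += Keyword.size();
  }

  std::string_view Rest = Rhs.substr(Pos);
  if (IsDistinct && Rest.substr(0, Uniqued.size()) == Uniqued)
    return Diagnostic{DistinctLoc,
                      "'distinct' not allowed for !DIExpression()"};
  return std::nullopt;
}
```

For the input `distinct !DIExpression()` the sketch reports the error at column 0, the start of the keyword, rather than at the node name that follows it.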
