Commit 0e42df4

[AMDGPU][NFC] DWARF vector composite location description operations (llvm#71623)
Summary: Add description to AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst for "DWARF Operations to Create Vector Composite Location Descriptions" proposal to explain the main motivation is to facilitate more compact DWARF that is faster to evaluate.
Reviewers: kzhuravl, scott.linder, zoran.zaric
Subscribers:
1 parent 83729e6 commit 0e42df4

1 file changed: +27 -4 lines changed


llvm/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst

Lines changed: 27 additions & 4 deletions
@@ -448,11 +448,34 @@ See ``DW_AT_LLVM_vector_size`` in :ref:`amdgpu-dwarf-base-type-entries`.
 
 AMDGPU optimized code may spill vector registers to non-global address space
 memory, and this spilling may be done only for SIMT lanes that are active on
-entry to the subprogram.
-
-To support this, a composite location description that can be created as a
-masked select is required. In addition, an operation that creates a composite
+entry to the subprogram. To support this the CFI rule for the partially spilled
+register needs to use an expression that uses the EXEC register as a bit mask to
+select between the register (for inactive lanes) and the stack spill location
+(for active lanes that are spilled). This needs to evaluate to a location
+description, and not a value, as a debugger needs to change the value if the
+user assigns to the variable.
+
+Another usage is to create an expression that evaluates to provide a vector of
+logical PCs for active and inactive lanes in a SIMT execution model. Again the
+EXEC register is used to select between active and inactive PC values. In order
+to represent a vector of PC values, a way to create a composite location
+description that is a vector of a single location is used.
+
+It may be possible to use existing DWARF to incrementally build the composite
+location description, possibly using the DWARF operations for control flow to
+create a loop. However, for the AMDGPU that would require loop iteration of 64.
+A concern is that the resulting DWARF would have a significant size and would be
+reasonably common as it is needed for every vector register that is spilled in a
+function. AMDGPU can have up to 512 vector registers. Another concern is the
+time taken to evaluate such non-trivial expressions repeatedly.
+
+To avoid these issues, a composite location description that can be created as a
+masked select is proposed. In addition, an operation that creates a composite
 location description that is a vector on another location description is needed.
+These operations generate the composite location description using a single
+DWARF operation that combines all lanes of the vector in one step. The DWARF
+expression is more compact, and can be evaluated by a consumer far more
+efficiently.
 
 An example that uses these operations is referenced in the
 :ref:`amdgpu-dwarf-further-examples` appendix.
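
Reader's note: the masked select described in the added text can be pictured with a small Python model. This is only an illustrative sketch, not DWARF and not the encoding of the proposed operations. The names ``Piece`` and ``select_by_mask``, the 4-byte lane size, and the byte offsets are all hypothetical. The 64 lanes follow the "loop iteration of 64" remark in the diff.

```python
from dataclasses import dataclass
from typing import List

LANES = 64  # taken from the "loop iteration of 64" remark in the diff


@dataclass(frozen=True)
class Piece:
    """One lane-sized part of a composite location (hypothetical model)."""
    storage: str  # "register" or "memory"
    offset: int   # byte offset of this lane within that storage


def select_by_mask(exec_mask: int, reg_base: int, spill_base: int,
                   lane_size: int = 4) -> List[Piece]:
    """Build a per-lane composite location from an EXEC-style bit mask.

    Lanes whose mask bit is set (active, spilled) map to the stack spill
    location; lanes whose bit is clear (inactive) stay in the register.
    lane_size is an assumed lane width in bytes.
    """
    pieces = []
    for lane in range(LANES):
        lane_offset = lane * lane_size
        if (exec_mask >> lane) & 1:
            pieces.append(Piece("memory", spill_base + lane_offset))
        else:
            pieces.append(Piece("register", reg_base + lane_offset))
    return pieces


# Lanes 0 and 2 active on entry: they were spilled, lane 1 was not.
composite = select_by_mask(exec_mask=0b101, reg_base=0, spill_base=256)
```

In this model the per-lane loop runs inside the consumer as one step, which is the point the commit makes: the producer emits a single operation instead of DWARF that iterates over every lane. Because the result is a list of locations rather than values, a debugger could write through it when the user assigns to the variable.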
