Skip to content

[AMDGPU][NFC] DWARF vector composite location description operations #71623

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 13, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 27 additions & 4 deletions llvm/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst
Original file line number Diff line number Diff line change
Expand Up @@ -448,11 +448,34 @@ See ``DW_AT_LLVM_vector_size`` in :ref:`amdgpu-dwarf-base-type-entries`.

AMDGPU optimized code may spill vector registers to non-global address space
memory, and this spilling may be done only for SIMT lanes that are active on
entry to the subprogram.

To support this, a composite location description that can be created as a
masked select is required. In addition, an operation that creates a composite
entry to the subprogram. To support this the CFI rule for the partially spilled
register needs to use an expression that uses the EXEC register as a bit mask to
select between the register (for inactive lanes) and the stack spill location
(for active lanes that are spilled). This needs to evaluate to a location
description, and not a value, as a debugger needs to change the value if the
user assigns to the variable.

Another usage is to create an expression that evaluates to provide a vector of
logical PCs for active and inactive lanes in a SIMT execution model. Again the
EXEC register is used to select between active and inactive PC values. In order
to represent a vector of PC values, a way to create a composite location
description that is a vector of a single location is used.

It may be possible to use existing DWARF to incrementally build the composite
location description, possibly using the DWARF operations for control flow to
create a loop. However, for the AMDGPU that would require loop iteration of 64.
A concern is that the resulting DWARF would have a significant size and would be
reasonably common as it is needed for every vector register that is spilled in a
function. AMDGPU can have up to 512 vector registers. Another concern is the
time taken to evaluate such non-trivial expressions repeatedly.

To avoid these issues, a composite location description that can be created as a
masked select is proposed. In addition, an operation that creates a composite
location description that is a vector on another location description is needed.
These operations generate the composite location description using a single
DWARF operation that combines all lanes of the vector in one step. The DWARF
expression is more compact, and can be evaluated by a consumer far more
efficiently.

An example that uses these operations is referenced in the
:ref:`amdgpu-dwarf-further-examples` appendix.
Expand Down