llvm · t-tye · Nov 13, 2023 · Nov 3, 2023
diff --git a/llvm/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst b/llvm/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst
@@ -448,11 +448,34 @@ See ``DW_AT_LLVM_vector_size`` in :ref:`amdgpu-dwarf-base-type-entries`.
 
 AMDGPU optimized code may spill vector registers to non-global address space
 memory, and this spilling may be done only for SIMT lanes that are active on
-entry to the subprogram.
-
-To support this, a composite location description that can be created as a
-masked select is required. In addition, an operation that creates a composite
+entry to the subprogram. To support this the CFI rule for the partially spilled
+register needs to use an expression that uses the EXEC register as a bit mask to
+select between the register (for inactive lanes) and the stack spill location
+(for active lanes that are spilled). This needs to evaluate to a location
+description, and not a value, as a debugger needs to change the value if the
+user assigns to the variable.
+
+Another usage is to create an expression that evaluates to provide a vector of
+logical PCs for active and inactive lanes in a SIMT execution model. Again the
+EXEC register is used to select between active and inactive PC values. In order
+to represent a vector of PC values, a way to create a composite location
+description that is a vector of a single location is used.
+
+It may be possible to use existing DWARF to incrementally build the composite
+location description, possibly using the DWARF operations for control flow to
+create a loop. However, for the AMDGPU that would require loop iteration of 64.
+A concern is that the resulting DWARF would have a significant size and would be
+reasonably common as it is needed for every vector register that is spilled in a
+function. AMDGPU can have up to 512 vector registers. Another concern is the
+time taken to evaluate such non-trivial expressions repeatedly.
+
+To avoid these issues, a composite location description that can be created as a
+masked select is proposed. In addition, an operation that creates a composite
 location description that is a vector on another location description is needed.
+These operations generate the composite location description using a single
+DWARF operation that combines all lanes of the vector in one step. The DWARF
+expression is more compact, and can be evaluated by a consumer far more
+efficiently.
 
 An example that uses these operations is referenced in the
 :ref:`amdgpu-dwarf-further-examples` appendix.