@@ -448,11 +448,34 @@ See ``DW_AT_LLVM_vector_size`` in :ref:`amdgpu-dwarf-base-type-entries`.
 
 AMDGPU optimized code may spill vector registers to non-global address space
 memory, and this spilling may be done only for SIMT lanes that are active on
-entry to the subprogram.
-
-To support this, a composite location description that can be created as a
-masked select is required. In addition, an operation that creates a composite
+entry to the subprogram. To support this the CFI rule for the partially spilled
+register needs to use an expression that uses the EXEC register as a bit mask to
+select between the register (for inactive lanes) and the stack spill location
+(for active lanes that are spilled). This needs to evaluate to a location
+description, and not a value, as a debugger needs to change the value if the
+user assigns to the variable.
+
+Another usage is to create an expression that evaluates to provide a vector of
+logical PCs for active and inactive lanes in a SIMT execution model. Again the
+EXEC register is used to select between active and inactive PC values. In order
+to represent a vector of PC values, a way to create a composite location
+description that is a vector of a single location is used.
+
+It may be possible to use existing DWARF to incrementally build the composite
+location description, possibly using the DWARF operations for control flow to
+create a loop. However, for the AMDGPU that would require loop iteration of 64.
+A concern is that the resulting DWARF would have a significant size and would be
+reasonably common as it is needed for every vector register that is spilled in a
+function. AMDGPU can have up to 512 vector registers. Another concern is the
+time taken to evaluate such non-trivial expressions repeatedly.
+
+To avoid these issues, a composite location description that can be created as a
+masked select is proposed. In addition, an operation that creates a composite
 location description that is a vector on another location description is needed.
+These operations generate the composite location description using a single
+DWARF operation that combines all lanes of the vector in one step. The DWARF
+expression is more compact, and can be evaluated by a consumer far more
+efficiently.
 
 An example that uses these operations is referenced in the
 :ref:`amdgpu-dwarf-further-examples` appendix.
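
Reviewer note: the masked select described in the added text can be illustrated with a small model. The following is a minimal Python sketch, not DWARF and not the proposed operation's actual definition. The names ``masked_select``, ``read``, ``write``, the ``("storage", index)`` location tuples, and the ``v0``/``spill`` storage names are all hypothetical and exist only for illustration. It assumes a 64-lane wavefront, with bit ``n`` of the EXEC mask set when lane ``n`` was active on entry.

```python
from typing import Dict, List, Tuple

WAVE_SIZE = 64  # assumed lanes per wavefront, matching the "64" in the text

# Hypothetical model: a location is (storage name, element index).
Location = Tuple[str, int]


def masked_select(exec_mask: int,
                  if_clear: List[Location],
                  if_set: List[Location]) -> List[Location]:
    """Build a per-lane composite: take if_set[lane] where the mask bit is 1,
    otherwise if_clear[lane]. The result holds locations, not values."""
    return [
        if_set[lane] if (exec_mask >> lane) & 1 else if_clear[lane]
        for lane in range(WAVE_SIZE)
    ]


def read(storage: Dict[str, List[int]], loc: Location) -> int:
    return storage[loc[0]][loc[1]]


def write(storage: Dict[str, List[int]], loc: Location, value: int) -> None:
    storage[loc[0]][loc[1]] = value


# Toy machine state: a vector register and its stack spill area.
storage = {
    "v0": [100 + lane for lane in range(WAVE_SIZE)],
    "spill": [200 + lane for lane in range(WAVE_SIZE)],
}
register_lanes = [("v0", lane) for lane in range(WAVE_SIZE)]
spill_slots = [("spill", lane) for lane in range(WAVE_SIZE)]

exec_mask = 0b0101  # lanes 0 and 2 were active on entry, so they were spilled

# Inactive lanes stay in the register; active lanes resolve to the spill slot.
composite = masked_select(exec_mask, register_lanes, spill_slots)

# Because each lane is a location, a debugger-style assignment writes through
# to whichever storage currently holds that lane.
write(storage, composite[2], -1)  # lane 2 is active: updates the spill slot
write(storage, composite[1], -2)  # lane 1 is inactive: updates the register
```

The sketch builds the whole composite in one call, which mirrors the motivation in the added text: a single operation covering all lanes instead of a 64-iteration loop assembled from existing DWARF control flow operations. Returning locations instead of values is what lets the two ``write`` calls reach the right storage.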