Commit 90a9d04

AMDGPU: Add amdgpu-num-agpr attribute to control AGPR allocation
This provides a range to decide how to subdivide the vector register budget on gfx90a+. A single value declares the minimum number of AGPRs that should be allocatable. Eventually this should replace amdgpu-no-agpr. I want this primarily for testing AGPR allocation behavior. We should have a heuristic that tries to detect a reasonable number of AGPRs to keep allocatable.
1 parent 1c4e986 commit 90a9d04

3 files changed: +569 -11 lines changed

llvm/docs/AMDGPUUsage.rst

Lines changed: 16 additions & 0 deletions
@@ -1702,6 +1702,22 @@ The AMDGPU backend supports the following LLVM IR attributes.
                                    function which requires AGPRs is reached through any function marked
                                    with this attribute.
 
+     "amdgpu-num-agpr"="min(,max)" Indicates a minimum and maximum range for the number of AGPRs to make
+                                   available to allocate. The values will be rounded up to the next multiple
+                                   of the allocation granularity (4). The minimum value is interpreted as the
+                                   minimum number of AGPRs the function will require to allocate. If only one
+                                   value is specified, it is interpreted as the minimum register budget.
+
+                                   The values may be ignored if satisfying it would violate other allocation
+                                   constraints.
+
+                                   The behavior is undefined if a function which requires more AGPRs than the
+                                   lower bound is reached through any function marked with a higher value of this
+                                   attribute. A minimum value of 0 indicates the function does not require
+                                   any AGPRs. A minimum of 0 is equivalent to "amdgpu-no-agpr".
+
+                                   This is only relevant on targets with AGPRs which support accum_offset (gfx90a+).
+
      "amdgpu-hidden-argument"      This attribute is used internally by the backend to mark function arguments
                                    as hidden. Hidden arguments are managed by the compiler and are not part of
                                    the explicit arguments supplied by the user.
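
To illustrate the `min(,max)` value format described in the added documentation, here is a minimal C++ sketch of parsing such a string, where the maximum is optional. The names `parseNumAGPRAttr`, `parseUnsigned`, and `Unspecified` are hypothetical and exist only for this example. This is not the backend's `AMDGPU::getIntegerPairAttribute`, and choices such as rejecting whitespace are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <utility>

// Hypothetical sentinel meaning "no value given"; it mirrors the ~0u default
// used by the commit, but the name is an assumption of this sketch.
constexpr unsigned Unspecified = ~0u;

// Parses a non-empty run of decimal digits into Out.
// Returns false on empty input, a non-digit character, or overflow.
bool parseUnsigned(const std::string &Text, unsigned &Out) {
  if (Text.empty())
    return false;
  unsigned long long Value = 0;
  for (char C : Text) {
    if (C < '0' || C > '9')
      return false;
    Value = Value * 10 + static_cast<unsigned>(C - '0');
    if (Value > std::numeric_limits<unsigned>::max())
      return false;
  }
  Out = static_cast<unsigned>(Value);
  return true;
}

// Parses "min" or "min,max". Only the first value is required; a missing
// maximum is reported as Unspecified. Malformed input yields
// {Unspecified, Unspecified}, i.e. the attribute is treated as absent.
std::pair<unsigned, unsigned> parseNumAGPRAttr(const std::string &Text) {
  const std::pair<unsigned, unsigned> Default(Unspecified, Unspecified);
  const std::size_t Comma = Text.find(',');

  unsigned Min = 0;
  if (!parseUnsigned(Text.substr(0, Comma), Min))
    return Default;
  if (Comma == std::string::npos)
    return std::make_pair(Min, Unspecified);

  unsigned Max = 0;
  if (!parseUnsigned(Text.substr(Comma + 1), Max))
    return Default;
  return std::make_pair(Min, Max);
}
```

With this sketch, `parseNumAGPRAttr("32")` yields a minimum of 32 with no maximum, while `parseNumAGPRAttr("32,64")` yields both bounds. Rounding to the allocation granularity is left to the consumer of the pair.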

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

Lines changed: 45 additions & 11 deletions
@@ -572,9 +572,10 @@ MCRegister SIRegisterInfo::reservedPrivateSegmentBufferReg(
 std::pair<unsigned, unsigned>
 SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
   const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
-  unsigned MaxNumVGPRs = ST.getMaxNumVGPRs(MF);
-  unsigned MaxNumAGPRs = MaxNumVGPRs;
-  unsigned TotalNumVGPRs = AMDGPU::VGPR_32RegClass.getNumRegs();
+  const unsigned MaxVectorRegs = ST.getMaxNumVGPRs(MF);
+
+  unsigned MaxNumVGPRs = MaxVectorRegs;
+  unsigned MaxNumAGPRs = 0;
 
   // On GFX90A, the number of VGPRs and AGPRs need not be equal. Theoretically,
   // a wave may have up to 512 total vector registers combining together both
@@ -585,16 +586,49 @@ SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
   // TODO: it shall be possible to estimate maximum AGPR/VGPR pressure and split
   // register file accordingly.
   if (ST.hasGFX90AInsts()) {
-    if (MFI->usesAGPRs(MF)) {
-      MaxNumVGPRs /= 2;
-      MaxNumAGPRs = MaxNumVGPRs;
+    unsigned MinNumAGPRs = 0;
+    const unsigned TotalNumAGPRs = AMDGPU::AGPR_32RegClass.getNumRegs();
+    const unsigned TotalNumVGPRs = AMDGPU::VGPR_32RegClass.getNumRegs();
+
+    const std::pair<unsigned, unsigned> DefaultNumAGPR = {~0u, ~0u};
+
+    // TODO: Replace amdgpu-no-agpr with amdgpu-num-agpr=0
+    // TODO: Move this logic into subtarget on IR function
+    //
+    // TODO: The lower bound should probably force the number of required
+    // registers up, overriding amdgpu-waves-per-eu.
+    std::tie(MinNumAGPRs, MaxNumAGPRs) = AMDGPU::getIntegerPairAttribute(
+        MF.getFunction(), "amdgpu-num-agpr", DefaultNumAGPR,
+        /*OnlyFirstRequired=*/true);
+
+    if (MinNumAGPRs == DefaultNumAGPR.first) {
+      // Default to splitting half the registers if AGPRs are required.
+
+      if (MFI->usesAGPRs(MF))
+        MinNumAGPRs = MaxNumAGPRs = MaxVectorRegs / 2;
+      else
+        MinNumAGPRs = 0;
     } else {
-      if (MaxNumVGPRs > TotalNumVGPRs) {
-        MaxNumAGPRs = MaxNumVGPRs - TotalNumVGPRs;
-        MaxNumVGPRs = TotalNumVGPRs;
-      } else
-        MaxNumAGPRs = 0;
+      // Align to accum_offset's allocation granularity.
+      MinNumAGPRs = alignTo(MinNumAGPRs, 4);
+
+      MinNumAGPRs = std::min(MinNumAGPRs, TotalNumAGPRs);
     }
+
+    // Clamp values to be inbounds of our limits, and ensure min <= max.
+
+    MaxNumAGPRs = std::min(std::max(MinNumAGPRs, MaxNumAGPRs), MaxVectorRegs);
+    MinNumAGPRs = std::min(std::min(MinNumAGPRs, TotalNumAGPRs), MaxNumAGPRs);
+
+    MaxNumVGPRs = std::min(MaxVectorRegs - MinNumAGPRs, TotalNumVGPRs);
+    MaxNumAGPRs = std::min(MaxVectorRegs - MaxNumVGPRs, MaxNumAGPRs);
+
+    assert(MaxNumVGPRs + MaxNumAGPRs <= MaxVectorRegs &&
+           MaxNumAGPRs <= TotalNumAGPRs && MaxNumVGPRs <= TotalNumVGPRs &&
+           "invalid register counts");
+  } else if (ST.hasMAIInsts()) {
+    // On gfx908 the number of AGPRs always equals the number of VGPRs.
+    MaxNumAGPRs = MaxNumVGPRs = MaxVectorRegs;
   }
 
   return std::pair(MaxNumVGPRs, MaxNumAGPRs);
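
The clamping arithmetic in the gfx90a+ branch above can be followed more easily outside the backend. The following is a standalone sketch, not the committed code: `VectorRegLimits`, `splitVectorRegs`, `alignUp`, and `NoValue` are hypothetical names, and the limits the backend queries from the subtarget and register classes are passed in as plain numbers. It assumes an absent attribute value is represented as `~0u`, as in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-ins for values the backend queries at compile time.
struct VectorRegLimits {
  unsigned MaxVectorRegs; // combined VGPR + AGPR budget for the function
  unsigned TotalNumVGPRs; // number of registers in the VGPR class
  unsigned TotalNumAGPRs; // number of registers in the AGPR class
};

constexpr unsigned NoValue = ~0u; // "attribute value not specified"

// Rounds Value up to the next multiple of Align (stand-in for llvm::alignTo).
std::uint64_t alignUp(std::uint64_t Value, std::uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Returns {MaxNumVGPRs, MaxNumAGPRs} following the gfx90a+ branch of the diff.
std::pair<unsigned, unsigned> splitVectorRegs(const VectorRegLimits &L,
                                              unsigned MinNumAGPRs,
                                              unsigned MaxNumAGPRs,
                                              bool UsesAGPRs) {
  if (MinNumAGPRs == NoValue) {
    // No attribute: split the budget in half only if AGPRs are used.
    if (UsesAGPRs)
      MinNumAGPRs = MaxNumAGPRs = L.MaxVectorRegs / 2;
    else
      MinNumAGPRs = 0;
  } else {
    // Round up to the allocation granularity (4), then cap at the class size.
    MinNumAGPRs = static_cast<unsigned>(std::min<std::uint64_t>(
        alignUp(MinNumAGPRs, 4), L.TotalNumAGPRs));
  }

  // Clamp to the limits and ensure min <= max.
  MaxNumAGPRs = std::min(std::max(MinNumAGPRs, MaxNumAGPRs), L.MaxVectorRegs);
  MinNumAGPRs = std::min(std::min(MinNumAGPRs, L.TotalNumAGPRs), MaxNumAGPRs);

  // VGPRs get whatever the minimum AGPR reservation leaves over; AGPRs are
  // then limited to what remains after the VGPRs.
  unsigned MaxNumVGPRs =
      std::min(L.MaxVectorRegs - MinNumAGPRs, L.TotalNumVGPRs);
  MaxNumAGPRs = std::min(L.MaxVectorRegs - MaxNumVGPRs, MaxNumAGPRs);

  assert(MaxNumVGPRs + MaxNumAGPRs <= L.MaxVectorRegs &&
         MaxNumAGPRs <= L.TotalNumAGPRs && MaxNumVGPRs <= L.TotalNumVGPRs &&
         "invalid register counts");
  return std::make_pair(MaxNumVGPRs, MaxNumAGPRs);
}
```

For example, with a hypothetical budget of 128 vector registers and a requested minimum of 30, the minimum is rounded up to 32, leaving 96 VGPRs and 32 AGPRs. A request of `0,0` under a 512-register budget yields 256 VGPRs and no AGPRs.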