Commit 288902b

AMDGPU: Add amdgpu-num-agpr attribute to control AGPR allocation
This provides a range that decides how to subdivide the vector register budget on gfx90a+. A single value declares the minimum number of AGPRs that should be allocatable. Eventually this should replace amdgpu-no-agpr. I want this primarily for testing AGPR allocation behavior. We should eventually have a heuristic that tries to detect a reasonable number of AGPRs to keep allocatable.
1 parent 9b52d9e commit 288902b

3 files changed: +569 -11 lines changed

llvm/docs/AMDGPUUsage.rst

Lines changed: 16 additions & 0 deletions
@@ -1707,6 +1707,22 @@ The AMDGPU backend supports the following LLVM IR attributes.
 as hidden. Hidden arguments are managed by the compiler and are not part of
 the explicit arguments supplied by the user.
 
+"amdgpu-num-agpr"="min(,max)" Indicates a minimum and maximum range for the number of AGPRs to make
+available to allocate. The values will be rounded up to the next multiple
+of the allocation granularity (4). The minimum value is interpreted as the
+minimum number of AGPRs the function will require to allocate. If only one
+value is specified, it is interpreted as the minimum register budget.
+
+The values may be ignored if satisfying it would violate other allocation
+constraints.
+
+The behavior is undefined if a function which requires more AGPRs than the
+lower bound is reached through any function marked with a higher value of this
+attribute. A minimum value of 0 indicates the function does not require
+any AGPRs. A minimum of 0 is equivalent to "amdgpu-no-agpr".
+
+This is only relevant on targets with AGPRs which support accum_offset (gfx90a+).
+
 "amdgpu-sgpr-hazard-wait" Disabled SGPR hazard wait insertion if set to 0.
 Exists for testing performance impact of SGPR hazard waits only.
 
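The "min(,max)" value format and the round-up rule described in the documentation hunk above can be illustrated with a small standalone sketch. The names `kUnset`, `roundUpToGranule`, `parseUnsigned` and `parseNumAGPR` are hypothetical and are not part of LLVM; the patch itself reads the attribute through `AMDGPU::getIntegerPairAttribute`, whose exact parsing rules (for example whitespace handling) are not reproduced here.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical marker for "value not specified"; mirrors the ~0u default
// pair used in the patch.
constexpr unsigned kUnset = ~0u;

// Round N up to the next multiple of the allocation granularity (4 per the
// documentation text above).
inline unsigned roundUpToGranule(unsigned N, unsigned Granule = 4) {
  assert(Granule != 0 && "granule must be nonzero");
  return (N + Granule - 1) / Granule * Granule;
}

// Parse a non-empty string of decimal digits into Out. Returns false on
// malformed input or if the value does not fit in 32 bits.
inline bool parseUnsigned(const std::string &S, unsigned &Out) {
  if (S.empty())
    return false;
  unsigned long long V = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return false;
    V = V * 10 + static_cast<unsigned>(C - '0');
    if (V > 0xFFFFFFFFull)
      return false;
  }
  Out = static_cast<unsigned>(V);
  return true;
}

// Parse "min" or "min,max". Only the first value is required; a missing
// maximum is reported as kUnset. This sketch does not trim whitespace.
inline std::optional<std::pair<unsigned, unsigned>>
parseNumAGPR(const std::string &S) {
  std::pair<unsigned, unsigned> Result{kUnset, kUnset};
  const auto Comma = S.find(',');
  if (!parseUnsigned(S.substr(0, Comma), Result.first))
    return std::nullopt;
  if (Comma != std::string::npos &&
      !parseUnsigned(S.substr(Comma + 1), Result.second))
    return std::nullopt;
  return Result;
}
```

Under these assumptions, `parseNumAGPR("10")` yields a minimum of 10 with no maximum, and `roundUpToGranule(10)` turns that minimum into 12.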

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

Lines changed: 45 additions & 11 deletions
@@ -572,9 +572,10 @@ MCRegister SIRegisterInfo::reservedPrivateSegmentBufferReg(
 std::pair<unsigned, unsigned>
 SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
   const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
-  unsigned MaxNumVGPRs = ST.getMaxNumVGPRs(MF);
-  unsigned MaxNumAGPRs = MaxNumVGPRs;
-  unsigned TotalNumVGPRs = AMDGPU::VGPR_32RegClass.getNumRegs();
+  const unsigned MaxVectorRegs = ST.getMaxNumVGPRs(MF);
+
+  unsigned MaxNumVGPRs = MaxVectorRegs;
+  unsigned MaxNumAGPRs = 0;
 
   // On GFX90A, the number of VGPRs and AGPRs need not be equal. Theoretically,
   // a wave may have up to 512 total vector registers combining together both
@@ -585,16 +586,49 @@ SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
   // TODO: it shall be possible to estimate maximum AGPR/VGPR pressure and split
   // register file accordingly.
   if (ST.hasGFX90AInsts()) {
-    if (MFI->mayNeedAGPRs()) {
-      MaxNumVGPRs /= 2;
-      MaxNumAGPRs = MaxNumVGPRs;
+    unsigned MinNumAGPRs = 0;
+    const unsigned TotalNumAGPRs = AMDGPU::AGPR_32RegClass.getNumRegs();
+    const unsigned TotalNumVGPRs = AMDGPU::VGPR_32RegClass.getNumRegs();
+
+    const std::pair<unsigned, unsigned> DefaultNumAGPR = {~0u, ~0u};
+
+    // TODO: Replace amdgpu-no-agpr with amdgpu-num-agpr=0
+    // TODO: Move this logic into subtarget on IR function
+    //
+    // TODO: The lower bound should probably force the number of required
+    // registers up, overriding amdgpu-waves-per-eu.
+    std::tie(MinNumAGPRs, MaxNumAGPRs) = AMDGPU::getIntegerPairAttribute(
+        MF.getFunction(), "amdgpu-num-agpr", DefaultNumAGPR,
+        /*OnlyFirstRequired=*/true);
+
+    if (MinNumAGPRs == DefaultNumAGPR.first) {
+      // Default to splitting half the registers if AGPRs are required.
+
+      if (MFI->mayNeedAGPRs())
+        MinNumAGPRs = MaxNumAGPRs = MaxVectorRegs / 2;
+      else
+        MinNumAGPRs = 0;
     } else {
-      if (MaxNumVGPRs > TotalNumVGPRs) {
-        MaxNumAGPRs = MaxNumVGPRs - TotalNumVGPRs;
-        MaxNumVGPRs = TotalNumVGPRs;
-      } else
-        MaxNumAGPRs = 0;
+      // Align to accum_offset's allocation granularity.
+      MinNumAGPRs = alignTo(MinNumAGPRs, 4);
+
+      MinNumAGPRs = std::min(MinNumAGPRs, TotalNumAGPRs);
     }
+
+    // Clamp values to be inbounds of our limits, and ensure min <= max.
+
+    MaxNumAGPRs = std::min(std::max(MinNumAGPRs, MaxNumAGPRs), MaxVectorRegs);
+    MinNumAGPRs = std::min(std::min(MinNumAGPRs, TotalNumAGPRs), MaxNumAGPRs);
+
+    MaxNumVGPRs = std::min(MaxVectorRegs - MinNumAGPRs, TotalNumVGPRs);
+    MaxNumAGPRs = std::min(MaxVectorRegs - MaxNumVGPRs, MaxNumAGPRs);
+
+    assert(MaxNumVGPRs + MaxNumAGPRs <= MaxVectorRegs &&
+           MaxNumAGPRs <= TotalNumAGPRs && MaxNumVGPRs <= TotalNumVGPRs &&
+           "invalid register counts");
+  } else if (ST.hasMAIInsts()) {
+    // On gfx908 the number of AGPRs always equals the number of VGPRs.
+    MaxNumAGPRs = MaxNumVGPRs = MaxVectorRegs;
   }
 
   return std::pair(MaxNumVGPRs, MaxNumAGPRs);
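To make the arithmetic of the new gfx90a+ branch easier to follow, here is a minimal standalone model of it, written as a sketch rather than as the code LLVM runs. `VectorRegSplit`, `splitVectorRegs` and `kUnset` are hypothetical names. The register class sizes are passed in as parameters and default to 256, which is an assumption standing in for `AGPR_32RegClass.getNumRegs()` and `VGPR_32RegClass.getNumRegs()`. The model also assumes `MaxVectorRegs` is at most the sum of the two class sizes.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical marker for "attribute value not given"; mirrors ~0u in the patch.
constexpr unsigned kUnset = ~0u;

struct VectorRegSplit {
  unsigned MaxVGPRs;
  unsigned MaxAGPRs;
};

// Model of the gfx90a+ branch: MinAGPRs/MaxAGPRs are the raw attribute values
// (kUnset if absent), MayNeedAGPRs stands in for MFI->mayNeedAGPRs().
inline VectorRegSplit splitVectorRegs(unsigned MaxVectorRegs, unsigned MinAGPRs,
                                      unsigned MaxAGPRs, bool MayNeedAGPRs,
                                      unsigned TotalAGPRs = 256, // assumed size
                                      unsigned TotalVGPRs = 256) { // assumed size
  if (MinAGPRs == kUnset) {
    // No attribute: split the budget in half only if AGPRs may be needed.
    if (MayNeedAGPRs)
      MinAGPRs = MaxAGPRs = MaxVectorRegs / 2;
    else
      MinAGPRs = 0;
  } else {
    // Round up to the granularity of 4 (in 64 bits to avoid wraparound),
    // then cap at the size of the AGPR class.
    const unsigned long long Aligned =
        (static_cast<unsigned long long>(MinAGPRs) + 3) / 4 * 4;
    MinAGPRs = static_cast<unsigned>(
        std::min<unsigned long long>(Aligned, TotalAGPRs));
  }

  // Clamp to the overall budget and ensure min <= max.
  MaxAGPRs = std::min(std::max(MinAGPRs, MaxAGPRs), MaxVectorRegs);
  MinAGPRs = std::min(std::min(MinAGPRs, TotalAGPRs), MaxAGPRs);

  // VGPRs get whatever the AGPR minimum leaves over, capped by the class size;
  // AGPRs then get at most what the VGPRs leave over.
  const unsigned MaxVGPRs = std::min(MaxVectorRegs - MinAGPRs, TotalVGPRs);
  MaxAGPRs = std::min(MaxVectorRegs - MaxVGPRs, MaxAGPRs);

  assert(MaxVGPRs + MaxAGPRs <= MaxVectorRegs && MaxAGPRs <= TotalAGPRs &&
         MaxVGPRs <= TotalVGPRs && "invalid register counts");
  return {MaxVGPRs, MaxAGPRs};
}
```

With a budget of 256 registers and a requested minimum of 10 (no maximum), this model rounds the minimum up to 12 and returns 244 VGPRs and 12 AGPRs. With no attribute and `MayNeedAGPRs` set, the same budget splits evenly into 128 and 128, matching the previous default behavior.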
