Skip to content

AMDGPU: Replace amdgpu-no-agpr with amdgpu-agpr-alloc #129893

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 1 addition & 6 deletions llvm/docs/AMDGPUUsage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1698,11 +1698,6 @@ The AMDGPU backend supports the following LLVM IR attributes.
``amdgpu_max_num_work_groups`` CLANG attribute [CLANG-ATTR]_. Clang only
emits this attribute when all the three numbers are >= 1.

"amdgpu-no-agpr" Indicates the function will not require allocating AGPRs. This is only
relevant on subtargets with AGPRs. The behavior is undefined if a
function which requires AGPRs is reached through any function marked
with this attribute.

"amdgpu-hidden-argument" This attribute is used internally by the backend to mark function arguments
as hidden. Hidden arguments are managed by the compiler and are not part of
the explicit arguments supplied by the user.
Expand All @@ -1721,7 +1716,7 @@ The AMDGPU backend supports the following LLVM IR attributes.
The behavior is undefined if a function which requires more AGPRs than the
lower bound is reached through any function marked with a higher value of this
attribute. A minimum value of 0 indicates the function does not require
any AGPRs. A minimum of 0 is equivalent to "amdgpu-no-agpr".
any AGPRs.

This is only relevant on targets with AGPRs which support accum_offset (gfx90a+).

Expand Down
9 changes: 7 additions & 2 deletions llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1235,6 +1235,8 @@ static bool inlineAsmUsesAGPRs(const InlineAsm *IA) {
return false;
}

// TODO: Migrate to range merge of amdgpu-agpr-alloc.
// FIXME: Why is this using Attribute::NoUnwind?
struct AAAMDGPUNoAGPR
: public IRAttribute<Attribute::NoUnwind,
StateWrapper<BooleanState, AbstractAttribute>,
Expand All @@ -1250,7 +1252,10 @@ struct AAAMDGPUNoAGPR

void initialize(Attributor &A) override {
Function *F = getAssociatedFunction();
if (F->hasFnAttribute("amdgpu-no-agpr"))
auto [MinNumAGPR, MaxNumAGPR] =
AMDGPU::getIntegerPairAttribute(*F, "amdgpu-agpr-alloc", {~0u, ~0u},
/*OnlyFirstRequired=*/true);
if (MinNumAGPR == 0)
indicateOptimisticFixpoint();
}

Expand Down Expand Up @@ -1297,7 +1302,7 @@ struct AAAMDGPUNoAGPR
return ChangeStatus::UNCHANGED;
LLVMContext &Ctx = getAssociatedFunction()->getContext();
return A.manifestAttrs(getIRPosition(),
{Attribute::get(Ctx, "amdgpu-no-agpr")});
{Attribute::get(Ctx, "amdgpu-agpr-alloc", "0")});
}

const std::string getName() const override { return "AAAMDGPUNoAGPR"; }
Expand Down
5 changes: 4 additions & 1 deletion llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -780,5 +780,8 @@ bool SIMachineFunctionInfo::initializeBaseYamlFields(
}

bool SIMachineFunctionInfo::mayUseAGPRs(const Function &F) const {
return !F.hasFnAttribute("amdgpu-no-agpr");
auto [MinNumAGPR, MaxNumAGPR] =
AMDGPU::getIntegerPairAttribute(F, "amdgpu-agpr-alloc", {~0u, ~0u},
/*OnlyFirstRequired=*/true);
return MinNumAGPR != 0u;
}
8 changes: 1 addition & 7 deletions llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -571,7 +571,6 @@ MCRegister SIRegisterInfo::reservedPrivateSegmentBufferReg(

std::pair<unsigned, unsigned>
SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
const unsigned MaxVectorRegs = ST.getMaxNumVGPRs(MF);

unsigned MaxNumVGPRs = MaxVectorRegs;
Expand All @@ -592,7 +591,6 @@ SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {

const std::pair<unsigned, unsigned> DefaultNumAGPR = {~0u, ~0u};

// TODO: Replace amdgpu-no-agpr with amdgpu-agpr-alloc=0
// TODO: Move this logic into subtarget on IR function
//
// TODO: The lower bound should probably force the number of required
Expand All @@ -603,11 +601,7 @@ SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {

if (MinNumAGPRs == DefaultNumAGPR.first) {
// Default to splitting half the registers if AGPRs are required.

if (MFI->mayNeedAGPRs())
MinNumAGPRs = MaxNumAGPRs = MaxVectorRegs / 2;
else
MinNumAGPRs = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the removal of the forced minima yields no functional change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mayNeedAGPRs is a wrapper around the attribute now, this is just redundant now

MinNumAGPRs = MaxNumAGPRs = MaxVectorRegs / 2;
} else {
// Align to accum_offset's allocation granularity.
MinNumAGPRs = alignTo(MinNumAGPRs, 4);
Expand Down
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
Original file line number Diff line number Diff line change
Expand Up @@ -233,8 +233,8 @@ attributes #1 = { nounwind }
; AKF_HSA: attributes #[[ATTR1]] = { nounwind }
;.
; ATTRIBUTOR_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
;.
; AKF_HSA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
;.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,6 @@ define void @no_free_vgprs_at_agpr_to_agpr_copy(float %v0, float %v1) #0 {
declare <16 x float> @llvm.amdgcn.mfma.f32.16x16x1f32(float, float, <16 x float>, i32 immarg, i32 immarg, i32 immarg) #1
declare noundef i32 @llvm.amdgcn.workitem.id.x() #2

attributes #0 = { "amdgpu-no-agpr" "amdgpu-waves-per-eu"="6,6" }
attributes #0 = { "amdgpu-agpr-alloc"="0" "amdgpu-waves-per-eu"="6,6" }
attributes #1 = { convergent nocallback nofree nosync nounwind willreturn memory(none) }
attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
Original file line number Diff line number Diff line change
Expand Up @@ -1144,6 +1144,6 @@ declare i32 @llvm.amdgcn.workitem.id.x() #2
attributes #0 = { "amdgpu-waves-per-eu"="6,6" }
attributes #1 = { convergent nounwind readnone willreturn }
attributes #2 = { nounwind readnone willreturn }
attributes #3 = { "amdgpu-waves-per-eu"="7,7" "amdgpu-no-agpr" }
attributes #3 = { "amdgpu-waves-per-eu"="7,7" "amdgpu-agpr-alloc"="0" }
attributes #4 = { "amdgpu-waves-per-eu"="6,6" "amdgpu-flat-work-group-size"="1024,1024" }
attributes #5 = { "amdgpu-waves-per-eu"="6,6" "amdgpu-no-agpr" }
attributes #5 = { "amdgpu-waves-per-eu"="6,6" "amdgpu-agpr-alloc"="0" }
6 changes: 3 additions & 3 deletions llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
Original file line number Diff line number Diff line change
Expand Up @@ -252,13 +252,13 @@ define amdgpu_kernel void @indirect_calls_none_agpr(i1 %cond) {
}


attributes #0 = { "amdgpu-no-agpr" }
attributes #0 = { "amdgpu-agpr-alloc"="0" }
;.
; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR1]] = { "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR2]] = { "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR3:[0-9]+]] = { convergent nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
; CHECK: attributes #[[ATTR4:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
; CHECK: attributes #[[ATTR6]] = { "amdgpu-no-agpr" }
; CHECK: attributes #[[ATTR6]] = { "amdgpu-agpr-alloc"="0" }
;.
13 changes: 7 additions & 6 deletions llvm/test/CodeGen/AMDGPU/amdgpu-no-agprs-violations.ll
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 < %s | FileCheck -check-prefixes=CHECK,GFX908 %s
; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s 2> %t.err | FileCheck -check-prefixes=CHECK,GFX90A %s
; RUN: FileCheck -check-prefix=ERR < %t.err %s
; RUN: FileCheck --implicit-check-not=error -check-prefix=ERR < %t.err %s

; Test undefined behavior where a function ends up needing AGPRs that
; was marked with "amdgpu-agpr-alloc="="0". There should be no asserts.
Expand All @@ -9,7 +9,6 @@

; ERR: error: <unknown>:0:0: no registers from class available to allocate in function 'kernel_illegal_agpr_use_asm'
; ERR: error: <unknown>:0:0: no registers from class available to allocate in function 'func_illegal_agpr_use_asm'
; ERR: error: <unknown>:0:0: no registers from class available to allocate in function 'kernel_calls_mfma.f32.32x32x1f32'

; CHECK: {{^}}kernel_illegal_agpr_use_asm:
; CHECK: ; use a0
Expand All @@ -32,14 +31,16 @@ define void @func_illegal_agpr_use_asm() #0 {
}

; CHECK-LABEL: {{^}}kernel_calls_mfma.f32.32x32x1f32:
; CHECK: v_accvgpr_write_b32
; GFX908: v_accvgpr_write_b32
; GFX90A-NOT: v_accvgpr_write_b32

; GFX908: NumVgprs: 5
; GFX90A: NumVgprs: 36
; CHECK: NumAgprs: 32
; GFX908: NumAgprs: 32
; GFX90A: NumVgprs: 35
; GFX90A: NumAgprs: 0

; GFX908: TotalNumVgprs: 32
; GFX90A: TotalNumVgprs: 68
; GFX90A: TotalNumVgprs: 35
define amdgpu_kernel void @kernel_calls_mfma.f32.32x32x1f32(ptr addrspace(1) %out, float %a, float %b, <32 x float> %c) #0 {
%result = call <32 x float> @llvm.amdgcn.mfma.f32.32x32x1f32(float %a, float %b, <32 x float> %c, i32 0, i32 0, i32 0)
store <32 x float> %result, ptr addrspace(1) %out
Expand Down
12 changes: 6 additions & 6 deletions llvm/test/CodeGen/AMDGPU/amdgpu-num-agpr.ll
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ define amdgpu_kernel void @min_num_agpr_0_0__amdgpu_no_agpr() #0 {
ret void
}

attributes #0 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0,0" "amdgpu-no-agpr" }
attributes #0 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0,0" }

; Check parse of single entry 0

Expand All @@ -26,16 +26,16 @@ define amdgpu_kernel void @min_num_agpr_0__amdgpu_no_agpr() #1 {
ret void
}

attributes #1 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0" "amdgpu-no-agpr" }
attributes #1 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0" }


; Undefined use
define amdgpu_kernel void @min_num_agpr_1_1__amdgpu_no_agpr() #2 {
define amdgpu_kernel void @min_num_agpr_1_1() #2 {
call void asm sideeffect "; clobber $0","~{a0}"(), !srcloc !{i32 3}
ret void
}

attributes #2 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="1,1" "amdgpu-no-agpr" }
attributes #2 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="1,1" }

; Check parse of single entry 4, interpreted as the minimum. Total budget is 64.
; WARN: warning: <unknown>:0:0: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in 'min_num_agpr_4__amdgpu_no_agpr': desired occupancy was 8, final occupancy is 7
Expand All @@ -48,7 +48,7 @@ define amdgpu_kernel void @min_num_agpr_4__amdgpu_no_agpr() #3 {
ret void
}

attributes #3 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="4" "amdgpu-no-agpr" }
attributes #3 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="4" }


; Allocation granularity requires rounding this to use 4 AGPRs, so the
Expand Down Expand Up @@ -79,7 +79,7 @@ define amdgpu_kernel void @min_num_agpr_64_64__amdgpu_no_agpr() #5 {
ret void
}

attributes #5 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="64,64" "amdgpu-no-agpr" }
attributes #5 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="64,64" }

; No free VGPRs
; WARN: warning: inline asm clobber list contains reserved registers: v0 at line 7
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -70,4 +70,4 @@ define amdgpu_kernel void @amdhsa_kernarg_preload_1_implicit_2(i32 inreg) #0 { r

define amdgpu_kernel void @amdhsa_kernarg_preload_0_implicit_2(i32) #0 { ret void }

attributes #0 = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
attributes #0 = { "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
Loading
Loading