Skip to content

[AMDGPU][Attributor] Skip update and manifest if an AA is at its initial state #114726

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
105 changes: 87 additions & 18 deletions llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -826,6 +826,16 @@ struct AAAMDSizeRangeAttribute
if (!CallerInfo || !CallerInfo->isValidState())
return false;

/// When the caller AA is in its initial state, the state remains valid
/// but awaits propagation. We skip processing in this case. Note that we
/// must return true since the state is still considered valid.
if (CallerInfo->isAtInitialState()) {
LLVM_DEBUG(dbgs() << '[' << getName() << "] Caller "
<< Caller->getName()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use printAsOperand to handle anonymous functions correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from existing code. I'll do a follow up patch to change them all.

<< " is still at initial state. Skip the update.\n");
return true;
}

Change |=
clampStateAndIndicateChange(this->getState(), CallerInfo->getState());

Expand Down Expand Up @@ -870,6 +880,15 @@ struct AAAMDSizeRangeAttribute
/*ForceReplace=*/true);
}

/// The initial state of `IntegerRangeState` represents an empty set, which
/// does not constitute a valid range. This empty state complicates
/// propagation, particularly for arithmetic operations like
/// `getAssumed().getUpper() - 1`. Therefore, it is recommended to skip the
/// initial state during processing.
bool isAtInitialState() const {
return isValidState() && getAssumed().isEmptySet();
}

const std::string getAsStr(Attributor *) const override {
std::string Str;
raw_string_ostream OS(Str);
Expand Down Expand Up @@ -926,6 +945,11 @@ struct AAAMDFlatWorkGroupSize : public AAAMDSizeRangeAttribute {
Attributor &A);

ChangeStatus manifest(Attributor &A) override {
if (isAtInitialState()) {
LLVM_DEBUG(dbgs() << '[' << getName()
<< "] Still at initial state. No manifest.\n";);
return ChangeStatus::UNCHANGED;
}
Function *F = getAssociatedFunction();
auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
return emitAttributeIfNotDefaultAfterClamp(
Expand Down Expand Up @@ -1152,31 +1176,71 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute {
auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
ChangeStatus Change = ChangeStatus::UNCHANGED;

Function *F = getAssociatedFunction();

const auto *AAFlatWorkGroupSize = A.getAAFor<AAAMDFlatWorkGroupSize>(
*this, IRPosition::function(*F), DepClassTy::REQUIRED);
if (!AAFlatWorkGroupSize || !AAFlatWorkGroupSize->isValidState()) {
LLVM_DEBUG(
dbgs() << '[' << getName()
<< "] AAAMDFlatWorkGroupSize is unavailable or invalid.\n");
return ChangeStatus::UNCHANGED;
}

if (AAFlatWorkGroupSize->isAtInitialState()) {
LLVM_DEBUG(dbgs() << '[' << getName()
<< "] AAAMDFlatWorkGroupSize is still at initial "
"state. Skip the update.\n");
return ChangeStatus::UNCHANGED;
}
Comment on lines +1190 to +1195
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isAtInitialState check is a superset of the above isValidState, just replace the isValidState call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An initial state is always a valid state, but the reverse is not necessarily true. Once the value is updated, it is no longer in the initial state; however, it remains valid. As such, updates should continue, and the manifest function must still be invoked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are four states to consider: valid, invalid, best, and worst. An AA typically begins in the best state and transitions toward a "worse" state until reaching the worst state. For most AAs, the invalid state is the worst state, but this is not always true, especially for the IntegerRangeState, which we use extensively here.

In the case of IntegerRangeState, the worst state is considered invalid only if Known is set to the worst value (~0U by default). Once Known is modified, there is no longer invalid state, meaning that manifest will always run, and half-baked values are likely to be added. This is why I avoided updating Known in #113018 and all related PRs in this stack unless the changes were definitive (i.e., all call sites were seen).

The situation becomes more complex because the best state does not actually represent a valid pair of values in terms of the attribute. However, it must still be considered valid so that updates can occur, allowing Assumed to be updated accordingly.

Complications arise further because this AA relies on another AA that uses range state. Similarly, the best state of the dependent AA is not a valid pair of values either. If the dependent AA remains in its best (or initial, as introduced in this PR) state, we must skip the update, as its values are effectively nonsensical.


auto CurrentWorkGroupSize = std::make_pair(
AAFlatWorkGroupSize->getAssumed().getLower().getZExtValue(),
AAFlatWorkGroupSize->getAssumed().getUpper().getZExtValue() - 1);

auto DoUpdate = [&](std::pair<unsigned, unsigned> WavesPerEU,
std::pair<unsigned, unsigned> FlatWorkGroupSize) {
auto [Min, Max] =
InfoCache.getEffectiveWavesPerEU(*F, WavesPerEU, FlatWorkGroupSize);
ConstantRange CR(APInt(32, Min), APInt(32, Max + 1));
IntegerRangeState IRS(CR);
Change |= clampStateAndIndicateChange(this->getState(), IRS);
};

// We need to clamp once if we are not at initial state, because
// AAAMDFlatWorkGroupSize could be updated in last iteration.
if (!isAtInitialState()) {
auto CurrentWavesPerEU =
std::make_pair(getAssumed().getLower().getZExtValue(),
getAssumed().getUpper().getZExtValue() - 1);
DoUpdate(CurrentWavesPerEU, CurrentWorkGroupSize);
}

auto CheckCallSite = [&](AbstractCallSite CS) {
Function *Caller = CS.getInstruction()->getFunction();
Function *Func = getAssociatedFunction();

LLVM_DEBUG(dbgs() << '[' << getName() << "] Call " << Caller->getName()
<< "->" << Func->getName() << '\n');
<< "->" << F->getName() << '\n');

const auto *CallerInfo = A.getAAFor<AAAMDWavesPerEU>(
const auto *AAWavesPerEU = A.getAAFor<AAAMDWavesPerEU>(
*this, IRPosition::function(*Caller), DepClassTy::REQUIRED);
const auto *AssumedGroupSize = A.getAAFor<AAAMDFlatWorkGroupSize>(
*this, IRPosition::function(*Func), DepClassTy::REQUIRED);
if (!CallerInfo || !AssumedGroupSize || !CallerInfo->isValidState() ||
!AssumedGroupSize->isValidState())
if (!AAWavesPerEU || !AAWavesPerEU->isValidState()) {
LLVM_DEBUG(dbgs() << '[' << getName() << "] Caller "
<< Caller->getName()
<< " is unavailable or invalid.\n");
return false;
}
if (AAWavesPerEU->isAtInitialState()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

LLVM_DEBUG(dbgs() << '[' << getName() << "] Caller "
<< Caller->getName()
<< " is still at initial state. Skip the update.\n");
return true;
}

unsigned Min, Max;
std::tie(Min, Max) = InfoCache.getEffectiveWavesPerEU(
*Caller,
{CallerInfo->getAssumed().getLower().getZExtValue(),
CallerInfo->getAssumed().getUpper().getZExtValue() - 1},
{AssumedGroupSize->getAssumed().getLower().getZExtValue(),
AssumedGroupSize->getAssumed().getUpper().getZExtValue() - 1});
ConstantRange CallerRange(APInt(32, Min), APInt(32, Max + 1));
IntegerRangeState CallerRangeState(CallerRange);
Change |= clampStateAndIndicateChange(this->getState(), CallerRangeState);

auto CallerWavesPerEU = std::make_pair(
AAWavesPerEU->getAssumed().getLower().getZExtValue(),
AAWavesPerEU->getAssumed().getUpper().getZExtValue() - 1);
DoUpdate(CallerWavesPerEU, CurrentWorkGroupSize);
return true;
};

Expand All @@ -1192,6 +1256,11 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute {
Attributor &A);

ChangeStatus manifest(Attributor &A) override {
if (isAtInitialState()) {
LLVM_DEBUG(dbgs() << '[' << getName()
<< "] Still at initial state. No manifest.\n";);
return ChangeStatus::UNCHANGED;
}
Function *F = getAssociatedFunction();
auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
return emitAttributeIfNotDefaultAfterClamp(
Expand Down
46 changes: 22 additions & 24 deletions llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
Original file line number Diff line number Diff line change
Expand Up @@ -688,7 +688,7 @@ define void @func_call_asm() #3 {
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_call_asm
; ATTRIBUTOR_HSA-SAME: () #[[ATTR16]] {
; ATTRIBUTOR_HSA-NEXT: call void asm sideeffect "", ""() #[[ATTR26:[0-9]+]]
; ATTRIBUTOR_HSA-NEXT: call void asm sideeffect "", ""() #[[ATTR24:[0-9]+]]
; ATTRIBUTOR_HSA-NEXT: ret void
;
call void asm sideeffect "", ""() #3
Expand Down Expand Up @@ -717,7 +717,7 @@ define amdgpu_kernel void @func_kern_defined() #3 {
; AKF_HSA-NEXT: ret void
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_kern_defined
; ATTRIBUTOR_HSA-SAME: () #[[ATTR17:[0-9]+]] {
; ATTRIBUTOR_HSA-SAME: () #[[ATTR16]] {
; ATTRIBUTOR_HSA-NEXT: call void @defined.func()
; ATTRIBUTOR_HSA-NEXT: ret void
;
Expand Down Expand Up @@ -845,7 +845,7 @@ define amdgpu_kernel void @kern_sanitize_address() #4 {
; AKF_HSA-NEXT: ret void
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_sanitize_address
; ATTRIBUTOR_HSA-SAME: () #[[ATTR18:[0-9]+]] {
; ATTRIBUTOR_HSA-SAME: () #[[ATTR17:[0-9]+]] {
; ATTRIBUTOR_HSA-NEXT: store volatile i32 0, ptr addrspace(1) null, align 4
; ATTRIBUTOR_HSA-NEXT: ret void
;
Expand All @@ -861,7 +861,7 @@ define void @func_sanitize_address() #4 {
; AKF_HSA-NEXT: ret void
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_sanitize_address
; ATTRIBUTOR_HSA-SAME: () #[[ATTR18]] {
; ATTRIBUTOR_HSA-SAME: () #[[ATTR17]] {
; ATTRIBUTOR_HSA-NEXT: store volatile i32 0, ptr addrspace(1) null, align 4
; ATTRIBUTOR_HSA-NEXT: ret void
;
Expand All @@ -877,7 +877,7 @@ define void @func_indirect_sanitize_address() #3 {
; AKF_HSA-NEXT: ret void
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_indirect_sanitize_address
; ATTRIBUTOR_HSA-SAME: () #[[ATTR19:[0-9]+]] {
; ATTRIBUTOR_HSA-SAME: () #[[ATTR18:[0-9]+]] {
; ATTRIBUTOR_HSA-NEXT: call void @func_sanitize_address()
; ATTRIBUTOR_HSA-NEXT: ret void
;
Expand All @@ -893,7 +893,7 @@ define amdgpu_kernel void @kern_indirect_sanitize_address() #3 {
; AKF_HSA-NEXT: ret void
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_indirect_sanitize_address
; ATTRIBUTOR_HSA-SAME: () #[[ATTR19]] {
; ATTRIBUTOR_HSA-SAME: () #[[ATTR18]] {
; ATTRIBUTOR_HSA-NEXT: call void @func_sanitize_address()
; ATTRIBUTOR_HSA-NEXT: ret void
;
Expand Down Expand Up @@ -928,7 +928,7 @@ define internal void @enqueue_block_def() #6 {
; AKF_HSA-NEXT: ret void
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@enqueue_block_def
; ATTRIBUTOR_HSA-SAME: () #[[ATTR22:[0-9]+]] {
; ATTRIBUTOR_HSA-SAME: () #[[ATTR21:[0-9]+]] {
; ATTRIBUTOR_HSA-NEXT: ret void
;
ret void
Expand All @@ -941,7 +941,7 @@ define amdgpu_kernel void @kern_call_enqueued_block_decl() {
; AKF_HSA-NEXT: ret void
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_call_enqueued_block_decl
; ATTRIBUTOR_HSA-SAME: () #[[ATTR23:[0-9]+]] {
; ATTRIBUTOR_HSA-SAME: () #[[ATTR22:[0-9]+]] {
; ATTRIBUTOR_HSA-NEXT: call void @enqueue_block_decl()
; ATTRIBUTOR_HSA-NEXT: ret void
;
Expand All @@ -956,7 +956,7 @@ define amdgpu_kernel void @kern_call_enqueued_block_def() {
; AKF_HSA-NEXT: ret void
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_call_enqueued_block_def
; ATTRIBUTOR_HSA-SAME: () #[[ATTR24:[0-9]+]] {
; ATTRIBUTOR_HSA-SAME: () #[[ATTR23:[0-9]+]] {
; ATTRIBUTOR_HSA-NEXT: call void @enqueue_block_def()
; ATTRIBUTOR_HSA-NEXT: ret void
;
Expand All @@ -969,7 +969,7 @@ define void @unused_enqueue_block() {
; AKF_HSA-NEXT: ret void
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@unused_enqueue_block
; ATTRIBUTOR_HSA-SAME: () #[[ATTR25:[0-9]+]] {
; ATTRIBUTOR_HSA-SAME: () #[[ATTR23]] {
; ATTRIBUTOR_HSA-NEXT: ret void
;
ret void
Expand All @@ -980,7 +980,7 @@ define internal void @known_func() {
; AKF_HSA-NEXT: ret void
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@known_func
; ATTRIBUTOR_HSA-SAME: () #[[ATTR25]] {
; ATTRIBUTOR_HSA-SAME: () #[[ATTR23]] {
; ATTRIBUTOR_HSA-NEXT: ret void
;
ret void
Expand All @@ -994,8 +994,8 @@ define amdgpu_kernel void @kern_callsite_enqueue_block() {
; AKF_HSA-NEXT: ret void
;
; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_callsite_enqueue_block
; ATTRIBUTOR_HSA-SAME: () #[[ATTR24]] {
; ATTRIBUTOR_HSA-NEXT: call void @known_func() #[[ATTR27:[0-9]+]]
; ATTRIBUTOR_HSA-SAME: () #[[ATTR23]] {
; ATTRIBUTOR_HSA-NEXT: call void @known_func() #[[ATTR25:[0-9]+]]
; ATTRIBUTOR_HSA-NEXT: ret void
;
call void @known_func() #6
Expand Down Expand Up @@ -1041,17 +1041,15 @@ attributes #6 = { "enqueued-block" }
; ATTRIBUTOR_HSA: attributes #[[ATTR14]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx900" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR15]] = { nounwind "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR16]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR17]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR18]] = { nounwind sanitize_address "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR19]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR20:[0-9]+]] = { nounwind sanitize_address "amdgpu-no-implicitarg-ptr" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR21:[0-9]+]] = { "enqueued-block" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR22]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "enqueued-block" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR23]] = { "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR24]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR25]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR26]] = { nounwind }
; ATTRIBUTOR_HSA: attributes #[[ATTR27]] = { "enqueued-block" }
; ATTRIBUTOR_HSA: attributes #[[ATTR17]] = { nounwind sanitize_address "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR18]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR19:[0-9]+]] = { nounwind sanitize_address "amdgpu-no-implicitarg-ptr" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR20:[0-9]+]] = { "enqueued-block" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR21]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "enqueued-block" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR22]] = { "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR23]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; ATTRIBUTOR_HSA: attributes #[[ATTR24]] = { nounwind }
; ATTRIBUTOR_HSA: attributes #[[ATTR25]] = { "enqueued-block" }
;.
; AKF_HSA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
;.
Expand Down
Loading
Loading