Skip to content

[AMDGPU] Rewrite GFX12 SGPR hazard handling to dedicated pass #118750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 30, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions llvm/docs/AMDGPUUsage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1709,6 +1709,20 @@ The AMDGPU backend supports the following LLVM IR attributes.
as hidden. Hidden arguments are managed by the compiler and are not part of
the explicit arguments supplied by the user.

"amdgpu-sgpr-hazard-wait" Disabled SGPR hazard wait insertion if set to 0.
Exists for testing performance impact of SGPR hazard waits only.

"amdgpu-sgpr-hazard-boundary-cull" Enable insertion of SGPR hazard cull sequences at function call boundaries.
Cull sequence reduces future hazard waits, but has a performance cost.

"amdgpu-sgpr-hazard-mem-wait-cull" Enable insertion of SGPR hazard cull sequences before memory waits.
Cull sequence reduces future hazard waits, but has a performance cost.
Attempt to amortize cost by overlapping with memory accesses.

"amdgpu-sgpr-hazard-mem-wait-cull-threshold"
Sets the number of active SGPR hazards that must be present before
inserting a cull sequence at a memory wait.

======================================= ==========================================================

Calling Conventions
Expand Down
3 changes: 3 additions & 0 deletions llvm/lib/Target/AMDGPU/AMDGPU.h
Original file line number Diff line number Diff line change
Expand Up @@ -463,6 +463,9 @@ void initializeAMDGPUSetWavePriorityPass(PassRegistry &);
void initializeGCNRewritePartialRegUsesPass(llvm::PassRegistry &);
extern char &GCNRewritePartialRegUsesID;

void initializeAMDGPUWaitSGPRHazardsLegacyPass(PassRegistry &);
extern char &AMDGPUWaitSGPRHazardsLegacyID;

namespace AMDGPU {
enum TargetIndex {
TI_CONSTDATA_START,
Expand Down
4 changes: 4 additions & 0 deletions llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
#include "AMDGPUTargetObjectFile.h"
#include "AMDGPUTargetTransformInfo.h"
#include "AMDGPUUnifyDivergentExitNodes.h"
#include "AMDGPUWaitSGPRHazards.h"
#include "GCNDPPCombine.h"
#include "GCNIterativeScheduler.h"
#include "GCNSchedStrategy.h"
Expand Down Expand Up @@ -549,6 +550,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
initializeGCNRewritePartialRegUsesPass(*PR);
initializeGCNRegPressurePrinterPass(*PR);
initializeAMDGPUPreloadKernArgPrologLegacyPass(*PR);
initializeAMDGPUWaitSGPRHazardsLegacyPass(*PR);
}

static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
Expand Down Expand Up @@ -1678,6 +1680,8 @@ void GCNPassConfig::addPreEmitPass() {
// cases.
addPass(&PostRAHazardRecognizerID);

addPass(&AMDGPUWaitSGPRHazardsLegacyID);

if (isPassEnabled(EnableInsertDelayAlu, CodeGenOptLevel::Less))
addPass(&AMDGPUInsertDelayAluID);

Expand Down
Loading
Loading