Skip to content

AMDGPU/GlobalISel: AMDGPURegBankSelect #112863

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 24, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 38 additions & 0 deletions llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,16 @@
//===----------------------------------------------------------------------===//

#include "AMDGPUGlobalISelUtils.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"
#include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
#include "llvm/CodeGenTypes/LowLevelType.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace llvm;
using namespace AMDGPU;
using namespace MIPatternMatch;

std::pair<Register, unsigned>
Expand Down Expand Up @@ -68,3 +72,37 @@ AMDGPU::getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg,

return std::pair(Reg, 0);
}

IntrinsicLaneMaskAnalyzer::IntrinsicLaneMaskAnalyzer(MachineFunction &MF)
: MRI(MF.getRegInfo()) {
initLaneMaskIntrinsics(MF);
}

bool IntrinsicLaneMaskAnalyzer::isS32S64LaneMask(Register Reg) const {
return S32S64LaneMask.contains(Reg);
}

void IntrinsicLaneMaskAnalyzer::initLaneMaskIntrinsics(MachineFunction &MF) {
for (auto &MBB : MF) {
for (auto &MI : MBB) {
GIntrinsic *GI = dyn_cast<GIntrinsic>(&MI);
if (GI && GI->is(Intrinsic::amdgcn_if_break)) {
S32S64LaneMask.insert(MI.getOperand(3).getReg());
findLCSSAPhi(MI.getOperand(0).getReg());
}

if (MI.getOpcode() == AMDGPU::SI_IF ||
MI.getOpcode() == AMDGPU::SI_ELSE) {
findLCSSAPhi(MI.getOperand(0).getReg());
}
Comment on lines +94 to +97
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you are mixing matching the intrinsic form of if.break above, but the selected pseudos for if and else

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consequence of what legalizer does, si.if and si.else are inst-selected to SI_IF and SI_ELSE in AMDGPULegalizerInfo::legalizeIntrinsic, if.break is still intrinsic for reg bank selection

}
}
}

void IntrinsicLaneMaskAnalyzer::findLCSSAPhi(Register Reg) {
S32S64LaneMask.insert(Reg);
for (const MachineInstr &LCSSAPhi : MRI.use_instructions(Reg)) {
if (LCSSAPhi.isPHI())
S32S64LaneMask.insert(LCSSAPhi.getOperand(0).getReg());
}
}
Comment on lines +102 to +108
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious: How sure are you/we that there can never be "nested" phis here, perhaps due to breaking out of nested loops?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IntrinsicLaneMaskAnalyzer is very very trivial. It only handles LCSSA phis and if.break phis.

Control flow intrinsic should only be used by other control flow intrinsics. But since we run LCSSA they can also be used by lcssa-phis.

findLCSSAPhi is really meant to cover intrinsics used for control flow that are affected by LCSSA pass.
%64:_(s32) = G_PHI %15(s32), %bb.3 is lane mask. Uniformity analysis says it is divergent. If it was S1 we would be fine but since it is S32 we need to special case put it to sgpr.

    %15:sreg_32_xm0_xexec(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.if.break), %45(s1), %14(s32)
    SI_LOOP %15(s32), %bb.1, implicit-def $exec, implicit-def $scc, implicit $exec
    G_BR %bb.6

  bb.6:
    %64:_(s32) = G_PHI %15(s32), %bb.3
    G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.end.cf), %64(s32)
    S_ENDPGM 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just working around a bug in uniformity analysis? We already special case amdgcn_if and else in isAlwaysUniform. Did this just miss if.break?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if.break is already forced uniform.
Here we special case lcssa-phi lane masks.
Phi is divergent because it uses value(uniform if.break in this case) outside cycle with the divergent exit, but that phi is lane mask so it needs to end up in sgpr.

22 changes: 22 additions & 0 deletions llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.h
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUGLOBALISELUTILS_H
#define LLVM_LIB_TARGET_AMDGPU_AMDGPUGLOBALISELUTILS_H

#include "llvm/ADT/DenseSet.h"
#include "llvm/CodeGen/Register.h"
#include <utility>

Expand All @@ -18,6 +19,7 @@ class MachineRegisterInfo;
class GCNSubtarget;
class GISelKnownBits;
class LLT;
class MachineFunction;

namespace AMDGPU {

Expand All @@ -26,6 +28,26 @@ std::pair<Register, unsigned>
getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg,
GISelKnownBits *KnownBits = nullptr,
bool CheckNUW = false);

// Currently finds S32/S64 lane masks that can be declared as divergent by
// uniformity analysis (all are phis at the moment).
// These are defined as i32/i64 in some IR intrinsics (not as i1).
// Tablegen forces(via telling that lane mask IR intrinsics are uniform) most of
// S32/S64 lane masks to be uniform, as this results in them ending up with sgpr
// reg class after instruction-select, don't search for all of them.
class IntrinsicLaneMaskAnalyzer {
SmallDenseSet<Register, 8> S32S64LaneMask;
MachineRegisterInfo &MRI;

public:
IntrinsicLaneMaskAnalyzer(MachineFunction &MF);
bool isS32S64LaneMask(Register Reg) const;

private:
void initLaneMaskIntrinsics(MachineFunction &MF);
// This will not be needed when we turn off LCSSA for global-isel.
void findLCSSAPhi(Register Reg);
};
}
}

Expand Down
206 changes: 205 additions & 1 deletion llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -16,12 +16,18 @@
//===----------------------------------------------------------------------===//

#include "AMDGPU.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "AMDGPUGlobalISelUtils.h"
#include "GCNSubtarget.h"
#include "llvm/CodeGen/GlobalISel/CSEInfo.h"
#include "llvm/CodeGen/GlobalISel/CSEMIRBuilder.h"
#include "llvm/CodeGen/MachineUniformityAnalysis.h"
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/InitializePasses.h"

#define DEBUG_TYPE "amdgpu-regbankselect"

using namespace llvm;
using namespace AMDGPU;

namespace {

Expand All @@ -40,6 +46,9 @@ class AMDGPURegBankSelect : public MachineFunctionPass {
}

void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<TargetPassConfig>();
AU.addRequired<GISelCSEAnalysisWrapperPass>();
AU.addRequired<MachineUniformityAnalysisPass>();
MachineFunctionPass::getAnalysisUsage(AU);
}

Expand All @@ -55,6 +64,9 @@ class AMDGPURegBankSelect : public MachineFunctionPass {

INITIALIZE_PASS_BEGIN(AMDGPURegBankSelect, DEBUG_TYPE,
"AMDGPU Register Bank Select", false, false)
INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
INITIALIZE_PASS_DEPENDENCY(GISelCSEAnalysisWrapperPass)
INITIALIZE_PASS_DEPENDENCY(MachineUniformityAnalysisPass)
INITIALIZE_PASS_END(AMDGPURegBankSelect, DEBUG_TYPE,
"AMDGPU Register Bank Select", false, false)

Expand All @@ -66,9 +78,201 @@ FunctionPass *llvm::createAMDGPURegBankSelectPass() {
return new AMDGPURegBankSelect();
}

class RegBankSelectHelper {
MachineIRBuilder &B;
MachineRegisterInfo &MRI;
AMDGPU::IntrinsicLaneMaskAnalyzer &ILMA;
const MachineUniformityInfo &MUI;
const RegisterBank *SgprRB;
const RegisterBank *VgprRB;
const RegisterBank *VccRB;

public:
RegBankSelectHelper(MachineIRBuilder &B,
AMDGPU::IntrinsicLaneMaskAnalyzer &ILMA,
const MachineUniformityInfo &MUI,
const RegisterBankInfo &RBI)
: B(B), MRI(*B.getMRI()), ILMA(ILMA), MUI(MUI),
SgprRB(&RBI.getRegBank(AMDGPU::SGPRRegBankID)),
VgprRB(&RBI.getRegBank(AMDGPU::VGPRRegBankID)),
VccRB(&RBI.getRegBank(AMDGPU::VCCRegBankID)) {}

const RegisterBank *getRegBankToAssign(Register Reg) {
if (MUI.isUniform(Reg) || ILMA.isS32S64LaneMask(Reg))
return SgprRB;
if (MRI.getType(Reg) == LLT::scalar(1))
return VccRB;
return VgprRB;
}

// %rc:RegClass(s32) = G_ ...
// ...
// %a = G_ ..., %rc
// ->
// %rb:RegBank(s32) = G_ ...
// %rc:RegClass(s32) = COPY %rb
// ...
// %a = G_ ..., %rb
void reAssignRegBankOnDef(MachineInstr &MI, MachineOperand &DefOP,
const RegisterBank *RB) {
// Register that already has Register class got it during pre-inst selection
// of another instruction. Maybe cross bank copy was required so we insert a
// copy that can be removed later. This simplifies post regbanklegalize
// combiner and avoids need to special case some patterns.
Register Reg = DefOP.getReg();
LLT Ty = MRI.getType(Reg);
Register NewReg = MRI.createVirtualRegister({RB, Ty});
DefOP.setReg(NewReg);

auto &MBB = *MI.getParent();
B.setInsertPt(MBB, MBB.SkipPHIsAndLabels(std::next(MI.getIterator())));
B.buildCopy(Reg, NewReg);

// The problem was discovered for uniform S1 that was used as both
// lane mask(vcc) and regular sgpr S1.
// - lane-mask(vcc) use was by si_if, this use is divergent and requires
// non-trivial sgpr-S1-to-vcc copy. But pre-inst-selection of si_if sets
// sreg_64_xexec(S1) on def of uniform S1 making it lane-mask.
// - the regular sgpr S1(uniform) instruction is now broken since
// it uses sreg_64_xexec(S1) which is divergent.

// Replace virtual registers with register class on generic instructions
// uses with virtual registers with register bank.
for (auto &UseMI : make_early_inc_range(MRI.use_instructions(Reg))) {
if (UseMI.isPreISelOpcode()) {
for (MachineOperand &Op : UseMI.operands()) {
if (Op.isReg() && Op.getReg() == Reg)
Op.setReg(NewReg);
}
}
}
}

// %a = G_ ..., %rc
// ->
// %rb:RegBank(s32) = COPY %rc
// %a = G_ ..., %rb
void constrainRegBankUse(MachineInstr &MI, MachineOperand &UseOP,
const RegisterBank *RB) {
Register Reg = UseOP.getReg();

LLT Ty = MRI.getType(Reg);
Register NewReg = MRI.createVirtualRegister({RB, Ty});
UseOP.setReg(NewReg);

if (MI.isPHI()) {
auto DefMI = MRI.getVRegDef(Reg)->getIterator();
MachineBasicBlock *DefMBB = DefMI->getParent();
B.setInsertPt(*DefMBB, DefMBB->SkipPHIsAndLabels(std::next(DefMI)));
} else {
B.setInstr(MI);
}

B.buildCopy(NewReg, Reg);
}
};

static Register getVReg(MachineOperand &Op) {
if (!Op.isReg())
return {};

// Operands of COPY and G_SI_CALL can be physical registers.
Register Reg = Op.getReg();
if (!Reg.isVirtual())
return {};

return Reg;
}

bool AMDGPURegBankSelect::runOnMachineFunction(MachineFunction &MF) {
if (MF.getProperties().hasProperty(
MachineFunctionProperties::Property::FailedISel))
return false;

// Setup the instruction builder with CSE.
const TargetPassConfig &TPC = getAnalysis<TargetPassConfig>();
GISelCSEAnalysisWrapper &Wrapper =
getAnalysis<GISelCSEAnalysisWrapperPass>().getCSEWrapper();
GISelCSEInfo &CSEInfo = Wrapper.get(TPC.getCSEConfig());
GISelObserverWrapper Observer;
Observer.addObserver(&CSEInfo);

CSEMIRBuilder B(MF);
B.setCSEInfo(&CSEInfo);
B.setChangeObserver(Observer);

RAIIDelegateInstaller DelegateInstaller(MF, &Observer);
RAIIMFObserverInstaller MFObserverInstaller(MF, Observer);

IntrinsicLaneMaskAnalyzer ILMA(MF);
MachineUniformityInfo &MUI =
getAnalysis<MachineUniformityAnalysisPass>().getUniformityInfo();
MachineRegisterInfo &MRI = *B.getMRI();
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
RegBankSelectHelper RBSHelper(B, ILMA, MUI, *ST.getRegBankInfo());
// Virtual registers at this point don't have register banks.
// Virtual registers in def and use operands of already inst-selected
// instruction have register class.

for (MachineBasicBlock &MBB : MF) {
for (MachineInstr &MI : MBB) {
// Vregs in def and use operands of COPY can have either register class
// or bank. If there is neither on vreg in def operand, assign bank.
if (MI.isCopy()) {
Register DefReg = getVReg(MI.getOperand(0));
if (!DefReg.isValid() || MRI.getRegClassOrNull(DefReg))
continue;

assert(!MRI.getRegBankOrNull(DefReg));
MRI.setRegBank(DefReg, *RBSHelper.getRegBankToAssign(DefReg));
continue;
}

if (!MI.isPreISelOpcode())
continue;

// Vregs in def and use operands of G_ instructions need to have register
// banks assigned. Before this loop possible case are
// - (1) vreg without register class or bank in def or use operand
// - (2) vreg with register class in def operand
// - (3) vreg, defined by G_ instruction, in use operand
// - (4) vreg, defined by pre-inst-selected instruction, in use operand

// First three cases are handled in loop through all def operands of G_
// instructions. For case (1) simply setRegBank. Cases (2) and (3) are
// handled by reAssignRegBankOnDef.
for (MachineOperand &DefOP : MI.defs()) {
Register DefReg = getVReg(DefOP);
if (!DefReg.isValid())
continue;

const RegisterBank *RB = RBSHelper.getRegBankToAssign(DefReg);
if (MRI.getRegClassOrNull(DefReg))
RBSHelper.reAssignRegBankOnDef(MI, DefOP, RB);
else {
assert(!MRI.getRegBankOrNull(DefReg));
MRI.setRegBank(DefReg, *RB);
}
}

// Register bank select doesn't modify pre-inst-selected instructions.
// For case (4) need to insert a copy, handled by constrainRegBankUse.
for (MachineOperand &UseOP : MI.uses()) {
Register UseReg = getVReg(UseOP);
if (!UseReg.isValid())
continue;

// Skip case (3).
if (!MRI.getRegClassOrNull(UseReg) ||
MRI.getVRegDef(UseReg)->isPreISelOpcode())
continue;

// Use with register class defined by pre-inst-selected instruction.
const RegisterBank *RB = RBSHelper.getRegBankToAssign(UseReg);
RBSHelper.constrainRegBankUse(MI, UseOP, RB);
}
}
}

return true;
}
Loading
Loading