[AMDGPU] Promote uniform ops to i32 in GISel #106557
Changes from all commits
@@ -89,6 +89,9 @@ class AMDGPURegBankCombinerImpl : public Combiner {
  void applyMed3(MachineInstr &MI, Med3MatchInfo &MatchInfo) const;
  void applyClamp(MachineInstr &MI, Register &Reg) const;

  bool matchPromote16to32(MachineInstr &MI) const;
  void applyPromote16to32(MachineInstr &MI) const;

private:
  SIModeRegisterDefaults getMode() const;
  bool getIEEE() const;
@@ -348,6 +351,112 @@ bool AMDGPURegBankCombinerImpl::matchFPMed3ToClamp(MachineInstr &MI,
  return false;
}

bool AMDGPURegBankCombinerImpl::matchPromote16to32(MachineInstr &MI) const {
  Register Dst = MI.getOperand(0).getReg();
  LLT DstTy = MRI.getType(Dst);
  const auto *RB = MRI.getRegBankOrNull(Dst);

  // Only promote between 2 and 16 bits.
  // For ICMP, use the LHS of the comparison to get the type.
  unsigned TyOpIdx = (MI.getOpcode() == AMDGPU::G_ICMP) ? 2 : 0;
  LLT OpTy = MRI.getType(MI.getOperand(TyOpIdx).getReg());
  if (OpTy.getScalarSizeInBits() < 2 || OpTy.getScalarSizeInBits() > 16)
    return false;

  // Only promote uniform instructions.
  if (RB->getID() != AMDGPU::SGPRRegBankID)
    return false;

[Review comment on lines +366 to +368]: Check this first, up where RB is assigned. Also, this is a case where what we really care about is scalar, not uniformity.
  // TODO: Support vectors. Vectors will create illegal ops, such as
  // 2x32 exts, that we'd need to legalize.
  // We could just scalarize all vectors, but then we wouldn't respect
  // the legalizer's rules. Ideally we should be able to call
  // the legalizer here, or this should move into the legalizer
  // if it can tell uniform and non-uniform values apart at
  // some point.
  if (DstTy.isVector())
    return false;

  // Promote only if:
  // - We have 16 bit insts (not true 16 bit insts).

[Review comment]: I don't understand the rationale here. Why would you only promote uniform 16-bit ops if you do have 16-bit VALU instructions? (I guess if you do not have 16-bit VALU instructions then these 16-bit ops would already have been legalized to 32-bit. But that does not really explain why it makes sense to check.)

[Author reply]: I borrowed this from AMDGPUCodeGenPrepare; @arsenm wrote it, I think? I also don't think all of the operations are legalized to 32 bits, at least not in the DAG part. I need to look a bit longer at the GlobalISel case (my focus has been on the DAG and I want to land that first).

[Review comment]: Yes, but that should already be obvious from what types are legal at this point.
  // - This is already checked by the predicate on the combine rule.
  // - We don't have packed instructions (for vector types only).
  // TODO: For vector types, the set of packed operations is more limited, so
  // we may want to promote some anyway.
  assert(STI.has16BitInsts());
  return (DstTy.isVector() ? !STI.hasVOP3PInsts() : true);
}

[Review comment]: This is an "and" and doesn't need parentheses. It also shouldn't need the assert; that's implied by legality.
static unsigned getExtOpcodeForPromotedOp(MachineInstr &MI) {
  switch (MI.getOpcode()) {
  case AMDGPU::G_ASHR:
  case AMDGPU::G_SMIN:
  case AMDGPU::G_SMAX:
    return AMDGPU::G_SEXT;
  case AMDGPU::G_LSHR:
  case AMDGPU::G_UMIN:
  case AMDGPU::G_UMAX:
    return AMDGPU::G_ZEXT;
  case AMDGPU::G_ADD:
  case AMDGPU::G_SUB:
  case AMDGPU::G_AND:
  case AMDGPU::G_OR:
  case AMDGPU::G_XOR:
  case AMDGPU::G_SHL:
  case AMDGPU::G_SELECT:
  case AMDGPU::G_MUL:
    // The operation's result won't be influenced by garbage high bits.
    return AMDGPU::G_ANYEXT;
  case AMDGPU::G_ICMP: {
    return CmpInst::isSigned(cast<GICmp>(MI).getCond()) ? AMDGPU::G_SEXT
                                                        : AMDGPU::G_ZEXT;
  }
  default:
    llvm_unreachable("unexpected opcode!");
  }
}

[Review comment on G_SHL]: Strictly speaking, for all shifts the RHS should use ZEXT.

[Reply]: They should be able to use G_ANYEXT; only the bottom 5 bits are read (but I suppose the generic shifts don't know that).
void AMDGPURegBankCombinerImpl::applyPromote16to32(MachineInstr &MI) const {
  const unsigned Opc = MI.getOpcode();
  const unsigned ExtOpc = getExtOpcodeForPromotedOp(MI);

  Register Dst = MI.getOperand(0).getReg();

  const bool IsSelectOrCmp = (Opc == AMDGPU::G_SELECT || Opc == AMDGPU::G_ICMP);
  const bool IsShift =
      (Opc == AMDGPU::G_ASHR || Opc == AMDGPU::G_LSHR || Opc == AMDGPU::G_SHL);
  Register LHS = MI.getOperand(IsSelectOrCmp + 1).getReg();
  Register RHS = MI.getOperand(IsSelectOrCmp + 2).getReg();

  assert(MRI.getRegBankOrNull(Dst)->getID() == AMDGPU::SGPRRegBankID &&
         MRI.getRegBankOrNull(LHS)->getID() == AMDGPU::SGPRRegBankID &&
         MRI.getRegBankOrNull(RHS)->getID() == AMDGPU::SGPRRegBankID);

  const RegisterBank &RB = *MRI.getRegBankOrNull(Dst);
  LLT S32 = MRI.getType(LHS).changeElementSize(32);

  B.setInstrAndDebugLoc(MI);
  LHS = B.buildInstr(ExtOpc, {S32}, {LHS}).getReg(0);
  RHS = B.buildInstr(IsShift ? AMDGPU::G_ZEXT : ExtOpc, {S32}, {RHS}).getReg(0);

  MRI.setRegBank(LHS, RB);
  MRI.setRegBank(RHS, RB);

  MachineInstr *NewInst;
  if (IsSelectOrCmp)
    NewInst = B.buildInstr(Opc, {Dst}, {MI.getOperand(1), LHS, RHS});
  else
    NewInst = B.buildInstr(Opc, {S32}, {LHS, RHS});

  if (Opc != AMDGPU::G_ICMP) {
    Register Dst32 = NewInst->getOperand(0).getReg();
    MRI.setRegBank(Dst32, RB);
    B.buildTrunc(Dst, Dst32);
  }

  MI.eraseFromParent();
}
void AMDGPURegBankCombinerImpl::applyClamp(MachineInstr &MI,
                                           Register &Reg) const {
  B.buildInstr(AMDGPU::G_AMDGPU_CLAMP, {MI.getOperand(0)}, {Reg},

[Review comment]: Ideally this would use RBI.getRegBank to handle the case where the register has a concrete register class. Failing that, this needs to return false on null.