-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] Allocate i1 argument to SGPRs #72461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
75f1a46
67365b8
ae46c82
721c34d
ca09ddd
f26afca
26fa9cc
3b323d9
b4c0bb9
ad7d657
d841a49
df1bbe3
4c098fe
e6e574d
4f54c98
a79ddae
d3338c9
4a82212
c0dfff7
6f2289b
265d5c6
6aaa564
4892dc7
da176a8
df3a52e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,6 +29,38 @@ | |
|
||
using namespace llvm; | ||
|
||
static bool CC_AMDGPU_Custom_I1(unsigned ValNo, MVT ValVT, MVT LocVT, | ||
CCValAssign::LocInfo LocInfo, | ||
ISD::ArgFlagsTy ArgFlags, CCState &State) { | ||
static bool IsWave64 = | ||
State.getMachineFunction().getSubtarget<GCNSubtarget>().isWave64(); | ||
|
||
static const MCPhysReg SGPRArgsWave64[] = { | ||
AMDGPU::SGPR0_SGPR1, AMDGPU::SGPR2_SGPR3, AMDGPU::SGPR4_SGPR5, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we want to only pass in even aligned registers? We could also include all the aliasing odd-based cases There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By "odd-based cases" do you mean something like SGPR1_SGPR2? Do you have an example where such reg allocation is used? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. I mean if you had (i32 inreg %arg0, i64 inreg %arg1), you could use arg0=s0, arg1=s[1:2] There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This issue deals with i1 arg/return only. Inreg args are handled separately. If you are thinking about a mixed case as follows:
Currently for gfx900, %arg0 is assigned to s4, and %arg1 to s[6:7]. If in this case you want %arg1 to be given s[5:6], I suppose the list of registers can be changed from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @arsenm Pls let me know if the above is what you wanted. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, the i64 is physically what is being passed, but doesn't have the alignment requirement in the argument list. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems pairs of SGPRs that start with an odd number, e.g., SGPR5_SGPR6, are There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, you have to unroll them into the 32-bit pieces and reassemble the virtual register |
||
AMDGPU::SGPR6_SGPR7, AMDGPU::SGPR8_SGPR9, AMDGPU::SGPR10_SGPR11, | ||
AMDGPU::SGPR12_SGPR13, AMDGPU::SGPR14_SGPR15, AMDGPU::SGPR16_SGPR17, | ||
AMDGPU::SGPR18_SGPR19, AMDGPU::SGPR20_SGPR21, AMDGPU::SGPR22_SGPR23, | ||
AMDGPU::SGPR24_SGPR25, AMDGPU::SGPR26_SGPR27, AMDGPU::SGPR28_SGPR29}; | ||
|
||
static const MCPhysReg SGPRArgsWave32[] = { | ||
AMDGPU::SGPR0, AMDGPU::SGPR1, AMDGPU::SGPR2, AMDGPU::SGPR3, | ||
AMDGPU::SGPR4, AMDGPU::SGPR5, AMDGPU::SGPR6, AMDGPU::SGPR7, | ||
AMDGPU::SGPR8, AMDGPU::SGPR9, AMDGPU::SGPR10, AMDGPU::SGPR11, | ||
AMDGPU::SGPR12, AMDGPU::SGPR13, AMDGPU::SGPR14, AMDGPU::SGPR15, | ||
AMDGPU::SGPR16, AMDGPU::SGPR17, AMDGPU::SGPR18, AMDGPU::SGPR19, | ||
AMDGPU::SGPR20, AMDGPU::SGPR21, AMDGPU::SGPR22, AMDGPU::SGPR23, | ||
AMDGPU::SGPR24, AMDGPU::SGPR25, AMDGPU::SGPR26, AMDGPU::SGPR27, | ||
AMDGPU::SGPR28, AMDGPU::SGPR29}; | ||
|
||
assert(LocVT == MVT::i1); | ||
if (unsigned Reg = IsWave64 ? State.AllocateReg(SGPRArgsWave64) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you avoid the determineAndHandleAssignments changes by using drop_front if !enableFlatScratch? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we use drop_front, doesn't that mean we are assuming that the reserved reg is s[0:3]? How about putting in here something like the following:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, could also do that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Decided to create an overload function There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See above, might as well just use the 2 pieces of determineAndHandleAssignments instead |
||
: State.AllocateReg(SGPRArgsWave32)) { | ||
State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo)); | ||
return true; | ||
} | ||
return false; // not allocated | ||
} | ||
|
||
#include "AMDGPUGenCallingConv.inc" | ||
|
||
static cl::opt<bool> AMDGPUBypassSlowDiv( | ||
|
@@ -784,6 +816,9 @@ EVT AMDGPUTargetLowering::getTypeForExtReturn(LLVMContext &Context, EVT VT, | |
ISD::NodeType ExtendKind) const { | ||
assert(!VT.isVector() && "only scalar expected"); | ||
|
||
if (VT == MVT::i1) | ||
return MVT::i1; | ||
|
||
// Round to the next multiple of 32-bits. | ||
unsigned Size = VT.getSizeInBits(); | ||
if (Size <= 32) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -131,6 +131,12 @@ bool AMDGPUInstructionSelector::selectCOPY(MachineInstr &I) const { | |
Register SrcReg = Src.getReg(); | ||
|
||
if (isVCC(DstReg, *MRI)) { | ||
if (SrcReg.isPhysical() && SrcReg != AMDGPU::SCC) { | ||
const TargetRegisterClass *DstRC = MRI->getRegClassOrNull(DstReg); | ||
if (DstRC) | ||
return DstRC->contains(SrcReg); | ||
} | ||
Comment on lines
+134
to
+138
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you split this into a separate change, with MIR tests? I suspect you can just directly return true. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be split to a separate change |
||
|
||
if (SrcReg == AMDGPU::SCC) { | ||
const TargetRegisterClass *RC | ||
= TRI.getConstrainedRegClassForOperand(Dst, *MRI); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3741,6 +3741,19 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const { | |
if (!DstBank) | ||
DstBank = SrcBank; | ||
|
||
// For i1 function arguments, the call of getRegBank() currently gives | ||
// incorrect result. We set both src and dst banks to VCCRegBank. | ||
if (!MI.getOperand(1).getReg().isVirtual() && | ||
MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) { | ||
DstBank = SrcBank = &AMDGPU::VCCRegBank; | ||
} | ||
|
||
// For i1 return value, the dst reg is an SReg but we need to set the reg | ||
// bank to VCCRegBank. | ||
if (!MI.getOperand(0).getReg().isVirtual() && | ||
SrcBank == &AMDGPU::VCCRegBank) | ||
DstBank = SrcBank; | ||
|
||
Comment on lines
+3744
to
+3756
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you also make this a separate PR (I want 2, one for the RegBankSelect and one for the InstructionSelect parts) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ping There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you saying that the changes in |
||
unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI); | ||
if (MI.getOpcode() != AMDGPU::G_FREEZE && | ||
cannotCopy(*DstBank, *SrcBank, TypeSize::getFixed(Size))) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3026,8 +3026,13 @@ SDValue SITargetLowering::LowerFormalArguments( | |
RC = &AMDGPU::VGPR_32RegClass; | ||
else if (AMDGPU::SGPR_32RegClass.contains(Reg)) | ||
RC = &AMDGPU::SGPR_32RegClass; | ||
else | ||
llvm_unreachable("Unexpected register class in LowerFormalArguments!"); | ||
else { | ||
if (VT == MVT::i1) | ||
RC = Subtarget->getBoolRC(); | ||
else | ||
llvm_unreachable("Unexpected register class in LowerFormalArguments!"); | ||
} | ||
|
||
EVT ValVT = VA.getValVT(); | ||
|
||
Reg = MF.addLiveIn(Reg, RC); | ||
|
@@ -3144,6 +3149,9 @@ SITargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv, | |
CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(), RVLocs, | ||
*DAG.getContext()); | ||
|
||
if (!Subtarget->enableFlatScratch()) | ||
CCInfo.AllocateReg(Info->getScratchRSrcReg()); | ||
|
||
// Analyze outgoing return values. | ||
CCInfo.AnalyzeReturn(Outs, CCAssignFnForReturn(CallConv, isVarArg)); | ||
|
||
|
@@ -3223,6 +3231,13 @@ SDValue SITargetLowering::LowerCallResult( | |
SmallVector<CCValAssign, 16> RVLocs; | ||
CCState CCInfo(CallConv, IsVarArg, DAG.getMachineFunction(), RVLocs, | ||
*DAG.getContext()); | ||
|
||
if (!Subtarget->enableFlatScratch()) { | ||
SIMachineFunctionInfo *FuncInfo = | ||
DAG.getMachineFunction().getInfo<SIMachineFunctionInfo>(); | ||
CCInfo.AllocateReg(FuncInfo->getScratchRSrcReg()); | ||
} | ||
|
||
CCInfo.AnalyzeCallResult(Ins, RetCC); | ||
|
||
// Copy all of the result registers out of their specified physreg. | ||
|
@@ -3234,6 +3249,23 @@ SDValue SITargetLowering::LowerCallResult( | |
Val = DAG.getCopyFromReg(Chain, DL, VA.getLocReg(), VA.getLocVT(), InGlue); | ||
Chain = Val.getValue(1); | ||
InGlue = Val.getValue(2); | ||
|
||
// For i1 return value allocated to an SGPR, the following is a | ||
// workaround before SILowerI1Copies is fixed. Basically we want the | ||
// dst reg for the above CopyFromReg not to be of the VReg_1 class | ||
// when emitting machine code. This workaround creats an addional | ||
// CopyToReg with a new virtual register, followed by another | ||
// CopyFromReg. | ||
jwanggit86 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if (VA.getLocVT() == MVT::i1) { | ||
const SIRegisterInfo *TRI = Subtarget->getRegisterInfo(); | ||
MachineRegisterInfo &MRI = DAG.getMachineFunction().getRegInfo(); | ||
|
||
if (TRI->isSGPRReg(MRI, VA.getLocReg())) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should always be true if LocVT is i1 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So far only There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. amdgpu_gfx should be the same. The others are entry points where we can't just change this |
||
Register TmpVReg = MRI.createVirtualRegister(TRI->getBoolRC()); | ||
SDValue TmpCopyTo = DAG.getCopyToReg(Chain, DL, TmpVReg, Val); | ||
Val = DAG.getCopyFromReg(TmpCopyTo, DL, VA.getLocReg(), MVT::i1); | ||
} | ||
} | ||
} else if (VA.isMemLoc()) { | ||
report_fatal_error("TODO: return values in memory"); | ||
} else | ||
|
@@ -3668,6 +3700,17 @@ SDValue SITargetLowering::LowerCall(CallLoweringInfo &CLI, | |
passSpecialInputs(CLI, CCInfo, *Info, RegsToPass, MemOpChains, Chain); | ||
} | ||
|
||
// In code below (after call of AnalyzeCallOperands), | ||
// if (!Subtarget->enableFlatScratch()), it would use either s[48:51] or | ||
// s[0:3]. Therefore, before calling AnalyzeCallOperands, we may need to | ||
// reserve these registers. | ||
if (!Subtarget->enableFlatScratch()) { | ||
if (IsChainCallConv) | ||
CCInfo.AllocateReg(AMDGPU::SGPR48_SGPR49_SGPR50_SGPR51); | ||
else | ||
CCInfo.AllocateReg(AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3); | ||
} | ||
|
||
Comment on lines
+3703
to
+3713
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unrelated change? |
||
CCInfo.AnalyzeCallOperands(Outs, AssignFn); | ||
|
||
// Get a count of how many bytes are to be pushed on the stack. | ||
|
Uh oh!
There was an error while loading. Please reload this page.