-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] Set register bank for i1 register copies #96155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Set register bank for i1 register copies #96155
Conversation
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: Jun Wang (jwanggit86) ChangesIn planned work, the calling convention is to be updated such that i1 arguments and return values are assigned to SGPRs. For this change, we need to ensure the register banks are correctly assigned. Full diff: https://github.com/llvm/llvm-project/pull/96155.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index 0510a1d2eff88..ddf49f153d2c6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -3745,6 +3745,21 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
if (!DstBank)
DstBank = SrcBank;
+ // The calling convention is to be updated such that i1 function arguments
+ // or return values are assigned to SGPRs without promoting to i32. With
+ // this, for i1 function arguments, the call of getRegBank() above gives
+ // incorrect result. We set both src and dst banks to VCCRegBank.
+ if (!MI.getOperand(1).getReg().isVirtual() &&
+ MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) {
+ DstBank = SrcBank = &AMDGPU::VCCRegBank;
+ }
+
+ // Similarly, for i1 return value, the dst reg is an SReg but we need to
+ // explicitly set the reg bank to VCCRegBank.
+ if (!MI.getOperand(0).getReg().isVirtual() &&
+ SrcBank == &AMDGPU::VCCRegBank)
+ DstBank = SrcBank;
+
unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
if (MI.getOpcode() != AMDGPU::G_FREEZE &&
cannotCopy(*DstBank, *SrcBank, TypeSize::getFixed(Size)))
|
Needs mir test |
@@ -3745,6 +3745,21 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const { | |||
if (!DstBank) | |||
DstBank = SrcBank; | |||
|
|||
// The calling convention is to be updated such that i1 function arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to talk about the calling convention or updates. There's just a consistent interpretation of copy
// or return values are assigned to SGPRs without promoting to i32. With | ||
// this, for i1 function arguments, the call of getRegBank() above gives | ||
// incorrect result. We set both src and dst banks to VCCRegBank. | ||
if (!MI.getOperand(1).getReg().isVirtual() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't directly depend on isVirtual / isPhysical. You should be relying on the bank retrieved above which should be correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, once the calling convention for i1 arg is changed, we would have instructions such as %0:_(s1) = COPY $sgpr4_sgpr5
. For this instruction, after the 2 calls of getRegBank()
, DstBank
is null, and SrcBank
is SGPRRegBank
. We want both to be VCCRegBank
, hence the patch.
The reason it returns SGPRRegBank
instead of VCCRegBank
is because of a flaw in RegisterBankInfo::getRegBank()
, as follows:
const RegisterBank *
RegisterBankInfo::getRegBank(Register Reg, const MachineRegisterInfo &MRI,
const TargetRegisterInfo &TRI) const {
if (!Reg.isVirtual()) {
// FIXME: This was probably a copy to a virtual register that does have a
// type we could use.
const TargetRegisterClass *RC = getMinimalPhysRegClass(Reg, TRI);
return RC ? &getRegBankFromRegClass(*RC, LLT()) : nullptr;
}
...
Note that there's a FIXME. In the call of getRegBankFromRegClass
it passes the default type LLT()
, which is not what we want here.
The IF statement in the patch basically says, if the src is physical and dst type is s1, then set the reg bank to VCCRegBank, which corrects the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possibly missing piece in the IF is MI.isCopy()
because the code actually can be reached for both COPY and FREEZE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you really think I should get rid of the call of isVirtual()
, I've tried the following. All tests have passed, but the condition is less limiting than the original version:
@@ -3735,32 +3735,30 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
const MachineRegisterInfo &MRI = MF.getRegInfo();
if (MI.isCopy() || MI.getOpcode() == AMDGPU::G_FREEZE) {
// The default logic bothers to analyze impossible alternative mappings. We
// want the most straightforward mapping, so just directly handle this.
const RegisterBank *DstBank = getRegBank(MI.getOperand(0).getReg(), MRI,
*TRI);
const RegisterBank *SrcBank = getRegBank(MI.getOperand(1).getReg(), MRI,
*TRI);
assert(SrcBank && "src bank should have been assigned already");
- if (!DstBank)
- DstBank = SrcBank;
- // The calling convention is to be updated such that i1 function arguments
- // or return values are assigned to SGPRs without promoting to i32. With
- // this, for i1 function arguments, the call of getRegBank() above gives
- // incorrect result. We set both src and dst banks to VCCRegBank.
- if (!MI.getOperand(1).getReg().isVirtual() &&
- MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) {
+ // For copy from physical reg to s1 dest, the call of getRegBank() above
+ // gives incorrect result. We set both src and dst banks to VCCRegBank.
+ if (MI.isCopy() && !DstBank && MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) {
DstBank = SrcBank = &AMDGPU::VCCRegBank;
}
+ if (!DstBank)
+ DstBank = SrcBank;
+
Pls let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add MIR tests showing the regclass / regbank / physreg with i1 permutations here first to review this usefully. You shouldn't need to check isCopy again either. Freeze is just the same as copy, except it can't have the physreg inputs
@arsenm ping. |
MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) | ||
DstBank = SrcBank = &AMDGPU::VCCRegBank; | ||
// For copy from s1 src to a physical reg, we set both src and dst banks to | ||
// VCCRegBank. | ||
else if (!MI.getOperand(0).getReg().isVirtual() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) | |
DstBank = SrcBank = &AMDGPU::VCCRegBank; | |
// For copy from s1 src to a physical reg, we set both src and dst banks to | |
// VCCRegBank. | |
else if (!MI.getOperand(0).getReg().isVirtual() && | |
MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) { | |
DstBank = SrcBank = &AMDGPU::VCCRegBank; | |
// For copy from s1 src to a physical reg, we set both src and dst banks to | |
// VCCRegBank. | |
} else if (!MI.getOperand(0).getReg().isVirtual() && |
// For copy from a physical reg to s1 dest, the call of getRegBank() above | ||
// gives incorrect result. We set both src and dst banks to VCCRegBank. | ||
if (!MI.getOperand(1).getReg().isVirtual() && !DstBank && | ||
MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use some SrcReg/DstReg variables to get rid of some of the getOperand noise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
; CHECK-LABEL: name: copy_sgpr_32_to_s1 | ||
; CHECK: liveins: $sgpr0 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: [[COPY:%[0-9]+]]:vcc(s1) = COPY $sgpr0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need wave32/wave64 run lines, possibly in a new copy of the test. This case should not use vcc for wave32 (and maybe should be a verifier error)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added run lines for GFX1010.
%1:_(s1) = G_TRUNC %0 | ||
$sgpr0 = COPY %1 | ||
... | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a few AGPR tests for good measure? Also should test some s1 freezes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy from s1 to AGPR is not allowed, causing the following error: "Bad machine code: Copy Instruction is illegal with mismatching sizes".
// For copy from a physical reg to s1 dest, the call of getRegBank() above | ||
// gives incorrect result. We set both src and dst banks to VCCRegBank. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should explain why it's incorrect. getRegBank only looks at the register in isolation, and expects the LLT to disambiguate the SGPRs. In the virtual<->physical copy case, we have to look at the other register's type. You could also do this by calling getRegBankFromRegClass directly with the other operand's type.
You should also do this case instead of the default getRegBankAbove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated comments. For COPY between a physical reg and an s1, we set dst reg bank to VCCRegBank so that the COPY is allowed. Otherwise, getInstrMapping()
will reject it.
I was not aware of this. Why do you want to change calling convention? BTW, there is planned rework for reg bank select coming soon. You can't really do much with special casing S1. |
Because we are not taking advantage of scalar arguments and return values, and forcing everything into VGPRs, which introduces extra code to convert from and to boolean masks in the caller/callee |
There is inreg, First thing that comes to my mind is to promote Do you have some llvm-ir tests for me to check if it works on new reg bank select. |
Yes, that is the idea.
We definitely do. That is all supposed to be in #72461
It shouldn't need to worry about it, other than correctly handling physreg copies correctly like in this PR |
calling convention update In planned work, the calling convention is to be updated such that i1 arguments and return values are assigned to SGPRs. For this change, we need to ensure the register banks are correctly assigned.
VCCRegBank in getInstrMapping(). Also created mir tests.
dst instead of for both src and dst (3) add run line for GFX10 in test file.
7377060
to
d3d78d1
Compare
; CHECK-LABEL: name: copy_sgpr_32_to_s1 | ||
; CHECK: liveins: $sgpr0 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: [[COPY:%[0-9]+]]:vcc(s1) = COPY $sgpr0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wave64 case shouldn't have identified this as vcc (but we should have rejected this in the machine verifier)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For wave64, removed the checks involving copy 32b physical reg to vcc.
@@ -1,6 +1,8 @@ | |||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py | |||
# RUN: llc -mtriple=amdgcn -mcpu=gfx908 -run-pass=amdgpu-regbankselect -regbankselect-fast -verify-machineinstrs %s -o - | FileCheck %s | |||
# RUN: llc -mtriple=amdgcn -mcpu=gfx908 -run-pass=amdgpu-regbankselect -regbankselect-greedy -verify-machineinstrs %s -o - | FileCheck %s | |||
# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -run-pass=amdgpu-regbankselect -regbankselect-fast -verify-machineinstrs %s -o - | FileCheck --check-prefix=GFX10 %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to name the check prefix WAVE32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
; GFX10-LABEL: name: copy_sgpr_64_and_sgpr_32_to_s1 | ||
; GFX10: liveins: $sgpr6, $sgpr4_sgpr5 | ||
; GFX10-NEXT: {{ $}} | ||
; GFX10-NEXT: [[COPY1:%[0-9]+]]:vcc(s1) = COPY $sgpr4_sgpr5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be vcc in wave32 (but should probably just be a verifier error, so I guess it doesn't matter. If we fix the lack of error the tests need splitting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For wave32, removed the checks involving copy 64b physical reg to vcc.
// For COPY between a physical reg and an s1, set dst bank to VCCRegBank | ||
// so that the copy is allowed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// For COPY between a physical reg and an s1, set dst bank to VCCRegBank | |
// so that the copy is allowed. | |
// For physical registers, there is no type associated so we need to take the virtual register's type | |
// as a hint on how to interpret s1 values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
some checks involving vcc.
In planned work, the calling convention is to be updated such that i1 arguments and return values are assigned to SGPRs. For this change, we need to ensure the register banks are correctly assigned.