Skip to content

[AMDGPU] Set register bank for i1 register copies #96155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jwanggit86
Copy link
Contributor

In planned work, the calling convention is to be updated such that i1 arguments and return values are assigned to SGPRs. For this change, we need to ensure the register banks are correctly assigned.

@llvmbot
Copy link
Member

llvmbot commented Jun 20, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Jun Wang (jwanggit86)

Changes

In planned work, the calling convention is to be updated such that i1 arguments and return values are assigned to SGPRs. For this change, we need to ensure the register banks are correctly assigned.


Full diff: https://github.com/llvm/llvm-project/pull/96155.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+15)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index 0510a1d2eff88..ddf49f153d2c6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -3745,6 +3745,21 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
     if (!DstBank)
       DstBank = SrcBank;
 
+    // The calling convention is to be updated such that i1 function arguments
+    // or return values are assigned to SGPRs without promoting to i32. With
+    // this, for i1 function arguments, the call of getRegBank() above gives
+    // incorrect result. We set both src and dst banks to VCCRegBank.
+    if (!MI.getOperand(1).getReg().isVirtual() &&
+        MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) {
+      DstBank = SrcBank = &AMDGPU::VCCRegBank;
+    }
+
+    // Similarly, for i1 return value, the dst reg is an SReg but we need to
+    // explicitly set the reg bank to VCCRegBank.
+    if (!MI.getOperand(0).getReg().isVirtual() &&
+        SrcBank == &AMDGPU::VCCRegBank)
+      DstBank = SrcBank;
+
     unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
     if (MI.getOpcode() != AMDGPU::G_FREEZE &&
         cannotCopy(*DstBank, *SrcBank, TypeSize::getFixed(Size)))

@arsenm
Copy link
Contributor

arsenm commented Jun 20, 2024

Needs mir test

@@ -3745,6 +3745,21 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
if (!DstBank)
DstBank = SrcBank;

// The calling convention is to be updated such that i1 function arguments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to talk about the calling convention or updates. There's just a consistent interpretation of copy

// or return values are assigned to SGPRs without promoting to i32. With
// this, for i1 function arguments, the call of getRegBank() above gives
// incorrect result. We set both src and dst banks to VCCRegBank.
if (!MI.getOperand(1).getReg().isVirtual() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't directly depend on isVirtual / isPhysical. You should be relying on the bank retrieved above which should be correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, once the calling convention for i1 arg is changed, we would have instructions such as %0:_(s1) = COPY $sgpr4_sgpr5. For this instruction, after the 2 calls of getRegBank(), DstBank is null, and SrcBank is SGPRRegBank. We want both to be VCCRegBank, hence the patch.

The reason it returns SGPRRegBank instead of VCCRegBank is because of a flaw in RegisterBankInfo::getRegBank(), as follows:

const RegisterBank *
RegisterBankInfo::getRegBank(Register Reg, const MachineRegisterInfo &MRI,
                             const TargetRegisterInfo &TRI) const {
  if (!Reg.isVirtual()) {
    // FIXME: This was probably a copy to a virtual register that does have a
    // type we could use.
    const TargetRegisterClass *RC = getMinimalPhysRegClass(Reg, TRI);
    return RC ? &getRegBankFromRegClass(*RC, LLT()) : nullptr;  
  }
...

Note that there's a FIXME. In the call of getRegBankFromRegClass it passes the default type LLT(), which is not what we want here.
The IF statement in the patch basically says, if the src is physical and dst type is s1, then set the reg bank to VCCRegBank, which corrects the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possibly missing piece in the IF is MI.isCopy() because the code actually can be reached for both COPY and FREEZE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really think I should get rid of the call of isVirtual(), I've tried the following. All tests have passed, but the condition is less limiting than the original version:

@@ -3735,32 +3735,30 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
   const MachineRegisterInfo &MRI = MF.getRegInfo();

   if (MI.isCopy() || MI.getOpcode() == AMDGPU::G_FREEZE) {
     // The default logic bothers to analyze impossible alternative mappings. We
     // want the most straightforward mapping, so just directly handle this.
     const RegisterBank *DstBank = getRegBank(MI.getOperand(0).getReg(), MRI,
                                              *TRI);
     const RegisterBank *SrcBank = getRegBank(MI.getOperand(1).getReg(), MRI,
                                              *TRI);
     assert(SrcBank && "src bank should have been assigned already");
-    if (!DstBank)
-      DstBank = SrcBank;

-    // The calling convention is to be updated such that i1 function arguments
-    // or return values are assigned to SGPRs without promoting to i32. With
-    // this, for i1 function arguments, the call of getRegBank() above gives
-    // incorrect result. We set both src and dst banks to VCCRegBank.
-    if (!MI.getOperand(1).getReg().isVirtual() &&
-        MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) {
+    // For copy from physical reg to s1 dest, the call of getRegBank() above
+    // gives incorrect result. We set both src and dst banks to VCCRegBank.
+    if (MI.isCopy() && !DstBank && MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) {
       DstBank = SrcBank = &AMDGPU::VCCRegBank;
     }

+    if (!DstBank)
+      DstBank = SrcBank;
+

Pls let me know your thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add MIR tests showing the regclass / regbank / physreg with i1 permutations here first to review this usefully. You shouldn't need to check isCopy again either. Freeze is just the same as copy, except it can't have the physreg inputs

@jwanggit86
Copy link
Contributor Author

@arsenm ping.

@arsenm arsenm changed the title [AMDGPU] Set register bank for i1 arguments/return values [AMDGPU] Set register bank for i1 register copies Jul 22, 2024
Comment on lines 3749 to 3753
MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1))
DstBank = SrcBank = &AMDGPU::VCCRegBank;
// For copy from s1 src to a physical reg, we set both src and dst banks to
// VCCRegBank.
else if (!MI.getOperand(0).getReg().isVirtual() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1))
DstBank = SrcBank = &AMDGPU::VCCRegBank;
// For copy from s1 src to a physical reg, we set both src and dst banks to
// VCCRegBank.
else if (!MI.getOperand(0).getReg().isVirtual() &&
MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1)) {
DstBank = SrcBank = &AMDGPU::VCCRegBank;
// For copy from s1 src to a physical reg, we set both src and dst banks to
// VCCRegBank.
} else if (!MI.getOperand(0).getReg().isVirtual() &&

// For copy from a physical reg to s1 dest, the call of getRegBank() above
// gives incorrect result. We set both src and dst banks to VCCRegBank.
if (!MI.getOperand(1).getReg().isVirtual() && !DstBank &&
MRI.getType(MI.getOperand(0).getReg()) == LLT::scalar(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use some SrcReg/DstReg variables to get rid of some of the getOperand noise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

; CHECK-LABEL: name: copy_sgpr_32_to_s1
; CHECK: liveins: $sgpr0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:vcc(s1) = COPY $sgpr0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need wave32/wave64 run lines, possibly in a new copy of the test. This case should not use vcc for wave32 (and maybe should be a verifier error)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added run lines for GFX1010.

%1:_(s1) = G_TRUNC %0
$sgpr0 = COPY %1
...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a few AGPR tests for good measure? Also should test some s1 freezes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy from s1 to AGPR is not allowed, causing the following error: "Bad machine code: Copy Instruction is illegal with mismatching sizes".

Comment on lines 3746 to 3747
// For copy from a physical reg to s1 dest, the call of getRegBank() above
// gives incorrect result. We set both src and dst banks to VCCRegBank.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should explain why it's incorrect. getRegBank only looks at the register in isolation, and expects the LLT to disambiguate the SGPRs. In the virtual<->physical copy case, we have to look at the other register's type. You could also do this by calling getRegBankFromRegClass directly with the other operand's type.

You should also do this case instead of the default getRegBankAbove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comments. For COPY between a physical reg and an s1, we set dst reg bank to VCCRegBank so that the COPY is allowed. Otherwise, getInstrMapping() will reject it.

@jayfoad jayfoad requested a review from petar-avramovic July 23, 2024 09:31
@petar-avramovic
Copy link
Collaborator

// The calling convention is to be updated such that i1 function arguments
// or return values are assigned to SGPRs without promoting to i32.

I was not aware of this. Why do you want to change calling convention?

BTW, there is planned rework for reg bank select coming soon. You can't really do much with special casing S1.

@arsenm
Copy link
Contributor

arsenm commented Jul 24, 2024

// The calling convention is to be updated such that i1 function arguments
// or return values are assigned to SGPRs without promoting to i32.

I was not aware of this. Why do you want to change calling convention?

Because we are not taking advantage of scalar arguments and return values, and forcing everything into VGPRs, which introduces extra code to convert from and to boolean masks in the caller/callee

@petar-avramovic
Copy link
Collaborator

petar-avramovic commented Jul 24, 2024

// The calling convention is to be updated such that i1 function arguments
// or return values are assigned to SGPRs without promoting to i32.

I was not aware of this. Why do you want to change calling convention?

Because we are not taking advantage of scalar arguments and return values, and forcing everything into VGPRs, which introduces extra code to convert from and to boolean masks in the caller/callee

There is inreg, i1 inreg %uniform_bool should be promoted S32 and i1 %lanemask is VCC (no promoting to i32, register size should match wave32/64). We might have to teach uniformity analysis to recognize this. Or just do it in reg-bank-select, there is piece of code that recognizes lane masks that uniformity analysis can't recognize.

First thing that comes to my mind is to promote i1 inreg to S32 in argument lowering, and for i1 (the lane mask) put register class + S1 on destination of COPY from physical register.

Do you have some llvm-ir tests for me to check if it works on new reg bank select.

@arsenm
Copy link
Contributor

arsenm commented Jul 24, 2024

There is inreg, i1 inreg %uniform_bool should be promoted S32 and i1 %lanemask is VCC (no promoting to i32, register size should match wave32/64).

Yes, that is the idea.

We might have to teach uniformity analysis to recognize this.

We definitely do. That is all supposed to be in #72461

Do you have some llvm-ir tests for me to check if it works on new reg bank select.

It shouldn't need to worry about it, other than correctly handling physreg copies correctly like in this PR

Jun Wang added 3 commits July 24, 2024 19:23
calling convention update

In planned work, the calling convention is to be updated such that
i1 arguments and return values are assigned to SGPRs. For this change,
we need to ensure the register banks are correctly assigned.
VCCRegBank
in getInstrMapping(). Also created mir tests.
dst instead of for both src and dst (3) add run line for GFX10 in
test file.
@jwanggit86 jwanggit86 force-pushed the Separate-PR-for-AMDGPURegisterBankInfo branch from 7377060 to d3d78d1 Compare July 29, 2024 22:29
@jwanggit86 jwanggit86 requested a review from arsenm July 29, 2024 23:50
; CHECK-LABEL: name: copy_sgpr_32_to_s1
; CHECK: liveins: $sgpr0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:vcc(s1) = COPY $sgpr0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wave64 case shouldn't have identified this as vcc (but we should have rejected this in the machine verifier)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For wave64, removed the checks involving copy 32b physical reg to vcc.

@@ -1,6 +1,8 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn -mcpu=gfx908 -run-pass=amdgpu-regbankselect -regbankselect-fast -verify-machineinstrs %s -o - | FileCheck %s
# RUN: llc -mtriple=amdgcn -mcpu=gfx908 -run-pass=amdgpu-regbankselect -regbankselect-greedy -verify-machineinstrs %s -o - | FileCheck %s
# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -run-pass=amdgpu-regbankselect -regbankselect-fast -verify-machineinstrs %s -o - | FileCheck --check-prefix=GFX10 %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to name the check prefix WAVE32

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

; GFX10-LABEL: name: copy_sgpr_64_and_sgpr_32_to_s1
; GFX10: liveins: $sgpr6, $sgpr4_sgpr5
; GFX10-NEXT: {{ $}}
; GFX10-NEXT: [[COPY1:%[0-9]+]]:vcc(s1) = COPY $sgpr4_sgpr5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be vcc in wave32 (but should probably just be a verifier error, so I guess it doesn't matter. If we fix the lack of error the tests need splitting)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For wave32, removed the checks involving copy 64b physical reg to vcc.

Comment on lines 3747 to 3748
// For COPY between a physical reg and an s1, set dst bank to VCCRegBank
// so that the copy is allowed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// For COPY between a physical reg and an s1, set dst bank to VCCRegBank
// so that the copy is allowed.
// For physical registers, there is no type associated so we need to take the virtual register's type
// as a hint on how to interpret s1 values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jwanggit86 jwanggit86 requested a review from arsenm August 5, 2024 22:00
@jwanggit86 jwanggit86 merged commit 704f303 into llvm:main Aug 7, 2024
5 of 7 checks passed
@jwanggit86 jwanggit86 deleted the Separate-PR-for-AMDGPURegisterBankInfo branch August 7, 2024 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants