Skip to content

[AMDGPU][MC] Allow null where 128b or larger dst reg is expected #115200

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jwanggit86
Copy link
Contributor

@jwanggit86 jwanggit86 commented Nov 6, 2024

For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem while ensuring null cannot be used as S#, T#, or V#.

@jwanggit86 jwanggit86 added backend:AMDGPU mc Machine (object) code labels Nov 6, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 6, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Jun Wang (jwanggit86)

Changes

For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem.


Full diff: https://github.com/llvm/llvm-project/pull/115200.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+6)
  • (modified) llvm/test/MC/AMDGPU/gfx10_asm_smem.s (+16)
  • (modified) llvm/test/MC/AMDGPU/gfx11_asm_smem.s (+16)
  • (modified) llvm/test/MC/AMDGPU/gfx12_asm_smem.s (+16)
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 758f864fd20e6a..4403c033196686 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -9720,6 +9720,12 @@ unsigned AMDGPUAsmParser::validateTargetOperandClass(MCParsedAsmOperand &Op,
     // The following code enables it for SReg_64 operands
     // used as source and destination. Remaining source
     // operands are handled in isInlinableImm.
+    //
+    // Additionally, allow null where destination of 128-bit or larger is
+    // expected.
+  case MCK_SReg_128:
+  case MCK_SReg_256:
+  case MCK_SReg_512:
     return Operand.isNull() ? Match_Success : Match_InvalidOperand;
   default:
     return Match_InvalidOperand;
diff --git a/llvm/test/MC/AMDGPU/gfx10_asm_smem.s b/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
index b582de83a2f291..683a0195037cf5 100644
--- a/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
@@ -281,6 +281,22 @@ s_load_dwordx16 s[20:35], s[2:3], 0x1234 glc dlc
 s_load_dwordx16 s[20:35], s[2:3], s0 offset:0x12345 glc dlc
 // GFX10: encoding: [0x01,0x45,0x11,0xf4,0x45,0x23,0x01,0x00]
 
+// null as dst
+s_load_dword null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx2 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx4 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x08,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx8 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x0c,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx16 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x10,0xf4,0x00,0x00,0x00,0x00]
+
 s_buffer_load_dword s5, s[4:7], s0
 // GFX10: encoding: [0x42,0x01,0x20,0xf4,0x00,0x00,0x00,0x00]
 
diff --git a/llvm/test/MC/AMDGPU/gfx11_asm_smem.s b/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
index 1d6b9476090758..e071c67f85891b 100644
--- a/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
@@ -239,6 +239,22 @@ s_load_b512 s[20:35], s[2:3], s0 glc dlc
 s_load_b512 s[20:35], s[2:3], 0x1234 glc dlc
 // GFX11: encoding: [0x01,0x65,0x10,0xf4,0x34,0x12,0x00,0xf8]
 
+// null as dst
+s_load_b32 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b64 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b128 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x08,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b256 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x0c,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b512 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x10,0xf4,0x00,0x00,0x00,0x00]
+
 s_buffer_load_b32 s5, s[4:7], s0
 // GFX11: encoding: [0x42,0x01,0x20,0xf4,0x00,0x00,0x00,0x00]
 
diff --git a/llvm/test/MC/AMDGPU/gfx12_asm_smem.s b/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
index 668f767661f682..80082894b39fe1 100644
--- a/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
@@ -541,6 +541,22 @@ s_load_b512 s[20:35], s[2:3], m0
 s_load_b512 s[20:35], s[2:3], 0x0
 // GFX12: s_load_b512 s[20:35], s[2:3], 0x0       ; encoding: [0x01,0x85,0x00,0xf4,0x00,0x00,0x00,0xf8]
 
+// null as dst
+s_load_b32 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b64 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x3f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b128 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x5f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b256 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x7f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b512 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x9f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
 s_buffer_load_b32 s5, s[4:7], s0
 // GFX12: s_buffer_load_b32 s5, s[4:7], s0 offset:0x0 ; encoding: [0x42,0x01,0x02,0xf4,0x00,0x00,0x00,0x00]
 

@llvmbot
Copy link
Member

llvmbot commented Nov 6, 2024

@llvm/pr-subscribers-mc

Author: Jun Wang (jwanggit86)

Changes

For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem.


Full diff: https://github.com/llvm/llvm-project/pull/115200.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+6)
  • (modified) llvm/test/MC/AMDGPU/gfx10_asm_smem.s (+16)
  • (modified) llvm/test/MC/AMDGPU/gfx11_asm_smem.s (+16)
  • (modified) llvm/test/MC/AMDGPU/gfx12_asm_smem.s (+16)
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 758f864fd20e6a..4403c033196686 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -9720,6 +9720,12 @@ unsigned AMDGPUAsmParser::validateTargetOperandClass(MCParsedAsmOperand &Op,
     // The following code enables it for SReg_64 operands
     // used as source and destination. Remaining source
     // operands are handled in isInlinableImm.
+    //
+    // Additionally, allow null where destination of 128-bit or larger is
+    // expected.
+  case MCK_SReg_128:
+  case MCK_SReg_256:
+  case MCK_SReg_512:
     return Operand.isNull() ? Match_Success : Match_InvalidOperand;
   default:
     return Match_InvalidOperand;
diff --git a/llvm/test/MC/AMDGPU/gfx10_asm_smem.s b/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
index b582de83a2f291..683a0195037cf5 100644
--- a/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
@@ -281,6 +281,22 @@ s_load_dwordx16 s[20:35], s[2:3], 0x1234 glc dlc
 s_load_dwordx16 s[20:35], s[2:3], s0 offset:0x12345 glc dlc
 // GFX10: encoding: [0x01,0x45,0x11,0xf4,0x45,0x23,0x01,0x00]
 
+// null as dst
+s_load_dword null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx2 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx4 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x08,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx8 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x0c,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx16 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x10,0xf4,0x00,0x00,0x00,0x00]
+
 s_buffer_load_dword s5, s[4:7], s0
 // GFX10: encoding: [0x42,0x01,0x20,0xf4,0x00,0x00,0x00,0x00]
 
diff --git a/llvm/test/MC/AMDGPU/gfx11_asm_smem.s b/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
index 1d6b9476090758..e071c67f85891b 100644
--- a/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
@@ -239,6 +239,22 @@ s_load_b512 s[20:35], s[2:3], s0 glc dlc
 s_load_b512 s[20:35], s[2:3], 0x1234 glc dlc
 // GFX11: encoding: [0x01,0x65,0x10,0xf4,0x34,0x12,0x00,0xf8]
 
+// null as dst
+s_load_b32 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b64 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b128 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x08,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b256 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x0c,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b512 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x10,0xf4,0x00,0x00,0x00,0x00]
+
 s_buffer_load_b32 s5, s[4:7], s0
 // GFX11: encoding: [0x42,0x01,0x20,0xf4,0x00,0x00,0x00,0x00]
 
diff --git a/llvm/test/MC/AMDGPU/gfx12_asm_smem.s b/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
index 668f767661f682..80082894b39fe1 100644
--- a/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
@@ -541,6 +541,22 @@ s_load_b512 s[20:35], s[2:3], m0
 s_load_b512 s[20:35], s[2:3], 0x0
 // GFX12: s_load_b512 s[20:35], s[2:3], 0x0       ; encoding: [0x01,0x85,0x00,0xf4,0x00,0x00,0x00,0xf8]
 
+// null as dst
+s_load_b32 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b64 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x3f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b128 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x5f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b256 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x7f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b512 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x9f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
 s_buffer_load_b32 s5, s[4:7], s0
 // GFX12: s_buffer_load_b32 s5, s[4:7], s0 offset:0x0 ; encoding: [0x42,0x01,0x02,0xf4,0x00,0x00,0x00,0x00]
 

@jwanggit86
Copy link
Contributor Author

@arsenm @jayfoad @mbrkusanin ping.

@jwanggit86 jwanggit86 requested a review from dpreobra November 21, 2024 02:57
Comment on lines 9724 to 9725
// Additionally, allow null where destination of 128-bit or larger is
// expected.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment can merge with the above. It doesn't need to mention specific sizes, it "may be used anywhere scalar sources can normally be used"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


s_load_dwordx2 null, s[2:3], s0
// GFX10: encoding: [0x41,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test the dwordx3 case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s_load_dwordx3 is not a valid instruction.


s_load_b64 null, s[2:3], s0
// GFX11: encoding: [0x01,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test the b96 case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b96 is not valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's valid for for GFX12, so can you test it there?

//
// Additionally, allow null where destination of 128-bit or larger is
// expected.
case MCK_SReg_128:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing the 96-bit case

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manual also states "NULL may not be used as an S#, V# or T#."

Can you add some mimg and mubuf/mtbuf cases? I don't think we can tell from just the operand that the use context is one of these

@jwanggit86 jwanggit86 force-pushed the allow-null-where-sreg-128-plus-is-expected branch from a3953fc to b2e6a18 Compare December 9, 2024 22:56
@jwanggit86
Copy link
Contributor Author

Manual also states "NULL may not be used as an S#, V# or T#."

Can you add some mimg and mubuf/mtbuf cases? I don't think we can tell from just the operand that the use context is one of these

Done.

@jwanggit86 jwanggit86 requested a review from arsenm December 10, 2024 23:04
Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, thanks.

DECODE_OPERAND_REG_7(SReg_512, OPW512)
DECODE_OPERAND_REG_7(SReg_512_XNULL, OPW512)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one isn't used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in the latest commit. Thanks!

if hasNull then {
def SReg_ # suffix :
SIRegisterClass<"AMDGPU", regTypes, 32,
!dag(add, [!cast<RegisterClass>("SReg_" # suffix # "_XNULL"), !cast<Register>("SGPR_NULL" # suffix)], ["", ""])> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the ["", ""] for here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole thing is to use the !dag operator to create a dag. The 2nd and 3rd arguments are lists with the same length. The 2nd list is a list of values while the 3rd the corresponding names. It seems that the names are not important. That's why they are blank. But in the latest commit, I replaced the two blank names with "RegClass" and "NullReg" respectively.

Using 128b as an example, this creates the dag (add SReg_128:"RegClass", SReg_128_XNULL:"NullReg").

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK but I don't see why you need to use !dag. You can write it literally as (add !cast<RegisterClass>("SReg_" # suffix # "_XNULL"), !cast<Register>("SGPR_NULL" # suffix)).

@@ -809,6 +809,10 @@ def SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2f16,
let BaseClassOrder = 32;
}

def SGPR_NULL128 : SIReg<"null">;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is there a good reason why this definition is so different from SGPR_NULL64? Maybe they can be commoned up a bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually not very sure if this def is correct. SGPR_NULL64 is defined with 2 sub-regs. I wasn't sure if SGPR_128NULL should be defined similarly, and if so, what the sub-regs should be. On the other hand, changing SGPR_NULL64 to def SGPR_NULL64: SIReg<"null"> would fail a lot of tests, possibly because SReg_64 is used when it should really be SReg_64_XNULL. Any suggestions are appreciated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why (or if) SGPR_NULL64 needs to have subregs. Maybe something to investigate later.

multiclass SRegClass<int numRegs,
list<ValueType> regTypes,
SIRegisterTuples regList,
SIRegisterTuples ttmpList = regList,
bit hasNull = 0,
int copyCost = !sra(!add(numRegs, 1), 1)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really related to your patch, but copyCost could be a defvar if no-one is setting it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but I think I'll leave it as is unless you think it should be fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to leave it as-is in this patch.

For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem.
some cases but disallow it in others where SReg128 or larger is
expected.
@jwanggit86 jwanggit86 force-pushed the allow-null-where-sreg-128-plus-is-expected branch from 6ff2e7c to 5fcae4a Compare December 19, 2024 01:34
@jwanggit86 jwanggit86 requested a review from jayfoad December 20, 2024 03:38
Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jwanggit86 jwanggit86 merged commit b2adeae into llvm:main Jan 3, 2025
8 checks passed
jayfoad added a commit to jayfoad/llvm-project that referenced this pull request Jan 7, 2025
Following on from llvm#115200, disallow the null sgpr as a resource operand
in some instructions that were missed.
jayfoad added a commit that referenced this pull request Jan 8, 2025
Following on from #115200, disallow the null sgpr as a resource operand
in some instructions that were missed.
abidh pushed a commit to abidh/llvm-project that referenced this pull request Feb 4, 2025
…m#115200)

For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem while ensuring null cannot be used as S#,
T#, or V#.

Change-Id: I899493529f8d86d0426c177326ee1b20e3dd6c23
abidh pushed a commit to abidh/llvm-project that referenced this pull request Feb 4, 2025
Following on from llvm#115200, disallow the null sgpr as a resource operand
in some instructions that were missed.

Change-Id: I5068403999a04613b3826801bb2f4a1d55b2e9e7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AMDGPU][MC][GFX10] null cannot be used with 128 bit and wider dst operands
4 participants