-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU][MC] Allow null where 128b or larger dst reg is expected #115200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU][MC] Allow null where 128b or larger dst reg is expected #115200
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Jun Wang (jwanggit86) ChangesFor GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem. Full diff: https://github.com/llvm/llvm-project/pull/115200.diff 4 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 758f864fd20e6a..4403c033196686 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -9720,6 +9720,12 @@ unsigned AMDGPUAsmParser::validateTargetOperandClass(MCParsedAsmOperand &Op,
// The following code enables it for SReg_64 operands
// used as source and destination. Remaining source
// operands are handled in isInlinableImm.
+ //
+ // Additionally, allow null where destination of 128-bit or larger is
+ // expected.
+ case MCK_SReg_128:
+ case MCK_SReg_256:
+ case MCK_SReg_512:
return Operand.isNull() ? Match_Success : Match_InvalidOperand;
default:
return Match_InvalidOperand;
diff --git a/llvm/test/MC/AMDGPU/gfx10_asm_smem.s b/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
index b582de83a2f291..683a0195037cf5 100644
--- a/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
@@ -281,6 +281,22 @@ s_load_dwordx16 s[20:35], s[2:3], 0x1234 glc dlc
s_load_dwordx16 s[20:35], s[2:3], s0 offset:0x12345 glc dlc
// GFX10: encoding: [0x01,0x45,0x11,0xf4,0x45,0x23,0x01,0x00]
+// null as dst
+s_load_dword null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx2 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx4 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x08,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx8 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x0c,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx16 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x10,0xf4,0x00,0x00,0x00,0x00]
+
s_buffer_load_dword s5, s[4:7], s0
// GFX10: encoding: [0x42,0x01,0x20,0xf4,0x00,0x00,0x00,0x00]
diff --git a/llvm/test/MC/AMDGPU/gfx11_asm_smem.s b/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
index 1d6b9476090758..e071c67f85891b 100644
--- a/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
@@ -239,6 +239,22 @@ s_load_b512 s[20:35], s[2:3], s0 glc dlc
s_load_b512 s[20:35], s[2:3], 0x1234 glc dlc
// GFX11: encoding: [0x01,0x65,0x10,0xf4,0x34,0x12,0x00,0xf8]
+// null as dst
+s_load_b32 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b64 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b128 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x08,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b256 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x0c,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b512 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x10,0xf4,0x00,0x00,0x00,0x00]
+
s_buffer_load_b32 s5, s[4:7], s0
// GFX11: encoding: [0x42,0x01,0x20,0xf4,0x00,0x00,0x00,0x00]
diff --git a/llvm/test/MC/AMDGPU/gfx12_asm_smem.s b/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
index 668f767661f682..80082894b39fe1 100644
--- a/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
@@ -541,6 +541,22 @@ s_load_b512 s[20:35], s[2:3], m0
s_load_b512 s[20:35], s[2:3], 0x0
// GFX12: s_load_b512 s[20:35], s[2:3], 0x0 ; encoding: [0x01,0x85,0x00,0xf4,0x00,0x00,0x00,0xf8]
+// null as dst
+s_load_b32 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b64 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x3f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b128 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x5f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b256 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x7f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b512 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x9f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
s_buffer_load_b32 s5, s[4:7], s0
// GFX12: s_buffer_load_b32 s5, s[4:7], s0 offset:0x0 ; encoding: [0x42,0x01,0x02,0xf4,0x00,0x00,0x00,0x00]
|
@llvm/pr-subscribers-mc Author: Jun Wang (jwanggit86) ChangesFor GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem. Full diff: https://github.com/llvm/llvm-project/pull/115200.diff 4 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 758f864fd20e6a..4403c033196686 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -9720,6 +9720,12 @@ unsigned AMDGPUAsmParser::validateTargetOperandClass(MCParsedAsmOperand &Op,
// The following code enables it for SReg_64 operands
// used as source and destination. Remaining source
// operands are handled in isInlinableImm.
+ //
+ // Additionally, allow null where destination of 128-bit or larger is
+ // expected.
+ case MCK_SReg_128:
+ case MCK_SReg_256:
+ case MCK_SReg_512:
return Operand.isNull() ? Match_Success : Match_InvalidOperand;
default:
return Match_InvalidOperand;
diff --git a/llvm/test/MC/AMDGPU/gfx10_asm_smem.s b/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
index b582de83a2f291..683a0195037cf5 100644
--- a/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx10_asm_smem.s
@@ -281,6 +281,22 @@ s_load_dwordx16 s[20:35], s[2:3], 0x1234 glc dlc
s_load_dwordx16 s[20:35], s[2:3], s0 offset:0x12345 glc dlc
// GFX10: encoding: [0x01,0x45,0x11,0xf4,0x45,0x23,0x01,0x00]
+// null as dst
+s_load_dword null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx2 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx4 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x08,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx8 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x0c,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_dwordx16 null, s[2:3], s0
+// GFX10: encoding: [0x41,0x1f,0x10,0xf4,0x00,0x00,0x00,0x00]
+
s_buffer_load_dword s5, s[4:7], s0
// GFX10: encoding: [0x42,0x01,0x20,0xf4,0x00,0x00,0x00,0x00]
diff --git a/llvm/test/MC/AMDGPU/gfx11_asm_smem.s b/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
index 1d6b9476090758..e071c67f85891b 100644
--- a/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx11_asm_smem.s
@@ -239,6 +239,22 @@ s_load_b512 s[20:35], s[2:3], s0 glc dlc
s_load_b512 s[20:35], s[2:3], 0x1234 glc dlc
// GFX11: encoding: [0x01,0x65,0x10,0xf4,0x34,0x12,0x00,0xf8]
+// null as dst
+s_load_b32 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b64 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b128 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x08,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b256 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x0c,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b512 null, s[2:3], s0
+// GFX11: encoding: [0x01,0x1f,0x10,0xf4,0x00,0x00,0x00,0x00]
+
s_buffer_load_b32 s5, s[4:7], s0
// GFX11: encoding: [0x42,0x01,0x20,0xf4,0x00,0x00,0x00,0x00]
diff --git a/llvm/test/MC/AMDGPU/gfx12_asm_smem.s b/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
index 668f767661f682..80082894b39fe1 100644
--- a/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
+++ b/llvm/test/MC/AMDGPU/gfx12_asm_smem.s
@@ -541,6 +541,22 @@ s_load_b512 s[20:35], s[2:3], m0
s_load_b512 s[20:35], s[2:3], 0x0
// GFX12: s_load_b512 s[20:35], s[2:3], 0x0 ; encoding: [0x01,0x85,0x00,0xf4,0x00,0x00,0x00,0xf8]
+// null as dst
+s_load_b32 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x1f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b64 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x3f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b128 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x5f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b256 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x7f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
+s_load_b512 null, s[2:3], s0 offset:0x0
+// GFX12: encoding: [0x01,0x9f,0x00,0xf4,0x00,0x00,0x00,0x00]
+
s_buffer_load_b32 s5, s[4:7], s0
// GFX12: s_buffer_load_b32 s5, s[4:7], s0 offset:0x0 ; encoding: [0x42,0x01,0x02,0xf4,0x00,0x00,0x00,0x00]
|
@arsenm @jayfoad @mbrkusanin ping. |
// Additionally, allow null where destination of 128-bit or larger is | ||
// expected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment can merge with the above. It doesn't need to mention specific sizes, it "may be used anywhere scalar sources can normally be used"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
|
||
s_load_dwordx2 null, s[2:3], s0 | ||
// GFX10: encoding: [0x41,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test the dwordx3 case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s_load_dwordx3 is not a valid instruction.
|
||
s_load_b64 null, s[2:3], s0 | ||
// GFX11: encoding: [0x01,0x1f,0x04,0xf4,0x00,0x00,0x00,0x00] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test the b96 case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b96 is not valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's valid for for GFX12, so can you test it there?
// | ||
// Additionally, allow null where destination of 128-bit or larger is | ||
// expected. | ||
case MCK_SReg_128: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing the 96-bit case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manual also states "NULL may not be used as an S#, V# or T#."
Can you add some mimg and mubuf/mtbuf cases? I don't think we can tell from just the operand that the use context is one of these
a3953fc
to
b2e6a18
Compare
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good, thanks.
DECODE_OPERAND_REG_7(SReg_512, OPW512) | ||
DECODE_OPERAND_REG_7(SReg_512_XNULL, OPW512) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one isn't used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in the latest commit. Thanks!
if hasNull then { | ||
def SReg_ # suffix : | ||
SIRegisterClass<"AMDGPU", regTypes, 32, | ||
!dag(add, [!cast<RegisterClass>("SReg_" # suffix # "_XNULL"), !cast<Register>("SGPR_NULL" # suffix)], ["", ""])> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the ["", ""]
for here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole thing is to use the !dag operator to create a dag. The 2nd and 3rd arguments are lists with the same length. The 2nd list is a list of values while the 3rd the corresponding names. It seems that the names are not important. That's why they are blank. But in the latest commit, I replaced the two blank names with "RegClass" and "NullReg" respectively.
Using 128b as an example, this creates the dag (add SReg_128:"RegClass", SReg_128_XNULL:"NullReg").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK but I don't see why you need to use !dag
. You can write it literally as (add !cast<RegisterClass>("SReg_" # suffix # "_XNULL"), !cast<Register>("SGPR_NULL" # suffix))
.
@@ -809,6 +809,10 @@ def SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2f16, | |||
let BaseClassOrder = 32; | |||
} | |||
|
|||
def SGPR_NULL128 : SIReg<"null">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is there a good reason why this definition is so different from SGPR_NULL64? Maybe they can be commoned up a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually not very sure if this def is correct. SGPR_NULL64 is defined with 2 sub-regs. I wasn't sure if SGPR_128NULL should be defined similarly, and if so, what the sub-regs should be. On the other hand, changing SGPR_NULL64 to def SGPR_NULL64: SIReg<"null">
would fail a lot of tests, possibly because SReg_64 is used when it should really be SReg_64_XNULL. Any suggestions are appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why (or if) SGPR_NULL64 needs to have subregs. Maybe something to investigate later.
multiclass SRegClass<int numRegs, | ||
list<ValueType> regTypes, | ||
SIRegisterTuples regList, | ||
SIRegisterTuples ttmpList = regList, | ||
bit hasNull = 0, | ||
int copyCost = !sra(!add(numRegs, 1), 1)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really related to your patch, but copyCost could be a defvar if no-one is setting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but I think I'll leave it as is unless you think it should be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to leave it as-is in this patch.
For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem.
some cases but disallow it in others where SReg128 or larger is expected.
6ff2e7c
to
5fcae4a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Following on from llvm#115200, disallow the null sgpr as a resource operand in some instructions that were missed.
Following on from #115200, disallow the null sgpr as a resource operand in some instructions that were missed.
…m#115200) For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem while ensuring null cannot be used as S#, T#, or V#. Change-Id: I899493529f8d86d0426c177326ee1b20e3dd6c23
Following on from llvm#115200, disallow the null sgpr as a resource operand in some instructions that were missed. Change-Id: I5068403999a04613b3826801bb2f4a1d55b2e9e7
For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem while ensuring null cannot be used as S#, T#, or V#.