-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] Check vector sizes for physical register constraints in inline asm #109955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ne asm For register constraints that require specific register ranges, the width of the range should match the type of the associated parameter/return value. With this PR, we error out when that is not the case. Previously, these cases would hit assertions or llvm_unreachables. The handling of register constraints that require only a single register remains more lenient to allow narrower non-vector types for the associated IR values. For example, constraining an i16 or i8 value to a 32-bit register is still allowed. Fixes llvm#101190.
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: Fabian Ritter (ritter-x2a) ChangesFor register constraints that require specific register ranges, the width of the range should match the type of the associated parameter/return value. With this PR, we error out when that is not the case. Previously, these cases would hit assertions or llvm_unreachables. The handling of register constraints that require only a single register remains more lenient to allow narrower non-vector types for the associated IR values. For example, constraining an i16 or i8 value to a 32-bit register is still allowed. Fixes #101190. Full diff: https://github.com/llvm/llvm-project/pull/109955.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 08f2ff4566b674..eb8885577ca19d 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -15500,6 +15500,10 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
Failed |= !RegName.consume_back("]");
if (!Failed) {
uint32_t Width = (End - Idx + 1) * 32;
+ // Prohibit constraints for register ranges with a width that does not
+ // match the required type.
+ if (VT.SimpleTy != MVT::Other && Width != VT.getSizeInBits())
+ return std::pair(0U, nullptr);
MCRegister Reg = RC->getRegister(Idx);
if (SIRegisterInfo::isVGPRClass(RC))
RC = TRI->getVGPRClassForBitWidth(Width);
@@ -15513,6 +15517,9 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
}
}
} else {
+ // Check for lossy scalar/vector conversions.
+ if (VT.isVector() && VT.getSizeInBits() != 32)
+ return std::pair(0U, nullptr);
bool Failed = RegName.getAsInteger(10, Idx);
if (!Failed && Idx < RC->getNumRegs())
return std::pair(RC->getRegister(Idx), RC);
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll
index 696cbdb75f1ed9..69567b34ae6e60 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll
@@ -3,11 +3,14 @@
; RUN: FileCheck -check-prefix=ERR %s < %t
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' %sgpr = call <4 x i32> asm sideeffect "; def $0", "={s[8:12]}"()' (in function: return_type_is_too_big_vector)
+; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' %sgpr = call <4 x i32> asm sideeffect "; def $0", "={s[8:10]}"()' (in function: return_type_is_too_small_vector)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' %reg = call i64 asm sideeffect "; def $0", "={v8}"()' (in function: return_type_is_too_big_scalar)
+; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' %reg = call i32 asm sideeffect "; def $0", "={v[8:9]}"()' (in function: return_type_is_too_small_scalar)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' %reg = call ptr addrspace(1) asm sideeffect "; def $0", "={v8}"()' (in function: return_type_is_too_big_pointer)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' %reg = call ptr addrspace(3) asm sideeffect "; def $0", "={v[8:9]}"()' (in function: return_type_is_too_small_pointer)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' call void asm sideeffect "; use $0", "{v[0:9]}"(<8 x i32> %arg)' (in function: use_vector_too_big)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' call void asm sideeffect "; use $0", "{v0}"(i64 %arg)' (in function: use_scalar_too_small)
+; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' call void asm sideeffect "; use $0", "{v[0:1]}"(i32 %arg)' (in function: use_scalar_too_big)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' call void asm sideeffect "; use $0", "{v0}"(ptr addrspace(1) %arg)' (in function: use_pointer_too_small)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' call void asm sideeffect "; use $0", "{v[0:1]}"(ptr addrspace(3) %arg)' (in function: use_pointer_too_big)
@@ -24,18 +27,25 @@ define amdgpu_kernel void @return_type_is_too_big_vector() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr2_sgpr3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1 (%ir-block.0):
- ; CHECK-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11_sgpr12
%sgpr = call <4 x i32> asm sideeffect "; def $0", "={s[8:12]}" ()
call void asm sideeffect "; use $0", "s"(<4 x i32> %sgpr) #0
ret void
}
-; FIXME: This is crashing in the DAG
-; define amdgpu_kernel void @return_type_is_too_small_vector() {
-; %sgpr = call <4 x i32> asm sideeffect "; def $0", "={s[8:10]}" ()
-; call void asm sideeffect "; use $0", "s"(<4 x i32> %sgpr) #0
-; ret void
-; }
+; This is broken because it requests 3 32-bit sgprs to handle a 4xi32 result.
+define amdgpu_kernel void @return_type_is_too_small_vector() {
+ ; CHECK-LABEL: name: return_type_is_too_small_vector
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: liveins: $sgpr2_sgpr3
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr2_sgpr3
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1 (%ir-block.0):
+ %sgpr = call <4 x i32> asm sideeffect "; def $0", "={s[8:10]}" ()
+ call void asm sideeffect "; use $0", "s"(<4 x i32> %sgpr) #0
+ ret void
+}
define i64 @return_type_is_too_big_scalar() {
; CHECK-LABEL: name: return_type_is_too_big_scalar
@@ -50,12 +60,10 @@ define i64 @return_type_is_too_big_scalar() {
define i32 @return_type_is_too_small_scalar() {
; CHECK-LABEL: name: return_type_is_too_small_scalar
- ; CHECK: bb.1 (%ir-block.0):
- ; CHECK-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def $vgpr8_vgpr9
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr8_vgpr9
- ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY]](s64)
- ; CHECK-NEXT: $vgpr0 = COPY [[TRUNC]](s32)
- ; CHECK-NEXT: SI_RETURN implicit $vgpr0
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1 (%ir-block.0):
%reg = call i32 asm sideeffect "; def $0", "={v[8:9]}" ()
ret i32 %reg
}
@@ -77,7 +85,6 @@ define ptr addrspace(3) @return_type_is_too_small_pointer() {
; CHECK-NEXT: successors: %bb.1(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1 (%ir-block.0):
- ; CHECK-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def $vgpr8_vgpr9
%reg = call ptr addrspace(3) asm sideeffect "; def $0", "={v[8:9]}" ()
ret ptr addrspace(3) %reg
}
@@ -141,14 +148,13 @@ define void @use_scalar_too_small(i64 %arg) {
define void @use_scalar_too_big(i32 %arg) {
; CHECK-LABEL: name: use_scalar_too_big
- ; CHECK: bb.1 (%ir-block.0):
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
; CHECK-NEXT: liveins: $vgpr0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
- ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[COPY]](s32)
- ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[ANYEXT]](s64)
- ; CHECK-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 9 /* reguse */, $vgpr0_vgpr1
- ; CHECK-NEXT: SI_RETURN
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1 (%ir-block.0):
call void asm sideeffect "; use $0", "{v[0:1]}"(i32 %arg)
ret void
}
diff --git a/llvm/test/CodeGen/AMDGPU/inlineasm-mismatched-size.ll b/llvm/test/CodeGen/AMDGPU/inlineasm-mismatched-size.ll
new file mode 100644
index 00000000000000..abfb138936fea5
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/inlineasm-mismatched-size.ll
@@ -0,0 +1,157 @@
+; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s 2>&1 | FileCheck -check-prefix=ERR %s
+
+; Diagnose register constraints that are not wide enough.
+
+; ERR: error: couldn't allocate output register for constraint '{v[8:15]}'
+define <9 x i32> @inline_asm_9xi32_in_8v_def() {
+ %asm = call <9 x i32> asm sideeffect "; def $0", "={v[8:15]}"()
+ ret <9 x i32> %asm
+}
+
+; ERR: error: couldn't allocate input reg for constraint '{v[8:15]}'
+define void @inline_asm_9xi32_in_8v_use(<9 x i32> %val) {
+ call void asm sideeffect "; use $0", "{v[8:15]}"(<9 x i32> %val)
+ ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint '{s[8:15]}'
+define <9 x i32> @inline_asm_9xi32_in_8s_def() {
+ %asm = call <9 x i32> asm sideeffect "; def $0", "={s[8:15]}"()
+ ret <9 x i32> %asm
+}
+
+
+; Diagnose register constraints that are too wide.
+
+; ERR: error: couldn't allocate output register for constraint '{v[8:16]}'
+define <8 x i32> @inline_asm_8xi32_in_9v_def() {
+ %asm = call <8 x i32> asm sideeffect "; def $0", "={v[8:16]}"()
+ ret <8 x i32> %asm
+}
+
+; ERR: error: couldn't allocate input reg for constraint '{v[8:16]}'
+define void @inline_asm_8xi32_in_9v_use(<8 x i32> %val) {
+ call void asm sideeffect "; use $0", "{v[8:16]}"(<8 x i32> %val)
+ ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint '{s[8:16]}'
+define <8 x i32> @inline_asm_8xi32_in_9s_def() {
+ %asm = call <8 x i32> asm sideeffect "; def $0", "={s[8:16]}"()
+ ret <8 x i32> %asm
+}
+
+
+; Diagnose mismatched scalars with register ranges
+
+; ERR: error: couldn't allocate output register for constraint '{s[4:5]}'
+define void @inline_asm_scalar_read_too_wide() {
+ %asm = call i32 asm sideeffect "; def $0 ", "={s[4:5]}"()
+ ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint '{s[4:4]}'
+define void @inline_asm_scalar_read_too_narrow() {
+ %asm = call i64 asm sideeffect "; def $0 ", "={s[4:4]}"()
+ ret void
+}
+
+
+; Be more lenient with single registers that are too wide for the IR type:
+
+; ERR-NOT: error
+define i16 @inline_asm_i16_in_v_def() {
+ %asm = call i16 asm sideeffect "; def $0", "={v8}"()
+ ret i16 %asm
+}
+
+; ERR-NOT: error
+define void @inline_asm_i16_in_v_use(i16 %val) {
+ call void asm sideeffect "; use $0", "{v8}"(i16 %val)
+ ret void
+}
+
+; ERR-NOT: error
+define i16 @inline_asm_i16_in_s_def() {
+ %asm = call i16 asm sideeffect "; def $0", "={s8}"()
+ ret i16 %asm
+}
+
+; ERR-NOT: error
+define i8 @inline_asm_i8_in_v_def() {
+ %asm = call i8 asm sideeffect "; def $0", "={v8}"()
+ ret i8 %asm
+}
+
+; ERR-NOT: error
+define void @inline_asm_i8_in_v_use(i8 %val) {
+ call void asm sideeffect "; use $0", "{v8}"(i8 %val)
+ ret void
+}
+
+; ERR-NOT: error
+define i8 @inline_asm_i8_in_s_def() {
+ %asm = call i8 asm sideeffect "; def $0", "={s8}"()
+ ret i8 %asm
+}
+
+
+; Single registers for vector types that are too wide or too narrow should be
+; diagnosed.
+
+; ERR: error: couldn't allocate input reg for constraint '{v8}'
+define void @inline_asm_4xi32_in_v_use(<4 x i32> %val) {
+ call void asm sideeffect "; use $0", "{v8}"(<4 x i32> %val)
+ ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint '{v8}'
+define <4 x i32> @inline_asm_4xi32_in_v_def() {
+ %asm = call <4 x i32> asm sideeffect "; def $0", "={v8}"()
+ ret <4 x i32> %asm
+}
+
+; ERR: error: couldn't allocate output register for constraint '{s8}'
+define <4 x i32> @inline_asm_4xi32_in_s_def() {
+ %asm = call <4 x i32> asm sideeffect "; def $0", "={s8}"()
+ ret <4 x i32> %asm
+}
+
+; ERR: error: couldn't allocate input reg for constraint '{v8}'
+define void @inline_asm_2xi8_in_v_use(<2 x i8> %val) {
+ call void asm sideeffect "; use $0", "{v8}"(<2 x i8> %val)
+ ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint '{v8}'
+define <2 x i8> @inline_asm_2xi8_in_v_def() {
+ %asm = call <2 x i8> asm sideeffect "; def $0", "={v8}"()
+ ret <2 x i8> %asm
+}
+
+; ERR: error: couldn't allocate output register for constraint '{s8}'
+define <2 x i8> @inline_asm_2xi8_in_s_def() {
+ %asm = call <2 x i8> asm sideeffect "; def $0", "={s8}"()
+ ret <2 x i8> %asm
+}
+
+
+; Single registers for vector types that fit are fine.
+
+; ERR-NOT: error
+define void @inline_asm_2xi16_in_v_use(<2 x i16> %val) {
+ call void asm sideeffect "; use $0", "{v8}"(<2 x i16> %val)
+ ret void
+}
+
+; ERR-NOT: error
+define <2 x i16> @inline_asm_2xi16_in_v_def() {
+ %asm = call <2 x i16> asm sideeffect "; def $0", "={v8}"()
+ ret <2 x i16> %asm
+}
+
+; ERR-NOT: error
+define <2 x i16> @inline_asm_2xi16_in_s_def() {
+ %asm = call <2 x i16> asm sideeffect "; def $0", "={s8}"()
+ ret <2 x i16> %asm
+}
|
|
||
; ERR-NOT: error | ||
define void @inline_asm_i8_in_v_use(i8 %val) { | ||
call void asm sideeffect "; use $0", "{v8}"(i8 %val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also test these cases with a virtual register use, all of these use physical registers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that in 0611cf6. i8s however don't work properly, we don't find a matching register for them. If a specific physical register is requested, we can generate code, but it seems suspicious that i8 arguments (in contrast to i16 arguments) are not and-ed with 0xff before being passed into the ASM.
Since i8s don't seem to be handled in the code so far, should we just error out on them as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should fix i8 handling. @jhuber6 had issues in libc where generic code seems to assume any target can deal with them by promoting to int (IIRC I checked AArch64, which does not have legal i8 but does handle i8 asm operands)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' %reg = call ptr addrspace(1) asm sideeffect "; def $0", "={v8}"()' (in function: return_type_is_too_big_pointer) | ||
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' %reg = call ptr addrspace(3) asm sideeffect "; def $0", "={v[8:9]}"()' (in function: return_type_is_too_small_pointer) | ||
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' call void asm sideeffect "; use $0", "{v[0:9]}"(<8 x i32> %arg)' (in function: use_vector_too_big) | ||
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' call void asm sideeffect "; use $0", "{v0}"(i64 %arg)' (in function: use_scalar_too_small) | ||
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' call void asm sideeffect "; use $0", "{v[0:1]}"(i32 %arg)' (in function: use_scalar_too_big) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not this patch's problem, but ideally we would just directly error instead of falling back to let the DAG hit a different error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, that would be better.
define <9 x i32> @inline_asm_9xi32_in_8v_def() { | ||
%asm = call <9 x i32> asm sideeffect "; def $0", "={v[8:15]}"() | ||
ret <9 x i32> %asm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These cases aren't testing the virtual case? Probably should just have the physical case and virtual case in the same function to keep them together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the cases here, where I didn't add tests with plain v
or s
constraint (that is the virtual case you mean, right?), there is no mismatched size. We have register classes for ranges of 9 and 8 registers defined, so that is what the compiler uses for these cases.
Would you like me to add these cases to the non-error test, or is there a way of requesting exactly 8 or 9 virtual registers that I'm not aware off?
Co-authored-by: Matt Arsenault <[email protected]>
…r.ll Merge test functions for physical and virtual register constraints
|
||
; ERR-NOT: error | ||
define void @inline_asm_i8_in_v_use(i8 %val) { | ||
call void asm sideeffect "; use $0", "{v8}"(i8 %val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should fix i8 handling. @jhuber6 had issues in libc where generic code seems to assume any target can deal with them by promoting to int (IIRC I checked AArch64, which does not have legal i8 but does handle i8 asm operands)
…ne asm (llvm#109955) For register constraints that require specific register ranges, the width of the range should match the type of the associated parameter/return value. With this PR, we error out when that is not the case. Previously, these cases would hit assertions or llvm_unreachables. The handling of register constraints that require only a single register remains more lenient to allow narrower non-vector types for the associated IR values. For example, constraining an i16 or i8 value to a 32-bit register is still allowed. Fixes llvm#101190. --------- Co-authored-by: Matt Arsenault <[email protected]>
For register constraints that require specific register ranges, the width of the range should match the type of the associated parameter/return value. With this PR, we error out when that is not the case. Previously, these cases would hit assertions or llvm_unreachables.
The handling of register constraints that require only a single register remains more lenient to allow narrower non-vector types for the associated IR values. For example, constraining an i16 or i8 value to a 32-bit register is still allowed.
Fixes #101190.