Skip to content

[AMDGPU] Check vector sizes for physical register constraints in inline asm #109955

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 1, 2024

Conversation

ritter-x2a
Copy link
Member

For register constraints that require specific register ranges, the width of the range should match the type of the associated parameter/return value. With this PR, we error out when that is not the case. Previously, these cases would hit assertions or llvm_unreachables.

The handling of register constraints that require only a single register remains more lenient to allow narrower non-vector types for the associated IR values. For example, constraining an i16 or i8 value to a 32-bit register is still allowed.

Fixes #101190.

…ne asm

For register constraints that require specific register ranges, the width of
the range should match the type of the associated parameter/return value. With
this PR, we error out when that is not the case. Previously, these cases would
hit assertions or llvm_unreachables.

The handling of register constraints that require only a single register
remains more lenient to allow narrower non-vector types for the associated IR
values. For example, constraining an i16 or i8 value to a 32-bit register is
still allowed.

Fixes llvm#101190.
@llvmbot
Copy link
Member

llvmbot commented Sep 25, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Fabian Ritter (ritter-x2a)

Changes

For register constraints that require specific register ranges, the width of the range should match the type of the associated parameter/return value. With this PR, we error out when that is not the case. Previously, these cases would hit assertions or llvm_unreachables.

The handling of register constraints that require only a single register remains more lenient to allow narrower non-vector types for the associated IR values. For example, constraining an i16 or i8 value to a 32-bit register is still allowed.

Fixes #101190.


Full diff: https://github.com/llvm/llvm-project/pull/109955.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+7)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll (+25-19)
  • (added) llvm/test/CodeGen/AMDGPU/inlineasm-mismatched-size.ll (+157)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 08f2ff4566b674..eb8885577ca19d 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -15500,6 +15500,10 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
         Failed |= !RegName.consume_back("]");
         if (!Failed) {
           uint32_t Width = (End - Idx + 1) * 32;
+          // Prohibit constraints for register ranges with a width that does not
+          // match the required type.
+          if (VT.SimpleTy != MVT::Other && Width != VT.getSizeInBits())
+            return std::pair(0U, nullptr);
           MCRegister Reg = RC->getRegister(Idx);
           if (SIRegisterInfo::isVGPRClass(RC))
             RC = TRI->getVGPRClassForBitWidth(Width);
@@ -15513,6 +15517,9 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
           }
         }
       } else {
+        // Check for lossy scalar/vector conversions.
+        if (VT.isVector() && VT.getSizeInBits() != 32)
+          return std::pair(0U, nullptr);
         bool Failed = RegName.getAsInteger(10, Idx);
         if (!Failed && Idx < RC->getNumRegs())
           return std::pair(RC->getRegister(Idx), RC);
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll
index 696cbdb75f1ed9..69567b34ae6e60 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll
@@ -3,11 +3,14 @@
 ; RUN: FileCheck -check-prefix=ERR %s < %t
 
 ; ERR: remark: <unknown>:0:0: unable to translate instruction: call: '  %sgpr = call <4 x i32> asm sideeffect "; def $0", "={s[8:12]}"()' (in function: return_type_is_too_big_vector)
+; ERR: remark: <unknown>:0:0: unable to translate instruction: call: '  %sgpr = call <4 x i32> asm sideeffect "; def $0", "={s[8:10]}"()' (in function: return_type_is_too_small_vector)
 ; ERR: remark: <unknown>:0:0: unable to translate instruction: call: '  %reg = call i64 asm sideeffect "; def $0", "={v8}"()' (in function: return_type_is_too_big_scalar)
+; ERR: remark: <unknown>:0:0: unable to translate instruction: call: '  %reg = call i32 asm sideeffect "; def $0", "={v[8:9]}"()' (in function: return_type_is_too_small_scalar)
 ; ERR: remark: <unknown>:0:0: unable to translate instruction: call: '  %reg = call ptr addrspace(1) asm sideeffect "; def $0", "={v8}"()' (in function: return_type_is_too_big_pointer)
 ; ERR: remark: <unknown>:0:0: unable to translate instruction: call: '  %reg = call ptr addrspace(3) asm sideeffect "; def $0", "={v[8:9]}"()' (in function: return_type_is_too_small_pointer)
 ; ERR: remark: <unknown>:0:0: unable to translate instruction: call: '  call void asm sideeffect "; use $0", "{v[0:9]}"(<8 x i32> %arg)' (in function: use_vector_too_big)
 ; ERR: remark: <unknown>:0:0: unable to translate instruction: call: '  call void asm sideeffect "; use $0", "{v0}"(i64 %arg)' (in function: use_scalar_too_small)
+; ERR: remark: <unknown>:0:0: unable to translate instruction: call: '  call void asm sideeffect "; use $0", "{v[0:1]}"(i32 %arg)' (in function: use_scalar_too_big)
 ; ERR: remark: <unknown>:0:0: unable to translate instruction: call: '  call void asm sideeffect "; use $0", "{v0}"(ptr addrspace(1) %arg)' (in function: use_pointer_too_small)
 ; ERR: remark: <unknown>:0:0: unable to translate instruction: call: '  call void asm sideeffect "; use $0", "{v[0:1]}"(ptr addrspace(3) %arg)' (in function: use_pointer_too_big)
 
@@ -24,18 +27,25 @@ define amdgpu_kernel void @return_type_is_too_big_vector() {
   ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr2_sgpr3
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11_sgpr12
   %sgpr = call <4 x i32> asm sideeffect "; def $0", "={s[8:12]}" ()
   call void asm sideeffect "; use $0", "s"(<4 x i32> %sgpr) #0
   ret void
 }
 
-; FIXME: This is crashing in the DAG
-; define amdgpu_kernel void @return_type_is_too_small_vector() {
-;   %sgpr = call <4 x i32> asm sideeffect "; def $0", "={s[8:10]}" ()
-;   call void asm sideeffect "; use $0", "s"(<4 x i32> %sgpr) #0
-;   ret void
-; }
+; This is broken because it requests 3 32-bit sgprs to handle a 4xi32 result.
+define amdgpu_kernel void @return_type_is_too_small_vector() {
+  ; CHECK-LABEL: name: return_type_is_too_small_vector
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $sgpr2_sgpr3
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(p4) = COPY $sgpr2_sgpr3
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1 (%ir-block.0):
+  %sgpr = call <4 x i32> asm sideeffect "; def $0", "={s[8:10]}" ()
+  call void asm sideeffect "; use $0", "s"(<4 x i32> %sgpr) #0
+  ret void
+}
 
 define i64 @return_type_is_too_big_scalar() {
   ; CHECK-LABEL: name: return_type_is_too_big_scalar
@@ -50,12 +60,10 @@ define i64 @return_type_is_too_big_scalar() {
 
 define i32 @return_type_is_too_small_scalar() {
   ; CHECK-LABEL: name: return_type_is_too_small_scalar
-  ; CHECK: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def $vgpr8_vgpr9
-  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr8_vgpr9
-  ; CHECK-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY]](s64)
-  ; CHECK-NEXT:   $vgpr0 = COPY [[TRUNC]](s32)
-  ; CHECK-NEXT:   SI_RETURN implicit $vgpr0
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1 (%ir-block.0):
   %reg = call i32 asm sideeffect "; def $0", "={v[8:9]}" ()
   ret i32 %reg
 }
@@ -77,7 +85,6 @@ define ptr addrspace(3) @return_type_is_too_small_pointer() {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def $vgpr8_vgpr9
   %reg = call ptr addrspace(3) asm sideeffect "; def $0", "={v[8:9]}" ()
   ret ptr addrspace(3) %reg
 }
@@ -141,14 +148,13 @@ define void @use_scalar_too_small(i64 %arg) {
 
 define void @use_scalar_too_big(i32 %arg) {
   ; CHECK-LABEL: name: use_scalar_too_big
-  ; CHECK: bb.1 (%ir-block.0):
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $vgpr0
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-  ; CHECK-NEXT:   [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[COPY]](s32)
-  ; CHECK-NEXT:   $vgpr0_vgpr1 = COPY [[ANYEXT]](s64)
-  ; CHECK-NEXT:   INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 9 /* reguse */, $vgpr0_vgpr1
-  ; CHECK-NEXT:   SI_RETURN
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1 (%ir-block.0):
   call void asm sideeffect "; use $0", "{v[0:1]}"(i32 %arg)
   ret void
 }
diff --git a/llvm/test/CodeGen/AMDGPU/inlineasm-mismatched-size.ll b/llvm/test/CodeGen/AMDGPU/inlineasm-mismatched-size.ll
new file mode 100644
index 00000000000000..abfb138936fea5
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/inlineasm-mismatched-size.ll
@@ -0,0 +1,157 @@
+; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s 2>&1 | FileCheck -check-prefix=ERR %s
+
+; Diagnose register constraints that are not wide enough.
+
+; ERR: error: couldn't allocate output register for constraint '{v[8:15]}'
+define <9 x i32> @inline_asm_9xi32_in_8v_def() {
+  %asm = call <9 x i32> asm sideeffect "; def $0", "={v[8:15]}"()
+  ret <9 x i32> %asm
+}
+
+; ERR: error: couldn't allocate input reg for constraint '{v[8:15]}'
+define void @inline_asm_9xi32_in_8v_use(<9 x i32> %val) {
+  call void asm sideeffect "; use $0", "{v[8:15]}"(<9 x i32> %val)
+  ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint '{s[8:15]}'
+define <9 x i32> @inline_asm_9xi32_in_8s_def() {
+  %asm = call <9 x i32> asm sideeffect "; def $0", "={s[8:15]}"()
+  ret <9 x i32> %asm
+}
+
+
+; Diagnose register constraints that are too wide.
+
+; ERR: error: couldn't allocate output register for constraint '{v[8:16]}'
+define <8 x i32> @inline_asm_8xi32_in_9v_def() {
+  %asm = call <8 x i32> asm sideeffect "; def $0", "={v[8:16]}"()
+  ret <8 x i32> %asm
+}
+
+; ERR: error: couldn't allocate input reg for constraint '{v[8:16]}'
+define void @inline_asm_8xi32_in_9v_use(<8 x i32> %val) {
+  call void asm sideeffect "; use $0", "{v[8:16]}"(<8 x i32> %val)
+  ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint '{s[8:16]}'
+define <8 x i32> @inline_asm_8xi32_in_9s_def() {
+  %asm = call <8 x i32> asm sideeffect "; def $0", "={s[8:16]}"()
+  ret <8 x i32> %asm
+}
+
+
+; Diagnose mismatched scalars with register ranges
+
+; ERR: error: couldn't allocate output register for constraint '{s[4:5]}'
+define void @inline_asm_scalar_read_too_wide() {
+  %asm = call i32 asm sideeffect "; def $0 ", "={s[4:5]}"()
+  ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint '{s[4:4]}'
+define void @inline_asm_scalar_read_too_narrow() {
+  %asm = call i64 asm sideeffect "; def $0 ", "={s[4:4]}"()
+  ret void
+}
+
+
+; Be more lenient with single registers that are too wide for the IR type:
+
+; ERR-NOT: error
+define i16 @inline_asm_i16_in_v_def() {
+  %asm = call i16 asm sideeffect "; def $0", "={v8}"()
+  ret i16 %asm
+}
+
+; ERR-NOT: error
+define void @inline_asm_i16_in_v_use(i16 %val) {
+  call void asm sideeffect "; use $0", "{v8}"(i16 %val)
+  ret void
+}
+
+; ERR-NOT: error
+define i16 @inline_asm_i16_in_s_def() {
+  %asm = call i16 asm sideeffect "; def $0", "={s8}"()
+  ret i16 %asm
+}
+
+; ERR-NOT: error
+define i8 @inline_asm_i8_in_v_def() {
+  %asm = call i8 asm sideeffect "; def $0", "={v8}"()
+  ret i8 %asm
+}
+
+; ERR-NOT: error
+define void @inline_asm_i8_in_v_use(i8 %val) {
+  call void asm sideeffect "; use $0", "{v8}"(i8 %val)
+  ret void
+}
+
+; ERR-NOT: error
+define i8 @inline_asm_i8_in_s_def() {
+  %asm = call i8 asm sideeffect "; def $0", "={s8}"()
+  ret i8 %asm
+}
+
+
+; Single registers for vector types that are too wide or too narrow should be
+; diagnosed.
+
+; ERR: error: couldn't allocate input reg for constraint '{v8}'
+define void @inline_asm_4xi32_in_v_use(<4 x i32> %val) {
+  call void asm sideeffect "; use $0", "{v8}"(<4 x i32> %val)
+  ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint '{v8}'
+define <4 x i32> @inline_asm_4xi32_in_v_def() {
+  %asm = call <4 x i32> asm sideeffect "; def $0", "={v8}"()
+  ret <4 x i32> %asm
+}
+
+; ERR: error: couldn't allocate output register for constraint '{s8}'
+define <4 x i32> @inline_asm_4xi32_in_s_def() {
+  %asm = call <4 x i32> asm sideeffect "; def $0", "={s8}"()
+  ret <4 x i32> %asm
+}
+
+; ERR: error: couldn't allocate input reg for constraint '{v8}'
+define void @inline_asm_2xi8_in_v_use(<2 x i8> %val) {
+  call void asm sideeffect "; use $0", "{v8}"(<2 x i8> %val)
+  ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint '{v8}'
+define <2 x i8> @inline_asm_2xi8_in_v_def() {
+  %asm = call <2 x i8> asm sideeffect "; def $0", "={v8}"()
+  ret <2 x i8> %asm
+}
+
+; ERR: error: couldn't allocate output register for constraint '{s8}'
+define <2 x i8> @inline_asm_2xi8_in_s_def() {
+  %asm = call <2 x i8> asm sideeffect "; def $0", "={s8}"()
+  ret <2 x i8> %asm
+}
+
+
+; Single registers for vector types that fit are fine.
+
+; ERR-NOT: error
+define void @inline_asm_2xi16_in_v_use(<2 x i16> %val) {
+  call void asm sideeffect "; use $0", "{v8}"(<2 x i16> %val)
+  ret void
+}
+
+; ERR-NOT: error
+define <2 x i16> @inline_asm_2xi16_in_v_def() {
+  %asm = call <2 x i16> asm sideeffect "; def $0", "={v8}"()
+  ret <2 x i16> %asm
+}
+
+; ERR-NOT: error
+define <2 x i16> @inline_asm_2xi16_in_s_def() {
+  %asm = call <2 x i16> asm sideeffect "; def $0", "={s8}"()
+  ret <2 x i16> %asm
+}


; ERR-NOT: error
define void @inline_asm_i8_in_v_use(i8 %val) {
call void asm sideeffect "; use $0", "{v8}"(i8 %val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also test these cases with a virtual register use, all of these use physical registers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that in 0611cf6. i8s however don't work properly, we don't find a matching register for them. If a specific physical register is requested, we can generate code, but it seems suspicious that i8 arguments (in contrast to i16 arguments) are not and-ed with 0xff before being passed into the ASM.
Since i8s don't seem to be handled in the code so far, should we just error out on them as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should fix i8 handling. @jhuber6 had issues in libc where generic code seems to assume any target can deal with them by promoting to int (IIRC I checked AArch64, which does not have legal i8 but does handle i8 asm operands)

; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' %reg = call ptr addrspace(1) asm sideeffect "; def $0", "={v8}"()' (in function: return_type_is_too_big_pointer)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' %reg = call ptr addrspace(3) asm sideeffect "; def $0", "={v[8:9]}"()' (in function: return_type_is_too_small_pointer)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' call void asm sideeffect "; use $0", "{v[0:9]}"(<8 x i32> %arg)' (in function: use_vector_too_big)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' call void asm sideeffect "; use $0", "{v0}"(i64 %arg)' (in function: use_scalar_too_small)
; ERR: remark: <unknown>:0:0: unable to translate instruction: call: ' call void asm sideeffect "; use $0", "{v[0:1]}"(i32 %arg)' (in function: use_scalar_too_big)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not this patch's problem, but ideally we would just directly error instead of falling back to let the DAG hit a different error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, that would be better.

Comment on lines +6 to +8
define <9 x i32> @inline_asm_9xi32_in_8v_def() {
%asm = call <9 x i32> asm sideeffect "; def $0", "={v[8:15]}"()
ret <9 x i32> %asm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These cases aren't testing the virtual case? Probably should just have the physical case and virtual case in the same function to keep them together

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the cases here, where I didn't add tests with plain v or s constraint (that is the virtual case you mean, right?), there is no mismatched size. We have register classes for ranges of 9 and 8 registers defined, so that is what the compiler uses for these cases.
Would you like me to add these cases to the non-error test, or is there a way of requesting exactly 8 or 9 virtual registers that I'm not aware off?

ritter-x2a and others added 2 commits September 26, 2024 08:41
…r.ll

Merge test functions for physical and virtual register constraints

; ERR-NOT: error
define void @inline_asm_i8_in_v_use(i8 %val) {
call void asm sideeffect "; use $0", "{v8}"(i8 %val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should fix i8 handling. @jhuber6 had issues in libc where generic code seems to assume any target can deal with them by promoting to int (IIRC I checked AArch64, which does not have legal i8 but does handle i8 asm operands)

@ritter-x2a ritter-x2a merged commit 3ba4092 into llvm:main Oct 1, 2024
8 checks passed
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Oct 3, 2024
…ne asm (llvm#109955)

For register constraints that require specific register ranges, the
width of the range should match the type of the associated
parameter/return value. With this PR, we error out when that is not the
case. Previously, these cases would hit assertions or llvm_unreachables.

The handling of register constraints that require only a single register
remains more lenient to allow narrower non-vector types for the
associated IR values. For example, constraining an i16 or i8 value to a
32-bit register is still allowed.

Fixes llvm#101190.

---------

Co-authored-by: Matt Arsenault <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inline assembly asserts when using wrong number of vector elements with physical register constraint
3 participants