[CodeGen] Fix MachineInstr::isSafeToMove handling of inline asm. #126807
Conversation
Even if an inline asm doesn't have memory effects, we can't assume it's safe to speculate: it could trap, or cause undefined behavior. At the LLVM IR level, this is handled correctly: we don't speculate inline asm (unless it's marked "speculatable", but I don't think anyone does that). Codegen also needs to respect this restriction. This change stops Early If Conversion and similar passes from speculating an INLINEASM MachineInstr. Some uses of isSafeToMove probably could be switched to a different API: isSafeToMove assumes you're hoisting, but we could handle some forms of sinking more aggressively. But I'll leave that for a followup, if it turns out to be relevant. See also discussion on gcc bugtracker https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82677 . (I don't give the discussion there much weight, but just for reference.)
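For reference, here is a minimal sketch (an illustration, not the actual EarlyIfConversion code) of how a codegen pass consults this API before speculating the contents of a block; with this change, isSafeToMove() returns false for an INLINEASM instruction, so any block containing inline asm is rejected:

#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

// Hypothetical helper (not part of this patch): returns true only if every
// instruction in MBB could be hoisted into a predecessor. After this change,
// an INLINEASM instruction makes isSafeToMove() return false, so the block
// is rejected and the asm never executes speculatively.
static bool canSpeculateBlock(llvm::MachineBasicBlock &MBB) {
  bool SawStore = false;
  for (llvm::MachineInstr &MI : MBB) {
    if (MI.isDebugInstr())
      continue;
    if (!MI.isSafeToMove(SawStore))
      return false;
  }
  return true;
}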
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-arm @llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-backend-aarch64 Author: Eli Friedman (efriedma-quic) Full diff: https://github.com/llvm/llvm-project/pull/126807.diff 8 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineInstr.cpp b/llvm/lib/CodeGen/MachineInstr.cpp
index 52c977a3651a3..35083ce5b5667 100644
--- a/llvm/lib/CodeGen/MachineInstr.cpp
+++ b/llvm/lib/CodeGen/MachineInstr.cpp
@@ -1310,7 +1310,7 @@ bool MachineInstr::isSafeToMove(bool &SawStore) const {
if (isPosition() || isDebugInstr() || isTerminator() ||
mayRaiseFPException() || hasUnmodeledSideEffects() ||
- isJumpTableDebugInfo())
+ isJumpTableDebugInfo() || isInlineAsm())
return false;
// See if this instruction does a load. If so, we have to guarantee that the
diff --git a/llvm/test/CodeGen/AArch64/inline-asm-speculation.ll b/llvm/test/CodeGen/AArch64/inline-asm-speculation.ll
new file mode 100644
index 0000000000000..d87c23cd85152
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/inline-asm-speculation.ll
@@ -0,0 +1,38 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64 < %s | FileCheck %s
+; Make sure we don't speculatively execute inline asm statements.
+
+define dso_local i32 @main(i32 %argc, ptr %argv) {
+; CHECK-LABEL: main:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: cmp w0, #3
+; CHECK-NEXT: b.ne .LBB0_2
+; CHECK-NEXT: // %bb.1: // %if.then
+; CHECK-NEXT: //APP
+; CHECK-NEXT: mrs x0, SPSR_EL2
+; CHECK-NEXT: //NO_APP
+; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB0_2: // %if.else
+; CHECK-NEXT: //APP
+; CHECK-NEXT: mrs x0, SPSR_EL1
+; CHECK-NEXT: //NO_APP
+; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
+; CHECK-NEXT: ret
+entry:
+ %cmp = icmp eq i32 %argc, 3
+ br i1 %cmp, label %if.then, label %if.else
+
+if.then:
+ %0 = tail call i64 asm "mrs $0, SPSR_EL2", "=r"()
+ br label %if.end
+
+if.else:
+ %1 = tail call i64 asm "mrs $0, SPSR_EL1", "=r"()
+ br label %if.end
+
+if.end:
+ %y.0.in = phi i64 [ %0, %if.then ], [ %1, %if.else ]
+ %y.0 = trunc i64 %y.0.in to i32
+ ret i32 %y.0
+}
diff --git a/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll b/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll
index bd523d4ac30b9..0574de30abfb5 100644
--- a/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll
+++ b/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll
@@ -23,12 +23,8 @@ bb5: ; preds = %bb3, %bb
}
; GCN-LABEL: {{^}}nonconvergent_inlineasm:
-; GCN: s_cbranch_execz
-
-; GCN: ; %bb.{{[0-9]+}}:
; GCN: v_cmp_ne_u32_e64
-
-; GCN: BB{{[0-9]+_[0-9]+}}:
+; GCN: s_cbranch_execz
define amdgpu_kernel void @nonconvergent_inlineasm(ptr addrspace(1) nocapture %arg) {
bb:
diff --git a/llvm/test/CodeGen/AMDGPU/early-if-convert.ll b/llvm/test/CodeGen/AMDGPU/early-if-convert.ll
index 95577b44db764..028cd9110eb2b 100644
--- a/llvm/test/CodeGen/AMDGPU/early-if-convert.ll
+++ b/llvm/test/CodeGen/AMDGPU/early-if-convert.ll
@@ -55,11 +55,8 @@ endif:
}
; GCN-LABEL: {{^}}test_vccnz_ifcvt_triangle_vcc_clobber:
+; GCN: s_cbranch_vccnz
; GCN: ; clobber vcc
-; GCN: v_cmp_neq_f32_e64 [[CMP:s\[[0-9]+:[0-9]+\]]], s{{[0-9]+}}, 1.0
-; GCN: v_add_i32_e32 v{{[0-9]+}}, vcc
-; GCN: s_mov_b64 vcc, [[CMP]]
-; GCN: v_cndmask_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, vcc
define amdgpu_kernel void @test_vccnz_ifcvt_triangle_vcc_clobber(ptr addrspace(1) %out, ptr addrspace(1) %in, float %k) #0 {
entry:
%v = load i32, ptr addrspace(1) %in
diff --git a/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll b/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll
index 459ef648fd806..7b82a6b6dcaa2 100644
--- a/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll
+++ b/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll
@@ -2015,13 +2015,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
; NOSDWA-LABEL: sdwa_crash_inlineasm_def:
; NOSDWA: ; %bb.0: ; %bb
; NOSDWA-NEXT: s_mov_b32 s0, 0xffff
+; NOSDWA-NEXT: s_and_b64 vcc, exec, -1
+; NOSDWA-NEXT: ; implicit-def: $vgpr0_vgpr1
+; NOSDWA-NEXT: .LBB21_1: ; %bb1
+; NOSDWA-NEXT: ; =>This Inner Loop Header: Depth=1
; NOSDWA-NEXT: ;;#ASMSTART
; NOSDWA-NEXT: v_and_b32_e32 v0, s0, v0
; NOSDWA-NEXT: ;;#ASMEND
; NOSDWA-NEXT: v_or_b32_e32 v0, 0x10000, v0
-; NOSDWA-NEXT: s_and_b64 vcc, exec, -1
-; NOSDWA-NEXT: .LBB21_1: ; %bb1
-; NOSDWA-NEXT: ; =>This Inner Loop Header: Depth=1
; NOSDWA-NEXT: flat_store_dwordx2 v[0:1], v[0:1]
; NOSDWA-NEXT: s_waitcnt vmcnt(0)
; NOSDWA-NEXT: s_mov_b64 vcc, vcc
@@ -2032,13 +2033,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
; GFX89-LABEL: sdwa_crash_inlineasm_def:
; GFX89: ; %bb.0: ; %bb
; GFX89-NEXT: s_mov_b32 s0, 0xffff
+; GFX89-NEXT: s_and_b64 vcc, exec, -1
+; GFX89-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GFX89-NEXT: .LBB21_1: ; %bb1
+; GFX89-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX89-NEXT: ;;#ASMSTART
; GFX89-NEXT: v_and_b32_e32 v0, s0, v0
; GFX89-NEXT: ;;#ASMEND
; GFX89-NEXT: v_or_b32_e32 v0, 0x10000, v0
-; GFX89-NEXT: s_and_b64 vcc, exec, -1
-; GFX89-NEXT: .LBB21_1: ; %bb1
-; GFX89-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX89-NEXT: flat_store_dwordx2 v[0:1], v[0:1]
; GFX89-NEXT: s_waitcnt vmcnt(0)
; GFX89-NEXT: s_mov_b64 vcc, vcc
@@ -2049,13 +2051,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
; GFX9-LABEL: sdwa_crash_inlineasm_def:
; GFX9: ; %bb.0: ; %bb
; GFX9-NEXT: s_mov_b32 s0, 0xffff
+; GFX9-NEXT: s_and_b64 vcc, exec, -1
+; GFX9-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GFX9-NEXT: .LBB21_1: ; %bb1
+; GFX9-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: v_and_b32_e32 v0, s0, v0
; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: v_or_b32_e32 v0, 0x10000, v0
-; GFX9-NEXT: s_and_b64 vcc, exec, -1
-; GFX9-NEXT: .LBB21_1: ; %bb1
-; GFX9-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX9-NEXT: global_store_dwordx2 v[0:1], v[0:1], off
; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_mov_b64 vcc, vcc
@@ -2066,13 +2069,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
; GFX10-LABEL: sdwa_crash_inlineasm_def:
; GFX10: ; %bb.0: ; %bb
; GFX10-NEXT: s_mov_b32 s0, 0xffff
+; GFX10-NEXT: s_mov_b32 vcc_lo, exec_lo
+; GFX10-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GFX10-NEXT: .LBB21_1: ; %bb1
+; GFX10-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: v_and_b32_e32 v0, s0, v0
; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_or_b32_e32 v0, 0x10000, v0
-; GFX10-NEXT: s_mov_b32 vcc_lo, exec_lo
-; GFX10-NEXT: .LBB21_1: ; %bb1
-; GFX10-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX10-NEXT: global_store_dwordx2 v[0:1], v[0:1], off
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_cbranch_vccnz .LBB21_1
diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll
index 5b2d0a8a7c059..6bcffa29678eb 100644
--- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll
+++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll
@@ -1601,11 +1601,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
; ARM-ENABLE-NEXT: @ %bb.1: @ %if.then
; ARM-ENABLE-NEXT: sub r1, sp, #16
; ARM-ENABLE-NEXT: mov sp, r1
+; ARM-ENABLE-NEXT: LBB10_2: @ %for.body
+; ARM-ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1
; ARM-ENABLE-NEXT: @ InlineAsm Start
; ARM-ENABLE-NEXT: mov r2, #0
; ARM-ENABLE-NEXT: @ InlineAsm End
-; ARM-ENABLE-NEXT: LBB10_2: @ %for.body
-; ARM-ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1
; ARM-ENABLE-NEXT: add r0, r2, r0
; ARM-ENABLE-NEXT: str r0, [r1]
; ARM-ENABLE-NEXT: @ InlineAsm Start
@@ -1629,11 +1629,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
; ARM-DISABLE-NEXT: @ %bb.1: @ %if.then
; ARM-DISABLE-NEXT: sub r1, sp, #16
; ARM-DISABLE-NEXT: mov sp, r1
+; ARM-DISABLE-NEXT: LBB10_2: @ %for.body
+; ARM-DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1
; ARM-DISABLE-NEXT: @ InlineAsm Start
; ARM-DISABLE-NEXT: mov r2, #0
; ARM-DISABLE-NEXT: @ InlineAsm End
-; ARM-DISABLE-NEXT: LBB10_2: @ %for.body
-; ARM-DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1
; ARM-DISABLE-NEXT: add r0, r2, r0
; ARM-DISABLE-NEXT: str r0, [r1]
; ARM-DISABLE-NEXT: @ InlineAsm Start
@@ -1657,11 +1657,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
; THUMB-ENABLE-NEXT: sub.w r0, sp, #16
; THUMB-ENABLE-NEXT: mov sp, r0
; THUMB-ENABLE-NEXT: movs r1, #0
+; THUMB-ENABLE-NEXT: LBB10_2: @ %for.body
+; THUMB-ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1
; THUMB-ENABLE-NEXT: @ InlineAsm Start
; THUMB-ENABLE-NEXT: mov.w r2, #0
; THUMB-ENABLE-NEXT: @ InlineAsm End
-; THUMB-ENABLE-NEXT: LBB10_2: @ %for.body
-; THUMB-ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1
; THUMB-ENABLE-NEXT: add r1, r2
; THUMB-ENABLE-NEXT: str r1, [r0]
; THUMB-ENABLE-NEXT: @ InlineAsm Start
@@ -1686,11 +1686,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
; THUMB-DISABLE-NEXT: sub.w r0, sp, #16
; THUMB-DISABLE-NEXT: mov sp, r0
; THUMB-DISABLE-NEXT: movs r1, #0
+; THUMB-DISABLE-NEXT: LBB10_2: @ %for.body
+; THUMB-DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1
; THUMB-DISABLE-NEXT: @ InlineAsm Start
; THUMB-DISABLE-NEXT: mov.w r2, #0
; THUMB-DISABLE-NEXT: @ InlineAsm End
-; THUMB-DISABLE-NEXT: LBB10_2: @ %for.body
-; THUMB-DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1
; THUMB-DISABLE-NEXT: add r1, r2
; THUMB-DISABLE-NEXT: str r1, [r0]
; THUMB-DISABLE-NEXT: @ InlineAsm Start
diff --git a/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll b/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
index 928f29b7b1889..f007b316b92b2 100644
--- a/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
+++ b/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
@@ -821,12 +821,12 @@ define void @infiniteloop() {
; ENABLE-NEXT: movq %rsp, %rcx
; ENABLE-NEXT: addq $-16, %rcx
; ENABLE-NEXT: movq %rcx, %rsp
-; ENABLE-NEXT: ## InlineAsm Start
-; ENABLE-NEXT: movl $1, %edx
-; ENABLE-NEXT: ## InlineAsm End
; ENABLE-NEXT: .p2align 4
; ENABLE-NEXT: LBB10_2: ## %for.body
; ENABLE-NEXT: ## =>This Inner Loop Header: Depth=1
+; ENABLE-NEXT: ## InlineAsm Start
+; ENABLE-NEXT: movl $1, %edx
+; ENABLE-NEXT: ## InlineAsm End
; ENABLE-NEXT: addl %edx, %eax
; ENABLE-NEXT: movl %eax, (%rcx)
; ENABLE-NEXT: jmp LBB10_2
@@ -853,12 +853,12 @@ define void @infiniteloop() {
; DISABLE-NEXT: movq %rsp, %rcx
; DISABLE-NEXT: addq $-16, %rcx
; DISABLE-NEXT: movq %rcx, %rsp
-; DISABLE-NEXT: ## InlineAsm Start
-; DISABLE-NEXT: movl $1, %edx
-; DISABLE-NEXT: ## InlineAsm End
; DISABLE-NEXT: .p2align 4
; DISABLE-NEXT: LBB10_2: ## %for.body
; DISABLE-NEXT: ## =>This Inner Loop Header: Depth=1
+; DISABLE-NEXT: ## InlineAsm Start
+; DISABLE-NEXT: movl $1, %edx
+; DISABLE-NEXT: ## InlineAsm End
; DISABLE-NEXT: addl %edx, %eax
; DISABLE-NEXT: movl %eax, (%rcx)
; DISABLE-NEXT: jmp LBB10_2
diff --git a/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll b/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll
index 19637c8304dd8..8eccfe6664166 100644
--- a/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll
+++ b/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll
@@ -24,12 +24,12 @@ define i32 @loopInfoSaveOutsideLoop(i32 %cond, i32 %N) #0 {
; ENABLE-NEXT: #NO_APP
; ENABLE-NEXT: xorl %eax, %eax
; ENABLE-NEXT: movl $10, %ecx
-; ENABLE-NEXT: #APP
-; ENABLE-NEXT: movl $1, %edx
-; ENABLE-NEXT: #NO_APP
; ENABLE-NEXT: .p2align 4
; ENABLE-NEXT: .LBB0_2: # %for.body
; ENABLE-NEXT: # =>This Inner Loop Header: Depth=1
+; ENABLE-NEXT: #APP
+; ENABLE-NEXT: movl $1, %edx
+; ENABLE-NEXT: #NO_APP
; ENABLE-NEXT: addl %edx, %eax
; ENABLE-NEXT: decl %ecx
; ENABLE-NEXT: jne .LBB0_2
@@ -64,12 +64,12 @@ define i32 @loopInfoSaveOutsideLoop(i32 %cond, i32 %N) #0 {
; DISABLE-NEXT: #NO_APP
; DISABLE-NEXT: xorl %eax, %eax
; DISABLE-NEXT: movl $10, %ecx
-; DISABLE-NEXT: #APP
-; DISABLE-NEXT: movl $1, %edx
-; DISABLE-NEXT: #NO_APP
; DISABLE-NEXT: .p2align 4
; DISABLE-NEXT: .LBB0_2: # %for.body
; DISABLE-NEXT: # =>This Inner Loop Header: Depth=1
+; DISABLE-NEXT: #APP
+; DISABLE-NEXT: movl $1, %edx
+; DISABLE-NEXT: #NO_APP
; DISABLE-NEXT: addl %edx, %eax
; DISABLE-NEXT: decl %ecx
; DISABLE-NEXT: jne .LBB0_2
Note https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150 has a better discussion of inline-asm trapping than the other one.
Should we limit this to volatile inline asm only?
That should already have been the previous behavior.
For GCC, (non-volatile) inline asm is still going to be considered non-trapping, though I proposed a patch to improve the costing of inline asm for ifcvt so it ends up behaving the same as if it were treated as trapping (https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675552.html). I am also going to propose a patch that still allows it to be moved outside of the loop, but only conditionally, the same way GCC treats possibly trapping (or undefined-behavior-based) code motion. I will also be writing up slightly better documentation there (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150), especially about what it means to move code outside of the loop and somewhat about costing (there is a section about size already).
Given this, I think the precondition of this patch is not true. The LLVM IR-level behavior cannot prove the backend optimization is wrong; it might be that the LLVM IR handling is simply not optimal and can be improved. Besides, LangRef says:
Hence it cannot be used for inline asm. And from users' perspective, there is currently no way to tell the compiler that an inline asm is lightweight for optimization.
Making inline asm "speculatable" by default seems like it has weird effects in all sorts of scenarios:
It's hard enough to use inline asm correctly without adding more weird edge cases. We're not even getting much benefit from speculating inline asm, currently; we only do it in a couple obscure optimizations in the backend. Middle-end LLVM IR analysis already treats inline asm as potentially trapping.
llvm/lib/CodeGen/MachineInstr.cpp
Outdated
@@ -1310,7 +1310,7 @@ bool MachineInstr::isSafeToMove(bool &SawStore) const {
if (isPosition() || isDebugInstr() || isTerminator() ||
mayRaiseFPException() || hasUnmodeledSideEffects() ||
- isJumpTableDebugInfo())
+ isJumpTableDebugInfo() || isInlineAsm())
hasUnmodeledSideEffects already special cases inline asm, so move these checks together? Could even weaken the hasUnmodeledSideEffects part to the raw property check. The other contexts where asm is special cased have a comment about it (e.g. isDead)
Tried splitting the checks into two if statements, with comments to explain. Hopefully the split makes sense.
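Roughly, the split looks something like the following (a sketch of the shape, not necessarily the exact committed code):

// Things that can never be moved, regardless of memory effects.
if (isPosition() || isDebugInstr() || isTerminator() ||
    mayRaiseFPException() || isJumpTableDebugInfo())
  return false;

// Inline asm without declared side effects can still trap or invoke
// undefined behavior, so it is never safe to speculate; treat it like an
// instruction with unmodeled side effects.
if (isInlineAsm() || hasUnmodeledSideEffects())
  return false;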
I think that's how GCC documents it and what users expect: https://stackoverflow.com/a/14449998
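For context, a rough sketch of the kind of cpuid wrapper that discussion is about (the wrapper name and constraints below are illustrative, not GCC's actual <cpuid.h> code): cpuid raises #UD on processors that predate it, so hoisting the asm above a CPU-support check would be unsafe even though the statement has no memory clobber and no volatile.

// Illustrative only: a non-volatile cpuid wrapper in the style being debated.
// On a CPU without cpuid support, executing this asm faults, so it must not
// be speculated past whatever check guards the call.
static inline void my_cpuid(unsigned leaf, unsigned *a, unsigned *b,
                            unsigned *c, unsigned *d) {
  asm("cpuid" : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d) : "a"(leaf));
}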
I don't get the point of the example. If the use of the cpuid instruction may trap, why not add volatile? Without it, the example shown in the GCC documentation is already broken.
I don't see why we can't go in the opposite direction, i.e., optimize in the middle-end.
Also wrote an explanation in the code for future reference.
Ping. It looks like, with https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150, gcc now has the behavior proposed in this patch.
This is just making the point that nobody actually expects this, as shown by the fact that even compiler developers don't expect this. (Obviously we control the implementation of our cpuid.h.)
Alright, no objections from me then.
Even if an inline asm doesn't have memory effects, we can't assume it's safe to speculate: it could trap, or cause undefined behavior. At the LLVM IR level, this is handled correctly: we don't speculate inline asm (unless it's marked "speculatable", but I don't think anyone does that). Codegen also needs to respect this restriction.
This change stops Early If Conversion and similar passes from speculating an INLINEASM MachineInstr.
Some uses of isSafeToMove probably could be switched to a different API: isSafeToMove assumes you're hoisting, but we could handle some forms of sinking more aggressively. But I'll leave that for a followup, if it turns out to be relevant.
See also discussion on gcc bugtracker https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150 .
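A hypothetical C-level version of the new AArch64 test (the source below is an assumption, not taken from the PR) shows the hazard concretely: each mrs is only architecturally legal on the path that executes it, so hoisting either asm above the argc check would introduce a trap the original program never performs.

// Hypothetical source corresponding to the new test: reading SPSR_EL2 from
// below EL2 (or SPSR_EL1 from EL0) faults, so neither asm may be speculated
// across the branch even though it has no memory effects.
long read_spsr(int argc) {
  long v;
  if (argc == 3)
    asm("mrs %0, SPSR_EL2" : "=r"(v));
  else
    asm("mrs %0, SPSR_EL1" : "=r"(v));
  return v;
}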