[CodeGen] Fix MachineInstr::isSafeToMove handling of inline asm. #126807


Merged: 2 commits into llvm:main on Feb 25, 2025

Conversation

efriedma-quic (Collaborator) commented Feb 11, 2025

Even if an inline asm doesn't have memory effects, we can't assume it's safe to speculate: it could trap, or cause undefined behavior. At the LLVM IR level, this is handled correctly: we don't speculate inline asm (unless it's marked "speculatable", but I don't think anyone does that). Codegen also needs to respect this restriction.

This change stops Early If Conversion and similar passes from speculating an INLINEASM MachineInstr.

Some uses of isSafeToMove probably could be switched to a different API: isSafeToMove assumes you're hoisting, but we could handle some forms of sinking more aggressively. But I'll leave that for a followup, if it turns out to be relevant.

See also discussion on gcc bugtracker https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150 .
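For illustration, a source-level analogue of the new AArch64 test (an illustrative sketch, not code from the patch; each system-register read is only valid on the path that reaches it, so hoisting either asm above the branch could fault):

```cpp
// Illustrative sketch (not from the patch): neither asm statement may be
// speculated above the branch, because reading SPSR_EL2/SPSR_EL1 can fault
// when executed at the wrong exception level.
#include <cstdint>

int main(int argc, char **argv) {
  uint64_t spsr;
  if (argc == 3) {
    // Only valid on this path; may fault below EL2.
    asm("mrs %0, SPSR_EL2" : "=r"(spsr));
  } else {
    // Only valid on this path; may fault below EL1.
    asm("mrs %0, SPSR_EL1" : "=r"(spsr));
  }
  return static_cast<int>(spsr);
}
```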

llvmbot (Member) commented Feb 11, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-backend-arm

Author: Eli Friedman (efriedma-quic)

Changes

Even if an inline asm doesn't have memory effects, we can't assume it's safe to speculate: it could trap, or cause undefined behavior. At the LLVM IR level, this is handled correctly: we don't speculate inline asm (unless it's marked "speculatable", but I don't think anyone does that). Codegen also needs to respect this restriction.

This change stops Early If Conversion and similar passes from speculating an INLINEASM MachineInstr.

Some uses of isSafeToMove probably could be switched to a different API: isSafeToMove assumes you're hoisting, but we could handle some forms of sinking more aggressively. But I'll leave that for a followup, if it turns out to be relevant.

See also discussion on gcc bugtracker
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82677 . (I don't give the discussion there much weight, but just for reference.)


Full diff: https://github.com/llvm/llvm-project/pull/126807.diff

8 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineInstr.cpp (+1-1)
  • (added) llvm/test/CodeGen/AArch64/inline-asm-speculation.ll (+38)
  • (modified) llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll (+1-5)
  • (modified) llvm/test/CodeGen/AMDGPU/early-if-convert.ll (+1-4)
  • (modified) llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll (+16-12)
  • (modified) llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll (+8-8)
  • (modified) llvm/test/CodeGen/X86/x86-shrink-wrapping.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll (+6-6)
diff --git a/llvm/lib/CodeGen/MachineInstr.cpp b/llvm/lib/CodeGen/MachineInstr.cpp
index 52c977a3651a3..35083ce5b5667 100644
--- a/llvm/lib/CodeGen/MachineInstr.cpp
+++ b/llvm/lib/CodeGen/MachineInstr.cpp
@@ -1310,7 +1310,7 @@ bool MachineInstr::isSafeToMove(bool &SawStore) const {
 
   if (isPosition() || isDebugInstr() || isTerminator() ||
       mayRaiseFPException() || hasUnmodeledSideEffects() ||
-      isJumpTableDebugInfo())
+      isJumpTableDebugInfo() || isInlineAsm())
     return false;
 
   // See if this instruction does a load.  If so, we have to guarantee that the
diff --git a/llvm/test/CodeGen/AArch64/inline-asm-speculation.ll b/llvm/test/CodeGen/AArch64/inline-asm-speculation.ll
new file mode 100644
index 0000000000000..d87c23cd85152
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/inline-asm-speculation.ll
@@ -0,0 +1,38 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64 < %s | FileCheck %s
+; Make sure we don't speculatively execute inline asm statements.
+
+define dso_local i32 @main(i32 %argc, ptr %argv) {
+; CHECK-LABEL: main:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    cmp w0, #3
+; CHECK-NEXT:    b.ne .LBB0_2
+; CHECK-NEXT:  // %bb.1: // %if.then
+; CHECK-NEXT:    //APP
+; CHECK-NEXT:    mrs x0, SPSR_EL2
+; CHECK-NEXT:    //NO_APP
+; CHECK-NEXT:    // kill: def $w0 killed $w0 killed $x0
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB0_2: // %if.else
+; CHECK-NEXT:    //APP
+; CHECK-NEXT:    mrs x0, SPSR_EL1
+; CHECK-NEXT:    //NO_APP
+; CHECK-NEXT:    // kill: def $w0 killed $w0 killed $x0
+; CHECK-NEXT:    ret
+entry:
+  %cmp = icmp eq i32 %argc, 3
+  br i1 %cmp, label %if.then, label %if.else
+
+if.then:
+  %0 = tail call i64 asm "mrs $0, SPSR_EL2", "=r"()
+  br label %if.end
+
+if.else:
+  %1 = tail call i64 asm "mrs $0, SPSR_EL1", "=r"()
+  br label %if.end
+
+if.end:
+  %y.0.in = phi i64 [ %0, %if.then ], [ %1, %if.else ]
+  %y.0 = trunc i64 %y.0.in to i32
+  ret i32 %y.0
+}
diff --git a/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll b/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll
index bd523d4ac30b9..0574de30abfb5 100644
--- a/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll
+++ b/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll
@@ -23,12 +23,8 @@ bb5:                                              ; preds = %bb3, %bb
 }
 
 ; GCN-LABEL: {{^}}nonconvergent_inlineasm:
-; GCN: s_cbranch_execz
-
-; GCN: ; %bb.{{[0-9]+}}:
 ; GCN: v_cmp_ne_u32_e64
-
-; GCN: BB{{[0-9]+_[0-9]+}}:
+; GCN: s_cbranch_execz
 
 define amdgpu_kernel void @nonconvergent_inlineasm(ptr addrspace(1) nocapture %arg) {
 bb:
diff --git a/llvm/test/CodeGen/AMDGPU/early-if-convert.ll b/llvm/test/CodeGen/AMDGPU/early-if-convert.ll
index 95577b44db764..028cd9110eb2b 100644
--- a/llvm/test/CodeGen/AMDGPU/early-if-convert.ll
+++ b/llvm/test/CodeGen/AMDGPU/early-if-convert.ll
@@ -55,11 +55,8 @@ endif:
 }
 
 ; GCN-LABEL: {{^}}test_vccnz_ifcvt_triangle_vcc_clobber:
+; GCN: s_cbranch_vccnz
 ; GCN: ; clobber vcc
-; GCN: v_cmp_neq_f32_e64 [[CMP:s\[[0-9]+:[0-9]+\]]], s{{[0-9]+}}, 1.0
-; GCN: v_add_i32_e32 v{{[0-9]+}}, vcc
-; GCN: s_mov_b64 vcc, [[CMP]]
-; GCN: v_cndmask_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, vcc
 define amdgpu_kernel void @test_vccnz_ifcvt_triangle_vcc_clobber(ptr addrspace(1) %out, ptr addrspace(1) %in, float %k) #0 {
 entry:
   %v = load i32, ptr addrspace(1) %in
diff --git a/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll b/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll
index 459ef648fd806..7b82a6b6dcaa2 100644
--- a/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll
+++ b/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll
@@ -2015,13 +2015,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
 ; NOSDWA-LABEL: sdwa_crash_inlineasm_def:
 ; NOSDWA:       ; %bb.0: ; %bb
 ; NOSDWA-NEXT:    s_mov_b32 s0, 0xffff
+; NOSDWA-NEXT:    s_and_b64 vcc, exec, -1
+; NOSDWA-NEXT:    ; implicit-def: $vgpr0_vgpr1
+; NOSDWA-NEXT:  .LBB21_1: ; %bb1
+; NOSDWA-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; NOSDWA-NEXT:    ;;#ASMSTART
 ; NOSDWA-NEXT:    v_and_b32_e32 v0, s0, v0
 ; NOSDWA-NEXT:    ;;#ASMEND
 ; NOSDWA-NEXT:    v_or_b32_e32 v0, 0x10000, v0
-; NOSDWA-NEXT:    s_and_b64 vcc, exec, -1
-; NOSDWA-NEXT:  .LBB21_1: ; %bb1
-; NOSDWA-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; NOSDWA-NEXT:    flat_store_dwordx2 v[0:1], v[0:1]
 ; NOSDWA-NEXT:    s_waitcnt vmcnt(0)
 ; NOSDWA-NEXT:    s_mov_b64 vcc, vcc
@@ -2032,13 +2033,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
 ; GFX89-LABEL: sdwa_crash_inlineasm_def:
 ; GFX89:       ; %bb.0: ; %bb
 ; GFX89-NEXT:    s_mov_b32 s0, 0xffff
+; GFX89-NEXT:    s_and_b64 vcc, exec, -1
+; GFX89-NEXT:    ; implicit-def: $vgpr0_vgpr1
+; GFX89-NEXT:  .LBB21_1: ; %bb1
+; GFX89-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX89-NEXT:    ;;#ASMSTART
 ; GFX89-NEXT:    v_and_b32_e32 v0, s0, v0
 ; GFX89-NEXT:    ;;#ASMEND
 ; GFX89-NEXT:    v_or_b32_e32 v0, 0x10000, v0
-; GFX89-NEXT:    s_and_b64 vcc, exec, -1
-; GFX89-NEXT:  .LBB21_1: ; %bb1
-; GFX89-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX89-NEXT:    flat_store_dwordx2 v[0:1], v[0:1]
 ; GFX89-NEXT:    s_waitcnt vmcnt(0)
 ; GFX89-NEXT:    s_mov_b64 vcc, vcc
@@ -2049,13 +2051,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
 ; GFX9-LABEL: sdwa_crash_inlineasm_def:
 ; GFX9:       ; %bb.0: ; %bb
 ; GFX9-NEXT:    s_mov_b32 s0, 0xffff
+; GFX9-NEXT:    s_and_b64 vcc, exec, -1
+; GFX9-NEXT:    ; implicit-def: $vgpr0_vgpr1
+; GFX9-NEXT:  .LBB21_1: ; %bb1
+; GFX9-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-NEXT:    ;;#ASMSTART
 ; GFX9-NEXT:    v_and_b32_e32 v0, s0, v0
 ; GFX9-NEXT:    ;;#ASMEND
 ; GFX9-NEXT:    v_or_b32_e32 v0, 0x10000, v0
-; GFX9-NEXT:    s_and_b64 vcc, exec, -1
-; GFX9-NEXT:  .LBB21_1: ; %bb1
-; GFX9-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-NEXT:    global_store_dwordx2 v[0:1], v[0:1], off
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-NEXT:    s_mov_b64 vcc, vcc
@@ -2066,13 +2069,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
 ; GFX10-LABEL: sdwa_crash_inlineasm_def:
 ; GFX10:       ; %bb.0: ; %bb
 ; GFX10-NEXT:    s_mov_b32 s0, 0xffff
+; GFX10-NEXT:    s_mov_b32 vcc_lo, exec_lo
+; GFX10-NEXT:    ; implicit-def: $vgpr0_vgpr1
+; GFX10-NEXT:  .LBB21_1: ; %bb1
+; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX10-NEXT:    ;;#ASMSTART
 ; GFX10-NEXT:    v_and_b32_e32 v0, s0, v0
 ; GFX10-NEXT:    ;;#ASMEND
 ; GFX10-NEXT:    v_or_b32_e32 v0, 0x10000, v0
-; GFX10-NEXT:    s_mov_b32 vcc_lo, exec_lo
-; GFX10-NEXT:  .LBB21_1: ; %bb1
-; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX10-NEXT:    global_store_dwordx2 v[0:1], v[0:1], off
 ; GFX10-NEXT:    s_waitcnt_vscnt null, 0x0
 ; GFX10-NEXT:    s_cbranch_vccnz .LBB21_1
diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll
index 5b2d0a8a7c059..6bcffa29678eb 100644
--- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll
+++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll
@@ -1601,11 +1601,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
 ; ARM-ENABLE-NEXT:  @ %bb.1: @ %if.then
 ; ARM-ENABLE-NEXT:    sub r1, sp, #16
 ; ARM-ENABLE-NEXT:    mov sp, r1
+; ARM-ENABLE-NEXT:  LBB10_2: @ %for.body
+; ARM-ENABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; ARM-ENABLE-NEXT:    @ InlineAsm Start
 ; ARM-ENABLE-NEXT:    mov r2, #0
 ; ARM-ENABLE-NEXT:    @ InlineAsm End
-; ARM-ENABLE-NEXT:  LBB10_2: @ %for.body
-; ARM-ENABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; ARM-ENABLE-NEXT:    add r0, r2, r0
 ; ARM-ENABLE-NEXT:    str r0, [r1]
 ; ARM-ENABLE-NEXT:    @ InlineAsm Start
@@ -1629,11 +1629,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
 ; ARM-DISABLE-NEXT:  @ %bb.1: @ %if.then
 ; ARM-DISABLE-NEXT:    sub r1, sp, #16
 ; ARM-DISABLE-NEXT:    mov sp, r1
+; ARM-DISABLE-NEXT:  LBB10_2: @ %for.body
+; ARM-DISABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; ARM-DISABLE-NEXT:    @ InlineAsm Start
 ; ARM-DISABLE-NEXT:    mov r2, #0
 ; ARM-DISABLE-NEXT:    @ InlineAsm End
-; ARM-DISABLE-NEXT:  LBB10_2: @ %for.body
-; ARM-DISABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; ARM-DISABLE-NEXT:    add r0, r2, r0
 ; ARM-DISABLE-NEXT:    str r0, [r1]
 ; ARM-DISABLE-NEXT:    @ InlineAsm Start
@@ -1657,11 +1657,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
 ; THUMB-ENABLE-NEXT:    sub.w r0, sp, #16
 ; THUMB-ENABLE-NEXT:    mov sp, r0
 ; THUMB-ENABLE-NEXT:    movs r1, #0
+; THUMB-ENABLE-NEXT:  LBB10_2: @ %for.body
+; THUMB-ENABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; THUMB-ENABLE-NEXT:    @ InlineAsm Start
 ; THUMB-ENABLE-NEXT:    mov.w r2, #0
 ; THUMB-ENABLE-NEXT:    @ InlineAsm End
-; THUMB-ENABLE-NEXT:  LBB10_2: @ %for.body
-; THUMB-ENABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; THUMB-ENABLE-NEXT:    add r1, r2
 ; THUMB-ENABLE-NEXT:    str r1, [r0]
 ; THUMB-ENABLE-NEXT:    @ InlineAsm Start
@@ -1686,11 +1686,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
 ; THUMB-DISABLE-NEXT:    sub.w r0, sp, #16
 ; THUMB-DISABLE-NEXT:    mov sp, r0
 ; THUMB-DISABLE-NEXT:    movs r1, #0
+; THUMB-DISABLE-NEXT:  LBB10_2: @ %for.body
+; THUMB-DISABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; THUMB-DISABLE-NEXT:    @ InlineAsm Start
 ; THUMB-DISABLE-NEXT:    mov.w r2, #0
 ; THUMB-DISABLE-NEXT:    @ InlineAsm End
-; THUMB-DISABLE-NEXT:  LBB10_2: @ %for.body
-; THUMB-DISABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; THUMB-DISABLE-NEXT:    add r1, r2
 ; THUMB-DISABLE-NEXT:    str r1, [r0]
 ; THUMB-DISABLE-NEXT:    @ InlineAsm Start
diff --git a/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll b/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
index 928f29b7b1889..f007b316b92b2 100644
--- a/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
+++ b/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
@@ -821,12 +821,12 @@ define void @infiniteloop() {
 ; ENABLE-NEXT:    movq %rsp, %rcx
 ; ENABLE-NEXT:    addq $-16, %rcx
 ; ENABLE-NEXT:    movq %rcx, %rsp
-; ENABLE-NEXT:    ## InlineAsm Start
-; ENABLE-NEXT:    movl $1, %edx
-; ENABLE-NEXT:    ## InlineAsm End
 ; ENABLE-NEXT:    .p2align 4
 ; ENABLE-NEXT:  LBB10_2: ## %for.body
 ; ENABLE-NEXT:    ## =>This Inner Loop Header: Depth=1
+; ENABLE-NEXT:    ## InlineAsm Start
+; ENABLE-NEXT:    movl $1, %edx
+; ENABLE-NEXT:    ## InlineAsm End
 ; ENABLE-NEXT:    addl %edx, %eax
 ; ENABLE-NEXT:    movl %eax, (%rcx)
 ; ENABLE-NEXT:    jmp LBB10_2
@@ -853,12 +853,12 @@ define void @infiniteloop() {
 ; DISABLE-NEXT:    movq %rsp, %rcx
 ; DISABLE-NEXT:    addq $-16, %rcx
 ; DISABLE-NEXT:    movq %rcx, %rsp
-; DISABLE-NEXT:    ## InlineAsm Start
-; DISABLE-NEXT:    movl $1, %edx
-; DISABLE-NEXT:    ## InlineAsm End
 ; DISABLE-NEXT:    .p2align 4
 ; DISABLE-NEXT:  LBB10_2: ## %for.body
 ; DISABLE-NEXT:    ## =>This Inner Loop Header: Depth=1
+; DISABLE-NEXT:    ## InlineAsm Start
+; DISABLE-NEXT:    movl $1, %edx
+; DISABLE-NEXT:    ## InlineAsm End
 ; DISABLE-NEXT:    addl %edx, %eax
 ; DISABLE-NEXT:    movl %eax, (%rcx)
 ; DISABLE-NEXT:    jmp LBB10_2
diff --git a/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll b/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll
index 19637c8304dd8..8eccfe6664166 100644
--- a/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll
+++ b/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll
@@ -24,12 +24,12 @@ define i32 @loopInfoSaveOutsideLoop(i32 %cond, i32 %N) #0 {
 ; ENABLE-NEXT:    #NO_APP
 ; ENABLE-NEXT:    xorl %eax, %eax
 ; ENABLE-NEXT:    movl $10, %ecx
-; ENABLE-NEXT:    #APP
-; ENABLE-NEXT:    movl $1, %edx
-; ENABLE-NEXT:    #NO_APP
 ; ENABLE-NEXT:    .p2align 4
 ; ENABLE-NEXT:  .LBB0_2: # %for.body
 ; ENABLE-NEXT:    # =>This Inner Loop Header: Depth=1
+; ENABLE-NEXT:    #APP
+; ENABLE-NEXT:    movl $1, %edx
+; ENABLE-NEXT:    #NO_APP
 ; ENABLE-NEXT:    addl %edx, %eax
 ; ENABLE-NEXT:    decl %ecx
 ; ENABLE-NEXT:    jne .LBB0_2
@@ -64,12 +64,12 @@ define i32 @loopInfoSaveOutsideLoop(i32 %cond, i32 %N) #0 {
 ; DISABLE-NEXT:    #NO_APP
 ; DISABLE-NEXT:    xorl %eax, %eax
 ; DISABLE-NEXT:    movl $10, %ecx
-; DISABLE-NEXT:    #APP
-; DISABLE-NEXT:    movl $1, %edx
-; DISABLE-NEXT:    #NO_APP
 ; DISABLE-NEXT:    .p2align 4
 ; DISABLE-NEXT:  .LBB0_2: # %for.body
 ; DISABLE-NEXT:    # =>This Inner Loop Header: Depth=1
+; DISABLE-NEXT:    #APP
+; DISABLE-NEXT:    movl $1, %edx
+; DISABLE-NEXT:    #NO_APP
 ; DISABLE-NEXT:    addl %edx, %eax
 ; DISABLE-NEXT:    decl %ecx
 ; DISABLE-NEXT:    jne .LBB0_2
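For context, a sketch of how a caller typically consults this hook (illustrative usage only, based on the signature in the MachineInstr.cpp hunk above; canSpeculateBlock is a made-up helper, and real passes such as early if-conversion skip the terminator and apply additional legality and cost checks):

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Illustrative caller-side sketch: a block can only be speculated if every
// instruction in it reports isSafeToMove(); with this patch an INLINEASM
// instruction now fails that check.
static bool canSpeculateBlock(MachineBasicBlock &MBB) {
  bool SawStore = false;
  for (MachineInstr &MI : MBB)
    if (!MI.isSafeToMove(SawStore))
      return false;
  return true;
}
```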

llvmbot (Member) commented Feb 11, 2025

@llvm/pr-subscribers-backend-x86

llvmbot (Member) commented Feb 11, 2025

@llvm/pr-subscribers-backend-aarch64

pinskia commented Feb 11, 2025

Note https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150 has better discussion about inline-asm trapping than the other one.

phoebewang (Contributor):

> Even if an inline asm doesn't have memory effects, we can't assume it's safe to speculate: it could trap, or cause undefined behavior.

Should we limit this to volatile inline asm only?

arsenm (Contributor) commented Feb 12, 2025

> Should we limit this to volatile inline asm only?

That should have already been the previous behavior

pinskia commented Feb 12, 2025

For GCC, the behavior of (non-volatile) inline-asm is still going to be considered as not trapping, though I proposed a patch to improve the costing of inline-asm for ifcvt so it will look like the same behavior as treating it as trapping (https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675552.html). I am also going to propose a patch to make it possible to move it outside of the loop, but only conditionally, the same way GCC treats possibly trapping (or undefined-behavior-based) code motion. I will also be writing up slightly better documentation there too (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150), especially when it comes to what it means to move outside of the loop and somewhat about costing (there is a section about size already).

phoebewang (Contributor) commented Feb 12, 2025

> For GCC, the behavior of (non-volatile) inline-asm is still going to be considered as not trapping, though I proposed a patch to improve the costing of inline-asm for ifcvt so it will look like the same behavior as treating it as trapping (https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675552.html). I am also going to propose a patch to make it possible to move it outside of the loop, but only conditionally, the same way GCC treats possibly trapping (or undefined-behavior-based) code motion. I will also be writing up slightly better documentation there too (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150), especially when it comes to what it means to move outside of the loop and somewhat about costing (there is a section about size already).

Given this, I think the precondition of this patch is not true. The LLVM IR level behavior cannot prove that the backend optimization is wrong; it might be that the LLVM IR is not optimal enough and could be improved. Besides, LangRef says:

> speculatable... This attribute is only valid on functions and declarations, not on individual call sites.

Hence it cannot be used for inline asm. And from the users' perspective, they don't currently have a way to tell the compiler that an inline asm is lightweight for optimization.

efriedma-quic (Collaborator, Author):

Making inline asm "speculatable" by default seems like it has weird effects in all sorts of scenarios:

  • You can't guard an instruction which is only valid on some CPUs with a runtime check. For example, if you use cpuid.h to check whether an instruction is legal, you can only use the instruction in a volatile asm.
  • You can't use an instruction which is only valid with certain operands, because speculation can change the operands to anything. For example, cpuid.h is broken because it doesn't mark the asm volatile.

It's hard enough to use inline asm correctly without adding more weird edge cases.

We're not even getting much benefit from speculating inline asm, currently; we only do it in a couple obscure optimizations in the backend. Middle-end LLVM IR analysis already treats inline asm as potentially trapping.
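As a concrete illustration of the first bullet above (a hypothetical sketch, not the actual cpuid.h code; it uses the standard __get_cpuid helper and bit_RDRND from cpuid.h):

```cpp
#include <cpuid.h>
#include <cstdint>

// Hypothetical sketch: the asm is only reachable when the runtime feature
// check passes, so speculating it above the check could execute an
// instruction the CPU does not support.
uint64_t random_if_supported() {
  unsigned eax, ebx, ecx, edx;
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & bit_RDRND)) {
    uint64_t v;
    // Plain (non-volatile) asm with no memory effects: before this patch,
    // backend passes could still speculate it past the feature check.
    asm("rdrand %0" : "=r"(v) : : "cc");
    return v;
  }
  return 0;
}
```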

@@ -1310,7 +1310,7 @@ bool MachineInstr::isSafeToMove(bool &SawStore) const {
 
   if (isPosition() || isDebugInstr() || isTerminator() ||
       mayRaiseFPException() || hasUnmodeledSideEffects() ||
-      isJumpTableDebugInfo())
+      isJumpTableDebugInfo() || isInlineAsm())
Contributor:

hasUnmodeledSideEffects already special cases inline asm, so move these checks together? Could even weaken the hasUnmodeledSideEffects part to the raw property check. The other contexts where asm is special cased have a comment about it (e.g. isDead)

efriedma-quic (Collaborator, Author):

Tried splitting the checks into two if statements, with comments to explain. Hopefully the split makes sense.
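One plausible shape of that split, for reference (a sketch only; the committed code and its comments may differ):

```cpp
// Inline asm can trap or invoke undefined behavior even when it has no
// modeled memory effects, so it is never safe to speculate.
if (isInlineAsm())
  return false;

if (isPosition() || isDebugInstr() || isTerminator() ||
    mayRaiseFPException() || hasUnmodeledSideEffects() ||
    isJumpTableDebugInfo())
  return false;
```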

phoebewang (Contributor):

> Making inline asm "speculatable" by default seems like it has weird effects in all sorts of scenarios:

I think that's how GCC documents it and what users expect: https://stackoverflow.com/a/14449998

> • You can't guard an instruction which is only valid on some CPUs with a runtime check. For example, if you use cpuid.h to check whether an instruction is legal, you can only use the instruction in a volatile asm.
> • You can't use an instruction which is only valid with certain operands, because speculation can change the operands to anything. For example, cpuid.h is broken because it doesn't mark the asm volatile.

I don't get the point of the example. If the use of the cpuid instruction may trap, why not add volatile? Without it, the example shown in the GCC documentation is already broken.

> It's hard enough to use inline asm correctly without adding more weird edge cases.

> We're not even getting much benefit from speculating inline asm, currently; we only do it in a couple obscure optimizations in the backend. Middle-end LLVM IR analysis already treats inline asm as potentially trapping.

I don't see why we can't go in the opposite direction, i.e., optimize in the middle-end.

Also wrote an explanation in the code for future reference.
efriedma-quic (Collaborator, Author):

Ping

It looks like with https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150, gcc now has the behavior proposed in this patch.

> I don't get the point of the example. If the use of the cpuid instruction may trap, why not add volatile? Without it, the example shown in the GCC documentation is already broken.

This is just making the point that nobody actually expects this, as shown by the fact that even compiler developers don't expect this. (Obviously we control the implementation of our cpuid.h.)

phoebewang (Contributor):

> It looks like with https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150, gcc now has the behavior proposed in this patch.

Alright, no objections from me then.

efriedma-quic merged commit 1b39328 into llvm:main on Feb 25, 2025.
8 checks passed