[CodeGen] Fix MachineInstr::isSafeToMove handling of inline asm. #126807


Merged: 2 commits into llvm:main on Feb 25, 2025

Conversation

efriedma-quic (Collaborator) commented Feb 11, 2025

Even if an inline asm doesn't have memory effects, we can't assume it's safe to speculate: it could trap, or cause undefined behavior. At the LLVM IR level, this is handled correctly: we don't speculate inline asm (unless it's marked "speculatable", but I don't think anyone does that). Codegen also needs to respect this restriction.

This change stops Early If Conversion and similar passes from speculating an INLINEASM MachineInstr.

Some uses of isSafeToMove probably could be switched to a different API: isSafeToMove assumes you're hoisting, but we could handle some forms of sinking more aggressively. But I'll leave that for a followup, if it turns out to be relevant.

See also discussion on gcc bugtracker https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150 .
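For illustration, a source-level analogue of the new AArch64 test (an illustrative sketch, not code from the patch; each system-register read is only valid on the path that reaches it, so hoisting either asm above the branch could fault):

```cpp
// Illustrative sketch (not from the patch): neither asm statement may be
// speculated above the branch, because reading SPSR_EL2/SPSR_EL1 can fault
// when executed at the wrong exception level.
#include <cstdint>

int main(int argc, char **argv) {
  uint64_t spsr;
  if (argc == 3) {
    // Only valid on this path; may fault below EL2.
    asm("mrs %0, SPSR_EL2" : "=r"(spsr));
  } else {
    // Only valid on this path; may fault below EL1.
    asm("mrs %0, SPSR_EL1" : "=r"(spsr));
  }
  return static_cast<int>(spsr);
}
```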

llvmbot (Member) commented Feb 11, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-backend-arm

Author: Eli Friedman (efriedma-quic)

Changes

Even if an inline asm doesn't have memory effects, we can't assume it's safe to speculate: it could trap, or cause undefined behavior. At the LLVM IR level, this is handled correctly: we don't speculate inline asm (unless it's marked "speculatable", but I don't think anyone does that). Codegen also needs to respect this restriction.

This change stops Early If Conversion and similar passes from speculating an INLINEASM MachineInstr.

Some uses of isSafeToMove probably could be switched to a different API: isSafeToMove assumes you're hoisting, but we could handle some forms of sinking more aggressively. But I'll leave that for a followup, if it turns out to be relevant.

See also discussion on gcc bugtracker
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82677 . (I don't give the discussion there much weight, but just for reference.)


Full diff: https://github.com/llvm/llvm-project/pull/126807.diff

8 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineInstr.cpp (+1-1)
  • (added) llvm/test/CodeGen/AArch64/inline-asm-speculation.ll (+38)
  • (modified) llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll (+1-5)
  • (modified) llvm/test/CodeGen/AMDGPU/early-if-convert.ll (+1-4)
  • (modified) llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll (+16-12)
  • (modified) llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll (+8-8)
  • (modified) llvm/test/CodeGen/X86/x86-shrink-wrapping.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll (+6-6)
diff --git a/llvm/lib/CodeGen/MachineInstr.cpp b/llvm/lib/CodeGen/MachineInstr.cpp
index 52c977a3651a3..35083ce5b5667 100644
--- a/llvm/lib/CodeGen/MachineInstr.cpp
+++ b/llvm/lib/CodeGen/MachineInstr.cpp
@@ -1310,7 +1310,7 @@ bool MachineInstr::isSafeToMove(bool &SawStore) const {
 
   if (isPosition() || isDebugInstr() || isTerminator() ||
       mayRaiseFPException() || hasUnmodeledSideEffects() ||
-      isJumpTableDebugInfo())
+      isJumpTableDebugInfo() || isInlineAsm())
     return false;
 
   // See if this instruction does a load.  If so, we have to guarantee that the
diff --git a/llvm/test/CodeGen/AArch64/inline-asm-speculation.ll b/llvm/test/CodeGen/AArch64/inline-asm-speculation.ll
new file mode 100644
index 0000000000000..d87c23cd85152
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/inline-asm-speculation.ll
@@ -0,0 +1,38 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64 < %s | FileCheck %s
+; Make sure we don't speculatively execute inline asm statements.
+
+define dso_local i32 @main(i32 %argc, ptr %argv) {
+; CHECK-LABEL: main:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    cmp w0, #3
+; CHECK-NEXT:    b.ne .LBB0_2
+; CHECK-NEXT:  // %bb.1: // %if.then
+; CHECK-NEXT:    //APP
+; CHECK-NEXT:    mrs x0, SPSR_EL2
+; CHECK-NEXT:    //NO_APP
+; CHECK-NEXT:    // kill: def $w0 killed $w0 killed $x0
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB0_2: // %if.else
+; CHECK-NEXT:    //APP
+; CHECK-NEXT:    mrs x0, SPSR_EL1
+; CHECK-NEXT:    //NO_APP
+; CHECK-NEXT:    // kill: def $w0 killed $w0 killed $x0
+; CHECK-NEXT:    ret
+entry:
+  %cmp = icmp eq i32 %argc, 3
+  br i1 %cmp, label %if.then, label %if.else
+
+if.then:
+  %0 = tail call i64 asm "mrs $0, SPSR_EL2", "=r"()
+  br label %if.end
+
+if.else:
+  %1 = tail call i64 asm "mrs $0, SPSR_EL1", "=r"()
+  br label %if.end
+
+if.end:
+  %y.0.in = phi i64 [ %0, %if.then ], [ %1, %if.else ]
+  %y.0 = trunc i64 %y.0.in to i32
+  ret i32 %y.0
+}
diff --git a/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll b/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll
index bd523d4ac30b9..0574de30abfb5 100644
--- a/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll
+++ b/llvm/test/CodeGen/AMDGPU/convergent-inlineasm.ll
@@ -23,12 +23,8 @@ bb5:                                              ; preds = %bb3, %bb
 }
 
 ; GCN-LABEL: {{^}}nonconvergent_inlineasm:
-; GCN: s_cbranch_execz
-
-; GCN: ; %bb.{{[0-9]+}}:
 ; GCN: v_cmp_ne_u32_e64
-
-; GCN: BB{{[0-9]+_[0-9]+}}:
+; GCN: s_cbranch_execz
 
 define amdgpu_kernel void @nonconvergent_inlineasm(ptr addrspace(1) nocapture %arg) {
 bb:
diff --git a/llvm/test/CodeGen/AMDGPU/early-if-convert.ll b/llvm/test/CodeGen/AMDGPU/early-if-convert.ll
index 95577b44db764..028cd9110eb2b 100644
--- a/llvm/test/CodeGen/AMDGPU/early-if-convert.ll
+++ b/llvm/test/CodeGen/AMDGPU/early-if-convert.ll
@@ -55,11 +55,8 @@ endif:
 }
 
 ; GCN-LABEL: {{^}}test_vccnz_ifcvt_triangle_vcc_clobber:
+; GCN: s_cbranch_vccnz
 ; GCN: ; clobber vcc
-; GCN: v_cmp_neq_f32_e64 [[CMP:s\[[0-9]+:[0-9]+\]]], s{{[0-9]+}}, 1.0
-; GCN: v_add_i32_e32 v{{[0-9]+}}, vcc
-; GCN: s_mov_b64 vcc, [[CMP]]
-; GCN: v_cndmask_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, vcc
 define amdgpu_kernel void @test_vccnz_ifcvt_triangle_vcc_clobber(ptr addrspace(1) %out, ptr addrspace(1) %in, float %k) #0 {
 entry:
   %v = load i32, ptr addrspace(1) %in
diff --git a/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll b/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll
index 459ef648fd806..7b82a6b6dcaa2 100644
--- a/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll
+++ b/llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll
@@ -2015,13 +2015,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
 ; NOSDWA-LABEL: sdwa_crash_inlineasm_def:
 ; NOSDWA:       ; %bb.0: ; %bb
 ; NOSDWA-NEXT:    s_mov_b32 s0, 0xffff
+; NOSDWA-NEXT:    s_and_b64 vcc, exec, -1
+; NOSDWA-NEXT:    ; implicit-def: $vgpr0_vgpr1
+; NOSDWA-NEXT:  .LBB21_1: ; %bb1
+; NOSDWA-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; NOSDWA-NEXT:    ;;#ASMSTART
 ; NOSDWA-NEXT:    v_and_b32_e32 v0, s0, v0
 ; NOSDWA-NEXT:    ;;#ASMEND
 ; NOSDWA-NEXT:    v_or_b32_e32 v0, 0x10000, v0
-; NOSDWA-NEXT:    s_and_b64 vcc, exec, -1
-; NOSDWA-NEXT:  .LBB21_1: ; %bb1
-; NOSDWA-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; NOSDWA-NEXT:    flat_store_dwordx2 v[0:1], v[0:1]
 ; NOSDWA-NEXT:    s_waitcnt vmcnt(0)
 ; NOSDWA-NEXT:    s_mov_b64 vcc, vcc
@@ -2032,13 +2033,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
 ; GFX89-LABEL: sdwa_crash_inlineasm_def:
 ; GFX89:       ; %bb.0: ; %bb
 ; GFX89-NEXT:    s_mov_b32 s0, 0xffff
+; GFX89-NEXT:    s_and_b64 vcc, exec, -1
+; GFX89-NEXT:    ; implicit-def: $vgpr0_vgpr1
+; GFX89-NEXT:  .LBB21_1: ; %bb1
+; GFX89-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX89-NEXT:    ;;#ASMSTART
 ; GFX89-NEXT:    v_and_b32_e32 v0, s0, v0
 ; GFX89-NEXT:    ;;#ASMEND
 ; GFX89-NEXT:    v_or_b32_e32 v0, 0x10000, v0
-; GFX89-NEXT:    s_and_b64 vcc, exec, -1
-; GFX89-NEXT:  .LBB21_1: ; %bb1
-; GFX89-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX89-NEXT:    flat_store_dwordx2 v[0:1], v[0:1]
 ; GFX89-NEXT:    s_waitcnt vmcnt(0)
 ; GFX89-NEXT:    s_mov_b64 vcc, vcc
@@ -2049,13 +2051,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
 ; GFX9-LABEL: sdwa_crash_inlineasm_def:
 ; GFX9:       ; %bb.0: ; %bb
 ; GFX9-NEXT:    s_mov_b32 s0, 0xffff
+; GFX9-NEXT:    s_and_b64 vcc, exec, -1
+; GFX9-NEXT:    ; implicit-def: $vgpr0_vgpr1
+; GFX9-NEXT:  .LBB21_1: ; %bb1
+; GFX9-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-NEXT:    ;;#ASMSTART
 ; GFX9-NEXT:    v_and_b32_e32 v0, s0, v0
 ; GFX9-NEXT:    ;;#ASMEND
 ; GFX9-NEXT:    v_or_b32_e32 v0, 0x10000, v0
-; GFX9-NEXT:    s_and_b64 vcc, exec, -1
-; GFX9-NEXT:  .LBB21_1: ; %bb1
-; GFX9-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-NEXT:    global_store_dwordx2 v[0:1], v[0:1], off
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-NEXT:    s_mov_b64 vcc, vcc
@@ -2066,13 +2069,14 @@ define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
 ; GFX10-LABEL: sdwa_crash_inlineasm_def:
 ; GFX10:       ; %bb.0: ; %bb
 ; GFX10-NEXT:    s_mov_b32 s0, 0xffff
+; GFX10-NEXT:    s_mov_b32 vcc_lo, exec_lo
+; GFX10-NEXT:    ; implicit-def: $vgpr0_vgpr1
+; GFX10-NEXT:  .LBB21_1: ; %bb1
+; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX10-NEXT:    ;;#ASMSTART
 ; GFX10-NEXT:    v_and_b32_e32 v0, s0, v0
 ; GFX10-NEXT:    ;;#ASMEND
 ; GFX10-NEXT:    v_or_b32_e32 v0, 0x10000, v0
-; GFX10-NEXT:    s_mov_b32 vcc_lo, exec_lo
-; GFX10-NEXT:  .LBB21_1: ; %bb1
-; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX10-NEXT:    global_store_dwordx2 v[0:1], v[0:1], off
 ; GFX10-NEXT:    s_waitcnt_vscnt null, 0x0
 ; GFX10-NEXT:    s_cbranch_vccnz .LBB21_1
diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll
index 5b2d0a8a7c059..6bcffa29678eb 100644
--- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll
+++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll
@@ -1601,11 +1601,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
 ; ARM-ENABLE-NEXT:  @ %bb.1: @ %if.then
 ; ARM-ENABLE-NEXT:    sub r1, sp, #16
 ; ARM-ENABLE-NEXT:    mov sp, r1
+; ARM-ENABLE-NEXT:  LBB10_2: @ %for.body
+; ARM-ENABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; ARM-ENABLE-NEXT:    @ InlineAsm Start
 ; ARM-ENABLE-NEXT:    mov r2, #0
 ; ARM-ENABLE-NEXT:    @ InlineAsm End
-; ARM-ENABLE-NEXT:  LBB10_2: @ %for.body
-; ARM-ENABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; ARM-ENABLE-NEXT:    add r0, r2, r0
 ; ARM-ENABLE-NEXT:    str r0, [r1]
 ; ARM-ENABLE-NEXT:    @ InlineAsm Start
@@ -1629,11 +1629,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
 ; ARM-DISABLE-NEXT:  @ %bb.1: @ %if.then
 ; ARM-DISABLE-NEXT:    sub r1, sp, #16
 ; ARM-DISABLE-NEXT:    mov sp, r1
+; ARM-DISABLE-NEXT:  LBB10_2: @ %for.body
+; ARM-DISABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; ARM-DISABLE-NEXT:    @ InlineAsm Start
 ; ARM-DISABLE-NEXT:    mov r2, #0
 ; ARM-DISABLE-NEXT:    @ InlineAsm End
-; ARM-DISABLE-NEXT:  LBB10_2: @ %for.body
-; ARM-DISABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; ARM-DISABLE-NEXT:    add r0, r2, r0
 ; ARM-DISABLE-NEXT:    str r0, [r1]
 ; ARM-DISABLE-NEXT:    @ InlineAsm Start
@@ -1657,11 +1657,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
 ; THUMB-ENABLE-NEXT:    sub.w r0, sp, #16
 ; THUMB-ENABLE-NEXT:    mov sp, r0
 ; THUMB-ENABLE-NEXT:    movs r1, #0
+; THUMB-ENABLE-NEXT:  LBB10_2: @ %for.body
+; THUMB-ENABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; THUMB-ENABLE-NEXT:    @ InlineAsm Start
 ; THUMB-ENABLE-NEXT:    mov.w r2, #0
 ; THUMB-ENABLE-NEXT:    @ InlineAsm End
-; THUMB-ENABLE-NEXT:  LBB10_2: @ %for.body
-; THUMB-ENABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; THUMB-ENABLE-NEXT:    add r1, r2
 ; THUMB-ENABLE-NEXT:    str r1, [r0]
 ; THUMB-ENABLE-NEXT:    @ InlineAsm Start
@@ -1686,11 +1686,11 @@ define void @infiniteloop2() "frame-pointer"="all" {
 ; THUMB-DISABLE-NEXT:    sub.w r0, sp, #16
 ; THUMB-DISABLE-NEXT:    mov sp, r0
 ; THUMB-DISABLE-NEXT:    movs r1, #0
+; THUMB-DISABLE-NEXT:  LBB10_2: @ %for.body
+; THUMB-DISABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; THUMB-DISABLE-NEXT:    @ InlineAsm Start
 ; THUMB-DISABLE-NEXT:    mov.w r2, #0
 ; THUMB-DISABLE-NEXT:    @ InlineAsm End
-; THUMB-DISABLE-NEXT:  LBB10_2: @ %for.body
-; THUMB-DISABLE-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; THUMB-DISABLE-NEXT:    add r1, r2
 ; THUMB-DISABLE-NEXT:    str r1, [r0]
 ; THUMB-DISABLE-NEXT:    @ InlineAsm Start
diff --git a/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll b/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
index 928f29b7b1889..f007b316b92b2 100644
--- a/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
+++ b/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
@@ -821,12 +821,12 @@ define void @infiniteloop() {
 ; ENABLE-NEXT:    movq %rsp, %rcx
 ; ENABLE-NEXT:    addq $-16, %rcx
 ; ENABLE-NEXT:    movq %rcx, %rsp
-; ENABLE-NEXT:    ## InlineAsm Start
-; ENABLE-NEXT:    movl $1, %edx
-; ENABLE-NEXT:    ## InlineAsm End
 ; ENABLE-NEXT:    .p2align 4
 ; ENABLE-NEXT:  LBB10_2: ## %for.body
 ; ENABLE-NEXT:    ## =>This Inner Loop Header: Depth=1
+; ENABLE-NEXT:    ## InlineAsm Start
+; ENABLE-NEXT:    movl $1, %edx
+; ENABLE-NEXT:    ## InlineAsm End
 ; ENABLE-NEXT:    addl %edx, %eax
 ; ENABLE-NEXT:    movl %eax, (%rcx)
 ; ENABLE-NEXT:    jmp LBB10_2
@@ -853,12 +853,12 @@ define void @infiniteloop() {
 ; DISABLE-NEXT:    movq %rsp, %rcx
 ; DISABLE-NEXT:    addq $-16, %rcx
 ; DISABLE-NEXT:    movq %rcx, %rsp
-; DISABLE-NEXT:    ## InlineAsm Start
-; DISABLE-NEXT:    movl $1, %edx
-; DISABLE-NEXT:    ## InlineAsm End
 ; DISABLE-NEXT:    .p2align 4
 ; DISABLE-NEXT:  LBB10_2: ## %for.body
 ; DISABLE-NEXT:    ## =>This Inner Loop Header: Depth=1
+; DISABLE-NEXT:    ## InlineAsm Start
+; DISABLE-NEXT:    movl $1, %edx
+; DISABLE-NEXT:    ## InlineAsm End
 ; DISABLE-NEXT:    addl %edx, %eax
 ; DISABLE-NEXT:    movl %eax, (%rcx)
 ; DISABLE-NEXT:    jmp LBB10_2
diff --git a/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll b/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll
index 19637c8304dd8..8eccfe6664166 100644
--- a/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll
+++ b/llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll
@@ -24,12 +24,12 @@ define i32 @loopInfoSaveOutsideLoop(i32 %cond, i32 %N) #0 {
 ; ENABLE-NEXT:    #NO_APP
 ; ENABLE-NEXT:    xorl %eax, %eax
 ; ENABLE-NEXT:    movl $10, %ecx
-; ENABLE-NEXT:    #APP
-; ENABLE-NEXT:    movl $1, %edx
-; ENABLE-NEXT:    #NO_APP
 ; ENABLE-NEXT:    .p2align 4
 ; ENABLE-NEXT:  .LBB0_2: # %for.body
 ; ENABLE-NEXT:    # =>This Inner Loop Header: Depth=1
+; ENABLE-NEXT:    #APP
+; ENABLE-NEXT:    movl $1, %edx
+; ENABLE-NEXT:    #NO_APP
 ; ENABLE-NEXT:    addl %edx, %eax
 ; ENABLE-NEXT:    decl %ecx
 ; ENABLE-NEXT:    jne .LBB0_2
@@ -64,12 +64,12 @@ define i32 @loopInfoSaveOutsideLoop(i32 %cond, i32 %N) #0 {
 ; DISABLE-NEXT:    #NO_APP
 ; DISABLE-NEXT:    xorl %eax, %eax
 ; DISABLE-NEXT:    movl $10, %ecx
-; DISABLE-NEXT:    #APP
-; DISABLE-NEXT:    movl $1, %edx
-; DISABLE-NEXT:    #NO_APP
 ; DISABLE-NEXT:    .p2align 4
 ; DISABLE-NEXT:  .LBB0_2: # %for.body
 ; DISABLE-NEXT:    # =>This Inner Loop Header: Depth=1
+; DISABLE-NEXT:    #APP
+; DISABLE-NEXT:    movl $1, %edx
+; DISABLE-NEXT:    #NO_APP
 ; DISABLE-NEXT:    addl %edx, %eax
 ; DISABLE-NEXT:    decl %ecx
 ; DISABLE-NEXT:    jne .LBB0_2
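For context, a sketch of how a caller typically consults this hook (illustrative usage only, based on the signature in the MachineInstr.cpp hunk above; canSpeculateBlock is a made-up helper, and real passes such as early if-conversion skip the terminator and apply additional legality and cost checks):

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Illustrative caller-side sketch: a block can only be speculated if every
// instruction in it reports isSafeToMove(); with this patch an INLINEASM
// instruction now fails that check.
static bool canSpeculateBlock(MachineBasicBlock &MBB) {
  bool SawStore = false;
  for (MachineInstr &MI : MBB)
    if (!MI.isSafeToMove(SawStore))
      return false;
  return true;
}
```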

llvmbot (Member) commented Feb 11, 2025

@llvm/pr-subscribers-backend-x86

llvmbot (Member) commented Feb 11, 2025

@llvm/pr-subscribers-backend-aarch64

pinskia commented Feb 11, 2025

Note https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150 has better discussion about inline-asm trapping than the other one.

phoebewang (Contributor):

> Even if an inline asm doesn't have memory effects, we can't assume it's safe to speculate: it could trap, or cause undefined behavior.

Should we limit this to volatile inline asm only?

arsenm (Contributor) commented Feb 12, 2025

> Should we limit this to volatile inline asm only?

That should have already been the previous behavior

pinskia commented Feb 12, 2025

For GCC, the behavior of (non-volatile) inline-asm is still going to be considered as not trapping, though I proposed a patch to improve the costing of inline-asm for ifcvt so it will look like the same behavior as treating it as trapping (https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675552.html). I am also going to propose a patch to make it possible to move it outside of the loop, but only conditionally, the same way GCC treats possibly trapping (or undefined-behavior-based) code motion. I will also be writing up slightly better documentation there too (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150), especially when it comes to what it means to move outside of the loop and somewhat about costing (there is a section about size already).

phoebewang (Contributor) commented Feb 12, 2025

> For GCC, the behavior of (non-volatile) inline-asm is still going to be considered as not trapping, though I proposed a patch to improve the costing of inline-asm for ifcvt so it will look like the same behavior as treating it as trapping (https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675552.html). I am also going to propose a patch to make it possible to move it outside of the loop, but only conditionally, the same way GCC treats possibly trapping (or undefined-behavior-based) code motion. I will also be writing up slightly better documentation there too (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150), especially when it comes to what it means to move outside of the loop and somewhat about costing (there is a section about size already).

Given this, I think the precondition of this patch is not true. The LLVM IR level behavior cannot prove that the backend optimization is wrong; it might be that the LLVM IR is not optimal enough and could be improved. Besides, LangRef says:

> speculatable... This attribute is only valid on functions and declarations, not on individual call sites.

Hence it cannot be used for inline asm. And from the users' perspective, they don't currently have a way to tell the compiler that an inline asm is lightweight for optimization.

efriedma-quic (Collaborator, Author):

Making inline asm "speculatable" by default seems like it has weird effects in all sorts of scenarios:

  • You can't guard an instruction which is only valid on some CPUs with a runtime check. For example, if you use cpuid.h to check whether an instruction is legal, you can only use the instruction in a volatile asm.
  • You can't use an instruction which is only valid with certain operands, because speculation can change the operands to anything. For example, cpuid.h is broken because it doesn't mark the asm volatile.

It's hard enough to use inline asm correctly without adding more weird edge cases.

We're not even getting much benefit from speculating inline asm, currently; we only do it in a couple obscure optimizations in the backend. Middle-end LLVM IR analysis already treats inline asm as potentially trapping.
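As a concrete illustration of the first bullet above (a hypothetical sketch, not the actual cpuid.h code; it uses the standard __get_cpuid helper and bit_RDRND from cpuid.h):

```cpp
#include <cpuid.h>
#include <cstdint>

// Hypothetical sketch: the asm is only reachable when the runtime feature
// check passes, so speculating it above the check could execute an
// instruction the CPU does not support.
uint64_t random_if_supported() {
  unsigned eax, ebx, ecx, edx;
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & bit_RDRND)) {
    uint64_t v;
    // Plain (non-volatile) asm with no memory effects: before this patch,
    // backend passes could still speculate it past the feature check.
    asm("rdrand %0" : "=r"(v) : : "cc");
    return v;
  }
  return 0;
}
```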

@@ -1310,7 +1310,7 @@ bool MachineInstr::isSafeToMove(bool &SawStore) const {
 
   if (isPosition() || isDebugInstr() || isTerminator() ||
       mayRaiseFPException() || hasUnmodeledSideEffects() ||
-      isJumpTableDebugInfo())
+      isJumpTableDebugInfo() || isInlineAsm())
Contributor:

hasUnmodeledSideEffects already special cases inline asm, so move these checks together? Could even weaken the hasUnmodeledSideEffects part to the raw property check. The other contexts where asm is special cased have a comment about it (e.g. isDead)

efriedma-quic (Collaborator, Author):

Tried splitting the checks into two if statements, with comments to explain. Hopefully the split makes sense.
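One plausible shape of that split, for reference (a sketch only; the committed code and its comments may differ):

```cpp
// Inline asm can trap or invoke undefined behavior even when it has no
// modeled memory effects, so it is never safe to speculate.
if (isInlineAsm())
  return false;

if (isPosition() || isDebugInstr() || isTerminator() ||
    mayRaiseFPException() || hasUnmodeledSideEffects() ||
    isJumpTableDebugInfo())
  return false;
```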

phoebewang (Contributor):

> Making inline asm "speculatable" by default seems like it has weird effects in all sorts of scenarios:

I think that's how GCC documents it and what users expect: https://stackoverflow.com/a/14449998

> • You can't guard an instruction which is only valid on some CPUs with a runtime check. For example, if you use cpuid.h to check whether an instruction is legal, you can only use the instruction in a volatile asm.
> • You can't use an instruction which is only valid with certain operands, because speculation can change the operands to anything. For example, cpuid.h is broken because it doesn't mark the asm volatile.

I don't get the point of the example. If the use of the cpuid instruction may trap, why not add volatile? Without it, the example shown in the GCC documentation is already broken.

> It's hard enough to use inline asm correctly without adding more weird edge cases.

> We're not even getting much benefit from speculating inline asm, currently; we only do it in a couple obscure optimizations in the backend. Middle-end LLVM IR analysis already treats inline asm as potentially trapping.

I don't see why we can't go in the opposite direction, i.e., optimize in the middle-end.

Also wrote an explanation in the code for future reference.
efriedma-quic (Collaborator, Author):

Ping

It looks like with https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150, gcc now has the behavior proposed in this patch.

> I don't get the point of the example. If the use of the cpuid instruction may trap, why not add volatile? Without it, the example shown in the GCC documentation is already broken.

This is just making the point that nobody actually expects this, as shown by the fact that even compiler developers don't expect this. (Obviously we control the implementation of our cpuid.h.)

phoebewang (Contributor):

> It looks like with https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150, gcc now has the behavior proposed in this patch.

Alright, no objections from me then.

efriedma-quic merged commit 1b39328 into llvm:main on Feb 25, 2025.
8 checks passed