AMDGPU: Stop emitting an error on illegal addrspacecasts #127487


Merged
arsenm merged 1 commit into main from users/arsenm/amdgpu/no-error-on-invalid-addrspacecast on Feb 17, 2025

Conversation

arsenm (Contributor) commented Feb 17, 2025

These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.

For example:

  void foo(volatile generic int* x) {
    __builtin_assume(is_shared(x));
    *x = 4;
  }

  void bar() {
    private int y;
    foo(&y); // violation, wrong address space
  }

This could produce a compile-time backend error or not, depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw that required inserting runtime address
space checks. The invalid cases are dynamically dead, so we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d03708.
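
Below is a minimal LLVM IR sketch (not taken from the patch; the function and value names are invented for illustration) of the kind of cast the description is about. It is roughly the shape the foo/bar example above can take once foo is inlined and address spaces are inferred: a direct private (addrspace(5)) to local (addrspace(3)) addrspacecast has no valid lowering on AMDGPU, but it is only reachable on a dynamically dead path, so the backend should fold it to poison rather than emit a fatal diagnostic.

```llvm
; Hypothetical example: %cond is false at runtime, so the invalid
; private -> local cast in %use is never executed.
define void @dead_invalid_cast(ptr addrspace(5) %p, i1 %cond) {
entry:
  br i1 %cond, label %use, label %exit

use:                                              ; dynamically dead
  ; Invalid on AMDGPU: there is no direct private -> local translation.
  %cast = addrspacecast ptr addrspace(5) %p to ptr addrspace(3)
  store volatile i32 4, ptr addrspace(3) %cast
  br label %exit

exit:
  ret void
}
```

With this patch the backend lowers such a cast to an undefined value (TODO: poison) instead of diagnosing it, so the surrounding, dynamically dead code still compiles at any optimization level.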

arsenm (Contributor, Author) commented Feb 17, 2025

This stack of pull requests is managed by Graphite.

llvmbot (Member) commented Feb 17, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.

For example:

  void foo(volatile generic int* x) {
    __builtin_assume(is_shared(x));
    *x = 4;
  }

  void bar() {
    private int y;
    foo(&y); // violation, wrong address space
  }

This could produce a compile-time backend error or not, depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw that required inserting runtime address
space checks. The invalid cases are dynamically dead, so we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d03708.


Patch is 33.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127487.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp (+2-5)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+2-5)
  • (modified) llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll (+646)
  • (modified) llvm/test/CodeGen/AMDGPU/invalid-addrspacecast.ll (+37-7)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 908d323c7fec9..649deee346e90 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2426,11 +2426,8 @@ bool AMDGPULegalizerInfo::legalizeAddrSpaceCast(
     return true;
   }
 
-  DiagnosticInfoUnsupported InvalidAddrSpaceCast(
-      MF.getFunction(), "invalid addrspacecast", B.getDebugLoc());
-
-  LLVMContext &Ctx = MF.getFunction().getContext();
-  Ctx.diagnose(InvalidAddrSpaceCast);
+  // Invalid casts are poison.
+  // TODO: Should return poison
   B.buildUndef(Dst);
   MI.eraseFromParent();
   return true;
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 62ee196cf8e17..e09b310d107ac 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7341,11 +7341,8 @@ SDValue SITargetLowering::lowerADDRSPACECAST(SDValue Op,
 
   // global <-> flat are no-ops and never emitted.
 
-  const MachineFunction &MF = DAG.getMachineFunction();
-  DiagnosticInfoUnsupported InvalidAddrSpaceCast(
-      MF.getFunction(), "invalid addrspacecast", SL.getDebugLoc());
-  DAG.getContext()->diagnose(InvalidAddrSpaceCast);
-
+  // Invalid casts are poison.
+  // TODO: Should return poison
   return DAG.getUNDEF(Op->getValueType(0));
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
index 5f56568ef88e4..afcd9b5fcdc7e 100644
--- a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
+++ b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
@@ -444,6 +444,652 @@ define float @no_unsafe(ptr %addr, float %val) {
   ret float %res
 }
 
+@global = hidden addrspace(1) global i64 0, align 8
+
+; Make sure there is no error on an invalid addrspacecast without optimizations
+define i64 @optnone_atomicrmw_add_i64_expand(i64 %val) #1 {
+; GFX908-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX908:       ; %bb.0:
+; GFX908-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX908-NEXT:    s_mov_b64 s[4:5], src_private_base
+; GFX908-NEXT:    s_mov_b32 s6, 32
+; GFX908-NEXT:    s_lshr_b64 s[4:5], s[4:5], s6
+; GFX908-NEXT:    s_getpc_b64 s[6:7]
+; GFX908-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX908-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX908-NEXT:    s_cmp_eq_u32 s7, s4
+; GFX908-NEXT:    s_cselect_b64 s[4:5], -1, 0
+; GFX908-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT:    s_mov_b64 s[4:5], -1
+; GFX908-NEXT:    s_mov_b32 s6, 1
+; GFX908-NEXT:    v_cmp_ne_u32_e64 s[6:7], v2, s6
+; GFX908-NEXT:    s_and_b64 vcc, exec, s[6:7]
+; GFX908-NEXT:    ; implicit-def: $vgpr3_vgpr4
+; GFX908-NEXT:    s_cbranch_vccnz .LBB4_3
+; GFX908-NEXT:  .LBB4_1: ; %Flow
+; GFX908-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT:    s_mov_b32 s4, 1
+; GFX908-NEXT:    v_cmp_ne_u32_e64 s[4:5], v2, s4
+; GFX908-NEXT:    s_and_b64 vcc, exec, s[4:5]
+; GFX908-NEXT:    s_cbranch_vccnz .LBB4_4
+; GFX908-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX908-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX908-NEXT:    buffer_load_dword v3, v0, s[0:3], 0 offen
+; GFX908-NEXT:    s_waitcnt vmcnt(0)
+; GFX908-NEXT:    v_mov_b32_e32 v4, v3
+; GFX908-NEXT:    v_add_co_u32_e64 v0, s[4:5], v3, v0
+; GFX908-NEXT:    v_addc_co_u32_e64 v1, s[4:5], v4, v1, s[4:5]
+; GFX908-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX908-NEXT:    buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX908-NEXT:    s_branch .LBB4_4
+; GFX908-NEXT:  .LBB4_3: ; %atomicrmw.global
+; GFX908-NEXT:    s_getpc_b64 s[4:5]
+; GFX908-NEXT:    s_add_u32 s4, s4, global@rel32@lo+4
+; GFX908-NEXT:    s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX908-NEXT:    v_mov_b32_e32 v2, s4
+; GFX908-NEXT:    v_mov_b32_e32 v3, s5
+; GFX908-NEXT:    flat_atomic_add_x2 v[3:4], v[2:3], v[0:1] glc
+; GFX908-NEXT:    s_mov_b64 s[4:5], 0
+; GFX908-NEXT:    s_branch .LBB4_1
+; GFX908-NEXT:  .LBB4_4: ; %atomicrmw.phi
+; GFX908-NEXT:  ; %bb.5: ; %atomicrmw.end
+; GFX908-NEXT:    s_mov_b32 s4, 32
+; GFX908-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX908-NEXT:    v_lshrrev_b64 v[1:2], s4, v[3:4]
+; GFX908-NEXT:    v_mov_b32_e32 v0, v3
+; GFX908-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX90A-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX90A:       ; %bb.0:
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    s_mov_b64 s[4:5], src_private_base
+; GFX90A-NEXT:    s_mov_b32 s6, 32
+; GFX90A-NEXT:    s_lshr_b64 s[4:5], s[4:5], s6
+; GFX90A-NEXT:    s_getpc_b64 s[6:7]
+; GFX90A-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX90A-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX90A-NEXT:    s_cmp_eq_u32 s7, s4
+; GFX90A-NEXT:    s_cselect_b64 s[4:5], -1, 0
+; GFX90A-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX90A-NEXT:    s_mov_b64 s[4:5], -1
+; GFX90A-NEXT:    s_mov_b32 s6, 1
+; GFX90A-NEXT:    v_cmp_ne_u32_e64 s[6:7], v2, s6
+; GFX90A-NEXT:    s_and_b64 vcc, exec, s[6:7]
+; GFX90A-NEXT:    ; implicit-def: $vgpr2_vgpr3
+; GFX90A-NEXT:    s_cbranch_vccnz .LBB4_3
+; GFX90A-NEXT:  .LBB4_1: ; %Flow
+; GFX90A-NEXT:    v_cndmask_b32_e64 v4, 0, 1, s[4:5]
+; GFX90A-NEXT:    s_mov_b32 s4, 1
+; GFX90A-NEXT:    v_cmp_ne_u32_e64 s[4:5], v4, s4
+; GFX90A-NEXT:    s_and_b64 vcc, exec, s[4:5]
+; GFX90A-NEXT:    s_cbranch_vccnz .LBB4_4
+; GFX90A-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX90A-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX90A-NEXT:    buffer_load_dword v2, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    s_waitcnt vmcnt(0)
+; GFX90A-NEXT:    v_mov_b32_e32 v3, v2
+; GFX90A-NEXT:    v_add_co_u32_e64 v0, s[4:5], v2, v0
+; GFX90A-NEXT:    v_addc_co_u32_e64 v1, s[4:5], v3, v1, s[4:5]
+; GFX90A-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    s_branch .LBB4_4
+; GFX90A-NEXT:  .LBB4_3: ; %atomicrmw.global
+; GFX90A-NEXT:    s_getpc_b64 s[4:5]
+; GFX90A-NEXT:    s_add_u32 s4, s4, global@rel32@lo+4
+; GFX90A-NEXT:    s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX90A-NEXT:    v_pk_mov_b32 v[2:3], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NEXT:    flat_atomic_add_x2 v[2:3], v[2:3], v[0:1] glc
+; GFX90A-NEXT:    s_mov_b64 s[4:5], 0
+; GFX90A-NEXT:    s_branch .LBB4_1
+; GFX90A-NEXT:  .LBB4_4: ; %atomicrmw.phi
+; GFX90A-NEXT:  ; %bb.5: ; %atomicrmw.end
+; GFX90A-NEXT:    s_mov_b32 s4, 32
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    v_lshrrev_b64 v[4:5], s4, v[2:3]
+; GFX90A-NEXT:    v_mov_b32_e32 v0, v2
+; GFX90A-NEXT:    v_mov_b32_e32 v1, v4
+; GFX90A-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX942-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX942:       ; %bb.0:
+; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    s_mov_b64 s[0:1], src_private_base
+; GFX942-NEXT:    s_mov_b32 s2, 32
+; GFX942-NEXT:    s_lshr_b64 s[0:1], s[0:1], s2
+; GFX942-NEXT:    s_getpc_b64 s[2:3]
+; GFX942-NEXT:    s_add_u32 s2, s2, global@rel32@lo+4
+; GFX942-NEXT:    s_addc_u32 s3, s3, global@rel32@hi+12
+; GFX942-NEXT:    s_cmp_eq_u32 s3, s0
+; GFX942-NEXT:    s_cselect_b64 s[0:1], -1, 0
+; GFX942-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[0:1]
+; GFX942-NEXT:    s_mov_b64 s[0:1], -1
+; GFX942-NEXT:    s_mov_b32 s2, 1
+; GFX942-NEXT:    v_cmp_ne_u32_e64 s[2:3], v2, s2
+; GFX942-NEXT:    s_and_b64 vcc, exec, s[2:3]
+; GFX942-NEXT:    ; implicit-def: $vgpr2_vgpr3
+; GFX942-NEXT:    s_cbranch_vccnz .LBB4_3
+; GFX942-NEXT:  .LBB4_1: ; %Flow
+; GFX942-NEXT:    v_cndmask_b32_e64 v4, 0, 1, s[0:1]
+; GFX942-NEXT:    s_mov_b32 s0, 1
+; GFX942-NEXT:    v_cmp_ne_u32_e64 s[0:1], v4, s0
+; GFX942-NEXT:    s_and_b64 vcc, exec, s[0:1]
+; GFX942-NEXT:    s_cbranch_vccnz .LBB4_4
+; GFX942-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX942-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX942-NEXT:    s_nop 1
+; GFX942-NEXT:    scratch_load_dwordx2 v[2:3], off, s0
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
+; GFX942-NEXT:    v_lshl_add_u64 v[0:1], v[2:3], 0, v[0:1]
+; GFX942-NEXT:    scratch_store_dwordx2 off, v[0:1], s0
+; GFX942-NEXT:    s_branch .LBB4_4
+; GFX942-NEXT:  .LBB4_3: ; %atomicrmw.global
+; GFX942-NEXT:    s_getpc_b64 s[0:1]
+; GFX942-NEXT:    s_add_u32 s0, s0, global@rel32@lo+4
+; GFX942-NEXT:    s_addc_u32 s1, s1, global@rel32@hi+12
+; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[0:1]
+; GFX942-NEXT:    flat_atomic_add_x2 v[2:3], v[2:3], v[0:1] sc0
+; GFX942-NEXT:    s_mov_b64 s[0:1], 0
+; GFX942-NEXT:    s_branch .LBB4_1
+; GFX942-NEXT:  .LBB4_4: ; %atomicrmw.phi
+; GFX942-NEXT:  ; %bb.5: ; %atomicrmw.end
+; GFX942-NEXT:    s_mov_b32 s0, 32
+; GFX942-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    v_lshrrev_b64 v[4:5], s0, v[2:3]
+; GFX942-NEXT:    v_mov_b32_e32 v0, v2
+; GFX942-NEXT:    v_mov_b32_e32 v1, v4
+; GFX942-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1100-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX1100:       ; %bb.0:
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    s_mov_b64 s[0:1], src_private_base
+; GFX1100-NEXT:    s_mov_b32 s2, 32
+; GFX1100-NEXT:    s_lshr_b64 s[0:1], s[0:1], s2
+; GFX1100-NEXT:    s_getpc_b64 s[2:3]
+; GFX1100-NEXT:    s_add_u32 s2, s2, global@rel32@lo+4
+; GFX1100-NEXT:    s_addc_u32 s3, s3, global@rel32@hi+12
+; GFX1100-NEXT:    s_cmp_eq_u32 s3, s0
+; GFX1100-NEXT:    s_cselect_b32 s0, -1, 0
+; GFX1100-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1100-NEXT:    s_mov_b32 s0, -1
+; GFX1100-NEXT:    s_mov_b32 s1, 1
+; GFX1100-NEXT:    v_cmp_ne_u32_e64 s1, v2, s1
+; GFX1100-NEXT:    s_and_b32 vcc_lo, exec_lo, s1
+; GFX1100-NEXT:    ; implicit-def: $vgpr3_vgpr4
+; GFX1100-NEXT:    s_cbranch_vccnz .LBB4_3
+; GFX1100-NEXT:  .LBB4_1: ; %Flow
+; GFX1100-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1100-NEXT:    s_mov_b32 s0, 1
+; GFX1100-NEXT:    v_cmp_ne_u32_e64 s0, v2, s0
+; GFX1100-NEXT:    s_and_b32 vcc_lo, exec_lo, s0
+; GFX1100-NEXT:    s_cbranch_vccnz .LBB4_4
+; GFX1100-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX1100-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1100-NEXT:    scratch_load_b64 v[3:4], off, s0
+; GFX1100-NEXT:    s_waitcnt vmcnt(0)
+; GFX1100-NEXT:    v_add_co_u32 v0, s0, v3, v0
+; GFX1100-NEXT:    v_add_co_ci_u32_e64 v1, s0, v4, v1, s0
+; GFX1100-NEXT:    scratch_store_b64 off, v[0:1], s0
+; GFX1100-NEXT:    s_branch .LBB4_4
+; GFX1100-NEXT:  .LBB4_3: ; %atomicrmw.global
+; GFX1100-NEXT:    s_getpc_b64 s[0:1]
+; GFX1100-NEXT:    s_add_u32 s0, s0, global@rel32@lo+4
+; GFX1100-NEXT:    s_addc_u32 s1, s1, global@rel32@hi+12
+; GFX1100-NEXT:    v_mov_b32_e32 v3, s1
+; GFX1100-NEXT:    v_mov_b32_e32 v2, s0
+; GFX1100-NEXT:    flat_atomic_add_u64 v[3:4], v[2:3], v[0:1] glc
+; GFX1100-NEXT:    s_mov_b32 s0, 0
+; GFX1100-NEXT:    s_branch .LBB4_1
+; GFX1100-NEXT:  .LBB4_4: ; %atomicrmw.phi
+; GFX1100-NEXT:  ; %bb.5: ; %atomicrmw.end
+; GFX1100-NEXT:    s_mov_b32 s0, 32
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    v_lshrrev_b64 v[1:2], s0, v[3:4]
+; GFX1100-NEXT:    v_mov_b32_e32 v0, v3
+; GFX1100-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1200-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX1200:       ; %bb.0:
+; GFX1200-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX1200-NEXT:    s_wait_expcnt 0x0
+; GFX1200-NEXT:    s_wait_samplecnt 0x0
+; GFX1200-NEXT:    s_wait_bvhcnt 0x0
+; GFX1200-NEXT:    s_wait_kmcnt 0x0
+; GFX1200-NEXT:    s_mov_b64 s[0:1], src_private_base
+; GFX1200-NEXT:    s_mov_b32 s2, 32
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_lshr_b64 s[0:1], s[0:1], s2
+; GFX1200-NEXT:    s_getpc_b64 s[2:3]
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_sext_i32_i16 s3, s3
+; GFX1200-NEXT:    s_add_co_u32 s2, s2, global@rel32@lo+12
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_add_co_ci_u32 s3, s3, global@rel32@hi+24
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_cmp_eq_u32 s3, s0
+; GFX1200-NEXT:    s_cselect_b32 s0, -1, 0
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1200-NEXT:    s_mov_b32 s0, -1
+; GFX1200-NEXT:    s_mov_b32 s1, 1
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    v_cmp_ne_u32_e64 s1, v2, s1
+; GFX1200-NEXT:    s_and_b32 vcc_lo, exec_lo, s1
+; GFX1200-NEXT:    ; implicit-def: $vgpr3_vgpr4
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_cbranch_vccnz .LBB4_3
+; GFX1200-NEXT:  .LBB4_1: ; %Flow
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1200-NEXT:    s_mov_b32 s0, 1
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    v_cmp_ne_u32_e64 s0, v2, s0
+; GFX1200-NEXT:    s_and_b32 vcc_lo, exec_lo, s0
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_cbranch_vccnz .LBB4_4
+; GFX1200-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX1200-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX1200-NEXT:    scratch_load_b64 v[3:4], off, s0
+; GFX1200-NEXT:    s_wait_loadcnt 0x0
+; GFX1200-NEXT:    v_add_co_u32 v0, s0, v3, v0
+; GFX1200-NEXT:    s_wait_alu 0xf1ff
+; GFX1200-NEXT:    v_add_co_ci_u32_e64 v1, s0, v4, v1, s0
+; GFX1200-NEXT:    scratch_store_b64 off, v[0:1], s0
+; GFX1200-NEXT:    s_branch .LBB4_4
+; GFX1200-NEXT:  .LBB4_3: ; %atomicrmw.global
+; GFX1200-NEXT:    s_getpc_b64 s[0:1]
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_sext_i32_i16 s1, s1
+; GFX1200-NEXT:    s_add_co_u32 s0, s0, global@rel32@lo+12
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_add_co_ci_u32 s1, s1, global@rel32@hi+24
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    v_mov_b32_e32 v3, s1
+; GFX1200-NEXT:    v_mov_b32_e32 v2, s0
+; GFX1200-NEXT:    flat_atomic_add_u64 v[3:4], v[2:3], v[0:1] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1200-NEXT:    s_mov_b32 s0, 0
+; GFX1200-NEXT:    s_branch .LBB4_1
+; GFX1200-NEXT:  .LBB4_4: ; %atomicrmw.phi
+; GFX1200-NEXT:  ; %bb.5: ; %atomicrmw.end
+; GFX1200-NEXT:    s_mov_b32 s0, 32
+; GFX1200-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX1200-NEXT:    s_wait_alu 0xf1fe
+; GFX1200-NEXT:    v_lshrrev_b64 v[1:2], s0, v[3:4]
+; GFX1200-NEXT:    v_mov_b32_e32 v0, v3
+; GFX1200-NEXT:    s_setpc_b64 s[30:31]
+  %rmw = atomicrmw add ptr addrspacecast (ptr addrspace(1) @global to ptr), i64 %val syncscope("agent") monotonic, align 8
+  ret i64 %rmw
+}
+
+; Make sure there is no error on an invalid addrspacecast without optimizations
+define double @optnone_atomicrmw_fadd_f64_expand(double %val) #1 {
+; GFX908-LABEL: optnone_atomicrmw_fadd_f64_expand:
+; GFX908:       ; %bb.0:
+; GFX908-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX908-NEXT:    s_mov_b64 s[4:5], src_private_base
+; GFX908-NEXT:    s_mov_b32 s6, 32
+; GFX908-NEXT:    s_lshr_b64 s[4:5], s[4:5], s6
+; GFX908-NEXT:    s_getpc_b64 s[6:7]
+; GFX908-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX908-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX908-NEXT:    s_cmp_eq_u32 s7, s4
+; GFX908-NEXT:    s_cselect_b64 s[4:5], -1, 0
+; GFX908-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT:    s_mov_b64 s[4:5], -1
+; GFX908-NEXT:    s_mov_b32 s6, 1
+; GFX908-NEXT:    v_readfirstlane_b32 s7, v2
+; GFX908-NEXT:    s_cmp_lg_u32 s7, s6
+; GFX908-NEXT:    s_cselect_b64 s[6:7], -1, 0
+; GFX908-NEXT:    s_and_b64 vcc, exec, s[6:7]
+; GFX908-NEXT:    ; implicit-def: $vgpr3_vgpr4
+; GFX908-NEXT:    s_cbranch_vccnz .LBB5_2
+; GFX908-NEXT:    s_branch .LBB5_3
+; GFX908-NEXT:  .LBB5_1: ; %atomicrmw.private
+; GFX908-NEXT:    buffer_load_dword v3, v0, s[0:3], 0 offen
+; GFX908-NEXT:    s_waitcnt vmcnt(0)
+; GFX908-NEXT:    v_mov_b32_e32 v4, v3
+; GFX908-NEXT:    v_add_f64 v[0:1], v[3:4], v[0:1]
+; GFX908-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX908-NEXT:    buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX908-NEXT:    s_branch .LBB5_6
+; GFX908-NEXT:  .LBB5_2: ; %atomicrmw.global
+; GFX908-NEXT:    s_getpc_b64 s[4:5]
+; GFX908-NEXT:    s_add_u32 s4, s4, global@rel32@lo+4
+; GFX908-NEXT:    s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX908-NEXT:    v_mov_b32_e32 v2, s4
+; GFX908-NEXT:    v_mov_b32_e32 v3, s5
+; GFX908-NEXT:    flat_load_dwordx2 v[3:4], v[2:3]
+; GFX908-NEXT:    s_mov_b64 s[4:5], 0
+; GFX908-NEXT:    s_branch .LBB5_4
+; GFX908-NEXT:  .LBB5_3: ; %Flow
+; GFX908-NEXT:    s_and_b64 vcc, exec, s[4:5]
+; GFX908-NEXT:    s_cbranch_vccnz .LBB5_1
+; GFX908-NEXT:    s_branch .LBB5_6
+; GFX908-NEXT:  .LBB5_4: ; %atomicrmw.start
+; GFX908-NEXT:    ; =>This Inner Loop Header: Depth=1
+; GFX908-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX908-NEXT:    v_mov_b32_e32 v6, v4
+; GFX908-NEXT:    v_mov_b32_e32 v5, v3
+; GFX908-NEXT:    v_add_f64 v[3:4], v[5:6], v[0:1]
+; GFX908-NEXT:    s_getpc_b64 s[6:7]
+; GFX908-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX908-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX908-NEXT:    v_mov_b32_e32 v8, s7
+; GFX908-NEXT:    v_mov_b32_e32 v7, s6
+; GFX908-NEXT:    flat_atomic_cmpswap_x2 v[3:4], v[7:8], v[3:6] glc
+; GFX908-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX908-NEXT:    v_cmp_eq_u64_e64 s[6:7], v[3:4], v[5:6]
+; GFX908-NEXT:    s_or_b64 s[4:5], s[6:7], s[4:5]
+; GFX908-NEXT:    s_andn2_b64 exec, exec, s[4:5]
+; GFX908-NEXT:    s_cbranch_execnz .LBB5_4
+; GFX908-NEXT:  ; %bb.5: ; %atomicrmw.end1
+; GFX908-NEXT:    s_or_b64 exec, exec, s[4:5]
+; GFX908-NEXT:    s_mov_b64 s[4:5], 0
+; GFX908-NEXT:    s_branch .LBB5_3
+; GFX908-NEXT:  .LBB5_6: ; %atomicrmw.phi
+; GFX908-NEXT:  ; %bb.7: ; %atomicrmw.end
+; GFX908-NEXT:    s_mov_b32 s4, 32
+; GFX908-NEXT:    v_lshrrev_b64 v[1:2], s4, v[3:4]
+; GFX908-NEXT:    v_mov_b32_e32 v0, v3
+; GFX908-NEXT:    s_waitcnt vmcnt(0)
+; GFX908-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX90A-LABEL: optnone_atomicrmw_fadd_f64_expand:
+; GFX90A:       ; %bb.0:
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    s_mov_b64 s[4:5], src_private_base
+; GFX90A-NEXT:    s_mov_b32 s6, 32
+; GFX90A-NEXT:    s_lshr_b64 s[4:5], s[4:5], s6
+; GFX90A-NEXT:    s_getpc_b64 s[6:7]
+; GFX90A-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX90A-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX90A-NEXT:    s_cmp_eq_u32 s7, s4
+; GFX90A-NEXT:    s_cselect_b64 s[4:5], -1, 0
+; GFX90A-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX90A-NEXT:    s_mov_b64 s[4:5], -1
+; GFX90A-NEXT:    s_mov_b32 s6, 1
+; GFX90A-NEXT:    v_readfirstlane_b32 s7, v2
+; GFX90A-NEXT:    s_cmp_lg_u32 s7, s6
+; GFX90A-NEXT:    s_cselect_b64 s[6:7], -1, 0
+; GFX90A-NEXT:    s_and_b64 vcc, exec, s[6:7]
+; GFX90A-NEXT:    ; implicit-def: $vgpr2_vgpr3
+; GFX90A-NEXT:    s_cbranch_vccnz .LBB5_2
+; GFX90A-NEXT:    s_branch .LBB5_3
+; GFX90A-NEXT:  .LBB5_1: ; %atomicrmw.private
+; GFX90A-NEXT:    buffer_load_dword v2, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    s_waitcnt vmcnt(0)
+; GFX90A-NEXT:    v_mov_b32_e32 v3, v2
+; GFX90A-NEXT:    v_add_f64 v[0:1], v[2:3], v[0:1]
+; GFX90A-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    s_branch .LBB5_6
+; GFX90A-NEXT:  .LBB5_2: ; %atomicrmw.global
+; GFX90A-NEXT:    s_getpc_b64 s[4:5]
+; GFX90A-NEXT:    s_add_u32 s4, s4, global@rel32@lo+4
+; GFX90A-NEXT:    s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX90A-NEXT:    v_pk_mov_b32 v[2:3], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NEXT:    flat_load_dwordx2 v[2:3], v[2:3]
+; GFX90A-NEXT:    s_mov_b64 s[4:5], 0
+; GFX90A-NEXT:    s_branch .LBB5_4
+; GFX90A-NEXT:  .LBB5_3: ; %Flow
+; GFX90A-NEXT:    s_and_b64 vcc, exec, s[4:5]
+; GFX90A-NEXT:    s_cbranch_vccnz .LBB5_1
+; GFX90A-NEXT:    s_branch .LBB5_6
+; GFX90A-NEXT:  .LBB5_4: ; %atomicrmw.start
+; GFX90A-NEXT:    ; =>This Inner Loop Header: Depth=1
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1]
+; GFX90A-NEXT:    v_add_f64 v[2:3], v[4:5], v[0:1]
+; GFX90A-NEXT:    s_getpc_b64 s[6:7]
+; GFX90A-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX90A-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX90A-NEXT:    v_pk_mov_b32 v[6:7], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-NEXT:    flat_atomic_cmpswap_x2 v[2:3], v[6:7], v[2:5] glc
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    v_cmp_eq_u64_e64 s[6:7], v[2:3], v[4:5]
+; GFX90A-NEXT:    s_or_b64 s[4:5], s[6:7], s[4:5]
+; GFX90A-NEXT:    s_andn2_b64 exec, exec, s[4:5]
+; GFX90A-NEXT:    s_cbranch_execnz .LBB5_4
+; GFX90A-NEXT:  ; %bb.5: ; %atomicrmw.end1
+; GFX90A-NEXT:    s_or_b64 exec, exec, s[4:5]
+; GFX90A-NEXT:    s_mov_b64 s[4:5], 0
+; GFX90A-NEXT:    s_branch .LBB5_3
+; GFX90A-NEXT:  .LBB5_6: ; %atomicrmw.phi
+;...
[truncated]

arsenm marked this pull request as ready for review February 17, 2025 13:03
arsenm added this to the LLVM 20.X Release milestone Feb 17, 2025
arsenm merged commit 18ea6c9 into main Feb 17, 2025
10 of 11 checks passed
arsenm deleted the users/arsenm/amdgpu/no-error-on-invalid-addrspacecast branch February 17, 2025 14:03
arsenm (Contributor, Author) commented Feb 17, 2025

/cherry-pick 18ea6c9

llvmbot (Member) commented Feb 17, 2025

/pull-request #127496

arsenm added a commit to arsenm/llvm-project that referenced this pull request Feb 19, 2025
These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.

For example:

```
  void foo(volatile generic int* x) {
    __builtin_assume(is_shared(x));
    *x = 4;
  }

  void bar() {
    private int y;
    foo(&y); // violation, wrong address space
  }
```

This could produce a compile time backend error or not depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw which required inserting runtime address
space checks. The invalid cases are dynamically dead, we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d03708.

(cherry picked from commit 18ea6c9)
tstellar pushed a commit that referenced this pull request Feb 21, 2025
…127751)

These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.

For example:

```
  void foo(volatile generic int* x) {
    __builtin_assume(is_shared(x));
    *x = 4;
  }

  void bar() {
    private int y;
    foo(&y); // violation, wrong address space
  }
```

This could produce a compile time backend error or not depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw which required inserting runtime address
space checks. The invalid cases are dynamically dead, we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d03708.

(cherry picked from commit 18ea6c9)