AMDGPU: Stop emitting an error on illegal addrspacecasts #127487


Merged
arsenm merged 1 commit into main from users/arsenm/amdgpu/no-error-on-invalid-addrspacecast on Feb 17, 2025

Conversation

arsenm (Contributor) commented Feb 17, 2025

These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.

For example:

  void foo(volatile generic int* x) {
    __builtin_assume(is_shared(x));
    *x = 4;
  }

  void bar() {
    private int y;
    foo(&y); // violation, wrong address space
  }

This could produce a compile-time backend error or not, depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw that required inserting runtime address
space checks. The invalid cases are dynamically dead, so we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d03708.
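
Below is a minimal LLVM IR sketch (not taken from the patch; the function and value names are invented for illustration) of the kind of cast the description is about. It is roughly the shape the foo/bar example above can take once foo is inlined and address spaces are inferred: a direct private (addrspace(5)) to local (addrspace(3)) addrspacecast has no valid lowering on AMDGPU, but it is only reachable on a dynamically dead path, so the backend should fold it to poison rather than emit a fatal diagnostic.

```llvm
; Hypothetical example: %cond is false at runtime, so the invalid
; private -> local cast in %use is never executed.
define void @dead_invalid_cast(ptr addrspace(5) %p, i1 %cond) {
entry:
  br i1 %cond, label %use, label %exit

use:                                              ; dynamically dead
  ; Invalid on AMDGPU: there is no direct private -> local translation.
  %cast = addrspacecast ptr addrspace(5) %p to ptr addrspace(3)
  store volatile i32 4, ptr addrspace(3) %cast
  br label %exit

exit:
  ret void
}
```

With this patch the backend lowers such a cast to an undefined value (TODO: poison) instead of diagnosing it, so the surrounding, dynamically dead code still compiles at any optimization level.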

arsenm (Contributor, Author) commented Feb 17, 2025

This stack of pull requests is managed by Graphite.

llvmbot (Member) commented Feb 17, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.

For example:

  void foo(volatile generic int* x) {
    __builtin_assume(is_shared(x));
    *x = 4;
  }

  void bar() {
    private int y;
    foo(&y); // violation, wrong address space
  }

This could produce a compile-time backend error or not, depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw that required inserting runtime address
space checks. The invalid cases are dynamically dead, so we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d03708.


Patch is 33.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127487.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp (+2-5)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+2-5)
  • (modified) llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll (+646)
  • (modified) llvm/test/CodeGen/AMDGPU/invalid-addrspacecast.ll (+37-7)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 908d323c7fec9..649deee346e90 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2426,11 +2426,8 @@ bool AMDGPULegalizerInfo::legalizeAddrSpaceCast(
     return true;
   }
 
-  DiagnosticInfoUnsupported InvalidAddrSpaceCast(
-      MF.getFunction(), "invalid addrspacecast", B.getDebugLoc());
-
-  LLVMContext &Ctx = MF.getFunction().getContext();
-  Ctx.diagnose(InvalidAddrSpaceCast);
+  // Invalid casts are poison.
+  // TODO: Should return poison
   B.buildUndef(Dst);
   MI.eraseFromParent();
   return true;
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 62ee196cf8e17..e09b310d107ac 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7341,11 +7341,8 @@ SDValue SITargetLowering::lowerADDRSPACECAST(SDValue Op,
 
   // global <-> flat are no-ops and never emitted.
 
-  const MachineFunction &MF = DAG.getMachineFunction();
-  DiagnosticInfoUnsupported InvalidAddrSpaceCast(
-      MF.getFunction(), "invalid addrspacecast", SL.getDebugLoc());
-  DAG.getContext()->diagnose(InvalidAddrSpaceCast);
-
+  // Invalid casts are poison.
+  // TODO: Should return poison
   return DAG.getUNDEF(Op->getValueType(0));
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
index 5f56568ef88e4..afcd9b5fcdc7e 100644
--- a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
+++ b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
@@ -444,6 +444,652 @@ define float @no_unsafe(ptr %addr, float %val) {
   ret float %res
 }
 
+@global = hidden addrspace(1) global i64 0, align 8
+
+; Make sure there is no error on an invalid addrspacecast without optimizations
+define i64 @optnone_atomicrmw_add_i64_expand(i64 %val) #1 {
+; GFX908-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX908:       ; %bb.0:
+; GFX908-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX908-NEXT:    s_mov_b64 s[4:5], src_private_base
+; GFX908-NEXT:    s_mov_b32 s6, 32
+; GFX908-NEXT:    s_lshr_b64 s[4:5], s[4:5], s6
+; GFX908-NEXT:    s_getpc_b64 s[6:7]
+; GFX908-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX908-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX908-NEXT:    s_cmp_eq_u32 s7, s4
+; GFX908-NEXT:    s_cselect_b64 s[4:5], -1, 0
+; GFX908-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT:    s_mov_b64 s[4:5], -1
+; GFX908-NEXT:    s_mov_b32 s6, 1
+; GFX908-NEXT:    v_cmp_ne_u32_e64 s[6:7], v2, s6
+; GFX908-NEXT:    s_and_b64 vcc, exec, s[6:7]
+; GFX908-NEXT:    ; implicit-def: $vgpr3_vgpr4
+; GFX908-NEXT:    s_cbranch_vccnz .LBB4_3
+; GFX908-NEXT:  .LBB4_1: ; %Flow
+; GFX908-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT:    s_mov_b32 s4, 1
+; GFX908-NEXT:    v_cmp_ne_u32_e64 s[4:5], v2, s4
+; GFX908-NEXT:    s_and_b64 vcc, exec, s[4:5]
+; GFX908-NEXT:    s_cbranch_vccnz .LBB4_4
+; GFX908-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX908-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX908-NEXT:    buffer_load_dword v3, v0, s[0:3], 0 offen
+; GFX908-NEXT:    s_waitcnt vmcnt(0)
+; GFX908-NEXT:    v_mov_b32_e32 v4, v3
+; GFX908-NEXT:    v_add_co_u32_e64 v0, s[4:5], v3, v0
+; GFX908-NEXT:    v_addc_co_u32_e64 v1, s[4:5], v4, v1, s[4:5]
+; GFX908-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX908-NEXT:    buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX908-NEXT:    s_branch .LBB4_4
+; GFX908-NEXT:  .LBB4_3: ; %atomicrmw.global
+; GFX908-NEXT:    s_getpc_b64 s[4:5]
+; GFX908-NEXT:    s_add_u32 s4, s4, global@rel32@lo+4
+; GFX908-NEXT:    s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX908-NEXT:    v_mov_b32_e32 v2, s4
+; GFX908-NEXT:    v_mov_b32_e32 v3, s5
+; GFX908-NEXT:    flat_atomic_add_x2 v[3:4], v[2:3], v[0:1] glc
+; GFX908-NEXT:    s_mov_b64 s[4:5], 0
+; GFX908-NEXT:    s_branch .LBB4_1
+; GFX908-NEXT:  .LBB4_4: ; %atomicrmw.phi
+; GFX908-NEXT:  ; %bb.5: ; %atomicrmw.end
+; GFX908-NEXT:    s_mov_b32 s4, 32
+; GFX908-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX908-NEXT:    v_lshrrev_b64 v[1:2], s4, v[3:4]
+; GFX908-NEXT:    v_mov_b32_e32 v0, v3
+; GFX908-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX90A-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX90A:       ; %bb.0:
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    s_mov_b64 s[4:5], src_private_base
+; GFX90A-NEXT:    s_mov_b32 s6, 32
+; GFX90A-NEXT:    s_lshr_b64 s[4:5], s[4:5], s6
+; GFX90A-NEXT:    s_getpc_b64 s[6:7]
+; GFX90A-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX90A-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX90A-NEXT:    s_cmp_eq_u32 s7, s4
+; GFX90A-NEXT:    s_cselect_b64 s[4:5], -1, 0
+; GFX90A-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX90A-NEXT:    s_mov_b64 s[4:5], -1
+; GFX90A-NEXT:    s_mov_b32 s6, 1
+; GFX90A-NEXT:    v_cmp_ne_u32_e64 s[6:7], v2, s6
+; GFX90A-NEXT:    s_and_b64 vcc, exec, s[6:7]
+; GFX90A-NEXT:    ; implicit-def: $vgpr2_vgpr3
+; GFX90A-NEXT:    s_cbranch_vccnz .LBB4_3
+; GFX90A-NEXT:  .LBB4_1: ; %Flow
+; GFX90A-NEXT:    v_cndmask_b32_e64 v4, 0, 1, s[4:5]
+; GFX90A-NEXT:    s_mov_b32 s4, 1
+; GFX90A-NEXT:    v_cmp_ne_u32_e64 s[4:5], v4, s4
+; GFX90A-NEXT:    s_and_b64 vcc, exec, s[4:5]
+; GFX90A-NEXT:    s_cbranch_vccnz .LBB4_4
+; GFX90A-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX90A-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX90A-NEXT:    buffer_load_dword v2, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    s_waitcnt vmcnt(0)
+; GFX90A-NEXT:    v_mov_b32_e32 v3, v2
+; GFX90A-NEXT:    v_add_co_u32_e64 v0, s[4:5], v2, v0
+; GFX90A-NEXT:    v_addc_co_u32_e64 v1, s[4:5], v3, v1, s[4:5]
+; GFX90A-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    s_branch .LBB4_4
+; GFX90A-NEXT:  .LBB4_3: ; %atomicrmw.global
+; GFX90A-NEXT:    s_getpc_b64 s[4:5]
+; GFX90A-NEXT:    s_add_u32 s4, s4, global@rel32@lo+4
+; GFX90A-NEXT:    s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX90A-NEXT:    v_pk_mov_b32 v[2:3], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NEXT:    flat_atomic_add_x2 v[2:3], v[2:3], v[0:1] glc
+; GFX90A-NEXT:    s_mov_b64 s[4:5], 0
+; GFX90A-NEXT:    s_branch .LBB4_1
+; GFX90A-NEXT:  .LBB4_4: ; %atomicrmw.phi
+; GFX90A-NEXT:  ; %bb.5: ; %atomicrmw.end
+; GFX90A-NEXT:    s_mov_b32 s4, 32
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    v_lshrrev_b64 v[4:5], s4, v[2:3]
+; GFX90A-NEXT:    v_mov_b32_e32 v0, v2
+; GFX90A-NEXT:    v_mov_b32_e32 v1, v4
+; GFX90A-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX942-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX942:       ; %bb.0:
+; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    s_mov_b64 s[0:1], src_private_base
+; GFX942-NEXT:    s_mov_b32 s2, 32
+; GFX942-NEXT:    s_lshr_b64 s[0:1], s[0:1], s2
+; GFX942-NEXT:    s_getpc_b64 s[2:3]
+; GFX942-NEXT:    s_add_u32 s2, s2, global@rel32@lo+4
+; GFX942-NEXT:    s_addc_u32 s3, s3, global@rel32@hi+12
+; GFX942-NEXT:    s_cmp_eq_u32 s3, s0
+; GFX942-NEXT:    s_cselect_b64 s[0:1], -1, 0
+; GFX942-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[0:1]
+; GFX942-NEXT:    s_mov_b64 s[0:1], -1
+; GFX942-NEXT:    s_mov_b32 s2, 1
+; GFX942-NEXT:    v_cmp_ne_u32_e64 s[2:3], v2, s2
+; GFX942-NEXT:    s_and_b64 vcc, exec, s[2:3]
+; GFX942-NEXT:    ; implicit-def: $vgpr2_vgpr3
+; GFX942-NEXT:    s_cbranch_vccnz .LBB4_3
+; GFX942-NEXT:  .LBB4_1: ; %Flow
+; GFX942-NEXT:    v_cndmask_b32_e64 v4, 0, 1, s[0:1]
+; GFX942-NEXT:    s_mov_b32 s0, 1
+; GFX942-NEXT:    v_cmp_ne_u32_e64 s[0:1], v4, s0
+; GFX942-NEXT:    s_and_b64 vcc, exec, s[0:1]
+; GFX942-NEXT:    s_cbranch_vccnz .LBB4_4
+; GFX942-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX942-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX942-NEXT:    s_nop 1
+; GFX942-NEXT:    scratch_load_dwordx2 v[2:3], off, s0
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
+; GFX942-NEXT:    v_lshl_add_u64 v[0:1], v[2:3], 0, v[0:1]
+; GFX942-NEXT:    scratch_store_dwordx2 off, v[0:1], s0
+; GFX942-NEXT:    s_branch .LBB4_4
+; GFX942-NEXT:  .LBB4_3: ; %atomicrmw.global
+; GFX942-NEXT:    s_getpc_b64 s[0:1]
+; GFX942-NEXT:    s_add_u32 s0, s0, global@rel32@lo+4
+; GFX942-NEXT:    s_addc_u32 s1, s1, global@rel32@hi+12
+; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[0:1]
+; GFX942-NEXT:    flat_atomic_add_x2 v[2:3], v[2:3], v[0:1] sc0
+; GFX942-NEXT:    s_mov_b64 s[0:1], 0
+; GFX942-NEXT:    s_branch .LBB4_1
+; GFX942-NEXT:  .LBB4_4: ; %atomicrmw.phi
+; GFX942-NEXT:  ; %bb.5: ; %atomicrmw.end
+; GFX942-NEXT:    s_mov_b32 s0, 32
+; GFX942-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    v_lshrrev_b64 v[4:5], s0, v[2:3]
+; GFX942-NEXT:    v_mov_b32_e32 v0, v2
+; GFX942-NEXT:    v_mov_b32_e32 v1, v4
+; GFX942-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1100-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX1100:       ; %bb.0:
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    s_mov_b64 s[0:1], src_private_base
+; GFX1100-NEXT:    s_mov_b32 s2, 32
+; GFX1100-NEXT:    s_lshr_b64 s[0:1], s[0:1], s2
+; GFX1100-NEXT:    s_getpc_b64 s[2:3]
+; GFX1100-NEXT:    s_add_u32 s2, s2, global@rel32@lo+4
+; GFX1100-NEXT:    s_addc_u32 s3, s3, global@rel32@hi+12
+; GFX1100-NEXT:    s_cmp_eq_u32 s3, s0
+; GFX1100-NEXT:    s_cselect_b32 s0, -1, 0
+; GFX1100-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1100-NEXT:    s_mov_b32 s0, -1
+; GFX1100-NEXT:    s_mov_b32 s1, 1
+; GFX1100-NEXT:    v_cmp_ne_u32_e64 s1, v2, s1
+; GFX1100-NEXT:    s_and_b32 vcc_lo, exec_lo, s1
+; GFX1100-NEXT:    ; implicit-def: $vgpr3_vgpr4
+; GFX1100-NEXT:    s_cbranch_vccnz .LBB4_3
+; GFX1100-NEXT:  .LBB4_1: ; %Flow
+; GFX1100-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1100-NEXT:    s_mov_b32 s0, 1
+; GFX1100-NEXT:    v_cmp_ne_u32_e64 s0, v2, s0
+; GFX1100-NEXT:    s_and_b32 vcc_lo, exec_lo, s0
+; GFX1100-NEXT:    s_cbranch_vccnz .LBB4_4
+; GFX1100-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX1100-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1100-NEXT:    scratch_load_b64 v[3:4], off, s0
+; GFX1100-NEXT:    s_waitcnt vmcnt(0)
+; GFX1100-NEXT:    v_add_co_u32 v0, s0, v3, v0
+; GFX1100-NEXT:    v_add_co_ci_u32_e64 v1, s0, v4, v1, s0
+; GFX1100-NEXT:    scratch_store_b64 off, v[0:1], s0
+; GFX1100-NEXT:    s_branch .LBB4_4
+; GFX1100-NEXT:  .LBB4_3: ; %atomicrmw.global
+; GFX1100-NEXT:    s_getpc_b64 s[0:1]
+; GFX1100-NEXT:    s_add_u32 s0, s0, global@rel32@lo+4
+; GFX1100-NEXT:    s_addc_u32 s1, s1, global@rel32@hi+12
+; GFX1100-NEXT:    v_mov_b32_e32 v3, s1
+; GFX1100-NEXT:    v_mov_b32_e32 v2, s0
+; GFX1100-NEXT:    flat_atomic_add_u64 v[3:4], v[2:3], v[0:1] glc
+; GFX1100-NEXT:    s_mov_b32 s0, 0
+; GFX1100-NEXT:    s_branch .LBB4_1
+; GFX1100-NEXT:  .LBB4_4: ; %atomicrmw.phi
+; GFX1100-NEXT:  ; %bb.5: ; %atomicrmw.end
+; GFX1100-NEXT:    s_mov_b32 s0, 32
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    v_lshrrev_b64 v[1:2], s0, v[3:4]
+; GFX1100-NEXT:    v_mov_b32_e32 v0, v3
+; GFX1100-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1200-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX1200:       ; %bb.0:
+; GFX1200-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX1200-NEXT:    s_wait_expcnt 0x0
+; GFX1200-NEXT:    s_wait_samplecnt 0x0
+; GFX1200-NEXT:    s_wait_bvhcnt 0x0
+; GFX1200-NEXT:    s_wait_kmcnt 0x0
+; GFX1200-NEXT:    s_mov_b64 s[0:1], src_private_base
+; GFX1200-NEXT:    s_mov_b32 s2, 32
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_lshr_b64 s[0:1], s[0:1], s2
+; GFX1200-NEXT:    s_getpc_b64 s[2:3]
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_sext_i32_i16 s3, s3
+; GFX1200-NEXT:    s_add_co_u32 s2, s2, global@rel32@lo+12
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_add_co_ci_u32 s3, s3, global@rel32@hi+24
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_cmp_eq_u32 s3, s0
+; GFX1200-NEXT:    s_cselect_b32 s0, -1, 0
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1200-NEXT:    s_mov_b32 s0, -1
+; GFX1200-NEXT:    s_mov_b32 s1, 1
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    v_cmp_ne_u32_e64 s1, v2, s1
+; GFX1200-NEXT:    s_and_b32 vcc_lo, exec_lo, s1
+; GFX1200-NEXT:    ; implicit-def: $vgpr3_vgpr4
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_cbranch_vccnz .LBB4_3
+; GFX1200-NEXT:  .LBB4_1: ; %Flow
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1200-NEXT:    s_mov_b32 s0, 1
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    v_cmp_ne_u32_e64 s0, v2, s0
+; GFX1200-NEXT:    s_and_b32 vcc_lo, exec_lo, s0
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_cbranch_vccnz .LBB4_4
+; GFX1200-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX1200-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX1200-NEXT:    scratch_load_b64 v[3:4], off, s0
+; GFX1200-NEXT:    s_wait_loadcnt 0x0
+; GFX1200-NEXT:    v_add_co_u32 v0, s0, v3, v0
+; GFX1200-NEXT:    s_wait_alu 0xf1ff
+; GFX1200-NEXT:    v_add_co_ci_u32_e64 v1, s0, v4, v1, s0
+; GFX1200-NEXT:    scratch_store_b64 off, v[0:1], s0
+; GFX1200-NEXT:    s_branch .LBB4_4
+; GFX1200-NEXT:  .LBB4_3: ; %atomicrmw.global
+; GFX1200-NEXT:    s_getpc_b64 s[0:1]
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_sext_i32_i16 s1, s1
+; GFX1200-NEXT:    s_add_co_u32 s0, s0, global@rel32@lo+12
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    s_add_co_ci_u32 s1, s1, global@rel32@hi+24
+; GFX1200-NEXT:    s_wait_alu 0xfffe
+; GFX1200-NEXT:    v_mov_b32_e32 v3, s1
+; GFX1200-NEXT:    v_mov_b32_e32 v2, s0
+; GFX1200-NEXT:    flat_atomic_add_u64 v[3:4], v[2:3], v[0:1] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1200-NEXT:    s_mov_b32 s0, 0
+; GFX1200-NEXT:    s_branch .LBB4_1
+; GFX1200-NEXT:  .LBB4_4: ; %atomicrmw.phi
+; GFX1200-NEXT:  ; %bb.5: ; %atomicrmw.end
+; GFX1200-NEXT:    s_mov_b32 s0, 32
+; GFX1200-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX1200-NEXT:    s_wait_alu 0xf1fe
+; GFX1200-NEXT:    v_lshrrev_b64 v[1:2], s0, v[3:4]
+; GFX1200-NEXT:    v_mov_b32_e32 v0, v3
+; GFX1200-NEXT:    s_setpc_b64 s[30:31]
+  %rmw = atomicrmw add ptr addrspacecast (ptr addrspace(1) @global to ptr), i64 %val syncscope("agent") monotonic, align 8
+  ret i64 %rmw
+}
+
+; Make sure there is no error on an invalid addrspacecast without optimizations
+define double @optnone_atomicrmw_fadd_f64_expand(double %val) #1 {
+; GFX908-LABEL: optnone_atomicrmw_fadd_f64_expand:
+; GFX908:       ; %bb.0:
+; GFX908-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX908-NEXT:    s_mov_b64 s[4:5], src_private_base
+; GFX908-NEXT:    s_mov_b32 s6, 32
+; GFX908-NEXT:    s_lshr_b64 s[4:5], s[4:5], s6
+; GFX908-NEXT:    s_getpc_b64 s[6:7]
+; GFX908-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX908-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX908-NEXT:    s_cmp_eq_u32 s7, s4
+; GFX908-NEXT:    s_cselect_b64 s[4:5], -1, 0
+; GFX908-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT:    s_mov_b64 s[4:5], -1
+; GFX908-NEXT:    s_mov_b32 s6, 1
+; GFX908-NEXT:    v_readfirstlane_b32 s7, v2
+; GFX908-NEXT:    s_cmp_lg_u32 s7, s6
+; GFX908-NEXT:    s_cselect_b64 s[6:7], -1, 0
+; GFX908-NEXT:    s_and_b64 vcc, exec, s[6:7]
+; GFX908-NEXT:    ; implicit-def: $vgpr3_vgpr4
+; GFX908-NEXT:    s_cbranch_vccnz .LBB5_2
+; GFX908-NEXT:    s_branch .LBB5_3
+; GFX908-NEXT:  .LBB5_1: ; %atomicrmw.private
+; GFX908-NEXT:    buffer_load_dword v3, v0, s[0:3], 0 offen
+; GFX908-NEXT:    s_waitcnt vmcnt(0)
+; GFX908-NEXT:    v_mov_b32_e32 v4, v3
+; GFX908-NEXT:    v_add_f64 v[0:1], v[3:4], v[0:1]
+; GFX908-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX908-NEXT:    buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX908-NEXT:    s_branch .LBB5_6
+; GFX908-NEXT:  .LBB5_2: ; %atomicrmw.global
+; GFX908-NEXT:    s_getpc_b64 s[4:5]
+; GFX908-NEXT:    s_add_u32 s4, s4, global@rel32@lo+4
+; GFX908-NEXT:    s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX908-NEXT:    v_mov_b32_e32 v2, s4
+; GFX908-NEXT:    v_mov_b32_e32 v3, s5
+; GFX908-NEXT:    flat_load_dwordx2 v[3:4], v[2:3]
+; GFX908-NEXT:    s_mov_b64 s[4:5], 0
+; GFX908-NEXT:    s_branch .LBB5_4
+; GFX908-NEXT:  .LBB5_3: ; %Flow
+; GFX908-NEXT:    s_and_b64 vcc, exec, s[4:5]
+; GFX908-NEXT:    s_cbranch_vccnz .LBB5_1
+; GFX908-NEXT:    s_branch .LBB5_6
+; GFX908-NEXT:  .LBB5_4: ; %atomicrmw.start
+; GFX908-NEXT:    ; =>This Inner Loop Header: Depth=1
+; GFX908-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX908-NEXT:    v_mov_b32_e32 v6, v4
+; GFX908-NEXT:    v_mov_b32_e32 v5, v3
+; GFX908-NEXT:    v_add_f64 v[3:4], v[5:6], v[0:1]
+; GFX908-NEXT:    s_getpc_b64 s[6:7]
+; GFX908-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX908-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX908-NEXT:    v_mov_b32_e32 v8, s7
+; GFX908-NEXT:    v_mov_b32_e32 v7, s6
+; GFX908-NEXT:    flat_atomic_cmpswap_x2 v[3:4], v[7:8], v[3:6] glc
+; GFX908-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX908-NEXT:    v_cmp_eq_u64_e64 s[6:7], v[3:4], v[5:6]
+; GFX908-NEXT:    s_or_b64 s[4:5], s[6:7], s[4:5]
+; GFX908-NEXT:    s_andn2_b64 exec, exec, s[4:5]
+; GFX908-NEXT:    s_cbranch_execnz .LBB5_4
+; GFX908-NEXT:  ; %bb.5: ; %atomicrmw.end1
+; GFX908-NEXT:    s_or_b64 exec, exec, s[4:5]
+; GFX908-NEXT:    s_mov_b64 s[4:5], 0
+; GFX908-NEXT:    s_branch .LBB5_3
+; GFX908-NEXT:  .LBB5_6: ; %atomicrmw.phi
+; GFX908-NEXT:  ; %bb.7: ; %atomicrmw.end
+; GFX908-NEXT:    s_mov_b32 s4, 32
+; GFX908-NEXT:    v_lshrrev_b64 v[1:2], s4, v[3:4]
+; GFX908-NEXT:    v_mov_b32_e32 v0, v3
+; GFX908-NEXT:    s_waitcnt vmcnt(0)
+; GFX908-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX90A-LABEL: optnone_atomicrmw_fadd_f64_expand:
+; GFX90A:       ; %bb.0:
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    s_mov_b64 s[4:5], src_private_base
+; GFX90A-NEXT:    s_mov_b32 s6, 32
+; GFX90A-NEXT:    s_lshr_b64 s[4:5], s[4:5], s6
+; GFX90A-NEXT:    s_getpc_b64 s[6:7]
+; GFX90A-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX90A-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX90A-NEXT:    s_cmp_eq_u32 s7, s4
+; GFX90A-NEXT:    s_cselect_b64 s[4:5], -1, 0
+; GFX90A-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX90A-NEXT:    s_mov_b64 s[4:5], -1
+; GFX90A-NEXT:    s_mov_b32 s6, 1
+; GFX90A-NEXT:    v_readfirstlane_b32 s7, v2
+; GFX90A-NEXT:    s_cmp_lg_u32 s7, s6
+; GFX90A-NEXT:    s_cselect_b64 s[6:7], -1, 0
+; GFX90A-NEXT:    s_and_b64 vcc, exec, s[6:7]
+; GFX90A-NEXT:    ; implicit-def: $vgpr2_vgpr3
+; GFX90A-NEXT:    s_cbranch_vccnz .LBB5_2
+; GFX90A-NEXT:    s_branch .LBB5_3
+; GFX90A-NEXT:  .LBB5_1: ; %atomicrmw.private
+; GFX90A-NEXT:    buffer_load_dword v2, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    s_waitcnt vmcnt(0)
+; GFX90A-NEXT:    v_mov_b32_e32 v3, v2
+; GFX90A-NEXT:    v_add_f64 v[0:1], v[2:3], v[0:1]
+; GFX90A-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX90A-NEXT:    s_branch .LBB5_6
+; GFX90A-NEXT:  .LBB5_2: ; %atomicrmw.global
+; GFX90A-NEXT:    s_getpc_b64 s[4:5]
+; GFX90A-NEXT:    s_add_u32 s4, s4, global@rel32@lo+4
+; GFX90A-NEXT:    s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX90A-NEXT:    v_pk_mov_b32 v[2:3], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NEXT:    flat_load_dwordx2 v[2:3], v[2:3]
+; GFX90A-NEXT:    s_mov_b64 s[4:5], 0
+; GFX90A-NEXT:    s_branch .LBB5_4
+; GFX90A-NEXT:  .LBB5_3: ; %Flow
+; GFX90A-NEXT:    s_and_b64 vcc, exec, s[4:5]
+; GFX90A-NEXT:    s_cbranch_vccnz .LBB5_1
+; GFX90A-NEXT:    s_branch .LBB5_6
+; GFX90A-NEXT:  .LBB5_4: ; %atomicrmw.start
+; GFX90A-NEXT:    ; =>This Inner Loop Header: Depth=1
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1]
+; GFX90A-NEXT:    v_add_f64 v[2:3], v[4:5], v[0:1]
+; GFX90A-NEXT:    s_getpc_b64 s[6:7]
+; GFX90A-NEXT:    s_add_u32 s6, s6, global@rel32@lo+4
+; GFX90A-NEXT:    s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX90A-NEXT:    v_pk_mov_b32 v[6:7], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-NEXT:    flat_atomic_cmpswap_x2 v[2:3], v[6:7], v[2:5] glc
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    v_cmp_eq_u64_e64 s[6:7], v[2:3], v[4:5]
+; GFX90A-NEXT:    s_or_b64 s[4:5], s[6:7], s[4:5]
+; GFX90A-NEXT:    s_andn2_b64 exec, exec, s[4:5]
+; GFX90A-NEXT:    s_cbranch_execnz .LBB5_4
+; GFX90A-NEXT:  ; %bb.5: ; %atomicrmw.end1
+; GFX90A-NEXT:    s_or_b64 exec, exec, s[4:5]
+; GFX90A-NEXT:    s_mov_b64 s[4:5], 0
+; GFX90A-NEXT:    s_branch .LBB5_3
+; GFX90A-NEXT:  .LBB5_6: ; %atomicrmw.phi
+;...
[truncated]

arsenm marked this pull request as ready for review February 17, 2025 13:03
arsenm added this to the LLVM 20.X Release milestone Feb 17, 2025
arsenm merged commit 18ea6c9 into main Feb 17, 2025
10 of 11 checks passed
arsenm deleted the users/arsenm/amdgpu/no-error-on-invalid-addrspacecast branch February 17, 2025 14:03
arsenm (Contributor, Author) commented Feb 17, 2025

/cherry-pick 18ea6c9

llvmbot (Member) commented Feb 17, 2025

/pull-request #127496

arsenm added a commit to arsenm/llvm-project that referenced this pull request Feb 19, 2025
These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.

For example:

```
  void foo(volatile generic int* x) {
    __builtin_assume(is_shared(x));
    *x = 4;
  }

  void bar() {
    private int y;
    foo(&y); // violation, wrong address space
  }
```

This could produce a compile time backend error or not depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw which required inserting runtime address
space checks. The invalid cases are dynamically dead, we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d03708.

(cherry picked from commit 18ea6c9)
tstellar pushed a commit that referenced this pull request Feb 21, 2025
…127751)

These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.

For example:

```
  void foo(volatile generic int* x) {
    __builtin_assume(is_shared(x));
    *x = 4;
  }

  void bar() {
    private int y;
    foo(&y); // violation, wrong address space
  }
```

This could produce a compile time backend error or not depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw which required inserting runtime address
space checks. The invalid cases are dynamically dead, we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d03708.

(cherry picked from commit 18ea6c9)