release/20.x: AMDGPU: Stop emitting an error on illegal addrspacecasts (#127487) #127496
Conversation
These cannot be static compile errors and should be treated as poison. Invalid casts may be introduced which are dynamically dead. For example:

```
void foo(volatile generic int* x) {
  __builtin_assume(is_shared(x));
  *x = 4;
}

void bar() {
  private int y;
  foo(&y); // violation, wrong address space
}
```

Depending on the optimization level, this may or may not produce a compile-time backend error. Similarly, the new test demonstrates a failure on a lowered atomicrmw that required inserting runtime address space checks. The invalid cases are dynamically dead; we should not error, and the AtomicExpand pass shouldn't have to consider the details of the incoming pointer to produce valid IR.

This should go to the release branch. It fixes broken -O0 compiles with 64-bit atomics, which would have started failing in 1d03708.

(cherry picked from commit 18ea6c9)
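A minimal LLVM IR sketch of the kind of cast this change tolerates (function name and address-space pair chosen for illustration, not taken from the patch): on AMDGPU, addrspace(5) is private and addrspace(3) is local, and a direct cast between them has no supported lowering. The IR is still valid; executing it would be undefined behavior, so the backend now folds it to undef instead of issuing a fatal diagnostic.

```llvm
; Hypothetical reduction (names assumed). A direct private-to-local
; addrspacecast cannot be lowered on AMDGPU; it is only reachable
; through dynamically dead code, so it now becomes undef
; (TODO in the patch: poison) rather than a compile-time error.
define void @dynamically_dead_cast(ptr addrspace(5) %p) {
  %cast = addrspacecast ptr addrspace(5) %p to ptr addrspace(3)
  store i32 4, ptr addrspace(3) %cast
  ret void
}
```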
@jhuber6 What do you think about merging this PR to the release branch?
@llvm/pr-subscribers-backend-amdgpu

Author: None (llvmbot)

Changes: Backport 18ea6c9, requested by @arsenm.

Patch is 33.93 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/127496.diff

4 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index e9e47eaadd557..e84f0f5fa615a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2426,11 +2426,8 @@ bool AMDGPULegalizerInfo::legalizeAddrSpaceCast(
return true;
}
- DiagnosticInfoUnsupported InvalidAddrSpaceCast(
- MF.getFunction(), "invalid addrspacecast", B.getDebugLoc());
-
- LLVMContext &Ctx = MF.getFunction().getContext();
- Ctx.diagnose(InvalidAddrSpaceCast);
+ // Invalid casts are poison.
+ // TODO: Should return poison
B.buildUndef(Dst);
MI.eraseFromParent();
return true;
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index b632c50dae0e3..e09df53995d61 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7340,11 +7340,8 @@ SDValue SITargetLowering::lowerADDRSPACECAST(SDValue Op,
// global <-> flat are no-ops and never emitted.
- const MachineFunction &MF = DAG.getMachineFunction();
- DiagnosticInfoUnsupported InvalidAddrSpaceCast(
- MF.getFunction(), "invalid addrspacecast", SL.getDebugLoc());
- DAG.getContext()->diagnose(InvalidAddrSpaceCast);
-
+ // Invalid casts are poison.
+ // TODO: Should return poison
return DAG.getUNDEF(Op->getValueType(0));
}
diff --git a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
index f5c9b1a79b476..5c62730fdfe8e 100644
--- a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
+++ b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
@@ -444,6 +444,652 @@ define float @no_unsafe(ptr %addr, float %val) {
ret float %res
}
+@global = hidden addrspace(1) global i64 0, align 8
+
+; Make sure there is no error on an invalid addrspacecast without optimizations
+define i64 @optnone_atomicrmw_add_i64_expand(i64 %val) #1 {
+; GFX908-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX908: ; %bb.0:
+; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX908-NEXT: s_mov_b64 s[4:5], src_private_base
+; GFX908-NEXT: s_mov_b32 s6, 32
+; GFX908-NEXT: s_lshr_b64 s[4:5], s[4:5], s6
+; GFX908-NEXT: s_getpc_b64 s[6:7]
+; GFX908-NEXT: s_add_u32 s6, s6, global@rel32@lo+4
+; GFX908-NEXT: s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX908-NEXT: s_cmp_eq_u32 s7, s4
+; GFX908-NEXT: s_cselect_b64 s[4:5], -1, 0
+; GFX908-NEXT: v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT: s_mov_b64 s[4:5], -1
+; GFX908-NEXT: s_mov_b32 s6, 1
+; GFX908-NEXT: v_cmp_ne_u32_e64 s[6:7], v2, s6
+; GFX908-NEXT: s_and_b64 vcc, exec, s[6:7]
+; GFX908-NEXT: ; implicit-def: $vgpr3_vgpr4
+; GFX908-NEXT: s_cbranch_vccnz .LBB4_3
+; GFX908-NEXT: .LBB4_1: ; %Flow
+; GFX908-NEXT: v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT: s_mov_b32 s4, 1
+; GFX908-NEXT: v_cmp_ne_u32_e64 s[4:5], v2, s4
+; GFX908-NEXT: s_and_b64 vcc, exec, s[4:5]
+; GFX908-NEXT: s_cbranch_vccnz .LBB4_4
+; GFX908-NEXT: ; %bb.2: ; %atomicrmw.private
+; GFX908-NEXT: s_waitcnt lgkmcnt(0)
+; GFX908-NEXT: buffer_load_dword v3, v0, s[0:3], 0 offen
+; GFX908-NEXT: s_waitcnt vmcnt(0)
+; GFX908-NEXT: v_mov_b32_e32 v4, v3
+; GFX908-NEXT: v_add_co_u32_e64 v0, s[4:5], v3, v0
+; GFX908-NEXT: v_addc_co_u32_e64 v1, s[4:5], v4, v1, s[4:5]
+; GFX908-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX908-NEXT: buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX908-NEXT: s_branch .LBB4_4
+; GFX908-NEXT: .LBB4_3: ; %atomicrmw.global
+; GFX908-NEXT: s_getpc_b64 s[4:5]
+; GFX908-NEXT: s_add_u32 s4, s4, global@rel32@lo+4
+; GFX908-NEXT: s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX908-NEXT: v_mov_b32_e32 v2, s4
+; GFX908-NEXT: v_mov_b32_e32 v3, s5
+; GFX908-NEXT: flat_atomic_add_x2 v[3:4], v[2:3], v[0:1] glc
+; GFX908-NEXT: s_mov_b64 s[4:5], 0
+; GFX908-NEXT: s_branch .LBB4_1
+; GFX908-NEXT: .LBB4_4: ; %atomicrmw.phi
+; GFX908-NEXT: ; %bb.5: ; %atomicrmw.end
+; GFX908-NEXT: s_mov_b32 s4, 32
+; GFX908-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX908-NEXT: v_lshrrev_b64 v[1:2], s4, v[3:4]
+; GFX908-NEXT: v_mov_b32_e32 v0, v3
+; GFX908-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX90A-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX90A: ; %bb.0:
+; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT: s_mov_b64 s[4:5], src_private_base
+; GFX90A-NEXT: s_mov_b32 s6, 32
+; GFX90A-NEXT: s_lshr_b64 s[4:5], s[4:5], s6
+; GFX90A-NEXT: s_getpc_b64 s[6:7]
+; GFX90A-NEXT: s_add_u32 s6, s6, global@rel32@lo+4
+; GFX90A-NEXT: s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX90A-NEXT: s_cmp_eq_u32 s7, s4
+; GFX90A-NEXT: s_cselect_b64 s[4:5], -1, 0
+; GFX90A-NEXT: v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX90A-NEXT: s_mov_b64 s[4:5], -1
+; GFX90A-NEXT: s_mov_b32 s6, 1
+; GFX90A-NEXT: v_cmp_ne_u32_e64 s[6:7], v2, s6
+; GFX90A-NEXT: s_and_b64 vcc, exec, s[6:7]
+; GFX90A-NEXT: ; implicit-def: $vgpr2_vgpr3
+; GFX90A-NEXT: s_cbranch_vccnz .LBB4_3
+; GFX90A-NEXT: .LBB4_1: ; %Flow
+; GFX90A-NEXT: v_cndmask_b32_e64 v4, 0, 1, s[4:5]
+; GFX90A-NEXT: s_mov_b32 s4, 1
+; GFX90A-NEXT: v_cmp_ne_u32_e64 s[4:5], v4, s4
+; GFX90A-NEXT: s_and_b64 vcc, exec, s[4:5]
+; GFX90A-NEXT: s_cbranch_vccnz .LBB4_4
+; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.private
+; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
+; GFX90A-NEXT: buffer_load_dword v2, v0, s[0:3], 0 offen
+; GFX90A-NEXT: s_waitcnt vmcnt(0)
+; GFX90A-NEXT: v_mov_b32_e32 v3, v2
+; GFX90A-NEXT: v_add_co_u32_e64 v0, s[4:5], v2, v0
+; GFX90A-NEXT: v_addc_co_u32_e64 v1, s[4:5], v3, v1, s[4:5]
+; GFX90A-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX90A-NEXT: buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX90A-NEXT: s_branch .LBB4_4
+; GFX90A-NEXT: .LBB4_3: ; %atomicrmw.global
+; GFX90A-NEXT: s_getpc_b64 s[4:5]
+; GFX90A-NEXT: s_add_u32 s4, s4, global@rel32@lo+4
+; GFX90A-NEXT: s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX90A-NEXT: v_pk_mov_b32 v[2:3], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NEXT: flat_atomic_add_x2 v[2:3], v[2:3], v[0:1] glc
+; GFX90A-NEXT: s_mov_b64 s[4:5], 0
+; GFX90A-NEXT: s_branch .LBB4_1
+; GFX90A-NEXT: .LBB4_4: ; %atomicrmw.phi
+; GFX90A-NEXT: ; %bb.5: ; %atomicrmw.end
+; GFX90A-NEXT: s_mov_b32 s4, 32
+; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT: v_lshrrev_b64 v[4:5], s4, v[2:3]
+; GFX90A-NEXT: v_mov_b32_e32 v0, v2
+; GFX90A-NEXT: v_mov_b32_e32 v1, v4
+; GFX90A-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX942-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX942: ; %bb.0:
+; GFX942-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT: s_mov_b64 s[0:1], src_private_base
+; GFX942-NEXT: s_mov_b32 s2, 32
+; GFX942-NEXT: s_lshr_b64 s[0:1], s[0:1], s2
+; GFX942-NEXT: s_getpc_b64 s[2:3]
+; GFX942-NEXT: s_add_u32 s2, s2, global@rel32@lo+4
+; GFX942-NEXT: s_addc_u32 s3, s3, global@rel32@hi+12
+; GFX942-NEXT: s_cmp_eq_u32 s3, s0
+; GFX942-NEXT: s_cselect_b64 s[0:1], -1, 0
+; GFX942-NEXT: v_cndmask_b32_e64 v2, 0, 1, s[0:1]
+; GFX942-NEXT: s_mov_b64 s[0:1], -1
+; GFX942-NEXT: s_mov_b32 s2, 1
+; GFX942-NEXT: v_cmp_ne_u32_e64 s[2:3], v2, s2
+; GFX942-NEXT: s_and_b64 vcc, exec, s[2:3]
+; GFX942-NEXT: ; implicit-def: $vgpr2_vgpr3
+; GFX942-NEXT: s_cbranch_vccnz .LBB4_3
+; GFX942-NEXT: .LBB4_1: ; %Flow
+; GFX942-NEXT: v_cndmask_b32_e64 v4, 0, 1, s[0:1]
+; GFX942-NEXT: s_mov_b32 s0, 1
+; GFX942-NEXT: v_cmp_ne_u32_e64 s[0:1], v4, s0
+; GFX942-NEXT: s_and_b64 vcc, exec, s[0:1]
+; GFX942-NEXT: s_cbranch_vccnz .LBB4_4
+; GFX942-NEXT: ; %bb.2: ; %atomicrmw.private
+; GFX942-NEXT: s_waitcnt lgkmcnt(0)
+; GFX942-NEXT: s_nop 1
+; GFX942-NEXT: scratch_load_dwordx2 v[2:3], off, s0
+; GFX942-NEXT: s_waitcnt vmcnt(0)
+; GFX942-NEXT: v_lshl_add_u64 v[0:1], v[2:3], 0, v[0:1]
+; GFX942-NEXT: scratch_store_dwordx2 off, v[0:1], s0
+; GFX942-NEXT: s_branch .LBB4_4
+; GFX942-NEXT: .LBB4_3: ; %atomicrmw.global
+; GFX942-NEXT: s_getpc_b64 s[0:1]
+; GFX942-NEXT: s_add_u32 s0, s0, global@rel32@lo+4
+; GFX942-NEXT: s_addc_u32 s1, s1, global@rel32@hi+12
+; GFX942-NEXT: v_mov_b64_e32 v[2:3], s[0:1]
+; GFX942-NEXT: flat_atomic_add_x2 v[2:3], v[2:3], v[0:1] sc0
+; GFX942-NEXT: s_mov_b64 s[0:1], 0
+; GFX942-NEXT: s_branch .LBB4_1
+; GFX942-NEXT: .LBB4_4: ; %atomicrmw.phi
+; GFX942-NEXT: ; %bb.5: ; %atomicrmw.end
+; GFX942-NEXT: s_mov_b32 s0, 32
+; GFX942-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX942-NEXT: v_lshrrev_b64 v[4:5], s0, v[2:3]
+; GFX942-NEXT: v_mov_b32_e32 v0, v2
+; GFX942-NEXT: v_mov_b32_e32 v1, v4
+; GFX942-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX1100-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX1100: ; %bb.0:
+; GFX1100-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT: s_mov_b64 s[0:1], src_private_base
+; GFX1100-NEXT: s_mov_b32 s2, 32
+; GFX1100-NEXT: s_lshr_b64 s[0:1], s[0:1], s2
+; GFX1100-NEXT: s_getpc_b64 s[2:3]
+; GFX1100-NEXT: s_add_u32 s2, s2, global@rel32@lo+4
+; GFX1100-NEXT: s_addc_u32 s3, s3, global@rel32@hi+12
+; GFX1100-NEXT: s_cmp_eq_u32 s3, s0
+; GFX1100-NEXT: s_cselect_b32 s0, -1, 0
+; GFX1100-NEXT: v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1100-NEXT: s_mov_b32 s0, -1
+; GFX1100-NEXT: s_mov_b32 s1, 1
+; GFX1100-NEXT: v_cmp_ne_u32_e64 s1, v2, s1
+; GFX1100-NEXT: s_and_b32 vcc_lo, exec_lo, s1
+; GFX1100-NEXT: ; implicit-def: $vgpr3_vgpr4
+; GFX1100-NEXT: s_cbranch_vccnz .LBB4_3
+; GFX1100-NEXT: .LBB4_1: ; %Flow
+; GFX1100-NEXT: v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1100-NEXT: s_mov_b32 s0, 1
+; GFX1100-NEXT: v_cmp_ne_u32_e64 s0, v2, s0
+; GFX1100-NEXT: s_and_b32 vcc_lo, exec_lo, s0
+; GFX1100-NEXT: s_cbranch_vccnz .LBB4_4
+; GFX1100-NEXT: ; %bb.2: ; %atomicrmw.private
+; GFX1100-NEXT: s_waitcnt lgkmcnt(0)
+; GFX1100-NEXT: scratch_load_b64 v[3:4], off, s0
+; GFX1100-NEXT: s_waitcnt vmcnt(0)
+; GFX1100-NEXT: v_add_co_u32 v0, s0, v3, v0
+; GFX1100-NEXT: v_add_co_ci_u32_e64 v1, s0, v4, v1, s0
+; GFX1100-NEXT: scratch_store_b64 off, v[0:1], s0
+; GFX1100-NEXT: s_branch .LBB4_4
+; GFX1100-NEXT: .LBB4_3: ; %atomicrmw.global
+; GFX1100-NEXT: s_getpc_b64 s[0:1]
+; GFX1100-NEXT: s_add_u32 s0, s0, global@rel32@lo+4
+; GFX1100-NEXT: s_addc_u32 s1, s1, global@rel32@hi+12
+; GFX1100-NEXT: v_mov_b32_e32 v3, s1
+; GFX1100-NEXT: v_mov_b32_e32 v2, s0
+; GFX1100-NEXT: flat_atomic_add_u64 v[3:4], v[2:3], v[0:1] glc
+; GFX1100-NEXT: s_mov_b32 s0, 0
+; GFX1100-NEXT: s_branch .LBB4_1
+; GFX1100-NEXT: .LBB4_4: ; %atomicrmw.phi
+; GFX1100-NEXT: ; %bb.5: ; %atomicrmw.end
+; GFX1100-NEXT: s_mov_b32 s0, 32
+; GFX1100-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX1100-NEXT: v_lshrrev_b64 v[1:2], s0, v[3:4]
+; GFX1100-NEXT: v_mov_b32_e32 v0, v3
+; GFX1100-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX1200-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX1200: ; %bb.0:
+; GFX1200-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1200-NEXT: s_wait_expcnt 0x0
+; GFX1200-NEXT: s_wait_samplecnt 0x0
+; GFX1200-NEXT: s_wait_bvhcnt 0x0
+; GFX1200-NEXT: s_wait_kmcnt 0x0
+; GFX1200-NEXT: s_mov_b64 s[0:1], src_private_base
+; GFX1200-NEXT: s_mov_b32 s2, 32
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: s_lshr_b64 s[0:1], s[0:1], s2
+; GFX1200-NEXT: s_getpc_b64 s[2:3]
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: s_sext_i32_i16 s3, s3
+; GFX1200-NEXT: s_add_co_u32 s2, s2, global@rel32@lo+12
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: s_add_co_ci_u32 s3, s3, global@rel32@hi+24
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: s_cmp_eq_u32 s3, s0
+; GFX1200-NEXT: s_cselect_b32 s0, -1, 0
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1200-NEXT: s_mov_b32 s0, -1
+; GFX1200-NEXT: s_mov_b32 s1, 1
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: v_cmp_ne_u32_e64 s1, v2, s1
+; GFX1200-NEXT: s_and_b32 vcc_lo, exec_lo, s1
+; GFX1200-NEXT: ; implicit-def: $vgpr3_vgpr4
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: s_cbranch_vccnz .LBB4_3
+; GFX1200-NEXT: .LBB4_1: ; %Flow
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: v_cndmask_b32_e64 v2, 0, 1, s0
+; GFX1200-NEXT: s_mov_b32 s0, 1
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: v_cmp_ne_u32_e64 s0, v2, s0
+; GFX1200-NEXT: s_and_b32 vcc_lo, exec_lo, s0
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: s_cbranch_vccnz .LBB4_4
+; GFX1200-NEXT: ; %bb.2: ; %atomicrmw.private
+; GFX1200-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1200-NEXT: scratch_load_b64 v[3:4], off, s0
+; GFX1200-NEXT: s_wait_loadcnt 0x0
+; GFX1200-NEXT: v_add_co_u32 v0, s0, v3, v0
+; GFX1200-NEXT: s_wait_alu 0xf1ff
+; GFX1200-NEXT: v_add_co_ci_u32_e64 v1, s0, v4, v1, s0
+; GFX1200-NEXT: scratch_store_b64 off, v[0:1], s0
+; GFX1200-NEXT: s_branch .LBB4_4
+; GFX1200-NEXT: .LBB4_3: ; %atomicrmw.global
+; GFX1200-NEXT: s_getpc_b64 s[0:1]
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: s_sext_i32_i16 s1, s1
+; GFX1200-NEXT: s_add_co_u32 s0, s0, global@rel32@lo+12
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: s_add_co_ci_u32 s1, s1, global@rel32@hi+24
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: v_mov_b32_e32 v3, s1
+; GFX1200-NEXT: v_mov_b32_e32 v2, s0
+; GFX1200-NEXT: flat_atomic_add_u64 v[3:4], v[2:3], v[0:1] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1200-NEXT: s_mov_b32 s0, 0
+; GFX1200-NEXT: s_branch .LBB4_1
+; GFX1200-NEXT: .LBB4_4: ; %atomicrmw.phi
+; GFX1200-NEXT: ; %bb.5: ; %atomicrmw.end
+; GFX1200-NEXT: s_mov_b32 s0, 32
+; GFX1200-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1200-NEXT: s_wait_alu 0xf1fe
+; GFX1200-NEXT: v_lshrrev_b64 v[1:2], s0, v[3:4]
+; GFX1200-NEXT: v_mov_b32_e32 v0, v3
+; GFX1200-NEXT: s_setpc_b64 s[30:31]
+ %rmw = atomicrmw add ptr addrspacecast (ptr addrspace(1) @global to ptr), i64 %val syncscope("agent") monotonic, align 8
+ ret i64 %rmw
+}
+
+; Make sure there is no error on an invalid addrspacecast without optimizations
+define double @optnone_atomicrmw_fadd_f64_expand(double %val) #1 {
+; GFX908-LABEL: optnone_atomicrmw_fadd_f64_expand:
+; GFX908: ; %bb.0:
+; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX908-NEXT: s_mov_b64 s[4:5], src_private_base
+; GFX908-NEXT: s_mov_b32 s6, 32
+; GFX908-NEXT: s_lshr_b64 s[4:5], s[4:5], s6
+; GFX908-NEXT: s_getpc_b64 s[6:7]
+; GFX908-NEXT: s_add_u32 s6, s6, global@rel32@lo+4
+; GFX908-NEXT: s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX908-NEXT: s_cmp_eq_u32 s7, s4
+; GFX908-NEXT: s_cselect_b64 s[4:5], -1, 0
+; GFX908-NEXT: v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT: s_mov_b64 s[4:5], -1
+; GFX908-NEXT: s_mov_b32 s6, 1
+; GFX908-NEXT: v_readfirstlane_b32 s7, v2
+; GFX908-NEXT: s_cmp_lg_u32 s7, s6
+; GFX908-NEXT: s_cselect_b64 s[6:7], -1, 0
+; GFX908-NEXT: s_and_b64 vcc, exec, s[6:7]
+; GFX908-NEXT: ; implicit-def: $vgpr3_vgpr4
+; GFX908-NEXT: s_cbranch_vccnz .LBB5_2
+; GFX908-NEXT: s_branch .LBB5_3
+; GFX908-NEXT: .LBB5_1: ; %atomicrmw.private
+; GFX908-NEXT: buffer_load_dword v3, v0, s[0:3], 0 offen
+; GFX908-NEXT: s_waitcnt vmcnt(0)
+; GFX908-NEXT: v_mov_b32_e32 v4, v3
+; GFX908-NEXT: v_add_f64 v[0:1], v[3:4], v[0:1]
+; GFX908-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX908-NEXT: buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX908-NEXT: s_branch .LBB5_6
+; GFX908-NEXT: .LBB5_2: ; %atomicrmw.global
+; GFX908-NEXT: s_getpc_b64 s[4:5]
+; GFX908-NEXT: s_add_u32 s4, s4, global@rel32@lo+4
+; GFX908-NEXT: s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX908-NEXT: v_mov_b32_e32 v2, s4
+; GFX908-NEXT: v_mov_b32_e32 v3, s5
+; GFX908-NEXT: flat_load_dwordx2 v[3:4], v[2:3]
+; GFX908-NEXT: s_mov_b64 s[4:5], 0
+; GFX908-NEXT: s_branch .LBB5_4
+; GFX908-NEXT: .LBB5_3: ; %Flow
+; GFX908-NEXT: s_and_b64 vcc, exec, s[4:5]
+; GFX908-NEXT: s_cbranch_vccnz .LBB5_1
+; GFX908-NEXT: s_branch .LBB5_6
+; GFX908-NEXT: .LBB5_4: ; %atomicrmw.start
+; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
+; GFX908-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX908-NEXT: v_mov_b32_e32 v6, v4
+; GFX908-NEXT: v_mov_b32_e32 v5, v3
+; GFX908-NEXT: v_add_f64 v[3:4], v[5:6], v[0:1]
+; GFX908-NEXT: s_getpc_b64 s[6:7]
+; GFX908-NEXT: s_add_u32 s6, s6, global@rel32@lo+4
+; GFX908-NEXT: s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX908-NEXT: v_mov_b32_e32 v8, s7
+; GFX908-NEXT: v_mov_b32_e32 v7, s6
+; GFX908-NEXT: flat_atomic_cmpswap_x2 v[3:4], v[7:8], v[3:6] glc
+; GFX908-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX908-NEXT: v_cmp_eq_u64_e64 s[6:7], v[3:4], v[5:6]
+; GFX908-NEXT: s_or_b64 s[4:5], s[6:7], s[4:5]
+; GFX908-NEXT: s_andn2_b64 exec, exec, s[4:5]
+; GFX908-NEXT: s_cbranch_execnz .LBB5_4
+; GFX908-NEXT: ; %bb.5: ; %atomicrmw.end1
+; GFX908-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX908-NEXT: s_mov_b64 s[4:5], 0
+; GFX908-NEXT: s_branch .LBB5_3
+; GFX908-NEXT: .LBB5_6: ; %atomicrmw.phi
+; GFX908-NEXT: ; %bb.7: ; %atomicrmw.end
+; GFX908-NEXT: s_mov_b32 s4, 32
+; GFX908-NEXT: v_lshrrev_b64 v[1:2], s4, v[3:4]
+; GFX908-NEXT: v_mov_b32_e32 v0, v3
+; GFX908-NEXT: s_waitcnt vmcnt(0)
+; GFX908-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX90A-LABEL: optnone_atomicrmw_fadd_f64_expand:
+; GFX90A: ; %bb.0:
+; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT: s_mov_b64 s[4:5], src_private_base
+; GFX90A-NEXT: s_mov_b32 s6, 32
+; GFX90A-NEXT: s_lshr_b64 s[4:5], s[4:5], s6
+; GFX90A-NEXT: s_getpc_b64 s[6:7]
+; GFX90A-NEXT: s_add_u32 s6, s6, global@rel32@lo+4
+; GFX90A-NEXT: s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX90A-NEXT: s_cmp_eq_u32 s7, s4
+; GFX90A-NEXT: s_cselect_b64 s[4:5], -1, 0
+; GFX90A-NEXT: v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX90A-NEXT: s_mov_b64 s[4:5], -1
+; GFX90A-NEXT: s_mov_b32 s6, 1
+; GFX90A-NEXT: v_readfirstlane_b32 s7, v2
+; GFX90A-NEXT: s_cmp_lg_u32 s7, s6
+; GFX90A-NEXT: s_cselect_b64 s[6:7], -1, 0
+; GFX90A-NEXT: s_and_b64 vcc, exec, s[6:7]
+; GFX90A-NEXT: ; implicit-def: $vgpr2_vgpr3
+; GFX90A-NEXT: s_cbranch_vccnz .LBB5_2
+; GFX90A-NEXT: s_branch .LBB5_3
+; GFX90A-NEXT: .LBB5_1: ; %atomicrmw.private
+; GFX90A-NEXT: buffer_load_dword v2, v0, s[0:3], 0 offen
+; GFX90A-NEXT: s_waitcnt vmcnt(0)
+; GFX90A-NEXT: v_mov_b32_e32 v3, v2
+; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], v[0:1]
+; GFX90A-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX90A-NEXT: buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX90A-NEXT: s_branch .LBB5_6
+; GFX90A-NEXT: .LBB5_2: ; %atomicrmw.global
+; GFX90A-NEXT: s_getpc_b64 s[4:5]
+; GFX90A-NEXT: s_add_u32 s4, s4, global@rel32@lo+4
+; GFX90A-NEXT: s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX90A-NEXT: v_pk_mov_b32 v[2:3], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NEXT: flat_load_dwordx2 v[2:3], v[2:3]
+; GFX90A-NEXT: s_mov_b64 s[4:5], 0
+; GFX90A-NEXT: s_branch .LBB5_4
+; GFX90A-NEXT: .LBB5_3: ; %Flow
+; GFX90A-NEXT: s_and_b64 vcc, exec, s[4:5]
+; GFX90A-NEXT: s_cbranch_vccnz .LBB5_1
+; GFX90A-NEXT: s_branch .LBB5_6
+; GFX90A-NEXT: .LBB5_4: ; %atomicrmw.start
+; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
+; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1]
+; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], v[0:1]
+; GFX90A-NEXT: s_getpc_b64 s[6:7]
+; GFX90A-NEXT: s_add_u32 s6, s6, global@rel32@lo+4
+; GFX90A-NEXT: s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX90A-NEXT: v_pk_mov_b32 v[6:7], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[2:3], v[6:7], v[2:5] glc
+; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT: v_cmp_eq_u64_e64 s[6:7], v[2:3], v[4:5]
+; GFX90A-NEXT: s_or_b64 s[4:5], s[6:7], s[4:5]
+; GFX90A-NEXT: s_andn2_b64 exec, exec, s[4:5]
+; GFX90A-NEXT: s_cbranch_execnz .LBB5_4
+; GFX90A-NEXT: ; %bb.5: ; %atomicrmw.end1
+; GFX90A-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX90A-NEXT: s_mov_b64 s[4:5], 0
+; GFX90A-NEXT: s_branch .LBB5_3
+; GFX90A-NEXT: .LBB5_6: ; %atomicrmw.phi
+;...
[truncated]
The backported test is failing.
This should just need a rerun of llvm/utils/update_llc_test_checks.py. Is it possible to push to this PR, or do I need to open a manual one?
Opened manual version in #127751 |
Closing this as #127751 has landed. |