Commit 6405a5f

pravinjagtap authored and vikramRH committed
[AMDGPU] Support double type in atomic optimizer. (llvm#84307)
Presently, the atomic optimizer supports only 32-bit floating-point operations. The plan is to extend it to 64-bit operations for both compute and graphics. This patch adds support for the double type for uniform values only; support for divergent values will follow. Adding support for divergent values requires extending or legalizing readfirstlane, readlane, writelane, and similar operations for 64-bit types, to avoid the `bitcast` noise we currently have.

---------

Authored-by: Pravin Jagtap <[email protected]>
Change-Id: Ie604b4e584d4e891cfb93f59897e45a2c2a53a1e
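For uniform values, the idea behind the optimization can be sketched with a host-side model of one wavefront. This is a simplified illustration, not the pass's code: the 64-lane wave and the names `countBits`, `naiveAtomicFAdd`, and `optimizedAtomicFAdd` are assumptions made for this example. Instead of every active lane issuing its own atomic fadd of the same value V, one atomic fadd of V times the number of active lanes is issued, and each lane reconstructs the value it would have observed as the broadcast old value plus V times the number of active lanes below it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: a 64-lane wavefront where lane i is active when bit i
// of ActiveMask is set. This is an illustration, not LLVM code.
constexpr unsigned WaveSize = 64;

// Count set bits (stands in for a popcount of the active-lane mask).
unsigned countBits(std::uint64_t Mask) {
  unsigned N = 0;
  for (; Mask != 0; Mask &= Mask - 1)
    ++N;
  return N;
}

// Unoptimized: each active lane, in lane order, performs its own atomic fadd
// of V and receives the value memory held just before its update.
std::vector<double> naiveAtomicFAdd(double &Mem, double V,
                                    std::uint64_t ActiveMask) {
  std::vector<double> Results(WaveSize, 0.0);
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane) {
    if (((ActiveMask >> Lane) & 1) == 0)
      continue;
    Results[Lane] = Mem; // atomicrmw returns the old value
    Mem += V;
  }
  return Results;
}

// Optimized for a uniform V: one atomic fadd of V * (number of active lanes),
// then each lane rebuilds its own result from the single old value.
std::vector<double> optimizedAtomicFAdd(double &Mem, double V,
                                        std::uint64_t ActiveMask) {
  std::vector<double> Results(WaveSize, 0.0);
  if (ActiveMask == 0)
    return Results;
  double Old = Mem;                                      // single atomic's result
  Mem += V * static_cast<double>(countBits(ActiveMask)); // single atomic update
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane) {
    if (((ActiveMask >> Lane) & 1) == 0)
      continue;
    // Active lanes with a lower index are treated as having gone first.
    std::uint64_t Below = ActiveMask & ((std::uint64_t{1} << Lane) - 1);
    Results[Lane] = Old + V * static_cast<double>(countBits(Below));
  }
  return Results;
}
```

Floating-point addition is not associative, so in general V * N can round differently from N separate additions. The values in a quick check such as V = 0.5 starting from 1.0 are exactly representable, so both versions agree bit for bit.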
1 parent dd59888 commit 6405a5f

9 files changed, with 14467 lines added and 7684 lines deleted.

llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp

Lines changed: 7 additions & 4 deletions
```diff
@@ -209,8 +209,9 @@ void AMDGPUAtomicOptimizerImpl::visitAtomicRMWInst(AtomicRMWInst &I) {
     break;
   }
 
-  // Only 32-bit floating point atomic ops are supported.
-  if (AtomicRMWInst::isFPOperation(Op) && !I.getType()->isFloatTy()) {
+  // Only 32 and 64 bit floating point atomic ops are supported.
+  if (AtomicRMWInst::isFPOperation(Op) &&
+      !(I.getType()->isFloatTy() || I.getType()->isDoubleTy())) {
     return;
   }
 
```
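The patched early exit in `visitAtomicRMWInst` can be read as a small predicate. The sketch below models it outside LLVM, with hypothetical `RmwOp` and `ElemTy` enums standing in for `AtomicRMWInst::BinOp` and the IR type; the real code calls `AtomicRMWInst::isFPOperation(Op)` and `Type::isFloatTy()` / `Type::isDoubleTy()`, as shown in the hunk. Passing this check only means the instruction is not rejected at this point; other conditions in the pass, not shown in this hunk, may still reject it.

```cpp
#include <cassert>

// Hypothetical stand-ins for llvm::AtomicRMWInst::BinOp and the IR value type.
enum class RmwOp { Add, Sub, And, Or, Xor, FAdd, FSub, FMin, FMax };
enum class ElemTy { I32, I64, Half, Float, Double };

// Models AtomicRMWInst::isFPOperation for the ops listed above.
bool isFPOperation(RmwOp Op) {
  return Op == RmwOp::FAdd || Op == RmwOp::FSub || Op == RmwOp::FMin ||
         Op == RmwOp::FMax;
}

// Models the patched check: an FP atomic survives only if its type is
// float or double (before the patch, only float). Integer ops are unaffected.
bool passesFPTypeCheck(RmwOp Op, ElemTy Ty) {
  if (isFPOperation(Op) && !(Ty == ElemTy::Float || Ty == ElemTy::Double))
    return false; // corresponds to the early `return;` in the pass
  return true;
}
```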
```diff
@@ -931,8 +932,10 @@ void AMDGPUAtomicOptimizerImpl::optimizeAtomic(Instruction &I,
     Value *BroadcastI = nullptr;
 
     if (TyBitWidth == 64) {
-      Value *const ExtractLo = B.CreateTrunc(PHI, Int32Ty);
-      Value *const ExtractHi = B.CreateTrunc(B.CreateLShr(PHI, 32), Int32Ty);
+      Value *CastedPhi = B.CreateBitCast(PHI, IntNTy);
+      Value *const ExtractLo = B.CreateTrunc(CastedPhi, Int32Ty);
+      Value *const ExtractHi =
+          B.CreateTrunc(B.CreateLShr(CastedPhi, 32), Int32Ty);
       CallInst *const ReadFirstLaneLo =
           B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, ExtractLo);
       CallInst *const ReadFirstLaneHi =
```
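The second hunk is needed because LLVM's `trunc` and `lshr` instructions operate only on integer types, so once the PHI can be a `double`, `CreateTrunc(PHI, Int32Ty)` would no longer produce valid IR. The patch therefore bitcasts the PHI to `IntNTy` (presumably the integer type of the same bit width, i.e. `i64` here) before splitting it into two 32-bit halves, each broadcast with `llvm.amdgcn.readfirstlane`, which per the commit message is not yet extended to 64-bit values. The host-side sketch below mirrors those steps. `readFirstLane` and `broadcastFirstLane` are hypothetical models rather than LLVM APIs, and the sketch reassembles the halves with a shift and OR because the pass's own reassembly lies outside this hunk.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Models `bitcast double %v to i64`: reinterpret the bits, no conversion.
std::uint64_t bitcastToI64(double D) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof Bits);
  return Bits;
}

// Models `bitcast i64 %v to double`.
double bitcastToDouble(std::uint64_t Bits) {
  double D;
  std::memcpy(&D, &Bits, sizeof D);
  return D;
}

// Hypothetical model of a 32-bit readfirstlane: returns the value held by the
// lowest-numbered active lane (0 if no lane is active, a simplification).
std::uint32_t readFirstLane(const std::vector<std::uint32_t> &Lanes,
                            std::uint64_t ActiveMask) {
  for (unsigned Lane = 0; Lane < Lanes.size() && Lane < 64; ++Lane)
    if ((ActiveMask >> Lane) & 1)
      return Lanes[Lane];
  return 0;
}

// Broadcast the first active lane's double using only 32-bit broadcasts.
double broadcastFirstLane(const std::vector<double> &PerLane,
                          std::uint64_t ActiveMask) {
  std::vector<std::uint32_t> Lo, Hi;
  for (double V : PerLane) {
    std::uint64_t Bits = bitcastToI64(V);                 // CreateBitCast(PHI, IntNTy)
    Lo.push_back(static_cast<std::uint32_t>(Bits));       // CreateTrunc(CastedPhi, Int32Ty)
    Hi.push_back(static_cast<std::uint32_t>(Bits >> 32)); // CreateLShr + CreateTrunc
  }
  std::uint64_t Joined =
      (std::uint64_t{readFirstLane(Hi, ActiveMask)} << 32) |
      readFirstLane(Lo, ActiveMask);
  return bitcastToDouble(Joined);
}
```

Because the bitcast only reinterprets bits, splitting and rejoining is lossless for any `double`, including values such as 0.1 that have no exact binary representation.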

Large diffs for the following test files are not rendered by default:

- llvm/test/CodeGen/AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll: 255 additions, 51 deletions
- llvm/test/CodeGen/AMDGPU/fp64-atomics-gfx90a.ll: 245 additions, 57 deletions
- llvm/test/CodeGen/AMDGPU/global_atomic_optimizer_fp_rtn.ll: 560 additions, 0 deletions
- llvm/test/CodeGen/AMDGPU/global_atomics_optimizer_fp_no_rtn.ll: 420 additions, 0 deletions
- llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll: 3586 additions, 2216 deletions
- llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll: 2907 additions, 1979 deletions
- llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll: 2907 additions, 1978 deletions
- llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll: 3580 additions, 1399 deletions
