Commit e1a8120

[AMDGPU] Support double type in atomic optimizer. (#84307)
The atomic optimizer currently supports only 32-bit floating-point operations. The plan is to extend it to 64-bit operations for both compute and graphics. This patch adds support for the double type for `uniform values` only; support for divergent values will follow. Supporting divergent values requires extending/legalizing the readfirstlane, readlane, writelane, etc. ops for 64-bit operations, to avoid the `bitcast` noise we currently have.
---------
Authored-by: Pravin Jagtap <[email protected]>
1 parent c67ed2f commit e1a8120
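
For readers unfamiliar with the uniform-value case mentioned in the message, here is a minimal plain C++ sketch of the idea for `fadd`. It assumes that when every active lane adds the same value, a single atomic can add that value scaled by the number of active lanes. The helper names `countActiveLanes` and `combinedUniformFAdd` are hypothetical and do not appear in the pass. Floating-point rounding can differ between one scaled add and repeated adds, so this is an illustration only.

```cpp
#include <cstdint>

// Hypothetical helper: count the set bits in an active-lane mask.
inline unsigned countActiveLanes(std::uint64_t mask) {
  unsigned n = 0;
  while (mask) {
    mask &= mask - 1; // clear the lowest set bit
    ++n;
  }
  return n;
}

// Hypothetical helper: the value one combined atomic fadd would contribute
// when every active lane adds the same (uniform) double value.
inline double combinedUniformFAdd(double uniformValue,
                                  std::uint64_t activeMask) {
  return uniformValue * static_cast<double>(countActiveLanes(activeMask));
}
```

With four active lanes each adding 1.5, `combinedUniformFAdd(1.5, 0xF)` gives the single value that one lane would add on behalf of the whole group.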

9 files changed: 20497 additions & 112 deletions

llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp

Lines changed: 7 additions & 4 deletions
@@ -209,8 +209,9 @@ void AMDGPUAtomicOptimizerImpl::visitAtomicRMWInst(AtomicRMWInst &I) {
     break;
   }
 
-  // Only 32-bit floating point atomic ops are supported.
-  if (AtomicRMWInst::isFPOperation(Op) && !I.getType()->isFloatTy()) {
+  // Only 32 and 64 bit floating point atomic ops are supported.
+  if (AtomicRMWInst::isFPOperation(Op) &&
+      !(I.getType()->isFloatTy() || I.getType()->isDoubleTy())) {
     return;
   }

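The effect of the relaxed guard above can be modeled without LLVM. The sketch below is a simplified stand-in, not the pass's code: the enums `RMWOp` and `ScalarType` and the function `passesFPTypeGuard` are hypothetical names introduced only to show which operation/type pairs the new condition lets through.

```cpp
// Hypothetical, simplified stand-ins for the operation and type queries.
enum class RMWOp { Add, FAdd, FSub, FMin, FMax };
enum class ScalarType { Int32, Int64, Half, Float, Double };

// In this model, every op except the integer Add is a floating-point op.
inline bool isFPOperation(RMWOp op) { return op != RMWOp::Add; }

// Mirrors the shape of the new guard: a floating-point op is rejected
// unless its type is float or double; other ops are not affected here.
inline bool passesFPTypeGuard(RMWOp op, ScalarType ty) {
  if (isFPOperation(op) &&
      !(ty == ScalarType::Float || ty == ScalarType::Double)) {
    return false; // corresponds to the early `return` in the visitor
  }
  return true;
}
```

Under this model, a double `FAdd` now passes the guard, while a half-precision `FAdd` is still rejected.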
@@ -920,8 +921,10 @@ void AMDGPUAtomicOptimizerImpl::optimizeAtomic(Instruction &I,
     Value *BroadcastI = nullptr;
 
     if (TyBitWidth == 64) {
-      Value *const ExtractLo = B.CreateTrunc(PHI, Int32Ty);
-      Value *const ExtractHi = B.CreateTrunc(B.CreateLShr(PHI, 32), Int32Ty);
+      Value *CastedPhi = B.CreateBitCast(PHI, IntNTy);
+      Value *const ExtractLo = B.CreateTrunc(CastedPhi, Int32Ty);
+      Value *const ExtractHi =
+          B.CreateTrunc(B.CreateLShr(CastedPhi, 32), Int32Ty);
       CallInst *const ReadFirstLaneLo =
           B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, ExtractLo);
       CallInst *const ReadFirstLaneHi =
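
The bitcast in this hunk is needed because `trunc` and `lshr` operate only on integer types, and `PHI` may now be a double. As a rough analogy in plain C++ (not IRBuilder code), the same split-and-recombine step can be sketched as follows. The names `Halves`, `splitDouble`, and `joinDouble` are hypothetical, and `std::memcpy` stands in for the bitcast.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical container for the two 32-bit halves of a 64-bit value.
struct Halves {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Reinterpret the double as a 64-bit integer (the bitcast step), then
// take the low 32 bits (trunc) and the high 32 bits (lshr by 32 + trunc).
inline Halves splitDouble(double value) {
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  return {static_cast<std::uint32_t>(bits),
          static_cast<std::uint32_t>(bits >> 32)};
}

// Reassemble the 64-bit pattern from the halves and reinterpret it as a
// double again.
inline double joinDouble(Halves h) {
  const std::uint64_t bits =
      (static_cast<std::uint64_t>(h.hi) << 32) | h.lo;
  double value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}
```

Each half fits in a 32-bit operation, which is why the 64-bit value has to be split before it goes through a 32-bit lane read; the round trip `joinDouble(splitDouble(x))` preserves the bit pattern of `x`.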

llvm/test/CodeGen/AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll

Lines changed: 223 additions & 51 deletions

llvm/test/CodeGen/AMDGPU/fp64-atomics-gfx90a.ll

Lines changed: 213 additions & 57 deletions

llvm/test/CodeGen/AMDGPU/global_atomic_optimizer_fp_rtn.ll

Lines changed: 560 additions & 0 deletions

llvm/test/CodeGen/AMDGPU/global_atomics_optimizer_fp_no_rtn.ll

Lines changed: 420 additions & 0 deletions

llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll

Lines changed: 5578 additions & 0 deletions

llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll

Lines changed: 3960 additions & 0 deletions

llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll

Lines changed: 3960 additions & 0 deletions

llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll

Lines changed: 5576 additions & 0 deletions
