Commit 277e064

Do not copy long double and 128-bit fp format from aux target for AMDGPU
rC352620 caused regressions because it copied the floating-point formats from the aux target. The floating-point format decides whether extended long double is supported: long double is x86_fp80 on x86 but IEEE double on amdgcn.

Document usage of long double type in HIP programming guide: ROCm/hip#890

Differential Revision: https://reviews.llvm.org/D57527

llvm-svn: 352801
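The format difference described in the commit message can be observed from ordinary C++ on whichever target compiles it. The following is a minimal sketch that assumes only the standard `<limits>` header; the function name `longDoubleKind` and its labels are illustrative and are not part of clang or HIP.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Illustrative helper (not a clang or HIP API): classify long double on the
// target being compiled for by the width of its significand in bits.
inline std::string longDoubleKind() {
  constexpr int Digits = std::numeric_limits<long double>::digits;
  // long double is never narrower than double.
  assert(Digits >= std::numeric_limits<double>::digits);
  if (Digits == 64)
    return "x87 80-bit extended"; // the format LLVM IR calls x86_fp80
  if (Digits == 53)
    return "IEEE double";         // same format as double
  if (Digits == 113)
    return "IEEE quad";
  return "other";
}
```

On a typical x86_64 Linux host this is expected to report the x87 format, whereas a target that maps long double to IEEE double, as amdgcn does according to the commit message, would fall into the second branch.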
1 parent fadf250 commit 277e064

2 files changed: 21 additions, 0 deletions

clang/lib/Basic/Targets/AMDGPU.cpp

Lines changed: 11 additions & 0 deletions
@@ -307,5 +307,16 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts,
 }
 
 void AMDGPUTargetInfo::setAuxTarget(const TargetInfo *Aux) {
+  assert(HalfFormat == Aux->HalfFormat);
+  assert(FloatFormat == Aux->FloatFormat);
+  assert(DoubleFormat == Aux->DoubleFormat);
+
+  // On x86_64 long double is 80-bit extended precision format, which is
+  // not supported by AMDGPU. 128-bit floating point format is also not
+  // supported by AMDGPU. Therefore keep its own format for these two types.
+  auto SaveLongDoubleFormat = LongDoubleFormat;
+  auto SaveFloat128Format = Float128Format;
   copyAuxTarget(Aux);
+  LongDoubleFormat = SaveLongDoubleFormat;
+  Float128Format = SaveFloat128Format;
 }
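The save, copy, restore pattern in this hunk can be tried in isolation. The following is a toy sketch, not clang's implementation: `ToyTargetInfo`, the `Fmt` enum, `OtherProperty`, and the two factory functions are hypothetical stand-ins, and the toy `copyAuxTarget` simply overwrites every field.

```cpp
#include <cassert>

// Toy stand-ins for floating-point formats; the names are illustrative.
enum class Fmt { IEEEhalf, IEEEsingle, IEEEdouble, X87Extended, IEEEquad };

struct ToyTargetInfo {
  Fmt HalfFormat = Fmt::IEEEhalf;
  Fmt FloatFormat = Fmt::IEEEsingle;
  Fmt DoubleFormat = Fmt::IEEEdouble;
  Fmt LongDoubleFormat = Fmt::IEEEdouble;
  Fmt Float128Format = Fmt::IEEEquad;
  int OtherProperty = 0; // hypothetical stand-in for everything else copied

  // Toy version: overwrite every field with the aux target's value.
  void copyAuxTarget(const ToyTargetInfo &Aux) { *this = Aux; }

  void setAuxTarget(const ToyTargetInfo &Aux) {
    // The formats both sides must already agree on.
    assert(HalfFormat == Aux.HalfFormat);
    assert(FloatFormat == Aux.FloatFormat);
    assert(DoubleFormat == Aux.DoubleFormat);

    // Save the two formats this target must keep, copy, then restore them.
    const Fmt SaveLongDoubleFormat = LongDoubleFormat;
    const Fmt SaveFloat128Format = Float128Format;
    copyAuxTarget(Aux);
    LongDoubleFormat = SaveLongDoubleFormat;
    Float128Format = SaveFloat128Format;
  }
};

// Toy host: long double uses the x87 extended format.
inline ToyTargetInfo makeToyHost() {
  ToyTargetInfo T;
  T.LongDoubleFormat = Fmt::X87Extended;
  T.OtherProperty = 1;
  return T;
}

// Toy device: long double stays IEEE double.
inline ToyTargetInfo makeToyDevice() {
  ToyTargetInfo T;
  T.OtherProperty = 2;
  return T;
}
```

After `Dev.setAuxTarget(Host)` on these toy objects, `Dev.OtherProperty` takes the host's value while `Dev.LongDoubleFormat` remains `Fmt::IEEEdouble`, which mirrors the intent stated in the code comment of the hunk.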

clang/test/CodeGenCUDA/types.cu

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
// RUN: %clang_cc1 -triple amdgcn -aux-triple x86_64 -fcuda-is-device -emit-llvm %s -o - | FileCheck -check-prefix=DEV %s
2+
// RUN: %clang_cc1 -triple x86_64 -aux-triple amdgcn -emit-llvm %s -o - | FileCheck -check-prefix=HOST %s
3+
4+
#include "Inputs/cuda.h"
5+
6+
// HOST: @ld_host = global x86_fp80 0xK00000000000000000000
7+
long double ld_host;
8+
9+
// DEV: @ld_device = addrspace(1) externally_initialized global double 0.000000e+00
10+
__device__ long double ld_device;
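The HOST and DEV check lines imply that the same `long double` declaration carries different precision on each side. The following host-only sketch in standard C++ shows what that can mean for values; `survivesAsDouble` and `justAboveOne` are hypothetical helpers, and nothing here calls HIP or CUDA.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: true if X keeps its exact value when narrowed to
// double, the type the DEV check line shows for long double on amdgcn.
inline bool survivesAsDouble(long double X) {
  const double Narrowed = static_cast<double>(X);
  return static_cast<long double>(Narrowed) == X;
}

// A long double slightly above 1.0, using the compiling target's epsilon.
inline long double justAboveOne() {
  const long double R = 1.0L + std::numeric_limits<long double>::epsilon();
  assert(R > 1.0L); // sanity check: the increment must be representable
  return R;
}
```

On a host where long double is wider than double, `survivesAsDouble(justAboveOne())` is expected to be false, so extended-precision values computed on the host should not be assumed to pass through device code unchanged.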
