Skip to content

[NVPTX][AMDGPU][CodeGen] Fix local_space nullptr handling for NVPTX and local/private nullptr value for AMDGPU. #78759

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions clang/lib/Basic/Targets/AMDGPU.h
Original file line number Diff line number Diff line change
Expand Up @@ -414,8 +414,10 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUTargetInfo final : public TargetInfo {
// value ~0.
uint64_t getNullPointerValue(LangAS AS) const override {
// FIXME: Also should handle region.
return (AS == LangAS::opencl_local || AS == LangAS::opencl_private)
? ~0 : 0;
return (AS == LangAS::opencl_local || AS == LangAS::opencl_private ||
AS == LangAS::sycl_local || AS == LangAS::sycl_private)
? ~0
: 0;
}

void setAuxTarget(const TargetInfo *Aux) override;
Expand Down
18 changes: 18 additions & 0 deletions clang/lib/CodeGen/Targets/NVPTX.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,10 @@ class NVPTXTargetCodeGenInfo : public TargetCodeGenInfo {
CodeGen::CodeGenModule &M) const override;
bool shouldEmitStaticExternCAliases() const override;

llvm::Constant *getNullPointer(const CodeGen::CodeGenModule &CGM,
llvm::PointerType *T,
QualType QT) const override;

llvm::Type *getCUDADeviceBuiltinSurfaceDeviceType() const override {
// On the device side, surface reference is represented as an object handle
// in 64-bit integer.
Expand Down Expand Up @@ -285,6 +289,20 @@ void NVPTXTargetCodeGenInfo::addNVVMMetadata(llvm::GlobalValue *GV,
bool NVPTXTargetCodeGenInfo::shouldEmitStaticExternCAliases() const {
return false;
}

llvm::Constant *
NVPTXTargetCodeGenInfo::getNullPointer(const CodeGen::CodeGenModule &CGM,
llvm::PointerType *PT,
QualType QT) const {
auto &Ctx = CGM.getContext();
if (PT->getAddressSpace() != Ctx.getTargetAddressSpace(LangAS::opencl_local))
return llvm::ConstantPointerNull::get(PT);

auto NPT = llvm::PointerType::get(
PT->getContext(), Ctx.getTargetAddressSpace(LangAS::opencl_generic));
return llvm::ConstantExpr::getAddrSpaceCast(
llvm::ConstantPointerNull::get(NPT), PT);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what's going on here. Why are we ASC'ing all null pointers to LangAS::opencl_generic ?

Will it work for CUDA (as in the CUDA language)? I think this code should be restricted to apply the ASC only for OpenCL and leave CUDA/HIP with the dafault.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below code in the AMDGPU file shows that the ASC is already in place for HIP.
LOC = nullptr; // CHECK-DAG: store ptr addrspace(3) addrspacecast (ptr null to ptr addrspace(3)), ptr [[LOC]].ascast, align 4
For the CUDA case, not ASCing the nullptr in local_space results in undefined behaviour as only the 24 LSBs of the nullptr value is 0 and the rest is changing with each run. i.e. if I copy local_space nullptr in the kernel to a long var and print the value, it won't be zero. So, overall, the ASCing the local_space nullptr for CUDA is needed as it's already there for AMDGPU.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Artem-B

I'm shimming in at @mmoadeli's request. I advised him on the resolution of his issue.

I don't quite understand what's going on here.

So it is a similar story as for the AMDGPU backend. 0 as a pointer to shared memory is a valid one and points to the root of the shared memory, so that's means we cannot use this value as nullptr. AMDGPU uses -1 (all bits set) for this, but we couldn't find anything equivalent in the CUDA/PTX documentation. After a few investigation, we found out the most stable way to do this is simply by inserting this expression.

Note that 0 as a pointer to the generic address space the expected value for nullptr

Why are we ASC'ing all null pointers to LangAS::opencl_generic ?

The patch isn't doing this, if the pointer type is to the cuda shared address space (opencl's local address space) then we do ASC. Otherwise this emits the simple llvm::ConstantPointerNull.
We used LangAS::opencl_generic as a way to emphasis there is a generic to shared address space cast going on. The other solution here would be to use LangAS::Default to retrieve the target address space, but Default doesn't sound right to me as you have to know this maps to NVPTX's generic target address space. Either way, we don't have a strong opinion on what to use. But a comment is probably needed regardless.

Will it work for CUDA (as in the CUDA language)? I think this code should be restricted to apply the ASC only for OpenCL and leave CUDA/HIP with the dafault.

So yes and no. To the Will it work for CUDA ? part, yes it will because you actually cannot hit this return. CUDA doesn't expose address spaces so you can't have that nullptr as an address in the cuda shared address space, so the if above will always evaluate to true in CUDA.

For the leave CUDA/HIP with the dafault part, you could force things and use target address spaces like it is done in the clang headers for CUDA and this change would capture that. However, as explained before, 0 in the address space 3 (NVPTX backend) is a valid address and it is very easy to highlight in SASS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell the reason for the AMDGPU code using an addrspacecast is the following comment // Currently LLVM assumes null pointers always have value 0, which results in incorrectly transformed IR. so it can't use a null literal for all ones.

Looking at the lang-ref I can't actually see anywhere that ptr addrspace(X) null is the zero value, so this should probably be clarified in the lagref: https://llvm.org/docs/LangRef.html#t-pointer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The semantics of non-zero address spaces are target-specific. Memory access through a non-dereferenceable pointer is undefined behavior in any address space. Pointers with the bit-value 0 are only assumed to be non-dereferenceable in address space 0, unless the function is marked with the null_pointer_is_valid attribute."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this really needs to be a property of the datalayout. the addrspacecast is a roundabout way of using -1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I believe that right now some LLVM passes assume a null constant is all zeroes, even though LangRef does not specify it so we need to either document those semantics or add a property to the data layout.

}

void CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F,
Expand Down
171 changes: 88 additions & 83 deletions clang/test/CodeGenSYCL/address-space-conversions.cpp
Original file line number Diff line number Diff line change
@@ -1,138 +1,143 @@
// RUN: %clang_cc1 -triple spir64 -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s
void bar(int &Data) {}
// CHECK-DAG: define{{.*}} spir_func void @[[RAW_REF:[a-zA-Z0-9_]+]](ptr addrspace(4) noundef align 4 dereferenceable(4) %
// CHECK: define{{.*}} spir_func void @[[RAW_REF:[a-zA-Z0-9_]+]](ptr addrspace(4) noundef align 4 dereferenceable(4) %
void bar2(int &Data) {}
// CHECK-DAG: define{{.*}} spir_func void @[[RAW_REF2:[a-zA-Z0-9_]+]](ptr addrspace(4) noundef align 4 dereferenceable(4) %
// CHECK: define{{.*}} spir_func void @[[RAW_REF2:[a-zA-Z0-9_]+]](ptr addrspace(4) noundef align 4 dereferenceable(4) %
void bar(__attribute__((opencl_local)) int &Data) {}
// CHECK-DAG: define{{.*}} spir_func void [[LOC_REF:@[a-zA-Z0-9_]+]](ptr addrspace(3) noundef align 4 dereferenceable(4) %
// CHECK: define{{.*}} spir_func void [[LOC_REF:@[a-zA-Z0-9_]+]](ptr addrspace(3) noundef align 4 dereferenceable(4) %
void foo(int *Data) {}
// CHECK-DAG: define{{.*}} spir_func void @[[RAW_PTR:[a-zA-Z0-9_]+]](ptr addrspace(4) noundef %
// CHECK: define{{.*}} spir_func void @[[RAW_PTR:[a-zA-Z0-9_]+]](ptr addrspace(4) noundef %
void foo2(int *Data) {}
// CHECK-DAG: define{{.*}} spir_func void @[[RAW_PTR2:[a-zA-Z0-9_]+]](ptr addrspace(4) noundef %
// CHECK: define{{.*}} spir_func void @[[RAW_PTR2:[a-zA-Z0-9_]+]](ptr addrspace(4) noundef %
void foo(__attribute__((opencl_local)) int *Data) {}
// CHECK-DAG: define{{.*}} spir_func void [[LOC_PTR:@[a-zA-Z0-9_]+]](ptr addrspace(3) noundef %
// CHECK: define{{.*}} spir_func void [[LOC_PTR:@[a-zA-Z0-9_]+]](ptr addrspace(3) noundef %

template <typename T>
void tmpl(T t) {}
// See Check Lines below.

void usages() {
// CHECK-DAG: [[GLOB:%[a-zA-Z0-9]+]] = alloca ptr addrspace(1)
// CHECK-DAG: [[GLOB]].ascast = addrspacecast ptr [[GLOB]] to ptr addrspace(4)
int *NoAS;
// CHECK: [[NoAS:%[a-zA-Z0-9]+]] = alloca ptr addrspace(4)
__attribute__((opencl_global)) int *GLOB;
// CHECK-DAG: [[LOC:%[a-zA-Z0-9]+]] = alloca ptr addrspace(3)
// CHECK-DAG: [[LOC]].ascast = addrspacecast ptr [[LOC]] to ptr addrspace(4)
// CHECK: [[GLOB:%[a-zA-Z0-9]+]] = alloca ptr addrspace(1)
__attribute__((opencl_local)) int *LOC;
// CHECK-DAG: [[NoAS:%[a-zA-Z0-9]+]] = alloca ptr addrspace(4)
// CHECK-DAG: [[NoAS]].ascast = addrspacecast ptr [[NoAS]] to ptr addrspace(4)
int *NoAS;
// CHECK-DAG: [[PRIV:%[a-zA-Z0-9]+]] = alloca ptr
// CHECK-DAG: [[PRIV]].ascast = addrspacecast ptr [[PRIV]] to ptr addrspace(4)
// CHECK: [[LOC:%[a-zA-Z0-9]+]] = alloca ptr addrspace(3)
__attribute__((opencl_private)) int *PRIV;
// CHECK-DAG: [[GLOB_DEVICE:%[a-zA-Z0-9]+]] = alloca ptr addrspace(5)
// CHECK: [[PRIV:%[a-zA-Z0-9]+]] = alloca ptr
__attribute__((opencl_global_device)) int *GLOBDEVICE;
// CHECK-DAG: [[GLOB_HOST:%[a-zA-Z0-9]+]] = alloca ptr addrspace(6)
// CHECK: [[GLOB_DEVICE:%[a-zA-Z0-9]+]] = alloca ptr addrspace(5)
__attribute__((opencl_global_host)) int *GLOBHOST;
// CHECK: [[GLOB_HOST:%[a-zA-Z0-9]+]] = alloca ptr addrspace(6)

// CHECK: [[NoAS]].ascast = addrspacecast ptr [[NoAS]] to ptr addrspace(4)
// CHECK: [[GLOB]].ascast = addrspacecast ptr [[GLOB]] to ptr addrspace(4)
// CHECK: [[LOC]].ascast = addrspacecast ptr [[LOC]] to ptr addrspace(4)
// CHECK: [[PRIV]].ascast = addrspacecast ptr [[PRIV]] to ptr addrspace(4)
LOC = nullptr;
// CHECK: store ptr addrspace(3) null, ptr addrspace(4) [[LOC]].ascast, align 8
GLOB = nullptr;
// CHECK: store ptr addrspace(1) null, ptr addrspace(4) [[GLOB]].ascast, align 8

// Explicit conversions
// From named address spaces to default address space
// CHECK-DAG: [[GLOB_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK-DAG: [[GLOB_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD]] to ptr addrspace(4)
// CHECK-DAG: store ptr addrspace(4) [[GLOB_CAST]], ptr addrspace(4) [[NoAS]].ascast
// CHECK: [[GLOB_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK: [[GLOB_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD]] to ptr addrspace(4)
// CHECK: store ptr addrspace(4) [[GLOB_CAST]], ptr addrspace(4) [[NoAS]].ascast
NoAS = (int *)GLOB;
// CHECK-DAG: [[LOC_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK-DAG: [[LOC_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD]] to ptr addrspace(4)
// CHECK-DAG: store ptr addrspace(4) [[LOC_CAST]], ptr addrspace(4) [[NoAS]].ascast
// CHECK: [[LOC_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK: [[LOC_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD]] to ptr addrspace(4)
// CHECK: store ptr addrspace(4) [[LOC_CAST]], ptr addrspace(4) [[NoAS]].ascast
NoAS = (int *)LOC;
// CHECK-DAG: [[PRIV_LOAD:%[a-zA-Z0-9]+]] = load ptr, ptr addrspace(4) [[PRIV]].ascast
// CHECK-DAG: [[PRIV_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr [[PRIV_LOAD]] to ptr addrspace(4)
// CHECK-DAG: store ptr addrspace(4) [[PRIV_CAST]], ptr addrspace(4) [[NoAS]].ascast
// CHECK: [[PRIV_LOAD:%[a-zA-Z0-9]+]] = load ptr, ptr addrspace(4) [[PRIV]].ascast
// CHECK: [[PRIV_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr [[PRIV_LOAD]] to ptr addrspace(4)
// CHECK: store ptr addrspace(4) [[PRIV_CAST]], ptr addrspace(4) [[NoAS]].ascast
NoAS = (int *)PRIV;
// From default address space to named address space
// CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK-DAG: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(4) [[NoAS_LOAD]] to ptr addrspace(1)
// CHECK-DAG: store ptr addrspace(1) [[NoAS_CAST]], ptr addrspace(4) [[GLOB]].ascast
// CHECK: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(4) [[NoAS_LOAD]] to ptr addrspace(1)
// CHECK: store ptr addrspace(1) [[NoAS_CAST]], ptr addrspace(4) [[GLOB]].ascast
GLOB = (__attribute__((opencl_global)) int *)NoAS;
// CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK-DAG: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(4) [[NoAS_LOAD]] to ptr addrspace(3)
// CHECK-DAG: store ptr addrspace(3) [[NoAS_CAST]], ptr addrspace(4) [[LOC]].ascast
// CHECK: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(4) [[NoAS_LOAD]] to ptr addrspace(3)
// CHECK: store ptr addrspace(3) [[NoAS_CAST]], ptr addrspace(4) [[LOC]].ascast
LOC = (__attribute__((opencl_local)) int *)NoAS;
// CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK-DAG: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(4) [[NoAS_LOAD]] to ptr
// CHECK-DAG: store ptr [[NoAS_CAST]], ptr addrspace(4) [[PRIV]].ascast
// CHECK: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(4) [[NoAS_LOAD]] to ptr
// CHECK: store ptr [[NoAS_CAST]], ptr addrspace(4) [[PRIV]].ascast
PRIV = (__attribute__((opencl_private)) int *)NoAS;
// From opencl_global_[host/device] address spaces to opencl_global
// CHECK-DAG: [[GLOBDEVICE_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(5), ptr addrspace(4) [[GLOB_DEVICE]].ascast
// CHECK-DAG: [[GLOBDEVICE_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(5) [[GLOBDEVICE_LOAD]] to ptr addrspace(1)
// CHECK-DAG: store ptr addrspace(1) [[GLOBDEVICE_CAST]], ptr addrspace(4) [[GLOB]].ascast
// CHECK: [[GLOBDEVICE_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(5), ptr addrspace(4) [[GLOB_DEVICE]].ascast
// CHECK: [[GLOBDEVICE_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(5) [[GLOBDEVICE_LOAD]] to ptr addrspace(1)
// CHECK: store ptr addrspace(1) [[GLOBDEVICE_CAST]], ptr addrspace(4) [[GLOB]].ascast
GLOB = (__attribute__((opencl_global)) int *)GLOBDEVICE;
// CHECK-DAG: [[GLOBHOST_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(6), ptr addrspace(4) [[GLOB_HOST]].ascast
// CHECK-DAG: [[GLOBHOST_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(6) [[GLOBHOST_LOAD]] to ptr addrspace(1)
// CHECK-DAG: store ptr addrspace(1) [[GLOBHOST_CAST]], ptr addrspace(4) [[GLOB]].ascast
// CHECK: [[GLOBHOST_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(6), ptr addrspace(4) [[GLOB_HOST]].ascast
// CHECK: [[GLOBHOST_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(6) [[GLOBHOST_LOAD]] to ptr addrspace(1)
// CHECK: store ptr addrspace(1) [[GLOBHOST_CAST]], ptr addrspace(4) [[GLOB]].ascast
GLOB = (__attribute__((opencl_global)) int *)GLOBHOST;

bar(*GLOB);
// CHECK-DAG: [[GLOB_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK-DAG: [[GLOB_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD]] to ptr addrspace(4)
// CHECK-DAG: call spir_func void @[[RAW_REF]](ptr addrspace(4) noundef align 4 dereferenceable(4) [[GLOB_CAST]])
// CHECK: [[GLOB_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK: [[GLOB_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD]] to ptr addrspace(4)
// CHECK: call spir_func void @[[RAW_REF]](ptr addrspace(4) noundef align 4 dereferenceable(4) [[GLOB_CAST]])
bar2(*GLOB);
// CHECK-DAG: [[GLOB_LOAD2:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK-DAG: [[GLOB_CAST2:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD2]] to ptr addrspace(4)
// CHECK-DAG: call spir_func void @[[RAW_REF2]](ptr addrspace(4) noundef align 4 dereferenceable(4) [[GLOB_CAST2]])
// CHECK: [[GLOB_LOAD2:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK: [[GLOB_CAST2:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD2]] to ptr addrspace(4)
// CHECK: call spir_func void @[[RAW_REF2]](ptr addrspace(4) noundef align 4 dereferenceable(4) [[GLOB_CAST2]])

bar(*LOC);
// CHECK-DAG: [[LOC_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK-DAG: call spir_func void [[LOC_REF]](ptr addrspace(3) noundef align 4 dereferenceable(4) [[LOC_LOAD]])
// CHECK: [[LOC_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK: call spir_func void [[LOC_REF]](ptr addrspace(3) noundef align 4 dereferenceable(4) [[LOC_LOAD]])
bar2(*LOC);
// CHECK-DAG: [[LOC_LOAD2:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK-DAG: [[LOC_CAST2:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD2]] to ptr addrspace(4)
// CHECK-DAG: call spir_func void @[[RAW_REF2]](ptr addrspace(4) noundef align 4 dereferenceable(4) [[LOC_CAST2]])
// CHECK: [[LOC_LOAD2:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK: [[LOC_CAST2:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD2]] to ptr addrspace(4)
// CHECK: call spir_func void @[[RAW_REF2]](ptr addrspace(4) noundef align 4 dereferenceable(4) [[LOC_CAST2]])

bar(*NoAS);
// CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK-DAG: call spir_func void @[[RAW_REF]](ptr addrspace(4) noundef align 4 dereferenceable(4) [[NoAS_LOAD]])
// CHECK: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK: call spir_func void @[[RAW_REF]](ptr addrspace(4) noundef align 4 dereferenceable(4) [[NoAS_LOAD]])
bar2(*NoAS);
// CHECK-DAG: [[NoAS_LOAD2:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK-DAG: call spir_func void @[[RAW_REF2]](ptr addrspace(4) noundef align 4 dereferenceable(4) [[NoAS_LOAD2]])
// CHECK: [[NoAS_LOAD2:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK: call spir_func void @[[RAW_REF2]](ptr addrspace(4) noundef align 4 dereferenceable(4) [[NoAS_LOAD2]])

foo(GLOB);
// CHECK-DAG: [[GLOB_LOAD3:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK-DAG: [[GLOB_CAST3:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD3]] to ptr addrspace(4)
// CHECK-DAG: call spir_func void @[[RAW_PTR]](ptr addrspace(4) noundef [[GLOB_CAST3]])
// CHECK: [[GLOB_LOAD3:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK: [[GLOB_CAST3:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD3]] to ptr addrspace(4)
// CHECK: call spir_func void @[[RAW_PTR]](ptr addrspace(4) noundef [[GLOB_CAST3]])
foo2(GLOB);
// CHECK-DAG: [[GLOB_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK-DAG: [[GLOB_CAST4:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD4]] to ptr addrspace(4)
// CHECK-DAG: call spir_func void @[[RAW_PTR2]](ptr addrspace(4) noundef [[GLOB_CAST4]])
// CHECK: [[GLOB_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK: [[GLOB_CAST4:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD4]] to ptr addrspace(4)
// CHECK: call spir_func void @[[RAW_PTR2]](ptr addrspace(4) noundef [[GLOB_CAST4]])
foo(LOC);
// CHECK-DAG: [[LOC_LOAD3:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK-DAG: call spir_func void [[LOC_PTR]](ptr addrspace(3) noundef [[LOC_LOAD3]])
// CHECK: [[LOC_LOAD3:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK: call spir_func void [[LOC_PTR]](ptr addrspace(3) noundef [[LOC_LOAD3]])
foo2(LOC);
// CHECK-DAG: [[LOC_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK-DAG: [[LOC_CAST4:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD4]] to ptr addrspace(4)
// CHECK-DAG: call spir_func void @[[RAW_PTR2]](ptr addrspace(4) noundef [[LOC_CAST4]])
// CHECK: [[LOC_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK: [[LOC_CAST4:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD4]] to ptr addrspace(4)
// CHECK: call spir_func void @[[RAW_PTR2]](ptr addrspace(4) noundef [[LOC_CAST4]])
foo(NoAS);
// CHECK-DAG: [[NoAS_LOAD3:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK-DAG: call spir_func void @[[RAW_PTR]](ptr addrspace(4) noundef [[NoAS_LOAD3]])
// CHECK: [[NoAS_LOAD3:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK: call spir_func void @[[RAW_PTR]](ptr addrspace(4) noundef [[NoAS_LOAD3]])
foo2(NoAS);
// CHECK-DAG: [[NoAS_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK-DAG: call spir_func void @[[RAW_PTR2]](ptr addrspace(4) noundef [[NoAS_LOAD4]])
// CHECK: [[NoAS_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK: call spir_func void @[[RAW_PTR2]](ptr addrspace(4) noundef [[NoAS_LOAD4]])

// Ensure that we still get 3 different template instantiations.
tmpl(GLOB);
// CHECK-DAG: [[GLOB_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK-DAG: call spir_func void @_Z4tmplIPU3AS1iEvT_(ptr addrspace(1) noundef [[GLOB_LOAD4]])
// CHECK: [[GLOB_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr addrspace(4) [[GLOB]].ascast
// CHECK: call spir_func void @_Z4tmplIPU3AS1iEvT_(ptr addrspace(1) noundef [[GLOB_LOAD4]])
tmpl(LOC);
// CHECK-DAG: [[LOC_LOAD5:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK-DAG: call spir_func void @_Z4tmplIPU3AS3iEvT_(ptr addrspace(3) noundef [[LOC_LOAD5]])
// CHECK: [[LOC_LOAD5:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr addrspace(4) [[LOC]].ascast
// CHECK: call spir_func void @_Z4tmplIPU3AS3iEvT_(ptr addrspace(3) noundef [[LOC_LOAD5]])
tmpl(PRIV);
// CHECK-DAG: [[PRIV_LOAD5:%[a-zA-Z0-9]+]] = load ptr, ptr addrspace(4) [[PRIV]].ascast
// CHECK-DAG: call spir_func void @_Z4tmplIPU3AS0iEvT_(ptr noundef [[PRIV_LOAD5]])
// CHECK: [[PRIV_LOAD5:%[a-zA-Z0-9]+]] = load ptr, ptr addrspace(4) [[PRIV]].ascast
// CHECK: call spir_func void @_Z4tmplIPU3AS0iEvT_(ptr noundef [[PRIV_LOAD5]])
tmpl(NoAS);
// CHECK-DAG: [[NoAS_LOAD5:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK-DAG: call spir_func void @_Z4tmplIPiEvT_(ptr addrspace(4) noundef [[NoAS_LOAD5]])
// CHECK: [[NoAS_LOAD5:%[a-zA-Z0-9]+]] = load ptr addrspace(4), ptr addrspace(4) [[NoAS]].ascast
// CHECK: call spir_func void @_Z4tmplIPiEvT_(ptr addrspace(4) noundef [[NoAS_LOAD5]])
}

// CHECK-DAG: define linkonce_odr spir_func void @_Z4tmplIPU3AS1iEvT_(ptr addrspace(1) noundef %
// CHECK-DAG: define linkonce_odr spir_func void @_Z4tmplIPU3AS3iEvT_(ptr addrspace(3) noundef %
// CHECK-DAG: define linkonce_odr spir_func void @_Z4tmplIPU3AS0iEvT_(ptr noundef %
// CHECK-DAG: define linkonce_odr spir_func void @_Z4tmplIPiEvT_(ptr addrspace(4) noundef %
// CHECK: define linkonce_odr spir_func void @_Z4tmplIPU3AS1iEvT_(ptr addrspace(1) noundef %
// CHECK: define linkonce_odr spir_func void @_Z4tmplIPU3AS3iEvT_(ptr addrspace(3) noundef %
// CHECK: define linkonce_odr spir_func void @_Z4tmplIPU3AS0iEvT_(ptr noundef %
// CHECK: define linkonce_odr spir_func void @_Z4tmplIPiEvT_(ptr addrspace(4) noundef %
Loading