[NVPTX][AMDGPU][CodeGen] Fix local_space nullptr handling for NVPTX and local/private nullptr value for AMDGPU. #78759

Merged: 2 commits, Feb 26, 2024

Conversation

mmoadeli (Contributor)

  • Address-space cast nullptr in local_space to generic_space for the CUDA backend. Without this cast, the associated variable ends up with an invalid local-memory base address.
  • For AMD GPU, assign a null value of ~0 for the sycl_local and sycl_private address spaces, matching opencl_local and opencl_private.
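
A minimal sketch of the source pattern the bullets above describe, assuming a SYCL device compilation as in the tests added by this PR; the function name is illustrative only:

void reset_pointers() {
  // NVPTX: the stored null is now addrspacecast (ptr null to ptr addrspace(3))
  // rather than a plain zero-valued addrspace(3) null.
  __attribute__((opencl_local)) int *LOC = nullptr;
  // AMDGPU: the null for the local/private address spaces is defined as ~0,
  // emitted in IR as an addrspacecast of a generic null.
  __attribute__((opencl_private)) int *PRIV = nullptr;
  (void)LOC;
  (void)PRIV;
}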

@llvmbot added labels clang, backend:AMDGPU, clang:frontend, clang:codegen on Jan 19, 2024
llvmbot (Member) commented Jan 19, 2024

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: None (mmoadeli)

Changes
  • Address-space cast nullptr in local_space to generic_space for the CUDA backend. Without this cast, the associated variable ends up with an invalid local-memory base address.
  • For AMD GPU, assign a null value of ~0 for the sycl_local and sycl_private address spaces, matching opencl_local and opencl_private.

Full diff: https://github.com/llvm/llvm-project/pull/78759.diff

5 Files Affected:

  • (modified) clang/lib/Basic/Targets/AMDGPU.h (+4-2)
  • (modified) clang/lib/CodeGen/Targets/NVPTX.cpp (+18)
  • (modified) clang/test/CodeGenSYCL/address-space-conversions.cpp (+4)
  • (added) clang/test/CodeGenSYCL/amd-address-space-conversions.cpp (+128)
  • (added) clang/test/CodeGenSYCL/cuda-address-space-conversions.cpp (+122)
diff --git a/clang/lib/Basic/Targets/AMDGPU.h b/clang/lib/Basic/Targets/AMDGPU.h
index 90a1516ecdd20d..94d6ee7f5f72df 100644
--- a/clang/lib/Basic/Targets/AMDGPU.h
+++ b/clang/lib/Basic/Targets/AMDGPU.h
@@ -418,8 +418,10 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUTargetInfo final : public TargetInfo {
   // value ~0.
   uint64_t getNullPointerValue(LangAS AS) const override {
     // FIXME: Also should handle region.
-    return (AS == LangAS::opencl_local || AS == LangAS::opencl_private)
-      ? ~0 : 0;
+    return (AS == LangAS::opencl_local || AS == LangAS::opencl_private ||
+            AS == LangAS::sycl_local || AS == LangAS::sycl_private)
+               ? ~0
+               : 0;
   }
 
   void setAuxTarget(const TargetInfo *Aux) override;
diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp b/clang/lib/CodeGen/Targets/NVPTX.cpp
index d0dc7c258a03a6..8718f1ecf3a7e0 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -47,6 +47,10 @@ class NVPTXTargetCodeGenInfo : public TargetCodeGenInfo {
                            CodeGen::CodeGenModule &M) const override;
   bool shouldEmitStaticExternCAliases() const override;
 
+  llvm::Constant *getNullPointer(const CodeGen::CodeGenModule &CGM,
+                                 llvm::PointerType *T,
+                                 QualType QT) const override;
+
   llvm::Type *getCUDADeviceBuiltinSurfaceDeviceType() const override {
     // On the device side, surface reference is represented as an object handle
     // in 64-bit integer.
@@ -285,6 +289,20 @@ void NVPTXTargetCodeGenInfo::addNVVMMetadata(llvm::GlobalValue *GV,
 bool NVPTXTargetCodeGenInfo::shouldEmitStaticExternCAliases() const {
   return false;
 }
+
+llvm::Constant *
+NVPTXTargetCodeGenInfo::getNullPointer(const CodeGen::CodeGenModule &CGM,
+                                       llvm::PointerType *PT,
+                                       QualType QT) const {
+  auto &Ctx = CGM.getContext();
+  if (PT->getAddressSpace() != Ctx.getTargetAddressSpace(LangAS::opencl_local))
+    return llvm::ConstantPointerNull::get(PT);
+
+  auto NPT = llvm::PointerType::get(
+      PT->getContext(), Ctx.getTargetAddressSpace(LangAS::opencl_generic));
+  return llvm::ConstantExpr::getAddrSpaceCast(
+      llvm::ConstantPointerNull::get(NPT), PT);
+}
 }
 
 void CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F,
diff --git a/clang/test/CodeGenSYCL/address-space-conversions.cpp b/clang/test/CodeGenSYCL/address-space-conversions.cpp
index 3933ad375412da..10a181318a174b 100644
--- a/clang/test/CodeGenSYCL/address-space-conversions.cpp
+++ b/clang/test/CodeGenSYCL/address-space-conversions.cpp
@@ -25,6 +25,10 @@ void usages() {
   __attribute__((opencl_local)) int *LOC;
   // CHECK-DAG: [[NoAS:%[a-zA-Z0-9]+]] = alloca ptr addrspace(4)
   // CHECK-DAG: [[NoAS]].ascast = addrspacecast ptr [[NoAS]] to ptr addrspace(4)
+  LOC = nullptr;
+  // CHECK-DAG: store ptr addrspace(3) null, ptr addrspace(4) [[LOC]].ascast, align 8
+  GLOB = nullptr;
+  // CHECK-DAG: store ptr addrspace(1) null, ptr addrspace(4) [[GLOB]].ascast, align 8
   int *NoAS;
   // CHECK-DAG: [[PRIV:%[a-zA-Z0-9]+]] = alloca ptr
   // CHECK-DAG: [[PRIV]].ascast = addrspacecast ptr [[PRIV]] to ptr addrspace(4)
diff --git a/clang/test/CodeGenSYCL/amd-address-space-conversions.cpp b/clang/test/CodeGenSYCL/amd-address-space-conversions.cpp
new file mode 100644
index 00000000000000..35da61cd8cbbe3
--- /dev/null
+++ b/clang/test/CodeGenSYCL/amd-address-space-conversions.cpp
@@ -0,0 +1,128 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s
+void bar(int &Data) {}
+// CHECK-DAG: define dso_local void @[[RAW_REF:[a-zA-Z0-9_]+]](ptr noundef nonnull align 4 dereferenceable(4) %
+void bar2(int &Data) {}
+// CHECK-DAG: define dso_local void @[[RAW_REF2:[a-zA-Z0-9_]+]](ptr noundef nonnull align 4 dereferenceable(4) %
+void bar(__attribute__((opencl_local)) int &Data) {}
+// CHECK-DAG: define dso_local void @[[LOCAL_REF:[a-zA-Z0-9_]+]](ptr addrspace(3) noundef align 4 dereferenceable(4) %
+void foo(int *Data) {}
+// CHECK-DAG: define dso_local void @[[RAW_PTR:[a-zA-Z0-9_]+]](ptr noundef %
+void foo2(int *Data) {}
+// CHECK-DAG: define dso_local void @[[RAW_PTR2:[a-zA-Z0-9_]+]](ptr noundef %
+void foo(__attribute__((opencl_local)) int *Data) {}
+// CHECK-DAG: define dso_local void @[[LOC_PTR:[a-zA-Z0-9_]+]](ptr addrspace(3) noundef %
+
+template <typename T>
+void tmpl(T t);
+// See Check Lines below.
+
+void usages() {
+  __attribute__((opencl_global)) int *GLOB;
+  // CHECK-DAG: [[GLOB:%[a-zA-Z0-9]+]] = alloca ptr addrspace(1), align 8, addrspace(5)
+  __attribute__((opencl_local)) int *LOC;
+  // CHECK-DAG: [[LOC:%[a-zA-Z0-9]+]] = alloca ptr addrspace(3), align 4, addrspace(5)
+  LOC = nullptr;
+  // CHECK-DAG: store ptr addrspace(3) addrspacecast (ptr null to ptr addrspace(3)), ptr [[LOC]].ascast, align 4
+  GLOB = nullptr;
+  // CHECK-DAG: store ptr addrspace(1) null, ptr [[GLOB]].ascast, align 8
+  int *NoAS;
+  // CHECK-DAG: [[NoAS:%[a-zA-Z0-9]+]] = alloca ptr, align 8, addrspace(5)
+  __attribute__((opencl_private)) int *PRIV;
+  // CHECK-DAG: [[PRIV:%[a-zA-Z0-9]+]] = alloca ptr addrspace(5), align 4, addrspace(5)
+  __attribute__((opencl_global_device)) int *GLOBDEVICE;
+  // CHECK-DAG: [[GLOB_DEVICE:%[a-zA-Z0-9]+]] = alloca ptr addrspace(1), align 8, addrspace(5)
+  __attribute__((opencl_global_host)) int *GLOBHOST;
+  // CHECK-DAG: [[GLOB_HOST:%[a-zA-Z0-9]+]] = alloca ptr addrspace(1), align 8, addrspace(5)
+  NoAS = (int *)GLOB;
+  // CHECK-DAG: [[GLOB_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]].ascast, align 8
+  // CHECK-DAG: [[GLOB_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD]] to ptr
+  // CHECK-DAG: store ptr [[GLOB_CAST]], ptr [[NoAS]].ascast, align 8
+  NoAS = (int *)LOC;
+  // CHECK-DAG: [[LOC_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]].ascast, align 4
+  // CHECK-DAG: [[LOC_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD]] to ptr
+  // CHECK-DAG: store ptr [[LOC_CAST]], ptr [[NoAS]].ascast, align 8
+  NoAS = (int *)PRIV;
+  // CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(5), ptr [[PRIV]].ascast, align 4
+  // CHECK-DAG: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(5) [[NoAS_LOAD]] to ptr
+  // CHECK-DAG: store ptr %5, ptr [[NoAS]].ascast, align 8
+  GLOB = (__attribute__((opencl_global)) int *)NoAS;
+  // CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]].ascast, align 8
+  // CHECK-DAG: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr %6 to ptr addrspace(1)
+  // CHECK-DAG: store ptr addrspace(1) %7, ptr [[GLOB]].ascast, align 8
+  LOC = (__attribute__((opencl_local)) int *)NoAS;
+  // CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]].ascast, align 8
+  // CHECK-DAG: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr [[NoAS_LOAD]] to ptr addrspace(3)
+  // CHECK-DAG: store ptr addrspace(3) %9, ptr [[LOC]].ascast, align 4
+  PRIV = (__attribute__((opencl_private)) int *)NoAS;
+  // CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]].ascast, align 8
+  // CHECK-DAG: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr [[NoAS_LOAD]] to ptr addrspace(5)
+  // CHECK-DAG: store ptr addrspace(5) [[NoAS_CAST]], ptr [[PRIV]].ascast, align 4
+  GLOB = (__attribute__((opencl_global)) int *)GLOBDEVICE;
+  // CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]]DEVICE.ascast, align 8
+  // CHECK-DAG: store ptr addrspace(1) [[NoAS_LOAD]], ptr [[GLOB]].ascast, align 8
+  GLOB = (__attribute__((opencl_global)) int *)GLOBHOST;
+  // CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]]HOST.ascast, align 8
+  // CHECK-DAG: tore ptr addrspace(1) [[NoAS_LOAD]], ptr [[GLOB]].ascast, align 8
+  bar(*GLOB);
+  // CHECK-DAG: [[GLOB_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]].ascast, align 8
+  // CHECK-DAG: [[GLOB_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD]] to ptr
+  // CHECK-DAG: call void @[[RAW_REF]](ptr noundef nonnull align 4 dereferenceable(4) [[GLOB_CAST]])
+  bar2(*GLOB);
+  // CHECK-DAG: [[GLOB_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]].ascast, align 8
+  // CHECK-DAG: [[GLOB_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD]] to ptr
+  // CHECK-DAG: call void @[[RAW_REF2]](ptr noundef nonnull align 4 dereferenceable(4) [[GLOB_CAST]])
+  bar(*LOC);
+  // CHECK-DAG: [[LOC_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]].ascast, align 4
+  // CHECK-DAG: call void @_Z3barRU3AS3i(ptr addrspace(3) noundef align 4 dereferenceable(4) [[LOC_LOAD]])
+  bar2(*LOC);
+  // CHECK-DAG: [[LOC_LOAD2:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]].ascast, align 4
+  // CHECK-DAG: [[LOC_CAST2:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD2]] to ptr
+  // CHECK-DAG: call void @_Z4bar2Ri(ptr noundef nonnull align 4 dereferenceable(4) [[LOC_CAST2]])
+  bar(*NoAS);
+  // CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]].ascast, align 8
+  // CHECK-DAG: call void @_Z3barRi(ptr noundef nonnull align 4 dereferenceable(4) [[NoAS_LOAD]])
+  bar2(*NoAS);
+  // CHECK-DAG: [[NoAS_LOAD2:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]].ascast, align 8
+  // CHECK-DAG: call void @_Z4bar2Ri(ptr noundef nonnull align 4 dereferenceable(4) [[NoAS_LOAD2]])
+  foo(GLOB);
+  // CHECK-DAG: [[GLOB_LOAD3:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]].ascast, align 8
+  // CHECK-DAG: [[GLOB_CAST3:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD3]] to ptr
+  // CHECK-DAG: call void @[[RAW_PTR]](ptr noundef [[GLOB_CAST3]])
+   foo2(GLOB);
+  // CHECK-DAG: [[GLOB_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]].ascast, align 8
+  // CHECK-DAG: [[GLOB_CAST4:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD4]] to ptr
+  // CHECK-DAG: call void @[[RAW_PTR2]](ptr noundef [[GLOB_CAST4]])
+  foo(LOC);
+  // CHECK-DAG: [[LOC_LOAD3:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]].ascast, align 4
+  // CHECK-DAG: call void @[[LOC_PTR]](ptr addrspace(3) noundef [[LOC_LOAD3]])
+  foo2(LOC);
+  // CHECK-DAG: [[LOC_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]].ascast, align 4
+  // CHECK-DAG: [[LOC_CAST4:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD4]] to ptr
+  // CHECK-DAG: call void @[[RAW_PTR2]](ptr noundef [[LOC_CAST4]])
+  foo(NoAS);
+  // CHECK-DAG: [[NoAS_LOAD3:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]].ascast, align 8
+  // CHECK-DAG: call void @[[RAW_PTR]](ptr noundef [[NoAS_LOAD3]])
+  foo2(NoAS);
+  // CHECK-DAG: [[NoAS_LOAD4:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]].ascast, align 8
+  // CHECK-DAG: call void @[[RAW_PTR2]](ptr noundef [[NoAS_LOAD4]])
+
+  // Ensure that we still get 3 different template instantiations.
+  tmpl(GLOB);
+  // CHECK-DAG: [[GLOB_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]].ascast, align 8
+  // CHECK-DAG: call void @_Z4tmplIPU3AS1iEvT_(ptr addrspace(1) noundef [[GLOB_LOAD4]])
+  tmpl(LOC);
+  // CHECK-DAG: [[LOC_LOAD5:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]].ascast, align 4
+  // CHECK-DAG: call void @_Z4tmplIPU3AS3iEvT_(ptr addrspace(3) noundef [[LOC_LOAD5]])
+  tmpl(PRIV);
+  // CHECK-DAG: [[PRIV_LOAD5:%[a-zA-Z0-9]+]] = load ptr addrspace(5), ptr [[PRIV]].ascast, align 4
+  // CHECK-DAG: call void @_Z4tmplIPU3AS5iEvT_(ptr addrspace(5) noundef [[PRIV_LOAD5]])
+  tmpl(NoAS);
+  // CHECK-DAG: [[NoAS_LOAD5:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]].ascast, align 8
+  // CHECK-DAG: call void @_Z4tmplIPiEvT_(ptr noundef [[NoAS_LOAD5]])
+}
+
+// CHECK-DAG: declare void @_Z4tmplIPU3AS1iEvT_(ptr addrspace(1) noundef)
+// CHECK-DAG: declare void @_Z4tmplIPU3AS3iEvT_(ptr addrspace(3) noundef)
+// CHECK-DAG: declare void @_Z4tmplIPU3AS5iEvT_(ptr addrspace(5) noundef)
+// CHECK-DAG: declare void @_Z4tmplIPiEvT_(ptr noundef)
+
diff --git a/clang/test/CodeGenSYCL/cuda-address-space-conversions.cpp b/clang/test/CodeGenSYCL/cuda-address-space-conversions.cpp
new file mode 100644
index 00000000000000..82b80d8fc6b6ff
--- /dev/null
+++ b/clang/test/CodeGenSYCL/cuda-address-space-conversions.cpp
@@ -0,0 +1,122 @@
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s
+void bar(int &Data) {}
+// CHECK-DAG: define dso_local void @[[RAW_REF:[a-zA-Z0-9_]+]](ptr noundef nonnull align 4 dereferenceable(4) %
+void bar2(int &Data) {}
+// CHECK-DAG: define dso_local void @[[RAW_REF2:[a-zA-Z0-9_]+]](ptr noundef nonnull align 4 dereferenceable(4) %
+void bar(__attribute__((opencl_local)) int &Data) {}
+// CHECK-DAG: define dso_local void @[[LOCAL_REF:[a-zA-Z0-9_]+]](ptr addrspace(3) noundef align 4 dereferenceable(4) %
+void foo(int *Data) {}
+// CHECK-DAG: define dso_local void @[[RAW_PTR:[a-zA-Z0-9_]+]](ptr noundef %
+void foo2(int *Data) {}
+// CHECK-DAG: define dso_local void @[[RAW_PTR2:[a-zA-Z0-9_]+]](ptr noundef %
+void foo(__attribute__((opencl_local)) int *Data) {}
+// CHECK-DAG: define dso_local void @[[LOC_PTR:[a-zA-Z0-9_]+]](ptr addrspace(3) noundef %
+
+template <typename T>
+void tmpl(T t);
+// See Check Lines below.
+
+void usages() {
+  __attribute__((opencl_global)) int *GLOB;
+  // CHECK-DAG: [[GLOB:%[a-zA-Z0-9]+]] = alloca ptr addrspace(1), align 8
+  __attribute__((opencl_local)) int *LOC;
+  // CHECK-DAG: [[LOC:%[a-zA-Z0-9]+]] = alloca ptr addrspace(3), align 8
+  LOC = nullptr;
+  // CHECK-DAG: store ptr addrspace(3) addrspacecast (ptr null to ptr addrspace(3)), ptr [[LOC]], align 8
+  GLOB = nullptr;
+  // CHECK-DAG: store ptr addrspace(1) null, ptr [[GLOB]], align 8
+  int *NoAS;
+  // CHECK-DAG: [[NoAS:%[a-zA-Z0-9]+]] = alloca ptr, align 8
+  __attribute__((opencl_private)) int *PRIV;
+  // CHECK-DAG: [[PRIV:%[a-zA-Z0-9]+]] = alloca ptr, align 8
+  __attribute__((opencl_global_device)) int *GLOBDEVICE;
+  // CHECK-DAG: [[GLOB_DEVICE:%[a-zA-Z0-9]+]] = alloca ptr addrspace(1), align 8
+  __attribute__((opencl_global_host)) int *GLOBHOST;
+  // CHECK-DAG: [[GLOB_HOST:%[a-zA-Z0-9]+]] = alloca ptr addrspace(1), align 8
+  NoAS = (int *)GLOB;
+  // CHECK-DAG: [[GLOB_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]], align 8
+  // CHECK-DAG: [[GLOB_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD]] to ptr
+  // CHECK-DAG: store ptr [[GLOB_CAST]], ptr [[NoAS]], align 8
+  NoAS = (int *)LOC;
+  // CHECK-DAG: [[LOC_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]], align 8
+  // CHECK-DAG: [[LOC_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD]] to ptr
+  // CHECK-DAG: store ptr [[LOC_CAST]], ptr [[NoAS]], align 8
+  NoAS = (int *)PRIV;
+  // CHECK-DAG: [[LOC_LOAD:%[a-zA-Z0-9]+]] = load ptr, ptr [[PRIV]], align 8
+  // CHECK-DAG: store ptr [[LOC_LOAD]], ptr [[NoAS]], align 8
+  GLOB = (__attribute__((opencl_global)) int *)NoAS;
+  // CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]], align 8
+  // CHECK-DAG: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr [[NoAS_LOAD]] to ptr addrspace(1)
+  // CHECK-DAG: store ptr addrspace(1) [[NoAS_CAST]], ptr [[GLOB]], align 8
+  LOC = (__attribute__((opencl_local)) int *)NoAS;
+  // CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]], align 8
+  // CHECK-DAG: [[NoAS_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr [[NoAS_LOAD]] to ptr addrspace(3)
+  // CHECK-DAG: store ptr addrspace(3) [[NoAS_CAST]], ptr [[LOC]], align 8
+  PRIV = (__attribute__((opencl_private)) int *)NoAS;
+  // CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]], align 8
+  // CHECK-DAG: store ptr [[NoAS_LOAD]], ptr [[PRIV]], align 8
+  GLOB = (__attribute__((opencl_global)) int *)GLOBDEVICE;
+  // CHECK-DAG: [[GLOBDEVICE_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB_DEVICE]], align 8
+  // CHECK-DAG: store ptr addrspace(1) [[GLOBDEVICE_LOAD]], ptr %GLOB, align 8
+  GLOB = (__attribute__((opencl_global)) int *)GLOBHOST;
+  // CHECK-DAG: [[GLOB_HOST_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB_HOST]], align 8
+  // CHECK-DAG: store ptr addrspace(1) [[GLOB_HOST_LOAD]], ptr [[GLOB]], align 8
+  bar(*GLOB);
+  // CHECK-DAG: [[GLOB_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]], align 8
+  // CHECK-DAG: [[GLOB_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD]] to ptr
+  // CHECK-DAG: call void @[[RAW_REF]](ptr noundef nonnull align 4 dereferenceable(4) [[GLOB_CAST]])
+  bar2(*GLOB);
+  // CHECK-DAG: [[GLOB_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]], align 8
+  // CHECK-DAG: [[GLOB_CAST:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD]] to ptr
+  // CHECK-DAG: call void @[[RAW_REF2]](ptr noundef nonnull align 4 dereferenceable(4) [[GLOB_CAST]])
+  bar(*LOC);
+  // CHECK-DAG: [[LOC_LOAD:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]], align 8
+  // CHECK-DAG: call void @[[LOCAL_REF]](ptr addrspace(3) noundef align 4 dereferenceable(4) [[LOC_LOAD]])
+  bar2(*LOC);
+  // CHECK-DAG: [[LOC_LOAD2:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]], align 8
+  // CHECK-DAG: [[LOC_CAST2:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD2]] to ptr
+  // CHECK-DAG: call void @[[RAW_REF2]](ptr noundef nonnull align 4 dereferenceable(4) [[LOC_CAST2]])
+  bar(*NoAS);
+  // CHECK-DAG: [[NoAS_LOAD:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]], align 8
+  // CHECK-DAG: call void @[[RAW_REF]](ptr noundef nonnull align 4 dereferenceable(4) [[NoAS_LOAD]])
+  bar2(*NoAS);
+  // CHECK-DAG: [[NoAS_LOAD2:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]], align 8
+  // CHECK-DAG: call void @[[RAW_REF2]](ptr noundef nonnull align 4 dereferenceable(4) [[NoAS_LOAD2]])
+  foo(GLOB);
+  // CHECK-DAG: [[GLOB_LOAD3:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]], align 8
+  // CHECK-DAG: [[GLOB_CAST3:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD3]] to ptr
+  // CHECK-DAG: call void @[[RAW_PTR]](ptr noundef [[GLOB_CAST3]])
+  foo2(GLOB);
+  // CHECK-DAG: [[GLOB_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]], align 8
+  // CHECK-DAG: [[GLOB_CAST4:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(1) [[GLOB_LOAD4]] to ptr
+  // CHECK-DAG: call void @[[RAW_PTR2]](ptr noundef [[GLOB_CAST4]])
+  foo(LOC);
+  // CHECK-DAG: [[LOC_LOAD3:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]], align 8
+  // CHECK-DAG: call void @[[LOC_PTR]](ptr addrspace(3) noundef [[LOC_LOAD3]])
+  foo2(LOC);
+  // CHECK-DAG: [[LOC_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]], align 8
+  // CHECK-DAG: [[LOC_CAST4:%[a-zA-Z0-9]+]] = addrspacecast ptr addrspace(3) [[LOC_LOAD4]] to ptr
+  // CHECK-DAG: call void @[[RAW_PTR2]](ptr noundef [[LOC_CAST4]])
+  foo(NoAS);
+  // CHECK-DAG: [[NoAS_LOAD3:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]], align 8
+  // CHECK-DAG: call void @[[RAW_PTR]](ptr noundef [[NoAS_LOAD3]])
+  foo2(NoAS);
+  // CHECK-DAG: [[NoAS_LOAD4:%[a-zA-Z0-9]+]] = load ptr, ptr [[NoAS]], align 8
+  // CHECK-DAG: call void @[[RAW_PTR2]](ptr noundef [[NoAS_LOAD4]])
+  tmpl(GLOB);
+  // CHECK-DAG: [[GLOB_LOAD4:%[a-zA-Z0-9]+]] = load ptr addrspace(1), ptr [[GLOB]], align 8
+  // CHECK-DAG: call void @_Z4tmplIPU3AS1iEvT_(ptr addrspace(1) noundef [[GLOB_LOAD4]])
+  tmpl(LOC);
+  // CHECK-DAG: [[LOC_LOAD5:%[a-zA-Z0-9]+]] = load ptr addrspace(3), ptr [[LOC]], align 8
+  // CHECK-DAG: call void @_Z4tmplIPU3AS3iEvT_(ptr addrspace(3) noundef [[LOC_LOAD5]])
+  tmpl(PRIV);
+  // CHECK-DAG: [[PRIV_LOAD5:%[a-zA-Z0-9]+]] = load ptr, ptr [[PRIV]], align 8
+  // CHECK-DAG: call void @_Z4tmplIPiEvT_(ptr noundef [[PRIV_LOAD5]])
+  tmpl(NoAS);
+// CHECK-DAG: %33 = load ptr, ptr %NoAS, align 8
+// CHECK-DAG: call void @_Z4tmplIPiEvT_(ptr noundef %33)
+}
+
+// CHECK-DAG: declare void @_Z4tmplIPU3AS1iEvT_(ptr addrspace(1) noundef)
+// CHECK-DAG: declare void @_Z4tmplIPU3AS3iEvT_(ptr addrspace(3) noundef)
+// CHECK-DAG: declare void @_Z4tmplIPiEvT_(ptr noundef)

@yxsamliu (Collaborator)

LGTM for amdgpu

@Artem-B (Member) left a comment

It would be great to add some tests for local AS null pointers for NVPTX and AMDGPU back-ends.

mmoadeli (Contributor, Author) commented Jan 22, 2024

It would be great to add some tests for local AS null pointers for NVPTX and AMDGPU back-ends.

Thanks @Artem-B

They already have it: here for CUDA and here for AMD.

      PT->getContext(), Ctx.getTargetAddressSpace(LangAS::opencl_generic));
  return llvm::ConstantExpr::getAddrSpaceCast(
      llvm::ConstantPointerNull::get(NPT), PT);
}
Artem-B (Member)

I don't quite understand what's going on here. Why are we ASC'ing all null pointers to LangAS::opencl_generic?

Will it work for CUDA (as in the CUDA language)? I think this code should be restricted to apply the ASC only for OpenCL and leave CUDA/HIP with the default.

mmoadeli (Contributor, Author)

The code below, from the AMDGPU file, shows that the ASC is already in place for HIP:
LOC = nullptr; // CHECK-DAG: store ptr addrspace(3) addrspacecast (ptr null to ptr addrspace(3)), ptr [[LOC]].ascast, align 4
For the CUDA case, not ASC'ing the nullptr in local_space results in undefined behaviour: only the 24 LSBs of the nullptr value are 0, and the rest change with each run. That is, if I copy a local_space nullptr in the kernel into a long variable and print the value, it won't be zero. So, overall, ASC'ing the local_space nullptr for CUDA is needed, just as it is already done for AMDGPU.
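
A sketch of that experiment (hypothetical code, not from the PR; the names are made up), compiled as SYCL device code for NVPTX:

bool local_null_reads_back_as_zero() {
  __attribute__((opencl_local)) int *LOC = nullptr;
  // Copy the local-space null into an integer, as described above; before this
  // change only the 24 LSBs were guaranteed to be zero on NVPTX.
  unsigned long long Bits = (unsigned long long)LOC;
  return Bits == 0;
}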

Contributor

Hi @Artem-B

I'm chiming in at @mmoadeli's request; I advised him on the resolution of his issue.

I don't quite understand what's going on here.

So it is a similar story to the one for the AMDGPU backend: 0 is a valid pointer into shared memory (it points to the base of shared memory), so we cannot use that value as nullptr. AMDGPU uses -1 (all bits set) for this, but we couldn't find anything equivalent in the CUDA/PTX documentation. After some investigation, we found that the most stable way to handle this is simply to insert this cast expression.

Note that 0 is the expected nullptr value for a pointer to the generic address space.

Why are we ASC'ing all null pointers to LangAS::opencl_generic ?

The patch isn't doing that: we only ASC if the pointer type is in the CUDA shared address space (OpenCL's local address space); otherwise it emits a plain llvm::ConstantPointerNull.
We used LangAS::opencl_generic to emphasize that a generic-to-shared address space cast is going on. The other option would be LangAS::Default to retrieve the target address space, but Default doesn't sound right to me, as you have to know it maps to NVPTX's generic target address space. Either way, we don't have a strong opinion on which to use, but a comment is probably needed regardless.

Will it work for CUDA (as in the CUDA language)? I think this code should be restricted to apply the ASC only for OpenCL and leave CUDA/HIP with the default.

So, yes and no. For the "Will it work for CUDA?" part: yes, it will, because you actually cannot hit this return. CUDA doesn't expose address spaces, so you can't have a nullptr in the CUDA shared address space, and the if above will always evaluate to true in CUDA.

For the "leave CUDA/HIP with the default" part: you could force things and use target address spaces, as is done in the clang headers for CUDA, and this change would capture that. However, as explained before, 0 in address space 3 (NVPTX backend) is a valid address, and that is very easy to highlight in SASS.
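
For illustration, a hypothetical sketch ("forcing" a target address space from CUDA-style source; the attribute usage and names here are assumptions, not taken from the PR or the clang headers). A pointer like this is the only way plain CUDA code would reach the new getNullPointer path:

__attribute__((device)) void zero_shared_pointer() {
  // Raw target address space 3 is NVPTX shared memory, i.e. OpenCL's local space,
  // so this null would now be emitted as an addrspacecast of a generic null.
  __attribute__((address_space(3))) int *p = nullptr;
  (void)p;
}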

Member

As far as I can tell, the reason the AMDGPU code uses an addrspacecast is the following comment: // Currently LLVM assumes null pointers always have value 0, which results in incorrectly transformed IR. It therefore can't use a null literal for the all-ones value.

Looking at the LangRef, I can't actually see anywhere stating that ptr addrspace(X) null is the zero value, so this should probably be clarified in the LangRef: https://llvm.org/docs/LangRef.html#t-pointer

Contributor

"The semantics of non-zero address spaces are target-specific. Memory access through a non-dereferenceable pointer is undefined behavior in any address space. Pointers with the bit-value 0 are only assumed to be non-dereferenceable in address space 0, unless the function is marked with the null_pointer_is_valid attribute."

Contributor

I think this really needs to be a property of the data layout; the addrspacecast is a roundabout way of using -1.
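
To make the contrast concrete, here is a standalone sketch (assuming the same LLVM C++ API the patch itself uses; not part of the change) of the two representations under discussion: a literal addrspace(3) null versus the addrspacecast of a generic null that currently stands in for a non-zero null.

#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

llvm::Constant *makeLocalNull(llvm::LLVMContext &Ctx, bool ViaGenericCast) {
  auto *LocalPtrTy = llvm::PointerType::get(Ctx, /*AddressSpace=*/3);
  if (!ViaGenericCast)
    return llvm::ConstantPointerNull::get(LocalPtrTy); // widely assumed to be bit value 0
  // The "roundabout" form: a generic (addrspace 0) null cast into the local space,
  // which a target such as AMDGPU materializes as the non-zero value -1.
  auto *GenericPtrTy = llvm::PointerType::get(Ctx, /*AddressSpace=*/0);
  return llvm::ConstantExpr::getAddrSpaceCast(
      llvm::ConstantPointerNull::get(GenericPtrTy), LocalPtrTy);
}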

Member

Agreed. I believe that right now some LLVM passes assume a null constant is all zeroes even though LangRef does not specify it, so we need to either document those semantics or add a property to the data layout.

Artem-B (Member) commented Jan 22, 2024

  • Address-space cast nullptr in local_space to generic_space for the CUDA backend.

I think you mean "NVPTX back-end". CUDA is a front-end entity (C++ w/ GPU extensions)

mmoadeli (Contributor, Author) commented Jan 22, 2024

  • Address-space cast nullptr in local_space to generic_space for the CUDA backend.

I think you mean "NVPTX back-end". CUDA is a front-end entity (C++ w/ GPU extensions)

Sorry for the confusion. Conceptually you are right; I use SYCL terminology, which considers CUDA/HIP a backend.

@arsenm (Contributor) left a comment

amdgpu parts lgtm (which could be split into a separate change from the ptx change)

…generic space for the CUDA backend.

- In the context of AMD GPU, assigns a NULL value of `~0` for the address spaces of `sycl_local` and `sycl_private`
@arsenm arsenm merged commit f540044 into llvm:main Feb 26, 2024