Skip to content

[clang] Recover necessary AddrSpaceCast #119246

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 17, 2024
Merged

Conversation

JOE1994
Copy link
Member

@JOE1994 JOE1994 commented Dec 9, 2024

A necessary AddrSpaceCast was wrongfully deleted in 5c91b28 . Recover the AddrSpaceCast.

This fixes #86791 .

A necessary AddrSpaceCast was wrongfully deleted in 5c91b28 .
Recover the AddrSpaceCast.

This fixes llvm#86791 .
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang labels Dec 9, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-clang-codegen

Author: Youngsuk Kim (JOE1994)

Changes

A necessary AddrSpaceCast was wrongfully deleted in 5c91b28 . Recover the AddrSpaceCast.

This fixes #86791 .


Full diff: https://github.com/llvm/llvm-project/pull/119246.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/ItaniumCXXABI.cpp (+3)
  • (added) clang/test/OpenMP/amdgpu_threadprivate.cpp (+11)
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 8cbd09d02c7556..0abea335ad69e4 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -3302,6 +3302,9 @@ void ItaniumCXXABI::EmitThreadLocalInitFuncs(
       CharUnits Align = CGM.getContext().getDeclAlign(VD);
       Val = Builder.CreateAlignedLoad(Var->getValueType(), Val, Align);
     }
+    if (Val->getType() != Wrapper->getReturnType()) {
+      Val = Builder.CreateAddrSpaceCast(Val, Wrapper->getReturnType());
+    }
 
     Builder.CreateRet(Val);
   }
diff --git a/clang/test/OpenMP/amdgpu_threadprivate.cpp b/clang/test/OpenMP/amdgpu_threadprivate.cpp
new file mode 100644
index 00000000000000..5b15255ee62d4a
--- /dev/null
+++ b/clang/test/OpenMP/amdgpu_threadprivate.cpp
@@ -0,0 +1,11 @@
+// REQUIRES: asserts
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-cpu x86-64 -disable-llvm-passes -fopenmp-targets=amdgcn-amd-amdhsa -x c++ -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -target-cpu gfx906 -fopenmp -nogpulib -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -x c++ %s
+
+// Don't crash with assertions build.
+int MyGlobVar;
+#pragma omp threadprivate(MyGlobVar)
+int main() {
+  MyGlobVar = 1;
+}

@llvmbot
Copy link
Member

llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Youngsuk Kim (JOE1994)

Changes

A necessary AddrSpaceCast was wrongfully deleted in 5c91b28 . Recover the AddrSpaceCast.

This fixes #86791 .


Full diff: https://github.com/llvm/llvm-project/pull/119246.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/ItaniumCXXABI.cpp (+3)
  • (added) clang/test/OpenMP/amdgpu_threadprivate.cpp (+11)
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 8cbd09d02c7556..0abea335ad69e4 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -3302,6 +3302,9 @@ void ItaniumCXXABI::EmitThreadLocalInitFuncs(
       CharUnits Align = CGM.getContext().getDeclAlign(VD);
       Val = Builder.CreateAlignedLoad(Var->getValueType(), Val, Align);
     }
+    if (Val->getType() != Wrapper->getReturnType()) {
+      Val = Builder.CreateAddrSpaceCast(Val, Wrapper->getReturnType());
+    }
 
     Builder.CreateRet(Val);
   }
diff --git a/clang/test/OpenMP/amdgpu_threadprivate.cpp b/clang/test/OpenMP/amdgpu_threadprivate.cpp
new file mode 100644
index 00000000000000..5b15255ee62d4a
--- /dev/null
+++ b/clang/test/OpenMP/amdgpu_threadprivate.cpp
@@ -0,0 +1,11 @@
+// REQUIRES: asserts
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-cpu x86-64 -disable-llvm-passes -fopenmp-targets=amdgcn-amd-amdhsa -x c++ -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -target-cpu gfx906 -fopenmp -nogpulib -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -x c++ %s
+
+// Don't crash with assertions build.
+int MyGlobVar;
+#pragma omp threadprivate(MyGlobVar)
+int main() {
+  MyGlobVar = 1;
+}

@llvmbot
Copy link
Member

llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-clang

Author: Youngsuk Kim (JOE1994)

Changes

A necessary AddrSpaceCast was wrongfully deleted in 5c91b28 . Recover the AddrSpaceCast.

This fixes #86791 .


Full diff: https://github.com/llvm/llvm-project/pull/119246.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/ItaniumCXXABI.cpp (+3)
  • (added) clang/test/OpenMP/amdgpu_threadprivate.cpp (+11)
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 8cbd09d02c7556..0abea335ad69e4 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -3302,6 +3302,9 @@ void ItaniumCXXABI::EmitThreadLocalInitFuncs(
       CharUnits Align = CGM.getContext().getDeclAlign(VD);
       Val = Builder.CreateAlignedLoad(Var->getValueType(), Val, Align);
     }
+    if (Val->getType() != Wrapper->getReturnType()) {
+      Val = Builder.CreateAddrSpaceCast(Val, Wrapper->getReturnType());
+    }
 
     Builder.CreateRet(Val);
   }
diff --git a/clang/test/OpenMP/amdgpu_threadprivate.cpp b/clang/test/OpenMP/amdgpu_threadprivate.cpp
new file mode 100644
index 00000000000000..5b15255ee62d4a
--- /dev/null
+++ b/clang/test/OpenMP/amdgpu_threadprivate.cpp
@@ -0,0 +1,11 @@
+// REQUIRES: asserts
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-cpu x86-64 -disable-llvm-passes -fopenmp-targets=amdgcn-amd-amdhsa -x c++ -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -target-cpu gfx906 -fopenmp -nogpulib -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -x c++ %s
+
+// Don't crash with assertions build.
+int MyGlobVar;
+#pragma omp threadprivate(MyGlobVar)
+int main() {
+  MyGlobVar = 1;
+}

Copy link
Contributor

@s-barannikov s-barannikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May it be possible that the wrapper function's return type is invalid and the cast isn't needed?


// Don't crash with assertions build.
int MyGlobVar;
#pragma omp threadprivate(MyGlobVar)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not crash for sure, but I wonder is this a valid use of OpenMP? I don't know what I should expect from this semantics.

@JOE1994
Copy link
Member Author

JOE1994 commented Dec 12, 2024

May it be possible that the wrapper function's return type is invalid and the cast isn't needed?

LLVM-IR (from llvm without assertions, before this revision)
shows that the wrapper function returns ptr while MyGlobVar has addrspace(1).

$thread-local wrapper routine for MyGlobVar = comdat any

@MyGlobVar = external thread_local addrspace(1) global i32, align 4
@__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500

define weak_odr hidden noundef ptr @thread-local wrapper routine for MyGlobVar() local_unnamed_addr #0 comdat {
  %1 = tail call align 4 ptr addrspace(1) @llvm.threadlocal.address.p1(ptr addrspace(1) align 4 @MyGlobVar)
  ret ptr addrspace(1) %1
}

declare nonnull ptr addrspace(1) @llvm.threadlocal.address.p1(ptr addrspace(1) nonnull) #1

attributes #0 = { nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx906" "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" "uniform-work-group-size"="false" }
attributes #1 = { mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) }

Updating the wrapper function's return type as below also fixes the error:

--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -3078,6 +3078,10 @@ ItaniumCXXABI::getOrCreateThreadLocalWrapper(const VarDecl *VD,
       getContext().getPointerType(RetQT), FunctionArgList());

   llvm::FunctionType *FnTy = CGM.getTypes().GetFunctionType(FI);
+  // Adjust wrapper function's ret type to have matching llvm addrspace with Val.
+  if (llvm::Type *WrapperRetTy = FnTy->getReturnType();
+      WrapperRetTy->isPointerTy() && Val->getType()->isPointerTy() && WrapperRetTy != Val->getType())
+    FnTy = llvm::FunctionType::get(Val->getType(), FnTy->params(), FnTy->isVarArg());
   llvm::Function *Wrapper =
       llvm::Function::Create(FnTy, getThreadLocalWrapperLinkage(VD, CGM),
                              WrapperName.str(), &CGM.getModule());

As per feedback from reviewers.
@s-barannikov
Copy link
Contributor

Updating the wrapper function's return type as below also fixes the error:

I mean, why do we have to watch for type mismatches and fix them instead of creating the correct type in the first place?
There is getPointerType(RetQT) just above, shouldn't it be something like getPointerType(RetQT, <correct address space>)?
Or RetQT should have had the correct address space, but for some reason it was lost?

@JOE1994
Copy link
Member Author

JOE1994 commented Dec 12, 2024

I mean, why do we have to watch for type mismatches and fix them instead of creating the correct type in the first place?
There is getPointerType(RetQT) just above, shouldn't it be something like getPointerType(RetQT, )?
Or RetQT should have had the correct address space, but for some reason it was lost?

I totally agree with you.
I'm unfamiliar with Clang Qualtypes
and how clang QualTypes' addrspace info
match to llvm types' addrspace info,
so I shared a quick dirty fix to at least share a direction.

@s-barannikov
Copy link
Contributor

s-barannikov commented Dec 12, 2024

I'm unfamiliar with Clang Qualtypes

I see. Well, neither am I :) I brought this up in case someone else could suggest the preferred way of fixing the issue.
I don't have objections against the current approach, but I'm not qualified to review it either.

@arsenm
Copy link
Contributor

arsenm commented Dec 13, 2024

I mean, why do we have to watch for type mismatches and fix them instead of creating the correct type in the first place? There is getPointerType(RetQT) just above, shouldn't it be something like getPointerType(RetQT, <correct address space>)? Or RetQT should have had the correct address space, but for some reason it was lost?

Correct, this is generally how things should work. There are select contexts where a cast is necessary.

I'm unfamiliar with Clang Qualtypes and how clang QualTypes' addrspace info match to llvm types' addrspace info, so I shared a quick dirty fix to at least share a direction.

This depends where the value was created from. Generally you just query getTargetAddressSpace

@efriedma-quic
Copy link
Collaborator

The function in question is part of the Itanium C++ ABI; the ABI itself doesn't say anything about address-spaces, so by default we assume everything related to the C++ ABI is in the flat address-space. If we start messing with that, we'll need to write a specification somewhere to document exactly which functions use which address-spaces.

So I think casting here is appropriate; we can revisit if someone ever wants to define a "GPU Itanium ABI" with different address-spaces.

The change to ItaniumCXXABI.cpp looks fine (please address the review comments on the regression test).


// CHECK: @MyGlobVar = external thread_local addrspace(1) global i32, align 4
// CHECK: define weak_odr hidden noundef ptr @_ZTW9MyGlobVar() #0 comdat {
// CHECK-NEXT: %1 = call align 4 ptr addrspace(1) @llvm.threadlocal.address.p1(ptr addrspace(1) align 4 @MyGlobVar)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support this intrinsic?

Copy link
Member Author

@JOE1994 JOE1994 Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(For AMDGPU codegen) Doesn't seem so to me.

The intrinsic was added in 9701053

@JOE1994 JOE1994 merged commit b4c1ded into llvm:main Dec 17, 2024
8 checks passed
@JOE1994 JOE1994 deleted the fix_for_86791 branch December 17, 2024 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Clang][OpenMP] Compilation error on omp threadprivate when -fopenmp-targets=amdgcn-amd-amdhsa
6 participants