-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[clang] Recover necessary AddrSpaceCast #119246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
A necessary AddrSpaceCast was wrongfully deleted in 5c91b28 . Recover the AddrSpaceCast. This fixes llvm#86791 .
@llvm/pr-subscribers-clang-codegen Author: Youngsuk Kim (JOE1994) ChangesA necessary AddrSpaceCast was wrongfully deleted in 5c91b28 . Recover the AddrSpaceCast. This fixes #86791 . Full diff: https://github.com/llvm/llvm-project/pull/119246.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 8cbd09d02c7556..0abea335ad69e4 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -3302,6 +3302,9 @@ void ItaniumCXXABI::EmitThreadLocalInitFuncs(
CharUnits Align = CGM.getContext().getDeclAlign(VD);
Val = Builder.CreateAlignedLoad(Var->getValueType(), Val, Align);
}
+ if (Val->getType() != Wrapper->getReturnType()) {
+ Val = Builder.CreateAddrSpaceCast(Val, Wrapper->getReturnType());
+ }
Builder.CreateRet(Val);
}
diff --git a/clang/test/OpenMP/amdgpu_threadprivate.cpp b/clang/test/OpenMP/amdgpu_threadprivate.cpp
new file mode 100644
index 00000000000000..5b15255ee62d4a
--- /dev/null
+++ b/clang/test/OpenMP/amdgpu_threadprivate.cpp
@@ -0,0 +1,11 @@
+// REQUIRES: asserts
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-cpu x86-64 -disable-llvm-passes -fopenmp-targets=amdgcn-amd-amdhsa -x c++ -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -target-cpu gfx906 -fopenmp -nogpulib -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -x c++ %s
+
+// Don't crash with assertions build.
+int MyGlobVar;
+#pragma omp threadprivate(MyGlobVar)
+int main() {
+ MyGlobVar = 1;
+}
|
@llvm/pr-subscribers-backend-amdgpu Author: Youngsuk Kim (JOE1994) ChangesA necessary AddrSpaceCast was wrongfully deleted in 5c91b28 . Recover the AddrSpaceCast. This fixes #86791 . Full diff: https://github.com/llvm/llvm-project/pull/119246.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 8cbd09d02c7556..0abea335ad69e4 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -3302,6 +3302,9 @@ void ItaniumCXXABI::EmitThreadLocalInitFuncs(
CharUnits Align = CGM.getContext().getDeclAlign(VD);
Val = Builder.CreateAlignedLoad(Var->getValueType(), Val, Align);
}
+ if (Val->getType() != Wrapper->getReturnType()) {
+ Val = Builder.CreateAddrSpaceCast(Val, Wrapper->getReturnType());
+ }
Builder.CreateRet(Val);
}
diff --git a/clang/test/OpenMP/amdgpu_threadprivate.cpp b/clang/test/OpenMP/amdgpu_threadprivate.cpp
new file mode 100644
index 00000000000000..5b15255ee62d4a
--- /dev/null
+++ b/clang/test/OpenMP/amdgpu_threadprivate.cpp
@@ -0,0 +1,11 @@
+// REQUIRES: asserts
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-cpu x86-64 -disable-llvm-passes -fopenmp-targets=amdgcn-amd-amdhsa -x c++ -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -target-cpu gfx906 -fopenmp -nogpulib -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -x c++ %s
+
+// Don't crash with assertions build.
+int MyGlobVar;
+#pragma omp threadprivate(MyGlobVar)
+int main() {
+ MyGlobVar = 1;
+}
|
@llvm/pr-subscribers-clang Author: Youngsuk Kim (JOE1994) ChangesA necessary AddrSpaceCast was wrongfully deleted in 5c91b28 . Recover the AddrSpaceCast. This fixes #86791 . Full diff: https://github.com/llvm/llvm-project/pull/119246.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 8cbd09d02c7556..0abea335ad69e4 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -3302,6 +3302,9 @@ void ItaniumCXXABI::EmitThreadLocalInitFuncs(
CharUnits Align = CGM.getContext().getDeclAlign(VD);
Val = Builder.CreateAlignedLoad(Var->getValueType(), Val, Align);
}
+ if (Val->getType() != Wrapper->getReturnType()) {
+ Val = Builder.CreateAddrSpaceCast(Val, Wrapper->getReturnType());
+ }
Builder.CreateRet(Val);
}
diff --git a/clang/test/OpenMP/amdgpu_threadprivate.cpp b/clang/test/OpenMP/amdgpu_threadprivate.cpp
new file mode 100644
index 00000000000000..5b15255ee62d4a
--- /dev/null
+++ b/clang/test/OpenMP/amdgpu_threadprivate.cpp
@@ -0,0 +1,11 @@
+// REQUIRES: asserts
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-cpu x86-64 -disable-llvm-passes -fopenmp-targets=amdgcn-amd-amdhsa -x c++ -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -target-cpu gfx906 -fopenmp -nogpulib -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -x c++ %s
+
+// Don't crash with assertions build.
+int MyGlobVar;
+#pragma omp threadprivate(MyGlobVar)
+int main() {
+ MyGlobVar = 1;
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May it be possible that the wrapper function's return type is invalid and the cast isn't needed?
|
||
// Don't crash with assertions build. | ||
int MyGlobVar; | ||
#pragma omp threadprivate(MyGlobVar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not crash for sure, but I wonder is this a valid use of OpenMP? I don't know what I should expect from this semantics.
LLVM-IR (from llvm without assertions, before this revision) $thread-local wrapper routine for MyGlobVar = comdat any
@MyGlobVar = external thread_local addrspace(1) global i32, align 4
@__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
define weak_odr hidden noundef ptr @thread-local wrapper routine for MyGlobVar() local_unnamed_addr #0 comdat {
%1 = tail call align 4 ptr addrspace(1) @llvm.threadlocal.address.p1(ptr addrspace(1) align 4 @MyGlobVar)
ret ptr addrspace(1) %1
}
declare nonnull ptr addrspace(1) @llvm.threadlocal.address.p1(ptr addrspace(1) nonnull) #1
attributes #0 = { nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx906" "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" "uniform-work-group-size"="false" }
attributes #1 = { mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) } Updating the wrapper function's return type as below also fixes the error: --- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -3078,6 +3078,10 @@ ItaniumCXXABI::getOrCreateThreadLocalWrapper(const VarDecl *VD,
getContext().getPointerType(RetQT), FunctionArgList());
llvm::FunctionType *FnTy = CGM.getTypes().GetFunctionType(FI);
+ // Adjust wrapper function's ret type to have matching llvm addrspace with Val.
+ if (llvm::Type *WrapperRetTy = FnTy->getReturnType();
+ WrapperRetTy->isPointerTy() && Val->getType()->isPointerTy() && WrapperRetTy != Val->getType())
+ FnTy = llvm::FunctionType::get(Val->getType(), FnTy->params(), FnTy->isVarArg());
llvm::Function *Wrapper =
llvm::Function::Create(FnTy, getThreadLocalWrapperLinkage(VD, CGM),
WrapperName.str(), &CGM.getModule()); |
As per feedback from reviewers.
I mean, why do we have to watch for type mismatches and fix them instead of creating the correct type in the first place? |
I totally agree with you. |
I see. Well, neither am I :) I brought this up in case someone else could suggest the preferred way of fixing the issue. |
Correct, this is generally how things should work. There are select contexts where a cast is necessary.
This depends where the value was created from. Generally you just query getTargetAddressSpace |
The function in question is part of the Itanium C++ ABI; the ABI itself doesn't say anything about address-spaces, so by default we assume everything related to the C++ ABI is in the flat address-space. If we start messing with that, we'll need to write a specification somewhere to document exactly which functions use which address-spaces. So I think casting here is appropriate; we can revisit if someone ever wants to define a "GPU Itanium ABI" with different address-spaces. The change to ItaniumCXXABI.cpp looks fine (please address the review comments on the regression test). |
|
||
// CHECK: @MyGlobVar = external thread_local addrspace(1) global i32, align 4 | ||
// CHECK: define weak_odr hidden noundef ptr @_ZTW9MyGlobVar() #0 comdat { | ||
// CHECK-NEXT: %1 = call align 4 ptr addrspace(1) @llvm.threadlocal.address.p1(ptr addrspace(1) align 4 @MyGlobVar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we support this intrinsic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(For AMDGPU codegen) Doesn't seem so to me.
The intrinsic was added in 9701053
A necessary AddrSpaceCast was wrongfully deleted in 5c91b28 . Recover the AddrSpaceCast.
This fixes #86791 .