Skip to content

[CUDA][HIP] fix virtual dtor host/device attr #128926

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 3, 2025
Merged

Conversation

yxsamliu
Copy link
Collaborator

@yxsamliu yxsamliu commented Feb 26, 2025

Currently if CUDA/HIP users use template class with virtual dtor
and std::string data member with C++20 and MSVC. When the template
class is explicitly instantiated, there is error about host
function called by host device function (used to be undefined
symbols in linking stage before member destructors were checked
by deferred diagnostics).

It was caused by clang inferring host/device attributes for
default dtors. Since all dtors of member and parent classes
have implicit host device attrs, clang infers the virtual dtor have
implicit host and device attrs. Since virtual dtor of
explicitly instantiated template class must be emitted,
this causes constexpr dtor of std::string emitted, which
calls a host function which was not emitted on device side.

This is a serious issue since it prevents users from
using std::string with C++20 on Windows.

When inferring host device attr of virtual dtor of explicit
template class instantiation, clang should be conservative
since it is sure to be emitted. Since an implicit host device
function may call a host function, clang cannot assume it is
always available on device. This guarantees dtors that
may call host functions not to have implicit device attr,
therefore will not be emitted on device side.

Fixes: #108548

Fixes: SWDEV-517435

@yxsamliu yxsamliu requested a review from Artem-B February 26, 2025 17:58
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Feb 26, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 26, 2025

@llvm/pr-subscribers-clang

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

Currently if CUDA/HIP users use template class with virtual dtor and std::string data member with C++20 and MSVC. When the template class is explicitly instantiated, they encounter unresolved symbols in linker.

It was caused by clang inferring host/device attributes for default dtors. Since all dtors of member and parent classes have implicit host device attrs, clang infers the virtual dtor have implicit host and device attrs. Since virtual dtor of explicitly instantiated template class must be emitted, this causes constexpr dtor of std::string emitted, which calls a host function which was note emitted on device side.

This is a serious issue since it prevents users from using std::string with C++20 on Windows.

There are two issues revealed:

  1. The deferred diag failed to diagnose calling of host function in host device function in device compilation. this can be further divided into two issuse:

1a. the deferred diag visitor does not visit dtor of member
when visiting dtor, which it should

1b. the deferred diag visitor does not visit dtor of explicit template class instantiation, which it should

By fixing this issue, clang will diag the issue in compilation instead of linking.

  1. When inferring host device attr of virtual dtor of explicit template class instantiation, clang should be conservative since it is sure to be emitted. Since an implicit host device function may call a host function, clang cannot assume it is always available on device. This guarantees dtors that may call host functions not to have implicit device attr, therefore will not be emitted on device side.

Fixes: #108548

Fixes: SWDEV-517435


Full diff: https://github.com/llvm/llvm-project/pull/128926.diff

6 Files Affected:

  • (modified) clang/docs/HIPSupport.rst (+20)
  • (modified) clang/include/clang/Sema/Sema.h (+1-1)
  • (modified) clang/lib/Sema/Sema.cpp (+58)
  • (modified) clang/lib/Sema/SemaCUDA.cpp (+21-2)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+15)
  • (added) clang/test/SemaCUDA/dtor.cu (+104)
diff --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index 481ed39230813..8f473c21e1918 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -286,6 +286,26 @@ Example Usage
       basePtr->virtualFunction(); // Allowed since obj is constructed in device code
    }
 
+Host and Device Attributes of Default Destructors
+===================================================
+
+If a default destructor does not have explicit host or device attributes,
+clang infers these attributes based on the destructors of its data members
+and base classes. If any conflicts are detected among these destructors,
+clang diagnoses the issue. Otherwise, clang adds an implicit host or device
+attribute according to whether the data members's and base classes's
+destructors can execute on the host or device side.
+
+For explicit template classes with virtual destructors, which must be emitted,
+the inference adopts a conservative approach. In this case, implicit host or
+device attributes from member and base class destructors are ignored. This
+precaution is necessary because, although a constexpr destructor carries
+implicit host or device attributes, a constexpr function may call a
+non-constexpr function, which is by default a host function.
+
+Users can override the inferred host and device attributes of default
+destructors by adding explicit host and device attributes to them.
+
 C++ Standard Parallelism Offload Support: Compiler And Runtime
 ==============================================================
 
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index ebdbc69384efb..3b2be86a88e82 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -4392,11 +4392,11 @@ class Sema final : public SemaBase {
   // Whether the callee should be ignored in CUDA/HIP/OpenMP host/device check.
   bool shouldIgnoreInHostDeviceCheck(FunctionDecl *Callee);
 
-private:
   /// Function or variable declarations to be checked for whether the deferred
   /// diagnostics should be emitted.
   llvm::SmallSetVector<Decl *, 4> DeclsToCheckForDeferredDiags;
 
+private:
   /// Map of current shadowing declarations to shadowed declarations. Warn if
   /// it looks like the user is trying to modify the shadowing declaration.
   llvm::DenseMap<const NamedDecl *, const NamedDecl *> ShadowingDecls;
diff --git a/clang/lib/Sema/Sema.cpp b/clang/lib/Sema/Sema.cpp
index c699e92985156..fa9e6db62a3a0 100644
--- a/clang/lib/Sema/Sema.cpp
+++ b/clang/lib/Sema/Sema.cpp
@@ -1798,6 +1798,62 @@ class DeferredDiagnosticsEmitter
       Inherited::visitUsedDecl(Loc, D);
   }
 
+  // Visitor member and parent dtors called by this dtor.
+  void VisitCalledDestructors(CXXDestructorDecl *DD) {
+    const CXXRecordDecl *RD = DD->getParent();
+
+    // Visit the dtors of all members
+    for (const FieldDecl *FD : RD->fields()) {
+      QualType FT = FD->getType();
+      if (const auto *RT = FT->getAs<RecordType>()) {
+        if (const auto *ClassDecl = dyn_cast<CXXRecordDecl>(RT->getDecl())) {
+          if (ClassDecl->hasDefinition()) {
+            if (CXXDestructorDecl *MemberDtor = ClassDecl->getDestructor()) {
+              asImpl().visitUsedDecl(MemberDtor->getLocation(), MemberDtor);
+            }
+          }
+        }
+      }
+    }
+
+    // Also visit base class dtors
+    for (const auto &Base : RD->bases()) {
+      QualType BaseType = Base.getType();
+      if (const auto *RT = BaseType->getAs<RecordType>()) {
+        if (const auto *BaseDecl = dyn_cast<CXXRecordDecl>(RT->getDecl())) {
+          if (BaseDecl->hasDefinition()) {
+            if (CXXDestructorDecl *BaseDtor = BaseDecl->getDestructor()) {
+              asImpl().visitUsedDecl(BaseDtor->getLocation(), BaseDtor);
+            }
+          }
+        }
+      }
+    }
+  }
+
+  void VisitDeclStmt(DeclStmt *DS) {
+    // Visit dtors called by variables that need destruction
+    for (auto *D : DS->decls()) {
+      if (auto *VD = dyn_cast<VarDecl>(D)) {
+        if (VD->isThisDeclarationADefinition() &&
+            VD->needsDestruction(S.Context)) {
+          QualType VT = VD->getType();
+          if (const auto *RT = VT->getAs<RecordType>()) {
+            if (const auto *ClassDecl =
+                    dyn_cast<CXXRecordDecl>(RT->getDecl())) {
+              if (ClassDecl->hasDefinition()) {
+                if (CXXDestructorDecl *Dtor = ClassDecl->getDestructor()) {
+                  asImpl().visitUsedDecl(Dtor->getLocation(), Dtor);
+                }
+              }
+            }
+          }
+        }
+      }
+    }
+
+    Inherited::VisitDeclStmt(DS);
+  }
   void checkVar(VarDecl *VD) {
     assert(VD->isFileVarDecl() &&
            "Should only check file-scope variables");
@@ -1839,6 +1895,8 @@ class DeferredDiagnosticsEmitter
     if (auto *S = FD->getBody()) {
       this->Visit(S);
     }
+    if (CXXDestructorDecl *Dtor = dyn_cast<CXXDestructorDecl>(FD))
+      asImpl().VisitCalledDestructors(Dtor);
     UsePath.pop_back();
     InUsePath.erase(FD);
   }
diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index 0e1bf727d72d2..0e5fc5e1a40b4 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -372,6 +372,21 @@ bool SemaCUDA::inferTargetForImplicitSpecialMember(CXXRecordDecl *ClassDecl,
                                                    CXXMethodDecl *MemberDecl,
                                                    bool ConstRHS,
                                                    bool Diagnose) {
+  // If MemberDecl is virtual destructor of an explicit template class
+  // instantiation, it must be emitted, therefore it needs to be inferred
+  // conservatively by ignoring implicit host/device attrs of member and parent
+  // dtors called by it. Also, it needs to be checed by deferred diag visitor.
+  bool IsExpVDtor = false;
+  if (isa<CXXDestructorDecl>(MemberDecl) && MemberDecl->isVirtual()) {
+    if (auto *Spec = dyn_cast<ClassTemplateSpecializationDecl>(ClassDecl)) {
+      TemplateSpecializationKind TSK = Spec->getTemplateSpecializationKind();
+      IsExpVDtor = TSK == TSK_ExplicitInstantiationDeclaration ||
+                   TSK == TSK_ExplicitInstantiationDefinition;
+    }
+  }
+  if (IsExpVDtor)
+    SemaRef.DeclsToCheckForDeferredDiags.insert(MemberDecl);
+
   // If the defaulted special member is defined lexically outside of its
   // owning class, or the special member already has explicit device or host
   // attributes, do not infer.
@@ -422,7 +437,9 @@ bool SemaCUDA::inferTargetForImplicitSpecialMember(CXXRecordDecl *ClassDecl,
     if (!SMOR.getMethod())
       continue;
 
-    CUDAFunctionTarget BaseMethodTarget = IdentifyTarget(SMOR.getMethod());
+    CUDAFunctionTarget BaseMethodTarget =
+        IdentifyTarget(SMOR.getMethod(), IsExpVDtor);
+
     if (!InferredTarget) {
       InferredTarget = BaseMethodTarget;
     } else {
@@ -466,7 +483,9 @@ bool SemaCUDA::inferTargetForImplicitSpecialMember(CXXRecordDecl *ClassDecl,
     if (!SMOR.getMethod())
       continue;
 
-    CUDAFunctionTarget FieldMethodTarget = IdentifyTarget(SMOR.getMethod());
+    CUDAFunctionTarget FieldMethodTarget =
+        IdentifyTarget(SMOR.getMethod(), IsExpVDtor);
+
     if (!InferredTarget) {
       InferredTarget = FieldMethodTarget;
     } else {
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 285bd27a35a76..ab86d2a2d920a 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -20469,6 +20469,21 @@ Sema::FunctionEmissionStatus Sema::getEmissionStatus(const FunctionDecl *FD,
 
     if (IsEmittedForExternalSymbol())
       return FunctionEmissionStatus::Emitted;
+
+    // If FD is a virtual destructor of an explicit instantiation
+    // of a template class, return Emitted.
+    if (auto *Destructor = dyn_cast<CXXDestructorDecl>(FD)) {
+      if (Destructor->isVirtual()) {
+        if (auto *Spec = dyn_cast<ClassTemplateSpecializationDecl>(
+                Destructor->getParent())) {
+          TemplateSpecializationKind TSK =
+              Spec->getTemplateSpecializationKind();
+          if (TSK == TSK_ExplicitInstantiationDeclaration ||
+              TSK == TSK_ExplicitInstantiationDefinition)
+            return FunctionEmissionStatus::Emitted;
+        }
+      }
+    }
   }
 
   // Otherwise, the function is known-emitted if it's in our set of
diff --git a/clang/test/SemaCUDA/dtor.cu b/clang/test/SemaCUDA/dtor.cu
new file mode 100644
index 0000000000000..eebc8cee0a5e0
--- /dev/null
+++ b/clang/test/SemaCUDA/dtor.cu
@@ -0,0 +1,104 @@
+// RUN: %clang_cc1 %s -std=c++20 -fsyntax-only -verify=host 
+// RUN: %clang_cc1 %s -std=c++20 -fcuda-is-device -fsyntax-only -verify=dev
+
+// host-no-diagnostics
+
+#include "Inputs/cuda.h"
+
+// Virtual dtor ~B() of explicit instantiation B<float> must
+// be emitted, which causes host_fun() called.
+namespace ExplicitInstantiationInvalidDtor {
+void host_fun() // dev-note {{'host_fun' declared here}}
+{}
+
+template <unsigned>
+constexpr void hd_fun() {
+  host_fun(); // dev-error {{reference to __host__ function 'host_fun' in __host__ __device__ function}}
+}
+
+struct A {
+  constexpr ~A() { // dev-note {{called by '~B'}}
+     hd_fun<8>(); // dev-note {{called by '~A'}}
+  }
+};
+
+template <typename T>
+struct B {
+public:
+  virtual __device__ ~B() = default;
+  A _a;
+};
+
+template class B<float>;
+}
+
+// The implicit host/device attrs of virtual dtor ~B() should be
+// conservatively inferred, where constexpr member dtor's should
+// not be considered device since they may call host functions. 
+// Therefore B<float>::~B() should not have implicit device attr.
+// However C<float>::~C() should have implicit device attr since
+// it is trivial.
+namespace ExplicitInstantiationValidDtor {
+void host_fun()
+{}
+
+template <unsigned>
+constexpr void hd_fun() {
+  host_fun();
+}
+
+struct A {
+  constexpr ~A() {
+     hd_fun<8>();
+  }
+};
+
+template <typename T>
+struct B {
+public:
+  virtual ~B() = default;
+  A _a;
+};
+
+template <typename T>
+struct C {
+public:
+  virtual ~C() = default;
+};
+
+template class B<float>;
+template class C<float>;
+__device__ void foo() {
+  C<float> x;
+}
+}
+
+// Dtors of implicit template class instantiation are not
+// conservatively inferred because the invalid usage can
+// be diagnosed.
+namespace ImplicitInstantiation {
+void host_fun() // dev-note {{'host_fun' declared here}}
+{}
+
+template <unsigned>
+constexpr void hd_fun() {
+  host_fun(); // dev-error {{reference to __host__ function 'host_fun' in __host__ __device__ function}}
+}
+
+struct A {
+  constexpr ~A() { // dev-note {{called by '~B'}}
+     hd_fun<8>(); // dev-note {{called by '~A'}}
+  }
+};
+
+template <typename T>
+struct B {
+public:
+  ~B() = default; // dev-note {{called by 'foo'}}
+  A _a;
+};
+
+__device__ void foo() {
+  B<float> x;
+}
+}

@Artem-B
Copy link
Member

Artem-B commented Feb 26, 2025

Would it make sense to separate into separate patches deferred diag fix (1a/1b on your list) from inference of destructor attributes?

Deferred diags fix is straightforward, but destructor attribute inference may need a longer discussion.

@yxsamliu
Copy link
Collaborator Author

Would it make sense to separate into separate patches deferred diag fix (1a/1b on your list) from inference of destructor attributes?

Deferred diags fix is straightforward, but destructor attribute inference may need a longer discussion.

will separate the deferred diag change to another PR

@yxsamliu
Copy link
Collaborator Author

Would it make sense to separate into separate patches deferred diag fix (1a/1b on your list) from inference of destructor attributes?

Deferred diags fix is straightforward, but destructor attribute inference may need a longer discussion.

split the deferred diag change to #129117

Currently if CUDA/HIP users use template class with virtual dtor
and std::string data member with C++20 and MSVC. When the template
class is explicitly instantiated, there is error about host
function called by host device function (used to be undefined
symbols in linking stage before member destructors were checked
by deferred diagnostics).

It was caused by clang inferring host/device attributes for
default dtors. Since all dtors of member and parent classes
have implicit host device attrs, clang infers the virtual dtor have
implicit host and device attrs. Since virtual dtor of
explicitly instantiated template class must be emitted,
this causes constexpr dtor of std::string emitted, which
calls a host function which was not emitted on device side.

This is a serious issue since it prevents users from
using std::string with C++20 on Windows.

When inferring host device attr of virtual dtor of explicit
template class instantiation, clang should be conservative
since it is sure to be emitted. Since an implicit host device
function may call a host function, clang cannot assume it is
always available on device. This guarantees dtors that
may call host functions not to have implicit device attr,
therefore will not be emitted on device side.

Fixes: llvm#108548

Fixes: SWDEV-517435
{}

template <unsigned>
constexpr void hd_fun() {
host_fun(); // dev-error{{reference to __host__ function 'host_fun' in __host__ __device__ function}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason I can not reproduce any of the errors in this test on godbolt, even with older clang: https://godbolt.org/z/4fMh5jxKd

What am I missing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the old clang, there is no deferred diag for dtors, and the compilation stops at device assembly, so you won't see the link error. For the trunk clang, it is not new enough (-v shows abe1ecf). I think it may be updated once daily.

namespace ExplicitInstantiationDtorNoAttr {
void host_fun() // dev-note {{'host_fun' declared here}}
void host_fun()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand it correctly that the code is still expected to error out as a deferred diag at codegen phase?

If so, it would be great to mention that in the comment, and, maybe, add a codegen case if it's not done already in the deferred diags part of the patch we split off.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. deferred diag happens at the end of sema.

What this PR does is that for virtual dtor of explicitly instantiated template class, clang will not treat constexpr dtors as executable on device, therefore if there is a constexpr member dtor, the virtual dtor is host only. Then it won't be emitted on device side.

The reason is that users have no control of this virtual dtor since it must be emitted, and users may not have control of the class definition, so unless we are sure this dtor can be executed on device, let's do not infer it as device. This makes sure the code that works as C++ continue to work in CUDA/HIP.

@yxsamliu yxsamliu merged commit d37a392 into llvm:main Mar 3, 2025
12 checks passed
yxsamliu added a commit to yxsamliu/llvm-project that referenced this pull request Mar 6, 2025
When inferring host device attr of virtual dtor of explicit
template class instantiation, clang should be conservative.
This guarantees dtors that may call host functions not to
have implicit device attr, therefore will not be emitted
on device side.

Backports: 0f0665d d37a392

Fixes: llvm#108548
yxsamliu added a commit to yxsamliu/llvm-project that referenced this pull request Mar 12, 2025
When inferring host device attr of virtual dtor of explicit
template class instantiation, clang should be conservative.
This guarantees dtors that may call host functions not to
have implicit device attr, therefore will not be emitted
on device side.

Backports: 0f0665d d37a392

Fixes: llvm#108548
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Mar 18, 2025
When inferring host device attr of virtual dtor of explicit
template class instantiation, clang should be conservative.
This guarantees dtors that may call host functions not to
have implicit device attr, therefore will not be emitted
on device side.

Backports: 0f0665d d37a392

Fixes: llvm#108548
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
Currently if CUDA/HIP users use template class with virtual dtor
and std::string data member with C++20 and MSVC. When the template
class is explicitly instantiated, there is error about host
function called by host device function (used to be undefined
symbols in linking stage before member destructors were checked
by deferred diagnostics).

It was caused by clang inferring host/device attributes for
default dtors. Since all dtors of member and parent classes
have implicit host device attrs, clang infers the virtual dtor have
implicit host and device attrs. Since virtual dtor of
explicitly instantiated template class must be emitted,
this causes constexpr dtor of std::string emitted, which
calls a host function which was not emitted on device side.

This is a serious issue since it prevents users from
using std::string with C++20 on Windows.

When inferring host device attr of virtual dtor of explicit
template class instantiation, clang should be conservative
since it is sure to be emitted. Since an implicit host device
function may call a host function, clang cannot assume it is
always available on device. This guarantees dtors that
may call host functions not to have implicit device attr,
therefore will not be emitted on device side.

Fixes: llvm#108548

Fixes: SWDEV-517435
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CUDA: Host-side virtual destructor of template class leaks into PTX as weak function
3 participants