Skip to content

[libc++] Protect the libc++ implementation from CUDA SDK's __noinline__ macro #73838

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 22, 2024

Conversation

gribozavr
Copy link
Collaborator

@gribozavr gribozavr commented Nov 29, 2023

The CUDA SDK contains an unfortunate definition for the __noinline__
macro. This patch works around it by using __attribute__((noinline))
instead of __attribute__((__noinline__)) on CUDA. We are still waiting
for a long-term resolution to this issue in NVIDIA/cccl#1235.

@gribozavr gribozavr requested a review from a team as a code owner November 29, 2023 18:21
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Nov 29, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 29, 2023

@llvm/pr-subscribers-libcxx

Author: Dmitri Gribenko (gribozavr)

Changes

The CUDA SDK contains an unfortunate definition for the __noinline__ macro.

I don't want to quote the code here, but you can find online plenty of information about this macro definition being problematic and creating conflicts for numerous other libraries, for example on StackOverflow.

This patch does the following:

  • following the existing pattern, it adds __noinline__ to _LIBCPP_PUSH_MACROS, _LIBCPP_POP_MACROS, __undef_macros, and the test generation script,

  • wraps all of include/__config in a push/pop bracket pair and includes __undef_macros in __config.

Macro protection brackets in __config are necessary because __config uses the __noinline__ identifier in a __has_attribute() preprocessor expression, and that usage suffers from CUDA's definition. This problem can be reproduced with the system_reserved_names.gen.py tests if one applies this patch except for the changes to __undef_macros.


Full diff: https://github.com/llvm/llvm-project/pull/73838.diff

3 Files Affected:

  • (modified) libcxx/include/__config (+8-5)
  • (modified) libcxx/include/__undef_macros (+4)
  • (modified) libcxx/test/libcxx/system_reserved_names.gen.py (+3)
diff --git a/libcxx/include/__config b/libcxx/include/__config
index ee77305162f7fc8..2a77b88562ef614 100644
--- a/libcxx/include/__config
+++ b/libcxx/include/__config
@@ -46,6 +46,12 @@
 #    endif
 #  endif
 
+#  define _LIBCPP_PUSH_MACROS _Pragma("push_macro(\"min\")") _Pragma("push_macro(\"max\")") _Pragma("push_macro(\"refresh()\")") _Pragma("push_macro(\"move(int, int)\")") _Pragma("push_macro(\"erase()\")") _Pragma("push_macro(\"__noinline__\")")
+#  define _LIBCPP_POP_MACROS _Pragma("pop_macro(\"min\")") _Pragma("pop_macro(\"max\")") _Pragma("pop_macro(\"refresh()\")") _Pragma("pop_macro(\"move(int, int)\")") _Pragma("pop_macro(\"erase()\")") _Pragma("pop_macro(\"__noinline__\")")
+
+_LIBCPP_PUSH_MACROS
+#  include <__undef_macros>
+
 // The attributes supported by clang are documented at https://clang.llvm.org/docs/AttributeReference.html
 
 // _LIBCPP_VERSION represents the version of libc++, which matches the version of LLVM.
@@ -1208,11 +1214,6 @@ __sanitizer_verify_double_ended_contiguous_container(const void*, const void*, c
 #    define _LIBCPP_ENABLE_CXX20_REMOVED_TYPE_TRAITS
 #  endif // _LIBCPP_ENABLE_CXX20_REMOVED_FEATURES
 
-// clang-format off
-#  define _LIBCPP_PUSH_MACROS _Pragma("push_macro(\"min\")") _Pragma("push_macro(\"max\")") _Pragma("push_macro(\"refresh()\")") _Pragma("push_macro(\"move(int, int)\")") _Pragma("push_macro(\"erase()\")")
-#  define _LIBCPP_POP_MACROS _Pragma("pop_macro(\"min\")") _Pragma("pop_macro(\"max\")") _Pragma("pop_macro(\"refresh()\")") _Pragma("pop_macro(\"move(int, int)\")") _Pragma("pop_macro(\"erase()\")")
-// clang-format on
-
 #  ifndef _LIBCPP_NO_AUTO_LINK
 #    if defined(_LIBCPP_ABI_MICROSOFT) && !defined(_LIBCPP_BUILDING_LIBRARY)
 #      if !defined(_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS)
@@ -1489,6 +1490,8 @@ __sanitizer_verify_double_ended_contiguous_container(const void*, const void*, c
 #    define _LIBCPP_DISABLE_UBSAN_UNSIGNED_INTEGER_CHECK
 #  endif
 
+_LIBCPP_POP_MACROS
+
 #endif // __cplusplus
 
 #endif // _LIBCPP___CONFIG
diff --git a/libcxx/include/__undef_macros b/libcxx/include/__undef_macros
index 29ab327e1c375ae..885057629d1b831 100644
--- a/libcxx/include/__undef_macros
+++ b/libcxx/include/__undef_macros
@@ -26,3 +26,7 @@
 #ifdef erase
 #  undef erase
 #endif
+
+#ifdef __noinline__
+#  undef __noinline__
+#endif
diff --git a/libcxx/test/libcxx/system_reserved_names.gen.py b/libcxx/test/libcxx/system_reserved_names.gen.py
index 8c4be97897f657e..17ef50568bbb666 100644
--- a/libcxx/test/libcxx/system_reserved_names.gen.py
+++ b/libcxx/test/libcxx/system_reserved_names.gen.py
@@ -158,5 +158,8 @@
 #define erase SYSTEM_RESERVED_NAME
 #define refresh SYSTEM_RESERVED_NAME
 
+// Macros from the CUDA SDKs
+#define __noinline__ __attribute__((noinline))
+
 #include <{header}>
 """)

Copy link

github-actions bot commented Nov 29, 2023

✅ With the latest revision this PR passed the Python code formatter.

@jyknight
Copy link
Member

Has anyone filed a bug against CUDA to stop doing this?

@EricWF
Copy link
Member

EricWF commented Nov 29, 2023

I'm OK taking this patch, but I would like us to file a bug against CUDA first.

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the one hand, working around this appears to be fairly simple and we have precedent for this for other macros. On the other hand, NVIDIA apparently acknowledges the risk for conflicts when they define this macro according to the SO answer:

/* gcc allows users to define attributes with underscores,
   e.g., __attribute__((__noinline__)).
   Consider a non-CUDA source file (e.g. .cpp) that has the
   above attribute specification, and includes this header file. In that case,
   defining __noinline__ as below  would cause a gcc compilation error.
   Hence, only define __noinline__ when the code is being processed
   by a  CUDA compiler component.
*/

Fixing this here doesn't provide much of an incentive for them to fix their SDK, but being incompatible with a major C++ standard library implementation would. Do we have anyone from Nvidia around here who could enlighten us on the status of that bug report?

@philnik777
Copy link
Contributor

Given that you say yourself that this causes problems with lots of other libraries, I think it would be better to fix the CUDA SDK - especially since it's a really simple fix for them.

Copy link
Member

@EricWF EricWF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the bots come back.

There has been a bunch of discussion about if we want this. I probably LGTM'ed this to soon.

Maybe we should get some CUDA builders?

@mordante
Copy link
Member

I assume this change means we need to push and pop our macros where we use __noinline__ via our macro. Or am I mistaken? This patch doesn't seem to do that. Before approving I would like to know what the status of fixing this in the CUDA SDK is.

@gribozavr
Copy link
Collaborator Author

gribozavr commented Nov 29, 2023

I assume this change means we need to push and pop our macros where we use __noinline__ via our macro. Or am I mistaken? This patch doesn't seem to do that.

That infrastructure was already in place, it was introduced by https://reviews.llvm.org/D147356. The only missing piece was the push/pop bracketing in __config itself, which this patch does.

I think it would be better to fix the CUDA SDK - especially since it's a really simple fix for them.

Yes, we are filing a bug for the CUDA SDK. However the fix will arrive in future SDK versions, the ones currently in use are already conflicting. So ideally libc++ will keep this workaround until those existing SDK versions become obsolete.

@Artem-B
Copy link
Member

Artem-B commented Nov 29, 2023

Clang already works around a handful of cases where we ran into this: https://github.com/llvm/llvm-project/commits/main/clang/lib/Headers/cuda_wrappers

@jrhemstad What would be a proper channel to ask NVIDIA's nvcc/CUDA owners to help with that? I can file a bug directly with NVIDIA, but this is a public issue and it would be great to track it publicly, too.

@jyknight
Copy link
Member

jyknight commented Nov 29, 2023

Should the fix be done by adding a clang/lib/Headers/cuda_wrappers/__config wrapper instead, so that all the CUDA __noinline__ hacks are kept in one place?

@Artem-B
Copy link
Member

Artem-B commented Nov 29, 2023

Should the fix be done by adding a clang/lib/Headers/cuda_wrappers/__config wrapper instead, so that all the CUDA __noinline__ hacks are kept in one place?

If that's the only place where the issue pops up, then it's an option. So far that's what we did for libstdc++ workarounds.

@ldionne
Copy link
Member

ldionne commented Nov 29, 2023

^ FWIW I really like that option! I'd be curious to know if that fixes @gribozavr 's issues.

Like I said, I don't have strong push back about working around this in libc++ but I'd want a public issue with CUDA to be referenced here and an acknowledgement that they're willing and working on fixing the issue. If that's the case I see no point in making people's lives hard and we should work around it here or in the cuda wrapper headers.

Otherwise, in case CUDA says something like "yeah we don't really care", then we'd be bending backwards to support something bad that another vendor is doing in complete awareness and I would push back on doing that on the basis that it's not a good precedent to set nor a sane long term direction for libc++.

@Artem-B
Copy link
Member

Artem-B commented Nov 29, 2023

Otherwise, in case CUDA says something like "yeah we don't really care", then we'd be bending backwards to support something bad that another vendor is doing in complete awareness and I would push back on doing that on the basis that it's not a good precedent to set nor a sane long term direction for libc++.

See. NVIDIA/thrust#1703

The way I read it, the fact that NVCC hacks around this issue is a strong indication that it's not going to be fixed any time soon. And we still need to deal with existing CUDA versions anyways, so what NVIDIA is going to do about it is almost a moot point.

@Artem-B
Copy link
Member

Artem-B commented Nov 30, 2023

Hmm. Looks like undefining __inline__ in a wrapper for __config is not going to work.
libc++ defines

#    define _LIBCPP_NOINLINE __attribute__((__noinline__))

The problem is that expansion of __inline__ happens later, so when some other header uses _LIBCPP_NOINLINE while __noinline__ is defined by CUDA, we still have a problem, even though the _LIBCPP_NOINLINE is defined exactly as it was supposed to. With wrappers, we'd need to wrap every libc++ header which uses _LIBCPP_NOINLINE.

I think we do need to deal with this in libc++. At the very least we should try figuring out how to define _LIBCPP_NOINLINE` in a way that's not affected by macro expansion later on.

https://godbolt.org/z/7qfTM54hh for illustration.

Copy link
Contributor

@philnik777 philnik777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually looks like nvc++ isn't working around the issue. They simply parse GNU attributes slightly differently: https://godbolt.org/z/MKKf4Wdan. I think we can fix this all by simply using [[__gnu__::__noinline__]] instead. This works on every compiler we support except Clang 16 in C++03 mode, so I don't think that's a huge problem. If anybody complains that the CodeGen is different if cuda is enabled, we can tell them to complain to the cuda sdk people and hopefully it'll be fixed at some point.

@gribozavr
Copy link
Collaborator Author

gribozavr commented Dec 1, 2023

@jyknight I agree with @Artem-B 's analysis. To work around this problem, we need to hide CUDA's __noinline__ macro definition everywhere where _LIBCPP_NOINLINE is used (not just where it is defined) - and it may used in every libc++ header. To achieve such hiding we can either:

  • add __noinline__ to _LIBCPP_PUSH_MACROS, _LIBCPP_POP_MACROS, and to __undef_macros - like in this PR.
  • create a Clang wrapper header for every libc++ public header. I don't prefer this option because I think the first option is simpler to maintain.

@gribozavr
Copy link
Collaborator Author

gribozavr commented Dec 1, 2023

@philnik777 Thank you for the idea!

I think we can fix this all by simply using [[__gnu__::__noinline__]] instead.

So you're suggesting that we change the definition of _LIBCPP_NOINLINE:

#define _LIBCPP_NOINLINE [[__gnu__::__noinline__]]

Thus, it would expand to either [[__gnu__::__noinline__]] under normal circumstances, or to [[__gnu__::__attribute__((noinline))]] under CUDA?

That would work if _LIBCPP_NOINLINE was the only usage of __noinline__. We also have the #if __has_attribute(__noinline__) that gets broken.

(I also have a concern that there are no tests in Clang for such syntax, so I suspect the fact that it works is accidental - but even if it is intentional, the __has_attribute() expression does not work.)

Here is a patch if you want to try it yourself:

A patch that changes _LIBCPP_NOINLINE to [[__gnu__::__noinline__]]
diff --git a/libcxx/include/__config b/libcxx/include/__config
index ee77305162f7..e078a7ea3ba1 100644
--- a/libcxx/include/__config
+++ b/libcxx/include/__config
@@ -1176,7 +1176,7 @@ __sanitizer_verify_double_ended_contiguous_container(const void*, const void*, c
 #  endif

 #  if __has_attribute(__noinline__)
-#    define _LIBCPP_NOINLINE __attribute__((__noinline__))
+#    define _LIBCPP_NOINLINE [[__gnu__::__noinline__]]
 #  else
 #    define _LIBCPP_NOINLINE
 #  endif
diff --git a/libcxx/include/string b/libcxx/include/string
index 25f307825fa2..8bce9af76135 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -1901,7 +1901,7 @@ private:
     // to call the __init() functions as those are marked as inline which may
     // result in over-aggressive inlining by the compiler, where our aim is
     // to only inline the fast path code directly in the ctor.
-    _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE void __init_copy_ctor_external(const value_type* __s, size_type __sz);
+    _LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20 void __init_copy_ctor_external(const value_type* __s, size_type __sz);

     template <class _InputIterator, __enable_if_t<__has_exactly_input_iterator_category<_InputIterator>::value, int> = 0>
     inline _LIBCPP_CONSTEXPR_SINCE_CXX20 void __init(_InputIterator __first, _InputIterator __last);
@@ -1935,7 +1935,7 @@ private:
     // have proof that the input does not alias the current instance.
     // For example, operator=(basic_string) performs a 'self' check.
     template <bool __is_short>
-    _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE basic_string& __assign_no_alias(const value_type* __s, size_type __n);
+    _LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_no_alias(const value_type* __s, size_type __n);

   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __erase_to_end(size_type __pos) {
     __null_terminate_at(std::__to_address(__get_pointer()), __pos);
@@ -1943,7 +1943,7 @@ private:

     // __erase_external_with_move is invoked for erase() invocations where
     // `n ~= npos`, likely requiring memory moves on the string data.
-    _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE void __erase_external_with_move(size_type __pos, size_type __n);
+    _LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20 void __erase_external_with_move(size_type __pos, size_type __n);

     _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
     void __copy_assign_alloc(const basic_string& __str)
@@ -2015,8 +2015,8 @@ private:
         _NOEXCEPT
         {}

-    _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE basic_string& __assign_external(const value_type* __s);
-    _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE basic_string& __assign_external(const value_type* __s, size_type __n);
+    _LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_external(const value_type* __s);
+    _LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_external(const value_type* __s, size_type __n);

     // Assigns the value in __s, guaranteed to be __n < __min_cap in length.
     inline _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_short(const value_type* __s, size_type __n) {
@@ -2169,7 +2169,7 @@ basic_string<_CharT, _Traits, _Allocator>::__init(const value_type* __s, size_ty
 }

 template <class _CharT, class _Traits, class _Allocator>
-_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE
+_LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20
 void basic_string<_CharT, _Traits, _Allocator>::__init_copy_ctor_external(
     const value_type* __s, size_type __sz) {
   if (__libcpp_is_constant_evaluated())
@@ -2398,7 +2398,7 @@ basic_string<_CharT, _Traits, _Allocator>::__grow_by_without_replace(

 template <class _CharT, class _Traits, class _Allocator>
 template <bool __is_short>
-_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE
+_LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20
 basic_string<_CharT, _Traits, _Allocator>&
 basic_string<_CharT, _Traits, _Allocator>::__assign_no_alias(
     const value_type* __s, size_type __n) {
@@ -2416,7 +2416,7 @@ basic_string<_CharT, _Traits, _Allocator>::__assign_no_alias(
 }

 template <class _CharT, class _Traits, class _Allocator>
-_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE
+_LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20
 basic_string<_CharT, _Traits, _Allocator>&
 basic_string<_CharT, _Traits, _Allocator>::__assign_external(
     const value_type* __s, size_type __n) {
@@ -2627,7 +2627,7 @@ basic_string<_CharT, _Traits, _Allocator>::assign(const _Tp& __t, size_type __po
 }

 template <class _CharT, class _Traits, class _Allocator>
-_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE
+_LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20
 basic_string<_CharT, _Traits, _Allocator>&
 basic_string<_CharT, _Traits, _Allocator>::__assign_external(const value_type* __s) {
   return __assign_external(__s, traits_type::length(__s));
@@ -3112,7 +3112,7 @@ basic_string<_CharT, _Traits, _Allocator>::replace(size_type __pos, size_type __
 // 'externally instantiated' erase() implementation, called when __n != npos.
 // Does not check __pos against size()
 template <class _CharT, class _Traits, class _Allocator>
-_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE
+_LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20
 void
 basic_string<_CharT, _Traits, _Allocator>::__erase_external_with_move(
     size_type __pos, size_type __n)
diff --git a/libcxx/test/libcxx/system_reserved_names.gen.py b/libcxx/test/libcxx/system_reserved_names.gen.py
index 8c4be97897f6..de68e0d666bd 100644
--- a/libcxx/test/libcxx/system_reserved_names.gen.py
+++ b/libcxx/test/libcxx/system_reserved_names.gen.py
@@ -158,5 +158,8 @@ for header in public_headers:
 #define erase SYSTEM_RESERVED_NAME
 #define refresh SYSTEM_RESERVED_NAME

+// Macros from the CUDA SDKs
+#define __noinline__ SYSTEM_RESERVED_NAME
+
 #include <{header}>
 """)

Sample test failure:

# .---command stderr------------
# | In file included from LLVM/build-release/runtimes/runtimes-bins/test/libcxx/system_reserved_names.gen.py/errno.h.compile
.pass.cpp:145:
# | In file included from LLVM/build-release/include/c++/v1/errno.h:25:
# | LLVM/build-release/include/c++/v1/__config:1178:23: error: missing ')' after 'This'
# |  1178 | #  if __has_attribute(__noinline__)
# |       |                       ^~~~~~~~~~~~
# | LLVM/build-release/runtimes/runtimes-bins/test/libcxx/system_reserved_names.gen.py/errno.h.compile.pass.cpp:143:22: note
: expanded from macro '__noinline__'
# |   143 | #define __noinline__ SYSTEM_RESERVED_NAME
# |       |                      ^~~~~~~~~~~~~~~~~~~~
# | LLVM/build-release/runtimes/runtimes-bins/test/libcxx/system_reserved_names.gen.py/errno.h.compile.pass.cpp:4:35: note:
expanded from macro 'SYSTEM_RESERVED_NAME'
# |     4 | #define SYSTEM_RESERVED_NAME This name should not be used in libc++
# |       |                              ~~~~ ^
# | LLVM/build-release/include/c++/v1/__config:1178:22: note: to match this '('
# |  1178 | #  if __has_attribute(__noinline__)
# |       |                      ^
# | 1 error generated.
# `-----------------------------
# error: command failed with exit status: 1

@philnik777
Copy link
Contributor

philnik777 commented Dec 1, 2023

@gribozavr I'm pretty sure there are tests for this kind of thing. Clang has to both expand macros in C++11 attribute and ignore unknown ones. There may not be a test for this specific case, but for the general case I expect tests. I think we can avoid the preprocessor problem by having the conditional as #if !defined(_LIBCPP_CXX03_LANG) || _LICPP_CLANG_VER >= 1700. It's not pretty, but we avoid having another macro we have to push/pop and we can remove the macro entirely after the next release branch anyways.

@gribozavr
Copy link
Collaborator Author

@philnik777

I'm pretty sure there are tests for this kind of thing.

I can't find anything using the pattern ::__attribute. Could you confirm that it is indeed intentionally supported?

I think we can avoid the preprocessor problem by having the conditional as #if !defined(_LIBCPP_CXX03_LANG) || _LICPP_CLANG_VER >= 1700.

Yes, that indeed works.

we can remove the macro entirely after the next release branch anyways.

Why? Because the next release will not support compilers that don't support the [[__gnu__::__noinline__]] syntax?

@Artem-B
Copy link
Member

Artem-B commented Dec 1, 2023

Looks like __noinline__ as a macro, as it's defined right now is going to be problematic.

Perhaps the best choice is to permanently undef CUDA's __noinline__ or redefine it as cuda_noinline and let users deal with that if they want/need to. Otherwise we'll be playing this game of whack-an-inline-conflict forever.

__noinline__ appears to be relatively rarely used. I see a single reference in pytorch and there are a handful in NVIDIA's core libraries' tests . I think getting rid of them should be doable.

At the moment, only few libc++ headers use _LIBCPP_NOINLINE, so we can introduce a temporary build knob which would temporarily undef __inline__ in the wrappers for __config and string, but which would need to be explicitly enabled by the user.

So, transition would work roughly like this:

  • temporary undef __noinline__ in wrappers for __config and string
  • Introduce a knob to completely undef __noinline__ and disable the workarounds. Keep it enabled initially.
  • Switch the knob default to off for the next major clang release.
  • Remove the knob in the next major release.

If it's too much hassle, we can keep adding the wrappers. It's a hassle, but it would probably work OK, considering that it's not used all that often.

WDYT, all?

@philnik777
Copy link
Contributor

@philnik777

I'm pretty sure there are tests for this kind of thing.

I can't find anything using the pattern ::__attribute. Could you confirm that it is indeed intentionally supported?

I wasn't talking about __attribute__ specifically. There can be basically any string after the :: that clang has to accept, so clang basically converts anything following the :: to an identifier. e.g. clang also has to accept [[whatever::volatile]]. IOW it's intentional that clang accepts any keyword in C++11 attributes. You can see in clang/test/Parser/cxx0x-attributes.cpp that it's tested with a bunch of keywords and some very interesting syntax (lines 49 and 50 currently).

I think we can avoid the preprocessor problem by having the conditional as #if !defined(_LIBCPP_CXX03_LANG) || _LICPP_CLANG_VER >= 1700.

Yes, that indeed works.

we can remove the macro entirely after the next release branch anyways.

Why? Because the next release will not support compilers that don't support the [[__gnu__::__noinline__]] syntax?

Yes

@gribozavr
Copy link
Collaborator Author

There can be basically any string after the :: that clang has to accept

But then it wouldn't have the desired noinline effect, right?

@philnik777
Copy link
Contributor

There can be basically any string after the :: that clang has to accept

But then it wouldn't have the desired noinline effect, right?

Yes, but TBH I don't think that's a huge problem. We've had the functions inlined for a few releases and not that many people seemed to care. I think it's fair to say "our code works with cuda, but please complain to the cuda sdk people that they shouldn't break __noinline__ for everybody".

@philnik777
Copy link
Contributor

@gribozavr Do you plan to work on this?

@mordante
Copy link
Member

The easiest workaround that I can think of is to use noinline instead of __noinline__ when compiling CUDA code. To do that, just change this code in __config:

#  if __has_attribute(__noinline__)
#    define _LIBCPP_NOINLINE __attribute__((__noinline__))
#  else
#    define _LIBCPP_NOINLINE
#  endif

to:

#  if defined(__CUDACC__) || defined(__CUDA_ARCH__) || defined(__CUDA_LIBDEVICE__)
#    define _LIBCPP_NOINLINE_ATTR_NAME noinline
#  else
#    define _LIBCPP_NOINLINE_ATTR_NAME __noinline__
#  endif
#  if __has_attribute(_LIBCPP_NOINLINE_ATTR_NAME)
#    define _LIBCPP_NOINLINE __attribute__((_LIBCPP_NOINLINE_ATTR_NAME))
#  else
#    define _LIBCPP_NOINLINE
#  endif

(The condition in the first #if matches the condition in CUDA's crt/host_defines.h where __noinline__ is defined as a macro.)

No pushing/popping of macros or creating wrapper headers. No maintenance burden going forward if _LIBCPP_NOINLINE is used in more places.

This is just a workaround, not a fix. Compilation can still fail if CUDA programs also define noinline as a macro. But I expect that to be extremely rare, because defining the macro noinline would likely break the CUDA headers.

I have no objection to withholding any workaround until NVIDIA shows some interesting in improving the situation on the CUDA side of things. But if there is progress in that area and libc++ wants to check in a workaround on their side, I think this is the way to go.

(Full disclosure: I work for NVIDIA, though not in the CUDA or NVCC organizations, so I don't have any direct say in what CUDA does. I am working with @jrhemstad on "communicating with the relevant internal teams to come to a better solution that everyone can be happy with.")

Thanks for working on trying to solve this inside NVIDIA!
I like your work-around better than the current proposal, thanks for the suggestions.

@Artem-B
Copy link
Member

Artem-B commented Jan 16, 2024

use noinline instead of __noinline__ when compiling CUDA code.

SGTM.

We should probably add a comment that this specifically targets the __noinline__ macro definition in CUDA's /crt/host_defines.h. If CUDA headers' implementation changes in the future this workaround may break and would have to grow version-dependent variants.

We may make the workaround a bit simpler and more robust by directly checking if __noinline__ is defined. However that opens a possibility of us applying the workaround if/when __noinline__ may be defined as a macro for reasons other than CUDA headers.

@philnik777
Copy link
Contributor

The last solution actually sounds really nice to me. It's basically "if you do UB by defining __noinline__ we're claiming noinline for ourselves".

@ldionne
Copy link
Member

ldionne commented Jan 16, 2024

I'd be OK with the suggested workaround since folks are actively escalating this with the Cuda SDK. So the whole patch could be as simple as:

#  if defined(__CUDACC__) || defined(__CUDA_ARCH__) || defined(__CUDA_LIBDEVICE__)
#    define _LIBCPP_NOINLINE __attribute__((noinline)) // LINK TO BUG REPORT
#  elif __has_attribute(__noinline__)
#    define _LIBCPP_NOINLINE __attribute__((__noinline__))
#  else
#    define _LIBCPP_NOINLINE
#  endif

@gribozavr Are you able to update this PR accordingly? Ideally we'd land this before we cut the LLVM 18 branch next Tuesday.

@ldionne ldionne added this to the LLVM 18.0.X Release milestone Jan 16, 2024
…e__` macro

The CUDA SDK contains an unfortunate definition for the `__noinline__` macro.

I don't want to quote the CUDA header here, but you can find online plenty of
information about this macro definition being problematic and creating
conflicts for numerous other libraries, for example [on
StackOverflow](https://stackoverflow.com/questions/70301375/noinline-macro-conflict-between-glib-and-cuda).
@gribozavr gribozavr requested a review from mordante January 22, 2024 00:31
Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the patch, and thanks everyone for the discussion.

I'll merge this later today. In the meantime, I will ping @philnik777, @mordante and @EricWF to make sure they have a chance to see this quickly and express any concerns.

@ldionne
Copy link
Member

ldionne commented Jan 22, 2024

I spoke to Eric offline, who said he didn't have any special thoughts about this patch. I'm merging now.

@ldionne ldionne merged commit 7378fb3 into llvm:main Jan 22, 2024
@gribozavr gribozavr deleted the libcxx-cuda-noinline-macro branch January 22, 2024 23:26
philnik777 pushed a commit that referenced this pull request Jan 23, 2024
I failed to delete the old definition as a part of
#73838.
blueboxd pushed a commit to blueboxd/libcxx that referenced this pull request Feb 7, 2024
I failed to delete the old definition as a part of
llvm/llvm-project#73838.

NOKEYCHECK=True
GitOrigin-RevId: 9629c73aeb4609f58aa9edb0b87d18dd9e8fecc0
qihangkong pushed a commit to rvgpu/llvm that referenced this pull request Apr 18, 2024
@Artem-B
Copy link
Member

Artem-B commented Nov 25, 2024

The feedback that I got internally is that this has been discussed in the past and there is no plan to address this.

To be clear, this is not the definitive answer.

As I mentioned in NVIDIA/cccl#1235 (comment), I am working on communicating with the relevant internal teams to come to a better solution that everyone can be happy with.

The bug I've opened with NVIDIA has just been closed as a "not NVIDIA's problem". :-(

@ldionne
Copy link
Member

ldionne commented Nov 26, 2024

@Artem-B Which bug report? Do you have a link?

@jrhemstad
Copy link

The bug I've opened with NVIDIA has just been closed as a "not NVIDIA's problem". :-(

Yeah, this isn't right. I have commitment from the compiler team to fix this. This sounds like the bug was closed by someone who doesn't know the context.

@Artem-B
Copy link
Member

Artem-B commented Nov 26, 2024

Which bug report? Do you have a link?

It is, unfortunately, siloed. AFAICT there's no publlic way to report CUDA bugs other than loudly complaining on NVIDIA's forums.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.