[libc++] Protect the libc++ implementation from CUDA SDK's `noinline` macro #73838

gribozavr · 2023-11-29T18:21:05Z

The CUDA SDK contains an unfortunate definition for the __noinline__
macro. This patch works around it by using __attribute__((noinline))
instead of __attribute__((__noinline__)) on CUDA. We are still waiting
for a long-term resolution to this issue in NVIDIA/cccl#1235.

llvmbot · 2023-11-29T18:21:36Z

@llvm/pr-subscribers-libcxx

Author: Dmitri Gribenko (gribozavr)

Changes

The CUDA SDK contains an unfortunate definition for the __noinline__ macro.

I don't want to quote the code here, but you can find online plenty of information about this macro definition being problematic and creating conflicts for numerous other libraries, for example on StackOverflow.

This patch does the following:

following the existing pattern, it adds __noinline__ to _LIBCPP_PUSH_MACROS, _LIBCPP_POP_MACROS, __undef_macros, and the test generation script,
wraps all of include/__config in a push/pop bracket pair and includes __undef_macros in __config.

Macro protection brackets in __config are necessary because __config uses the __noinline__ identifier in a __has_attribute() preprocessor expression, and that usage suffers from CUDA's definition. This problem can be reproduced with the system_reserved_names.gen.py tests if one applies this patch except for the changes to __undef_macros.

Full diff: https://github.com/llvm/llvm-project/pull/73838.diff

3 Files Affected:

(modified) libcxx/include/__config (+8-5)
(modified) libcxx/include/__undef_macros (+4)
(modified) libcxx/test/libcxx/system_reserved_names.gen.py (+3)

diff --git a/libcxx/include/__config b/libcxx/include/__config
index ee77305162f7fc8..2a77b88562ef614 100644
--- a/libcxx/include/__config
+++ b/libcxx/include/__config
@@ -46,6 +46,12 @@
 #    endif
 #  endif
 
+#  define _LIBCPP_PUSH_MACROS _Pragma("push_macro(\"min\")") _Pragma("push_macro(\"max\")") _Pragma("push_macro(\"refresh()\")") _Pragma("push_macro(\"move(int, int)\")") _Pragma("push_macro(\"erase()\")") _Pragma("push_macro(\"__noinline__\")")
+#  define _LIBCPP_POP_MACROS _Pragma("pop_macro(\"min\")") _Pragma("pop_macro(\"max\")") _Pragma("pop_macro(\"refresh()\")") _Pragma("pop_macro(\"move(int, int)\")") _Pragma("pop_macro(\"erase()\")") _Pragma("pop_macro(\"__noinline__\")")
+
+_LIBCPP_PUSH_MACROS
+#  include <__undef_macros>
+
 // The attributes supported by clang are documented at https://clang.llvm.org/docs/AttributeReference.html
 
 // _LIBCPP_VERSION represents the version of libc++, which matches the version of LLVM.
@@ -1208,11 +1214,6 @@ __sanitizer_verify_double_ended_contiguous_container(const void*, const void*, c
 #    define _LIBCPP_ENABLE_CXX20_REMOVED_TYPE_TRAITS
 #  endif // _LIBCPP_ENABLE_CXX20_REMOVED_FEATURES
 
-// clang-format off
-#  define _LIBCPP_PUSH_MACROS _Pragma("push_macro(\"min\")") _Pragma("push_macro(\"max\")") _Pragma("push_macro(\"refresh()\")") _Pragma("push_macro(\"move(int, int)\")") _Pragma("push_macro(\"erase()\")")
-#  define _LIBCPP_POP_MACROS _Pragma("pop_macro(\"min\")") _Pragma("pop_macro(\"max\")") _Pragma("pop_macro(\"refresh()\")") _Pragma("pop_macro(\"move(int, int)\")") _Pragma("pop_macro(\"erase()\")")
-// clang-format on
-
 #  ifndef _LIBCPP_NO_AUTO_LINK
 #    if defined(_LIBCPP_ABI_MICROSOFT) && !defined(_LIBCPP_BUILDING_LIBRARY)
 #      if !defined(_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS)
@@ -1489,6 +1490,8 @@ __sanitizer_verify_double_ended_contiguous_container(const void*, const void*, c
 #    define _LIBCPP_DISABLE_UBSAN_UNSIGNED_INTEGER_CHECK
 #  endif
 
+_LIBCPP_POP_MACROS
+
 #endif // __cplusplus
 
 #endif // _LIBCPP___CONFIG
diff --git a/libcxx/include/__undef_macros b/libcxx/include/__undef_macros
index 29ab327e1c375ae..885057629d1b831 100644
--- a/libcxx/include/__undef_macros
+++ b/libcxx/include/__undef_macros
@@ -26,3 +26,7 @@
 #ifdef erase
 #  undef erase
 #endif
+
+#ifdef __noinline__
+#  undef __noinline__
+#endif
diff --git a/libcxx/test/libcxx/system_reserved_names.gen.py b/libcxx/test/libcxx/system_reserved_names.gen.py
index 8c4be97897f657e..17ef50568bbb666 100644
--- a/libcxx/test/libcxx/system_reserved_names.gen.py
+++ b/libcxx/test/libcxx/system_reserved_names.gen.py
@@ -158,5 +158,8 @@
 #define erase SYSTEM_RESERVED_NAME
 #define refresh SYSTEM_RESERVED_NAME
 
+// Macros from the CUDA SDKs
+#define __noinline__ __attribute__((noinline))
+
 #include <{header}>
 """)

github-actions · 2023-11-29T18:23:31Z

✅ With the latest revision this PR passed the Python code formatter.

jyknight · 2023-11-29T18:24:46Z

Has anyone filed a bug against CUDA to stop doing this?

EricWF · 2023-11-29T18:37:28Z

I'm OK taking this patch, but I would like us to file a bug against CUDA first.

ldionne

On the one hand, working around this appears to be fairly simple and we have precedent for this for other macros. On the other hand, NVIDIA apparently acknowledges the risk for conflicts when they define this macro according to the SO answer:

/* gcc allows users to define attributes with underscores,
   e.g., __attribute__((__noinline__)).
   Consider a non-CUDA source file (e.g. .cpp) that has the
   above attribute specification, and includes this header file. In that case,
   defining __noinline__ as below  would cause a gcc compilation error.
   Hence, only define __noinline__ when the code is being processed
   by a  CUDA compiler component.
*/

Fixing this here doesn't provide much of an incentive for them to fix their SDK, but being incompatible with a major C++ standard library implementation would. Do we have anyone from Nvidia around here who could enlighten us on the status of that bug report?

libcxx/test/libcxx/system_reserved_names.gen.py

philnik777 · 2023-11-29T18:39:13Z

Given that you say yourself that this causes problems with lots of other libraries, I think it would be better to fix the CUDA SDK - especially since it's a really simple fix for them.

EricWF

~~LGTM once the bots come back.~~

There has been a bunch of discussion about if we want this. I probably LGTM'ed this to soon.

Maybe we should get some CUDA builders?

libcxx/test/libcxx/system_reserved_names.gen.py

libcxx/include/__undef_macros

mordante · 2023-11-29T18:44:08Z

I assume this change means we need to push and pop our macros where we use __noinline__ via our macro. Or am I mistaken? This patch doesn't seem to do that. Before approving I would like to know what the status of fixing this in the CUDA SDK is.

gribozavr · 2023-11-29T18:48:52Z

I assume this change means we need to push and pop our macros where we use __noinline__ via our macro. Or am I mistaken? This patch doesn't seem to do that.

That infrastructure was already in place, it was introduced by https://reviews.llvm.org/D147356. The only missing piece was the push/pop bracketing in __config itself, which this patch does.

I think it would be better to fix the CUDA SDK - especially since it's a really simple fix for them.

Yes, we are filing a bug for the CUDA SDK. However the fix will arrive in future SDK versions, the ones currently in use are already conflicting. So ideally libc++ will keep this workaround until those existing SDK versions become obsolete.

Artem-B · 2023-11-29T18:50:20Z

Clang already works around a handful of cases where we ran into this: https://github.com/llvm/llvm-project/commits/main/clang/lib/Headers/cuda_wrappers

@jrhemstad What would be a proper channel to ask NVIDIA's nvcc/CUDA owners to help with that? I can file a bug directly with NVIDIA, but this is a public issue and it would be great to track it publicly, too.

jyknight · 2023-11-29T20:00:43Z

Should the fix be done by adding a clang/lib/Headers/cuda_wrappers/__config wrapper instead, so that all the CUDA __noinline__ hacks are kept in one place?

Artem-B · 2023-11-29T20:40:52Z

Should the fix be done by adding a clang/lib/Headers/cuda_wrappers/__config wrapper instead, so that all the CUDA __noinline__ hacks are kept in one place?

If that's the only place where the issue pops up, then it's an option. So far that's what we did for libstdc++ workarounds.

ldionne · 2023-11-29T23:23:29Z

^ FWIW I really like that option! I'd be curious to know if that fixes @gribozavr 's issues.

Like I said, I don't have strong push back about working around this in libc++ but I'd want a public issue with CUDA to be referenced here and an acknowledgement that they're willing and working on fixing the issue. If that's the case I see no point in making people's lives hard and we should work around it here or in the cuda wrapper headers.

Otherwise, in case CUDA says something like "yeah we don't really care", then we'd be bending backwards to support something bad that another vendor is doing in complete awareness and I would push back on doing that on the basis that it's not a good precedent to set nor a sane long term direction for libc++.

Artem-B · 2023-11-29T23:29:22Z

Otherwise, in case CUDA says something like "yeah we don't really care", then we'd be bending backwards to support something bad that another vendor is doing in complete awareness and I would push back on doing that on the basis that it's not a good precedent to set nor a sane long term direction for libc++.

See. NVIDIA/thrust#1703

The way I read it, the fact that NVCC hacks around this issue is a strong indication that it's not going to be fixed any time soon. And we still need to deal with existing CUDA versions anyways, so what NVIDIA is going to do about it is almost a moot point.

Artem-B · 2023-11-30T22:11:30Z

Hmm. Looks like undefining __inline__ in a wrapper for __config is not going to work.
libc++ defines

#    define _LIBCPP_NOINLINE __attribute__((__noinline__))

The problem is that expansion of __inline__ happens later, so when some other header uses _LIBCPP_NOINLINE while __noinline__ is defined by CUDA, we still have a problem, even though the _LIBCPP_NOINLINE is defined exactly as it was supposed to. With wrappers, we'd need to wrap every libc++ header which uses _LIBCPP_NOINLINE.

I think we do need to deal with this in libc++. At the very least we should try figuring out how to define _LIBCPP_NOINLINE` in a way that's not affected by macro expansion later on.

https://godbolt.org/z/7qfTM54hh for illustration.

philnik777

It actually looks like nvc++ isn't working around the issue. They simply parse GNU attributes slightly differently: https://godbolt.org/z/MKKf4Wdan. I think we can fix this all by simply using [[__gnu__::__noinline__]] instead. This works on every compiler we support except Clang 16 in C++03 mode, so I don't think that's a huge problem. If anybody complains that the CodeGen is different if cuda is enabled, we can tell them to complain to the cuda sdk people and hopefully it'll be fixed at some point.

gribozavr · 2023-12-01T17:20:52Z

@jyknight I agree with @Artem-B 's analysis. To work around this problem, we need to hide CUDA's __noinline__ macro definition everywhere where _LIBCPP_NOINLINE is used (not just where it is defined) - and it may used in every libc++ header. To achieve such hiding we can either:

add __noinline__ to _LIBCPP_PUSH_MACROS, _LIBCPP_POP_MACROS, and to __undef_macros - like in this PR.
create a Clang wrapper header for every libc++ public header. I don't prefer this option because I think the first option is simpler to maintain.

gribozavr · 2023-12-01T17:40:05Z

@philnik777 Thank you for the idea!

I think we can fix this all by simply using [[__gnu__::__noinline__]] instead.

So you're suggesting that we change the definition of _LIBCPP_NOINLINE:

#define _LIBCPP_NOINLINE [[__gnu__::__noinline__]]

Thus, it would expand to either [[__gnu__::__noinline__]] under normal circumstances, or to [[__gnu__::__attribute__((noinline))]] under CUDA?

That would work if _LIBCPP_NOINLINE was the only usage of __noinline__. We also have the #if __has_attribute(__noinline__) that gets broken.

(I also have a concern that there are no tests in Clang for such syntax, so I suspect the fact that it works is accidental - but even if it is intentional, the __has_attribute() expression does not work.)

Here is a patch if you want to try it yourself:

A patch that changes _LIBCPP_NOINLINE to [[__gnu__::__noinline__]]

diff --git a/libcxx/include/__config b/libcxx/include/__config
index ee77305162f7..e078a7ea3ba1 100644
--- a/libcxx/include/__config
+++ b/libcxx/include/__config
@@ -1176,7 +1176,7 @@ __sanitizer_verify_double_ended_contiguous_container(const void*, const void*, c
 #  endif

 #  if __has_attribute(__noinline__)
-#    define _LIBCPP_NOINLINE __attribute__((__noinline__))
+#    define _LIBCPP_NOINLINE [[__gnu__::__noinline__]]
 #  else
 #    define _LIBCPP_NOINLINE
 #  endif
diff --git a/libcxx/include/string b/libcxx/include/string
index 25f307825fa2..8bce9af76135 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -1901,7 +1901,7 @@ private:
     // to call the __init() functions as those are marked as inline which may
     // result in over-aggressive inlining by the compiler, where our aim is
     // to only inline the fast path code directly in the ctor.
-    _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE void __init_copy_ctor_external(const value_type* __s, size_type __sz);
+    _LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20 void __init_copy_ctor_external(const value_type* __s, size_type __sz);

     template <class _InputIterator, __enable_if_t<__has_exactly_input_iterator_category<_InputIterator>::value, int> = 0>
     inline _LIBCPP_CONSTEXPR_SINCE_CXX20 void __init(_InputIterator __first, _InputIterator __last);
@@ -1935,7 +1935,7 @@ private:
     // have proof that the input does not alias the current instance.
     // For example, operator=(basic_string) performs a 'self' check.
     template <bool __is_short>
-    _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE basic_string& __assign_no_alias(const value_type* __s, size_type __n);
+    _LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_no_alias(const value_type* __s, size_type __n);

   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __erase_to_end(size_type __pos) {
     __null_terminate_at(std::__to_address(__get_pointer()), __pos);
@@ -1943,7 +1943,7 @@ private:

     // __erase_external_with_move is invoked for erase() invocations where
     // `n ~= npos`, likely requiring memory moves on the string data.
-    _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE void __erase_external_with_move(size_type __pos, size_type __n);
+    _LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20 void __erase_external_with_move(size_type __pos, size_type __n);

     _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
     void __copy_assign_alloc(const basic_string& __str)
@@ -2015,8 +2015,8 @@ private:
         _NOEXCEPT
         {}

-    _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE basic_string& __assign_external(const value_type* __s);
-    _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE basic_string& __assign_external(const value_type* __s, size_type __n);
+    _LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_external(const value_type* __s);
+    _LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_external(const value_type* __s, size_type __n);

     // Assigns the value in __s, guaranteed to be __n < __min_cap in length.
     inline _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_short(const value_type* __s, size_type __n) {
@@ -2169,7 +2169,7 @@ basic_string<_CharT, _Traits, _Allocator>::__init(const value_type* __s, size_ty
 }

 template <class _CharT, class _Traits, class _Allocator>
-_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE
+_LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20
 void basic_string<_CharT, _Traits, _Allocator>::__init_copy_ctor_external(
     const value_type* __s, size_type __sz) {
   if (__libcpp_is_constant_evaluated())
@@ -2398,7 +2398,7 @@ basic_string<_CharT, _Traits, _Allocator>::__grow_by_without_replace(

 template <class _CharT, class _Traits, class _Allocator>
 template <bool __is_short>
-_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE
+_LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20
 basic_string<_CharT, _Traits, _Allocator>&
 basic_string<_CharT, _Traits, _Allocator>::__assign_no_alias(
     const value_type* __s, size_type __n) {
@@ -2416,7 +2416,7 @@ basic_string<_CharT, _Traits, _Allocator>::__assign_no_alias(
 }

 template <class _CharT, class _Traits, class _Allocator>
-_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE
+_LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20
 basic_string<_CharT, _Traits, _Allocator>&
 basic_string<_CharT, _Traits, _Allocator>::__assign_external(
     const value_type* __s, size_type __n) {
@@ -2627,7 +2627,7 @@ basic_string<_CharT, _Traits, _Allocator>::assign(const _Tp& __t, size_type __po
 }

 template <class _CharT, class _Traits, class _Allocator>
-_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE
+_LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20
 basic_string<_CharT, _Traits, _Allocator>&
 basic_string<_CharT, _Traits, _Allocator>::__assign_external(const value_type* __s) {
   return __assign_external(__s, traits_type::length(__s));
@@ -3112,7 +3112,7 @@ basic_string<_CharT, _Traits, _Allocator>::replace(size_type __pos, size_type __
 // 'externally instantiated' erase() implementation, called when __n != npos.
 // Does not check __pos against size()
 template <class _CharT, class _Traits, class _Allocator>
-_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_NOINLINE
+_LIBCPP_NOINLINE _LIBCPP_CONSTEXPR_SINCE_CXX20
 void
 basic_string<_CharT, _Traits, _Allocator>::__erase_external_with_move(
     size_type __pos, size_type __n)
diff --git a/libcxx/test/libcxx/system_reserved_names.gen.py b/libcxx/test/libcxx/system_reserved_names.gen.py
index 8c4be97897f6..de68e0d666bd 100644
--- a/libcxx/test/libcxx/system_reserved_names.gen.py
+++ b/libcxx/test/libcxx/system_reserved_names.gen.py
@@ -158,5 +158,8 @@ for header in public_headers:
 #define erase SYSTEM_RESERVED_NAME
 #define refresh SYSTEM_RESERVED_NAME

+// Macros from the CUDA SDKs
+#define __noinline__ SYSTEM_RESERVED_NAME
+
 #include <{header}>
 """)

Sample test failure:

# .---command stderr------------
# | In file included from LLVM/build-release/runtimes/runtimes-bins/test/libcxx/system_reserved_names.gen.py/errno.h.compile
.pass.cpp:145:
# | In file included from LLVM/build-release/include/c++/v1/errno.h:25:
# | LLVM/build-release/include/c++/v1/__config:1178:23: error: missing ')' after 'This'
# |  1178 | #  if __has_attribute(__noinline__)
# |       |                       ^~~~~~~~~~~~
# | LLVM/build-release/runtimes/runtimes-bins/test/libcxx/system_reserved_names.gen.py/errno.h.compile.pass.cpp:143:22: note
: expanded from macro '__noinline__'
# |   143 | #define __noinline__ SYSTEM_RESERVED_NAME
# |       |                      ^~~~~~~~~~~~~~~~~~~~
# | LLVM/build-release/runtimes/runtimes-bins/test/libcxx/system_reserved_names.gen.py/errno.h.compile.pass.cpp:4:35: note:
expanded from macro 'SYSTEM_RESERVED_NAME'
# |     4 | #define SYSTEM_RESERVED_NAME This name should not be used in libc++
# |       |                              ~~~~ ^
# | LLVM/build-release/include/c++/v1/__config:1178:22: note: to match this '('
# |  1178 | #  if __has_attribute(__noinline__)
# |       |                      ^
# | 1 error generated.
# `-----------------------------
# error: command failed with exit status: 1

philnik777 · 2023-12-01T17:50:19Z

@gribozavr I'm pretty sure there are tests for this kind of thing. Clang has to both expand macros in C++11 attribute and ignore unknown ones. There may not be a test for this specific case, but for the general case I expect tests. I think we can avoid the preprocessor problem by having the conditional as #if !defined(_LIBCPP_CXX03_LANG) || _LICPP_CLANG_VER >= 1700. It's not pretty, but we avoid having another macro we have to push/pop and we can remove the macro entirely after the next release branch anyways.

gribozavr · 2023-12-01T18:23:02Z

@philnik777

I'm pretty sure there are tests for this kind of thing.

I can't find anything using the pattern ::__attribute. Could you confirm that it is indeed intentionally supported?

I think we can avoid the preprocessor problem by having the conditional as #if !defined(_LIBCPP_CXX03_LANG) || _LICPP_CLANG_VER >= 1700.

Yes, that indeed works.

we can remove the macro entirely after the next release branch anyways.

Why? Because the next release will not support compilers that don't support the [[__gnu__::__noinline__]] syntax?

Artem-B · 2023-12-01T18:34:43Z

Looks like __noinline__ as a macro, as it's defined right now is going to be problematic.

Perhaps the best choice is to permanently undef CUDA's __noinline__ or redefine it as cuda_noinline and let users deal with that if they want/need to. Otherwise we'll be playing this game of whack-an-inline-conflict forever.

__noinline__ appears to be relatively rarely used. I see a single reference in pytorch and there are a handful in NVIDIA's core libraries' tests . I think getting rid of them should be doable.

At the moment, only few libc++ headers use _LIBCPP_NOINLINE, so we can introduce a temporary build knob which would temporarily undef __inline__ in the wrappers for __config and string, but which would need to be explicitly enabled by the user.

So, transition would work roughly like this:

temporary undef __noinline__ in wrappers for __config and string
Introduce a knob to completely undef __noinline__ and disable the workarounds. Keep it enabled initially.
Switch the knob default to off for the next major clang release.
Remove the knob in the next major release.

If it's too much hassle, we can keep adding the wrappers. It's a hassle, but it would probably work OK, considering that it's not used all that often.

WDYT, all?

llvm#73838

philnik777 · 2023-12-01T18:43:47Z

@philnik777

I'm pretty sure there are tests for this kind of thing.

I can't find anything using the pattern ::__attribute. Could you confirm that it is indeed intentionally supported?

I wasn't talking about __attribute__ specifically. There can be basically any string after the :: that clang has to accept, so clang basically converts anything following the :: to an identifier. e.g. clang also has to accept [[whatever::volatile]]. IOW it's intentional that clang accepts any keyword in C++11 attributes. You can see in clang/test/Parser/cxx0x-attributes.cpp that it's tested with a bunch of keywords and some very interesting syntax (lines 49 and 50 currently).

I think we can avoid the preprocessor problem by having the conditional as #if !defined(_LIBCPP_CXX03_LANG) || _LICPP_CLANG_VER >= 1700.

Yes, that indeed works.

we can remove the macro entirely after the next release branch anyways.

Why? Because the next release will not support compilers that don't support the [[__gnu__::__noinline__]] syntax?

Yes

gribozavr · 2023-12-01T19:10:07Z

There can be basically any string after the :: that clang has to accept

But then it wouldn't have the desired noinline effect, right?

philnik777 · 2023-12-01T19:32:06Z

There can be basically any string after the :: that clang has to accept

But then it wouldn't have the desired noinline effect, right?

Yes, but TBH I don't think that's a huge problem. We've had the functions inlined for a few releases and not that many people seemed to care. I think it's fair to say "our code works with cuda, but please complain to the cuda sdk people that they shouldn't break __noinline__ for everybody".

philnik777 · 2023-12-18T11:40:39Z

@gribozavr Do you plan to work on this?

mordante · 2024-01-15T07:07:04Z

The easiest workaround that I can think of is to use noinline instead of __noinline__ when compiling CUDA code. To do that, just change this code in __config:
#  if __has_attribute(__noinline__)
#    define _LIBCPP_NOINLINE __attribute__((__noinline__))
#  else
#    define _LIBCPP_NOINLINE
#  endif
to:
#  if defined(__CUDACC__) || defined(__CUDA_ARCH__) || defined(__CUDA_LIBDEVICE__)
#    define _LIBCPP_NOINLINE_ATTR_NAME noinline
#  else
#    define _LIBCPP_NOINLINE_ATTR_NAME __noinline__
#  endif
#  if __has_attribute(_LIBCPP_NOINLINE_ATTR_NAME)
#    define _LIBCPP_NOINLINE __attribute__((_LIBCPP_NOINLINE_ATTR_NAME))
#  else
#    define _LIBCPP_NOINLINE
#  endif
(The condition in the first #if matches the condition in CUDA's crt/host_defines.h where __noinline__ is defined as a macro.)

No pushing/popping of macros or creating wrapper headers. No maintenance burden going forward if _LIBCPP_NOINLINE is used in more places.

This is just a workaround, not a fix. Compilation can still fail if CUDA programs also define noinline as a macro. But I expect that to be extremely rare, because defining the macro noinline would likely break the CUDA headers.

I have no objection to withholding any workaround until NVIDIA shows some interesting in improving the situation on the CUDA side of things. But if there is progress in that area and libc++ wants to check in a workaround on their side, I think this is the way to go.

(Full disclosure: I work for NVIDIA, though not in the CUDA or NVCC organizations, so I don't have any direct say in what CUDA does. I am working with @jrhemstad on "communicating with the relevant internal teams to come to a better solution that everyone can be happy with.")

Thanks for working on trying to solve this inside NVIDIA!
I like your work-around better than the current proposal, thanks for the suggestions.

Artem-B · 2024-01-16T19:29:04Z

use noinline instead of __noinline__ when compiling CUDA code.

SGTM.

We should probably add a comment that this specifically targets the __noinline__ macro definition in CUDA's /crt/host_defines.h. If CUDA headers' implementation changes in the future this workaround may break and would have to grow version-dependent variants.

We may make the workaround a bit simpler and more robust by directly checking if __noinline__ is defined. However that opens a possibility of us applying the workaround if/when __noinline__ may be defined as a macro for reasons other than CUDA headers.

philnik777 · 2024-01-16T19:41:26Z

The last solution actually sounds really nice to me. It's basically "if you do UB by defining __noinline__ we're claiming noinline for ourselves".

ldionne · 2024-01-16T21:43:36Z

I'd be OK with the suggested workaround since folks are actively escalating this with the Cuda SDK. So the whole patch could be as simple as:

#  if defined(__CUDACC__) || defined(__CUDA_ARCH__) || defined(__CUDA_LIBDEVICE__)
#    define _LIBCPP_NOINLINE __attribute__((noinline)) // LINK TO BUG REPORT
#  elif __has_attribute(__noinline__)
#    define _LIBCPP_NOINLINE __attribute__((__noinline__))
#  else
#    define _LIBCPP_NOINLINE
#  endif

@gribozavr Are you able to update this PR accordingly? Ideally we'd land this before we cut the LLVM 18 branch next Tuesday.

…e__` macro The CUDA SDK contains an unfortunate definition for the `__noinline__` macro. I don't want to quote the CUDA header here, but you can find online plenty of information about this macro definition being problematic and creating conflicts for numerous other libraries, for example [on StackOverflow](https://stackoverflow.com/questions/70301375/noinline-macro-conflict-between-glib-and-cuda).

libcxx/include/__config

ldionne

LGTM. Thanks for the patch, and thanks everyone for the discussion.

I'll merge this later today. In the meantime, I will ping @philnik777, @mordante and @EricWF to make sure they have a chance to see this quickly and express any concerns.

ldionne · 2024-01-22T18:11:30Z

I spoke to Eric offline, who said he didn't have any special thoughts about this patch. I'm merging now.

I failed to delete the old definition as a part of #73838.

I failed to delete the old definition as a part of llvm/llvm-project#73838. NOKEYCHECK=True GitOrigin-RevId: 9629c73aeb4609f58aa9edb0b87d18dd9e8fecc0

I failed to delete the old definition as a part of llvm/llvm-project#73838.

Artem-B · 2024-11-25T23:21:45Z

The feedback that I got internally is that this has been discussed in the past and there is no plan to address this.

To be clear, this is not the definitive answer.

As I mentioned in NVIDIA/cccl#1235 (comment), I am working on communicating with the relevant internal teams to come to a better solution that everyone can be happy with.

The bug I've opened with NVIDIA has just been closed as a "not NVIDIA's problem". :-(

ldionne · 2024-11-26T13:17:07Z

@Artem-B Which bug report? Do you have a link?

jrhemstad · 2024-11-26T14:44:59Z

The bug I've opened with NVIDIA has just been closed as a "not NVIDIA's problem". :-(

Yeah, this isn't right. I have commitment from the compiler team to fix this. This sounds like the bug was closed by someone who doesn't know the context.

Artem-B · 2024-11-26T18:14:57Z

Which bug report? Do you have a link?

It is, unfortunately, siloed. AFAICT there's no publlic way to report CUDA bugs other than loudly complaining on NVIDIA's forums.

gribozavr requested a review from a team as a code owner November 29, 2023 18:21

llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Nov 29, 2023

ldionne reviewed Nov 29, 2023

View reviewed changes

libcxx/test/libcxx/system_reserved_names.gen.py Outdated Show resolved Hide resolved

EricWF approved these changes Nov 29, 2023

View reviewed changes

libcxx/test/libcxx/system_reserved_names.gen.py Outdated Show resolved Hide resolved

libcxx/include/__undef_macros Outdated Show resolved Hide resolved

philnik777 requested changes Dec 1, 2023

View reviewed changes

Artem-B added a commit to Artem-B/llvm-project that referenced this pull request Dec 1, 2023

[CUDA] work around more __noinline__ conflicts with libc++

71e24fc

llvm#73838

Artem-B mentioned this pull request Dec 1, 2023

[CUDA] work around more __noinline__ conflicts with libc++ #74123

Closed

firewave mentioned this pull request Dec 14, 2023

clang-cuda-*4 builds fail in CI #75431

Closed

ldionne added this to the LLVM 18.0.X Release milestone Jan 16, 2024

gribozavr force-pushed the libcxx-cuda-noinline-macro branch from eec89a7 to 7982d7d Compare January 20, 2024 23:00

gribozavr requested review from ldionne, mordante, philnik777 and EricWF January 20, 2024 23:02

mordante reviewed Jan 21, 2024

View reviewed changes

libcxx/include/__config Show resolved Hide resolved

Expand the comment, we are still expecting a change in the CUDA SDK

c212ae7

gribozavr requested a review from mordante January 22, 2024 00:31

ldionne approved these changes Jan 22, 2024

View reviewed changes

philnik777 approved these changes Jan 22, 2024

View reviewed changes

mordante approved these changes Jan 22, 2024

View reviewed changes

ldionne merged commit 7378fb3 into llvm:main Jan 22, 2024

gribozavr deleted the libcxx-cuda-noinline-macro branch January 22, 2024 23:26

gribozavr mentioned this pull request Jan 23, 2024

[libc++] Remove a duplicated definition of _LIBCPP_NOINLINE #79114

Merged

philnik777 pushed a commit that referenced this pull request Jan 23, 2024

[libc++] Remove a duplicated definition of _LIBCPP_NOINLINE (#79114)

9629c73

I failed to delete the old definition as a part of #73838.

qihangkong pushed a commit to rvgpu/llvm that referenced this pull request Apr 18, 2024

[libc++] Remove a duplicated definition of _LIBCPP_NOINLINE

ec104b4

I failed to delete the old definition as a part of llvm/llvm-project#73838.

[libc++] Protect the libc++ implementation from CUDA SDK's __noinline__ macro #73838

[libc++] Protect the libc++ implementation from CUDA SDK's __noinline__ macro #73838

Uh oh!

Conversation

gribozavr commented Nov 29, 2023 • edited by ldionne Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 29, 2023

Uh oh!

github-actions bot commented Nov 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jyknight commented Nov 29, 2023

Uh oh!

EricWF commented Nov 29, 2023

Uh oh!

ldionne left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

philnik777 commented Nov 29, 2023

Uh oh!

EricWF left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mordante commented Nov 29, 2023

Uh oh!

gribozavr commented Nov 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Artem-B commented Nov 29, 2023

Uh oh!

jyknight commented Nov 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Artem-B commented Nov 29, 2023

Uh oh!

ldionne commented Nov 29, 2023

Uh oh!

Artem-B commented Nov 29, 2023

Uh oh!

Artem-B commented Nov 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

philnik777 left a comment

Choose a reason for hiding this comment

Uh oh!

gribozavr commented Dec 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gribozavr commented Dec 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

philnik777 commented Dec 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gribozavr commented Dec 1, 2023

Uh oh!

Artem-B commented Dec 1, 2023

Uh oh!

philnik777 commented Dec 1, 2023

Uh oh!

gribozavr commented Dec 1, 2023

Uh oh!

philnik777 commented Dec 1, 2023

Uh oh!

philnik777 commented Dec 18, 2023

Uh oh!

mordante commented Jan 15, 2024

Uh oh!

Artem-B commented Jan 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

philnik777 commented Jan 16, 2024

Uh oh!

ldionne commented Jan 16, 2024

Uh oh!

[libc++] Protect the libc++ implementation from CUDA SDK's `noinline` macro #73838

[libc++] Protect the libc++ implementation from CUDA SDK's `noinline` macro #73838

gribozavr commented Nov 29, 2023 •

edited by ldionne

Loading

github-actions bot commented Nov 29, 2023 •

edited

Loading

EricWF left a comment •

edited

Loading

gribozavr commented Nov 29, 2023 •

edited

Loading

jyknight commented Nov 29, 2023 •

edited

Loading

Artem-B commented Nov 30, 2023 •

edited

Loading

gribozavr commented Dec 1, 2023 •

edited

Loading

gribozavr commented Dec 1, 2023 •

edited

Loading

philnik777 commented Dec 1, 2023 •

edited

Loading

Artem-B commented Jan 16, 2024 •

edited

Loading