-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[libc++] Protect the libc++ implementation from CUDA SDK's __noinline__
macro
#73838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-libcxx Author: Dmitri Gribenko (gribozavr) ChangesThe CUDA SDK contains an unfortunate definition for the I don't want to quote the code here, but you can find online plenty of information about this macro definition being problematic and creating conflicts for numerous other libraries, for example on StackOverflow. This patch does the following:
Macro protection brackets in Full diff: https://github.com/llvm/llvm-project/pull/73838.diff 3 Files Affected:
diff --git a/libcxx/include/__config b/libcxx/include/__config
index ee77305162f7fc8..2a77b88562ef614 100644
--- a/libcxx/include/__config
+++ b/libcxx/include/__config
@@ -46,6 +46,12 @@
# endif
# endif
+# define _LIBCPP_PUSH_MACROS _Pragma("push_macro(\"min\")") _Pragma("push_macro(\"max\")") _Pragma("push_macro(\"refresh()\")") _Pragma("push_macro(\"move(int, int)\")") _Pragma("push_macro(\"erase()\")") _Pragma("push_macro(\"__noinline__\")")
+# define _LIBCPP_POP_MACROS _Pragma("pop_macro(\"min\")") _Pragma("pop_macro(\"max\")") _Pragma("pop_macro(\"refresh()\")") _Pragma("pop_macro(\"move(int, int)\")") _Pragma("pop_macro(\"erase()\")") _Pragma("pop_macro(\"__noinline__\")")
+
+_LIBCPP_PUSH_MACROS
+# include <__undef_macros>
+
// The attributes supported by clang are documented at https://clang.llvm.org/docs/AttributeReference.html
// _LIBCPP_VERSION represents the version of libc++, which matches the version of LLVM.
@@ -1208,11 +1214,6 @@ __sanitizer_verify_double_ended_contiguous_container(const void*, const void*, c
# define _LIBCPP_ENABLE_CXX20_REMOVED_TYPE_TRAITS
# endif // _LIBCPP_ENABLE_CXX20_REMOVED_FEATURES
-// clang-format off
-# define _LIBCPP_PUSH_MACROS _Pragma("push_macro(\"min\")") _Pragma("push_macro(\"max\")") _Pragma("push_macro(\"refresh()\")") _Pragma("push_macro(\"move(int, int)\")") _Pragma("push_macro(\"erase()\")")
-# define _LIBCPP_POP_MACROS _Pragma("pop_macro(\"min\")") _Pragma("pop_macro(\"max\")") _Pragma("pop_macro(\"refresh()\")") _Pragma("pop_macro(\"move(int, int)\")") _Pragma("pop_macro(\"erase()\")")
-// clang-format on
-
# ifndef _LIBCPP_NO_AUTO_LINK
# if defined(_LIBCPP_ABI_MICROSOFT) && !defined(_LIBCPP_BUILDING_LIBRARY)
# if !defined(_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS)
@@ -1489,6 +1490,8 @@ __sanitizer_verify_double_ended_contiguous_container(const void*, const void*, c
# define _LIBCPP_DISABLE_UBSAN_UNSIGNED_INTEGER_CHECK
# endif
+_LIBCPP_POP_MACROS
+
#endif // __cplusplus
#endif // _LIBCPP___CONFIG
diff --git a/libcxx/include/__undef_macros b/libcxx/include/__undef_macros
index 29ab327e1c375ae..885057629d1b831 100644
--- a/libcxx/include/__undef_macros
+++ b/libcxx/include/__undef_macros
@@ -26,3 +26,7 @@
#ifdef erase
# undef erase
#endif
+
+#ifdef __noinline__
+# undef __noinline__
+#endif
diff --git a/libcxx/test/libcxx/system_reserved_names.gen.py b/libcxx/test/libcxx/system_reserved_names.gen.py
index 8c4be97897f657e..17ef50568bbb666 100644
--- a/libcxx/test/libcxx/system_reserved_names.gen.py
+++ b/libcxx/test/libcxx/system_reserved_names.gen.py
@@ -158,5 +158,8 @@
#define erase SYSTEM_RESERVED_NAME
#define refresh SYSTEM_RESERVED_NAME
+// Macros from the CUDA SDKs
+#define __noinline__ __attribute__((noinline))
+
#include <{header}>
""")
|
✅ With the latest revision this PR passed the Python code formatter. |
Has anyone filed a bug against CUDA to stop doing this? |
I'm OK taking this patch, but I would like us to file a bug against CUDA first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the one hand, working around this appears to be fairly simple and we have precedent for this for other macros. On the other hand, NVIDIA apparently acknowledges the risk for conflicts when they define this macro according to the SO answer:
/* gcc allows users to define attributes with underscores,
e.g., __attribute__((__noinline__)).
Consider a non-CUDA source file (e.g. .cpp) that has the
above attribute specification, and includes this header file. In that case,
defining __noinline__ as below would cause a gcc compilation error.
Hence, only define __noinline__ when the code is being processed
by a CUDA compiler component.
*/
Fixing this here doesn't provide much of an incentive for them to fix their SDK, but being incompatible with a major C++ standard library implementation would. Do we have anyone from Nvidia around here who could enlighten us on the status of that bug report?
Given that you say yourself that this causes problems with lots of other libraries, I think it would be better to fix the CUDA SDK - especially since it's a really simple fix for them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the bots come back.
There has been a bunch of discussion about if we want this. I probably LGTM'ed this to soon.
Maybe we should get some CUDA builders?
I assume this change means we need to push and pop our macros where we use |
That infrastructure was already in place, it was introduced by https://reviews.llvm.org/D147356. The only missing piece was the push/pop bracketing in
Yes, we are filing a bug for the CUDA SDK. However the fix will arrive in future SDK versions, the ones currently in use are already conflicting. So ideally libc++ will keep this workaround until those existing SDK versions become obsolete. |
Clang already works around a handful of cases where we ran into this: https://github.com/llvm/llvm-project/commits/main/clang/lib/Headers/cuda_wrappers @jrhemstad What would be a proper channel to ask NVIDIA's nvcc/CUDA owners to help with that? I can file a bug directly with NVIDIA, but this is a public issue and it would be great to track it publicly, too. |
Should the fix be done by adding a |
If that's the only place where the issue pops up, then it's an option. So far that's what we did for libstdc++ workarounds. |
^ FWIW I really like that option! I'd be curious to know if that fixes @gribozavr 's issues. Like I said, I don't have strong push back about working around this in libc++ but I'd want a public issue with CUDA to be referenced here and an acknowledgement that they're willing and working on fixing the issue. If that's the case I see no point in making people's lives hard and we should work around it here or in the cuda wrapper headers. Otherwise, in case CUDA says something like "yeah we don't really care", then we'd be bending backwards to support something bad that another vendor is doing in complete awareness and I would push back on doing that on the basis that it's not a good precedent to set nor a sane long term direction for libc++. |
See. NVIDIA/thrust#1703 The way I read it, the fact that NVCC hacks around this issue is a strong indication that it's not going to be fixed any time soon. And we still need to deal with existing CUDA versions anyways, so what NVIDIA is going to do about it is almost a moot point. |
Hmm. Looks like undefining
The problem is that expansion of I think we do need to deal with this in libc++. At the very least we should try figuring out how to define _LIBCPP_NOINLINE` in a way that's not affected by macro expansion later on. https://godbolt.org/z/7qfTM54hh for illustration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually looks like nvc++ isn't working around the issue. They simply parse GNU attributes slightly differently: https://godbolt.org/z/MKKf4Wdan. I think we can fix this all by simply using [[__gnu__::__noinline__]]
instead. This works on every compiler we support except Clang 16 in C++03 mode, so I don't think that's a huge problem. If anybody complains that the CodeGen is different if cuda is enabled, we can tell them to complain to the cuda sdk people and hopefully it'll be fixed at some point.
@jyknight I agree with @Artem-B 's analysis. To work around this problem, we need to hide CUDA's
|
@philnik777 Thank you for the idea!
So you're suggesting that we change the definition of
Thus, it would expand to either That would work if (I also have a concern that there are no tests in Clang for such syntax, so I suspect the fact that it works is accidental - but even if it is intentional, the Here is a patch if you want to try it yourself: A patch that changes _LIBCPP_NOINLINE to [[__gnu__::__noinline__]]
Sample test failure:
|
@gribozavr I'm pretty sure there are tests for this kind of thing. Clang has to both expand macros in C++11 attribute and ignore unknown ones. There may not be a test for this specific case, but for the general case I expect tests. I think we can avoid the preprocessor problem by having the conditional as |
I can't find anything using the pattern
Yes, that indeed works.
Why? Because the next release will not support compilers that don't support the |
Looks like Perhaps the best choice is to permanently undef CUDA's
At the moment, only few libc++ headers use _LIBCPP_NOINLINE, so we can introduce a temporary build knob which would temporarily undef So, transition would work roughly like this:
If it's too much hassle, we can keep adding the wrappers. It's a hassle, but it would probably work OK, considering that it's not used all that often. WDYT, all? |
I wasn't talking about
Yes |
But then it wouldn't have the desired noinline effect, right? |
Yes, but TBH I don't think that's a huge problem. We've had the functions inlined for a few releases and not that many people seemed to care. I think it's fair to say "our code works with cuda, but please complain to the cuda sdk people that they shouldn't break |
@gribozavr Do you plan to work on this? |
Thanks for working on trying to solve this inside NVIDIA! |
SGTM. We should probably add a comment that this specifically targets the We may make the workaround a bit simpler and more robust by directly checking if |
The last solution actually sounds really nice to me. It's basically "if you do UB by defining |
I'd be OK with the suggested workaround since folks are actively escalating this with the Cuda SDK. So the whole patch could be as simple as: # if defined(__CUDACC__) || defined(__CUDA_ARCH__) || defined(__CUDA_LIBDEVICE__)
# define _LIBCPP_NOINLINE __attribute__((noinline)) // LINK TO BUG REPORT
# elif __has_attribute(__noinline__)
# define _LIBCPP_NOINLINE __attribute__((__noinline__))
# else
# define _LIBCPP_NOINLINE
# endif @gribozavr Are you able to update this PR accordingly? Ideally we'd land this before we cut the LLVM 18 branch next Tuesday. |
…e__` macro The CUDA SDK contains an unfortunate definition for the `__noinline__` macro. I don't want to quote the CUDA header here, but you can find online plenty of information about this macro definition being problematic and creating conflicts for numerous other libraries, for example [on StackOverflow](https://stackoverflow.com/questions/70301375/noinline-macro-conflict-between-glib-and-cuda).
eec89a7
to
7982d7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the patch, and thanks everyone for the discussion.
I'll merge this later today. In the meantime, I will ping @philnik777, @mordante and @EricWF to make sure they have a chance to see this quickly and express any concerns.
I spoke to Eric offline, who said he didn't have any special thoughts about this patch. I'm merging now. |
I failed to delete the old definition as a part of #73838.
I failed to delete the old definition as a part of llvm/llvm-project#73838. NOKEYCHECK=True GitOrigin-RevId: 9629c73aeb4609f58aa9edb0b87d18dd9e8fecc0
I failed to delete the old definition as a part of llvm/llvm-project#73838.
The bug I've opened with NVIDIA has just been closed as a "not NVIDIA's problem". :-( |
@Artem-B Which bug report? Do you have a link? |
Yeah, this isn't right. I have commitment from the compiler team to fix this. This sounds like the bug was closed by someone who doesn't know the context. |
It is, unfortunately, siloed. AFAICT there's no publlic way to report CUDA bugs other than loudly complaining on NVIDIA's forums. |
The CUDA SDK contains an unfortunate definition for the
__noinline__
macro. This patch works around it by using
__attribute__((noinline))
instead of
__attribute__((__noinline__))
on CUDA. We are still waitingfor a long-term resolution to this issue in NVIDIA/cccl#1235.