Skip to content

[Clang] Prevent mlink-builtin-bitcode from internalizing the RPC client #118661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 28, 2025

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Dec 4, 2024

Summary:
Currently, we only use -mlink-builtin-bitcode for non-LTO NVIDIA
compiliations. This has the problem that it will internalize the RPC
client symbol which needs to be visible to the host. To counteract that,
I put retain on it, but this also prevents optimizations on the global
itself, so the passes we have that remove the symbol don't work on
OpenMP anymore. This patch does the dumbest solution, adding a special
string check for it in clang. Not the best solution, the runner up would
be to have a clang attribute for externally_initialized because those
can't be internalized, but that might have some unfortunate
side-effects. Alternatively we could make NVIDIA compilations do LTO all
the time, but that would affect some users and it's harder than I
thought.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. offload labels Dec 4, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 4, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-offload

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
Currently, we only use -mmlink-builtin-bitcode for non-LTO NVIDIA
compiliations. THis has the problem that it will internalize the RPC
client symbol which needs to be visible to the host. To counteract that,
I put retain on it, but this also prevents optimizations on the global
itself, so the passes we have that remove the symbol don't work on
OpenMP anymore. This patch does the dumbest solution, adding a special
string check for it in clang. Not the best solution, the runner up would
be to habe a clang attribute for externally_inititliazed because those
can't be internalized, but that might have some unfortunate
side-effects. Alternatively we could make NVIDIA compilations do LTO all
the time, but that would affect some users and it's harder than I
thought.


Full diff: https://github.com/llvm/llvm-project/pull/118661.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenAction.cpp (+2-1)
  • (modified) offload/DeviceRTL/src/Misc.cpp (+1-4)
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index cc927f44e0326e..db762b22560f28 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -246,7 +246,8 @@ bool BackendConsumer::LinkInModules(llvm::Module *M) {
           *M, std::move(LM.Module), LM.LinkFlags,
           [](llvm::Module &M, const llvm::StringSet<> &GVS) {
             internalizeModule(M, [&GVS](const llvm::GlobalValue &GV) {
-              return !GV.hasName() || (GVS.count(GV.getName()) == 0);
+              return !GV.hasName() || (GVS.count(GV.getName()) == 0) ||
+                     GV.getName().starts_with("__llvm_rpc_client");
             });
           });
     } else
diff --git a/offload/DeviceRTL/src/Misc.cpp b/offload/DeviceRTL/src/Misc.cpp
index c1df477365bcb6..e6f1a341e4d769 100644
--- a/offload/DeviceRTL/src/Misc.cpp
+++ b/offload/DeviceRTL/src/Misc.cpp
@@ -113,10 +113,7 @@ void *indirectCallLookup(void *HstPtr) {
 }
 
 /// The openmp client instance used to communicate with the server.
-/// FIXME: This is marked as 'retain' so that it is not removed via
-/// `-mlink-builtin-bitcode`
-[[gnu::visibility("protected"), gnu::weak,
-  gnu::retain]] rpc::Client Client asm("__llvm_rpc_client");
+[[gnu::visibility("protected"), gnu::weak]] rpc::Client Client asm("__llvm_rpc_client");
 
 } // namespace impl
 } // namespace ompx

Copy link

github-actions bot commented Dec 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@arsenm
Copy link
Contributor

arsenm commented Dec 4, 2024

clang attribute for externally_initialized

I'm surprised there isn't one already. Also seems better if you're going to special case the symbol to special case it by just setting this rather than skipping internalize for it

@jhuber6
Copy link
Contributor Author

jhuber6 commented Dec 4, 2024

clang attribute for externally_initialized

I'm surprised there isn't one already. Also seems better if you're going to special case the symbol to special case it by just setting this rather than skipping internalize for it

I'm not sure I want it in the generic case though, because I don't know the side-effects that LLVM attribute has. This is only necessary because of this stupid mlink-builtin-bitcode thing. if there's some other way to just tell it not to internalize this global that'd be great.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 24, 2025

Ping, this is a little hacky since the real solution is to stop using -mlink-builtin-bitcode, but I'd like to let this be optimized out on NVPTX before the release.

…ient

Summary:
Currently, we only use `-mmlink-builtin-bitcode` for non-LTO NVIDIA
compiliations. THis has the problem that it will internalize the RPC
client symbol which needs to be visible to the host. To counteract that,
I put `retain` on it, but this also prevents optimizations on the global
itself, so the passes we have that remove the symbol don't work on
OpenMP anymore. This patch does the dumbest solution, adding a special
string check for it in clang. Not the best solution, the runner up would
be to habe a clang attribute for `externally_inititliazed` because those
can't be internalized, but that might have some unfortunate
side-effects. Alternatively we could make NVIDIA compilations do LTO all
the time, but that would affect some users and it's harder than I
thought.
Copy link
Contributor

@shiltian shiltian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the change looks fine but I'm not an expert to judge if this is the best to way to do so.

@yxsamliu @Artem-B what do you folks think?

@@ -233,6 +233,10 @@ bool InternalizePass::internalizeModule(Module &M) {
else
AlwaysPreserved.insert("__stack_chk_guard");

// Preserve the RPC interface for GPU host callbacks when internalizing.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the FIXME suggested, this might be done by introducing a lambda argument when creating this pass, like MustPreserveGV, or extend MustPreserveGV to be able to preserve functions. Then when AMDGPU backend creates this pass, it may put the name of the functions that needs preserving in that lamba.

However, since there is precedence to preserve specific functions by name, I won't put a hard requirement on this PR. It is up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what a previous version of this was, but I don't think it's correct in every case. The ideal solution would just be to remove the -mlink-builtin-bitcode option entirely.

@jhuber6 jhuber6 requested a review from jplehr January 27, 2025 20:32
Copy link
Member

@jdoerfert jdoerfert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follows the "hacky" SOTA and gets us a little closer to what we want.

@jhuber6 jhuber6 merged commit 760a786 into llvm:main Jan 28, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category llvm:transforms offload
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants