[Offload] Make only a single thread handle the RPC server thread #126067
Summary: This patch changes the interface to make starting the thread multiple times permissible, since it will only actually be started the first time. Note that this does not refcount it or anything, so it is up to the user to make sure the thread is not shut down before everyone is done using it. That is the case today because the shutDown portion is run by a single thread in the destructor phase. Another question is whether we should make this thread truly global state, because currently it is private to each plugin instance: if you have an AMD and an NVIDIA image there will be two, and similarly if you have those inside of a shared library.
@llvm/pr-subscribers-offload Author: Joseph Huber (jhuber6). Full diff: https://github.com/llvm/llvm-project/pull/126067.diff — 3 files affected:
diff --git a/offload/plugins-nextgen/common/include/RPC.h b/offload/plugins-nextgen/common/include/RPC.h
index d750ce30e74b05e..7b031083647aafd 100644
--- a/offload/plugins-nextgen/common/include/RPC.h
+++ b/offload/plugins-nextgen/common/include/RPC.h
@@ -80,7 +80,7 @@ struct RPCServerTy {
std::thread Worker;
/// A boolean indicating whether or not the worker thread should continue.
- std::atomic<bool> Running;
+ std::atomic<uint32_t> Running;
/// The number of currently executing kernels across all devices that need
/// the server thread to be running.
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index 76ae0a2dd9c4523..48c9b671c1a91d3 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -1058,9 +1058,8 @@ Error GenericDeviceTy::setupRPCServer(GenericPluginTy &Plugin,
if (auto Err = Server.initDevice(*this, Plugin.getGlobalHandler(), Image))
return Err;
- if (!Server.Thread->Running.load(std::memory_order_acquire))
- if (auto Err = Server.startThread())
- return Err;
+ if (auto Err = Server.startThread())
+ return Err;
RPCServer = &Server;
DP("Running an RPC server on device %d\n", getDeviceId());
@@ -1635,12 +1634,11 @@ Error GenericPluginTy::deinit() {
if (GlobalHandler)
delete GlobalHandler;
- if (RPCServer && RPCServer->Thread->Running.load(std::memory_order_acquire))
+ if (RPCServer) {
if (Error Err = RPCServer->shutDown())
return Err;
-
- if (RPCServer)
delete RPCServer;
+ }
if (RecordReplay)
delete RecordReplay;
diff --git a/offload/plugins-nextgen/common/src/RPC.cpp b/offload/plugins-nextgen/common/src/RPC.cpp
index e6750a540b391d7..4289f920c0e1eb2 100644
--- a/offload/plugins-nextgen/common/src/RPC.cpp
+++ b/offload/plugins-nextgen/common/src/RPC.cpp
@@ -99,18 +99,15 @@ static rpc::Status runServer(plugin::GenericDeviceTy &Device, void *Buffer) {
}
void RPCServerTy::ServerThread::startThread() {
- assert(!Running.load(std::memory_order_relaxed) &&
- "Attempting to start thread that is already running");
- Running.store(true, std::memory_order_release);
- Worker = std::thread([this]() { run(); });
+ if (!Running.fetch_or(true, std::memory_order_acquire))
+ Worker = std::thread([this]() { run(); });
}
void RPCServerTy::ServerThread::shutDown() {
- assert(Running.load(std::memory_order_relaxed) &&
- "Attempting to shut down a thread that is not running");
+ if (!Running.fetch_and(false, std::memory_order_release))
+ return;
{
std::lock_guard<decltype(Mutex)> Lock(Mutex);
- Running.store(false, std::memory_order_release);
CV.notify_all();
}
if (Worker.joinable())
I think we still want to have one for each. There isn’t much to gain from sharing a single instance. To support both AMD and NVIDIA cards on the same system, an offloading program could use both cards simultaneously, with each relying on host RPC calls. While I wouldn’t say it’s entirely impossible, it’s 99.99% unlikely. One potential benefit of keeping them private to each plugin is the ability to customize the handling of certain RPC calls if needed. |
Good fix. We might want a different strategy for host threads later but staying with one per device looks good for now.
LGTM.