Skip to content

[Offload] Make only a single thread handle the RPC server thread #126067

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 6, 2025

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Feb 6, 2025

Summary:
This patch just changes the interface to make starting the thread
multiple times permissable since it will only be done the first time.
Note that this does not refcount it or anything, so it's onto the user
to make sure that they don't shut down the thread before everyone is
done using it. That is the case today because the shutDown portion is
run by a single thread in the destructor phase.

Another question is if we should make this thread truly global state,
because currently it will be private to each plugin instance, so if you
have an AMD and NVIDIA image there will be two, similarly if you have
those inside of a shared library.

Summary:
This patch just changes the interface to make starting the thread
multiple times permissable since it will only be done the first time.
Note that this does not refcount it or anything, so it's onto the user
to make sure that they don't shut down the thread before everyone is
done using it. That is the case today because the shutDown portion is
run by a single thread in the destructor phase.

Another question is if we should make this thread truly global state,
because currently it will be private to each plugin instance, so if you
have an AMD and NVIDIA image there will be two, similarly if you have
those inside of a shared library.
@llvmbot
Copy link
Member

llvmbot commented Feb 6, 2025

@llvm/pr-subscribers-offload

Author: Joseph Huber (jhuber6)

Changes

Summary:
This patch just changes the interface to make starting the thread
multiple times permissable since it will only be done the first time.
Note that this does not refcount it or anything, so it's onto the user
to make sure that they don't shut down the thread before everyone is
done using it. That is the case today because the shutDown portion is
run by a single thread in the destructor phase.

Another question is if we should make this thread truly global state,
because currently it will be private to each plugin instance, so if you
have an AMD and NVIDIA image there will be two, similarly if you have
those inside of a shared library.


Full diff: https://github.com/llvm/llvm-project/pull/126067.diff

3 Files Affected:

  • (modified) offload/plugins-nextgen/common/include/RPC.h (+1-1)
  • (modified) offload/plugins-nextgen/common/src/PluginInterface.cpp (+4-6)
  • (modified) offload/plugins-nextgen/common/src/RPC.cpp (+4-7)
diff --git a/offload/plugins-nextgen/common/include/RPC.h b/offload/plugins-nextgen/common/include/RPC.h
index d750ce30e74b05e..7b031083647aafd 100644
--- a/offload/plugins-nextgen/common/include/RPC.h
+++ b/offload/plugins-nextgen/common/include/RPC.h
@@ -80,7 +80,7 @@ struct RPCServerTy {
     std::thread Worker;
 
     /// A boolean indicating whether or not the worker thread should continue.
-    std::atomic<bool> Running;
+    std::atomic<uint32_t> Running;
 
     /// The number of currently executing kernels across all devices that need
     /// the server thread to be running.
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index 76ae0a2dd9c4523..48c9b671c1a91d3 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -1058,9 +1058,8 @@ Error GenericDeviceTy::setupRPCServer(GenericPluginTy &Plugin,
   if (auto Err = Server.initDevice(*this, Plugin.getGlobalHandler(), Image))
     return Err;
 
-  if (!Server.Thread->Running.load(std::memory_order_acquire))
-    if (auto Err = Server.startThread())
-      return Err;
+  if (auto Err = Server.startThread())
+    return Err;
 
   RPCServer = &Server;
   DP("Running an RPC server on device %d\n", getDeviceId());
@@ -1635,12 +1634,11 @@ Error GenericPluginTy::deinit() {
   if (GlobalHandler)
     delete GlobalHandler;
 
-  if (RPCServer && RPCServer->Thread->Running.load(std::memory_order_acquire))
+  if (RPCServer) {
     if (Error Err = RPCServer->shutDown())
       return Err;
-
-  if (RPCServer)
     delete RPCServer;
+  }
 
   if (RecordReplay)
     delete RecordReplay;
diff --git a/offload/plugins-nextgen/common/src/RPC.cpp b/offload/plugins-nextgen/common/src/RPC.cpp
index e6750a540b391d7..4289f920c0e1eb2 100644
--- a/offload/plugins-nextgen/common/src/RPC.cpp
+++ b/offload/plugins-nextgen/common/src/RPC.cpp
@@ -99,18 +99,15 @@ static rpc::Status runServer(plugin::GenericDeviceTy &Device, void *Buffer) {
 }
 
 void RPCServerTy::ServerThread::startThread() {
-  assert(!Running.load(std::memory_order_relaxed) &&
-         "Attempting to start thread that is already running");
-  Running.store(true, std::memory_order_release);
-  Worker = std::thread([this]() { run(); });
+  if (!Running.fetch_or(true, std::memory_order_acquire))
+    Worker = std::thread([this]() { run(); });
 }
 
 void RPCServerTy::ServerThread::shutDown() {
-  assert(Running.load(std::memory_order_relaxed) &&
-         "Attempting to shut down a thread that is not running");
+  if (!Running.fetch_and(false, std::memory_order_release))
+    return;
   {
     std::lock_guard<decltype(Mutex)> Lock(Mutex);
-    Running.store(false, std::memory_order_release);
     CV.notify_all();
   }
   if (Worker.joinable())

@shiltian
Copy link
Contributor

shiltian commented Feb 6, 2025

Another question is if we should make this thread truly global state,
because currently it will be private to each plugin instance, so if you
have an AMD and NVIDIA image there will be two, similarly if you have
those inside of a shared library.

I think we still want to have one for each. There isn’t much to gain from sharing a single instance. To support both AMD and NVIDIA cards on the same system, an offloading program could use both cards simultaneously, with each relying on host RPC calls. While I wouldn’t say it’s entirely impossible, it’s 99.99% unlikely. One potential benefit of keeping them private to each plugin is the ability to customize the handling of certain RPC calls if needed.

Copy link
Collaborator

@JonChesterfield JonChesterfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix. We might want a different strategy for host threads later but staying with one per device looks good for now.

Copy link
Contributor

@dhruvachak dhruvachak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jhuber6 jhuber6 merged commit 5812d0b into llvm:main Feb 6, 2025
8 checks passed
@jhuber6 jhuber6 deleted the Thread branch February 6, 2025 17:38
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
…m#126067)

Summary:
This patch just changes the interface to make starting the thread
multiple times permissable since it will only be done the first time.
Note that this does not refcount it or anything, so it's onto the user
to make sure that they don't shut down the thread before everyone is
done using it. That is the case today because the shutDown portion is
run by a single thread in the destructor phase.

Another question is if we should make this thread truly global state,
because currently it will be private to each plugin instance, so if you
have an AMD and NVIDIA image there will be two, similarly if you have
those inside of a shared library.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants