[OpenMP] Add support for custom callback in AMDGPUStream #112785

jhuber6 · 2024-10-17T22:10:47Z

Summary:
We have the ability to schedule callbacks after certain events complete.
Currently we can register an arbitrary callback in CUDA, but can't in
AMDGPU. I am planning on using this support to move the RPC handling to
a separate thread, then using these callbacks to suspend / resume it
when no kernels are running. This is a preliminary patch to keep this
noise out of that one.
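
A minimal sketch of how such a callback might be used to suspend and resume a server thread, assuming a simple running-kernel counter. `RPCServerStateTy`, its members, and `kernelFinishedCallback` are hypothetical names for illustration, not code from this patch or the follow-up, and `int` (0 meaning success) stands in for `llvm::Error`.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical bookkeeping for a server thread; not a type from rtl.cpp.
struct RPCServerStateTy {
  std::mutex Mutex;
  std::condition_variable CV;
  int RunningKernels = 0;

  // Called on kernel launch: record the kernel and wake the server thread.
  void kernelStarted() {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      ++RunningKernels;
    }
    CV.notify_all();
  }

  // Called from the completion callback: at zero the server may sleep again.
  void kernelFinished() {
    std::lock_guard<std::mutex> Lock(Mutex);
    assert(RunningKernels > 0 && "finished more kernels than were started");
    --RunningKernels;
  }

  // The server thread would block here while no kernels are running.
  void waitUntilActive() {
    std::unique_lock<std::mutex> Lock(Mutex);
    CV.wait(Lock, [this] { return RunningKernels > 0; });
  }

  bool isActive() {
    std::lock_guard<std::mutex> Lock(Mutex);
    return RunningKernels > 0;
  }
};

// Same shape as a callback taking `void *Data`; returns 0 on success.
int kernelFinishedCallback(void *Data) {
  static_cast<RPCServerStateTy *>(Data)->kernelFinished();
  return 0;
}
```

Under these assumptions, the launch path would call `kernelStarted()` and register `kernelFinishedCallback` with a pointer to the state as `Data`, so the counter drops once the kernel's signal completes.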

llvmbot · 2024-10-17T22:11:21Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-offload

Author: Joseph Huber (jhuber6)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/112785.diff

1 Files Affected:

(modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+42-25)

diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index f0cc0c2e4d08e5..b6ad149e728caa 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -927,6 +927,8 @@ struct AMDGPUStreamTy {
     AMDGPUSignalManagerTy *SignalManager;
   };
 
+  using AMDGPUStreamCallbackTy = Error(void *Data);
+
   /// The stream is composed of N stream's slots. The struct below represents
   /// the fields of each slot. Each slot has a signal and an optional action
   /// function. When appending an HSA asynchronous operation to the stream, one
@@ -942,65 +944,80 @@ struct AMDGPUStreamTy {
     /// operation as input signal.
     AMDGPUSignalTy *Signal;
 
-    /// The action that must be performed after the operation's completion. Set
+    /// The actions that must be performed after the operation's completion. Set
     /// to nullptr when there is no action to perform.
-    Error (*ActionFunction)(void *);
+    llvm::SmallVector<AMDGPUStreamCallbackTy *> Callbacks;
 
     /// Space for the action's arguments. A pointer to these arguments is passed
     /// to the action function. Notice the space of arguments is limited.
-    union {
+    union ActionArgsTy {
       MemcpyArgsTy MemcpyArgs;
       ReleaseBufferArgsTy ReleaseBufferArgs;
       ReleaseSignalArgsTy ReleaseSignalArgs;
-    } ActionArgs;
+      void *CallbackArgs;
+    };
+
+    llvm::SmallVector<ActionArgsTy> ActionArgs;
 
     /// Create an empty slot.
-    StreamSlotTy() : Signal(nullptr), ActionFunction(nullptr) {}
+    StreamSlotTy() : Signal(nullptr), Callbacks({}), ActionArgs({}) {}
 
     /// Schedule a host memory copy action on the slot.
     Error schedHostMemoryCopy(void *Dst, const void *Src, size_t Size) {
-      ActionFunction = memcpyAction;
-      ActionArgs.MemcpyArgs = MemcpyArgsTy{Dst, Src, Size};
+      Callbacks.emplace_back(memcpyAction);
+      ActionArgs.emplace_back().MemcpyArgs = MemcpyArgsTy{Dst, Src, Size};
       return Plugin::success();
     }
 
     /// Schedule a release buffer action on the slot.
     Error schedReleaseBuffer(void *Buffer, AMDGPUMemoryManagerTy &Manager) {
-      ActionFunction = releaseBufferAction;
-      ActionArgs.ReleaseBufferArgs = ReleaseBufferArgsTy{Buffer, &Manager};
+      Callbacks.emplace_back(releaseBufferAction);
+      ActionArgs.emplace_back().ReleaseBufferArgs =
+          ReleaseBufferArgsTy{Buffer, &Manager};
       return Plugin::success();
     }
 
     /// Schedule a signal release action on the slot.
     Error schedReleaseSignal(AMDGPUSignalTy *SignalToRelease,
                              AMDGPUSignalManagerTy *SignalManager) {
-      ActionFunction = releaseSignalAction;
-      ActionArgs.ReleaseSignalArgs =
+      Callbacks.emplace_back(releaseSignalAction);
+      ActionArgs.emplace_back().ReleaseSignalArgs =
           ReleaseSignalArgsTy{SignalToRelease, SignalManager};
       return Plugin::success();
     }
 
+    /// Register a callback to be called on completion.
+    Error schedCallback(AMDGPUStreamCallbackTy *Func, void *Data) {
+      Callbacks.emplace_back(Func);
+      ActionArgs.emplace_back().CallbackArgs = Data;
+
+      return Plugin::success();
+    }
+
     // Perform the action if needed.
     Error performAction() {
-      if (!ActionFunction)
+      if (Callbacks.empty())
         return Plugin::success();
 
-      // Perform the action.
-      if (ActionFunction == memcpyAction) {
-        if (auto Err = memcpyAction(&ActionArgs))
-          return Err;
-      } else if (ActionFunction == releaseBufferAction) {
-        if (auto Err = releaseBufferAction(&ActionArgs))
-          return Err;
-      } else if (ActionFunction == releaseSignalAction) {
-        if (auto Err = releaseSignalAction(&ActionArgs))
-          return Err;
-      } else {
-        return Plugin::error("Unknown action function!");
+      for (auto [Callback, ActionArg] : llvm::zip(Callbacks, ActionArgs)) {
+        // Perform the action.
+        if (Callback == memcpyAction) {
+          if (auto Err = memcpyAction(&ActionArg))
+            return Err;
+        } else if (Callback == releaseBufferAction) {
+          if (auto Err = releaseBufferAction(&ActionArg))
+            return Err;
+        } else if (Callback == releaseSignalAction) {
+          if (auto Err = releaseSignalAction(&ActionArg))
+            return Err;
+        } else {
+          if (auto Err = Callback(ActionArg.CallbackArgs))
+            return Err;
+        }
       }
 
       // Invalidate the action.
-      ActionFunction = nullptr;
+      Callbacks.clear();
 
       return Plugin::success();
     }
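
The slot now keeps two parallel lists, one of callbacks and one of arguments, and runs them in registration order. A simplified, self-contained sketch of that pattern follows. `SlotTy`, `performActions`, `appendOne`, and `appendTwo` are hypothetical names, `std::vector` stands in for `llvm::SmallVector`, and `int` (0 meaning success) stands in for `llvm::Error`. The hunk as shown clears only `Callbacks`. The sketch clears both lists, on the assumption that a reused slot should start empty so the two lists stay index-aligned. Unlike the hunk, it also clears on the error path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for AMDGPUStreamCallbackTy; returns 0 on success.
using CallbackTy = int(void *Data);

// Hypothetical slot holding parallel lists of callbacks and their arguments.
struct SlotTy {
  std::vector<CallbackTy *> Callbacks;
  std::vector<void *> Args;

  // Register a callback and its argument at the same index.
  void schedCallback(CallbackTy *Func, void *Data) {
    Callbacks.push_back(Func);
    Args.push_back(Data);
  }

  // Run callbacks in registration order, stopping at the first error.
  int performActions() {
    assert(Callbacks.size() == Args.size() && "lists must stay aligned");
    int Result = 0;
    for (std::size_t I = 0; I < Callbacks.size(); ++I) {
      if (int Err = Callbacks[I](Args[I])) {
        Result = Err;
        break;
      }
    }
    // Reset both lists so the next use of the slot starts empty.
    Callbacks.clear();
    Args.clear();
    return Result;
  }
};

// Example callbacks that record the order in which they ran.
int appendOne(void *Data) {
  static_cast<std::vector<int> *>(Data)->push_back(1);
  return 0;
}

int appendTwo(void *Data) {
  static_cast<std::vector<int> *>(Data)->push_back(2);
  return 0;
}
```

Scheduling `appendOne` and then `appendTwo` against the same log vector and calling `performActions()` leaves the log holding 1 followed by 2, and both lists empty.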

shiltian · 2024-10-17T22:58:38Z

Can you stack the PRs such that we can have a clear idea of what you are trying to do?

jhuber6 · 2024-10-17T23:20:54Z

> Can you stack the PRs such that we can have a clear idea of what you are trying to do?

Stacking PRs in GH is a pain, and I haven't finished the other part yet. I thought that this was straightforward enough to show that it doesn't break anything with the promise that it will have a user later.

jhuber6 · 2024-10-29T17:17:17Z

@ronlieb @jplehr @dhruvachak AMD version https://gist.github.com/jhuber6/09718c6834071957a790e95ee37000b9. I was trying to port it to use the proper interface, but the type you crammed into OMPT is non-trivial due to std::unique_ptr so I just worked around it being a weird separate thing.
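
One reading of the `std::unique_ptr` remark, sketched with hypothetical types (`PlainArgsTy`, `OwningArgsTy`, and both unions are illustrative, not the downstream OMPT type): a union whose members are all trivial stays trivially copyable and destructible, while adding a member that owns a `std::unique_ptr` deletes the union's implicit destructor and copy constructor.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Plain argument pack, similar in spirit to the memcpy arguments.
struct PlainArgsTy {
  void *Dst;
  const void *Src;
  std::size_t Size;
};

// Hypothetical argument pack that owns heap state.
struct OwningArgsTy {
  std::unique_ptr<int> Payload;
};

// All members trivial: the union can be copied and destroyed freely.
union TrivialUnionTy {
  PlainArgsTy Plain;
  void *CallbackArgs;
};

// One non-trivial member: the implicit destructor and copy are deleted.
union OwningUnionTy {
  PlainArgsTy Plain;
  OwningArgsTy Owning;
};

static_assert(std::is_trivially_copyable_v<TrivialUnionTy>);
static_assert(std::is_trivially_destructible_v<TrivialUnionTy>);
static_assert(!std::is_destructible_v<OwningUnionTy>);
static_assert(!std::is_copy_constructible_v<OwningUnionTy>);
```

A union like `OwningUnionTy` would need hand-written special members before it could be stored in a container, which is one reason to keep such a type outside the argument union.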

jhuber6 requested review from jdoerfert, jplehr, saiislam and shiltian October 17, 2024 22:10

llvmbot added backend:AMDGPU offload labels Oct 17, 2024

jhuber6 force-pushed the callback branch from ba7c850 to 4769b40 Compare October 17, 2024 22:29

jhuber6 force-pushed the callback branch from 4769b40 to fd0e8d1 Compare October 18, 2024 15:27

jhuber6 mentioned this pull request Oct 19, 2024

[Offload] Move RPC server handling to a dedicated thread #112988

Merged

jhuber6 force-pushed the callback branch from fd0e8d1 to 4954d14 Compare October 19, 2024 15:11

shiltian approved these changes Oct 28, 2024

jhuber6 added 2 commits October 28, 2024 20:47

Comments

c0cc5a9

jhuber6 force-pushed the callback branch from 4954d14 to c0cc5a9 Compare October 29, 2024 13:09

jhuber6 merged commit d661aea into llvm:main Oct 29, 2024
6 checks passed
