Commit 182e5ac

[libc] Check the RPC server once again after the kernel exits
We support asynchronous sends, which means the kernel can issue a send and then exit immediately, as it does with the `EXIT` syscall. Because the wait loop's condition only tests whether the kernel is still running, the kernel can exit and the loop can break before we check the server again. This can cause us to ignore an `EXIT` call from the GPU.

Reviewed By: JonChesterfield, lntue

Differential Revision: https://reviews.llvm.org/D150456
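The race described above can be reproduced in a small host-only model. This is a minimal sketch, not the loader's code: `FakeDevice` and `still_running` are hypothetical stand-ins for the RPC mailbox and the completion-signal poll, and only the names `handle_server` and `launch_kernel` mirror the patch. The model assumes the worst-case schedule, where the kernel posts `EXIT` and finishes before the first poll returns.

```cpp
#include <cassert>

// Hypothetical stand-in for the RPC mailbox and the kernel's completion
// signal. None of these fields come from the LLVM sources.
struct FakeDevice {
  bool running = true;       // kernel has not signalled completion yet
  bool exit_pending = false; // an EXIT send is waiting in the mailbox

  // Models one poll of the completion signal. In this worst-case schedule
  // the kernel posts EXIT asynchronously and finishes before the poll
  // returns, so the very first poll already reports "not running".
  bool still_running() {
    if (running) {
      exit_pending = true; // asynchronous send, nobody waits for a reply
      running = false;     // kernel exits right away
    }
    return running;
  }
};

// Models handle_server(): consume a pending EXIT if there is one and report
// whether anything was handled.
bool handle_server(FakeDevice &dev) {
  bool handled = dev.exit_pending;
  dev.exit_pending = false;
  return handled;
}

// Models the wait loop in launch_kernel(). `final_check` toggles the extra
// handle_server() call that this commit adds. Returns true if the host
// observed the EXIT call.
bool launch_kernel(bool final_check) {
  FakeDevice dev;
  bool saw_exit = false;
  while (dev.still_running())
    saw_exit = handle_server(dev) || saw_exit;
  // The loop only ends once the kernel has finished.
  assert(!dev.running);
  if (final_check)
    saw_exit = handle_server(dev) || saw_exit;
  return saw_exit;
}
```

In this model `launch_kernel(false)` returns `false` because the loop body never runs, so the pending `EXIT` is dropped, while `launch_kernel(true)` returns `true` because the extra call picks it up.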
1 parent 648d192 commit 182e5ac

2 files changed: +8 -0 lines changed

libc/utils/gpu/loader/amdgpu/Loader.cpp

Lines changed: 4 additions & 0 deletions
@@ -221,6 +221,10 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
              /*timeout_hint=*/1024, HSA_WAIT_STATE_ACTIVE) != 0)
     handle_server();
 
+  // Handle the server one more time in case the kernel exited with a pending
+  // send still in flight.
+  handle_server();
+
   // Destroy the resources acquired to launch the kernel and return.
   if (hsa_status_t err = hsa_amd_memory_pool_free(args))
     handle_error(err);

libc/utils/gpu/loader/nvptx/Loader.cpp

Lines changed: 4 additions & 0 deletions
@@ -186,6 +186,10 @@ CUresult launch_kernel(CUmodule binary, CUstream stream,
   while (cuStreamQuery(stream) == CUDA_ERROR_NOT_READY)
     handle_server();
 
+  // Handle the server one more time in case the kernel exited with a pending
+  // send still in flight.
+  handle_server();
+
   return CUDA_SUCCESS;
 }
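Both loaders end up with the same shape: serve while the kernel is running, then serve once more after it finishes. As an illustration only, that shape could be captured in one place. `wait_and_serve` below is a hypothetical helper that does not exist in the LLVM tree, and the callbacks stand in for the backend-specific completion check (`hsa_signal_wait_scacquire` or `cuStreamQuery`) and for `handle_server`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper, not part of the LLVM loaders. `kernel_done` stands in
// for the backend-specific completion check and `serve` for handle_server().
// Returns how many times `serve` was called.
int wait_and_serve(const std::function<bool()> &kernel_done,
                   const std::function<void()> &serve) {
  assert(kernel_done && serve); // both callbacks must be set
  int calls = 0;
  // Serve RPC requests for as long as the kernel is still running.
  while (!kernel_done()) {
    serve();
    ++calls;
  }
  // Serve once more: a send issued just before the kernel exited may not
  // have been seen by any iteration of the loop above.
  serve();
  ++calls;
  return calls;
}
```

With this structure the server is always checked at least once, even when the kernel has already finished by the time of the first completion check.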
