Skip to content

Commit 0f88be7

Browse files
authored
[Libomptarget] Fix Nvidia offloading hanging on dataRetrieve using RPC (llvm#66817)
Summary: The RPC server is responsible for providing host services from the GPU. Generally, the client running on the GPU will spin in place until the host checks the server. Inside the runtime, we elected to have the user thread do this checking while it would be otherwise waiting for the kernel to finish. However, for Nvidia this caused problems when offloading to a target region that requires a copy back. This is caused by the implementation of `dataRetrieve` on Nvidia. We initialize an asynchronous copy-back on the same stream that the kernel is running on. This creates an implicit sync on the kernel to finish before we issue the D2H copy, which we then wait on. This implicit sync happens inside of the CUDA runtime. This is problematic when running the RPC server because we need someone to check the RPC server. If no one checks the RPC server then the kernel will never finish, meaning that the memcpy will never be issued and the program hangs. This patch adds an explicit check for unfinished work on the stream and waits for it to complete.
1 parent 79b9d41 commit 0f88be7

File tree

2 files changed

+33
-0
lines changed

2 files changed

+33
-0
lines changed

openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -576,6 +576,17 @@ struct CUDADeviceTy : public GenericDeviceTy {
576576
if (auto Err = getStream(AsyncInfoWrapper, Stream))
577577
return Err;
578578

579+
// If there is already pending work on the stream it could be waiting for
580+
// someone to check the RPC server.
581+
if (auto RPCServer = getRPCServer()) {
582+
CUresult Res = cuStreamQuery(Stream);
583+
while (Res == CUDA_ERROR_NOT_READY) {
584+
if (auto Err = RPCServer->runServer(*this))
585+
return Err;
586+
Res = cuStreamQuery(Stream);
587+
}
588+
}
589+
579590
CUresult Res = cuMemcpyDtoHAsync(HstPtr, (CUdeviceptr)TgtPtr, Size, Stream);
580591
return Plugin::check(Res, "Error in cuMemcpyDtoHAsync: %s");
581592
}
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
// RUN: %libomptarget-compile-run-and-check-generic
2+
3+
// REQUIRES: libc
4+
5+
#include <assert.h>
6+
#include <stdio.h>
7+
8+
#pragma omp declare target to(stdout)
9+
10+
int main() {
11+
int r = 0;
12+
// CHECK: PASS
13+
#pragma omp target map(from : r)
14+
{ r = fwrite("PASS\n", 1, sizeof("PASS\n") - 1, stdout); }
15+
assert(r == sizeof("PASS\n") - 1 && "Incorrect number of bytes written");
16+
17+
// CHECK: PASS
18+
#pragma omp target map(from : r) nowait
19+
{ r = fwrite("PASS\n", 1, 5, stdout); }
20+
#pragma omp taskwait
21+
assert(r == sizeof("PASS\n") - 1 && "Incorrect number of bytes written");
22+
}

0 commit comments

Comments
 (0)