[SYCL][CUDA] Don't restore CUDA contexts

npmiller · npmiller · commit db65383e73ab · 2021-08-31T10:57:37.000+01:00
This patch fixes #4171.

The issue highlighted in that ticket is that CUDA contexts are bound to threads, while PI calls are executed both in the main thread and in the threads of the thread pool. To ensure the correct context is active, the CUDA plugin uses the `ScopedContext` RAII struct, which sets the active context to the PI context and restores the previously active context at the end of the PI call.

However, as an optimization, `ScopedContext` skips the restore if there was no active context on the thread originally, which leaves the PI context active on that thread. In addition, deleting a CUDA context only deactivates it on the thread where it is deleted; it stays active on other threads after being deleted.

So if you start from an application with no active CUDA context, create a SYCL queue, run an operation and then delete the SYCL queue, the context on the current thread is created, set active, deactivated and deleted properly. It is not deactivated in the threads of the thread pool, however. If we then create another queue and run SYCL operations on the thread pool again, the second queue sets up its own context in those threads but then tries to restore the deleted context from the previous queue.

This patch fixes that by simply never restoring the previous active context, so PI calls from the second queue running in the thread pool just override the deleted context and do not try to restore it.

This should work well in SYCL-only code, as all the PI calls are guarded by `ScopedContext` and change the active context as needed. It may even improve performance in certain multi-context scenarios: the current implementation only really prevents context switches for the first context used, whereas this change prevents them for the most recently used context.

In CUDA interop scenarios, however, it does mean that after running any SYCL code, CUDA interop code cannot make assumptions about the active context and needs to reset it to whatever context it needs. As far as I'm aware, this is already the current practice in `oneMKL` and `oneDNN`, which also use a `ScopedContext` mechanism.

In summary, this patch should:

* Fix trying to restore deleted contexts in internal SYCL threads
* Possibly improve performance in certain multi-context scenarios
* Break assumptions on the active context for CUDA interop code
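
The failure sequence described in this message can be replayed without a GPU. The sketch below is only a model, not plugin code: `FakeContext`, `FakeThread`, `live`, `setCurrent`, `oldScopedCall`, `newScopedCall` and `replay` are hypothetical names, and the mock `setCurrent` assumes that activating a destroyed context fails, which is how the restore in the old destructor went wrong.

```cpp
#include <set>

using FakeContext = int;        // hypothetical handle; 0 means "no context"

struct FakeThread {             // models the driver state of one pool thread
  FakeContext current = 0;
};

std::set<FakeContext> live;     // contexts that have not been destroyed yet

// Mock of cuCtxSetCurrent. Assumption: a destroyed context cannot be set.
bool setCurrent(FakeThread &t, FakeContext c) {
  if (c != 0 && live.count(c) == 0)
    return false;
  t.current = c;
  return true;
}

// Old behaviour: restore the previous context unless there was none.
bool oldScopedCall(FakeThread &t, FakeContext desired) {
  FakeContext original = t.current;
  bool needToRecover = false;
  if (original != desired) {
    if (!setCurrent(t, desired))
      return false;
    needToRecover = (original != 0);
  }
  // ... the PI call would run here ...
  return needToRecover ? setCurrent(t, original) : true;
}

// New behaviour: activate the desired context and never restore.
bool newScopedCall(FakeThread &t, FakeContext desired) {
  if (t.current != desired)
    return setCurrent(t, desired);
  return true;
}

// Replays: queue 1 runs on a pool thread, queue 1 is destroyed elsewhere,
// queue 2 runs on the same pool thread. Returns true if every call succeeded.
bool replay(bool (*scopedCall)(FakeThread &, FakeContext)) {
  live = {1};
  FakeThread pool;
  bool ok = scopedCall(pool, 1);  // context 1 is left active on the pool thread
  live.erase(1);                  // destroyed on another thread; pool.current is still 1
  live.insert(2);
  return ok && scopedCall(pool, 2);
}
```

In this model `replay(oldScopedCall)` fails on the second call because it tries to reactivate context 1, while `replay(newScopedCall)` succeeds because context 2 simply overrides the stale one.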
diff --git a/sycl/plugins/cuda/pi_cuda.cpp b/sycl/plugins/cuda/pi_cuda.cpp
@@ -137,19 +137,15 @@ pi_result check_error(CUresult result, const char *function, int line,
 /// \cond NODOXY
 #define PI_CHECK_ERROR(result) check_error(result, __func__, __LINE__, __FILE__)
 
-/// RAII type to guarantee recovering original CUDA context
 /// Scoped context is used across all PI CUDA plugin implementation
-/// to activate the PI Context on the current thread, matching the
-/// CUDA driver semantics where the context used for the CUDA Driver
-/// API is the one active on the thread.
-/// The implementation tries to avoid replacing the CUcontext if it cans
+/// to activate the PI Context on the current thread.
+/// The implementation tries to avoid replacing the CUcontext if it cans.
 class ScopedContext {
   pi_context placedContext_;
   CUcontext original_;
-  bool needToRecover_;
 
 public:
-  ScopedContext(pi_context ctxt) : placedContext_{ctxt}, needToRecover_{false} {
+  ScopedContext(pi_context ctxt) : placedContext_{ctxt} {
 
     if (!placedContext_) {
       throw PI_INVALID_CONTEXT;
@@ -160,23 +156,18 @@ class ScopedContext {
     if (original_ != desired) {
       // Sets the desired context as the active one for the thread
       PI_CHECK_ERROR(cuCtxSetCurrent(desired));
-      if (original_ == nullptr) {
-        // No context is installed on the current thread
-        // This is the most common case. We can activate the context in the
-        // thread and leave it there until all the PI context referring to the
-        // same underlying CUDA context are destroyed. This emulates
-        // the behaviour of the CUDA runtime api, and avoids costly context
-        // switches. No action is required on this side of the if.
-      } else {
-        needToRecover_ = true;
-      }
     }
   }
 
   ~ScopedContext() {
-    if (needToRecover_) {
-      PI_CHECK_ERROR(cuCtxSetCurrent(original_));
-    }
+    // Leave the context active, this avoids costly context switches for
+    // subsequent calls that use the same context.
+    //
+    // Calls using a different context will simply set their own context as
+    // active, so it will context switch as necessary.
+    //
+    // This does mean that interop tasks shouldn't make any assumptions about
+    // the state of the CUDA context after or in between calls to SYCL.
   }
 };
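
The destructor comment above tells interop code not to assume anything about the active context between SYCL calls. One way to follow that advice is to activate the required context at the start of every interop region. The sketch below is illustrative only: `MockDriver`, `InteropContextGuard` and `interopThenSyclThenInterop` are hypothetical names, and in real interop code `getCurrent` and `setCurrent` would presumably forward to `cuCtxGetCurrent` and `cuCtxSetCurrent`.

```cpp
// Hypothetical driver wrapper. It touches a plain member instead of the
// CUDA driver so the sketch runs without a GPU.
struct MockDriver {
  using Context = int;  // 0 means "no context"
  Context current = 0;
  int switches = 0;     // counts how often the active context was set

  Context getCurrent() const { return current; }
  void setCurrent(Context c) {
    current = c;
    ++switches;
  }
};

// Activates `wanted` on entry. In this sketch nothing is restored on exit,
// mirroring the behaviour ScopedContext has after this patch.
template <typename Driver>
class InteropContextGuard {
public:
  InteropContextGuard(Driver &drv, typename Driver::Context wanted) {
    if (drv.getCurrent() != wanted)
      drv.setCurrent(wanted);
  }
};

// Simulates interop work in context 7, a SYCL call that leaves context 3
// active, and then more interop work that needs context 7 again.
int interopThenSyclThenInterop(MockDriver &drv) {
  {
    InteropContextGuard<MockDriver> guard(drv, 7);
    // ... interop work ...
  }
  drv.setCurrent(3);  // stands in for a SYCL call leaving its context active
  {
    InteropContextGuard<MockDriver> guard(drv, 7);
    // ... interop work ...
  }
  return drv.current;
}
```

Because each interop region sets the context it needs, the second region still runs in context 7 even though the simulated SYCL call left context 3 active in between.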