Skip to content

Commit db65383

Browse files
committed
[SYCL][CUDA] Don't restore CUDA contexts
This patch fixes #4171. The issue highlighted in that ticket is that CUDA contexts are bound to threads and PI calls are executed both in the main thread and in threads of the thread pool. And to ensure the active contexts are correct the CUDA plugin uses `ScopedContext` RAII struct to set the active context to the PI context and restore the previous active context at the end of the PI call. However for optimization purposes `ScopedContext` skips the context recovery if there was no active context on the thread originally, which means it leaves the PI context active on the thread. In addition deleting a CUDA context only deactivates it on the thread where it is deleted, it will stay active in other threads after being deleted. Which means that if you start from an application with no CUDA context active, create a SYCL queue, run an operation then delete the SYCL queue, the context on the current thread will be created, set active, deactivated and deleted properly. However it won't be deactivated in the threads of the thread pools, which means that if we create another queue and run SYCL operations on the thread pool again, that second queue will setup its own context in the threads but then try to restore the deleted context from the previous queue. This patch aims to fix that issue by simply never restoring previous active context, which means that PI calls from the second queue running in the thread pool would just override the deleted context and not try to restore it. This should work well in SYCL only code as all the PI calls are guarded by the `ScopedContext` and will change the active context accordingly, in fact it may even provide performance improvement in certain multi-context scenarios, because the current implementation would only really prevent context switches for the first context used, this will prevent context switches for the latest context used instead. In CUDA interop scenarios, however it does mean that after running any SYCL code CUDA interop code cannot make assumptions about the active context and needs to reset it to whatever context it needs. But as far as I'm aware, this is already the current practice in `oneMKL` and `oneDNN`, where they also use a `ScopedContext` mechanism. In summary this patch should: * Fix trying to restore deleted contexts in internal SYCL threads * May improve performance in certain multi-context scenarios * Break assumptions on the active context for CUDA interop code
1 parent a418e1c commit db65383

File tree

1 file changed

+11
-20
lines changed

1 file changed

+11
-20
lines changed

sycl/plugins/cuda/pi_cuda.cpp

Lines changed: 11 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -137,19 +137,15 @@ pi_result check_error(CUresult result, const char *function, int line,
137137
/// \cond NODOXY
138138
#define PI_CHECK_ERROR(result) check_error(result, __func__, __LINE__, __FILE__)
139139

140-
/// RAII type to guarantee recovering original CUDA context
141140
/// Scoped context is used across all PI CUDA plugin implementation
142-
/// to activate the PI Context on the current thread, matching the
143-
/// CUDA driver semantics where the context used for the CUDA Driver
144-
/// API is the one active on the thread.
145-
/// The implementation tries to avoid replacing the CUcontext if it cans
141+
/// to activate the PI Context on the current thread.
142+
/// The implementation tries to avoid replacing the CUcontext if it cans.
146143
class ScopedContext {
147144
pi_context placedContext_;
148145
CUcontext original_;
149-
bool needToRecover_;
150146

151147
public:
152-
ScopedContext(pi_context ctxt) : placedContext_{ctxt}, needToRecover_{false} {
148+
ScopedContext(pi_context ctxt) : placedContext_{ctxt} {
153149

154150
if (!placedContext_) {
155151
throw PI_INVALID_CONTEXT;
@@ -160,23 +156,18 @@ class ScopedContext {
160156
if (original_ != desired) {
161157
// Sets the desired context as the active one for the thread
162158
PI_CHECK_ERROR(cuCtxSetCurrent(desired));
163-
if (original_ == nullptr) {
164-
// No context is installed on the current thread
165-
// This is the most common case. We can activate the context in the
166-
// thread and leave it there until all the PI context referring to the
167-
// same underlying CUDA context are destroyed. This emulates
168-
// the behaviour of the CUDA runtime api, and avoids costly context
169-
// switches. No action is required on this side of the if.
170-
} else {
171-
needToRecover_ = true;
172-
}
173159
}
174160
}
175161

176162
~ScopedContext() {
177-
if (needToRecover_) {
178-
PI_CHECK_ERROR(cuCtxSetCurrent(original_));
179-
}
163+
// Leave the context active, this avoids costly context switches for
164+
// subsequent calls that use the same context.
165+
//
166+
// Calls using a different context will simply set their own context as
167+
// active, so it will context switch as necessary.
168+
//
169+
// This does mean that interop tasks shouldn't make any assumptions about
170+
// the state of the CUDA context after or in between calls to SYCL.
180171
}
181172
};
182173

0 commit comments

Comments
 (0)