
[SYCL] Fix crash and deadlock in level0 plugin when using multiple threads #2550


Conversation


@alexanderfle alexanderfle commented Sep 28, 2020

This patch fixes sporadic thread-safety bugs on Windows when the Level Zero plugin is used:

  • Sporadic crash in piEventsWait():
    one thread executes EventList[I]->ZeCommandList = nullptr;
    while another thread has locked ZeCommandListFenceMapMutex and, by the time it uses EventList[I]->ZeCommandList, the pointer is already nullptr.
    -- Now the thread checks EventList[I]->ZeCommandList only after locking ZeCommandListFenceMapMutex.
  • Sporadic deadlock in getAvailableCommandList():
    the mutexes were locked in the wrong order in this function, which led to a sporadic hang.
    -- Now the locking order matches the rest of the code: ZeCommandListFenceMapMutex is locked first, then ZeCommandListCacheMutex.
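
A minimal sketch of the two patterns, using simplified stand-in types (the member and mutex names come from the description above; everything else is reduced for illustration and is not the actual plugin code):

```cpp
#include <mutex>

struct _pi_queue; // details omitted in this sketch

struct _pi_event {
  _pi_queue *Queue = nullptr;
  void *ZeCommandList = nullptr; // stands in for ze_command_list_handle_t
};

struct _pi_queue {
  std::mutex ZeCommandListFenceMapMutex;
  std::mutex ZeCommandListCacheMutex;
  // ... fence map and command-list cache omitted ...
};

// Fix 1: acquire the lock first, then test ZeCommandList, so another thread
// cannot reset the pointer between the check and its use.
void waitAndResetSketch(_pi_event *Event) {
  std::lock_guard<std::mutex> Lock(Event->Queue->ZeCommandListFenceMapMutex);
  if (Event->ZeCommandList) {
    // ... look up the fence for this command list and reset/recycle it ...
    Event->ZeCommandList = nullptr;
  }
}

// Fix 2: always take the two mutexes in the same order
// (ZeCommandListFenceMapMutex, then ZeCommandListCacheMutex) so two threads
// entering getAvailableCommandList() cannot deadlock on each other.
void getAvailableCommandListSketch(_pi_queue *Queue) {
  std::lock_guard<std::mutex> FenceLock(Queue->ZeCommandListFenceMapMutex);
  std::lock_guard<std::mutex> CacheLock(Queue->ZeCommandListCacheMutex);
  // ... take a command list from the cache or create a new one ...
}
```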

Signed-off-by: Alexander Flegontov [email protected]

@alexanderfle

My observation about the sporadic crash in piEventsWait():
While debugging, I see one thread setting EventList[I]->ZeCommandList to nullptr while another thread has already passed the check if (EventList[I]->ZeCommandList). When the context switches back to that second thread, it continues as if EventList[I]->ZeCommandList were still non-null and indexes EventList[I]->Queue->ZeCommandListFenceMap with a null pointer.
I printed the pointers in both threads (before one of them sets the pointer to nullptr) and they are the same, which is what leads to the crash.
I guess ZeCommandLists are reused across threads, but it looks like they start to be reused before the other threads have released them.
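
A hypothetical stand-alone reproduction of that interleaving (not plugin code; the names are borrowed from the description, and a thread sanitizer would flag the unsynchronized check-then-use):

```cpp
#include <mutex>
#include <thread>

struct Event {
  std::mutex FenceMapMutex;
  int *ZeCommandList = nullptr; // stands in for the real command-list handle
};

int main() {
  int Dummy = 0;
  Event E;
  E.ZeCommandList = &Dummy;

  // Thread A: the racy pattern -- the pointer is checked before any lock is
  // taken, as in the original piEventsWait.
  std::thread A([&] {
    if (E.ZeCommandList) {      // check without holding FenceMapMutex
      int V = *E.ZeCommandList; // may execute after B has nulled the pointer
      (void)V;
    }
  });

  // Thread B: recycles the command list and clears the pointer under the lock.
  std::thread B([&] {
    std::lock_guard<std::mutex> L(E.FenceMapMutex);
    E.ZeCommandList = nullptr;
  });

  A.join();
  B.join();
  return 0;
}
```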

@smaslov-intel

a thread is setting EventList[I]->ZeCommandList to nullptr, while another thread has already done the check if (EventList[I]->ZeCommandList)

But this means that the same PI event was simultaneously passed here from different threads, right? The ZeCommandList belongs to a PI event and is not shared between different events. The plugin only guards (with mutexes) access to data that is actually shared, not this. It seems to me that this is a bug in the SYCL program (or SYCL RT) and synchronization is needed there.

If I am not right here, then we have a much bigger problem than just this (as I said, we do not guard all of the contents of PI objects against multi-threaded access unless the data is shared between PI objects).


@smaslov-intel smaslov-intel left a comment


Please separate out the NFC change of switching lock/unlock to lock_guard from the real fix. The "fix" in piEventsWait should also go in by itself (if it should go in at all).

@alexanderfle

But this means that the same PI event was simultaneously passed here from different threads, right?

Right, but I will check that.

It seems to me that this is a bug in the SYCL program (or SYCL RT) and synchronization is needed there.

Strange; if that bug were somewhere else, I don't know why the same application works with the OpenCL backend.

Please separate out the NFC change of switching lock/unlock to lock_guard from the real fix.

Do you suggest reverting the lock/unlock to lock_guard changes in the current PR and preparing a separate ticket for them, to make it clear what the real fix is?

@smaslov-intel

Do you suggest reverting the lock/unlock to lock_guard changes in the current PR and preparing a separate ticket for them, to make it clear what the real fix is?

Yes, please.

if that bug were somewhere else, I don't know why the same application works with the OpenCL backend.

Might be a timing issue, data races are sporadic in nature. Or maybe OpenCL wait for events is meant to be non-destructive, and can really be called by multiple threads on the same events. I can't see anything discussing this here: https://www.khronos.org/registry/OpenCL/sdk/2.2/docs/man/html/clWaitForEvents.html

@romanovvlad, what do you think?

@romanovvlad


Might be a timing issue, data races are sporadic in nature. Or maybe OpenCL wait for events is meant to be non-destructive, and can really be called by multiple threads on the same events. I can't see anything discussing this here: https://www.khronos.org/registry/OpenCL/sdk/2.2/docs/man/html/clWaitForEvents.html

@romanovvlad, what do you think?

The OpenCL 2.1 specification says:
All API calls except clSetKernelArg, clSetKernelArgSVMPointer, clSetKernelExecInfo and clCloneKernel are thread-safe.
So, it should be legal and safe to call any API (except those mentioned above) with the same or different arguments from multiple threads. Since the PI API is based on the OpenCL API, I would expect the rule to apply to the PI API as well.

@smaslov-intel

it should be legal and safe to call any API (except those mentioned above) with the same or different arguments from multiple threads.

@romanovvlad : but this effectively means that (almost) every PI API needs to lock for exclusive execution, since there is a lot of internal state in PI objects that changes during a PI call (and thus another PI call on the same handle should wait until the current one is finished). It is especially strange that this has not come up before; apparently applications aren't doing what you think is legal.

@kbobrovs : do you have an opinion here?

FWIW, Level-Zero API is "free-threaded" https://spec.oneapi.com/level-zero/latest/core/INTRO.html#multithreading-and-concurrency:

multiple, **simultaneous threads may operate on independent driver objects** with no implicit thread-locks


romanovvlad commented Sep 30, 2020

it should be legal and safe to call any API (except those mentioned above) with the same or different arguments from multiple threads.

@romanovvlad : but this effectively means that (almost) every PI API needs to lock for exclusive execution, since there is a lot of internal state in PI objects that changes during a PI call (and thus another PI call on the same handle should wait until the current one is finished).

Could you please elaborate on why that is so? Can't only write operations to the internal state be exclusive?
UPD. After reading the link you posted, I see that a more complicated mechanism is needed: one that makes sure only one API call operates on a given handle at a time. So the issue is not only with calling the same API with the same handle, but also with different APIs that take the same handle; for example, calling piWaitForEvent and piGetEventInfo with the same event can cause problems.

It is especially strange that this has not come up before; apparently applications aren't doing what you think is legal.

I believe the reason for this is that most applications are single-threaded.
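
One way to express the per-handle requirement described above, sketched here with made-up names rather than the actual PI entry points: give every PI object its own mutex and have each call that takes the handle hold it for the entire call, so that, for example, a wait and a get-info on the same event cannot interleave.

```cpp
#include <mutex>

struct _pi_event {
  std::mutex Mutex;       // one lock per PI object (illustrative only)
  bool Completed = false;
  // ... the rest of the event state ...
};

// Each entry point that receives the handle holds its lock for the whole
// call, so two different APIs touching the same event are serialized.
void eventWaitSketch(_pi_event *Event) {
  std::lock_guard<std::mutex> Lock(Event->Mutex);
  // ... wait on the underlying Level Zero fence/event and clean up state ...
  Event->Completed = true;
}

void eventGetInfoSketch(_pi_event *Event, bool *IsCompleted) {
  std::lock_guard<std::mutex> Lock(Event->Mutex);
  *IsCompleted = Event->Completed;
}
```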


kbobrovs commented Sep 30, 2020

@kbobrovs : do you have an opinion here?

Originally PI was conceived to match the OpenCL execution model, where most of the APIs are thread-safe. It looks like we should review at least the Level Zero plugin implementation with this in mind.

Could you please elaborate on why that is so? Can't only write operations to the internal state be exclusive?

Not sure if I correctly get the point here, but reads should be synchronized with writes, obviously.

val = ThreadA::read(X)
ThreadB::write(X)
ThreadA::use(val)

will likely cause problems.

@smaslov-intel

Could you please elaborate on why that is so? Can't only write operations to the internal state be exclusive?

This is because the entire contents of each PI object then become shared, since multiple PI APIs may access them simultaneously from different threads. We would then need to lock each PI call for the entire time it works (both reads and writes) with the related PI objects. That sounds like a big synchronization overhead, and I wouldn't jump to doing it until we are absolutely sure.

@smaslov-intel

@alexanderfle : Given the input about needing to be truly thread-safe, we need to rework how we lock. We should no longer lock individual pieces of shared data in the PI objects (the various mutexes in the PI queue in this case), but lock the entire PI object for exclusive access. Let's also add a control (an env var) that skips the locking entirely, so we can evaluate the performance impact of such locking and possibly use it for apps known to be single-threaded.
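
A sketch of what that control could look like; both the environment variable name and the helper class below are hypothetical, not the actual implementation:

```cpp
#include <cstdlib>
#include <mutex>

// Hypothetical control: read once at startup; setting it to 0 disables all
// PI-object locking so the cost of serialization can be measured.
static const bool SerializePiCalls = [] {
  const char *Env = std::getenv("SYCL_PI_LEVEL0_SERIALIZE"); // made-up name
  return Env == nullptr || std::atoi(Env) != 0;              // default: on
}();

// Scoped lock that becomes a no-op when locking is disabled.
class pi_scoped_lock {
  std::unique_lock<std::mutex> Lock;

public:
  explicit pi_scoped_lock(std::mutex &M) {
    if (SerializePiCalls)
      Lock = std::unique_lock<std::mutex>(M);
  }
};

// Usage inside a PI entry point (sketch):
//   pi_scoped_lock Guard(Queue->PiQueueMutex); // serializes the whole call,
//                                              // or does nothing if disabled
```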

kbenzie added a commit to kbenzie/intel-llvm that referenced this pull request Feb 17, 2025: Use PROJECT_SOURCE_DIR instead of CMAKE_SOURCE_DIR
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025: Use PROJECT_SOURCE_DIR instead of CMAKE_SOURCE_DIR