[SYCL] Clear event_impl dependencies with graph cleanup #4793
Conversation
Signed-off-by: mdimakov <[email protected]>
/summary:run
Looks good.
The main concern with this solution is the need to store all elements of the graph (events) in a container.
There might be another solution that employs DFS instead of BFS.
Under this approach, the deepest event is released first. A container is used to store the path to the current graph element. Before moving up in the graph, the dependency vectors of the current event should be cleared.
Using raw pointers in the path-storage container would also reduce the number of calls to the copy constructor of shared_ptr, which would make the performance impact negligible.
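The DFS idea above can be sketched with an explicit path stack of raw pointers. This is only an illustration of the traversal, not the actual patch: Node stands in for event_impl and Deps for MPreparedDepsEvents.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for event_impl: each node owns its
// dependencies through shared_ptr, so naive destruction recurses.
struct Node {
  std::vector<std::shared_ptr<Node>> Deps;
};

// Iterative DFS: dig down to the deepest node, then release it from
// its parent and back up. The path stack holds raw pointers, so no
// shared_ptr copy constructor is invoked while traversing, and the
// recursion depth is moved from the call stack to a heap vector.
inline void clearDepsIterative(Node *Root) {
  std::vector<Node *> Path{Root};
  while (!Path.empty()) {
    Node *Cur = Path.back();
    if (!Cur->Deps.empty()) {
      // Dig deeper before releasing anything above.
      Path.push_back(Cur->Deps.back().get());
    } else {
      // Deepest node reached: step up and drop the edge to it.
      // The dropped node has an empty Deps vector, so its
      // destructor cannot recurse.
      Path.pop_back();
      if (!Path.empty())
        Path.back()->Deps.pop_back();
    }
  }
}
```

A chain far deeper than the call stack could handle is then cleared without recursion.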
sycl/source/detail/event_impl.cpp
Outdated
// When d-tor of any event_impl except for Head is called
// the vectors of dependencies must be clean
for (auto &DepPtr : MPreparedDepsEvents) {
  assert(DepPtr && "Dependencies list is not clean");
The assert message doesn't make sense here.
It should be something like "Only valid dependencies are expected".
sycl/source/detail/event_impl.cpp
Outdated
}
}

Q.pop_back();
Could you please describe why the last element is being popped here prior to the reset sequence?
In the while (!Q.back().is_end()) loop one gets to the deepest element of the graph.
This element is the one without any descendants (dependencies). Hence, one should reset it, then step back up and continue digging into the graph on the next dependency item.
DepIter, which points into the parent of the deepest element (let it be D), returns a shared pointer to D. When we need to reset D, we must lift up to its parent and reset it there. I do this in the code below; after that I increment DepIter to continue exploring descendants.
sycl/source/detail/event_impl.cpp
Outdated
(*Q.back()).get()->MPreparedDepsEvents.clear();
(*Q.back()).get()->MPreparedHostDepsEvents.clear();
(*Q.back()).reset();
I'm not sure I fully understand the code, but can't we have a situation where we are clearing MPreparedDepsEvents of an event_impl which is still in use?
I believe that the two following implications are true:
- an event (an instance of event_impl) is destroyed once its command is complete
- a command can be complete if and only if all of its dependencies are complete

These implications make this behaviour of the destructor valid.
"an event (an instance of event_impl) is being destroyed once its command is complete"
An instance of event_impl is destroyed once there is no event which points to it, so a user can hold an event even after a Command is completed and destroyed.
Anyway, the code is probably still valid in the sense that we are not required to return already completed events in get_wait_list, so we can clear the wait list of a completed command (which is probably an even better option than the one I suggested below). But access to these event lists should be guarded by a mutex in all cases.
As an alternative approach we could try to update
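The mutex-guarding idea can be sketched as follows. EventDeps, getWaitList, addDep and cleanupDependencyEvents are illustrative names modeled on the patch, not the real event_impl API; the point is only that the user-facing read path and the cleanup path take the same lock.

```cpp
#include <memory>
#include <mutex>
#include <vector>

// Sketch: a wait list read from user threads (event::get_wait_list)
// and cleared by graph cleanup, so both paths hold the same mutex.
class EventDeps {
  std::mutex MMutex;
  std::vector<std::shared_ptr<EventDeps>> MPreparedDepsEvents;

public:
  // Returns a copy so the caller never observes a concurrent clear().
  std::vector<std::shared_ptr<EventDeps>> getWaitList() {
    std::lock_guard<std::mutex> Lock(MMutex);
    return MPreparedDepsEvents;
  }

  void addDep(std::shared_ptr<EventDeps> Dep) {
    std::lock_guard<std::mutex> Lock(MMutex);
    MPreparedDepsEvents.push_back(std::move(Dep));
  }

  // Called from graph cleanup once the command is complete; dropping
  // already completed events here is allowed, since get_wait_list is
  // not required to return them.
  void cleanupDependencyEvents() {
    std::lock_guard<std::mutex> Lock(MMutex);
    MPreparedDepsEvents.clear();
  }
};
```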
sycl/source/detail/event_impl.hpp
Outdated
@@ -164,14 +164,14 @@ class event_impl {
  /// Returns vector of event dependencies.
  ///
  /// @return a reference to MPreparedDepsEvents.
- std::vector<std::shared_ptr<event_impl>> &getPreparedDepsEvents() {
+ std::vector<std::weak_ptr<event_impl>> &getPreparedDepsEvents() {
Could you please tell who owns/holds event_impl with this patch?
I still think that the approach mentioned in this comment would be much simpler: #4793 (comment).
enum HostEventState : int { HES_NotComplete = 0, HES_Complete };

// State of host event. Employed only for host events and event with no
// backend's representation (e.g. alloca). Used values are listed in
// HostEventState enum.
std::atomic<int> MState;

std::mutex MMutex;
Why is this mutex needed? Is access to MPreparedDepsEvents and MPreparedHostDepsEvents not guarded by graph R/W lock?
Access to these dependencies is not guarded when the user wants to get them via event::get_wait_list.
@@ -129,6 +129,7 @@ static void handleVisitedNodes(std::vector<Command *> &Visited) {
  for (Command *Cmd : Visited) {
    if (Cmd->MMarks.MToBeDeleted) {
      Cmd->getEvent()->setCommand(nullptr);
+     Cmd->getEvent()->cleanupDependencyEvents();
This still seems to lead to the d-tor going through DFS and overflowing the stack, doesn't it?
The latest changes do not lead to stack overflow on the tests that were affected.
@@ -217,8 +217,7 @@ void event_impl::wait_and_throw(
Not related to this patch, but it seems this function is not thread-safe. We are accessing Cmd here, which AFAIK can be removed in graph cleanup.
I wrapped wait_and_throw in a mutex.
@@ -210,15 +210,17 @@ void event_impl::wait(

void event_impl::wait_and_throw(
    std::shared_ptr<cl::sycl::detail::event_impl> Self) {
  Scheduler &Sched = Scheduler::getInstance();
  Scheduler::ReadLockT Lock(Sched.MGraphLock);

  Command *Cmd = static_cast<Command *>(Self->getCommand());
  QueueImplPtr submittedQueue = nullptr;
  if (Cmd)
    submittedQueue = Cmd->getSubmittedQueue();

  wait(Self);
A call to wait should not take place with the graph lock taken.
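The locking discipline asked for here can be sketched like this, with FakeCmd and snapshotThenWait as hypothetical stand-ins for Command and wait_and_throw: the shared (read) lock is scoped so it is released before the blocking wait.

```cpp
#include <shared_mutex>

// Hypothetical stand-in for Command with its submitted-queue state.
struct FakeCmd { int SubmittedQueueId; };

// Snapshot Cmd state under the graph read lock, then release the
// lock before anything that may block. LockHeldDuringWait is set by
// probing the lock at the point where the real wait() would run.
inline int snapshotThenWait(std::shared_mutex &GraphLock, FakeCmd *C,
                            bool &LockHeldDuringWait) {
  int QueueId = -1;
  {
    std::shared_lock<std::shared_mutex> Lock(GraphLock);
    if (C)
      QueueId = C->SubmittedQueueId;
  } // read lock released here, before the blocking call below

  // Simulated wait(): prove the lock is free by trying to take it
  // exclusively, which would fail if we still held it shared.
  LockHeldDuringWait = !GraphLock.try_lock();
  if (!LockHeldDuringWait)
    GraphLock.unlock();
  return QueueId;
}
```

Holding the read lock across wait() would block graph cleanup (and any writer) for the full duration of the device work, so the snapshot-then-release scoping matters.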
Fixed it
Seems legit.
@maximdimakov, could you please add a test for this fix in intel/llvm-test-suite?
@maximdimakov, can you add a link to the PR with tests for this patch from intel/llvm-test-suite?
When event_impl's d-tor is called, its dependencies start to be released recursively, which leads to stack overflow.
Clearing event_impl dependencies in graph cleanup eliminates this problem.
getWaitList was moved to the event_impl class so that the work with dependencies could be guarded by a mutex.
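A minimal model of the failure mode the description refers to, assuming a chain of events each owning its dependency through shared_ptr; Ev, makeChain and the depth counters are illustrative, not SYCL code. Destroying the head nests one destructor frame per link, which is what overflows the stack for long enough chains when the dependency vectors are not cleared beforehand.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Counters recording how deeply destructor calls nest.
static int ActiveDtors = 0;
static int MaxDtorDepth = 0;

struct Ev {
  std::vector<std::shared_ptr<Ev>> Deps;
  ~Ev() {
    ++ActiveDtors;
    if (ActiveDtors > MaxDtorDepth)
      MaxDtorDepth = ActiveDtors;
    Deps.clear(); // recursively destroys the whole dependency chain
    --ActiveDtors;
  }
};

// Builds a dependency chain of the given length and returns the head.
inline std::shared_ptr<Ev> makeChain(int Len) {
  std::shared_ptr<Ev> Head = std::make_shared<Ev>();
  Ev *Cur = Head.get();
  for (int I = 1; I < Len; ++I) {
    Cur->Deps.push_back(std::make_shared<Ev>());
    Cur = Cur->Deps.back().get();
  }
  return Head;
}
```

With the patch's approach, graph cleanup empties each Deps vector before destruction, so the nesting depth stays constant regardless of chain length.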
Test : intel/llvm-test-suite#574
Signed-off-by: mdimakov [email protected]