
[SYCL] Clear event_impl dependencies with graph cleanup #4793

Merged: 16 commits merged into intel:sycl from fix_dependency_release on Nov 19, 2021

Conversation

@maximdimakov (Contributor) commented Oct 21, 2021:

When event_impl's d-tor is called, its dependencies start to be released recursively, which leads to a stack overflow.
Clearing event_impl dependencies during graph cleanup eliminates this problem.
getWaitList was moved into the event_impl class so that work with the dependencies could be guarded by a mutex.
Test: intel/llvm-test-suite#574
Signed-off-by: mdimakov [email protected]
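
For illustration, here is a self-contained sketch of the failure mode described above; the Node type is hypothetical and stands in for event_impl:

#include <memory>
#include <vector>

struct Node {
  std::vector<std::shared_ptr<Node>> Deps; // like MPreparedDepsEvents
};

int main() {
  auto Head = std::make_shared<Node>();
  Node *Tail = Head.get();
  for (int I = 0; I < 1000000; ++I) {
    auto Next = std::make_shared<Node>();
    Tail->Deps.push_back(Next);
    Tail = Next.get();
  }
  // ~Node destroys Deps, which destroys the child shared_ptr, which
  // runs the child's ~Node, and so on: one stack frame per link in the
  // chain. With a million links this is the stack overflow in question.
  Head.reset();
}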

@maximdimakov maximdimakov requested a review from s-kanaev October 21, 2021 10:02
@maximdimakov maximdimakov requested a review from a team as a code owner October 21, 2021 10:02
@maximdimakov (Contributor Author):

/summary:run

@s-kanaev (Contributor) left a comment:

Looks good.
The main concern with this solution is the need to store all elements of the graph (events) in a container.

There might be another solution that employs DFS instead of BFS.
With this approach, the deepest event is released first. A container is used to store the path to the current graph element. Prior to moving up in the graph, the dependency vectors of the current event should be cleared.
Using raw pointers in the path-storage container would also reduce the number of calls to the shared_ptr copy constructor, making the performance impact negligible.
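
A minimal sketch of that DFS idea, assuming a simplified hypothetical Event type with a single dependency vector (the real event_impl also has MPreparedHostDepsEvents and locking to consider):

#include <memory>
#include <vector>

struct Event {
  std::vector<std::shared_ptr<Event>> Deps;
};

// Release a dependency graph without recursing in ~Event: walk down to
// the deepest node with an explicit path stack of raw pointers (no
// shared_ptr copies), then drop dependencies one by one on the way up.
void releaseIterative(Event *Head) {
  std::vector<Event *> Path{Head};
  while (!Path.empty()) {
    Event *Cur = Path.back();
    if (!Cur->Deps.empty()) {
      Path.push_back(Cur->Deps.back().get()); // dig deeper
      continue;
    }
    // Deepest node reached: its dependency vector is already empty, so
    // dropping the parent's reference destroys it without recursion.
    Path.pop_back();
    if (!Path.empty())
      Path.back()->Deps.pop_back();
  }
  // After the call, Head's own Deps vector is empty, so destroying
  // Head later is shallow.
}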

// When d-tor of any event_impl except for Head is called
// the vectors of dependencies must be clean
for (auto &DepPtr : MPreparedDepsEvents) {
  assert(DepPtr && "Dependencies list is not clean");

Contributor:

The assert message doesn't make sense here.
It should be something like "Only valid dependencies are expected", etc.

@maximdimakov maximdimakov requested a review from s-kanaev October 25, 2021 10:49
}
}

Q.pop_back();

Contributor:

Could you please describe why the last element is popped here prior to the reset sequence?

In the while (!Q.back().is_end()) loop one gets to the deepest element of the graph.
This element is the one without any descendants (dependencies). Hence, one should reset it, then go back one step and continue digging into the graph on the next dependency item.

Contributor Author:

DepIter, which holds the parent of the deepest element (call it D), returns a shared pointer to D. When we need to reset D, we must go up to its parent and reset it from there. I do this in the code below; after that I increment DepIter to continue exploring descendants.

Comment on lines 93 to 95
(*Q.back()).get()->MPreparedDepsEvents.clear();     // empty both dependency
(*Q.back()).get()->MPreparedHostDepsEvents.clear(); // vectors first, so that
(*Q.back()).reset();                                // this release can't recurse

Contributor:

I'm not sure I fully understand the code, but can't we have a situation where we are clearing the MPreparedDepsEvents of an event_impl which is still in use?

Contributor:

I believe that the following two implications are true:

  • an event (an instance of event_impl) is destroyed once its command is complete
  • a command can be complete if and only if all of its dependencies are complete

These implications make this behaviour of the destructor valid.

Contributor:

an event (an instance of event_impl) is destroyed once its command is complete

An instance of event_impl is destroyed once there is no event which points to it, so a user can hold an event even after a Command has completed and been destroyed.

Anyway, the code is probably still valid in the sense that we are not required to return already-completed events in get_wait_list, so we can clear the wait list of a completed command (which is probably an even better option than the one I suggested below). But access to these event lists should be guarded by a mutex in all cases.

@romanovvlad (Contributor):

When event_impl's d-tor is called, its dependencies start to be released recursively, which leads to a stack overflow. Add manual cleaning of dependencies by DFS to event_impl's d-tor to avoid this problem. Signed-off-by: mdimakov [email protected]

As an alternative approach, we could try to update the MPreparedDepsEvents of events that are connected to users of the Scheduler::Command when it is being destroyed. In this option, accesses to MPreparedDepsEvents should be guarded by a lock.

@maximdimakov maximdimakov marked this pull request as draft November 9, 2021 12:00
@maximdimakov maximdimakov changed the title [SYCL] Add manual cleaning of event_impl dependency trees in d-tor [SYCL] Holding prepared events in weak_ptr instead of shared_ptr Nov 10, 2021
@maximdimakov maximdimakov changed the title [SYCL] Holding prepared events in weak_ptr instead of shared_ptr [SYCL] Keep prepared events in weak_ptr instead of shared_ptr Nov 10, 2021
@@ -164,14 +164,14 @@ class event_impl {
/// Returns vector of event dependencies.
///
/// @return a reference to MPreparedDepsEvents.
-  std::vector<std::shared_ptr<event_impl>> &getPreparedDepsEvents() {
+  std::vector<std::weak_ptr<event_impl>> &getPreparedDepsEvents() {

Contributor:

Could you please tell who owns/holds event_impl with this patch?

I still think that the approach mentioned in this comment would be much simpler: #4793 (comment).
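
For reference, the ownership concern in a tiny self-contained example (Evt is a hypothetical stand-in): once no shared_ptr owner remains, a weak_ptr dependency expires and lock() returns nullptr, so something must keep each event_impl alive.

#include <cassert>
#include <memory>

struct Evt {};

int main() {
  std::weak_ptr<Evt> Dep;
  {
    auto Owner = std::make_shared<Evt>();
    Dep = Owner;
    assert(Dep.lock() != nullptr); // alive while an owner exists
  }
  assert(Dep.lock() == nullptr); // expired: nobody owns the event
}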


enum HostEventState : int { HES_NotComplete = 0, HES_Complete };

// State of host event. Employed only for host events and event with no
// backend's representation (e.g. alloca). Used values are listed in
// HostEventState enum.
std::atomic<int> MState;

std::mutex MMutex;

Contributor:

Why is this mutex needed? Is access to MPreparedDepsEvents and MPreparedHostDepsEvents not guarded by the graph R/W lock?

Contributor Author:

Access to these dependencies is not guarded when the user wants to get them from event::get_wait_list.
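
A minimal sketch of what this guarding could look like; this is not the actual implementation, only the member and method names from the diffs above are reused, with simplified types:

#include <memory>
#include <mutex>
#include <vector>

class event_impl {
public:
  using EventImplPtr = std::shared_ptr<event_impl>;

  // Snapshot both dependency vectors under the mutex, so a concurrent
  // graph cleanup cannot clear them mid-read.
  std::vector<EventImplPtr> getWaitList() {
    std::lock_guard<std::mutex> Guard(MMutex);
    std::vector<EventImplPtr> Result(MPreparedDepsEvents);
    Result.insert(Result.end(), MPreparedHostDepsEvents.begin(),
                  MPreparedHostDepsEvents.end());
    return Result;
  }

private:
  std::mutex MMutex;
  std::vector<EventImplPtr> MPreparedDepsEvents;
  std::vector<EventImplPtr> MPreparedHostDepsEvents;
};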

@@ -129,6 +129,7 @@ static void handleVisitedNodes(std::vector<Command *> &Visited) {
   for (Command *Cmd : Visited) {
     if (Cmd->MMarks.MToBeDeleted) {
       Cmd->getEvent()->setCommand(nullptr);
+      Cmd->getEvent()->cleanupDependencyEvents();

Contributor:

This still seems to lead to the d-tor going through the DFS and overflowing the stack, doesn't it?

Contributor Author:

The latest changes do not lead to a stack overflow on the tests that were affected.
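
For illustration, a sketch of what cleanupDependencyEvents could do; the method body is not shown in this diff, so this is an assumption, extending the simplified event_impl sketched earlier with this method:

void event_impl::cleanupDependencyEvents() {
  // Called from graph cleanup once the command is finished and about
  // to be deleted: the dependency chain is unlinked incrementally, one
  // cleanup pass at a time, instead of recursively in the d-tor.
  std::lock_guard<std::mutex> Guard(MMutex);
  MPreparedDepsEvents.clear();
  MPreparedHostDepsEvents.clear();
}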

@maximdimakov maximdimakov changed the title [SYCL] Keep prepared events in weak_ptr instead of shared_ptr [SYCL] Clear event_impl dependencies with graph cleanup Nov 11, 2021
@maximdimakov maximdimakov marked this pull request as ready for review November 12, 2021 09:00
@@ -217,8 +217,7 @@ void event_impl::wait_and_throw(

Contributor:

Not related to this patch, but it seems this function is not thread-safe. We are accessing Cmd here, which AFAIK can be removed by graph cleanup.

Contributor Author:

I wrapped wait_and_throw under a mutex.

@@ -210,15 +210,17 @@ void event_impl::wait(

void event_impl::wait_and_throw(
    std::shared_ptr<cl::sycl::detail::event_impl> Self) {
  Scheduler &Sched = Scheduler::getInstance();
  Scheduler::ReadLockT Lock(Sched.MGraphLock);

  Command *Cmd = static_cast<Command *>(Self->getCommand());
  QueueImplPtr submittedQueue = nullptr;
  if (Cmd)
    submittedQueue = Cmd->getSubmittedQueue();

  wait(Self);

Contributor:

The call to wait should not take place with the graph lock taken.

Contributor Author:

Fixed it.
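
A self-contained sketch of the shape of that fix, using std::shared_mutex as a stand-in for the scheduler's graph R/W lock (all types here are hypothetical simplifications):

#include <memory>
#include <shared_mutex>

struct Queue {};
using QueueImplPtr = std::shared_ptr<Queue>;

struct Command {
  QueueImplPtr getSubmittedQueue() { return std::make_shared<Queue>(); }
};

std::shared_mutex GraphLock; // stands in for the scheduler's MGraphLock

void waitAndThrow(Command *Cmd) {
  QueueImplPtr SubmittedQueue;
  {
    std::shared_lock<std::shared_mutex> Lock(GraphLock); // ReadLockT
    if (Cmd)
      SubmittedQueue = Cmd->getSubmittedQueue();
  } // read lock released here...
  // ...so the potentially blocking wait below runs without holding it.
  // wait(Self);
}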

@s-kanaev (Contributor) left a comment:

Seems legit.

@s-kanaev (Contributor):

@maximdimakov, could you please add a test for this fix in intel/llvm-test-suite?

@dm-vodopyanov (Contributor):

@maximdimakov, can you add a link to the PR with tests for this patch in intel/llvm-test-suite?

@dm-vodopyanov dm-vodopyanov merged commit 5bb3ab9 into intel:sycl Nov 19, 2021
@maximdimakov maximdimakov deleted the fix_dependency_release branch March 24, 2022 13:59