-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL] Add a leaf limit to the execution graph #1070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It is not clear to me what is the problem you are trying to solve. I admit I am not familiar with your graph scheduler... |
Admittedly, this patch is mainly solving a problem that has not been merged to the code base yet. It has to do with the graph cleanup (#1066) which regularly deletes finished non-alloca non-leaf command nodes from the graph to avoid large memory usage. A large enough number of leaves in the graph dramatically slows down this process and that's where this change comes in. Thanks for pointing this out, I'll amend the commit message to make the intent clearer.
Where do you expect it to be? |
I guess that usually circular buffer is implemented on top of regular vector: void push_back(T elem) {
auto Index = ++LastIndex % StorageSize;
Storage[Index] = elem;
} |
This patch adds a leaf limit (per memory object) for the command execution graph in order to avoid leaf bloat in applications that have an overwhelming number of command groups that can be executed in parallel. Limiting the number of leaves is necessary for reducing performance overhead of regular cleanup of finished command nodes. Whenever the limit is exceeded, the oldest leaf is added as a dependency of the new one instead. Signed-off-by: Sergey Semenov <[email protected]>
Signed-off-by: Sergey Semenov <[email protected]>
Yes usually this is what a circular buffer is. Here it is more like a FIFO... |
Thanks for the explanation, even if I think I do not have the back-ground to understand. |
Signed-off-by: Sergey Semenov <[email protected]>
Signed-off-by: Sergey Semenov <[email protected]>
f561a27
to
a0aa401
Compare
That's definitely a valid concern, deque was chosen primarily for the ease of implementation (double ended push/pop + out-of-box iterators), but making that switch might prove to be a worthwhile optimization eventually. |
This patch adds a leaf limit (per memory object) for the command
execution graph in order to avoid graph bloat in applications that have
an overwhelming number of command groups that can be executed in
parallel.
Whenever the limit is exceeded, one of the old leaves is added as a
dependency of the new leaf instead.
Signed-off-by: Sergey Semenov [email protected]