[SYCL] Add a leaf limit to the execution graph #1070

sergey-semenov · 2020-01-29T14:17:28Z

This patch adds a leaf limit (per memory object) for the command
execution graph in order to avoid graph bloat in applications that have
an overwhelming number of command groups that can be executed in
parallel.

Whenever the limit is exceeded, one of the old leaves is added as a
dependency of the new leaf instead.

Signed-off-by: Sergey Semenov [email protected]
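
The eviction rule described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `Command`, `LeafSet`, and `addLeaf` are hypothetical names, and using `std::deque` as the leaf container is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a command node in the execution graph.
struct Command {
  int Id;
  std::vector<Command *> Deps; // commands that must finish before this one
};

// Hypothetical per-memory-object leaf set with an upper bound on its size.
class LeafSet {
public:
  explicit LeafSet(std::size_t Limit) : MLimit(Limit) {
    assert(Limit > 0 && "the leaf limit must be non-zero");
  }

  // Registers NewLeaf. If the set is already full, the oldest leaf is
  // evicted and recorded as a dependency of NewLeaf, so it stays reachable
  // through the graph while the number of leaves stays bounded.
  void addLeaf(Command *NewLeaf) {
    if (MLeaves.size() == MLimit) {
      Command *Oldest = MLeaves.front();
      MLeaves.pop_front();
      NewLeaf->Deps.push_back(Oldest);
    }
    MLeaves.push_back(NewLeaf);
  }

  const std::deque<Command *> &leaves() const { return MLeaves; }

private:
  std::size_t MLimit;
  std::deque<Command *> MLeaves; // oldest leaf at the front
};
```

With a limit of 2, adding three independent commands A, B, and C in that order leaves B and C as leaves and makes A a dependency of C. The cost is some lost parallelism: C now waits for A even though the two do not conflict.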

keryell · 2020-02-01T02:33:33Z

It is not clear to me what problem you are trying to solve. I admit I am not familiar with your graph scheduler...
I do not understand how your circular buffer works. Is a modulo operation missing somewhere?

sergey-semenov · 2020-02-03T10:23:40Z

> It is not clear to me what problem you are trying to solve. I admit I am not familiar with your graph scheduler...

Admittedly, this patch mainly addresses a problem caused by a change that has not been merged into the code base yet: the graph cleanup (#1066), which regularly deletes finished non-alloca, non-leaf command nodes from the graph to avoid large memory usage. A large enough number of leaves in the graph dramatically slows down this process, and that's where this change comes in.

Thanks for pointing this out, I'll amend the commit message to make the intent clearer.

> I do not understand how your circular buffer works. Is a modulo operation missing somewhere?

Where do you expect it to be?

AlexeySachkov · 2020-02-03T11:11:57Z

@sergey-semenov,

> Where do you expect it to be?

I guess a circular buffer is usually implemented on top of a regular vector:

void push_back(T elem) {
  auto Index = ++LastIndex % StorageSize;
  Storage[Index] = elem;
}

This patch adds a leaf limit (per memory object) for the command execution graph in order to avoid leaf bloat in applications that have an overwhelming number of command groups that can be executed in parallel. Limiting the number of leaves is necessary for reducing performance overhead of regular cleanup of finished command nodes.

Whenever the limit is exceeded, the oldest leaf is added as a dependency of the new one instead.

Signed-off-by: Sergey Semenov <[email protected]>

keryell · 2020-02-03T19:55:06Z

> @sergey-semenov,
>
> > Where do you expect it to be?
>
> I guess a circular buffer is usually implemented on top of a regular vector:
>
> void push_back(T elem) {
>   auto Index = ++LastIndex % StorageSize;
>   Storage[Index] = elem;
> }

Yes, this is usually what a circular buffer is. Here it is more like a FIFO...
So it is a std::queue with a bounded capacity.
I have the feeling it is more efficient to use a vector with % or & (best, if the size is a power of 2) or ?: to handle wrap-around at the end, instead of paying the price of a deque, which will allocate and deallocate memory on a regular basis.
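
The vector-plus-modulo alternative suggested here could look roughly like the sketch below. It is an illustration only, assuming a fixed capacity chosen at construction and a default-constructible element type; `RingBuffer` and its members are hypothetical names, not the class from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bounded FIFO backed by a single up-front allocation.
// Requires T to be default-constructible and copy-assignable.
template <typename T> class RingBuffer {
public:
  explicit RingBuffer(std::size_t Capacity) : MStorage(Capacity) {
    assert(Capacity > 0 && "capacity must be non-zero");
  }

  bool empty() const { return MSize == 0; }
  bool full() const { return MSize == MStorage.size(); }
  std::size_t size() const { return MSize; }

  // Appends Elem. When the buffer is full, the write position wraps onto
  // the oldest element, which is overwritten, and the head moves forward.
  void push_back(const T &Elem) {
    MStorage[(MHead + MSize) % MStorage.size()] = Elem;
    if (full())
      MHead = (MHead + 1) % MStorage.size();
    else
      ++MSize;
  }

  // Removes the oldest element. Precondition: !empty().
  void pop_front() {
    assert(!empty());
    MHead = (MHead + 1) % MStorage.size();
    --MSize;
  }

  // Oldest element. Precondition: !empty().
  const T &front() const {
    assert(!empty());
    return MStorage[MHead];
  }

  // Newest element. Precondition: !empty().
  const T &back() const {
    assert(!empty());
    return MStorage[(MHead + MSize - 1) % MStorage.size()];
  }

private:
  std::vector<T> MStorage; // fixed size, never reallocated after construction
  std::size_t MHead = 0;   // index of the oldest element
  std::size_t MSize = 0;   // number of live elements
};
```

If the capacity is constrained to a power of two, each `% MStorage.size()` can be replaced with `& (MStorage.size() - 1)`. Unlike a deque, this layout performs no allocation after construction, but iterators would have to be written by hand.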

keryell · 2020-02-03T19:58:15Z

> Thanks for pointing this out, I'll amend the commit message to make the intent clearer.

Thanks for the explanation, even if I do not think I have the background to understand it.
Also, I am always confused by the name alloca, since here it is neither the UNIX alloca nor the LLVM alloca...

sergey-semenov · 2020-02-04T11:49:51Z

> I have the feeling it is more efficient to use a vector with % or & (best, if the size is a power of 2) or ?: to handle wrap-around at the end, instead of paying the price of a deque, which will allocate and deallocate memory on a regular basis.

That's definitely a valid concern. deque was chosen primarily for ease of implementation (double-ended push/pop plus out-of-the-box iterators), but making that switch might prove to be a worthwhile optimization eventually.

sergey-semenov requested a review from romanovvlad January 29, 2020 14:17

sergey-semenov assigned romanovvlad Jan 31, 2020

romanovvlad requested changes Feb 3, 2020

sycl/include/CL/sycl/detail/circular_buffer.hpp

sycl/source/detail/scheduler/graph_builder.cpp

sycl/include/CL/sycl/detail/circular_buffer.hpp

sergey-semenov added 2 commits February 3, 2020 15:50

Add tests

2857ca8

Signed-off-by: Sergey Semenov <[email protected]>

sergey-semenov added 2 commits February 4, 2020 11:32

Address comments

a0aa401

Signed-off-by: Sergey Semenov <[email protected]>

Fix LeafLimit test

f224195

Signed-off-by: Sergey Semenov <[email protected]>

sergey-semenov force-pushed the leaflimit branch from f561a27 to a0aa401 Compare February 4, 2020 11:39

sergey-semenov requested a review from romanovvlad February 4, 2020 12:12

romanovvlad approved these changes Feb 5, 2020

romanovvlad merged commit 7c293e2 into intel:sycl Feb 5, 2020

sergey-semenov deleted the leaflimit branch February 5, 2020 09:57
