[SYCL][CUDA][PI] Introduce multiple streams in each queue (#6102)
So far, each queue had only one underlying CUstream, making it de facto in-order. This PR introduces multiple streams in each queue. To improve opportunities for concurrent execution, streams are split into two pools: one for compute (kernels) and one for memory transfers.
The streams in pools are created dynamically when first needed. When a pool is full, previously created streams are reused. By default each queue has space for up to 128 streams for compute and 64 for transfers.
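The lazy creation and reuse described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the class name `LazyStreamPool`, its members, and the factory callback are all hypothetical, and the stream type is a template parameter so the sketch compiles without the CUDA headers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch of a fixed-capacity pool that creates streams lazily
// and reuses them in round-robin order once the pool is full.
// None of these names come from pi_cuda.cpp.
template <typename Stream>
class LazyStreamPool {
public:
  LazyStreamPool(std::size_t capacity, std::function<Stream()> create)
      : capacity_(capacity), create_(std::move(create)) {
    assert(capacity_ > 0 && "pool needs room for at least one stream");
    streams_.reserve(capacity_);
  }

  // Returns the next stream. A slot is filled the first time it is visited;
  // after `capacity` streams exist, earlier streams are handed out again.
  Stream get() {
    std::lock_guard<std::mutex> lock(mutex_);
    const std::size_t slot = next_;
    next_ = (next_ + 1) % capacity_;
    if (slot == streams_.size())
      streams_.push_back(create_()); // first visit to this slot: create
    return streams_[slot];
  }

  // Number of streams actually created so far (at most `capacity`).
  std::size_t created() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return streams_.size();
  }

private:
  const std::size_t capacity_;
  std::function<Stream()> create_;
  mutable std::mutex mutex_;
  std::vector<Stream> streams_;
  std::size_t next_ = 0;
};
```

Under these assumptions, a queue would hold two such pools, one with capacity 128 for kernels and one with capacity 64 for transfers. With the real driver API, the factory would presumably wrap `cuStreamCreate`, and the queue would destroy the created streams when it is released.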
This PR also removes a test for the internal workings of the queue. The problem is that introducing dynamic stream creation puts more work into `_pi_queue::get()`, making it depend on some helper functions in `pi_cuda.cpp`, so I had to move the definition from the header to `pi_cuda.cpp`. This, however, caused a problem with linking this test, as `lib_pi_cuda.so` is created using a custom linker script that only exposes functions starting with "pi". This improves linking performance, but prevents any other function from being tested. Looking at other tests for internals of the plugin, I noticed that other functions that would need anything from `pi_cuda.cpp` are also not tested, so I deleted this test as well.
This is not changing any user-facing interface, so there are no accompanying changes to the test suite.