[SYCL][CUDA] Handle large Y/Z range dimensions. (#7968)
The dimensions passed to sycl::range determine the blocks per grid and
threads per block. Currently, the threads-per-block calculation is only
performed for the x dimension. This means the blocks-per-grid values for
the y and z dimensions passed to cuLaunchKernel come directly from the
sycl::range arguments. As a result, cuLaunchKernel can return an error
when those y and z values are larger than 65535.
This PR adds a simple tuning of threads per block for large (over
65535) values of the Y and Z dimensions, so that the associated blocks
per grid stay within the allowed range.
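
As a rough illustration of the idea only (this is not the code from the PR),
the tuning for one dimension could be sketched as below. The helper name
pickThreadsPerBlock, the constant kMaxGridDimYZ, and the "smallest even
divisor" strategy are assumptions made for this sketch; the actual change may
pick the value differently.

```cpp
#include <cstddef>

// CUDA limit on the number of blocks per grid in the y and z dimensions.
constexpr std::size_t kMaxGridDimYZ = 65535;

// Hypothetical helper (not taken from the PR): returns the smallest
// threads-per-block value that divides `globalSize` evenly and keeps
// globalSize / threadsPerBlock within kMaxGridDimYZ. Returns 0 when no
// such value exists at or below `maxThreadsPerBlock`.
std::size_t pickThreadsPerBlock(std::size_t globalSize,
                                std::size_t maxThreadsPerBlock) {
  for (std::size_t tpb = 1; tpb <= maxThreadsPerBlock; ++tpb) {
    const bool dividesEvenly = (globalSize % tpb == 0);
    if (dividesEvenly && globalSize / tpb <= kMaxGridDimYZ)
      return tpb;
  }
  return 0; // Caller has to report that the range cannot be launched.
}
```

With this sketch, a Y range of 131072 would get 4 threads per block and
32768 blocks per grid, while a range that already fits (65535 or less) keeps
1 thread per block. A caller would still need to keep the product of the
x, y and z block sizes within the device's total threads-per-block limit.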