Boolean reduction performance improvements #1401
Similar to the changes made to sum, boolean reductions now traverse the iteration dimension the fastest
- Aligns with similar changes to sum
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1401/index.html
Array API standard conformance tests for dpctl=0.14.6dev5=py310ha25a700_4 ran successfully.
Using the same example as in #1364, the performance benefits are clear:
Before:
After:
Please add
It's been added and fixes the CI. I'll look into properly solving the problem in a separate PR.
Array API standard conformance tests for dpctl=0.14.6dev5=py310ha25a700_5 ran successfully.
This PR makes changes to boolean reductions which align with #1364.
Namely, the traversal pattern of work groups in boolean reductions has been changed to be fastest over the iteration dimension, rather than the reduction dimension, and a specialized kernel for reductions over axis 0 in matrices has been added. The original contiguous boolean reduction kernel has also been renamed to make the difference more apparent.
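To illustrate why the traversal order matters for an axis 0 reduction, here is a minimal sketch in plain Python. It is not the SYCL kernel from this PR: it assumes a row-major (C-contiguous) matrix stored as a flat list, and the function names `any_axis0` and `access_order` are hypothetical, chosen only for this example. It shows that running the inner loop over the iteration dimension (columns) touches adjacent memory offsets, while running it over the reduction dimension (rows) jumps by a full row each step.

```python
# Illustrative sketch only: plain Python, not dpctl's SYCL kernel code.
# Assumes a row-major (C-contiguous) n_rows x n_cols matrix in a flat list.
# The names any_axis0 and access_order are hypothetical.

def any_axis0(flat, n_rows, n_cols):
    """Logical-OR reduction over axis 0, iteration dimension fastest."""
    out = [False] * n_cols
    for r in range(n_rows):        # reduction dimension (outer, slow)
        base = r * n_cols
        for c in range(n_cols):    # iteration dimension (inner, fast)
            out[c] = out[c] or bool(flat[base + c])
    return out


def access_order(n_rows, n_cols, iteration_fastest):
    """Flat offsets in the order each traversal pattern visits them."""
    if iteration_fastest:
        # Consecutive accesses are adjacent in memory (stride 1).
        return [r * n_cols + c for r in range(n_rows) for c in range(n_cols)]
    # Consecutive accesses are n_cols elements apart (stride n_cols).
    return [r * n_cols + c for c in range(n_cols) for r in range(n_rows)]


if __name__ == "__main__":
    # 3 x 4 matrix, stored row by row.
    flat = [0, 0, 1, 0,
            0, 0, 0, 0,
            0, 1, 0, 0]
    print(any_axis0(flat, 3, 4))
    print(access_order(3, 4, iteration_fastest=True))
    print(access_order(3, 4, iteration_fastest=False))
```

On a GPU, adjacent work-items reading adjacent offsets generally leads to coalesced memory access, which is the likely source of the speedup reported here; the sketch only models the access pattern, not the performance.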