Boolean reduction performance improvements #1401

ndgrigorian · 2023-09-17T17:24:37Z

This PR changes boolean reductions to align with #1364.

Namely, the traversal pattern of work groups in boolean reductions has been changed to be fastest over the iteration dimension, rather than the reduction dimension, and a specialized kernel for reductions over axis 0 in matrices has been added.

The original contiguous boolean reduction kernel has also been renamed to make the difference more apparent.
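
To picture why the traversal order can matter for a reduction over axis 0 of a C-contiguous matrix, the sketch below maps consecutive work ids to matrix elements under both orderings. It is an illustration only, not the dpctl kernel code: `flat_offsets`, `steps`, and the id-to-coordinate mapping are hypothetical and were introduced here as assumptions.

```python
# Illustrative sketch only; this is NOT the dpctl kernel implementation.
# Assumption: a C-contiguous (n_rows, n_cols) matrix reduced over axis 0,
# so rows form the reduction dimension and columns the iteration dimension.

def flat_offsets(n_rows, n_cols, n_ids, iteration_fastest):
    """Flat memory offsets touched by the first n_ids consecutive work ids."""
    offsets = []
    for gid in range(n_ids):
        if iteration_fastest:
            # Consecutive ids step along the iteration dimension (columns).
            col, row = gid % n_cols, gid // n_cols
        else:
            # Consecutive ids step along the reduction dimension (rows).
            row, col = gid % n_rows, gid // n_rows
        offsets.append(row * n_cols + col)
    return offsets


def steps(offsets):
    """Distance in elements between offsets of neighbouring work ids."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


new_pattern = flat_offsets(8, 4, 4, iteration_fastest=True)
old_pattern = flat_offsets(8, 4, 4, iteration_fastest=False)

print("iteration fastest:", new_pattern, "steps", steps(new_pattern))
print("reduction fastest:", old_pattern, "steps", steps(old_pattern))
```

In this toy model, neighbouring ids touch adjacent memory when the iteration dimension varies fastest, and are a full row apart when the reduction dimension varies fastest.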

- Have you provided a meaningful PR description?
- Have you added a test, reproducer or referred to an issue with a reproducer?
- Have you tested your changes locally for CPU and GPU devices?
- Have you made sure that new changes do not introduce compiler warnings?
- Have you checked performance impact of proposed changes?
- If this PR is a work in progress, are you opening the PR as a draft?

Similar to changes in sum, now traverses the iteration dimension the fastest

- Aligns with similar changes to sum

github-actions · 2023-09-17T17:53:23Z

View rendered docs @ https://intelpython.github.io/dpctl/pulls/1401/index.html

github-actions · 2023-09-17T18:53:57Z

Array API standard conformance tests for dpctl=0.14.6dev5=py310ha25a700_4 ran successfully.
Passed: 916
Failed: 84
Skipped: 119

ndgrigorian · 2023-09-18T16:36:09Z

Using the same example as in #1364, wall time for repeated calls drops from roughly 600-650 ms to about 45-51 ms:

Before:

In [5]: x = dpt.reshape(dpt.asarray(1, dtype="f4")/dpt.square(\
   ...:                            dpt.arange(1, 1282200*128 + 1, dtype="f4")), (1282200, 128))

In [6]: %time y = dpt.all(x, axis=0)
CPU times: user 481 ms, sys: 349 ms, total: 830 ms
Wall time: 763 ms

In [7]: %time y = dpt.all(x, axis=0)
CPU times: user 232 ms, sys: 325 ms, total: 556 ms
Wall time: 601 ms

In [8]: %time y = dpt.all(x, axis=0)
CPU times: user 316 ms, sys: 235 ms, total: 551 ms
Wall time: 599 ms

In [9]: %time y = dpt.any(x, axis=0)
CPU times: user 454 ms, sys: 261 ms, total: 715 ms
Wall time: 774 ms

In [10]: %time y = dpt.any(x, axis=0)
CPU times: user 284 ms, sys: 308 ms, total: 592 ms
Wall time: 639 ms

In [11]: %time y = dpt.any(x, axis=0)
CPU times: user 280 ms, sys: 325 ms, total: 605 ms
Wall time: 654 ms

After:

In [3]: x = dpt.reshape(dpt.asarray(1, dtype="f4")/dpt.square(\
   ...:                            dpt.arange(1, 1282200*128 + 1, dtype="f4")), (1282200, 128))

In [4]: %time y = dpt.all(x, axis=0)
CPU times: user 210 ms, sys: 17.3 ms, total: 227 ms
Wall time: 198 ms

In [5]: %time y = dpt.all(x, axis=0)
CPU times: user 8.63 ms, sys: 36.6 ms, total: 45.3 ms
Wall time: 50.3 ms

In [6]: %time y = dpt.all(x, axis=0)
CPU times: user 15 ms, sys: 35.2 ms, total: 50.3 ms
Wall time: 51.1 ms

In [7]: %time y = dpt.any(x, axis=0)
CPU times: user 81 ms, sys: 19.2 ms, total: 100 ms
Wall time: 108 ms

In [8]: %time y = dpt.any(x, axis=0)
CPU times: user 18.6 ms, sys: 25.4 ms, total: 44 ms
Wall time: 46.6 ms

In [9]: %time y = dpt.any(x, axis=0)
CPU times: user 7.31 ms, sys: 35.7 ms, total: 43 ms
Wall time: 45.5 ms
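
For a rough quantification, the wall times above can be averaged and compared. The numbers below are copied from the session output; the helper name `speedups` and the choice to discard the first call of each function (which presumably includes one-time overhead) are assumptions of this sketch.

```python
from statistics import mean

# Wall times in ms, copied from the sessions above.
# The first call of each function is omitted (assumed to include warm-up cost).
before = {"all": [601, 599], "any": [639, 654]}
after = {"all": [50.3, 51.1], "any": [46.6, 45.5]}


def speedups(before, after):
    """Ratio of mean wall time before the change to mean wall time after it."""
    return {name: mean(before[name]) / mean(after[name]) for name in before}


result = speedups(before, after)
for name, factor in result.items():
    print(f"dpt.{name}(x, axis=0): about {factor:.1f}x faster")
```

On these figures the repeated calls come out roughly 12x faster for `dpt.all` and roughly 14x faster for `dpt.any`.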

oleksandr-pavlyk · 2023-09-18T16:38:46Z

Please add "numpy<1.26" restriction to pip install commands in "generate-coverage" and "sycl-nightly" workflows. It looks good to go in, but I'd prefer a green CI

coveralls · 2023-09-18T17:32:24Z

coverage: 85.774%. remained the same when pulling 351232a on boolean-reduction-performance into 83fff33 on master.

ndgrigorian · 2023-09-18T17:37:05Z

> Please add "numpy<1.26" restriction to pip install commands in "generate-coverage" and "sycl-nightly" workflows. It looks good to go in, but I'd prefer a green CI

It's been added and fixes the CI. I'll look into properly solving the problem in a separate PR.

github-actions · 2023-09-18T18:04:55Z

Array API standard conformance tests for dpctl=0.14.6dev5=py310ha25a700_5 ran successfully.
Passed: 916
Failed: 84
Skipped: 119

ndgrigorian added 2 commits September 16, 2023 19:29

Changed WG traversal pattern in boolean reductions

1e85b1e

Similar to changes in sum, now traverses the iteration dimension the fastest

Implements boolean reduction kernel for axis 0

8f469a8

- Aligns with similar changes to sum

ndgrigorian requested a review from oleksandr-pavlyk September 17, 2023 17:25

Require Numpy <1.26 until test_hyperbolic is fixed

351232a

oleksandr-pavlyk approved these changes Sep 18, 2023


ndgrigorian merged commit b32fc71 into master Sep 18, 2023

oleksandr-pavlyk mentioned this pull request Sep 19, 2023

Merge 0.15.0rc1 into gold/2021 #1405

Merged

6 tasks

ndgrigorian deleted the boolean-reduction-performance branch September 20, 2023 07:58
