Reuse `dpnp.nan_to_num` in `dpnp.nansum` and `dpnp.nanprod` #2339
View rendered docs @ https://intelpython.github.io/dpnp/index.html
Array API standard conformance tests for dpnp=0.18.0dev0=py312he4f9c94_16 ran successfully.
This relatively simple and non-invasive change improves `dpnp.nansum` performance significantly. On Max GPU, the mean time for 3*10**8 `float32` elements drops from about 9.4 ms to about 6.5 ms (roughly 31% less), and for 10**8 elements from about 4.5 ms to about 2.8 ms (roughly 38% less).

Before:

```python
In [1]: import dpnp
In [2]: x = dpnp.ones(3*10**8, dtype="f4")
In [3]: q = x.sycl_queue
In [4]: %timeit r = dpnp.nansum(x); q.wait()
9.37 ms ± 33.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [5]: %timeit r = dpnp.nansum(x); q.wait()
9.42 ms ± 18.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [6]: x = dpnp.ones(10**8, dtype="f4")
In [7]: %timeit r = dpnp.nansum(x); q.wait()
4.5 ms ± 8.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [8]: %timeit r = dpnp.nansum(x); q.wait()
4.51 ms ± 11 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

After:

```python
In [1]: import dpnp
In [2]: x = dpnp.ones(3*10**8, dtype="f4")
In [3]: q = x.sycl_queue
In [4]: %timeit r = dpnp.nansum(x); q.wait()
6.5 ms ± 24.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [5]: %timeit r = dpnp.nansum(x); q.wait()
6.47 ms ± 35.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [6]: x = dpnp.ones(10**8, dtype="f4")
In [7]: %timeit r = dpnp.nansum(x); q.wait()
2.78 ms ± 14.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [8]: %timeit r = dpnp.nansum(x); q.wait()
2.78 ms ± 14 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
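Each `%timeit` statement above ends with `q.wait()`, presumably because dpnp submits kernels to the SYCL queue asynchronously; without the wait, the measurement would mostly capture kernel submission rather than execution. To reproduce this kind of comparison outside IPython, a minimal sketch with the standard `timeit` module could look like the following. `time_with_sync` is a hypothetical helper (not part of dpnp or dpctl), and its mean/std only approximate what `%timeit` reports.

```python
import timeit
from typing import Callable, Tuple


def time_with_sync(fn: Callable[[], object], sync: Callable[[], None],
                   number: int = 100, repeat: int = 7) -> Tuple[float, float]:
    """Hypothetical helper: return (mean, std) seconds per call of fn(),
    calling sync() after every call so asynchronous work is included."""
    def run() -> None:
        fn()
        sync()  # e.g. a SYCL queue's wait(); a no-op for synchronous libraries

    # Each entry is the total time of `number` consecutive calls to run().
    totals = timeit.repeat(run, number=number, repeat=repeat)
    per_loop = [t / number for t in totals]
    mean = sum(per_loop) / len(per_loop)
    var = sum((t - mean) ** 2 for t in per_loop) / len(per_loop)
    return mean, var ** 0.5
```

With dpnp, the call would look like `time_with_sync(lambda: dpnp.nansum(x), x.sycl_queue.wait)`. Passing the synchronization step as a separate callable keeps the helper usable for host-only libraries, where `sync` can simply be `lambda: None`.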
Regarding the changes to the nanarg functions: I will revert the commits that changed them and add a warning about synchronization instead.
Moved the warnings about all-NaN and all-negative-infinity slices next to the synchronization warning.
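The synchronization warning and the all-NaN caveat are linked: an `nanargmax`-style function has to decide on the host whether any slice is entirely NaN before it can raise an error, so it needs the NaN mask that a no-mask replacement deliberately skips. The sketch below mirrors the mask-based structure of NumPy's own `nanargmax`; the name `nanargmax_sketch` is hypothetical, it handles real-valued inputs only, and dpnp's actual implementation may differ. It also shows why a slice containing only NaN and `-inf` can report the index of a NaN: both become `-inf` before `argmax` runs.

```python
import numpy as np


def nanargmax_sketch(a, axis=None):
    """Hypothetical mask-based nanargmax for real-valued arrays."""
    a = np.asarray(a)
    if not np.issubdtype(a.dtype, np.floating):
        return np.argmax(a, axis=axis)  # integer/bool arrays cannot hold NaN
    mask = np.isnan(a)
    # Deciding whether to raise needs a host-side boolean; for a device
    # array library this is the point where synchronization happens.
    if np.any(np.all(mask, axis=axis)):
        raise ValueError("All-NaN slice encountered")
    # NaN -> -inf so a NaN never wins unless the slice is only NaN and -inf.
    filled = np.where(mask, -np.inf, a)
    return np.argmax(filled, axis=axis)
```

For example, `nanargmax_sketch(np.array([np.nan, -np.inf]))` returns `0`, the position of the NaN, because the two entries are indistinguishable after the replacement.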
Thank you @ndgrigorian, LGTM!
Reuse `dpnp.nan_to_num` in `dpnp.nansum` and `dpnp.nanprod` 14274d8
This PR proposes the use of `nan_to_num` over `_replace_nan` in `nansum`, `nanprod`, `nancumsum`, and `nancumprod`, using the new internal function `_replace_nan_no_mask`.
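The idea is that sums treat NaN as the additive identity 0 and products as the multiplicative identity 1, so replacing NaN with that value and running the ordinary reduction gives the nan-aware result without keeping a boolean NaN mask around. The speedup is plausibly due to skipping that mask, which a mask-returning helper (NumPy's `_replace_nan` returns one) must materialize. Below is a minimal NumPy sketch of the technique, since dpnp mirrors the NumPy API; `replace_nan_no_mask_sketch` is a hypothetical stand-in, not the PR's `_replace_nan_no_mask`, and the sketch covers real floating and integer dtypes only.

```python
import numpy as np


def replace_nan_no_mask_sketch(a, val):
    """Hypothetical stand-in: replace NaN with `val` without returning a mask."""
    a = np.asarray(a)
    if not np.issubdtype(a.dtype, np.floating):
        return a  # integer/bool arrays cannot hold NaN (complex not handled here)
    # posinf/neginf are passed explicitly so +/-inf stay infinite; by default
    # nan_to_num would replace them with the largest finite values.
    return np.nan_to_num(a, nan=val, posinf=np.inf, neginf=-np.inf)


def nansum_sketch(a, axis=None):
    return np.sum(replace_nan_no_mask_sketch(a, 0), axis=axis)


def nanprod_sketch(a, axis=None):
    return np.prod(replace_nan_no_mask_sketch(a, 1), axis=axis)


def nancumsum_sketch(a, axis=None):
    return np.cumsum(replace_nan_no_mask_sketch(a, 0), axis=axis)


def nancumprod_sketch(a, axis=None):
    return np.cumprod(replace_nan_no_mask_sketch(a, 1), axis=axis)
```

For `b = np.array([[1.0, np.nan, 3.0], [np.nan, 2.0, 4.0]])`, `nansum_sketch(b)` gives `10.0` and `nanprod_sketch(b)` gives `24.0`, matching `np.nansum` and `np.nanprod`.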