-
Notifications
You must be signed in to change notification settings - Fork 30
dpctl.tensor.floor_divide fixed for signed 0 output #1271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1271/index.html |
Array API standard conformance tests for dpctl=0.14.5dev0=py310h7bf5fec_5 ran successfully. |
25f0109
to
01f4571
Compare
Array API standard conformance tests for dpctl= ran successfully. |
01f4571
to
c31d020
Compare
Array API standard conformance tests for dpctl=0.14.6dev0=py310h7bf5fec_51 ran successfully. |
- Rather than computing division and modulo for each element for sycl::vec, instead the vector is initialized and filled per-element
Array API standard conformance tests for dpctl=0.14.6dev0=py310h7bf5fec_55 ran successfully. |
else { | ||
res[i] = in1[i] / in2[i]; | ||
if constexpr (std::is_signed_v<resT>) { | ||
auto mod = in1[i] % in2[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This performs a second division. How about mod = in1[i] - res[i] * in2[i]
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try this out as well, I'd like to see how the performance looks.
I have also seen sources suggest that the remainder is, in some cases, a byproduct of division and the compiler can optimize these operations when nearby.
i.e., here under notes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @ndgrigorian. Thank you
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞 |
Array API standard conformance tests for dpctl=0.14.6dev0=py310h7bf5fec_58 ran successfully. |
On some devices,
sycl::floor
andstd::floor
would drop the sign of 0.This PR resolves those cases and adds a test.