Changes py_dot dispatching for boolean data #1553

Merged
merged 9 commits into from
Feb 20, 2024

Conversation

ndgrigorian
Collaborator

py_dot for boolean inputs now gives boolean results.

Boolean arrays passed to matmul, tensordot, or vecdot are no longer copied and cast to uint8, which improves performance.

Improves performance by about 15% for large arrays:

In [9]: y = dpt.zeros((4000, 40), dtype="?")

In [10]: x = dpt.ones((40, 4000), dtype="?")
# imitates old behavior
In [11]: %timeit z = dpt.matmul(x, y, dtype="u1")
1.06 ms ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [12]: %timeit z = dpt.matmul(x, y)
871 µs ± 30.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
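
A minimal sketch of what a boolean result means for a matrix product, using NumPy as a stand-in because it may not match dpctl's kernels in detail. It assumes the boolean product is an OR over AND of the paired elements; the helper `bool_matmul_reference` is hypothetical and exists only for this illustration. The sketch also shows that the old route through uint8 yields the same truth values, just in a wider dtype.

```python
import numpy as np


def bool_matmul_reference(a, b):
    """Hypothetical reference: out[i, j] = OR over k of (a[i, k] AND b[k, j])."""
    # Broadcast to shape (i, k, j), AND elementwise, then OR-reduce over k.
    return np.any(a[:, :, None] & b[None, :, :], axis=1)


a = np.array([[True, False], [False, False]])
b = np.array([[False, True], [True, True]])

# Boolean in, boolean out: no intermediate integer copy is needed.
ref = bool_matmul_reference(a, b)

# Imitates the old behavior: cast to uint8 first, then multiply.
via_uint8 = np.matmul(a.astype(np.uint8), b.astype(np.uint8))

# NumPy's own matmul also keeps the boolean dtype for boolean inputs.
direct = np.matmul(a, b)
```

Both routes agree on which entries are true; the difference is only the result dtype and the extra cast of the inputs.
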
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

py_dot for boolean inputs now gives boolean results, which means boolean input arrays will no longer be copied and cast to uint8, improving performance

github-actions bot commented Feb 15, 2024

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞


Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_4 ran successfully.
Passed: 908
Failed: 1
Skipped: 86


Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_9 ran successfully.
Passed: 909
Failed: 0
Skipped: 86


Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_10 ran successfully.
Passed: 909
Failed: 0
Skipped: 86

Adds functions for submitting reductions which handle the choice of using sycl::reduce_over_group or custom_reduce_over_group internally

Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_17 ran successfully.
Passed: 906
Failed: 1
Skipped: 88

The last template parameter is a templated class that takes 5
template parameters. This class, instantiated with types,
serves as a KernelName for the submitted functor.

The invocation sites were modified to provide such a class as
reduction_*._krn.

The custom_reduction_*_krn class was removed, in favor of using
custom_reduction_wrapper. The generated kernel name, in case
custom reduction functor is called, is custom_reduction_wrapper<KN>,
where KN would be the kernel name for Functor using built-in
sycl::reduce_over_group function.

Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_19 ran successfully.
Passed: 907
Failed: 0
Skipped: 88

ndgrigorian and others added 2 commits February 19, 2024 14:38
Functor constructors take const references for indexers, and
store them with const qualifiers.

Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_20 ran successfully.
Passed: 907
Failed: 0
Skipped: 88

Assertions were checking reduction_groups rather than final_reduction_groups. final_reduction_groups has now been removed.

Also removes unnecessary scope creation during the middle portion of tree reductions.

Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_21 ran successfully.
Passed: 907
Failed: 0
Skipped: 88


Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_23 ran successfully.
Passed: 907
Failed: 0
Skipped: 88

Made indexer instances `const`, or `constexpr` as appropriate.
Functors store indexers as const members, and constructors take
const references.

Modularized repeated code to compute work-group size into an inline
function in detail namespace.
Collaborator

coveralls commented Feb 20, 2024

Coverage Status

coverage: 91.148%. remained the same
when pulling ebbc5da on dot-bool-dispatching
into f4d4bda on master.


Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_24 ran successfully.
Passed: 907
Failed: 0
Skipped: 88


Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_25 ran successfully.
Passed: 906
Failed: 1
Skipped: 88

Contributor

@oleksandr-pavlyk oleksandr-pavlyk left a comment


LGTM! Thank you @ndgrigorian

I wonder why sequential kernels do not need any changes and work for type bool as is.

@ndgrigorian
Collaborator Author

> LGTM! Thank you @ndgrigorian
>
> I wonder why sequential kernels do not need any changes and work for type bool as is.

The sequential kernels don't use reduce over group, and in the case of dot_product and gemm implementations, the specialized sequential kernels just use normal addition and multiplication rather than sycl::plus.
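
A rough Python analogy for the point above, not the dpctl kernel code. It assumes that plain `+` and `*` on booleans promote to integers and that converting the running sum back to a boolean at each step gives the same answer as OR over AND. That promotion is an assumption made for this sketch about why plain arithmetic is sufficient; the function names are hypothetical.

```python
from itertools import product


def dot_plain_arithmetic(xs, ys):
    """Sequential dot product using ordinary * and +, stored back as bool."""
    acc = False
    for x, y in zip(xs, ys):
        # bool * bool and bool + int produce ints here; bool() narrows again.
        acc = bool(acc + x * y)
    return acc


def dot_logical(xs, ys):
    """The same dot product written with explicit logical operations."""
    return any(x and y for x, y in zip(xs, ys))


# Exhaustively compare both versions on every pair of length-3 boolean vectors.
all_vectors = list(product([False, True], repeat=3))
agree = all(
    dot_plain_arithmetic(xs, ys) == dot_logical(xs, ys)
    for xs in all_vectors
    for ys in all_vectors
)
```

Under that assumption the sequential path needs no boolean-specific reduction operator, whereas a group reduction has to be told which combining operation to use.
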

@ndgrigorian ndgrigorian merged commit 7ab3731 into master Feb 20, 2024
@ndgrigorian ndgrigorian deleted the dot-bool-dispatching branch February 29, 2024 19:02

3 participants