Changes py_dot dispatching for boolean data #1553
py_dot now returns boolean results for boolean inputs. As a result, boolean input arrays are no longer copied and cast to uint8, which improves performance.
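To make the boolean result concrete, here is a minimal sketch in plain Python. It is not dpctl code: the helper names `bool_dot` and `bool_matmul` are hypothetical, and the sketch assumes the usual convention (the one NumPy follows for boolean `matmul`) that a boolean dot product is a logical OR over elementwise ANDs.

```python
from typing import List, Sequence


def bool_dot(x: Sequence[bool], y: Sequence[bool]) -> bool:
    """Boolean dot product: logical OR over the elementwise ANDs.

    Hypothetical helper for illustration only, not part of dpctl.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return any(a and b for a, b in zip(x, y))


def bool_matmul(a: List[List[bool]], b: List[List[bool]]) -> List[List[bool]]:
    """Boolean matrix product of nested lists, built from bool_dot."""
    cols = list(zip(*b))  # columns of b
    return [[bool_dot(row, col) for col in cols] for row in a]


a = [[True, False], [False, False]]
b = [[True, True], [False, True]]
result = bool_matmul(a, b)  # every entry stays a bool, no cast to an integer type
```

Because every intermediate value is already a boolean, nothing in this formulation requires widening the inputs to an integer type first, which is the copy-and-cast step the description says is now avoided.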
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_4 ran successfully.
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_9 ran successfully.
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_10 ran successfully.
Adds functions for submitting reductions which handle the choice of using sycl::reduce_over_group or custom_reduce_over_group internally
5305f2b to 688ccbc
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_17 ran successfully.
The last template parameter is a templated class that takes 5 template parameters. Instantiated with types, this class serves as the KernelName for the submitted functor. The invocation sites were modified to provide such a class as reduction_*._krn. The custom_reduction_*_krn class was removed in favor of custom_reduction_wrapper. When the custom reduction functor is called, the generated kernel name is custom_reduction_wrapper<KN>, where KN would be the kernel name for the Functor using the built-in sycl::reduce_over_group function.
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_19 ran successfully.
Functor constructors take const references for indexers, and store them with const qualifiers.
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_20 ran successfully.
Assertions were checking reduction_groups rather than final_reduction_groups; final_reduction_groups has now been removed. Also removes unnecessary scope creation during the middle portion of tree reductions.
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_21 ran successfully.
437dc76 to fb410ea
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_23 ran successfully.
Made indexer instances `const` or `constexpr`, as appropriate. Functors store indexers as const members, and constructors take const references. Modularized the repeated code that computes the work-group size into an inline function in the detail namespace.
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_24 ran successfully.
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_25 ran successfully.
LGTM! Thank you @ndgrigorian
I wonder why the sequential kernels do not need any changes and work for type bool as is.
The sequential kernels don't use reduce over group, and in the case of |
py_dot for boolean inputs now gives boolean results. Boolean arrays input into matmul, tensordot, or vecdot will no longer be copied and cast to uint8, which improves performance by about 15% for large arrays: