Commit 7ab3731
Changes
* Changes py_dot dispatching for boolean data
py_dot for boolean inputs now gives boolean results, so boolean input arrays are no longer copied and cast to uint8, which improves performance.
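A minimal sketch of the boolean case in plain C++ (not the dpctl kernel). It assumes the boolean dot product is the logical OR over element-wise logical ANDs; the name `bool_dot` is hypothetical and used only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of dpctl: dot product of two boolean
// vectors with a boolean result. Assumption: "multiply" is logical AND
// and "add" is logical OR, so no cast to an integer type is needed.
bool bool_dot(const std::vector<bool> &a, const std::vector<bool> &b)
{
    // Use the shorter length so mismatched inputs cannot be read out of range.
    const std::size_t n = (a.size() < b.size()) ? a.size() : b.size();

    bool acc = false; // identity element of logical OR
    for (std::size_t i = 0; i < n; ++i) {
        acc = acc || (a[i] && b[i]);
    }
    return acc;
}
```

Under that assumption the accumulator stays boolean from start to finish, which is why no uint8 copy of the inputs is required.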
* Added logical_or to need_workaround
* Refactors reductions.hpp
Adds functions for submitting reductions that choose internally between sycl::reduce_over_group and custom_reduce_over_group.
* Added meta-template arg to submit_(no_)atomic_reduction functions
The last template parameter is a class template that takes 5
template parameters. When instantiated with types, this class
serves as the KernelName for the submitted functor.
The invocation sites were modified to provide such a class as
reduction_*_krn.
The custom_reduction_*_krn class was removed in favor of
custom_reduction_wrapper. When the custom reduction functor is
used, the generated kernel name is custom_reduction_wrapper<KN>,
where KN is the kernel name for the functor that uses the built-in
sycl::reduce_over_group function.
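The naming scheme described above can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the code in reductions.hpp: `reduction_example_krn` and `kernel_name_for` are hypothetical names, and the class template takes 3 type parameters here rather than 5 to keep the sketch short. In SYCL, the resulting type would be supplied as the kernel name template argument of `parallel_for`.

```cpp
#include <type_traits>

// Hypothetical kernel-name class template (the commit uses one taking
// 5 type parameters; 3 are used here for brevity). It is only declared,
// because a kernel name is just a unique type.
template <typename T1, typename T2, typename T3> class reduction_example_krn;

// Wrapper that derives a distinct name for the custom-reduction variant.
template <typename KernelName> class custom_reduction_wrapper;

// Hypothetical selector: yields KN when the built-in group reduction is
// used, and custom_reduction_wrapper<KN> otherwise.
template <bool use_builtin,
          template <typename, typename, typename> class KernelNameTmpl,
          typename T1,
          typename T2,
          typename T3>
struct kernel_name_for
{
    using KN = KernelNameTmpl<T1, T2, T3>;
    using type =
        std::conditional_t<use_builtin, KN, custom_reduction_wrapper<KN>>;
};
```

Passing the class template itself, rather than an already instantiated type, lets the submitting function instantiate the name with whatever types it deduces, so each specialization gets its own kernel name.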
* Used submit_no_atomic_reduction wrapper throughout gemm.hpp
* Refactors dot_product.hpp to permit using custom_reduce_over_group
* Pass indexers as const, or constexpr as appropriate
Functor constructors take const references for indexers, and
store them with const qualifiers.
* Fix assertions in dot_product.hpp and reductions.hpp
Assertions were checking reduction_groups rather than final_reduction_groups. final_reduction_groups has now been removed.
Also removes unnecessary scope creation during the middle portion of tree reductions.
* Make indexers const and constexpr, functor constructors take const &
Made indexer instances `const`, or `constexpr` as appropriate.
Functors store indexers as const members, and constructors take
const reference.
Modularized repeated code to compute work-group size into an inline
function in detail namespace.
---------
Co-authored-by: Oleksandr Pavlyk <[email protected]>
Changes py_dot dispatching for boolean data (#1553)
1 parent f4d4bda commit 7ab3731
4 files changed: +1902 -2741 lines changed
- dpctl/tensor/libtensor
- include/kernels
- linalg_functions
- source/linalg_functions