Commit c742e79

Use SequentialReductionKernel for tree-reduction as well
1. Renamed a misspelled variable.
2. If reduction_nelems is small, use SequentialReductionKernel for tree reductions, as is already done for atomic reductions.
3. Tweaked the scale-down logic for a moderately sized number of elements to reduce. max_wg should also be used when iter_nelems is very small (one), since choosing max_wg for large iter_nelems may lead to under-utilization of the GPU.
1 parent 11ecba8 commit c742e79
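
The selection logic described in items 2 and 3 of the commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: the function `choose_reduction_plan`, the `ReductionPlan` struct, both threshold constants, and the halving rule are all hypothetical. Only the names `iter_nelems`, `reduction_nelems`, and `max_wg` come from the message itself.

```cpp
#include <cstddef>

// Hypothetical labels for the two strategies named in the commit message.
enum class ReductionKind { Sequential, Tree };

struct ReductionPlan {
    ReductionKind kind;
    std::size_t work_group_size; // 0 when the sequential kernel is chosen
};

// ASSUMPTION: placeholder thresholds; the real values are not given in the message.
constexpr std::size_t kSequentialThreshold = 64;
constexpr std::size_t kMinWorkGroup = 64;

inline ReductionPlan choose_reduction_plan(std::size_t iter_nelems,
                                           std::size_t reduction_nelems,
                                           std::size_t max_wg)
{
    // Item 2: few elements to reduce, so each work-item reduces sequentially.
    if (reduction_nelems < kSequentialThreshold) {
        return {ReductionKind::Sequential, 0};
    }

    // Item 3: with a single output element, use the largest work-group.
    if (iter_nelems == 1) {
        return {ReductionKind::Tree, max_wg};
    }

    // Item 3: otherwise scale the work-group down for moderately sized
    // reductions. ASSUMPTION: halve while half the group still covers
    // every element to reduce.
    std::size_t wg = max_wg;
    while (wg > kMinWorkGroup && wg / 2 >= reduction_nelems) {
        wg /= 2;
    }
    return {ReductionKind::Tree, wg};
}
```

With these placeholder thresholds, a reduction over 100 elements repeated for 1000 outputs with `max_wg` of 256 would get a work-group of 128, while the same reduction with a single output would keep the full 256.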

1 file changed: +194 -117 lines changed

0 commit comments