Each of the existing reduction implementations (for a single reduction object)
can be extended to support spans by looping over the number of elements in the
reduction.

If (num_elements == 1), the loop has a single iteration and degenerates to the
behavior of the reduction implementation prior to this commit. If
(num_elements > 1), the loop iterates over each reduction element in turn.
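A minimal sketch of the looping strategy follows. It is written in plain C++ rather than SYCL so that it runs without a device. `reduceOne` and `reduceSpan` are hypothetical names, not functions from this commit. `reduceOne` stands in for an existing single-object reduction, and `partials[i][e]` is assumed to hold work-item `i`'s contribution to element `e`. When `num_elements == 1`, the outer loop runs once and the result is simply `reduceOne` applied to the single column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an existing single-object reduction: combines the
// contributions of every work-item for one reduction element.
template <typename T, typename BinaryOp>
T reduceOne(const std::vector<T>& contributions, T identity, BinaryOp op) {
  T acc = identity;
  for (const T& x : contributions) acc = op(acc, x);
  return acc;
}

// Hypothetical span reduction: loop over the elements and reduce each one with
// the single-object path. partials[i][e] is work-item i's value for element e.
template <typename T, typename BinaryOp>
std::vector<T> reduceSpan(const std::vector<std::vector<T>>& partials,
                          std::size_t num_elements, T identity, BinaryOp op) {
  std::vector<T> result(num_elements);
  for (std::size_t e = 0; e < num_elements; ++e) {
    // Gather every work-item's contribution to element e.
    std::vector<T> column;
    column.reserve(partials.size());
    for (const auto& item : partials) column.push_back(item[e]);
    result[e] = reduceOne(column, identity, op);
  }
  return result;
}
```

With `num_elements == 1`, `reduceSpan` performs exactly one call to `reduceOne`, which mirrors the "degenerates to the previous behavior" case.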

Note that the getElement() function allows the scalar and array reduction
implementations to be the same without specializing for either case, while
still allowing for differences in storage (a single T vs. an array of Ts). This
is especially convenient because a scalar reduction is equivalent to an array
reduction with a single element.
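The role of getElement() can be illustrated with two hypothetical storage types. `ScalarStorage`, `ArrayStorage` and `combine` are illustrative names, and this is not the signature used in the actual implementation. The scalar version ignores the index and always returns its single `T`. The generic `combine` function is written once and works with either storage type, because it only touches storage through `getElement()`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical storage for a scalar reduction: a single T.
template <typename T>
struct ScalarStorage {
  T value;
  static constexpr std::size_t num_elements = 1;
  // The index is ignored: element 0 is the only element.
  T& getElement(std::size_t) { return value; }
};

// Hypothetical storage for an array reduction: N values of type T.
template <typename T, std::size_t N>
struct ArrayStorage {
  std::array<T, N> values;
  static constexpr std::size_t num_elements = N;
  T& getElement(std::size_t i) { return values[i]; }
};

// One implementation for both cases: combine one contribution per element
// into the storage, accessing it only through getElement().
template <typename Storage, typename T, typename BinaryOp>
void combine(Storage& s, const T* contrib, BinaryOp op) {
  for (std::size_t e = 0; e < Storage::num_elements; ++e)
    s.getElement(e) = op(s.getElement(e), contrib[e]);
}
```

A scalar reduction here is just the `num_elements == 1` case of the same loop.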

If (num_elements > 1), the implementation currently reduces each element
separately. This allows array reductions to use the same amount of work-group
local memory as a scalar reduction using the same T, but at the expense of
additional synchronization calls, since the synchronization performed for a
single element is repeated for every element.
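The memory/synchronization trade-off can be sketched by simulating a work-group tree reduction on the host. This sketch rests on several assumptions:

- the work-group size is a power of two;
- `reducePerElement` and `SpanResult` are hypothetical names;
- barriers are counted rather than executed.

A single local buffer of `wg_size` values is reused for every element. Local memory therefore does not grow with `num_elements`, while the barrier count does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulated outcome: reduced values plus the cost incurred to compute them.
template <typename T>
struct SpanResult {
  std::vector<T> values;
  std::size_t barriers;     // number of simulated work-group barriers
  std::size_t local_elems;  // size of the simulated local-memory buffer
};

// Reduce each element separately through one local buffer sized for the
// work-group (wg_size must be a power of two). partials[i][e] is work-item
// i's contribution to element e.
template <typename T, typename BinaryOp>
SpanResult<T> reducePerElement(const std::vector<std::vector<T>>& partials,
                               std::size_t num_elements, BinaryOp op) {
  const std::size_t wg_size = partials.size();
  std::vector<T> local(wg_size);  // reused for every element
  SpanResult<T> res{std::vector<T>(num_elements), 0, local.size()};
  for (std::size_t e = 0; e < num_elements; ++e) {
    // Each work-item writes its contribution for element e.
    for (std::size_t i = 0; i < wg_size; ++i) local[i] = partials[i][e];
    ++res.barriers;  // wait until every work-item has written
    // Tree reduction: halve the number of active work-items each step.
    for (std::size_t stride = wg_size / 2; stride > 0; stride /= 2) {
      for (std::size_t i = 0; i < stride; ++i)
        local[i] = op(local[i], local[i + stride]);
      ++res.barriers;  // wait before the next level of the tree
    }
    res.values[e] = local[0];
  }
  return res;
}
```

Consider a work-group of 4 and 2 elements. The buffer holds 4 values, and 6 barriers are counted (3 per element). In this simulation, reducing all elements in the same pass would instead need `wg_size * num_elements` values of local memory but only 3 barriers in total.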

Signed-off-by: John Pennycook <[email protected]>