[SYCL][AMDGCN] Fix up and down shuffles and reductions (#5359)
This patch fixes the group collective implementation for AMDGCN, which
had two main issues. First, in one place it called a regular `shuffle`
instead of a `shuffleUp`, which broke the reduction algorithm. Second,
it was not using the correct interface for the SPIR-V `shuffleUp`
function.
This leads to the second part of this patch, which fixes the `shuffleUp`
and `shuffleDown` functions, mostly in the AMDGCN built-ins but also in
the SYCL header, because the SYCL built-ins were not implemented
correctly on top of the SPIR-V built-ins.
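To illustrate why a plain `shuffle` cannot stand in for `shuffleUp` in a
collective, here is a host-side C++ model of a generic Hillis-Steele
inclusive scan over one sub-group. This is a sketch under stated
assumptions, not the libclc code: the sub-group is modeled as a
`std::vector`, and `shuffle_up`, `indexed_shuffle` and
`subgroup_inclusive_scan` are hypothetical names. It also assumes the
broken call passed the shift amount as a source lane index, which is only
one plausible reading of the bug.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of one sub-group: element i is the value held by lane i.
using SubGroup = std::vector<int>;

// Shuffle-up: lane i receives the value of lane i - delta. Lanes with
// i < delta have no source lane; in this sketch they keep their own value.
SubGroup shuffle_up(const SubGroup &v, std::size_t delta) {
  SubGroup out(v);
  for (std::size_t i = delta; i < v.size(); ++i)
    out[i] = v[i - delta];
  return out;
}

// Regular (indexed) shuffle: every lane reads the value of lane `src`.
SubGroup indexed_shuffle(const SubGroup &v, std::size_t src) {
  return SubGroup(v.size(), v[src]);
}

// Hillis-Steele inclusive scan: after the step with offset k, lane i holds
// the sum of lanes max(0, i - 2k + 1) .. i, provided `step` is a shuffle-up.
template <typename Shuffle>
SubGroup subgroup_inclusive_scan(SubGroup v, Shuffle step) {
  for (std::size_t offset = 1; offset < v.size(); offset *= 2) {
    SubGroup moved = step(v, offset);
    // Only lanes that have a source lane `offset` below them accumulate.
    for (std::size_t i = offset; i < v.size(); ++i)
      v[i] += moved[i];
  }
  return v;
}
```

For the input `{1, 2, 3, 4}`, the `shuffle_up` version yields
`{1, 3, 6, 10}`, while the `indexed_shuffle` version makes every lane read
the same source lane and yields `{1, 4, 10, 11}`.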
At the SYCL level, the `shuffleUp` and `shuffleDown` built-ins take a
value to participate in the shuffle and a delta. The delta is used to
compute which thread to take the value from during the shuffle
operation: for `shuffleUp` it is subtracted from the thread id, and for
`shuffleDown` it is added. So in SYCL the delta must be chosen such that
`subgroup_local_id - delta` falls within `[0, subgroup_local_size[` for
`shuffleUp`, and `subgroup_local_id + delta` falls within
`[0, subgroup_local_size[` for `shuffleDown`.
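The SYCL-level contract can be modeled on the host as below. This is a
hedged illustration rather than the SYCL header: `model_shuffle_up`,
`model_shuffle_down`, `Lanes` and `Result` are hypothetical names, each
vector element stands for one work-item indexed by its sub-group local id,
and `std::nullopt` marks lanes whose source id leaves
`[0, subgroup_local_size[`, where the built-in gives no defined result.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Element i is the value contributed by the work-item with local id i.
using Lanes = std::vector<int>;
// std::nullopt marks a lane whose source id is outside [0, size[.
using Result = std::vector<std::optional<int>>;

// shuffleUp: the lane with local id `id` reads from lane id - delta.
Result model_shuffle_up(const Lanes &v, std::size_t delta) {
  Result out(v.size());
  for (std::size_t id = 0; id < v.size(); ++id)
    if (id >= delta)  // id - delta stays within [0, size[
      out[id] = v[id - delta];
  return out;
}

// shuffleDown: the lane with local id `id` reads from lane id + delta.
Result model_shuffle_down(const Lanes &v, std::size_t delta) {
  Result out(v.size());
  for (std::size_t id = 0; id < v.size(); ++id)
    if (id + delta < v.size())  // id + delta stays within [0, size[
      out[id] = v[id + delta];
  return out;
}
```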
However, in SPIR-V these built-ins are a bit more complicated: they take
two values to participate in the shuffle and support twice the delta
range of the SYCL built-ins. For example, for `shuffleUp` the valid
range for `subgroup_local_id - delta` is
`[-subgroup_local_size, subgroup_local_size[`. If it falls within
`[-subgroup_local_size, 0[`, the first value participates in the
shuffle; if it falls within `[0, subgroup_local_size[`, the second value
does. `shuffleDown` works in a similar way.
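A host-side model of the two-value SPIR-V `shuffleUp` can make the range
split concrete. It is a sketch, not the SPIR-V or libclc implementation:
the names are hypothetical, and it assumes, following our reading of the
SPV_INTEL_subgroups definition, that the first value is read from lane
`src + size` when `src = subgroup_local_id - delta` is negative (the
extension states this in terms of the maximum sub-group size; here the
vector length plays that role).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;

// Model of the two-value shuffleUp. Element i of each vector is the value
// contributed by lane i; for each lane, src = id - delta lies in [-size, size[.
Lanes spirv_shuffle_up(const Lanes &first, const Lanes &second,
                       std::ptrdiff_t delta) {
  const auto size = static_cast<std::ptrdiff_t>(second.size());
  assert(first.size() == second.size() && delta >= 0 && delta <= size);
  Lanes out(second.size());
  for (std::ptrdiff_t id = 0; id < size; ++id) {
    const std::ptrdiff_t src = id - delta;
    // src in [-size, 0[: take the first value, from lane src + size.
    // src in [0, size[:  take the second value, from lane src.
    out[static_cast<std::size_t>(id)] =
        src < 0 ? first[static_cast<std::size_t>(src + size)]
                : second[static_cast<std::size_t>(src)];
  }
  return out;
}
```

With first value `{1, 2, 3, 4}`, second value `{10, 20, 30, 40}` and
`delta = 1`, lane 0 reads the last lane of the first value and the other
lanes read the second value, giving `{4, 10, 20, 30}`.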
So when the SYCL built-ins are implemented on top of the SPIR-V
built-ins, only half of the SPIR-V range can be used in a well-defined
way, which means only one of the two value parameters of the SPIR-V
built-ins actually matters. The SYCL built-ins therefore pass the same
value to both value parameters of the SPIR-V built-ins.
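The layering can be sketched for `shuffleDown`, under the assumption that
the SPIR-V version mirrors `shuffleUp`: `src = subgroup_local_id + delta`
lies in `[0, 2 * subgroup_local_size[`, the first value covers
`[0, subgroup_local_size[` and the second value covers the upper half.
Passing the same value twice then makes the well-defined half match the
SYCL semantics. All names here are hypothetical and the model runs on the
host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;

// Assumed model of the two-value shuffleDown: src = id + delta lies in
// [0, 2 * size[. [0, size[ reads the first value at lane src;
// [size, 2 * size[ reads the second value at lane src - size.
Lanes spirv_shuffle_down(const Lanes &first, const Lanes &second,
                         std::size_t delta) {
  const std::size_t size = first.size();
  assert(second.size() == size && delta <= size);
  Lanes out(size);
  for (std::size_t id = 0; id < size; ++id) {
    const std::size_t src = id + delta;
    out[id] = src < size ? first[src] : second[src - size];
  }
  return out;
}

// SYCL-level shuffleDown layered on top: the same value goes to both
// operands. Only lanes with id + delta < size have a defined SYCL result;
// the remaining lanes merely wrap around and must not be relied upon.
Lanes sycl_shuffle_down(const Lanes &v, std::size_t delta) {
  return spirv_shuffle_down(v, v, delta);
}
```

For `{10, 20, 30, 40}` and `delta = 2`, lanes 0 and 1 get the defined
SYCL results `30` and `40`; lanes 2 and 3 get `10` and `20` only because
of the wrap-around, which SYCL code must not depend on.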
The complete definition of the SPIR-V built-ins can be found here:
* https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_subgroups.asciidoc#instructions
Using preprocessor defines to determine the wavefront size in libclc is
incorrect, because libclc is not built for a specific AMDGCN version, so
the wavefront size always defaults to `64`.
Instead, use the `__oclc_wavefront64` global variable provided by ROCm,
whose value is set according to the target architecture.
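The difference between the two approaches can be sketched in C++. This is
only a host-side model: `HYPOTHETICAL_WAVE32_TARGET`,
`oclc_wavefront64_stub` and `wavefront_size` are made-up names, and the
real `__oclc_wavefront64` is provided by the ROCm device libraries rather
than declared as an ordinary variable as it is here.

```cpp
#include <cassert>

// Old approach (modeled): a compile-time constant picked by a macro. Because
// libclc is built once for all AMDGCN targets, no target-specific macro is
// defined and the fallback of 64 is always taken.
#ifdef HYPOTHETICAL_WAVE32_TARGET
constexpr unsigned kDefineWavefrontSize = 32;
#else
constexpr unsigned kDefineWavefrontSize = 64;
#endif

// New approach (modeled): a stand-in for __oclc_wavefront64. In ROCm its
// value depends on the target architecture; here it is a plain variable so
// the sketch runs on a host.
bool oclc_wavefront64_stub = true;

// Hypothetical helper: derive the wavefront size from the flag at the point
// of use instead of from a value frozen when libclc was compiled.
unsigned wavefront_size() { return oclc_wavefront64_stub ? 64u : 32u; }
```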