
[NFC][LLVM][LangRef] Improve documentation for partial.reduce.add. #126728


Merged (1 commit) on Feb 12, 2025
llvm/docs/LangRef.rst: 27 changes (20 additions, 7 deletions)

@@ -20238,18 +20238,31 @@ Overview:
"""""""""

The '``llvm.experimental.vector.partial.reduce.add.*``' intrinsics reduce the
-concatenation of the two vector operands down to the number of elements dictated
-by the result type. The result type is a vector type that matches the type of the
-first operand vector.
+concatenation of the two vector arguments down to the number of elements of the
+result vector type.

Arguments:
""""""""""

-Both arguments must be vectors of matching element types. The first argument type must
-match the result type, while the second argument type must have a vector length that is a
-positive integer multiple of the first vector/result type. The arguments must be either be
-both fixed or both scalable vectors.
+The first argument is an integer vector with the same type as the result.
+
+The second argument is a vector with the same element type as the result, whose
+length is a known positive integer multiple of the result's length.

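As a reading aid for the argument rules (not part of the patch), the Python sketch below encodes them as a simple type check. `VecTy` and `check_partial_reduce_add` are hypothetical names. The model describes a vector type by three things: its element type name, its (minimum) element count, and whether it is scalable. It also assumes element counts are positive. The "both fixed or both scalable" check comes from the removed wording, since a fixed and a scalable length have no known integer ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VecTy:
    """Hypothetical stand-in for an LLVM vector type, e.g. <vscale x 4 x i32>."""
    elt: str        # element type name, e.g. "i32" (simplified model)
    min_elts: int   # element count; for scalable vectors, the multiple of vscale
    scalable: bool  # True for <vscale x N x T>, False for <N x T>


def check_partial_reduce_add(result: VecTy, acc: VecTy, vec: VecTy) -> int:
    """Check the argument rules; return the known factor len(vec) / len(result)."""
    if not result.elt.startswith("i"):
        # Simplification: treat names like "i8" or "i32" as the integer types.
        raise TypeError("result must be an integer vector")
    if acc != result:
        raise TypeError("first argument must have the same type as the result")
    if vec.elt != result.elt:
        raise TypeError("second argument must keep the result's element type")
    if vec.scalable != result.scalable:
        # Mixing fixed and scalable lengths gives no known integer multiple.
        raise TypeError("both types must be fixed or both scalable")
    factor, rem = divmod(vec.min_elts, result.min_elts)
    if rem != 0 or factor < 1:
        raise TypeError("second argument length must be a positive multiple")
    return factor


nxv4i32 = VecTy("i32", 4, True)
print(check_partial_reduce_add(nxv4i32, nxv4i32, VecTy("i32", 16, True)))  # 4
```

In this model, an `i8` input must be widened to the result's element type before the call. A `<vscale x 16 x i8>` second argument is rejected against a `<vscale x 4 x i32>` result.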
+Semantics:
+""""""""""
+
+Other than the reduction operator (e.g. add), the way in which the concatenated
+arguments are reduced is entirely unspecified. By their nature, these intrinsics
+are not expected to be useful in isolation but instead implement the first phase
+of an overall reduction operation.
+
+The typical use case is loop vectorization, where reductions are split into an
+in-loop phase, where maintaining an unordered vector result is important for
+performance, and an out-of-loop phase that calculates the final scalar result.
+
+By not introducing any new ordering constraints, these intrinsics maximize the
+ability to use a target's accumulation instructions.

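The sketch below illustrates why leaving the reduction unspecified is harmless for the intended two-phase use. It models two different lowerings a target might choose for the in-loop phase:

- `lower_interleaved` folds the input in result-sized strides.
- `lower_contiguous` gives each lane a contiguous block.

Both function names are hypothetical, as are the vectorization and interleave factors in `vectorized_sum`. Python lists of ints stand in for `i32` vectors, with wraparound modelled by a 32-bit mask. The two lowerings produce different intermediate vectors but the same final scalar, because integer addition modulo 2^32 is associative and commutative.

```python
MASK32 = (1 << 32) - 1  # models i32 wraparound (an assumption for illustration)


def lower_interleaved(acc, vec):
    # Hypothetical lowering: lane i accumulates vec[i], vec[i + n], vec[i + 2n], ...
    n = len(acc)
    out = list(acc)
    for i, x in enumerate(vec):
        out[i % n] = (out[i % n] + x) & MASK32
    return out


def lower_contiguous(acc, vec):
    # Hypothetical lowering: lane i accumulates the block vec[i*k : (i+1)*k].
    n = len(acc)
    k = len(vec) // n
    return [(a + sum(vec[i * k:(i + 1) * k])) & MASK32 for i, a in enumerate(acc)]


def vectorized_sum(data, vf=4, factor=4, lower=lower_interleaved):
    """Two-phase sum: in-loop partial reductions, then one out-of-loop reduction."""
    step = vf * factor
    if len(data) % step != 0:
        raise ValueError("sketch assumes len(data) is a multiple of vf * factor")
    acc = [0] * vf  # in-loop vector accumulator, like <4 x i32> zeroinitializer
    for start in range(0, len(data), step):
        acc = lower(acc, data[start:start + step])  # in-loop phase
    return sum(acc) & MASK32  # out-of-loop phase produces the scalar


data = list(range(1, 33))
print(lower_interleaved([0] * 4, data[:16]))  # [28, 32, 36, 40]
print(lower_contiguous([0] * 4, data[:16]))   # [10, 26, 42, 58]
print(vectorized_sum(data, lower=lower_interleaved),
      vectorized_sum(data, lower=lower_contiguous))  # 528 528
```

Only the final scalar is meaningful here. Code that inspected individual lanes of a partial result would observe lowering-dependent values.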
'``llvm.experimental.vector.histogram.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^