Commit 01afa8f

[NFC][LLVM][LangRef] Improve documentation for partial.reduce.add. (#126728)
1 parent 79010e2 commit 01afa8f

1 file changed (+20, -7 lines)

llvm/docs/LangRef.rst

Lines changed: 20 additions & 7 deletions
@@ -20238,18 +20238,31 @@ Overview:
 """""""""
 
 The '``llvm.vector.experimental.partial.reduce.add.*``' intrinsics reduce the
-concatenation of the two vector operands down to the number of elements dictated
-by the result type. The result type is a vector type that matches the type of the
-first operand vector.
+concatenation of the two vector arguments down to the number of elements of the
+result vector type.
 
 Arguments:
 """"""""""
 
-Both arguments must be vectors of matching element types. The first argument type must
-match the result type, while the second argument type must have a vector length that is a
-positive integer multiple of the first vector/result type. The arguments must be either be
-both fixed or both scalable vectors.
+The first argument is an integer vector with the same type as the result.
 
+The second argument is a vector with a length that is a known integer multiple
+of the result's type, while maintaining the same element type.
+
+Semantics:
+""""""""""
+
+Other than the reduction operator (e.g. add) the way in which the concatinated
+arguments is reduced is entirely unspecified. By their nature these intrinsics
+are not expected to be useful in isolation but instead implement the first phase
+of an overall reduction operation.
+
+The typical use case is loop vectorization where reductions are split into an
+in-loop phase, where maintaining an unordered vector result is important for
+performance, and an out-of-loop phase to calculate the final scalar result.
+
+By not introducing any new ordering constraints these intrinsics maximize the
+abilitity to utilise a target's accumulation instructions.
 
 '``llvm.experimental.vector.histogram.*``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
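The new Semantics text fixes only the reduction operator and leaves the grouping of elements unspecified. The Python sketch below is an illustrative model, not LLVM's implementation. It makes two assumptions:

- The lanes are i32. This is modelled by masking each addition to 32 bits, because LLVM integer `add` wraps.
- The helper names are hypothetical: `partial_reduce_add_interleaved`, `partial_reduce_add_blocked` and `two_phase_sum`.

The sketch shows two different groupings that both fit the documented contract. It also shows that a two-phase loop reduction recovers the same scalar either way.

```python
from typing import Callable, List

# Assumption: i32 lanes; LLVM integer add wraps modulo 2**32, modelled by masking.
MASK32 = (1 << 32) - 1

Vector = List[int]
PartialReduce = Callable[[Vector, Vector], Vector]


def _check_shapes(acc: Vector, vec: Vector) -> int:
    """Model the argument rule: len(vec) is a positive integer multiple of len(acc)."""
    n = len(acc)
    if n == 0 or len(vec) == 0 or len(vec) % n != 0:
        raise ValueError("len(vec) must be a positive integer multiple of len(acc)")
    return n


def partial_reduce_add_interleaved(acc: Vector, vec: Vector) -> Vector:
    """Hypothetical lowering: lane i accumulates every vec[j] with j % len(acc) == i."""
    n = _check_shapes(acc, vec)
    out = list(acc)
    for j, x in enumerate(vec):
        out[j % n] = (out[j % n] + x) & MASK32
    return out


def partial_reduce_add_blocked(acc: Vector, vec: Vector) -> Vector:
    """Hypothetical lowering: lane i accumulates the i-th contiguous block of vec."""
    n = _check_shapes(acc, vec)
    k = len(vec) // n
    return [(a + sum(vec[i * k:(i + 1) * k])) & MASK32 for i, a in enumerate(acc)]


def two_phase_sum(data: Vector, lanes: int, vf: int, partial: PartialReduce) -> int:
    """Hypothetical vectorized loop: an in-loop partial reduction into a
    `lanes`-wide accumulator, then one out-of-loop horizontal add."""
    if lanes <= 0 or vf % lanes != 0 or len(data) % vf != 0:
        raise ValueError("assumes vf % lanes == 0 and len(data) % vf == 0")
    acc = [0] * lanes
    for start in range(0, len(data), vf):  # in-loop phase: lane order unspecified
        acc = partial(acc, data[start:start + vf])
    total = 0
    for lane in acc:  # out-of-loop phase: reduce the vector to a scalar
        total = (total + lane) & MASK32
    return total


if __name__ == "__main__":
    print(partial_reduce_add_interleaved([0, 0, 0, 0], list(range(8))))  # [4, 6, 8, 10]
    print(partial_reduce_add_blocked([0, 0, 0, 0], list(range(8))))      # [1, 5, 9, 13]
    data = list(range(1, 33))
    print(two_phase_sum(data, 4, 16, partial_reduce_add_interleaved))    # 528
    print(two_phase_sum(data, 4, 16, partial_reduce_add_blocked))        # 528
```

The two lowerings produce different intermediate vectors. Both preserve the sum of all input elements, because wrapping integer addition is associative and commutative. That property is what the out-of-loop phase relies on, and it is why the intrinsic can leave lane placement to the target.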
