Skip to content

Commit f4e5cea

Browse files
committed
[NVPTX][Docs] [NFC] Update docs on intrinsics
Recently, we have added a set of complex intrinsics on TMA, tcgen05 and Cvt family of instructions. This patch captures the key learnings from our experience so far and documents them as guidelines for future design. Signed-off-by: Durgadoss R <[email protected]>
1 parent 9269aae commit f4e5cea

File tree

1 file changed

+62
-0
lines changed

1 file changed

+62
-0
lines changed

llvm/docs/NVPTXUsage.rst

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -153,6 +153,68 @@ Example: 64-bit PTX for CUDA Driver API: ``nvptx64-nvidia-cuda``
153153
NVPTX Intrinsics
154154
================
155155

156+
Guidelines on NVPTX Intrinsic design
157+
------------------------------------
158+
159+
The NVPTX intrinsics are used to model instructions in the PTX ISA.
160+
While simpler intrinsics can represent certain features effectively,
161+
more complex instructions like TMA and MMA are not as straightforward
162+
to model. A single variant of these complex instructions can expand
163+
into hundreds of intrinsics. Additionally, any expansion in the
164+
corresponding ISA can exponentially increase these numbers, making it
165+
difficult to manage them in the IR and backend passes. Therefore,
166+
a careful design of intrinsic interfaces can ease maintenance and
167+
contribute to a sustainable, long-term solution.
168+
169+
Below are a set of guidelines that may help in choosing
170+
an appropriate design for the complex intrinsics:
171+
172+
1. If there are only a few intrinsics, prefer a flat design
173+
where the intrinsic name encodes all relevant details,
174+
and includes only the arguments used by the actual instruction.
175+
2. As the number of intrinsics grows, it is desirable to consolidate
176+
them. NVPTX uses a 'flags'-based design where each flag argument
177+
represents one set of instruction modifiers. These flags are
178+
compile-time integer constants.
179+
180+
3. When an intrinsic uses flags, document it with details of the
181+
flag usage in the ``NVPTXUsage.rst`` file.
182+
4. Use i1 for boolean flags and i8 for other flag types.
183+
5. Annotate all flag arguments with ImmArg<ArgIdx<>>.
184+
6. Place the flag arguments at the end of the (actual)argument list.
185+
186+
7. Identify the key features of an intrinsic and distinguish between
187+
first-order and supplementary information. Typically, encoding the
188+
first-order information in the intrinsic name while using flags
189+
for supplementary details improves readability.
190+
For example:
191+
192+
i. For MMA intrinsics, 'dense' vs. 'sparse' is a fundamental feature,
193+
whereas an optional scaling applied to matrices is relatively secondary.
194+
195+
ii. For TMAs, the mode of copy (e.g., 'Tile' or 'Im2col') is a first-order
196+
information, while features like an optional cache hint tend to be
197+
secondary.
198+
199+
8. If there are invalid combinations within a set of modifiers, avoid
200+
encoding them as flags, as much as possible. This helps reduce the
201+
need for error handling of unsupported cases in the backend.
202+
For example, some 'cvt' intrinsics support only a subset of the
203+
possible rounding modes; so it is preferable not to encode the
204+
rounding modes as flags.
205+
9. Similarly, when there are invalid combinations across a set of
206+
modifiers, avoid encoding them as flags to prevent additional
207+
complexity in error handling.
208+
209+
10. Maintain a consistent design within an intrinsic family, including
210+
argument ordering as well as the usage and ordering of flags.
211+
11. When designing an intrinsic corresponding to an instruction or its variant,
212+
consider the entire instruction family. This may reveal common features
213+
that can be modelled consistently across the family.
214+
215+
In summary, strive to balance the aspects mentioned above, to achieve
216+
a scalable design with maximum readability.
217+
156218
Reading PTX Special Registers
157219
-----------------------------
158220

0 commit comments

Comments
 (0)