=========================
 RISC-V Vector Extension
=========================

.. contents::
   :local:

The RISC-V target supports the 1.0 version of the `RISC-V Vector Extension (RVV) <https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc>`_.
This guide gives an overview of how it's modelled in LLVM IR and how the backend generates code for it.

Mapping to LLVM IR types
========================

RVV adds 32 VLEN-sized registers, where VLEN is a constant unknown to the compiler. To be able to represent VLEN-sized values, the RISC-V backend takes the same approach as AArch64's SVE and uses `scalable vector types <https://llvm.org/docs/LangRef.html#t-vector>`_.

Scalable vector types are of the form ``<vscale x n x ty>``, which indicates a vector with a multiple of ``n`` elements of type ``ty``.
On RISC-V, ``n`` and ``ty`` control LMUL and SEW respectively.

LLVM only supports ELEN=32 or ELEN=64, so ``vscale`` is defined as VLEN/64 (see ``RISCV::RVVBitsPerBlock``).
Note this means that VLEN must be at least 64, so VLEN=32 isn't currently supported.
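
For example, on a machine with VLEN=128, ``vscale`` would be 128/64 = 2, so ``<vscale x 4 x i32>`` would hold 2 × 4 = 8 i32 elements. A minimal sketch of querying this in IR (the function name is illustrative):

.. code-block:: llvm

   declare i64 @llvm.vscale.i64()

   ; Returns the runtime number of i32 elements in a <vscale x 4 x i32>,
   ; i.e. vscale * 4. On a VLEN=128 machine this is 2 * 4 = 8.
   define i64 @num_elements() {
     %vscale = call i64 @llvm.vscale.i64()
     %n = mul i64 %vscale, 4
     ret i64 %n
   }
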
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
|                   | LMUL=⅛        | LMUL=¼         | LMUL=½           | LMUL=1            | LMUL=2            | LMUL=4            | LMUL=8            |
+===================+===============+================+==================+===================+===================+===================+===================+
| i64 (ELEN=64)     | N/A           | N/A            | N/A              | <v x 1 x i64>     | <v x 2 x i64>     | <v x 4 x i64>     | <v x 8 x i64>     |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| i32               | N/A           | N/A            | <v x 1 x i32>    | <v x 2 x i32>     | <v x 4 x i32>     | <v x 8 x i32>     | <v x 16 x i32>    |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| i16               | N/A           | <v x 1 x i16>  | <v x 2 x i16>    | <v x 4 x i16>     | <v x 8 x i16>     | <v x 16 x i16>    | <v x 32 x i16>    |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| i8                | <v x 1 x i8>  | <v x 2 x i8>   | <v x 4 x i8>     | <v x 8 x i8>      | <v x 16 x i8>     | <v x 32 x i8>     | <v x 64 x i8>     |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| double (ELEN=64)  | N/A           | N/A            | N/A              | <v x 1 x double>  | <v x 2 x double>  | <v x 4 x double>  | <v x 8 x double>  |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| float             | N/A           | N/A            | <v x 1 x float>  | <v x 2 x float>   | <v x 4 x float>   | <v x 8 x float>   | <v x 16 x float>  |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| half              | N/A           | <v x 1 x half> | <v x 2 x half>   | <v x 4 x half>    | <v x 8 x half>    | <v x 16 x half>   | <v x 32 x half>   |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+

(Read ``<v x k x ty>`` as ``<vscale x k x ty>``)

Mask vector types
-----------------

Mask vectors are physically represented using a layout of densely packed bits in a vector register.
They are mapped to the following LLVM IR types:

- ``<vscale x 1 x i1>``
- ``<vscale x 2 x i1>``
- ``<vscale x 4 x i1>``
- ``<vscale x 8 x i1>``
- ``<vscale x 16 x i1>``
- ``<vscale x 32 x i1>``
- ``<vscale x 64 x i1>``

Two types with the same SEW/LMUL ratio will have the same related mask type.
For instance, two different comparisons, one under SEW=64, LMUL=2 and the other under SEW=32, LMUL=1, will both generate a mask of type ``<vscale x 2 x i1>``.

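A minimal sketch in LLVM IR (the value names are illustrative): the SEW/LMUL ratio is 32 in both cases, so both comparisons produce the same mask type.

.. code-block:: llvm

   ; SEW=64, LMUL=2
   %m1 = icmp eq <vscale x 2 x i64> %x, %y
   ; SEW=32, LMUL=1
   %m2 = icmp eq <vscale x 2 x i32> %a, %b
   ; %m1 and %m2 are both <vscale x 2 x i1>
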
Representation in LLVM IR
=========================

Vector instructions can be represented in three main ways in LLVM IR:

1. Regular instructions on both scalable and fixed-length vector types

   .. code-block:: llvm

      %c = add <vscale x 4 x i32> %a, %b
      %f = add <4 x i32> %d, %e

2. RISC-V vector intrinsics, which mirror the `C intrinsics specification <https://github.com/riscv-non-isa/rvv-intrinsic-doc>`_

   These come in unmasked variants:

   .. code-block:: llvm

      %c = call @llvm.riscv.vadd.nxv4i32.nxv4i32(
        <vscale x 4 x i32> %passthru,
        <vscale x 4 x i32> %a,
        <vscale x 4 x i32> %b,
        i64 %avl
      )

   As well as masked variants:

   .. code-block:: llvm

      %c = call @llvm.riscv.vadd.mask.nxv4i32.nxv4i32(
        <vscale x 4 x i32> %passthru,
        <vscale x 4 x i32> %a,
        <vscale x 4 x i32> %b,
        <vscale x 4 x i1> %mask,
        i64 %avl,
        i64 0 ; policy (must be an immediate)
      )

   Both allow setting the AVL, as well as controlling the inactive/tail elements via the passthru operand, but the masked variant also provides operands for the mask and the ``vta``/``vma`` policy bits.

   The only valid types are scalable vector types.

3. :ref:`Vector predication (VP) intrinsics <int_vp>`

   .. code-block:: llvm

      %c = call @llvm.vp.add.nxv4i32(
        <vscale x 4 x i32> %a,
        <vscale x 4 x i32> %b,
        <vscale x 4 x i1> %m,
        i32 %evl
      )

   Unlike RISC-V intrinsics, VP intrinsics are target agnostic, so they can be emitted from other optimisation passes in the middle-end (like the loop vectorizer). They also support fixed-length vector types.
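
   For example (an illustrative sketch), the same addition on fixed-length types:

   .. code-block:: llvm

      %c = call <4 x i32> @llvm.vp.add.v4i32(
        <4 x i32> %a,
        <4 x i32> %b,
        <4 x i1> %m,
        i32 %evl
      )
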
   VP intrinsics also don't have passthru operands, but tail/mask-undisturbed behaviour can be emulated by using the output in a ``@llvm.vp.merge``.
   It will get lowered as a ``vmerge``, but will be merged back into the underlying instruction's mask via ``RISCVDAGToDAGISel::performCombineVMergeAndVOps``.
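
   A minimal sketch of emulating a tail/mask-undisturbed add (the ``%passthru`` name is illustrative); lanes that are masked off or past ``%evl`` take their value from ``%passthru``:

   .. code-block:: llvm

      %x = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(
        <vscale x 4 x i32> %a,
        <vscale x 4 x i32> %b,
        <vscale x 4 x i1> %m,
        i32 %evl
      )
      %c = call <vscale x 4 x i32> @llvm.vp.merge.nxv4i32(
        <vscale x 4 x i1> %m,
        <vscale x 4 x i32> %x,
        <vscale x 4 x i32> %passthru,
        i32 %evl
      )
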

The different properties of the above representations are summarized below:

+----------------------+--------------+-----------------+----------+------------------+----------------------+-----------------+
|                      | AVL          | Masking         | Passthru | Scalable vectors | Fixed-length vectors | Target agnostic |
+======================+==============+=================+==========+==================+======================+=================+
| LLVM IR instructions | Always VLMAX | No              | None     | Yes              | Yes                  | Yes             |
+----------------------+--------------+-----------------+----------+------------------+----------------------+-----------------+
| RVV intrinsics       | Yes          | Yes             | Yes      | Yes              | No                   | No              |
+----------------------+--------------+-----------------+----------+------------------+----------------------+-----------------+
| VP intrinsics        | Yes (EVL)    | Yes             | No       | Yes              | Yes                  | Yes             |
+----------------------+--------------+-----------------+----------+------------------+----------------------+-----------------+

SelectionDAG lowering
=====================

For most regular **scalable** vector LLVM IR instructions, their corresponding SelectionDAG nodes are legal on RISC-V and don't require any custom lowering.

.. code-block::

   t5: nxv4i32 = add t2, t4

RISC-V vector intrinsics also don't require any custom lowering.

.. code-block::

   t12: nxv4i32 = llvm.riscv.vadd TargetConstant:i64<10056>, undef:nxv4i32, t2, t4, t6

Fixed-length vectors
--------------------

Because there are no fixed-length vector patterns, operations on fixed-length vectors need to be custom lowered and performed in a scalable "container" type:

1. The fixed-length vector operands are inserted into scalable containers with ``insert_subvector`` nodes. The container type is chosen such that its minimum size will fit the fixed-length vector (see ``getContainerForFixedLengthVector``).
2. The operation is then performed on the container type via a **VL (vector length) node**. These are custom nodes defined in ``RISCVInstrInfoVVLPatterns.td`` that mirror target agnostic SelectionDAG nodes, as well as some RVV instructions. They contain an AVL operand, which is set to the number of elements in the fixed-length vector.
   Some nodes also have a passthru or mask operand, which will usually be set to ``undef`` and all ones when lowering fixed-length vectors.
3. The result is put back into a fixed-length vector via ``extract_subvector``.

.. code-block::

   t2: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %0
   t6: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %1
   t4: v4i32 = extract_subvector t2, Constant:i64<0>
   t7: v4i32 = extract_subvector t6, Constant:i64<0>
   t8: v4i32 = add t4, t7

   // is custom lowered to:

   t2: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %0
   t6: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %1
   t15: nxv2i1 = RISCVISD::VMSET_VL Constant:i64<4>
   t16: nxv2i32 = RISCVISD::ADD_VL t2, t6, undef:nxv2i32, t15, Constant:i64<4>
   t17: v4i32 = extract_subvector t16, Constant:i64<0>

The ``insert_subvector`` and ``extract_subvector`` nodes responsible for wrapping and unwrapping will get combined away, and eventually we will lower all fixed-length vector types to scalable. Note that fixed-length vectors at the interface of a function are passed in a scalable vector container.

.. note::

   The only ``insert_subvector`` and ``extract_subvector`` nodes that make it through lowering are those that can be performed as an exact subregister insert or extract. This means that any fixed-length vector ``insert_subvector`` and ``extract_subvector`` nodes that aren't legalized must lie on a register group boundary, so the exact VLEN must be known at compile time (i.e., compiled with ``-mrvv-vector-bits=zvl`` or ``-mllvm -riscv-v-vector-bits-max=VLEN``, or with an exact ``vscale_range`` attribute on the function).

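   For example (a minimal sketch; the function name is illustrative), an exact ``vscale_range`` attribute pins ``vscale`` to 2, i.e. VLEN = 2 × 64 = 128 bits:

   .. code-block:: llvm

      ; vscale is exactly 2, so the backend knows VLEN=128 at compile time.
      define void @f() vscale_range(2,2) {
        ret void
      }
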
Vector predication intrinsics
-----------------------------

VP intrinsics also get custom lowered via VL nodes.

.. code-block::

   t12: nxv2i32 = vp_add t2, t4, t6, Constant:i64<8>

   // is custom lowered to:

   t18: nxv2i32 = RISCVISD::ADD_VL t2, t4, undef:nxv2i32, t6, Constant:i64<8>

The VP EVL and mask are used for the VL node's AVL and mask respectively, whilst the passthru is set to ``undef``.

Instruction selection
=====================

``vl`` and ``vtype`` need to be configured correctly, so we can't just directly select the underlying vector ``MachineInstr``. Instead pseudo instructions are selected, which carry the extra information needed to emit the necessary ``vsetvli``\s later.

.. code-block::

   %c:vrm2 = PseudoVADD_VV_M2 %passthru:vrm2(tied-def 0), %a:vrm2, %b:vrm2, %vl:gpr, 5 /*sew*/, 3 /*policy*/

Each vector instruction has multiple pseudo instructions defined in ``RISCVInstrInfoVPseudos.td``.
There is a variant of each pseudo for each possible LMUL, as well as a masked variant. So a typical instruction like ``vadd.vv`` would have the following pseudos:

.. code-block::

   %rd:vr = PseudoVADD_VV_MF8 %passthru:vr(tied-def 0), %rs2:vr, %rs1:vr, %avl:gpr, sew:imm, policy:imm
   %rd:vr = PseudoVADD_VV_MF4 %passthru:vr(tied-def 0), %rs2:vr, %rs1:vr, %avl:gpr, sew:imm, policy:imm
   %rd:vr = PseudoVADD_VV_MF2 %passthru:vr(tied-def 0), %rs2:vr, %rs1:vr, %avl:gpr, sew:imm, policy:imm
   %rd:vr = PseudoVADD_VV_M1 %passthru:vr(tied-def 0), %rs2:vr, %rs1:vr, %avl:gpr, sew:imm, policy:imm
   %rd:vrm2 = PseudoVADD_VV_M2 %passthru:vrm2(tied-def 0), %rs2:vrm2, %rs1:vrm2, %avl:gpr, sew:imm, policy:imm
   %rd:vrm4 = PseudoVADD_VV_M4 %passthru:vrm4(tied-def 0), %rs2:vrm4, %rs1:vrm4, %avl:gpr, sew:imm, policy:imm
   %rd:vrm8 = PseudoVADD_VV_M8 %passthru:vrm8(tied-def 0), %rs2:vrm8, %rs1:vrm8, %avl:gpr, sew:imm, policy:imm
   %rd:vr = PseudoVADD_VV_MF8_MASK %passthru:vr(tied-def 0), %rs2:vr, %rs1:vr, mask:$v0, %avl:gpr, sew:imm, policy:imm
   %rd:vr = PseudoVADD_VV_MF4_MASK %passthru:vr(tied-def 0), %rs2:vr, %rs1:vr, mask:$v0, %avl:gpr, sew:imm, policy:imm
   %rd:vr = PseudoVADD_VV_MF2_MASK %passthru:vr(tied-def 0), %rs2:vr, %rs1:vr, mask:$v0, %avl:gpr, sew:imm, policy:imm
   %rd:vr = PseudoVADD_VV_M1_MASK %passthru:vr(tied-def 0), %rs2:vr, %rs1:vr, mask:$v0, %avl:gpr, sew:imm, policy:imm
   %rd:vrm2 = PseudoVADD_VV_M2_MASK %passthru:vrm2(tied-def 0), %rs2:vrm2, %rs1:vrm2, mask:$v0, %avl:gpr, sew:imm, policy:imm
   %rd:vrm4 = PseudoVADD_VV_M4_MASK %passthru:vrm4(tied-def 0), %rs2:vrm4, %rs1:vrm4, mask:$v0, %avl:gpr, sew:imm, policy:imm
   %rd:vrm8 = PseudoVADD_VV_M8_MASK %passthru:vrm8(tied-def 0), %rs2:vrm8, %rs1:vrm8, mask:$v0, %avl:gpr, sew:imm, policy:imm

.. note::

   Whilst the SEW can be encoded in an operand, we need to use separate pseudos for each LMUL since different register groups will require different register classes: see :ref:`rvv_register_allocation`.

Pseudos have operands for the AVL and the SEW (encoded as its base-2 logarithm), as well as potentially the mask, policy or rounding mode, if applicable.
The passthru operand is tied to the destination register, which will determine the inactive/tail elements.

For scalable vectors that should use VLMAX, the AVL is set to a sentinel value of ``-1``.

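A sketch decoding the operands of a VLMAX ``vadd.vv`` pseudo (the register names are illustrative):

.. code-block::

   %c:vrm2 = PseudoVADD_VV_M2 %passthru:vrm2(tied-def 0), %a:vrm2, %b:vrm2, -1, 5, 3
   // -1 = AVL sentinel: use VLMAX
   // 5  = log2(SEW), i.e. SEW = 2^5 = 32
   // 3  = policy: TAIL_AGNOSTIC (1) | MASK_AGNOSTIC (2)
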
There are patterns for target agnostic SelectionDAG nodes in ``RISCVInstrInfoVSDPatterns.td``, for VL nodes in ``RISCVInstrInfoVVLPatterns.td`` and for RVV intrinsics in ``RISCVInstrInfoVPseudos.td``.

Mask patterns
-------------

For masked pseudos the mask operand is copied to the physical ``$v0`` register during instruction selection with a glued ``CopyToReg`` node:

.. code-block::

   t23: ch,glue = CopyToReg t0, Register:nxv4i1 $v0, t6
   t25: nxv4i32 = PseudoVADD_VV_M2_MASK Register:nxv4i32 $noreg, t2, t4, Register:nxv4i1 $v0, TargetConstant:i64<8>, TargetConstant:i64<5>, TargetConstant:i64<1>, t23:1

The patterns in ``RISCVInstrInfoVVLPatterns.td`` only match masked pseudos, to reduce the size of the match table, even if the node's mask is all ones and could be matched to an unmasked pseudo.
``RISCVFoldMasks::convertToUnmasked`` will detect if the mask is all ones and convert it into its unmasked form.

.. code-block::

   $v0 = PseudoVMSET_M_B16 -1, 32
   %rd:vrm2 = PseudoVADD_VV_M2_MASK %passthru:vrm2(tied-def 0), %rs2:vrm2, %rs1:vrm2, $v0, %avl:gpr, sew:imm, policy:imm

   // gets optimized to:

   %rd:vrm2 = PseudoVADD_VV_M2 %passthru:vrm2(tied-def 0), %rs2:vrm2, %rs1:vrm2, %avl:gpr, sew:imm, policy:imm

.. note::

   Any ``vmset.m`` can be treated as an all-ones mask, since the tail elements past AVL are ``undef`` and can be replaced with ones.

.. _rvv_register_allocation:

Register allocation
===================

Register allocation is split between vector and scalar registers, with vector allocation running first:

.. code-block::

   $v8m2 = PseudoVADD_VV_M2 $v8m2(tied-def 0), $v8m2, $v10m2, %vl:gpr, 5, 3

.. note::

   Register allocation is split so that :ref:`RISCVInsertVSETVLI` can run after vector register allocation, but before scalar register allocation. It needs to be run before scalar register allocation as it may need to create a new virtual register to set the AVL to VLMAX.

   Performing ``RISCVInsertVSETVLI`` after vector register allocation imposes fewer constraints on the machine scheduler, since it cannot schedule instructions past ``vsetvli``\s, and it allows us to emit further vector pseudos during spilling or constant rematerialization.

There are four register classes for vectors:

- ``VR`` for vector registers (``v0``, ``v1``, ..., ``v31``). Used for :math:`\text{LMUL} \leq 1` and mask registers.
- ``VRM2`` for vector groups of length 2, i.e., :math:`\text{LMUL}=2` (``v0m2``, ``v2m2``, ..., ``v30m2``)
- ``VRM4`` for vector groups of length 4, i.e., :math:`\text{LMUL}=4` (``v0m4``, ``v4m4``, ..., ``v28m4``)
- ``VRM8`` for vector groups of length 8, i.e., :math:`\text{LMUL}=8` (``v0m8``, ``v8m8``, ..., ``v24m8``)

:math:`\text{LMUL} < 1` types and mask types do not benefit from having a dedicated class, so ``VR`` is used in their case.

Some instructions have a constraint that a register operand cannot be ``V0`` or overlap with ``V0``, so for these cases we also have ``VRNoV0`` variants.

.. _RISCVInsertVSETVLI:

RISCVInsertVSETVLI
==================

After vector registers are allocated, the ``RISCVInsertVSETVLI`` pass will insert the necessary ``vsetvli``\s for the pseudos.

.. code-block::

   dead $x0 = PseudoVSETVLI %vl:gpr, 209, implicit-def $vl, implicit-def $vtype
   $v8m2 = PseudoVADD_VV_M2 $v8m2(tied-def 0), $v8m2, $v10m2, $noreg, 5, implicit $vl, implicit $vtype

The physical ``$vl`` and ``$vtype`` registers are implicitly defined by the ``PseudoVSETVLI``, and are implicitly used by the ``PseudoVADD``.
The ``vtype`` operand (``209`` in this example) is encoded as per the specification via ``RISCVVType::encodeVTYPE``.
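
Decoding ``209`` per the ``vtype`` layout in the specification shows it matches the pseudo's configuration:

.. code-block::

   209 = 0b11010001
   vlmul [2:0] = 001 -> LMUL=2
   vsew  [5:3] = 010 -> SEW=32
   vta   [6]   = 1   -> tail agnostic
   vma   [7]   = 1   -> mask agnostic
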
``RISCVInsertVSETVLI`` performs dataflow analysis to emit as few ``vsetvli``\s as possible. It will also try to minimize the number of ``vsetvli``\s that set ``vl``, i.e., it will emit ``vsetvli x0, x0`` if only ``vtype`` needs to change but ``vl`` doesn't.
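
For example (an illustrative snippet), the ``x0, x0`` form can only be used when the SEW/LMUL ratio, and hence VLMAX, is unchanged:

.. code-block:: nasm

   vsetvli a0, a1, e32, m2, ta, ma   # set vl and vtype
   vadd.vv v8, v8, v10
   vsetvli x0, x0, e16, m1, ta, ma   # same SEW/LMUL ratio: change vtype, keep vl
   vadd.vv v12, v12, v14
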

Pseudo expansion and printing
=============================

After scalar register allocation, the ``RISCVExpandPseudoInsts.cpp`` pass expands the ``PseudoVSETVLI`` instructions.

.. code-block::

   dead $x0 = VSETVLI $x1, 209, implicit-def $vtype, implicit-def $vl
   renamable $v8m2 = PseudoVADD_VV_M2 $v8m2(tied-def 0), $v8m2, $v10m2, $noreg, 5, implicit $vl, implicit $vtype

Note that the vector pseudo remains, as it's needed to encode the register class for the LMUL. Its AVL and SEW operands are no longer used.

``RISCVAsmPrinter`` will then lower the pseudo instructions into real ``MCInst``\s.

.. code-block:: nasm

   vsetvli a0, zero, e32, m2, ta, ma
   vadd.vv v8, v8, v10

See also
========

- `[llvm-dev] [RFC] Code generation for RISC-V V-extension <https://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html>`_
- `2023 LLVM Dev Mtg - Vector codegen in the RISC-V backend <https://youtu.be/-ox8iJmbp0c?feature=shared>`_
- `2023 LLVM Dev Mtg - How to add an C intrinsic and code-gen it, using the RISC-V vector C intrinsics <https://youtu.be/t17O_bU1jks?feature=shared>`_
- `2021 LLVM Dev Mtg “Optimizing code for scalable vector architectures” <https://youtu.be/daWLCyhwrZ8?feature=shared>`_