@@ -1229,20 +1229,20 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
operation within a row (16 contiguous lanes) of the second input operand.
The third and fourth inputs must be scalar values. these are combined into
a single 64-bit value representing lane selects used to swizzle within each
- row. Currently implemented for i16, i32, float, half, bfloat, <2 x i16>,
+ row. Currently implemented for i16, i32, float, half, bfloat, <2 x i16>,
<2 x half>, <2 x bfloat>, i64, double, pointers, multiples of the 32-bit vectors.
llvm.amdgcn.permlanex16 Provides direct access to v_permlanex16_b32. Performs arbitrary gather-style
operation across two rows of the second input operand (each row is 16 contiguous
lanes). The third and fourth inputs must be scalar values. these are combined
into a single 64-bit value representing lane selects used to swizzle within each
- row. Currently implemented for i16, i32, float, half, bfloat, <2 x i16>, <2 x half>,
+ row. Currently implemented for i16, i32, float, half, bfloat, <2 x i16>, <2 x half>,
<2 x bfloat>, i64, double, pointers, multiples of the 32-bit vectors.
llvm.amdgcn.permlane64 Provides direct access to v_permlane64_b32. Performs a specific permutation across
lanes of the input operand where the high half and low half of a wave64 are swapped.
- Performs no operation in wave32 mode. Currently implemented for i16, i32, float, half,
- bfloat, <2 x i16>, <2 x half>, <2 x bfloat>, i64, double, pointers, multiples of the
+ Performs no operation in wave32 mode. Currently implemented for i16, i32, float, half,
+ bfloat, <2 x i16>, <2 x half>, <2 x bfloat>, i64, double, pointers, multiples of the
32-bit vectors.
llvm.amdgcn.udot2 Provides direct access to v_dot2_u32_u16 across targets which
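As a reading aid for the permlane16, permlanex16 and permlane64 entries in the hunk above, here is a minimal Python model of the lane movement they describe. It is a sketch under stated assumptions, not the hardware definition. The doc text only says the third and fourth inputs are combined into a 64-bit lane-select value. The layout used here (third input as the low 32 bits, one 4-bit select per lane position in a row) and the choice of the neighbouring row for permlanex16 are assumptions. The function names are hypothetical, and the intrinsics' remaining operands are left out.

```python
ROW = 16  # lanes per row, as stated in the intrinsic descriptions


def permlane16_model(src, sel_lo, sel_hi, cross_row=False):
    """Gather within each 16-lane row of `src` (a list of per-lane values).

    sel_lo / sel_hi stand in for the two scalar inputs. ASSUMPTION: sel_lo
    is the low half of the 64-bit select and each lane position owns 4 bits.
    With cross_row=True the source row is the neighbouring row of the pair
    (ASSUMPTION modelling permlanex16).
    """
    selects = (sel_hi << 32) | sel_lo
    out = []
    for lane in range(len(src)):
        row, pos = divmod(lane, ROW)
        pick = (selects >> (4 * pos)) & 0xF   # which lane of the source row
        src_row = row ^ 1 if cross_row else row
        out.append(src[src_row * ROW + pick])
    return out


def permlane64_model(src):
    """Swap the low and high halves of a wave64; leave a wave32 unchanged."""
    if len(src) == 32:
        return list(src)
    half = len(src) // 2
    return list(src[half:]) + list(src[:half])


wave32 = list(range(32))
wave64 = list(range(64))
identity = permlane16_model(wave32, 0x76543210, 0xFEDCBA98)
reversed_rows = permlane16_model(wave32, 0x89ABCDEF, 0x01234567)
crossed = permlane16_model(wave32, 0x76543210, 0xFEDCBA98, cross_row=True)
swapped = permlane64_model(wave64)
```

With the identity select pattern every lane reads its own position, so `identity` equals the input and `crossed` exchanges the two rows. The second pattern reverses the lanes inside each row.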