[AArch64] Implement FP8 Neon reinterpret intrinsics #121804

momchil-velikov · 2025-01-06T17:20:24Z

No description provided.

github-actions · 2025-01-06T17:24:29Z

✅ With the latest revision this PR passed the Python code formatter.

github-actions · 2025-01-06T17:24:29Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff a2995cb4bb21ba2fe6277bbcd24b8ab1b357e12d 31792b734f3dbfe38a0674da705dfe880bb5f061 --extensions h,c,cpp -- clang/test/CodeGen/AArch64/builtin-shufflevector-fp8.c clang/test/CodeGen/AArch64/fp8-cast.c clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_cvt.c clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_fdot.c clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_fmla.c clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_reinterpret.c clang/test/Sema/aarch64-fp8-cast.c clang/test/Sema/aarch64-fp8-intrinsics/acle_neon_fp8_cvt.c clang/test/Sema/aarch64-fp8-intrinsics/acle_neon_fp8_fdot.c clang/test/Sema/aarch64-fp8-intrinsics/acle_neon_fp8_fmla.c clang/test/Sema/builtin-shufflevector.c clang/include/clang/AST/Type.h clang/include/clang/Sema/Sema.h clang/lib/AST/ASTContext.cpp clang/lib/AST/ItaniumMangle.cpp clang/lib/AST/Type.cpp clang/lib/CodeGen/CGBuiltin.cpp clang/lib/CodeGen/CodeGenFunction.h clang/lib/CodeGen/CodeGenTypes.cpp clang/lib/CodeGen/Targets/AArch64.cpp clang/lib/Sema/SemaCast.cpp clang/lib/Sema/SemaChecking.cpp clang/lib/Sema/SemaExpr.cpp clang/utils/TableGen/NeonEmitter.cpp clang/utils/TableGen/SveEmitter.cpp

Diff from clang-format:

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 3f265da96b..69e441f7c5 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -14162,8 +14162,8 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     llvm::Type *Ty = llvm::FixedVectorType::get(Int8Ty, 16);
     Ops[0] = Builder.CreateInsertVector(Ty, PoisonValue::get(Ty), Ops[0],
                                         Builder.getInt64(0));
-    return EmitFP8NeonCvtCall(Intrinsic::aarch64_neon_fp8_fcvtn2,
-                              Ty, Ops[1]->getType(), false, Ops, E, "vfcvtn2");
+    return EmitFP8NeonCvtCall(Intrinsic::aarch64_neon_fp8_fcvtn2, Ty,
+                              Ops[1]->getType(), false, Ops, E, "vfcvtn2");
   }
 
   case NEON::BI__builtin_neon_vdot_f16_mf8_fpm:

* The FP8 scalar type (`__mfp8`) was described as a vector type.
* The FP8 vector types were described/assumed to have an integer element type (the element type ought to be `__mfp8`).
* Add support for the `m` type specifier (denoting `__mfp8`) in `DecodeTypeFromStr` and create SVE builtin prototypes using that specifier instead of `int8_t`.
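Because the FP8 vector types have `__mfp8` elements rather than integer elements, viewing their raw bits as integer lanes (or the reverse) takes a bit-preserving reinterpret, which is the kind of operation the Neon reinterpret intrinsics named in this PR's title perform. Below is a minimal, host-portable sketch of that idea, not the Clang implementation. It assumes GCC/Clang generic vector extensions, and the types `model_mf8x8` and `model_s16x4` and the `model_*` helpers are hypothetical stand-ins for the real `mfloat8x8_t`, `int16x4_t` and ACLE functions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins built on GCC/Clang generic vectors; these are NOT the
   real Neon types. Each lane of model_mf8x8 holds one raw FP8 bit pattern. */
typedef uint8_t model_mf8x8 __attribute__((vector_size(8)));
typedef int16_t model_s16x4 __attribute__((vector_size(8)));

/* Hypothetical analogue of an FP8 -> int16x4 reinterpret: the same 64 bits are
   viewed as four 16-bit lanes. A cast between same-size generic vectors reuses
   the bits and performs no numeric conversion. */
static model_s16x4 model_reinterpret_s16_mf8(model_mf8x8 a) {
  return (model_s16x4)a;
}

/* The inverse view: four 16-bit lanes back to eight FP8 byte lanes. */
static model_mf8x8 model_reinterpret_mf8_s16(model_s16x4 a) {
  return (model_mf8x8)a;
}

/* Returns 1 if reinterpreting to int16 lanes and back leaves every byte
   unchanged; this holds for any input because no conversion takes place. */
static int model_roundtrip_preserves_bits(model_mf8x8 a) {
  model_mf8x8 back = model_reinterpret_mf8_s16(model_reinterpret_s16_mf8(a));
  return memcmp(&a, &back, sizeof a) == 0;
}
```

The key property is that the cast reuses the 64 bits unchanged, so a round trip through `int16` lanes returns the original FP8 bytes. On an AArch64 target with FP8 enabled, the real intrinsics are expected to follow the ACLE `vreinterpret_<to>_<from>` naming (something like `vreinterpret_s16_mf8`); the ACLE specification is the authority on the exact list.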

This patch adds the following intrinsics:

* `float16x4_t vdot_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm)`
* `float16x8_t vdotq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm)`
* `float16x4_t vdot_lane_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)`
* `float16x4_t vdot_laneq_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)`
* `float16x8_t vdotq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)`
* `float16x8_t vdotq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)`

This patch adds the following intrinsics:

* `float16x8_t vmlalbq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t)`
* `float16x8_t vmlaltq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t)`
* `float32x4_t vmlallbbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)`
* `float32x4_t vmlallbtq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)`
* `float32x4_t vmlalltbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)`
* `float32x4_t vmlallttq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)`

This patch adds the following intrinsics:

* Floating-point multiply-add long to half-precision (vector, by element)
  * `float16x8_t vmlalbq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)`
  * `float16x8_t vmlalbq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)`
  * `float16x8_t vmlaltq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)`
  * `float16x8_t vmlaltq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)`
* Floating-point multiply-add long-long to single-precision (vector, by element)
  * `float32x4_t vmlallbbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)`
  * `float32x4_t vmlallbbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)`
  * `float32x4_t vmlallbtq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)`
  * `float32x4_t vmlallbtq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)`
  * `float32x4_t vmlalltbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)`
  * `float32x4_t vmlalltbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)`
  * `float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)`
  * `float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)`

momchil-velikov requested review from jthackray, rgwott, Lukacma, CarolineConcatto and tmatheson-arm January 6, 2025 17:20

momchil-velikov added 23 commits January 13, 2025 10:25

[fixup] Add a comment about special case of mapping FP8 vectors to LLVM vector types

01ee70b

[Clang][AArch64] Allow FP8 Neon vector types to be used by __builtin_shufflevector

a4d998f

The Neon vector types for FP8 (`__MFloat8x8_t` and `__MFloat8x16_t`) are implemented as builtin types and need a special case in `__builtin_shufflevector`.

[fixup] Fix formatting (NFC)

9496937

[fixup] Address review comments, minor tweaks

e6da3ea

FP8 bitcast

42b36c5

[AArch64] Add Neon FP8 conversion intrinsics

de28a76

[fixup] Add tests, fix calling the wrong LLVM intrinsic

66730cd

[fixup] Refactor much of the common code into a helper function (NFC)

f7436ef

[fixup] Add target features test, remove redundant bf16 guard

67627ef

[fixup] Clear the NoManglingQ flag for FP8

d617b03

[fixup] Remove instcombine,tailcallelim from test run lines

95d61df

[fixup] Remove not needed argument (NFC)

4b01184

[fixup] Update intrinsics declarations

b57c87e

[fixup] Add C++ runs to tests, remove some opt passes

d7896c3

[fixup] Update intrinsics definitions

9749148

[fixup] Remove some opt passes from RUN lines

2b939e3

[fixup] Update intrinsics definitions

402f1ef

[fixup] Regenerate tests

7d3bfe6

[AArch64] Implement FP8 Neon reinterpret intrinsics

a921682

[fixup] Remove some opt passes from tests, regenerate tests

31792b7
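The `__builtin_shufflevector` commit (a4d998f) lets the FP8 vector types take part in Clang's `__builtin_shufflevector`, which builds a result vector by picking lanes, by constant index, from the concatenation of its two operands. Running the real builtin on `mfloat8x8_t` needs an AArch64 target with FP8, so below is a host-portable sketch of the lane-selection rule only, on plain byte arrays. `model_shufflevector_u8` is a hypothetical helper, not a Clang API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain-C model of the lane selection performed by Clang's
   __builtin_shufflevector on two n-lane byte vectors a and b:
     idx[i] in [0, n)   -> lane idx[i] of a
     idx[i] in [n, 2n)  -> lane idx[i] - n of b
     idx[i] == -1       -> "don't care" in the builtin; this model writes 0.
   Indices are assumed to be -1 or < 2n; out has out_n lanes. */
static void model_shufflevector_u8(const uint8_t *a, const uint8_t *b,
                                   size_t n, const int *idx, size_t out_n,
                                   uint8_t *out) {
  for (size_t i = 0; i < out_n; ++i) {
    int k = idx[i];
    if (k < 0)
      out[i] = 0;            /* unspecified lane in the builtin; 0 here */
    else if ((size_t)k < n)
      out[i] = a[k];         /* lane taken from the first operand */
    else
      out[i] = b[k - n];     /* lane taken from the second operand */
  }
}
```

For FP8 data the selection is the same as for any 8-bit lane type, since shuffling only moves opaque byte lanes and never converts them; the special case in the commit concerns letting the builtin types through, not changing the lane-selection rule.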

momchil-velikov force-pushed the fp8-neon-reinterpret branch from a6354f2 to 31792b7 January 14, 2025 11:58

momchil-velikov requested a review from Endilll as a code owner January 14, 2025 11:58

Endilll removed their request for review January 15, 2025 18:29

This was referenced Jan 20, 2025

[AArch64] Implement FP8 Neon reinterpret intrinsics #120476

Merged

[Clang][AArch64] Allow FP8 Neon vector types to be used by __builtin_shufflevector #119031

Closed

momchil-velikov closed this Jan 27, 2025

momchil-velikov deleted the fp8-neon-reinterpret branch January 29, 2025 10:54
