-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Clang][AArch64] Use __clang_arm_builtin_alias for overloaded svreinterpret's #92427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Clang][AArch64] Use __clang_arm_builtin_alias for overloaded svreinterpret's #92427
Conversation
The intrinsics are currently defined as: __aio __attribute__((target("sve"))) svint8_t svreinterpret_s8(svuint8_t op) __arm_streaming_compatible { return __builtin_sve_reinterpret_s8_u8(op); } which doesn't work when calling it from an __arm_streaming function when only +sme is available. By defining it in the same way as we've defined all the other intrinsics, we can leave it to the code in SemaChecking to verify that either +sve or +sme is available. This PR also fixes the target guards for the svreinterpret_c and svreinterpret_b intrinsics, that convert between svcount_t and svbool_t, as these are available both in SME2 and SVE2p1.
@llvm/pr-subscribers-clang Author: Sander de Smalen (sdesmalen-arm) ChangesThe intrinsics are currently defined as: __aio attribute((target("sve"))) which doesn't work when calling it from an __arm_streaming function when only +sme is available. By defining it in the same way as we've defined all the other intrinsics, we can leave it to the code in SemaChecking to verify that either +sve or +sme is available. This PR also fixes the target guards for the svreinterpret_c and svreinterpret_b intrinsics, that convert between svcount_t and svbool_t, as these are available both in SME2 and SVE2p1. Patch is 74.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92427.diff 6 Files Affected:
diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td
index 15340ebb62b36..6e5656b037d1d 100644
--- a/clang/include/clang/Basic/arm_sve.td
+++ b/clang/include/clang/Basic/arm_sve.td
@@ -2186,9 +2186,6 @@ let TargetGuard = "sme2" in {
def SVSQRSHRUN_X4 : SInst<"svqrshrun[_n]_{0}[_{d}_x4]", "b4i", "il", MergeNone, "aarch64_sve_sqrshrun_x4", [IsStreaming], [ImmCheck<1, ImmCheckShiftRight, 0>]>;
- def REINTERPRET_SVBOOL_TO_SVCOUNT : Inst<"svreinterpret[_c]", "}P", "Pc", MergeNone, "", [IsStreamingCompatible], []>;
- def REINTERPRET_SVCOUNT_TO_SVBOOL : Inst<"svreinterpret[_b]", "P}", "Pc", MergeNone, "", [IsStreamingCompatible], []>;
-
// SQDMULH
def SVSQDMULH_SINGLE_X2 : SInst<"svqdmulh[_single_{d}_x2]", "22d", "csil", MergeNone, "aarch64_sve_sqdmulh_single_vgx2", [IsStreaming], []>;
def SVSQDMULH_SINGLE_X4 : SInst<"svqdmulh[_single_{d}_x4]", "44d", "csil", MergeNone, "aarch64_sve_sqdmulh_single_vgx4", [IsStreaming], []>;
@@ -2197,6 +2194,9 @@ let TargetGuard = "sme2" in {
}
let TargetGuard = "sve2p1|sme2" in {
+ def REINTERPRET_SVBOOL_TO_SVCOUNT : Inst<"svreinterpret[_c]", "}P", "Pc", MergeNone, "", [IsStreamingCompatible], []>;
+ def REINTERPRET_SVCOUNT_TO_SVBOOL : Inst<"svreinterpret[_b]", "P}", "Pc", MergeNone, "", [IsStreamingCompatible], []>;
+
// SQRSHRN / UQRSHRN
def SVQRSHRN_X2 : SInst<"svqrshrn[_n]_{0}[_{d}_x2]", "h2i", "i", MergeNone, "aarch64_sve_sqrshrn_x2", [IsStreamingCompatible], [ImmCheck<1, ImmCheck1_16>]>;
def SVUQRSHRN_X2 : SInst<"svqrshrn[_n]_{0}[_{d}_x2]", "e2i", "Ui", MergeNone, "aarch64_sve_uqrshrn_x2", [IsStreamingCompatible], [ImmCheck<1, ImmCheck1_16>]>;
diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_reinterpret_svcount_svbool.c b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_reinterpret_svcount_svbool.c
index b3d5f4a4c4a53..e702d36ad3954 100644
--- a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_reinterpret_svcount_svbool.c
+++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_reinterpret_svcount_svbool.c
@@ -2,15 +2,17 @@
// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64 -target-feature +sve2p1 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64 -target-feature +sve2p1 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
// RUN: %clang_cc1 -triple aarch64 -target-feature +sme2 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
// RUN: %clang_cc1 -triple aarch64 -target-feature +sme2 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sme2 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sme2 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-#include <arm_sme.h>
+#include <arm_sve.h>
#ifdef SVE_OVERLOADED_FORMS
-// A simple used,unused... macro, long enough to represent any SVE builtin.§
+// A simple used,unused... macro, long enough to represent any SVE builtin.
#define SVE_ACLE_FUNC(A1,A2_UNUSED,A3,A4_UNUSED) A1##A3
#else
#define SVE_ACLE_FUNC(A1,A2,A3,A4) A1##A2##A3##A4
diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret-bfloat.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret-bfloat.c
index bf2cd23e40802..41208bfb1f435 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret-bfloat.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret-bfloat.c
@@ -4,6 +4,10 @@
// RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x2 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE2
// RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x3 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE3
// RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x4 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE4
+// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x2 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE2
+// RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x3 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE3
+// RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x4 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE4
// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
// RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x2 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE2
// RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x3 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE3
@@ -18,9 +22,16 @@
// RUN: %clang_cc1 -fclang-abi-compat=latest -DSVE_OVERLOADED_FORMS -DTUPLE=x4 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE4
// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +sve -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +sme -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
#include <arm_sve.h>
+#if defined __ARM_FEATURE_SME
+#define MODE_ATTR __arm_streaming
+#else
+#define MODE_ATTR
+#endif
+
#ifdef TUPLE
#define TYPE_1(base,tuple) base ## tuple ## _t
#define TYPE_0(base,tuple) TYPE_1(base,tuple)
@@ -81,7 +92,7 @@
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 64 x i8>
// CPP-TUPLE4-NEXT: ret <vscale x 64 x i8> [[TMP0]]
//
-TYPE(svint8) test_svreinterpret_s8_bf16(TYPE(svbfloat16) op) {
+TYPE(svint8) test_svreinterpret_s8_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_s8, _bf16)(op);
}
@@ -125,7 +136,7 @@ TYPE(svint8) test_svreinterpret_s8_bf16(TYPE(svbfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 32 x i16>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x i16> [[TMP0]]
//
-TYPE(svint16) test_svreinterpret_s16_bf16(TYPE(svbfloat16) op) {
+TYPE(svint16) test_svreinterpret_s16_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_s16, _bf16)(op);
}
@@ -169,7 +180,7 @@ TYPE(svint16) test_svreinterpret_s16_bf16(TYPE(svbfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 16 x i32>
// CPP-TUPLE4-NEXT: ret <vscale x 16 x i32> [[TMP0]]
//
-TYPE(svint32) test_svreinterpret_s32_bf16(TYPE(svbfloat16) op) {
+TYPE(svint32) test_svreinterpret_s32_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_s32, _bf16)(op);
}
// CHECK-LABEL: @test_svreinterpret_s64_bf16(
@@ -212,7 +223,7 @@ TYPE(svint32) test_svreinterpret_s32_bf16(TYPE(svbfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 8 x i64>
// CPP-TUPLE4-NEXT: ret <vscale x 8 x i64> [[TMP0]]
//
-TYPE(svint64) test_svreinterpret_s64_bf16(TYPE(svbfloat16) op) {
+TYPE(svint64) test_svreinterpret_s64_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_s64, _bf16)(op);
}
@@ -256,7 +267,7 @@ TYPE(svint64) test_svreinterpret_s64_bf16(TYPE(svbfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 64 x i8>
// CPP-TUPLE4-NEXT: ret <vscale x 64 x i8> [[TMP0]]
//
-TYPE(svuint8) test_svreinterpret_u8_bf16(TYPE(svbfloat16) op) {
+TYPE(svuint8) test_svreinterpret_u8_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_u8, _bf16)(op);
}
@@ -300,7 +311,7 @@ TYPE(svuint8) test_svreinterpret_u8_bf16(TYPE(svbfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 32 x i16>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x i16> [[TMP0]]
//
-TYPE(svuint16) test_svreinterpret_u16_bf16(TYPE(svbfloat16) op) {
+TYPE(svuint16) test_svreinterpret_u16_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_u16, _bf16)(op);
}
@@ -344,7 +355,7 @@ TYPE(svuint16) test_svreinterpret_u16_bf16(TYPE(svbfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 16 x i32>
// CPP-TUPLE4-NEXT: ret <vscale x 16 x i32> [[TMP0]]
//
-TYPE(svuint32) test_svreinterpret_u32_bf16(TYPE(svbfloat16) op) {
+TYPE(svuint32) test_svreinterpret_u32_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_u32, _bf16)(op);
}
@@ -388,7 +399,7 @@ TYPE(svuint32) test_svreinterpret_u32_bf16(TYPE(svbfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 8 x i64>
// CPP-TUPLE4-NEXT: ret <vscale x 8 x i64> [[TMP0]]
//
-TYPE(svuint64) test_svreinterpret_u64_bf16(TYPE(svbfloat16) op) {
+TYPE(svuint64) test_svreinterpret_u64_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_u64, _bf16)(op);
}
@@ -432,7 +443,7 @@ TYPE(svuint64) test_svreinterpret_u64_bf16(TYPE(svbfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 64 x i8> [[OP:%.*]] to <vscale x 32 x bfloat>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[TMP0]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_s8(TYPE(svint8) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_s8(TYPE(svint8) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _s8)(op);
}
@@ -476,7 +487,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_s8(TYPE(svint8) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x i16> [[OP:%.*]] to <vscale x 32 x bfloat>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[TMP0]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_s16(TYPE(svint16) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_s16(TYPE(svint16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _s16)(op);
}
@@ -520,7 +531,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_s16(TYPE(svint16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 16 x i32> [[OP:%.*]] to <vscale x 32 x bfloat>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[TMP0]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_s32(TYPE(svint32) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_s32(TYPE(svint32) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _s32)(op);
}
@@ -564,7 +575,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_s32(TYPE(svint32) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 8 x i64> [[OP:%.*]] to <vscale x 32 x bfloat>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[TMP0]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_s64(TYPE(svint64) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_s64(TYPE(svint64) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _s64)(op);
}
@@ -608,7 +619,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_s64(TYPE(svint64) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 64 x i8> [[OP:%.*]] to <vscale x 32 x bfloat>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[TMP0]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_u8(TYPE(svuint8) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_u8(TYPE(svuint8) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _u8)(op);
}
@@ -652,7 +663,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_u8(TYPE(svuint8) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x i16> [[OP:%.*]] to <vscale x 32 x bfloat>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[TMP0]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_u16(TYPE(svuint16) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_u16(TYPE(svuint16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _u16)(op);
}
@@ -696,7 +707,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_u16(TYPE(svuint16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 16 x i32> [[OP:%.*]] to <vscale x 32 x bfloat>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[TMP0]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_u32(TYPE(svuint32) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_u32(TYPE(svuint32) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _u32)(op);
}
@@ -740,7 +751,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_u32(TYPE(svuint32) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 8 x i64> [[OP:%.*]] to <vscale x 32 x bfloat>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[TMP0]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_u64(TYPE(svuint64) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_u64(TYPE(svuint64) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _u64)(op);
}
@@ -776,7 +787,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_u64(TYPE(svuint64) op) {
// CPP-TUPLE4-NEXT: entry:
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[OP:%.*]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_bf16(TYPE(svbfloat16) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _bf16)(op);
}
@@ -820,7 +831,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_bf16(TYPE(svbfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x half> [[OP:%.*]] to <vscale x 32 x bfloat>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[TMP0]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_f16(TYPE(svfloat16) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_f16(TYPE(svfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _f16)(op);
}
@@ -864,7 +875,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_f16(TYPE(svfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 16 x float> [[OP:%.*]] to <vscale x 32 x bfloat>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[TMP0]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_f32(TYPE(svfloat32) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_f32(TYPE(svfloat32) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _f32)(op);
}
@@ -908,7 +919,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_f32(TYPE(svfloat32) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 8 x double> [[OP:%.*]] to <vscale x 32 x bfloat>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x bfloat> [[TMP0]]
//
-TYPE(svbfloat16) test_svreinterpret_bf16_f64(TYPE(svfloat64) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_f64(TYPE(svfloat64) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_bf16, _f64)(op);
}
@@ -952,7 +963,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_f64(TYPE(svfloat64) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 16 x float>
// CPP-TUPLE4-NEXT: ret <vscale x 16 x float> [[TMP0]]
//
-TYPE(svfloat32) test_svreinterpret_f32_bf16(TYPE(svbfloat16) op) {
+TYPE(svfloat32) test_svreinterpret_f32_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_f32, _bf16)(op);
}
@@ -996,7 +1007,7 @@ TYPE(svfloat32) test_svreinterpret_f32_bf16(TYPE(svbfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 32 x half>
// CPP-TUPLE4-NEXT: ret <vscale x 32 x half> [[TMP0]]
//
-TYPE(svfloat16) test_svreinterpret_f16_bf16(TYPE(svbfloat16) op) {
+TYPE(svfloat16) test_svreinterpret_f16_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_f16, _bf16)(op);
}
@@ -1040,6 +1051,6 @@ TYPE(svfloat16) test_svreinterpret_f16_bf16(TYPE(svbfloat16) op) {
// CPP-TUPLE4-NEXT: [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 8 x double>
// CPP-TUPLE4-NEXT: ret <vscale x 8 x double> [[TMP0]]
//
-TYPE(svfloat64) test_svreinterpret_f64_bf16(TYPE(svbfloat16) op) {
+TYPE(svfloat64) test_svreinterpret_f64_bf16(TYPE(svbfloat16) op) MODE_ATTR {
return SVE_ACLE_FUNC(svreinterpret_f64, _bf16)(op);
}
diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c
index 3d9d5c3ce45ae..e61bbf3e03d7e 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c
@@ -4,6 +4,10 @@
// RUN: %clang_cc1 -DTUPLE=x2 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE2
// RUN: %clang_cc1 -DTUPLE=x3 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE3
// RUN: %clang_cc1 -DTUPLE=x4 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE4
+// RUN: %clang_cc1 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DTUPLE=x2 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE2
+// RUN: %clang_cc1 -DTUPLE=x3 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE3
+// RUN: %clang_cc1 -DTUPLE=x4 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE4
// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
// RUN: %clang_cc1 -DTUPLE=x2 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE2
// RUN: %clang_cc1 -DTUPLE=x3 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE3
@@ -17,9 +21,16 @@
// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -DTUPLE=x3 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE3
// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -DTUPLE=x4 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE4
// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+// RUN: %clang_cc1 -triple aarch64...
[truncated]
|
However we choose to emit this particular builtin, should we provide a way to write a function like this? Like, the caller has to either support sve, or have streaming enabled. Maybe call it __arm_streaming_compatible_requires_sve. |
I'd like to make two (probably obvious) points before answering this:
So if someone wants to say both “this function should be streaming-compatible” and “this function assumes it has access to these streaming SVE features” then I think those things should be specified as two separate annotations, rather than a single combined one. The second one (about assuming/requiring available features) is useful more generally, for non-streaming and |
The key here is that Thinking about it a bit more, maybe we can just do some magic to make things work? Say, if you specify |
Thinking a bit more, this is probably not quite what we want: even if the function body itself is streaming compatible, it might call non-streaming functions that require SVE. Maybe spell this something like |
I suppose the idea here is that:
should be diagnosed, on the basis that, when compiled with default flags, First, it's IMO valid to do:
That is, feature gating can be dynamic. It doesn't need to be a load-time thing. Second, at least in GCC, the
And I'd argue that that's a feature rather than a bug. It allows load-time selection of DSOs based on the target. |
clang specifically diagnoses always_inline functions. So for example, say you want to write something like:
Conceptually, this should be fine, but currently it's an error. (You can sort of approximate it with an But regardless of the diagnostics, if the user specifies "+sve" on a target that doesn't actually have SVE, we could miscompile; for example, if you call a versioned function, clang uses the features specified in the caller. |
Yeah. I think in GCC, the reason for the equivalent behaviour is that (a) the compiler assumes (without checking) that the contents really do require the given target and (b) the compiler doesn't support changes to the ISA within a function. So the error is that the target mismatch caused inlining to fail, rather than that the mismatch occurred at all.
That's a good point. I suppose it comes down to two fundamentally different use cases:
Perhaps (2) could be called “streaming SVE functions”. If the compiler is supposed to enforce the constraints, then the set of assumed streaming SVE features would need to be a property of the function type. |
The intrinsics are currently defined as:
which doesn't work when calling it from an __arm_streaming function when only +sme is available. By defining it in the same way as we've defined all the other intrinsics, we can leave it to the code in SemaChecking to verify that either +sve or +sme is available.
This PR also fixes the target guards for the svreinterpret_c and svreinterpret_b intrinsics, that convert between svcount_t and svbool_t, as these are available both in SME2 and SVE2p1.