[Clang][AArch64] Use __clang_arm_builtin_alias for overloaded svreinterpret's #92427

Merged

Conversation

@sdesmalen-arm (Collaborator) commented on May 16, 2024

The intrinsics are currently defined as:

  __aio __attribute__((target("sve")))
  svint8_t svreinterpret_s8(svuint8_t op) __arm_streaming_compatible {
    return __builtin_sve_reinterpret_s8_u8(op);
  }

which doesn't work when calling it from an __arm_streaming function when only +sme is available. By defining it in the same way as we've defined all the other intrinsics, we can leave it to the code in SemaChecking to verify that either +sve or +sme is available.
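
Roughly speaking, the regenerated declaration then follows the same __clang_arm_builtin_alias pattern as the other intrinsics (a sketch of the assumed shape, not copied verbatim from the generated arm_sve.h):

  __aio __attribute__((__clang_arm_builtin_alias(__builtin_sve_reinterpret_s8_u8)))
  svint8_t svreinterpret_s8(svuint8_t op);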

This PR also fixes the target guards for the svreinterpret_c and svreinterpret_b intrinsics, which convert between svcount_t and svbool_t, since these conversions are available in both SME2 and SVE2p1.
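
For illustration, an assumed example (not taken from the patch): with the guard relaxed to "sve2p1|sme2", a translation unit built with just +sve2p1 can use these conversions from ordinary (non-streaming) code:

  #include <arm_sve.h>

  // Reinterpret between the SVE2p1/SME2 predicate-as-counter type and svbool_t.
  svbool_t  count_to_bool(svcount_t pn) { return svreinterpret_b(pn); }
  svcount_t bool_to_count(svbool_t pg)  { return svreinterpret_c(pg); }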

@llvmbot added the clang (Clang issues not falling into any other category) and clang:frontend (Language frontend issues, e.g. anything involving "Sema") labels on May 16, 2024
@llvmbot (Member) commented on May 16, 2024

@llvm/pr-subscribers-clang

Author: Sander de Smalen (sdesmalen-arm)

Patch is 74.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92427.diff

6 Files Affected:

  • (modified) clang/include/clang/Basic/arm_sve.td (+3-3)
  • (modified) clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_reinterpret_svcount_svbool.c (+4-2)
  • (modified) clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret-bfloat.c (+34-23)
  • (modified) clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c (+132-121)
  • (removed) clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret_from_streaming_mode.c (-35)
  • (modified) clang/utils/TableGen/SveEmitter.cpp (+9-12)
diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td
index 15340ebb62b36..6e5656b037d1d 100644
--- a/clang/include/clang/Basic/arm_sve.td
+++ b/clang/include/clang/Basic/arm_sve.td
@@ -2186,9 +2186,6 @@ let TargetGuard = "sme2" in {
 
   def SVSQRSHRUN_X4 : SInst<"svqrshrun[_n]_{0}[_{d}_x4]", "b4i", "il", MergeNone, "aarch64_sve_sqrshrun_x4", [IsStreaming], [ImmCheck<1, ImmCheckShiftRight, 0>]>;
 
-  def REINTERPRET_SVBOOL_TO_SVCOUNT : Inst<"svreinterpret[_c]", "}P", "Pc", MergeNone, "", [IsStreamingCompatible], []>;
-  def REINTERPRET_SVCOUNT_TO_SVBOOL : Inst<"svreinterpret[_b]", "P}", "Pc", MergeNone, "", [IsStreamingCompatible], []>;
-
   // SQDMULH
   def SVSQDMULH_SINGLE_X2 : SInst<"svqdmulh[_single_{d}_x2]", "22d", "csil", MergeNone, "aarch64_sve_sqdmulh_single_vgx2", [IsStreaming], []>;
   def SVSQDMULH_SINGLE_X4 : SInst<"svqdmulh[_single_{d}_x4]", "44d", "csil", MergeNone, "aarch64_sve_sqdmulh_single_vgx4", [IsStreaming], []>;
@@ -2197,6 +2194,9 @@ let TargetGuard = "sme2" in {
 }
 
 let TargetGuard = "sve2p1|sme2" in {
+  def REINTERPRET_SVBOOL_TO_SVCOUNT : Inst<"svreinterpret[_c]", "}P", "Pc", MergeNone, "", [IsStreamingCompatible], []>;
+  def REINTERPRET_SVCOUNT_TO_SVBOOL : Inst<"svreinterpret[_b]", "P}", "Pc", MergeNone, "", [IsStreamingCompatible], []>;
+
   // SQRSHRN / UQRSHRN
   def SVQRSHRN_X2   : SInst<"svqrshrn[_n]_{0}[_{d}_x2]", "h2i", "i",    MergeNone, "aarch64_sve_sqrshrn_x2", [IsStreamingCompatible], [ImmCheck<1, ImmCheck1_16>]>;
   def SVUQRSHRN_X2  : SInst<"svqrshrn[_n]_{0}[_{d}_x2]", "e2i", "Ui",   MergeNone, "aarch64_sve_uqrshrn_x2", [IsStreamingCompatible], [ImmCheck<1, ImmCheck1_16>]>;
diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_reinterpret_svcount_svbool.c b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_reinterpret_svcount_svbool.c
index b3d5f4a4c4a53..e702d36ad3954 100644
--- a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_reinterpret_svcount_svbool.c
+++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_reinterpret_svcount_svbool.c
@@ -2,15 +2,17 @@
 
 // REQUIRES: aarch64-registered-target
 
+// RUN: %clang_cc1 -triple aarch64 -target-feature +sve2p1 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64 -target-feature +sve2p1 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
 // RUN: %clang_cc1 -triple aarch64 -target-feature +sme2 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
 // RUN: %clang_cc1 -triple aarch64 -target-feature +sme2 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
 // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sme2 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
 // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sme2 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
 
-#include <arm_sme.h>
+#include <arm_sve.h>
 
 #ifdef SVE_OVERLOADED_FORMS
-// A simple used,unused... macro, long enough to represent any SVE builtin.§
+// A simple used,unused... macro, long enough to represent any SVE builtin.
 #define SVE_ACLE_FUNC(A1,A2_UNUSED,A3,A4_UNUSED) A1##A3
 #else
 #define SVE_ACLE_FUNC(A1,A2,A3,A4) A1##A2##A3##A4
diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret-bfloat.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret-bfloat.c
index bf2cd23e40802..41208bfb1f435 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret-bfloat.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret-bfloat.c
@@ -4,6 +4,10 @@
 // RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x2 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE2
 // RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x3 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE3
 // RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x4 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE4
+// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x2 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE2
+// RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x3 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE3
+// RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x4 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE4
 // RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
 // RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x2 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE2
 // RUN: %clang_cc1 -fclang-abi-compat=latest -DTUPLE=x3 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE3
@@ -18,9 +22,16 @@
 // RUN: %clang_cc1 -fclang-abi-compat=latest -DSVE_OVERLOADED_FORMS -DTUPLE=x4 -triple aarch64 -target-feature +sve -target-feature +bf16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE4
 
 // RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +sve -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +sme -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
 
 #include <arm_sve.h>
 
+#if defined __ARM_FEATURE_SME
+#define MODE_ATTR __arm_streaming
+#else
+#define MODE_ATTR
+#endif
+
 #ifdef TUPLE
 #define TYPE_1(base,tuple) base ## tuple ## _t
 #define TYPE_0(base,tuple) TYPE_1(base,tuple)
@@ -81,7 +92,7 @@
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 64 x i8>
 // CPP-TUPLE4-NEXT:    ret <vscale x 64 x i8> [[TMP0]]
 //
-TYPE(svint8) test_svreinterpret_s8_bf16(TYPE(svbfloat16) op) {
+TYPE(svint8) test_svreinterpret_s8_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_s8, _bf16)(op);
 }
 
@@ -125,7 +136,7 @@ TYPE(svint8) test_svreinterpret_s8_bf16(TYPE(svbfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 32 x i16>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x i16> [[TMP0]]
 //
-TYPE(svint16) test_svreinterpret_s16_bf16(TYPE(svbfloat16) op) {
+TYPE(svint16) test_svreinterpret_s16_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_s16, _bf16)(op);
 }
 
@@ -169,7 +180,7 @@ TYPE(svint16) test_svreinterpret_s16_bf16(TYPE(svbfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 16 x i32>
 // CPP-TUPLE4-NEXT:    ret <vscale x 16 x i32> [[TMP0]]
 //
-TYPE(svint32) test_svreinterpret_s32_bf16(TYPE(svbfloat16) op) {
+TYPE(svint32) test_svreinterpret_s32_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_s32, _bf16)(op);
 }
 // CHECK-LABEL: @test_svreinterpret_s64_bf16(
@@ -212,7 +223,7 @@ TYPE(svint32) test_svreinterpret_s32_bf16(TYPE(svbfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 8 x i64>
 // CPP-TUPLE4-NEXT:    ret <vscale x 8 x i64> [[TMP0]]
 //
-TYPE(svint64) test_svreinterpret_s64_bf16(TYPE(svbfloat16) op) {
+TYPE(svint64) test_svreinterpret_s64_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_s64, _bf16)(op);
 }
 
@@ -256,7 +267,7 @@ TYPE(svint64) test_svreinterpret_s64_bf16(TYPE(svbfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 64 x i8>
 // CPP-TUPLE4-NEXT:    ret <vscale x 64 x i8> [[TMP0]]
 //
-TYPE(svuint8) test_svreinterpret_u8_bf16(TYPE(svbfloat16) op) {
+TYPE(svuint8) test_svreinterpret_u8_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_u8, _bf16)(op);
 }
 
@@ -300,7 +311,7 @@ TYPE(svuint8) test_svreinterpret_u8_bf16(TYPE(svbfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 32 x i16>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x i16> [[TMP0]]
 //
-TYPE(svuint16) test_svreinterpret_u16_bf16(TYPE(svbfloat16) op) {
+TYPE(svuint16) test_svreinterpret_u16_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_u16, _bf16)(op);
 }
 
@@ -344,7 +355,7 @@ TYPE(svuint16) test_svreinterpret_u16_bf16(TYPE(svbfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 16 x i32>
 // CPP-TUPLE4-NEXT:    ret <vscale x 16 x i32> [[TMP0]]
 //
-TYPE(svuint32) test_svreinterpret_u32_bf16(TYPE(svbfloat16) op) {
+TYPE(svuint32) test_svreinterpret_u32_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_u32, _bf16)(op);
 }
 
@@ -388,7 +399,7 @@ TYPE(svuint32) test_svreinterpret_u32_bf16(TYPE(svbfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 8 x i64>
 // CPP-TUPLE4-NEXT:    ret <vscale x 8 x i64> [[TMP0]]
 //
-TYPE(svuint64) test_svreinterpret_u64_bf16(TYPE(svbfloat16) op) {
+TYPE(svuint64) test_svreinterpret_u64_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_u64, _bf16)(op);
 }
 
@@ -432,7 +443,7 @@ TYPE(svuint64) test_svreinterpret_u64_bf16(TYPE(svbfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 64 x i8> [[OP:%.*]] to <vscale x 32 x bfloat>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[TMP0]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_s8(TYPE(svint8) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_s8(TYPE(svint8) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _s8)(op);
 }
 
@@ -476,7 +487,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_s8(TYPE(svint8) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x i16> [[OP:%.*]] to <vscale x 32 x bfloat>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[TMP0]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_s16(TYPE(svint16) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_s16(TYPE(svint16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _s16)(op);
 }
 
@@ -520,7 +531,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_s16(TYPE(svint16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 16 x i32> [[OP:%.*]] to <vscale x 32 x bfloat>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[TMP0]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_s32(TYPE(svint32) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_s32(TYPE(svint32) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _s32)(op);
 }
 
@@ -564,7 +575,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_s32(TYPE(svint32) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 8 x i64> [[OP:%.*]] to <vscale x 32 x bfloat>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[TMP0]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_s64(TYPE(svint64) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_s64(TYPE(svint64) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _s64)(op);
 }
 
@@ -608,7 +619,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_s64(TYPE(svint64) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 64 x i8> [[OP:%.*]] to <vscale x 32 x bfloat>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[TMP0]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_u8(TYPE(svuint8) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_u8(TYPE(svuint8) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _u8)(op);
 }
 
@@ -652,7 +663,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_u8(TYPE(svuint8) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x i16> [[OP:%.*]] to <vscale x 32 x bfloat>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[TMP0]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_u16(TYPE(svuint16) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_u16(TYPE(svuint16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _u16)(op);
 }
 
@@ -696,7 +707,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_u16(TYPE(svuint16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 16 x i32> [[OP:%.*]] to <vscale x 32 x bfloat>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[TMP0]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_u32(TYPE(svuint32) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_u32(TYPE(svuint32) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _u32)(op);
 }
 
@@ -740,7 +751,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_u32(TYPE(svuint32) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 8 x i64> [[OP:%.*]] to <vscale x 32 x bfloat>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[TMP0]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_u64(TYPE(svuint64) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_u64(TYPE(svuint64) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _u64)(op);
 }
 
@@ -776,7 +787,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_u64(TYPE(svuint64) op) {
 // CPP-TUPLE4-NEXT:  entry:
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[OP:%.*]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_bf16(TYPE(svbfloat16) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _bf16)(op);
 }
 
@@ -820,7 +831,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_bf16(TYPE(svbfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x half> [[OP:%.*]] to <vscale x 32 x bfloat>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[TMP0]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_f16(TYPE(svfloat16) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_f16(TYPE(svfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _f16)(op);
 }
 
@@ -864,7 +875,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_f16(TYPE(svfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 16 x float> [[OP:%.*]] to <vscale x 32 x bfloat>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[TMP0]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_f32(TYPE(svfloat32) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_f32(TYPE(svfloat32) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _f32)(op);
 }
 
@@ -908,7 +919,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_f32(TYPE(svfloat32) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 8 x double> [[OP:%.*]] to <vscale x 32 x bfloat>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x bfloat> [[TMP0]]
 //
-TYPE(svbfloat16) test_svreinterpret_bf16_f64(TYPE(svfloat64) op) {
+TYPE(svbfloat16) test_svreinterpret_bf16_f64(TYPE(svfloat64) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_bf16, _f64)(op);
 }
 
@@ -952,7 +963,7 @@ TYPE(svbfloat16) test_svreinterpret_bf16_f64(TYPE(svfloat64) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 16 x float>
 // CPP-TUPLE4-NEXT:    ret <vscale x 16 x float> [[TMP0]]
 //
-TYPE(svfloat32) test_svreinterpret_f32_bf16(TYPE(svbfloat16) op) {
+TYPE(svfloat32) test_svreinterpret_f32_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_f32, _bf16)(op);
 }
 
@@ -996,7 +1007,7 @@ TYPE(svfloat32) test_svreinterpret_f32_bf16(TYPE(svbfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 32 x half>
 // CPP-TUPLE4-NEXT:    ret <vscale x 32 x half> [[TMP0]]
 //
-TYPE(svfloat16) test_svreinterpret_f16_bf16(TYPE(svbfloat16) op) {
+TYPE(svfloat16) test_svreinterpret_f16_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_f16, _bf16)(op);
 }
 
@@ -1040,6 +1051,6 @@ TYPE(svfloat16) test_svreinterpret_f16_bf16(TYPE(svbfloat16) op) {
 // CPP-TUPLE4-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 32 x bfloat> [[OP:%.*]] to <vscale x 8 x double>
 // CPP-TUPLE4-NEXT:    ret <vscale x 8 x double> [[TMP0]]
 //
-TYPE(svfloat64) test_svreinterpret_f64_bf16(TYPE(svbfloat16) op) {
+TYPE(svfloat64) test_svreinterpret_f64_bf16(TYPE(svbfloat16) op) MODE_ATTR {
   return SVE_ACLE_FUNC(svreinterpret_f64, _bf16)(op);
 }
diff --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c
index 3d9d5c3ce45ae..e61bbf3e03d7e 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c
@@ -4,6 +4,10 @@
 // RUN: %clang_cc1 -DTUPLE=x2 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE2
 // RUN: %clang_cc1 -DTUPLE=x3 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE3
 // RUN: %clang_cc1 -DTUPLE=x4 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE4
+// RUN: %clang_cc1 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DTUPLE=x2 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE2
+// RUN: %clang_cc1 -DTUPLE=x3 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE3
+// RUN: %clang_cc1 -DTUPLE=x4 -triple aarch64 -target-feature +sme -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=TUPLE4
 // RUN: %clang_cc1 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
 // RUN: %clang_cc1 -DTUPLE=x2 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE2
 // RUN: %clang_cc1 -DTUPLE=x3 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE3
@@ -17,9 +21,16 @@
 // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -DTUPLE=x3 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE3
 // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -DTUPLE=x4 -triple aarch64 -target-feature +sve -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,tailcallelim | FileCheck %s -check-prefix=CPP-TUPLE4
 // RUN: %clang_cc1 -triple aarch64 -target-feature +sve -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+// RUN: %clang_cc1 -triple aarch64...
[truncated]

@efriedma-quic (Collaborator)

However we choose to emit this particular builtin, should we provide a way to write a function like this? Like, the caller has to either support sve, or have streaming enabled. Maybe call it __arm_streaming_compatible_requires_sve.

@rsandifo-arm (Collaborator)

> However we choose to emit this particular builtin, should we provide a way to write a function like this? Like, the caller has to either support sve, or have streaming enabled. Maybe call it __arm_streaming_compatible_requires_sve.

I'd like to make two (probably obvious) points before answering this:

  • The existing three-way choice between normal, __arm_streaming, and __arm_streaming_compatible is simply a choice between possible incoming and outgoing PSTATE.SM states at runtime (0, 1, and 0 or 1, respectively). It's intended to be independent of the architecture level, except for the fact that using __arm_streaming is an error when SME is not enabled.

  • If __arm_streaming_compatible is being used to write a vector routine, the question isn't likely to be just “do I have SVE?” but “do I have this particular set of SVE features?” For example, I imagine many integer-based algorithms would want at least SVE2. At some point SVE2p1 would become a baseline for some users.

So if someone wants to say both “this function should be streaming-compatible” and “this function assumes it has access to these streaming SVE features” then I think those things should be specified as two separate annotations, rather than a single combined one. The second one (about assuming/requiring available features) is useful more generally, for non-streaming and __arm_streaming as well as __arm_streaming_compatible.

@efriedma-quic (Collaborator)

The key here is that __arm_streaming_compatible is the only way to write code that runs in both streaming and non-streaming mode; outside of __arm_streaming_compatible, there generally isn't an issue. If you know you're not in streaming mode, you can just check directly for SVE/SVE2, and if you know you're in streaming mode, you can check directly for SME/SME2.

Thinking about it a bit more, maybe we can just do some magic to make things work? Say, if you specify __attribute__((target("sve"))) __arm_streaming_compatible, and the caller is in streaming mode, allow the call even if the caller doesn't have SVE proper.

@efriedma-quic (Collaborator)

> Thinking about it a bit more, maybe we can just do some magic to make things work? Say, if you specify __attribute__((target("sve"))) __arm_streaming_compatible, and the caller is in streaming mode, allow the call even if the caller doesn't have SVE proper.

Thinking a bit more, this is probably not quite what we want: even if the function body itself is streaming compatible, it might call non-streaming functions that require SVE. Maybe spell this something like __attribute__((target("sve-or-streaming"))) __arm_streaming_compatible.

@rsandifo-arm (Collaborator)

> Thinking about it a bit more, maybe we can just do some magic to make things work? Say, if you specify __attribute__((target("sve"))) __arm_streaming_compatible, and the caller is in streaming mode, allow the call even if the caller doesn't have SVE proper.
>
> Thinking a bit more, this is probably not quite what we want: even if the function body itself is streaming compatible, it might call non-streaming functions that require SVE. Maybe spell this something like __attribute__((target("sve-or-streaming"))) __arm_streaming_compatible.

I suppose the idea here is that:

   __attribute__((target("sve"))) void f() { … }
   void g() { … f(); … }

should be diagnosed, on the basis that, when compiled with default flags, g doesn't guarantee the availability of SVE, whereas f requires it? If so, I don't think we should do that, for two reasons:

First, it's IMO valid to do:

   __attribute__((target("sve"))) void sve_routine() { … }
   void main_interface() {
     if (SVE_is_available() && problem_has_certain_characteristics())
       sve_routine();
     else
       …
   }

That is, feature gating can be dynamic. It doesn't need to be a load-time thing.

Second, at least in GCC, the target attribute is not part of a function's type, so it's not an error to do:

   foo.h:  void f();
   foo.cc: __attribute__((target("sve"))) void f() { … }

And I'd argue that that's a feature rather than a bug. It allows load-time selection of DSOs based on the target.

@efriedma-quic (Collaborator)

clang specifically diagnoses always_inline functions. So for example, say you want to write something like:

#include <arm_sve.h>
__attribute__((always_inline, target("+sve")))
static inline void f(void* p) __arm_streaming_compatible {
  *(svuint32_t*)p = svmul_m(svptrue_b32(), *(svuint32_t*)p, *(svuint32_t*)p);
}
//////////
void g(void* p) __arm_streaming { f(p); }

Conceptually, this should be fine, but currently it's an error. (You can sort of approximate it with an #ifdef __ARM_FEATURE_SVE, but that's pretty ugly.)
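
(One possible shape of that #ifdef approximation, offered only as an illustrative sketch and not taken from this thread: claim the SVE target attribute only when the translation unit actually has SVE, so the always_inline callee never requires a feature its streaming caller lacks.)

  // Illustrative sketch only; HELPER_ATTRS is a hypothetical name, attribute
  // spelling as in the example above.
  #if defined(__ARM_FEATURE_SVE)
  #define HELPER_ATTRS __attribute__((always_inline, target("+sve")))
  #else   // e.g. built with +sme but without +sve
  #define HELPER_ATTRS __attribute__((always_inline))
  #endif
  // f() would then be declared as:
  //   HELPER_ATTRS static inline void f(void* p) __arm_streaming_compatible { ... }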

But regardless of the diagnostics, if the user specifies "+sve" on a target that doesn't actually have SVE, we could miscompile; for example, if you call a versioned function, clang uses the features specified in the caller.

@rsandifo-arm (Collaborator)

> clang specifically diagnoses always_inline functions. So for example, say you want to write something like:
>
> #include <arm_sve.h>
> __attribute__((always_inline, target("+sve")))
> static inline void f(void* p) __arm_streaming_compatible {
>   *(svuint32_t*)p = svmul_m(svptrue_b32(), *(svuint32_t*)p, *(svuint32_t*)p);
> }
> //////////
> void g(void* p) __arm_streaming { f(p); }
>
> Conceptually, this should be fine, but currently it's an error. (You can sort of approximate it with an #ifdef __ARM_FEATURE_SVE, but that's pretty ugly.)

Yeah. I think in GCC, the reason for the equivalent behaviour is that (a) the compiler assumes (without checking) that the contents really do require the given target and (b) the compiler doesn't support changes to the ISA within a function. So the error is that the target mismatch caused inlining to fail, rather than that the mismatch occurred at all.

> But regardless of the diagnostics, if the user specifies "+sve" on a target that doesn't actually have SVE, we could miscompile; for example, if you call a versioned function, clang uses the features specified in the caller.

That's a good point. I suppose it comes down to two fundamentally different use cases:

  1. A function that can be called in either PSTATE.SM mode and “just work”. This is what __arm_streaming_compatible is currently supposed to provide. That is, what the function can do is determined by where the function can be called (which is anywhere).
  2. A function that cannot be called with PSTATE.SM==0 on targets that have SME but not SVE. That is, where the function can be called is determined by what the function can do.

Perhaps (2) could be called “streaming SVE functions”. If the compiler is supposed to enforce the constraints, then the set of assumed streaming SVE features would need to be a property of the function type.

@sdesmalen-arm merged commit f81da75 into llvm:main on May 23, 2024
6 of 7 checks passed