[LLVM][AArch64] Relax SVE/SME codegen predicates. #145322
377be5e to 9f98094
@llvm/pr-subscribers-backend-aarch64

Author: Paul Walker (paulwalker-arm)

Changes

Code generation predicates like HasSVE2_or_SME implemented a strict divide between streaming and non-streaming, which meant some SME instructions were not available unless a matching SVE feature was enabled. As a specific example, to enable multi-register WHILE instructions in non-streaming mode a user must enable "sve2p1", when using "sme2" should be sufficient.

This PR separates the streaming/non-streaming requirement from a feature's SVE/SME designation, which in most cases means "+sveX[pY]" and "+sve,+smeV[pW]" can be used interchangeably when an instruction is available via "+sveX[pY]" or "+smeV[pW]".

NOTE: In some instances this means the compiler will support unsupported configurations, which is fine.

NOTE: This PR does not fix all the predicates as I plan to follow up with other PRs to relax the crypto, bitperm and fp8 features.

Patch is 45.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145322.diff

50 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td
index eb5a5199b8951..9634937991860 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -58,11 +58,11 @@ include "AArch64SystemOperands.td"
class AArch64Unsupported { list<Predicate> F; }
-let F = [HasSVE2p1, HasSVE2p1_or_SME2, HasSVE2p1_or_SME2p1] in
+let F = [HasSVE2p1, HasSVE2p1_or_SME2, HasSVE2p1_or_StreamingSME2, HasSVE2p1_or_SME2p1] in
def SVE2p1Unsupported : AArch64Unsupported;
def SVE2Unsupported : AArch64Unsupported {
- let F = !listconcat([HasSVE2, HasSVE2_or_SME, HasSVE2_or_SME2, HasSSVE_FP8FMA, HasSMEF8F16,
+ let F = !listconcat([HasSVE2, HasSVE2_or_SME, HasNonStreamingSVE2_or_SME2, HasSSVE_FP8FMA, HasSMEF8F16,
HasSMEF8F32, HasSVEAES, HasSVE2SHA3, HasSVE2SM4, HasSVEBitPerm,
HasSVEB16B16],
SVE2p1Unsupported.F);
@@ -85,9 +85,9 @@ def SME2p1Unsupported : AArch64Unsupported {
}
def SME2Unsupported : AArch64Unsupported {
- let F = !listconcat([HasSME2, HasSVE2_or_SME2, HasSVE2p1_or_SME2, HasSSVE_FP8FMA,
+ let F = !listconcat([HasSME2, HasNonStreamingSVE2_or_SME2, HasSVE2p1_or_SME2, HasSSVE_FP8FMA,
HasSMEF8F16, HasSMEF8F32, HasSMEF16F16_or_SMEF8F16, HasSMEB16B16,
- HasNonStreamingSVE2_or_SSVE_AES],
+ HasNonStreamingSVE2_or_SSVE_AES, HasSVE2p1_or_StreamingSME2],
SME2p1Unsupported.F);
}
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 0f3f24f0853c9..9d57711e84cde 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -249,22 +249,23 @@ def HasSVE_or_SME
AssemblerPredicateWithAll<(any_of FeatureSVE, FeatureSME),
"sve or sme">;
def HasNonStreamingSVE_or_SME2p2
- : Predicate<"(Subtarget->isSVEAvailable() && Subtarget->hasSVE()) ||"
+ : Predicate<"Subtarget->isSVEAvailable() ||"
"(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSME2p2())">,
AssemblerPredicateWithAll<(any_of FeatureSVE, FeatureSME2p2),
"sve or sme2p2">;
def HasNonStreamingSVE_or_SSVE_FEXPA
- : Predicate<"(Subtarget->isSVEAvailable() && Subtarget->hasSVE()) ||"
+ : Predicate<"Subtarget->isSVEAvailable() ||"
"(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSSVE_FEXPA())">,
AssemblerPredicateWithAll<(any_of FeatureSVE, FeatureSSVE_FEXPA),
"sve or ssve-fexpa">;
def HasSVE2_or_SME
- : Predicate<"Subtarget->hasSVE2() || (Subtarget->isStreaming() && Subtarget->hasSME())">,
+ : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2() || Subtarget->hasSME())">,
AssemblerPredicateWithAll<(any_of FeatureSVE2, FeatureSME),
"sve2 or sme">;
-def HasSVE2_or_SME2
- : Predicate<"Subtarget->hasSVE2() || (Subtarget->isStreaming() && Subtarget->hasSME2())">,
+def HasNonStreamingSVE2_or_SME2
+ : Predicate<"(Subtarget->isSVEAvailable() && Subtarget->hasSVE2()) ||"
+ "(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSME2())">,
AssemblerPredicateWithAll<(any_of FeatureSVE2, FeatureSME2),
"sve2 or sme2">;
def HasNonStreamingSVE2_or_SSVE_AES
@@ -274,17 +275,22 @@ def HasNonStreamingSVE2_or_SSVE_AES
"sve2 or ssve-aes">;
def HasSVE2p1_or_SME
- : Predicate<"Subtarget->hasSVE2p1() || (Subtarget->isStreaming() && Subtarget->hasSME())">,
+ : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p1() || Subtarget->hasSME())">,
AssemblerPredicateWithAll<(any_of FeatureSME, FeatureSVE2p1),
"sme or sve2p1">;
def HasSVE2p1_or_SME2
- : Predicate<"Subtarget->hasSVE2p1() || (Subtarget->isStreaming() && Subtarget->hasSME2())">,
+ : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p1() || Subtarget->hasSME2())">,
AssemblerPredicateWithAll<(any_of FeatureSME2, FeatureSVE2p1),
"sme2 or sve2p1">;
def HasSVE2p1_or_SME2p1
- : Predicate<"Subtarget->hasSVE2p1() || (Subtarget->isStreaming() && Subtarget->hasSME2p1())">,
+ : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p1() || Subtarget->hasSME2p1())">,
AssemblerPredicateWithAll<(any_of FeatureSME2p1, FeatureSVE2p1),
"sme2p1 or sve2p1">;
+def HasSVE2p1_or_StreamingSME2
+ : Predicate<"(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSVE2p1()) ||"
+ "(Subtarget->isStreaming() && Subtarget->hasSME2())">,
+ AssemblerPredicateWithAll<(any_of FeatureSME2, FeatureSVE2p1),
+ "sme2 or sve2p1">;
def HasSVE2p2_or_SME2p2
: Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p2() || Subtarget->hasSME2p2())">,
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 2360e30de63b0..7628acb3f9d90 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -4154,11 +4154,6 @@ defm UDOT_ZZZ_HtoS : sve2p1_two_way_dot_vv<"udot", 0b1, int_aarch64_sve_udot_x2
defm SDOT_ZZZI_HtoS : sve2p1_two_way_dot_vvi<"sdot", 0b0, int_aarch64_sve_sdot_lane_x2>;
defm UDOT_ZZZI_HtoS : sve2p1_two_way_dot_vvi<"udot", 0b1, int_aarch64_sve_udot_lane_x2>;
-defm CNTP_XCI : sve2p1_pcount_pn<"cntp", 0b000>;
-defm PEXT_PCI : sve2p1_pred_as_ctr_to_mask<"pext", int_aarch64_sve_pext>;
-defm PEXT_2PCI : sve2p1_pred_as_ctr_to_mask_pair<"pext">;
-defm PTRUE_C : sve2p1_ptrue_pn<"ptrue">;
-
defm SQCVTN_Z2Z_StoH : sve2p1_multi_vec_extract_narrow<"sqcvtn", 0b00, int_aarch64_sve_sqcvtn_x2>;
defm UQCVTN_Z2Z_StoH : sve2p1_multi_vec_extract_narrow<"uqcvtn", 0b01, int_aarch64_sve_uqcvtn_x2>;
defm SQCVTUN_Z2Z_StoH : sve2p1_multi_vec_extract_narrow<"sqcvtun", 0b10, int_aarch64_sve_sqcvtun_x2>;
@@ -4166,6 +4161,22 @@ defm SQRSHRN_Z2ZI_StoH : sve2p1_multi_vec_shift_narrow<"sqrshrn", 0b101, int_aa
defm UQRSHRN_Z2ZI_StoH : sve2p1_multi_vec_shift_narrow<"uqrshrn", 0b111, int_aarch64_sve_uqrshrn_x2>;
defm SQRSHRUN_Z2ZI_StoH : sve2p1_multi_vec_shift_narrow<"sqrshrun", 0b001, int_aarch64_sve_sqrshrun_x2>;
+defm WHILEGE_2PXX : sve2p1_int_while_rr_pair<"whilege", 0b000>;
+defm WHILEGT_2PXX : sve2p1_int_while_rr_pair<"whilegt", 0b001>;
+defm WHILELT_2PXX : sve2p1_int_while_rr_pair<"whilelt", 0b010>;
+defm WHILELE_2PXX : sve2p1_int_while_rr_pair<"whilele", 0b011>;
+defm WHILEHS_2PXX : sve2p1_int_while_rr_pair<"whilehs", 0b100>;
+defm WHILEHI_2PXX : sve2p1_int_while_rr_pair<"whilehi", 0b101>;
+defm WHILELO_2PXX : sve2p1_int_while_rr_pair<"whilelo", 0b110>;
+defm WHILELS_2PXX : sve2p1_int_while_rr_pair<"whilels", 0b111>;
+} // End HasSVE2p1_or_SME2
+
+let Predicates = [HasSVE2p1_or_StreamingSME2] in {
+defm CNTP_XCI : sve2p1_pcount_pn<"cntp", 0b000>;
+defm PEXT_PCI : sve2p1_pred_as_ctr_to_mask<"pext", int_aarch64_sve_pext>;
+defm PEXT_2PCI : sve2p1_pred_as_ctr_to_mask_pair<"pext">;
+defm PTRUE_C : sve2p1_ptrue_pn<"ptrue">;
+
// Load to two registers
defm LD1B_2Z : sve2p1_mem_cld_ss_2z<"ld1b", 0b00, 0b0, ZZ_b_mul_r, GPR64shifted8, ZZ_b_strided_and_contiguous>;
defm LD1H_2Z : sve2p1_mem_cld_ss_2z<"ld1h", 0b01, 0b0, ZZ_h_mul_r, GPR64shifted16, ZZ_h_strided_and_contiguous>;
@@ -4289,14 +4300,6 @@ defm : store_pn_x4<nxv8bf16, int_aarch64_sve_stnt1_pn_x4, STNT1H_4Z_IMM>;
defm : store_pn_x4<nxv4f32, int_aarch64_sve_stnt1_pn_x4, STNT1W_4Z_IMM>;
defm : store_pn_x4<nxv2f64, int_aarch64_sve_stnt1_pn_x4, STNT1D_4Z_IMM>;
-defm WHILEGE_2PXX : sve2p1_int_while_rr_pair<"whilege", 0b000>;
-defm WHILEGT_2PXX : sve2p1_int_while_rr_pair<"whilegt", 0b001>;
-defm WHILELT_2PXX : sve2p1_int_while_rr_pair<"whilelt", 0b010>;
-defm WHILELE_2PXX : sve2p1_int_while_rr_pair<"whilele", 0b011>;
-defm WHILEHS_2PXX : sve2p1_int_while_rr_pair<"whilehs", 0b100>;
-defm WHILEHI_2PXX : sve2p1_int_while_rr_pair<"whilehi", 0b101>;
-defm WHILELO_2PXX : sve2p1_int_while_rr_pair<"whilelo", 0b110>;
-defm WHILELS_2PXX : sve2p1_int_while_rr_pair<"whilels", 0b111>;
defm WHILEGE_CXX : sve2p1_int_while_rr_pn<"whilege", 0b000>;
defm WHILEGT_CXX : sve2p1_int_while_rr_pn<"whilegt", 0b001>;
defm WHILELT_CXX : sve2p1_int_while_rr_pn<"whilelt", 0b010>;
@@ -4305,7 +4308,7 @@ defm WHILEHS_CXX : sve2p1_int_while_rr_pn<"whilehs", 0b100>;
defm WHILEHI_CXX : sve2p1_int_while_rr_pn<"whilehi", 0b101>;
defm WHILELO_CXX : sve2p1_int_while_rr_pn<"whilelo", 0b110>;
defm WHILELS_CXX : sve2p1_int_while_rr_pn<"whilels", 0b111>;
-} // End HasSVE2p1_or_SME2
+} // End HasSVE2p1_or_StreamingSME2
let Predicates = [HasSVE_or_SME] in {
@@ -4510,7 +4513,7 @@ let Predicates = [HasNonStreamingSVE2p2_or_SME2p2] in {
//===----------------------------------------------------------------------===//
// SVE2 FP8 instructions
//===----------------------------------------------------------------------===//
-let Predicates = [HasSVE2_or_SME2, HasFP8] in {
+let Predicates = [HasNonStreamingSVE2_or_SME2, HasFP8] in {
// FP8 upconvert
defm F1CVT_ZZ : sve2_fp8_cvt_single<0b0, 0b00, "f1cvt", nxv8f16, int_aarch64_sve_fp8_cvt1>;
defm F2CVT_ZZ : sve2_fp8_cvt_single<0b0, 0b01, "f2cvt", nxv8f16, int_aarch64_sve_fp8_cvt2>;
@@ -4527,15 +4530,15 @@ defm FCVTNB_Z2Z_StoB : sve2_fp8_down_cvt_single<0b01, "fcvtnb", ZZ_s_mul_r, nxv4
defm BFCVTN_Z2Z_HtoB : sve2_fp8_down_cvt_single<0b10, "bfcvtn", ZZ_h_mul_r, nxv8bf16, int_aarch64_sve_fp8_cvtn>;
defm FCVTNT_Z2Z_StoB : sve2_fp8_down_cvt_single_top<0b11, "fcvtnt", ZZ_s_mul_r, nxv4f32, int_aarch64_sve_fp8_cvtnt>;
-} // End HasSVE2_or_SME2, HasFP8
+} // End HasNonStreamingSVE2_or_SME2, HasFP8
-let Predicates = [HasSVE2_or_SME2, HasFAMINMAX] in {
+let Predicates = [HasNonStreamingSVE2_or_SME2, HasFAMINMAX] in {
defm FAMIN_ZPmZ : sve_fp_2op_p_zds<0b1111, "famin", "FAMIN_ZPZZ", int_aarch64_sve_famin, DestructiveBinaryComm>;
defm FAMAX_ZPmZ : sve_fp_2op_p_zds<0b1110, "famax", "FAMAX_ZPZZ", int_aarch64_sve_famax, DestructiveBinaryComm>;
defm FAMAX_ZPZZ : sve_fp_bin_pred_hfd<AArch64famax_p>;
defm FAMIN_ZPZZ : sve_fp_bin_pred_hfd<AArch64famin_p>;
-} // End HasSVE2_or_SME2, HasFAMINMAX
+} // End HasNonStreamingSVE2_or_SME2, HasFAMINMAX
let Predicates = [HasSSVE_FP8FMA] in {
// FP8 Widening Multiply-Add Long - Indexed Group
@@ -4579,14 +4582,14 @@ defm FDOT_ZZZI_BtoS : sve2_fp8_dot_indexed_s<"fdot", int_aarch64_sve_fp8_fdot_la
defm FDOT_ZZZ_BtoS : sve_fp8_dot<0b1, ZPR32, "fdot", nxv4f32, int_aarch64_sve_fp8_fdot>;
}
-let Predicates = [HasSVE2_or_SME2, HasLUT] in {
+let Predicates = [HasNonStreamingSVE2_or_SME2, HasLUT] in {
// LUTI2
defm LUTI2_ZZZI : sve2_luti2_vector_index<"luti2">;
// LUTI4
defm LUTI4_ZZZI : sve2_luti4_vector_index<"luti4">;
// LUTI4 (two contiguous registers)
defm LUTI4_Z2ZZI : sve2_luti4_vector_vg2_index<"luti4">;
-} // End HasSVE2_or_SME2, HasLUT
+} // End HasNonStreamingSVE2_or_SME2, HasLUT
//===----------------------------------------------------------------------===//
// Checked Pointer Arithmetic (FEAT_CPA)
diff --git a/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll b/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll
index a1093c28467ab..de9811b92424e 100644
--- a/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll
+++ b/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mattr=+sve2,+fp8 < %s | FileCheck %s
+; RUN: llc -mattr=+sve,+sme2,+fp8 < %s | FileCheck %s
; RUN: llc -mattr=+sme2,+fp8 --force-streaming < %s | FileCheck %s
target triple = "aarch64-linux"
diff --git a/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll b/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll
index 2ffba10e21100..e42f2b1cfba48 100644
--- a/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll
+++ b/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mattr=+sve2,+fp8 < %s | FileCheck %s
+; RUN: llc -mattr=+sve,+sme2,+fp8 < %s | FileCheck %s
; RUN: llc -mattr=+sme2,+fp8 --force-streaming < %s | FileCheck %s
target triple = "aarch64-linux"
diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll
index 021d4855905e7..fb837c4279f6e 100644
--- a/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll
+++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll
@@ -1,6 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
-; RUN: llc -mtriple=aarch64-linux-gnu -force-streaming -mattr=+ssve-fexpa < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+ssve-fexpa -force-streaming < %s | FileCheck %s
define <vscale x 8 x half> @fexpa_h(<vscale x 8 x i16> %a) {
; CHECK-LABEL: fexpa_h:
@@ -27,4 +27,4 @@ define <vscale x 2 x double> @fexpa_d(<vscale x 2 x i1> %pg, <vscale x 2 x i64>
; CHECK-NEXT: ret
%out = call <vscale x 2 x double> @llvm.aarch64.sve.fexpa.x.nxv2f64(<vscale x 2 x i64> %a)
ret <vscale x 2 x double> %out
-}
\ No newline at end of file
+}
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll
index e2483cff3d186..1ccb5264aa837 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
; ADDHNB
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll
index 2f7b82751cdcf..2b2c9da1a3063 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll
index 8d395adda0799..3a2a02f80a58b 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll
index 6005fb69ae1ba..27416375ad6af 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll
index fc08f2cdf94a9..317dea251937a 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll
@@ -1,6 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2-aes < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2,+sve-aes < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+ssve-aes < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme,+ssve-aes -force-streaming < %s | FileCheck %s
;
; AESD
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll
index 7d16f8383d968..9be3a67f88b09 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mattr=+sve,+sme2 < %s | FileCheck %s
; RUN: llc -mattr=+sme2 -force-streaming < %s | FileCheck %s
target triple = "aarch64-linux"
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll
index 16041766605e9..1ce7564167992 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll
index 52c04d614b4e1..d9a3a28ff9aa8 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll
index 6dc2c67b5fd9e..70824f88caef5 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll
index c46016e0c40de..c78a872fcfbf5 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll
index 5cea7536e1f3c..8e53a82401e0b 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll
@@ -1,5 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: llc < %s -...
[truncated]
9f98094 to ec57179
ping
LGTM, although I'm not an expert in this area. Just a couple of typos in the commit message: "meant some SME instructions where not available" and "seperate".
I've not gone through all of the instructions to make sure their predicates are (still) correct, but this does indeed look more in the spirit of the specification.
AssemblerPredicateWithAll<(any_of FeatureSVE2, FeatureSME),
"sve2 or sme">;
-def HasSVE2_or_SME2
- : Predicate<"Subtarget->hasSVE2() || (Subtarget->isStreaming() && Subtarget->hasSME2())">,
+def HasNonStreamingSVE2_or_SME2
nit: maybe move these above HasSVE2_or_SME to bundle them together with the other HasNonStreamingSVE*_or_SME*?
I've tried to order the combined predicates based on the required SVE feature so that, for example, all SVE2 features are kept together.
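For readers following the thread, here is a minimal sketch of how the old HasSVE2_or_SME2 condition and its replacement HasNonStreamingSVE2_or_SME2 evaluate. `SubtargetModel` is a hypothetical stand-in, not LLVM's `AArch64Subtarget`. The bodies of `isSVEAvailable()` and `isSVEorStreamingSVEAvailable()` are assumptions about what those queries mean, and feature implications (such as sve2 implying sve) are not modelled, so every flag is set explicitly. Only the two predicate expressions are copied from the diff.

```cpp
// Hypothetical, simplified model; not LLVM's AArch64Subtarget.
struct SubtargetModel {
  bool SVE, SVE2, SME, SME2, Streaming;

  constexpr bool hasSVE2() const { return SVE2; }
  constexpr bool hasSME2() const { return SME2; }
  constexpr bool isStreaming() const { return Streaming; }
  // Assumption: full SVE is usable only outside streaming mode.
  constexpr bool isSVEAvailable() const { return SVE && !Streaming; }
  // Assumption: either SVE is enabled, or SME is enabled and we are streaming.
  constexpr bool isSVEorStreamingSVEAvailable() const {
    return SVE || (SME && Streaming);
  }
};

// Condition of the removed HasSVE2_or_SME2 predicate.
constexpr bool oldHasSVE2_or_SME2(const SubtargetModel &S) {
  return S.hasSVE2() || (S.isStreaming() && S.hasSME2());
}

// Condition of the added HasNonStreamingSVE2_or_SME2 predicate.
constexpr bool newHasNonStreamingSVE2_or_SME2(const SubtargetModel &S) {
  return (S.isSVEAvailable() && S.hasSVE2()) ||
         (S.isSVEorStreamingSVEAvailable() && S.hasSME2());
}

// Field order: SVE, SVE2, SME, SME2, Streaming.
// Models -mattr=+sve,+sme2 (the RUN line added to the fp8 and faminmax tests).
constexpr SubtargetModel SveSme2NonStreaming{true, false, true, true, false};
// Models -mattr=+sme2 -force-streaming.
constexpr SubtargetModel Sme2Streaming{false, false, true, true, true};
// Models -mattr=+sve2, with the implied sve flag set by hand.
constexpr SubtargetModel Sve2NonStreaming{true, true, false, false, false};

static_assert(!oldHasSVE2_or_SME2(SveSme2NonStreaming),
              "old predicate rejects +sve,+sme2 outside streaming mode");
static_assert(newHasNonStreamingSVE2_or_SME2(SveSme2NonStreaming),
              "new predicate accepts +sve,+sme2 outside streaming mode");
static_assert(newHasNonStreamingSVE2_or_SME2(Sme2Streaming),
              "streaming +sme2 is still accepted");
static_assert(newHasNonStreamingSVE2_or_SME2(Sve2NonStreaming),
              "non-streaming +sve2 is still accepted");
```

In this model the only configuration whose result changes among the three is non-streaming "+sve,+sme2", which is the combination the new RUN lines exercise.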
Code generation predicates like HasSVE2_or_SME implemented a strict divide between streaming and non-streaming which meant some SME instructions were not available unless a matching SVE feature was enabled.
As a specific example, to enable multi-register WHILE instructions in non-streaming mode a user must enable "sve2p1", when using "sme2" should be sufficient.
This PR separates the streaming/non-streaming requirement from a feature's SVE/SME designation, which in most cases means "+sveX[pY]" and "+sve,+smeV[pW]" can be used interchangeably when an instruction is available via "+sveX[pY]" or "+smeV[pW]".
NOTE: In some instances this means the compiler will support unsupported configurations, which is fine.
NOTE: This PR does not fix all the predicates as I plan to follow up with other PRs to relax the crypto, bitperm and fp8 features.
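
The WHILE example from the message can be checked with a small sketch of the three SVE2p1/SME2 conditions in the diff. `SubtargetSketch` is a hypothetical model rather than LLVM's `AArch64Subtarget`. The definition of `isSVEorStreamingSVEAvailable()` is an assumption, and implied features are set by hand. The predicate expressions themselves are taken from the diff.

```cpp
// Hypothetical, simplified model; not LLVM's AArch64Subtarget.
struct SubtargetSketch {
  bool SVE, SVE2p1, SME, SME2, Streaming;

  constexpr bool hasSVE2p1() const { return SVE2p1; }
  constexpr bool hasSME2() const { return SME2; }
  constexpr bool isStreaming() const { return Streaming; }
  // Assumption: either SVE is enabled, or SME is enabled and we are streaming.
  constexpr bool isSVEorStreamingSVEAvailable() const {
    return SVE || (SME && Streaming);
  }
};

// HasSVE2p1_or_SME2 before the patch.
constexpr bool oldHasSVE2p1_or_SME2(const SubtargetSketch &S) {
  return S.hasSVE2p1() || (S.isStreaming() && S.hasSME2());
}

// HasSVE2p1_or_SME2 after the patch.
constexpr bool newHasSVE2p1_or_SME2(const SubtargetSketch &S) {
  return S.isSVEorStreamingSVEAvailable() && (S.hasSVE2p1() || S.hasSME2());
}

// HasSVE2p1_or_StreamingSME2, added by the patch.
constexpr bool hasSVE2p1_or_StreamingSME2(const SubtargetSketch &S) {
  return (S.isSVEorStreamingSVEAvailable() && S.hasSVE2p1()) ||
         (S.isStreaming() && S.hasSME2());
}

// Field order: SVE, SVE2p1, SME, SME2, Streaming.
// Models "+sve,+sme2" in non-streaming mode.
constexpr SubtargetSketch SveSme2NonStreaming{true, false, true, true, false};
// Models "+sme2" in streaming mode.
constexpr SubtargetSketch Sme2Streaming{false, false, true, true, true};

static_assert(!oldHasSVE2p1_or_SME2(SveSme2NonStreaming),
              "before: sme2 alone is not enough outside streaming mode");
static_assert(newHasSVE2p1_or_SME2(SveSme2NonStreaming),
              "after: +sve,+sme2 satisfies the relaxed predicate");
static_assert(!hasSVE2p1_or_StreamingSME2(SveSme2NonStreaming),
              "the StreamingSME2 variant still needs sve2p1 or streaming");
static_assert(hasSVE2p1_or_StreamingSME2(Sme2Streaming),
              "streaming +sme2 satisfies the StreamingSME2 variant");
```

Under these assumptions, the WHILE*_2PXX definitions that the diff moves under HasSVE2p1_or_SME2 become selectable with "+sve,+sme2" outside streaming mode, while definitions placed under HasSVE2p1_or_StreamingSME2 (for example PEXT_PCI and PTRUE_C) still require "sve2p1" or streaming mode.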