[LLVM][AArch64] Relax SVE/SME codegen predicates. #145322

paulwalker-arm · 2025-06-23T12:49:44Z

Code generation predicates like HasSVE2_or_SME implemented a strict divide between streaming and non-streaming mode, which meant some SME instructions were not available unless a matching SVE feature was enabled.

As a specific example, to enable multi-register WHILE instructions in non-streaming mode a user must enable "sve2p1", when "sme2" should be sufficient.

This PR separates the streaming/non-streaming requirement from a feature's SVE/SME designation, which in most cases means "+sveX[pY]" and "+sve,+smeV[pW]" can be used interchangeably when an instruction is available via "+sveX[pY]" or "+smeV[pW]".

NOTE: In some instances this means the compiler will accept feature configurations that are otherwise unsupported, which is fine.

NOTE: This PR does not fix all the predicates as I plan to follow up with other PRs to relax the crypto, bitperm and fp8 features.
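
The effect described above can be sketched with a small compile-time model of one predicate, HasSVE2p1_or_SME2, before and after the change. The two predicate bodies are copied from the patch; `SubtargetModel`, its field names, and the simplified `isSVEorStreamingSVEAvailable` definition are assumptions made for illustration and are not the real `AArch64Subtarget` class.

```cpp
// Hypothetical stand-in for the subtarget queries used by the predicate strings.
// Field order: SVE, SVE2p1, SME, SME2, Streaming.
struct SubtargetModel {
  bool SVE, SVE2p1, SME, SME2, Streaming;

  constexpr bool hasSVE2p1() const { return SVE2p1; }
  constexpr bool hasSME2() const { return SME2; }
  constexpr bool isStreaming() const { return Streaming; }
  // Assumption: SVE-style instructions are usable with +sve, or with +sme
  // while the function is in streaming mode.
  constexpr bool isSVEorStreamingSVEAvailable() const {
    return SVE || (SME && Streaming);
  }
};

// Predicate string before the patch.
constexpr bool oldHasSVE2p1_or_SME2(SubtargetModel ST) {
  return ST.hasSVE2p1() || (ST.isStreaming() && ST.hasSME2());
}

// Predicate string after the patch.
constexpr bool newHasSVE2p1_or_SME2(SubtargetModel ST) {
  return ST.isSVEorStreamingSVEAvailable() &&
         (ST.hasSVE2p1() || ST.hasSME2());
}

// Models -mattr=+sve,+sme2 for a non-streaming function.
constexpr SubtargetModel SveSme2NonStreaming = {true, false, true, true, false};

static_assert(!oldHasSVE2p1_or_SME2(SveSme2NonStreaming),
              "old predicate rejects +sve,+sme2 outside streaming mode");
static_assert(newHasSVE2p1_or_SME2(SveSme2NonStreaming),
              "new predicate accepts +sve,+sme2 outside streaming mode");
```

In this model the only thing that changes for "+sve,+sme2" is that the streaming check no longer gates the SME2 half of the disjunction; configurations such as "+sve2p1" or streaming "+sme2" are accepted by both versions.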

llvmbot · 2025-06-24T17:40:42Z

@llvm/pr-subscribers-backend-aarch64

Author: Paul Walker (paulwalker-arm)

Changes

Patch is 45.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145322.diff

50 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64.td (+4-4)
(modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+14-8)
(modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+23-20)
(modified) llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll (+1)
(modified) llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll (+2-2)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll (+2)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll (+3-1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-non-widening-pairwise-arith.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-polynomial-arithmetic-128.ll (+3)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-polynomial-arithmetic.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-psel.ll (+2)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-revd.ll (+2)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-unary-narrowing.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-uniform-complex-arith.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-while-reversed.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-while.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-widening-complex-int-arith.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-widening-dsp.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-widening-pairwise-arith.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfmlsl.ll (+4-1)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-cntp.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-dots.ll (+2)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-dupq.ll (+4-1)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-extq.ll (+4-1)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-fclamp.ll (+3)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-fp-reduce.ll (+2)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-int-reduce.ll (+2)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-loads.ll (+3-2)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-multivec-loads.ll (+4-1)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-multivec-stores.ll (+4-1)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-pmov-to-pred.ll (+3)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-pmov-to-vector.ll (+3)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-predicate-as-counter.ll (+1)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-qcvtn.ll (+3)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-qrshr.ll (+3)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-sclamp.ll (+2)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-stores.ll (+3-2)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-uclamp.ll (+2)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-while-pn.ll (+1-1)
(modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-while-pp.ll (+3)

diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td
index eb5a5199b8951..9634937991860 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -58,11 +58,11 @@ include "AArch64SystemOperands.td"
 
 class AArch64Unsupported { list<Predicate> F; }
 
-let F = [HasSVE2p1, HasSVE2p1_or_SME2, HasSVE2p1_or_SME2p1] in
+let F = [HasSVE2p1, HasSVE2p1_or_SME2, HasSVE2p1_or_StreamingSME2, HasSVE2p1_or_SME2p1] in
 def SVE2p1Unsupported : AArch64Unsupported;
 
 def SVE2Unsupported : AArch64Unsupported {
-  let F = !listconcat([HasSVE2, HasSVE2_or_SME, HasSVE2_or_SME2, HasSSVE_FP8FMA, HasSMEF8F16,
+  let F = !listconcat([HasSVE2, HasSVE2_or_SME, HasNonStreamingSVE2_or_SME2, HasSSVE_FP8FMA, HasSMEF8F16,
                        HasSMEF8F32, HasSVEAES, HasSVE2SHA3, HasSVE2SM4, HasSVEBitPerm,
                        HasSVEB16B16],
                        SVE2p1Unsupported.F);
@@ -85,9 +85,9 @@ def SME2p1Unsupported : AArch64Unsupported {
 }
 
 def SME2Unsupported : AArch64Unsupported {
-  let F = !listconcat([HasSME2, HasSVE2_or_SME2, HasSVE2p1_or_SME2, HasSSVE_FP8FMA,
+  let F = !listconcat([HasSME2, HasNonStreamingSVE2_or_SME2, HasSVE2p1_or_SME2, HasSSVE_FP8FMA,
                       HasSMEF8F16, HasSMEF8F32, HasSMEF16F16_or_SMEF8F16, HasSMEB16B16,
-                      HasNonStreamingSVE2_or_SSVE_AES],
+                      HasNonStreamingSVE2_or_SSVE_AES, HasSVE2p1_or_StreamingSME2],
                       SME2p1Unsupported.F);
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 0f3f24f0853c9..9d57711e84cde 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -249,22 +249,23 @@ def HasSVE_or_SME
                 AssemblerPredicateWithAll<(any_of FeatureSVE, FeatureSME),
                 "sve or sme">;
 def HasNonStreamingSVE_or_SME2p2
-    : Predicate<"(Subtarget->isSVEAvailable() && Subtarget->hasSVE()) ||"
+    : Predicate<"Subtarget->isSVEAvailable() ||"
                 "(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSME2p2())">,
                 AssemblerPredicateWithAll<(any_of FeatureSVE, FeatureSME2p2),
                 "sve or sme2p2">;
 def HasNonStreamingSVE_or_SSVE_FEXPA
-    : Predicate<"(Subtarget->isSVEAvailable() && Subtarget->hasSVE()) ||"
+    : Predicate<"Subtarget->isSVEAvailable() ||"
                 "(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSSVE_FEXPA())">,
                 AssemblerPredicateWithAll<(any_of FeatureSVE, FeatureSSVE_FEXPA),
                 "sve or ssve-fexpa">;
 
 def HasSVE2_or_SME
-    : Predicate<"Subtarget->hasSVE2() || (Subtarget->isStreaming() && Subtarget->hasSME())">,
+    : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2() || Subtarget->hasSME())">,
                 AssemblerPredicateWithAll<(any_of FeatureSVE2, FeatureSME),
                 "sve2 or sme">;
-def HasSVE2_or_SME2
-    : Predicate<"Subtarget->hasSVE2() || (Subtarget->isStreaming() && Subtarget->hasSME2())">,
+def HasNonStreamingSVE2_or_SME2
+    : Predicate<"(Subtarget->isSVEAvailable() && Subtarget->hasSVE2()) ||"
+                "(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSME2())">,
                 AssemblerPredicateWithAll<(any_of FeatureSVE2, FeatureSME2),
                 "sve2 or sme2">;
 def HasNonStreamingSVE2_or_SSVE_AES
@@ -274,17 +275,22 @@ def HasNonStreamingSVE2_or_SSVE_AES
                 "sve2 or ssve-aes">;
 
 def HasSVE2p1_or_SME
-    : Predicate<"Subtarget->hasSVE2p1() || (Subtarget->isStreaming() && Subtarget->hasSME())">,
+    : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p1() || Subtarget->hasSME())">,
                 AssemblerPredicateWithAll<(any_of FeatureSME, FeatureSVE2p1),
                 "sme or sve2p1">;
 def HasSVE2p1_or_SME2
-    : Predicate<"Subtarget->hasSVE2p1() || (Subtarget->isStreaming() && Subtarget->hasSME2())">,
+    : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p1() || Subtarget->hasSME2())">,
                 AssemblerPredicateWithAll<(any_of FeatureSME2, FeatureSVE2p1),
                 "sme2 or sve2p1">;
 def HasSVE2p1_or_SME2p1
-    : Predicate<"Subtarget->hasSVE2p1() || (Subtarget->isStreaming() && Subtarget->hasSME2p1())">,
+    : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p1() || Subtarget->hasSME2p1())">,
                 AssemblerPredicateWithAll<(any_of FeatureSME2p1, FeatureSVE2p1),
                 "sme2p1 or sve2p1">;
+def HasSVE2p1_or_StreamingSME2
+    : Predicate<"(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSVE2p1()) ||"
+                "(Subtarget->isStreaming() && Subtarget->hasSME2())">,
+                AssemblerPredicateWithAll<(any_of FeatureSME2, FeatureSVE2p1),
+                "sme2 or sve2p1">;
 
 def HasSVE2p2_or_SME2p2
     : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p2() || Subtarget->hasSME2p2())">,
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 2360e30de63b0..7628acb3f9d90 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -4154,11 +4154,6 @@ defm UDOT_ZZZ_HtoS  : sve2p1_two_way_dot_vv<"udot", 0b1, int_aarch64_sve_udot_x2
 defm SDOT_ZZZI_HtoS : sve2p1_two_way_dot_vvi<"sdot", 0b0, int_aarch64_sve_sdot_lane_x2>;
 defm UDOT_ZZZI_HtoS : sve2p1_two_way_dot_vvi<"udot", 0b1, int_aarch64_sve_udot_lane_x2>;
 
-defm CNTP_XCI : sve2p1_pcount_pn<"cntp", 0b000>;
-defm PEXT_PCI : sve2p1_pred_as_ctr_to_mask<"pext", int_aarch64_sve_pext>;
-defm PEXT_2PCI : sve2p1_pred_as_ctr_to_mask_pair<"pext">;
-defm PTRUE_C  : sve2p1_ptrue_pn<"ptrue">;
-
 defm SQCVTN_Z2Z_StoH  : sve2p1_multi_vec_extract_narrow<"sqcvtn", 0b00, int_aarch64_sve_sqcvtn_x2>;
 defm UQCVTN_Z2Z_StoH  : sve2p1_multi_vec_extract_narrow<"uqcvtn", 0b01, int_aarch64_sve_uqcvtn_x2>;
 defm SQCVTUN_Z2Z_StoH : sve2p1_multi_vec_extract_narrow<"sqcvtun", 0b10, int_aarch64_sve_sqcvtun_x2>;
@@ -4166,6 +4161,22 @@ defm SQRSHRN_Z2ZI_StoH  : sve2p1_multi_vec_shift_narrow<"sqrshrn", 0b101, int_aa
 defm UQRSHRN_Z2ZI_StoH  : sve2p1_multi_vec_shift_narrow<"uqrshrn", 0b111, int_aarch64_sve_uqrshrn_x2>;
 defm SQRSHRUN_Z2ZI_StoH : sve2p1_multi_vec_shift_narrow<"sqrshrun", 0b001, int_aarch64_sve_sqrshrun_x2>;
 
+defm WHILEGE_2PXX : sve2p1_int_while_rr_pair<"whilege", 0b000>;
+defm WHILEGT_2PXX : sve2p1_int_while_rr_pair<"whilegt", 0b001>;
+defm WHILELT_2PXX : sve2p1_int_while_rr_pair<"whilelt", 0b010>;
+defm WHILELE_2PXX : sve2p1_int_while_rr_pair<"whilele", 0b011>;
+defm WHILEHS_2PXX : sve2p1_int_while_rr_pair<"whilehs", 0b100>;
+defm WHILEHI_2PXX : sve2p1_int_while_rr_pair<"whilehi", 0b101>;
+defm WHILELO_2PXX : sve2p1_int_while_rr_pair<"whilelo", 0b110>;
+defm WHILELS_2PXX : sve2p1_int_while_rr_pair<"whilels", 0b111>;
+} // End HasSVE2p1_or_SME2
+
+let Predicates = [HasSVE2p1_or_StreamingSME2] in {
+defm CNTP_XCI : sve2p1_pcount_pn<"cntp", 0b000>;
+defm PEXT_PCI : sve2p1_pred_as_ctr_to_mask<"pext", int_aarch64_sve_pext>;
+defm PEXT_2PCI : sve2p1_pred_as_ctr_to_mask_pair<"pext">;
+defm PTRUE_C  : sve2p1_ptrue_pn<"ptrue">;
+
 // Load to two registers
 defm LD1B_2Z       : sve2p1_mem_cld_ss_2z<"ld1b", 0b00, 0b0, ZZ_b_mul_r, GPR64shifted8, ZZ_b_strided_and_contiguous>;
 defm LD1H_2Z       : sve2p1_mem_cld_ss_2z<"ld1h", 0b01, 0b0, ZZ_h_mul_r, GPR64shifted16, ZZ_h_strided_and_contiguous>;
@@ -4289,14 +4300,6 @@ defm : store_pn_x4<nxv8bf16, int_aarch64_sve_stnt1_pn_x4, STNT1H_4Z_IMM>;
 defm : store_pn_x4<nxv4f32, int_aarch64_sve_stnt1_pn_x4, STNT1W_4Z_IMM>;
 defm : store_pn_x4<nxv2f64, int_aarch64_sve_stnt1_pn_x4, STNT1D_4Z_IMM>;
 
-defm WHILEGE_2PXX : sve2p1_int_while_rr_pair<"whilege", 0b000>;
-defm WHILEGT_2PXX : sve2p1_int_while_rr_pair<"whilegt", 0b001>;
-defm WHILELT_2PXX : sve2p1_int_while_rr_pair<"whilelt", 0b010>;
-defm WHILELE_2PXX : sve2p1_int_while_rr_pair<"whilele", 0b011>;
-defm WHILEHS_2PXX : sve2p1_int_while_rr_pair<"whilehs", 0b100>;
-defm WHILEHI_2PXX : sve2p1_int_while_rr_pair<"whilehi", 0b101>;
-defm WHILELO_2PXX : sve2p1_int_while_rr_pair<"whilelo", 0b110>;
-defm WHILELS_2PXX : sve2p1_int_while_rr_pair<"whilels", 0b111>;
 defm WHILEGE_CXX  : sve2p1_int_while_rr_pn<"whilege", 0b000>;
 defm WHILEGT_CXX  : sve2p1_int_while_rr_pn<"whilegt", 0b001>;
 defm WHILELT_CXX  : sve2p1_int_while_rr_pn<"whilelt", 0b010>;
@@ -4305,7 +4308,7 @@ defm WHILEHS_CXX  : sve2p1_int_while_rr_pn<"whilehs", 0b100>;
 defm WHILEHI_CXX  : sve2p1_int_while_rr_pn<"whilehi", 0b101>;
 defm WHILELO_CXX  : sve2p1_int_while_rr_pn<"whilelo", 0b110>;
 defm WHILELS_CXX  : sve2p1_int_while_rr_pn<"whilels", 0b111>;
-} // End HasSVE2p1_or_SME2
+} // End HasSVE2p1_or_StreamingSME2
 
 let Predicates = [HasSVE_or_SME] in {
 
@@ -4510,7 +4513,7 @@ let Predicates = [HasNonStreamingSVE2p2_or_SME2p2] in {
 //===----------------------------------------------------------------------===//
 // SVE2 FP8 instructions
 //===----------------------------------------------------------------------===//
-let Predicates = [HasSVE2_or_SME2, HasFP8] in {
+let Predicates = [HasNonStreamingSVE2_or_SME2, HasFP8] in {
 // FP8 upconvert
 defm F1CVT_ZZ     : sve2_fp8_cvt_single<0b0, 0b00, "f1cvt",    nxv8f16,  int_aarch64_sve_fp8_cvt1>;
 defm F2CVT_ZZ     : sve2_fp8_cvt_single<0b0, 0b01, "f2cvt",    nxv8f16,  int_aarch64_sve_fp8_cvt2>;
@@ -4527,15 +4530,15 @@ defm FCVTNB_Z2Z_StoB : sve2_fp8_down_cvt_single<0b01, "fcvtnb", ZZ_s_mul_r, nxv4
 defm BFCVTN_Z2Z_HtoB : sve2_fp8_down_cvt_single<0b10, "bfcvtn", ZZ_h_mul_r, nxv8bf16, int_aarch64_sve_fp8_cvtn>;
 
 defm FCVTNT_Z2Z_StoB : sve2_fp8_down_cvt_single_top<0b11, "fcvtnt", ZZ_s_mul_r, nxv4f32,  int_aarch64_sve_fp8_cvtnt>;
-} // End HasSVE2_or_SME2, HasFP8
+} // End HasNonStreamingSVE2_or_SME2, HasFP8
 
-let Predicates = [HasSVE2_or_SME2, HasFAMINMAX] in {
+let Predicates = [HasNonStreamingSVE2_or_SME2, HasFAMINMAX] in {
 defm FAMIN_ZPmZ : sve_fp_2op_p_zds<0b1111, "famin", "FAMIN_ZPZZ", int_aarch64_sve_famin, DestructiveBinaryComm>;
 defm FAMAX_ZPmZ : sve_fp_2op_p_zds<0b1110, "famax", "FAMAX_ZPZZ", int_aarch64_sve_famax, DestructiveBinaryComm>;
 
 defm FAMAX_ZPZZ : sve_fp_bin_pred_hfd<AArch64famax_p>;
 defm FAMIN_ZPZZ : sve_fp_bin_pred_hfd<AArch64famin_p>;
-} // End HasSVE2_or_SME2, HasFAMINMAX
+} // End HasNonStreamingSVE2_or_SME2, HasFAMINMAX
 
 let Predicates = [HasSSVE_FP8FMA] in {
 // FP8 Widening Multiply-Add Long - Indexed Group
@@ -4579,14 +4582,14 @@ defm FDOT_ZZZI_BtoS : sve2_fp8_dot_indexed_s<"fdot", int_aarch64_sve_fp8_fdot_la
 defm FDOT_ZZZ_BtoS : sve_fp8_dot<0b1, ZPR32, "fdot", nxv4f32, int_aarch64_sve_fp8_fdot>;
 }
 
-let Predicates = [HasSVE2_or_SME2, HasLUT] in {
+let Predicates = [HasNonStreamingSVE2_or_SME2, HasLUT] in {
 // LUTI2
   defm LUTI2_ZZZI : sve2_luti2_vector_index<"luti2">;
 // LUTI4
   defm LUTI4_ZZZI   : sve2_luti4_vector_index<"luti4">;
 // LUTI4 (two contiguous registers)
   defm LUTI4_Z2ZZI  : sve2_luti4_vector_vg2_index<"luti4">;
-} // End HasSVE2_or_SME2, HasLUT
+} // End HasNonStreamingSVE2_or_SME2, HasLUT
 
 //===----------------------------------------------------------------------===//
 // Checked Pointer Arithmetic (FEAT_CPA)
diff --git a/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll b/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll
index a1093c28467ab..de9811b92424e 100644
--- a/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll
+++ b/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -mattr=+sve2,+fp8 < %s | FileCheck %s
+; RUN: llc -mattr=+sve,+sme2,+fp8 < %s | FileCheck %s
 ; RUN: llc -mattr=+sme2,+fp8 --force-streaming < %s | FileCheck %s
 
 target triple = "aarch64-linux"
diff --git a/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll b/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll
index 2ffba10e21100..e42f2b1cfba48 100644
--- a/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll
+++ b/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -mattr=+sve2,+fp8 < %s | FileCheck %s
+; RUN: llc -mattr=+sve,+sme2,+fp8 < %s | FileCheck %s
 ; RUN: llc -mattr=+sme2,+fp8 --force-streaming < %s | FileCheck %s
 
 target triple = "aarch64-linux"
diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll
index 021d4855905e7..fb837c4279f6e 100644
--- a/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll
+++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
-; RUN: llc -mtriple=aarch64-linux-gnu -force-streaming -mattr=+ssve-fexpa < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+ssve-fexpa -force-streaming < %s | FileCheck %s
 
 define <vscale x 8 x half> @fexpa_h(<vscale x 8 x i16> %a) {
 ; CHECK-LABEL: fexpa_h:
@@ -27,4 +27,4 @@ define <vscale x 2 x double> @fexpa_d(<vscale x 2 x i1> %pg, <vscale x 2 x i64>
 ; CHECK-NEXT:    ret
   %out = call <vscale x 2 x double> @llvm.aarch64.sve.fexpa.x.nxv2f64(<vscale x 2 x i64> %a)
   ret <vscale x 2 x double> %out
-}
\ No newline at end of file
+}
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll
index e2483cff3d186..1ccb5264aa837 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ; ADDHNB
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll
index 2f7b82751cdcf..2b2c9da1a3063 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll
index 8d395adda0799..3a2a02f80a58b 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll
index 6005fb69ae1ba..27416375ad6af 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll
index fc08f2cdf94a9..317dea251937a 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2-aes < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2,+sve-aes < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+ssve-aes < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme,+ssve-aes -force-streaming < %s | FileCheck %s
 
 ;
 ; AESD
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll
index 7d16f8383d968..9be3a67f88b09 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
 ; RUN: llc -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mattr=+sve,+sme2 < %s | FileCheck %s
 ; RUN: llc -mattr=+sme2 -force-streaming < %s | FileCheck %s
 
 target triple = "aarch64-linux"
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll
index 16041766605e9..1ce7564167992 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll
index 52c04d614b4e1..d9a3a28ff9aa8 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll
index 6dc2c67b5fd9e..70824f88caef5 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll
index c46016e0c40de..c78a872fcfbf5 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll
index 5cea7536e1f3c..8e53a82401e0b 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll
@@ -1,5 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: llc < %s -...
[truncated]
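
The two predicates introduced in the diff above, HasNonStreamingSVE2_or_SME2 and HasSVE2p1_or_StreamingSME2, place the streaming restriction on opposite sides of the disjunction. The following is a minimal sketch of how they behave for a few feature sets. The predicate bodies follow the strings in the patch; `SubtargetModel`, the named configurations, and the simplified availability helpers (which ignore streaming-compatible functions) are assumptions for illustration only.

```cpp
// Hypothetical model of the subtarget state; not the real AArch64Subtarget.
// Field order: SVE, SVE2, SVE2p1, SME, SME2, Streaming.
struct SubtargetModel {
  bool SVE, SVE2, SVE2p1, SME, SME2, Streaming;

  constexpr bool isStreaming() const { return Streaming; }
  // Assumption: the full SVE instruction set needs +sve outside streaming mode.
  constexpr bool isSVEAvailable() const { return SVE && !Streaming; }
  // Assumption: the streaming-legal subset needs +sve, or +sme while streaming.
  constexpr bool isSVEorStreamingSVEAvailable() const {
    return SVE || (SME && Streaming);
  }
};

// SVE2 side is restricted to non-streaming mode; SME2 side works in both modes.
constexpr bool hasNonStreamingSVE2_or_SME2(SubtargetModel ST) {
  return (ST.isSVEAvailable() && ST.SVE2) ||
         (ST.isSVEorStreamingSVEAvailable() && ST.SME2);
}

// SVE2p1 side works in both modes; SME2 side is restricted to streaming mode.
constexpr bool hasSVE2p1_or_StreamingSME2(SubtargetModel ST) {
  return (ST.isSVEorStreamingSVEAvailable() && ST.SVE2p1) ||
         (ST.isStreaming() && ST.SME2);
}

// Feature sets are spelled out explicitly; implied features are set by hand.
constexpr SubtargetModel Sve2 = {true, true, false, false, false, false};
constexpr SubtargetModel SveSme2 = {true, false, false, true, true, false};
constexpr SubtargetModel Sme2Streaming = {false, false, false, true, true, true};
constexpr SubtargetModel Sve2SmeStreaming = {true, true, false, true, false, true};

static_assert(hasNonStreamingSVE2_or_SME2(Sve2), "+sve2, non-streaming");
static_assert(hasNonStreamingSVE2_or_SME2(SveSme2), "+sve,+sme2, non-streaming");
static_assert(hasNonStreamingSVE2_or_SME2(Sme2Streaming), "+sme2, streaming");
static_assert(!hasNonStreamingSVE2_or_SME2(Sve2SmeStreaming),
              "+sve2,+sme while streaming lacks sme2");
static_assert(!hasSVE2p1_or_StreamingSME2(SveSme2),
              "+sve,+sme2 outside streaming mode is still rejected here");
static_assert(hasSVE2p1_or_StreamingSME2(Sme2Streaming), "+sme2, streaming");
```

The first three assertions correspond to the three RUN-line styles used by the updated tests ("+sve2", "+sve,+sme2", and "+sme2" with forced streaming). The fifth shows that instructions left under HasSVE2p1_or_StreamingSME2 are not relaxed for non-streaming "+sve,+sme2" in this model.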

paulwalker-arm · 2025-07-01T09:50:32Z

ping

jthackray

LGTM, although I'm not an expert in this area. Just a couple of typos in the commit message: "meant some SME instructions where not available" and "seperate".

sdesmalen-arm

I've not gone through all of the instructions to make sure their predicates are (still) correct, but this does indeed look more in the spirit of the specification.
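
One way to gain some confidence in an individual predicate rewrite, short of auditing every instruction, is to enumerate feature configurations and compare the old and new predicate strings. The sketch below does this for HasSVE2_or_SME only. The bitmask encoding, the validity rules ("+sve2" implies "+sve", streaming requires "+sme"), and the availability helper are assumptions of this model rather than anything taken from LLVM.

```cpp
// Hypothetical bit layout for a configuration mask.
constexpr unsigned SVE = 1u, SVE2 = 2u, SME = 4u, Streaming = 8u;

constexpr bool has(unsigned Mask, unsigned Bit) { return (Mask & Bit) != 0; }

// Assumption: +sve2 implies +sve, and streaming mode requires +sme.
constexpr bool isValid(unsigned M) {
  return (!has(M, SVE2) || has(M, SVE)) && (!has(M, Streaming) || has(M, SME));
}

// Assumption: simplified form of the availability query used by the patch.
constexpr bool isSVEorStreamingSVEAvailable(unsigned M) {
  return has(M, SVE) || (has(M, SME) && has(M, Streaming));
}

// HasSVE2_or_SME before the patch.
constexpr bool oldHasSVE2_or_SME(unsigned M) {
  return has(M, SVE2) || (has(M, Streaming) && has(M, SME));
}

// HasSVE2_or_SME after the patch.
constexpr bool newHasSVE2_or_SME(unsigned M) {
  return isSVEorStreamingSVEAvailable(M) && (has(M, SVE2) || has(M, SME));
}

// Counts valid configurations in [M, 16) where the two predicates disagree.
constexpr unsigned countDifferences(unsigned M) {
  return M == 16u
             ? 0u
             : ((isValid(M) && oldHasSVE2_or_SME(M) != newHasSVE2_or_SME(M))
                    ? 1u
                    : 0u) +
                   countDifferences(M + 1u);
}

static_assert(countDifferences(0u) == 1u,
              "exactly one modelled configuration changes behaviour");
static_assert(!oldHasSVE2_or_SME(SVE | SME) && newHasSVE2_or_SME(SVE | SME),
              "the changed configuration is non-streaming +sve,+sme");
```

Under these assumptions the only configuration whose result changes is non-streaming "+sve,+sme", which matches the RUN line added to the sve2-intrinsics tests.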

sdesmalen-arm · 2025-07-01T10:24:15Z

llvm/lib/Target/AArch64/AArch64InstrInfo.td

                AssemblerPredicateWithAll<(any_of FeatureSVE2, FeatureSME),
                "sve2 or sme">;
-def HasSVE2_or_SME2
-    : Predicate<"Subtarget->hasSVE2() || (Subtarget->isStreaming() && Subtarget->hasSME2())">,
+def HasNonStreamingSVE2_or_SME2


nit: maybe move these above HasSVE2_or_SME to bundle them together with the other HasNonStreamingSVE*_or_SME* predicates?

paulwalker-arm · 2025-07-02

I've tried to order the combined predicates based on the required SVE feature so that, for example, all SVE2 features are kept together.

paulwalker-arm mentioned this pull request Jun 23, 2025

[AArch64] Fix predicates for SME2p2. #145315

Closed

sdesmalen-arm mentioned this pull request Jun 23, 2025

[AArch64][CostModel] Lower cost of dupq (SVE2.1) #144918

Merged

paulwalker-arm force-pushed the sve-feature-flags branch from 377be5e to 9f98094 Compare June 24, 2025 16:21

paulwalker-arm changed the title ~~[WIP] fix up feature flags~~ [LLVM][AArch64] Relax SVE/SME codegen predicates. Jun 24, 2025

paulwalker-arm requested review from jthackray, sdesmalen-arm, Lukacma and CarolineConcatto June 24, 2025 17:39

paulwalker-arm marked this pull request as ready for review June 24, 2025 17:40

llvmbot added the backend:AArch64 label Jun 24, 2025

paulwalker-arm force-pushed the sve-feature-flags branch from 9f98094 to ec57179 Compare June 26, 2025 12:31

jthackray approved these changes Jul 1, 2025

sdesmalen-arm approved these changes Jul 1, 2025

paulwalker-arm merged commit 7cc8fe2 into llvm:main Jul 2, 2025
7 checks passed

paulwalker-arm deleted the sve-feature-flags branch July 2, 2025 10:39
