[LLVM][AArch64] Relax SVE/SME codegen predicates. #145322

Merged (1 commit) on Jul 2, 2025

@paulwalker-arm (Collaborator) commented Jun 23, 2025

Code generation predicates like HasSVE2_or_SME implemented a strict divide between streaming and non-streaming modes, which meant some SME instructions were not available unless a matching SVE feature was enabled.

As a specific example, to enable the multi-register WHILE instructions in non-streaming mode a user must enable "sve2p1", when "sme2" should be sufficient.

This PR separates the streaming/non-streaming requirement from a feature's SVE/SME designation, which in most cases means "+sveX[pY]" and "+sve,+smeV[pW]" can be used interchangeably when an instruction is available via "+sveX[pY]" or "+smeV[pW]".

NOTE: In some instances this means the compiler will accept configurations that are otherwise unsupported, which is fine.

NOTE: This PR does not fix all the predicates; I plan to follow up with other PRs to relax the crypto, bitperm and fp8 features.
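
To make the multi-register WHILE example concrete, here is a minimal C++ sketch comparing the old and new `HasSVE2p1_or_SME2` codegen predicates for `-mattr=+sve,+sme2` in non-streaming mode. The two predicate expressions are taken from the `AArch64InstrInfo.td` changes in this PR. Everything else is an assumption made for illustration: `SubtargetModel`, its fields and the named constants are hypothetical stand-ins, not LLVM's `AArch64Subtarget`, and the body of `isSVEorStreamingSVEAvailable()` is a simplification that ignores streaming-compatible functions and any other conditions the real query may check.

```cpp
// Hypothetical, simplified model of the subtarget queries used by the
// predicates. This is NOT LLVM's AArch64Subtarget.
struct SubtargetModel {
  bool SVE;       // +sve
  bool SVE2p1;    // +sve2p1
  bool SME;       // +sme
  bool SME2;      // +sme2
  bool Streaming; // function is compiled for streaming mode

  constexpr bool hasSVE2p1() const { return SVE2p1; }
  constexpr bool hasSME2() const { return SME2; }
  constexpr bool isStreaming() const { return Streaming; }

  // Assumption: SVE instructions are usable if +sve is enabled, or if +sme
  // is enabled and the function is in streaming mode.
  constexpr bool isSVEorStreamingSVEAvailable() const {
    return SVE || (SME && Streaming);
  }
};

// HasSVE2p1_or_SME2 as written before this PR.
constexpr bool oldHasSVE2p1_or_SME2(const SubtargetModel &S) {
  return S.hasSVE2p1() || (S.isStreaming() && S.hasSME2());
}

// HasSVE2p1_or_SME2 as written after this PR.
constexpr bool newHasSVE2p1_or_SME2(const SubtargetModel &S) {
  return S.isSVEorStreamingSVEAvailable() &&
         (S.hasSVE2p1() || S.hasSME2());
}

// Models -mattr=+sve,+sme2 without streaming mode.
constexpr SubtargetModel SveSme2NonStreaming{
    /*SVE=*/true, /*SVE2p1=*/false, /*SME=*/true, /*SME2=*/true,
    /*Streaming=*/false};

// Models -mattr=+sme2 with streaming mode forced on.
constexpr SubtargetModel Sme2Streaming{
    /*SVE=*/false, /*SVE2p1=*/false, /*SME=*/true, /*SME2=*/true,
    /*Streaming=*/true};

// The old predicate rejects +sve,+sme2 outside streaming mode; the new one
// accepts it. Both accept +sme2 in streaming mode.
static_assert(!oldHasSVE2p1_or_SME2(SveSme2NonStreaming), "old: rejected");
static_assert(newHasSVE2p1_or_SME2(SveSme2NonStreaming), "new: accepted");
static_assert(oldHasSVE2p1_or_SME2(Sme2Streaming), "old: accepted");
static_assert(newHasSVE2p1_or_SME2(Sme2Streaming), "new: accepted");
```

Under this model, the only configuration of the two whose result changes is the non-streaming "+sve,+sme2" case, which is the one the description calls out.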

@paulwalker-arm changed the title from "[WIP] fix up feature flags" to "[LLVM][AArch64] Relax SVE/SME codegen predicates." on Jun 24, 2025
@paulwalker-arm marked this pull request as ready for review on June 24, 2025 17:40
@llvmbot (Member) commented Jun 24, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Paul Walker (paulwalker-arm)

Patch is 45.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145322.diff

50 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64.td (+4-4)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+14-8)
  • (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+23-20)
  • (modified) llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll (+3-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-non-widening-pairwise-arith.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-polynomial-arithmetic-128.ll (+3)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-polynomial-arithmetic.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-psel.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-revd.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-unary-narrowing.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-uniform-complex-arith.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-while-reversed.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-while.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-widening-complex-int-arith.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-widening-dsp.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-widening-pairwise-arith.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfmlsl.ll (+4-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-cntp.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-dots.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-dupq.ll (+4-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-extq.ll (+4-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-fclamp.ll (+3)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-fp-reduce.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-int-reduce.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-loads.ll (+3-2)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-multivec-loads.ll (+4-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-multivec-stores.ll (+4-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-pmov-to-pred.ll (+3)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-pmov-to-vector.ll (+3)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-predicate-as-counter.ll (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-qcvtn.ll (+3)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-qrshr.ll (+3)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-sclamp.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-stores.ll (+3-2)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-uclamp.ll (+2)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-while-pn.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-while-pp.ll (+3)
diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td
index eb5a5199b8951..9634937991860 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -58,11 +58,11 @@ include "AArch64SystemOperands.td"
 
 class AArch64Unsupported { list<Predicate> F; }
 
-let F = [HasSVE2p1, HasSVE2p1_or_SME2, HasSVE2p1_or_SME2p1] in
+let F = [HasSVE2p1, HasSVE2p1_or_SME2, HasSVE2p1_or_StreamingSME2, HasSVE2p1_or_SME2p1] in
 def SVE2p1Unsupported : AArch64Unsupported;
 
 def SVE2Unsupported : AArch64Unsupported {
-  let F = !listconcat([HasSVE2, HasSVE2_or_SME, HasSVE2_or_SME2, HasSSVE_FP8FMA, HasSMEF8F16,
+  let F = !listconcat([HasSVE2, HasSVE2_or_SME, HasNonStreamingSVE2_or_SME2, HasSSVE_FP8FMA, HasSMEF8F16,
                        HasSMEF8F32, HasSVEAES, HasSVE2SHA3, HasSVE2SM4, HasSVEBitPerm,
                        HasSVEB16B16],
                        SVE2p1Unsupported.F);
@@ -85,9 +85,9 @@ def SME2p1Unsupported : AArch64Unsupported {
 }
 
 def SME2Unsupported : AArch64Unsupported {
-  let F = !listconcat([HasSME2, HasSVE2_or_SME2, HasSVE2p1_or_SME2, HasSSVE_FP8FMA,
+  let F = !listconcat([HasSME2, HasNonStreamingSVE2_or_SME2, HasSVE2p1_or_SME2, HasSSVE_FP8FMA,
                       HasSMEF8F16, HasSMEF8F32, HasSMEF16F16_or_SMEF8F16, HasSMEB16B16,
-                      HasNonStreamingSVE2_or_SSVE_AES],
+                      HasNonStreamingSVE2_or_SSVE_AES, HasSVE2p1_or_StreamingSME2],
                       SME2p1Unsupported.F);
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 0f3f24f0853c9..9d57711e84cde 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -249,22 +249,23 @@ def HasSVE_or_SME
                 AssemblerPredicateWithAll<(any_of FeatureSVE, FeatureSME),
                 "sve or sme">;
 def HasNonStreamingSVE_or_SME2p2
-    : Predicate<"(Subtarget->isSVEAvailable() && Subtarget->hasSVE()) ||"
+    : Predicate<"Subtarget->isSVEAvailable() ||"
                 "(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSME2p2())">,
                 AssemblerPredicateWithAll<(any_of FeatureSVE, FeatureSME2p2),
                 "sve or sme2p2">;
 def HasNonStreamingSVE_or_SSVE_FEXPA
-    : Predicate<"(Subtarget->isSVEAvailable() && Subtarget->hasSVE()) ||"
+    : Predicate<"Subtarget->isSVEAvailable() ||"
                 "(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSSVE_FEXPA())">,
                 AssemblerPredicateWithAll<(any_of FeatureSVE, FeatureSSVE_FEXPA),
                 "sve or ssve-fexpa">;
 
 def HasSVE2_or_SME
-    : Predicate<"Subtarget->hasSVE2() || (Subtarget->isStreaming() && Subtarget->hasSME())">,
+    : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2() || Subtarget->hasSME())">,
                 AssemblerPredicateWithAll<(any_of FeatureSVE2, FeatureSME),
                 "sve2 or sme">;
-def HasSVE2_or_SME2
-    : Predicate<"Subtarget->hasSVE2() || (Subtarget->isStreaming() && Subtarget->hasSME2())">,
+def HasNonStreamingSVE2_or_SME2
+    : Predicate<"(Subtarget->isSVEAvailable() && Subtarget->hasSVE2()) ||"
+                "(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSME2())">,
                 AssemblerPredicateWithAll<(any_of FeatureSVE2, FeatureSME2),
                 "sve2 or sme2">;
 def HasNonStreamingSVE2_or_SSVE_AES
@@ -274,17 +275,22 @@ def HasNonStreamingSVE2_or_SSVE_AES
                 "sve2 or ssve-aes">;
 
 def HasSVE2p1_or_SME
-    : Predicate<"Subtarget->hasSVE2p1() || (Subtarget->isStreaming() && Subtarget->hasSME())">,
+    : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p1() || Subtarget->hasSME())">,
                 AssemblerPredicateWithAll<(any_of FeatureSME, FeatureSVE2p1),
                 "sme or sve2p1">;
 def HasSVE2p1_or_SME2
-    : Predicate<"Subtarget->hasSVE2p1() || (Subtarget->isStreaming() && Subtarget->hasSME2())">,
+    : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p1() || Subtarget->hasSME2())">,
                 AssemblerPredicateWithAll<(any_of FeatureSME2, FeatureSVE2p1),
                 "sme2 or sve2p1">;
 def HasSVE2p1_or_SME2p1
-    : Predicate<"Subtarget->hasSVE2p1() || (Subtarget->isStreaming() && Subtarget->hasSME2p1())">,
+    : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p1() || Subtarget->hasSME2p1())">,
                 AssemblerPredicateWithAll<(any_of FeatureSME2p1, FeatureSVE2p1),
                 "sme2p1 or sve2p1">;
+def HasSVE2p1_or_StreamingSME2
+    : Predicate<"(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSVE2p1()) ||"
+                "(Subtarget->isStreaming() && Subtarget->hasSME2())">,
+                AssemblerPredicateWithAll<(any_of FeatureSME2, FeatureSVE2p1),
+                "sme2 or sve2p1">;
 
 def HasSVE2p2_or_SME2p2
     : Predicate<"Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p2() || Subtarget->hasSME2p2())">,
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 2360e30de63b0..7628acb3f9d90 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -4154,11 +4154,6 @@ defm UDOT_ZZZ_HtoS  : sve2p1_two_way_dot_vv<"udot", 0b1, int_aarch64_sve_udot_x2
 defm SDOT_ZZZI_HtoS : sve2p1_two_way_dot_vvi<"sdot", 0b0, int_aarch64_sve_sdot_lane_x2>;
 defm UDOT_ZZZI_HtoS : sve2p1_two_way_dot_vvi<"udot", 0b1, int_aarch64_sve_udot_lane_x2>;
 
-defm CNTP_XCI : sve2p1_pcount_pn<"cntp", 0b000>;
-defm PEXT_PCI : sve2p1_pred_as_ctr_to_mask<"pext", int_aarch64_sve_pext>;
-defm PEXT_2PCI : sve2p1_pred_as_ctr_to_mask_pair<"pext">;
-defm PTRUE_C  : sve2p1_ptrue_pn<"ptrue">;
-
 defm SQCVTN_Z2Z_StoH  : sve2p1_multi_vec_extract_narrow<"sqcvtn", 0b00, int_aarch64_sve_sqcvtn_x2>;
 defm UQCVTN_Z2Z_StoH  : sve2p1_multi_vec_extract_narrow<"uqcvtn", 0b01, int_aarch64_sve_uqcvtn_x2>;
 defm SQCVTUN_Z2Z_StoH : sve2p1_multi_vec_extract_narrow<"sqcvtun", 0b10, int_aarch64_sve_sqcvtun_x2>;
@@ -4166,6 +4161,22 @@ defm SQRSHRN_Z2ZI_StoH  : sve2p1_multi_vec_shift_narrow<"sqrshrn", 0b101, int_aa
 defm UQRSHRN_Z2ZI_StoH  : sve2p1_multi_vec_shift_narrow<"uqrshrn", 0b111, int_aarch64_sve_uqrshrn_x2>;
 defm SQRSHRUN_Z2ZI_StoH : sve2p1_multi_vec_shift_narrow<"sqrshrun", 0b001, int_aarch64_sve_sqrshrun_x2>;
 
+defm WHILEGE_2PXX : sve2p1_int_while_rr_pair<"whilege", 0b000>;
+defm WHILEGT_2PXX : sve2p1_int_while_rr_pair<"whilegt", 0b001>;
+defm WHILELT_2PXX : sve2p1_int_while_rr_pair<"whilelt", 0b010>;
+defm WHILELE_2PXX : sve2p1_int_while_rr_pair<"whilele", 0b011>;
+defm WHILEHS_2PXX : sve2p1_int_while_rr_pair<"whilehs", 0b100>;
+defm WHILEHI_2PXX : sve2p1_int_while_rr_pair<"whilehi", 0b101>;
+defm WHILELO_2PXX : sve2p1_int_while_rr_pair<"whilelo", 0b110>;
+defm WHILELS_2PXX : sve2p1_int_while_rr_pair<"whilels", 0b111>;
+} // End HasSVE2p1_or_SME2
+
+let Predicates = [HasSVE2p1_or_StreamingSME2] in {
+defm CNTP_XCI : sve2p1_pcount_pn<"cntp", 0b000>;
+defm PEXT_PCI : sve2p1_pred_as_ctr_to_mask<"pext", int_aarch64_sve_pext>;
+defm PEXT_2PCI : sve2p1_pred_as_ctr_to_mask_pair<"pext">;
+defm PTRUE_C  : sve2p1_ptrue_pn<"ptrue">;
+
 // Load to two registers
 defm LD1B_2Z       : sve2p1_mem_cld_ss_2z<"ld1b", 0b00, 0b0, ZZ_b_mul_r, GPR64shifted8, ZZ_b_strided_and_contiguous>;
 defm LD1H_2Z       : sve2p1_mem_cld_ss_2z<"ld1h", 0b01, 0b0, ZZ_h_mul_r, GPR64shifted16, ZZ_h_strided_and_contiguous>;
@@ -4289,14 +4300,6 @@ defm : store_pn_x4<nxv8bf16, int_aarch64_sve_stnt1_pn_x4, STNT1H_4Z_IMM>;
 defm : store_pn_x4<nxv4f32, int_aarch64_sve_stnt1_pn_x4, STNT1W_4Z_IMM>;
 defm : store_pn_x4<nxv2f64, int_aarch64_sve_stnt1_pn_x4, STNT1D_4Z_IMM>;
 
-defm WHILEGE_2PXX : sve2p1_int_while_rr_pair<"whilege", 0b000>;
-defm WHILEGT_2PXX : sve2p1_int_while_rr_pair<"whilegt", 0b001>;
-defm WHILELT_2PXX : sve2p1_int_while_rr_pair<"whilelt", 0b010>;
-defm WHILELE_2PXX : sve2p1_int_while_rr_pair<"whilele", 0b011>;
-defm WHILEHS_2PXX : sve2p1_int_while_rr_pair<"whilehs", 0b100>;
-defm WHILEHI_2PXX : sve2p1_int_while_rr_pair<"whilehi", 0b101>;
-defm WHILELO_2PXX : sve2p1_int_while_rr_pair<"whilelo", 0b110>;
-defm WHILELS_2PXX : sve2p1_int_while_rr_pair<"whilels", 0b111>;
 defm WHILEGE_CXX  : sve2p1_int_while_rr_pn<"whilege", 0b000>;
 defm WHILEGT_CXX  : sve2p1_int_while_rr_pn<"whilegt", 0b001>;
 defm WHILELT_CXX  : sve2p1_int_while_rr_pn<"whilelt", 0b010>;
@@ -4305,7 +4308,7 @@ defm WHILEHS_CXX  : sve2p1_int_while_rr_pn<"whilehs", 0b100>;
 defm WHILEHI_CXX  : sve2p1_int_while_rr_pn<"whilehi", 0b101>;
 defm WHILELO_CXX  : sve2p1_int_while_rr_pn<"whilelo", 0b110>;
 defm WHILELS_CXX  : sve2p1_int_while_rr_pn<"whilels", 0b111>;
-} // End HasSVE2p1_or_SME2
+} // End HasSVE2p1_or_StreamingSME2
 
 let Predicates = [HasSVE_or_SME] in {
 
@@ -4510,7 +4513,7 @@ let Predicates = [HasNonStreamingSVE2p2_or_SME2p2] in {
 //===----------------------------------------------------------------------===//
 // SVE2 FP8 instructions
 //===----------------------------------------------------------------------===//
-let Predicates = [HasSVE2_or_SME2, HasFP8] in {
+let Predicates = [HasNonStreamingSVE2_or_SME2, HasFP8] in {
 // FP8 upconvert
 defm F1CVT_ZZ     : sve2_fp8_cvt_single<0b0, 0b00, "f1cvt",    nxv8f16,  int_aarch64_sve_fp8_cvt1>;
 defm F2CVT_ZZ     : sve2_fp8_cvt_single<0b0, 0b01, "f2cvt",    nxv8f16,  int_aarch64_sve_fp8_cvt2>;
@@ -4527,15 +4530,15 @@ defm FCVTNB_Z2Z_StoB : sve2_fp8_down_cvt_single<0b01, "fcvtnb", ZZ_s_mul_r, nxv4
 defm BFCVTN_Z2Z_HtoB : sve2_fp8_down_cvt_single<0b10, "bfcvtn", ZZ_h_mul_r, nxv8bf16, int_aarch64_sve_fp8_cvtn>;
 
 defm FCVTNT_Z2Z_StoB : sve2_fp8_down_cvt_single_top<0b11, "fcvtnt", ZZ_s_mul_r, nxv4f32,  int_aarch64_sve_fp8_cvtnt>;
-} // End HasSVE2_or_SME2, HasFP8
+} // End HasNonStreamingSVE2_or_SME2, HasFP8
 
-let Predicates = [HasSVE2_or_SME2, HasFAMINMAX] in {
+let Predicates = [HasNonStreamingSVE2_or_SME2, HasFAMINMAX] in {
 defm FAMIN_ZPmZ : sve_fp_2op_p_zds<0b1111, "famin", "FAMIN_ZPZZ", int_aarch64_sve_famin, DestructiveBinaryComm>;
 defm FAMAX_ZPmZ : sve_fp_2op_p_zds<0b1110, "famax", "FAMAX_ZPZZ", int_aarch64_sve_famax, DestructiveBinaryComm>;
 
 defm FAMAX_ZPZZ : sve_fp_bin_pred_hfd<AArch64famax_p>;
 defm FAMIN_ZPZZ : sve_fp_bin_pred_hfd<AArch64famin_p>;
-} // End HasSVE2_or_SME2, HasFAMINMAX
+} // End HasNonStreamingSVE2_or_SME2, HasFAMINMAX
 
 let Predicates = [HasSSVE_FP8FMA] in {
 // FP8 Widening Multiply-Add Long - Indexed Group
@@ -4579,14 +4582,14 @@ defm FDOT_ZZZI_BtoS : sve2_fp8_dot_indexed_s<"fdot", int_aarch64_sve_fp8_fdot_la
 defm FDOT_ZZZ_BtoS : sve_fp8_dot<0b1, ZPR32, "fdot", nxv4f32, int_aarch64_sve_fp8_fdot>;
 }
 
-let Predicates = [HasSVE2_or_SME2, HasLUT] in {
+let Predicates = [HasNonStreamingSVE2_or_SME2, HasLUT] in {
 // LUTI2
   defm LUTI2_ZZZI : sve2_luti2_vector_index<"luti2">;
 // LUTI4
   defm LUTI4_ZZZI   : sve2_luti4_vector_index<"luti4">;
 // LUTI4 (two contiguous registers)
   defm LUTI4_Z2ZZI  : sve2_luti4_vector_vg2_index<"luti4">;
-} // End HasSVE2_or_SME2, HasLUT
+} // End HasNonStreamingSVE2_or_SME2, HasLUT
 
 //===----------------------------------------------------------------------===//
 // Checked Pointer Arithmetic (FEAT_CPA)
diff --git a/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll b/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll
index a1093c28467ab..de9811b92424e 100644
--- a/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll
+++ b/llvm/test/CodeGen/AArch64/fp8-sve-cvt-cvtlt.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -mattr=+sve2,+fp8 < %s | FileCheck %s
+; RUN: llc -mattr=+sve,+sme2,+fp8 < %s | FileCheck %s
 ; RUN: llc -mattr=+sme2,+fp8 --force-streaming < %s | FileCheck %s
 
 target triple = "aarch64-linux"
diff --git a/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll b/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll
index 2ffba10e21100..e42f2b1cfba48 100644
--- a/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll
+++ b/llvm/test/CodeGen/AArch64/fp8-sve-cvtn.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -mattr=+sve2,+fp8 < %s | FileCheck %s
+; RUN: llc -mattr=+sve,+sme2,+fp8 < %s | FileCheck %s
 ; RUN: llc -mattr=+sme2,+fp8 --force-streaming < %s | FileCheck %s
 
 target triple = "aarch64-linux"
diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll
index 021d4855905e7..fb837c4279f6e 100644
--- a/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll
+++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-fexpa.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
-; RUN: llc -mtriple=aarch64-linux-gnu -force-streaming -mattr=+ssve-fexpa < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+ssve-fexpa -force-streaming < %s | FileCheck %s
 
 define <vscale x 8 x half> @fexpa_h(<vscale x 8 x i16> %a) {
 ; CHECK-LABEL: fexpa_h:
@@ -27,4 +27,4 @@ define <vscale x 2 x double> @fexpa_d(<vscale x 2 x i1> %pg, <vscale x 2 x i64>
 ; CHECK-NEXT:    ret
   %out = call <vscale x 2 x double> @llvm.aarch64.sve.fexpa.x.nxv2f64(<vscale x 2 x i64> %a)
   ret <vscale x 2 x double> %out
-}
\ No newline at end of file
+}
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll
index e2483cff3d186..1ccb5264aa837 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-add-sub.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ; ADDHNB
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll
index 2f7b82751cdcf..2b2c9da1a3063 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-binary-narrowing-shr.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll
index 8d395adda0799..3a2a02f80a58b 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-complex-dot.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll
index 6005fb69ae1ba..27416375ad6af 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-contiguous-conflict-detection.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll
index fc08f2cdf94a9..317dea251937a 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2-aes < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2,+sve-aes < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+ssve-aes < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme,+ssve-aes -force-streaming < %s | FileCheck %s
 
 ;
 ; AESD
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll
index 7d16f8383d968..9be3a67f88b09 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-faminmax.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
 ; RUN: llc -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mattr=+sve,+sme2 < %s | FileCheck %s
 ; RUN: llc -mattr=+sme2 -force-streaming < %s | FileCheck %s
 
 target triple = "aarch64-linux"
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll
index 16041766605e9..1ce7564167992 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-converts.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll
index 52c04d614b4e1..d9a3a28ff9aa8 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll
index 6dc2c67b5fd9e..70824f88caef5 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-widening-mul-acc.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll
index c46016e0c40de..c78a872fcfbf5 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-int-mul-lane.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+sme < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s
 
 ;
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll
index 5cea7536e1f3c..8e53a82401e0b 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-luti.ll
@@ -1,5 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: llc < %s -...
[truncated]
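
The diff also introduces two asymmetric predicates, `HasNonStreamingSVE2_or_SME2` and `HasSVE2p1_or_StreamingSME2`, where only one side of the "or" is tied to a particular mode. The following C++ sketch evaluates both over a few feature combinations. The predicate expressions are copied from the `AArch64InstrInfo.td` hunk above; `FeatureModel`, its fields and the named constants are hypothetical, and the bodies of `isSVEAvailable()` and `isSVEorStreamingSVEAvailable()` are simplifying assumptions (no streaming-compatible functions, and no other conditions the real queries may check), not LLVM's actual implementation.

```cpp
// Hypothetical, simplified model; NOT LLVM's AArch64Subtarget.
struct FeatureModel {
  bool SVE;       // +sve
  bool SVE2;      // +sve2
  bool SVE2p1;    // +sve2p1
  bool SME;       // +sme
  bool SME2;      // +sme2
  bool Streaming; // function is compiled for streaming mode

  constexpr bool hasSVE2() const { return SVE2; }
  constexpr bool hasSVE2p1() const { return SVE2p1; }
  constexpr bool hasSME2() const { return SME2; }
  constexpr bool isStreaming() const { return Streaming; }

  // Assumption: full SVE is only usable outside streaming mode.
  constexpr bool isSVEAvailable() const { return SVE && !Streaming; }
  // Assumption: +sve, or +sme while in streaming mode.
  constexpr bool isSVEorStreamingSVEAvailable() const {
    return SVE || (SME && Streaming);
  }
};

// "NonStreaming" qualifies the SVE2 side: SVE2 only counts outside
// streaming mode, while SME2 counts in either mode.
constexpr bool hasNonStreamingSVE2_or_SME2(const FeatureModel &S) {
  return (S.isSVEAvailable() && S.hasSVE2()) ||
         (S.isSVEorStreamingSVEAvailable() && S.hasSME2());
}

// "Streaming" qualifies the SME2 side: SME2 only counts in streaming
// mode, while SVE2p1 counts in either mode.
constexpr bool hasSVE2p1_or_StreamingSME2(const FeatureModel &S) {
  return (S.isSVEorStreamingSVEAvailable() && S.hasSVE2p1()) ||
         (S.isStreaming() && S.hasSME2());
}

// Field order: SVE, SVE2, SVE2p1, SME, SME2, Streaming.
constexpr FeatureModel SveSme2{true, false, false, true, true, false};
constexpr FeatureModel Sme2Streaming{false, false, false, true, true, true};
constexpr FeatureModel Sve2SmeStreaming{true, true, false, true, false, true};
constexpr FeatureModel Sve2p1{true, true, true, false, false, false};

static_assert(hasNonStreamingSVE2_or_SME2(SveSme2), "");
static_assert(!hasSVE2p1_or_StreamingSME2(SveSme2), "");
static_assert(hasNonStreamingSVE2_or_SME2(Sme2Streaming), "");
static_assert(hasSVE2p1_or_StreamingSME2(Sme2Streaming), "");
static_assert(!hasNonStreamingSVE2_or_SME2(Sve2SmeStreaming), "");
static_assert(hasNonStreamingSVE2_or_SME2(Sve2p1), "");
static_assert(hasSVE2p1_or_StreamingSME2(Sve2p1), "");
```

In this model, "+sve,+sme2" outside streaming mode satisfies `HasNonStreamingSVE2_or_SME2` but not `HasSVE2p1_or_StreamingSME2`, which is why the instructions moved under the latter predicate still need "+sve2p1" in non-streaming code.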

@paulwalker-arm (Collaborator, Author) commented

ping

@jthackray (Contributor) left a comment

LGTM, although I'm not an expert in this area. Just a couple of typos in the commit message: "meant some SME instructions where not available" and "seperate".

@sdesmalen-arm (Collaborator) left a comment

I've not gone through all of the instructions to make sure their predicates are (still) correct, but this does indeed look more in the spirit of the specification.

                 AssemblerPredicateWithAll<(any_of FeatureSVE2, FeatureSME),
                 "sve2 or sme">;
-def HasSVE2_or_SME2
-    : Predicate<"Subtarget->hasSVE2() || (Subtarget->isStreaming() && Subtarget->hasSME2())">,
+def HasNonStreamingSVE2_or_SME2

nit: maybe move these above HasSVE2_or_SME to bundle them together with the other HasNonStreamingSVE*_or_SME* predicates?

@paulwalker-arm (Collaborator, Author) replied:

I've tried to order the combined predicates based on the required SVE feature so that, for example, all SVE2 features are kept together.

@paulwalker-arm merged commit 7cc8fe2 into llvm:main on Jul 2, 2025 (7 checks passed)
@paulwalker-arm deleted the sve-feature-flags branch on July 2, 2025 10:39