[LLVM][AArch64] Add new feature +sme-mop4 and +sme-tmop #121935

Merged 6 commits on Jan 14, 2025

CarolineConcatto
Contributor

The 2024-12 ISA spec release [1] adds these features:
FEAT_SME_MOP4 (sme-mop4), which enables the SME quarter-tile outer product instructions,
and
FEAT_SME_TMOP (sme-tmop), which enables the SME structured sparsity outer product instructions.
This patch adds both so that these instructions are available outside Armv9.6/sme2p2.

[1] https://developer.arm.com/Architectures/A-Profile%20Architecture#Downloads
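
To make the intended gating concrete, here is a minimal C++ sketch of the two code-generation predicates this patch adds in AArch64InstrInfo.td (`HasSME2p2orSME_MOP4` and `HasSME2p2orSME_TMOP`). `SubtargetModel` and the two free functions are hypothetical stand-ins written for illustration; they are not LLVM's `AArch64Subtarget` API, and the sketch assumes the intended logic is "streaming mode and (sme2p2 or the new feature)".

```cpp
// Hypothetical model of the subtarget queries used by the new predicates.
// This is an illustration only, not LLVM's AArch64Subtarget class.
struct SubtargetModel {
  bool Streaming = false; // stands in for Subtarget->isStreaming()
  bool SME2p2 = false;    // stands in for Subtarget->hasSME2p2()
  bool SME_MOP4 = false;  // stands in for Subtarget->hasSME_MOP4()
  bool SME_TMOP = false;  // stands in for Subtarget->hasSME_TMOP()
};

// Models HasSME2p2orSME_MOP4: quarter-tile outer product instructions are
// selectable in streaming mode when either sme2p2 or sme-mop4 is enabled.
inline bool hasSME2p2orSME_MOP4(const SubtargetModel &ST) {
  return ST.Streaming && (ST.SME2p2 || ST.SME_MOP4);
}

// Models HasSME2p2orSME_TMOP: structured sparsity outer product instructions
// are selectable in streaming mode when either sme2p2 or sme-tmop is enabled.
inline bool hasSME2p2orSME_TMOP(const SubtargetModel &ST) {
  return ST.Streaming && (ST.SME2p2 || ST.SME_TMOP);
}
```

With only `Streaming` and `SME_MOP4` set, `hasSME2p2orSME_MOP4` returns true while `hasSME2p2orSME_TMOP` returns false, which is the reason for splitting the two features out of sme2p2.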

@llvmbot added the clang, backend:AArch64, clang:driver, and mc labels on Jan 7, 2025
@llvmbot
Member

llvmbot commented Jan 7, 2025

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-clang-driver

Author: None (CarolineConcatto)

Changes

The 2024-12 ISA spec release [1] adds these features:
FEAT_SME_MOP4 (sme-mop4), which enables the SME quarter-tile outer product instructions,
and
FEAT_SME_TMOP (sme-tmop), which enables the SME structured sparsity outer product instructions.
This patch adds both so that these instructions are available outside Armv9.6/sme2p2.

[1] https://developer.arm.com/Architectures/A-Profile%20Architecture#Downloads


Patch is 181.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121935.diff

41 Files Affected:

  • (modified) clang/test/Driver/print-supported-extensions-aarch64.c (+2)
  • (modified) llvm/lib/Target/AArch64/AArch64.td (+2-1)
  • (modified) llvm/lib/Target/AArch64/AArch64Features.td (+6)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+8)
  • (modified) llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td (+32-21)
  • (modified) llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp (+2)
  • (modified) llvm/test/MC/AArch64/SME2p2/bfmop4as-non-widening.s (+26-24)
  • (modified) llvm/test/MC/AArch64/SME2p2/bfmop4as-widening.s (+26-24)
  • (modified) llvm/test/MC/AArch64/SME2p2/bftmopa.s (+8-6)
  • (modified) llvm/test/MC/AArch64/SME2p2/directive-arch.s (+8)
  • (modified) llvm/test/MC/AArch64/SME2p2/fmop4a-fp8-fp16-widening.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/fmop4a-fp8-fp32-widening.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/fmop4as-fp16-fp32-widening.s (+26-24)
  • (modified) llvm/test/MC/AArch64/SME2p2/fmop4as-fp16-non-widening.s (+26-24)
  • (modified) llvm/test/MC/AArch64/SME2p2/fmop4as-fp32-non-widening.s (+26-24)
  • (modified) llvm/test/MC/AArch64/SME2p2/fmop4as-fp64-non-widening.s (+26-24)
  • (modified) llvm/test/MC/AArch64/SME2p2/ftmopa.s (+18-16)
  • (modified) llvm/test/MC/AArch64/SME2p2/smop4a-16to32.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/smop4a-64.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/smop4a-8to32.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/smop4s-16to32.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/smop4s-64.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/smop4s-8to32.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/stmopa.s (+8-6)
  • (modified) llvm/test/MC/AArch64/SME2p2/sumop4a-32.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/sumop4a-64.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/sumop4s-32.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/sumop4s-64.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/sutmopa.s (+5-3)
  • (modified) llvm/test/MC/AArch64/SME2p2/umop4a-16to32.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/umop4a-64.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/umop4a-8to32.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/umop4s-64.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/umop4s-8to32.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/usmop4a-32.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/usmop4a-64.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/usmop4s-32.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/usmop4s-64.s (+14-12)
  • (modified) llvm/test/MC/AArch64/SME2p2/ustmopa.s (+5-3)
  • (modified) llvm/test/MC/AArch64/SME2p2/utmopa.s (+8-6)
  • (modified) llvm/unittests/TargetParser/TargetParserTest.cpp (+8-2)
diff --git a/clang/test/Driver/print-supported-extensions-aarch64.c b/clang/test/Driver/print-supported-extensions-aarch64.c
index 09d499548aa565..77812189f12d0d 100644
--- a/clang/test/Driver/print-supported-extensions-aarch64.c
+++ b/clang/test/Driver/print-supported-extensions-aarch64.c
@@ -51,6 +51,8 @@
 // CHECK-NEXT:     pcdphint            FEAT_PCDPHINT                                          Enable Armv9.6-A Producer Consumer Data Placement hints
 // CHECK-NEXT:     pmuv3               FEAT_PMUv3                                             Enable Armv8.0-A PMUv3 Performance Monitors extension
 // CHECK-NEXT:     pops                FEAT_PoPS                                              Enable Armv9.6-A Point Of Physical Storage (PoPS) DC instructions
+// CHECK-NEXT:     sme-mop4            FEAT_SME_MOP4                                          Enable SME Quarter-tile outer product instructions
+// CHECK-NEXT:     sme-tmop            FEAT_SME_TMOP                                          Enable SME Structured sparsity outer product instructions
 // CHECK-NEXT:     predres             FEAT_SPECRES                                           Enable Armv8.5-A execution and data prediction invalidation instructions
 // CHECK-NEXT:     rng                 FEAT_RNG                                               Enable Random Number generation instructions
 // CHECK-NEXT:     ras                 FEAT_RAS, FEAT_RASv1p1                                 Enable Armv8.0-A Reliability, Availability and Serviceability Extensions
diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td
index e3dd334e7b098b..67cc98965b5799 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -74,7 +74,8 @@ def SVEUnsupported : AArch64Unsupported {
 }
 
 let F = [HasSME2p2, HasSVE2p2orSME2p2, HasNonStreamingSVEorSME2p2,
-         HasNonStreamingSVE2p2orSME2p2] in
+         HasNonStreamingSVE2p2orSME2p2, HasSME2p2orSME_MOP4,
+         HasSME2p2orSME_TMOP] in
 def SME2p2Unsupported : AArch64Unsupported;
 
 def SME2p1Unsupported : AArch64Unsupported {
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index 41eb9a73bd013d..e9606c2db60673 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -565,6 +565,12 @@ def FeaturePCDPHINT: ExtensionWithMArch<"pcdphint", "PCDPHINT", "FEAT_PCDPHINT",
 def FeaturePoPS: ExtensionWithMArch<"pops", "PoPS", "FEAT_PoPS",
   "Enable Armv9.6-A Point Of Physical Storage (PoPS) DC instructions">;
 
+def FeatureSME_MOP4: ExtensionWithMArch<"sme-mop4", "SME_MOP4", "FEAT_SME_MOP4",
+  "Enable SME Quarter-tile outer product instructions", [FeatureSME2]>;
+
+def FeatureSME_TMOP: ExtensionWithMArch<"sme-tmop", "SME_TMOP", "FEAT_SME_TMOP",
+  "Enable SME Structured sparsity outer product instructions.", [FeatureSME2]>;
+
 //===----------------------------------------------------------------------===//
 //  Other Features
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index c6f5cdcd1d5fe7..3983665c69b2cb 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -286,6 +286,14 @@ def HasNonStreamingSVE2p2orSME2p2
                 "(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSME2p2())">,
                 AssemblerPredicateWithAll<(any_of FeatureSVE2p2, FeatureSME2p2),
                 "sme2p2 or sve2p2">;
+def HasSME2p2orSME_MOP4
+    : Predicate<"(Subtarget->isStreaming() &&"
+                "(Subtarget->hasSME2p2() || Subtarget->hasSME_MOP4()))">,
+                AssemblerPredicateWithAll<(any_of FeatureSME2p2, FeatureSME_MOP4), "sme2p2 or sme-mop4">;
+def HasSME2p2orSME_TMOP
+    : Predicate<"(Subtarget->isStreaming() &&"
+                "(Subtarget->hasSME2p2() || Subtarget->hasSME_TMOP()))">,
+                AssemblerPredicateWithAll<(any_of FeatureSME2p2, FeatureSME_TMOP), "sme2p2 or sme-tmop">;
 
 // A subset of NEON instructions are legal in Streaming SVE execution mode,
 // so don't need the additional check for 'isNeonAvailable'.
diff --git a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
index aee54ed47a3ab4..33a032974c4594 100644
--- a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
@@ -147,7 +147,7 @@ defm USMOPA_MPPZZ_D : sme_int_outer_product_i64<0b100, "usmopa", int_aarch64_sme
 defm USMOPS_MPPZZ_D : sme_int_outer_product_i64<0b101, "usmops", int_aarch64_sme_usmops_wide>;
 }
 
-let Predicates = [HasSME2p2] in {
+let Predicates = [HasSME2p2orSME_MOP4] in {
   defm SMOP4A  : sme_quarter_outer_product_i8_i32<0b0, 0b0, 0b0, "smop4a">;
   defm SMOP4S  : sme_quarter_outer_product_i8_i32<0b0, 0b0, 0b1, "smop4s">;
   defm SUMOP4A : sme_quarter_outer_product_i8_i32<0b0, 0b1, 0b0, "sumop4a">;
@@ -163,7 +163,7 @@ let Predicates = [HasSME2p2] in {
   defm UMOP4S : sme_quarter_outer_product_i16_i32<0b1, 0b1, "umop4s">;
 }
 
-let Predicates = [HasSME2p2, HasSMEI16I64] in {
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEI16I64] in {
   defm SMOP4A : sme_quarter_outer_product_i64<0b0, 0b0, 0b0, "smop4a">;
   defm SMOP4S : sme_quarter_outer_product_i64<0b0, 0b0, 0b1, "smop4s">;
   defm SUMOP4A : sme_quarter_outer_product_i64<0b0, 0b1, 0b0, "sumop4a">;
@@ -174,7 +174,7 @@ let Predicates = [HasSME2p2, HasSMEI16I64] in {
   defm USMOP4S : sme_quarter_outer_product_i64<0b1, 0b0, 0b1, "usmop4s">;
 }
 
-let Predicates = [HasSME2p2] in {
+let Predicates = [HasSME2p2orSME_TMOP] in {
 def STMOPA_M2ZZZI_BtoS  : sme_int_sparse_outer_product_i32<0b00100, ZZ_b_mul_r, ZPR8,  "stmopa">;
 def STMOPA_M2ZZZI_HtoS  : sme_int_sparse_outer_product_i32<0b00101, ZZ_h_mul_r, ZPR16, "stmopa">;
 def UTMOPA_M2ZZZI_BtoS  : sme_int_sparse_outer_product_i32<0b11100, ZZ_b_mul_r, ZPR8,  "utmopa">;
@@ -1053,41 +1053,52 @@ let Predicates = [HasSME2, HasSVEBFSCALE] in {
   defm BFSCALE : sme2_bfscale_multi<"bfscale">;
 }
 
-let Predicates = [HasSME2p2] in {
+let Predicates = [HasSME2p2orSME_MOP4] in {
+  defm BFMOP4A : sme2_bfmop4as_widening<0, "bfmop4a">;
+  defm BFMOP4S : sme2_bfmop4as_widening<1, "bfmop4s">;
+
+  defm FMOP4A : sme2_fmop4as_fp16_fp32_widening<0, "fmop4a">;
+  defm FMOP4S : sme2_fmop4as_fp16_fp32_widening<1, "fmop4s">;
+
+  defm FMOP4A : sme2_fmop4as_fp32_non_widening<0, "fmop4a">;
+  defm FMOP4S : sme2_fmop4as_fp32_non_widening<1, "fmop4s">;
+}
+
+let Predicates = [HasSME2p2orSME_TMOP] in {
   def FTMOPA_M2ZZZI_HtoS  : sme_tmopa_32b<0b11000, ZZ_h_mul_r, ZPR16, "ftmopa">;
   def FTMOPA_M2ZZZI_StoS  : sme_tmopa_32b<0b00000, ZZ_s_mul_r, ZPR32, "ftmopa">;
   def BFTMOPA_M2ZZZI_HtoS : sme_tmopa_32b<0b10000, ZZ_h_mul_r, ZPR16, "bftmopa">;
+}
 
-  defm BFMOP4A : sme2_bfmop4as_widening<0, "bfmop4a">;
-  defm BFMOP4S : sme2_bfmop4as_widening<1, "bfmop4s">;
-
+let Predicates = [HasSME2p2] in {
   defm FMUL_2ZZ  : sme2_multi2_fmul_sm<"fmul">;
   defm FMUL_2Z2Z : sme2_multi2_fmul_mm< "fmul">;
   defm FMUL_4ZZ  : sme2_multi4_fmul_sm<"fmul">;
   defm FMUL_4Z4Z : sme2_multi4_fmul_mm< "fmul">;
 
-  defm FMOP4A : sme2_fmop4as_fp32_non_widening<0, "fmop4a">;
-  defm FMOP4S : sme2_fmop4as_fp32_non_widening<1, "fmop4s">;
-
-  defm FMOP4A : sme2_fmop4as_fp16_fp32_widening<0, "fmop4a">;
-  defm FMOP4S : sme2_fmop4as_fp16_fp32_widening<1, "fmop4s">;
-}
+} // [HasSME2p2]
 
-let Predicates = [HasSME2p2, HasSMEB16B16] in {
+let Predicates = [HasSME2p2orSME_TMOP, HasSMEB16B16] in {
   def BFTMOPA_M2ZZZI_HtoH : sme_tmopa_16b<0b11001, ZZ_h_mul_r, ZPR16, "bftmopa">;
 }
 
-let Predicates = [HasSME2p2, HasSMEF8F32], Uses = [FPMR, FPCR] in {
+let Predicates = [HasSME2p2orSME_TMOP, HasSMEF8F32], Uses = [FPMR, FPCR] in {
   def FTMOPA_M2ZZZI_BtoS : sme_tmopa_32b<0b01000, ZZ_b_mul_r, ZPR8, "ftmopa">;
-}
+}
 
-let Predicates = [HasSME2p2, HasSMEF8F16], Uses = [FPMR, FPCR] in {
+let Predicates = [HasSME2p2orSME_TMOP, HasSMEF8F16], Uses = [FPMR, FPCR] in {
   def FTMOPA_M2ZZZI_BtoH : sme_tmopa_16b<0b01001, ZZ_b_mul_r, ZPR8, "ftmopa">;
+}
+
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEF8F16], Uses = [FPMR, FPCR] in {
   defm FMOP4A : sme2_fmop4a_fp8_fp16_2way<"fmop4a">;
 }
 
-let Predicates = [HasSME2p2, HasSMEF16F16] in {
+let Predicates = [HasSME2p2orSME_TMOP, HasSMEF16F16] in {
   def FTMOPA_M2ZZZI_HtoH : sme_tmopa_16b<0b10001, ZZ_h_mul_r, ZPR16, "ftmopa">;
+}
+
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEF16F16] in {
   defm FMOP4A : sme2_fmop4as_fp16_non_widening<0, "fmop4a">;
   defm FMOP4S : sme2_fmop4as_fp16_non_widening<1, "fmop4s">;
 }
@@ -1098,17 +1109,17 @@ let Predicates = [HasSME2, HasSVEBFSCALE] in {
 }
 
 let Uses = [FPMR, FPCR] in {
-let Predicates = [HasSME2p2, HasSMEF8F32] in {
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEF8F32] in {
   defm FMOP4A : sme2_fmop4a_fp8_fp32_4way<"fmop4a">;
 }
 }
 
-let Predicates = [HasSME2p2, HasSMEB16B16] in {
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEB16B16] in {
   defm BFMOP4A : sme2_bfmop4as_non_widening<0, "bfmop4a">;
   defm BFMOP4S : sme2_bfmop4as_non_widening<1, "bfmop4s">;
 }
 
-let Predicates = [HasSME2p2, HasSMEF64F64] in {
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEF64F64] in {
   defm FMOP4A : sme2_fmop4as_fp64_non_widening<0, "fmop4a">;
   defm FMOP4S : sme2_fmop4as_fp64_non_widening<1, "fmop4s">;
 }
diff --git a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
index f44afd804c2bde..8b0296f7c788e9 100644
--- a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+++ b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
@@ -3827,6 +3827,8 @@ static const struct Extension {
     {"lsui", {AArch64::FeatureLSUI}},
     {"occmo", {AArch64::FeatureOCCMO}},
     {"pcdphint", {AArch64::FeaturePCDPHINT}},
+    {"sme-mop4", {AArch64::FeatureSME_MOP4}},
+    {"sme-tmop", {AArch64::FeatureSME_TMOP}},
 };
 
 static void setRequiredFeatureString(FeatureBitset FBS, std::string &Str) {
diff --git a/llvm/test/MC/AArch64/SME2p2/bfmop4as-non-widening.s b/llvm/test/MC/AArch64/SME2p2/bfmop4as-non-widening.s
index b98bb99def0569..eb3382b67e9cfd 100644
--- a/llvm/test/MC/AArch64/SME2p2/bfmop4as-non-widening.s
+++ b/llvm/test/MC/AArch64/SME2p2/bfmop4as-non-widening.s
@@ -1,5 +1,7 @@
 // RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+sme2p2,+sme-b16b16 < %s \
 // RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+sme-mop4,+sme-b16b16 < %s \
+// RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
 // RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
 // RUN:        | FileCheck %s --check-prefix=CHECK-ERROR
 // RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+sme2p2,+sme-b16b16 < %s \
@@ -19,19 +21,19 @@
 bfmop4a za0.h, z0.h, z16.h  // 10000001-00100000-00000000-00001000
 // CHECK-INST: bfmop4a za0.h, z0.h, z16.h
 // CHECK-ENCODING: [0x08,0x00,0x20,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81200008 <unknown>
 
 bfmop4a za1.h, z12.h, z24.h  // 10000001-00101000-00000001-10001001
 // CHECK-INST: bfmop4a za1.h, z12.h, z24.h
 // CHECK-ENCODING: [0x89,0x01,0x28,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81280189 <unknown>
 
 bfmop4a za1.h, z14.h, z30.h  // 10000001-00101110-00000001-11001001
 // CHECK-INST: bfmop4a za1.h, z14.h, z30.h
 // CHECK-ENCODING: [0xc9,0x01,0x2e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 812e01c9 <unknown>
 
 // Single and multiple vectors
@@ -39,19 +41,19 @@ bfmop4a za1.h, z14.h, z30.h  // 10000001-00101110-00000001-11001001
 bfmop4a za0.h, z0.h, {z16.h-z17.h}  // 10000001-00110000-00000000-00001000
 // CHECK-INST: bfmop4a za0.h, z0.h, { z16.h, z17.h }
 // CHECK-ENCODING: [0x08,0x00,0x30,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81300008 <unknown>
 
 bfmop4a za1.h, z12.h, {z24.h-z25.h}  // 10000001-00111000-00000001-10001001
 // CHECK-INST: bfmop4a za1.h, z12.h, { z24.h, z25.h }
 // CHECK-ENCODING: [0x89,0x01,0x38,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81380189 <unknown>
 
 bfmop4a za1.h, z14.h, {z30.h-z31.h}  // 10000001-00111110-00000001-11001001
 // CHECK-INST: bfmop4a za1.h, z14.h, { z30.h, z31.h }
 // CHECK-ENCODING: [0xc9,0x01,0x3e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 813e01c9 <unknown>
 
 // Multiple and single vectors
@@ -59,19 +61,19 @@ bfmop4a za1.h, z14.h, {z30.h-z31.h}  // 10000001-00111110-00000001-11001001
 bfmop4a za0.h, {z0.h-z1.h}, z16.h  // 10000001-00100000-00000010-00001000
 // CHECK-INST: bfmop4a za0.h, { z0.h, z1.h }, z16.h
 // CHECK-ENCODING: [0x08,0x02,0x20,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81200208 <unknown>
 
 bfmop4a za1.h, {z12.h-z13.h}, z24.h  // 10000001-00101000-00000011-10001001
 // CHECK-INST: bfmop4a za1.h, { z12.h, z13.h }, z24.h
 // CHECK-ENCODING: [0x89,0x03,0x28,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81280389 <unknown>
 
 bfmop4a za1.h, {z14.h-z15.h}, z30.h  // 10000001-00101110-00000011-11001001
 // CHECK-INST: bfmop4a za1.h, { z14.h, z15.h }, z30.h
 // CHECK-ENCODING: [0xc9,0x03,0x2e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 812e03c9 <unknown>
 
 // Multiple vectors
@@ -79,19 +81,19 @@ bfmop4a za1.h, {z14.h-z15.h}, z30.h  // 10000001-00101110-00000011-11001001
 bfmop4a za0.h, {z0.h-z1.h}, {z16.h-z17.h}  // 10000001-00110000-00000010-00001000
 // CHECK-INST: bfmop4a za0.h, { z0.h, z1.h }, { z16.h, z17.h }
 // CHECK-ENCODING: [0x08,0x02,0x30,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81300208 <unknown>
 
 bfmop4a za1.h, {z12.h-z13.h}, {z24.h-z25.h}  // 10000001-00111000-00000011-10001001
 // CHECK-INST: bfmop4a za1.h, { z12.h, z13.h }, { z24.h, z25.h }
 // CHECK-ENCODING: [0x89,0x03,0x38,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81380389 <unknown>
 
 bfmop4a za1.h, {z14.h-z15.h}, {z30.h-z31.h}  // 10000001-00111110-00000011-11001001
 // CHECK-INST: bfmop4a za1.h, { z14.h, z15.h }, { z30.h, z31.h }
 // CHECK-ENCODING: [0xc9,0x03,0x3e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 813e03c9 <unknown>
 
 
@@ -102,19 +104,19 @@ bfmop4a za1.h, {z14.h-z15.h}, {z30.h-z31.h}  // 10000001-00111110-00000011-11001
 bfmop4s za0.h, z0.h, z16.h  // 10000001-00100000-00000000-00011000
 // CHECK-INST: bfmop4s za0.h, z0.h, z16.h
 // CHECK-ENCODING: [0x18,0x00,0x20,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81200018 <unknown>
 
 bfmop4s za1.h, z12.h, z24.h  // 10000001-00101000-00000001-10011001
 // CHECK-INST: bfmop4s za1.h, z12.h, z24.h
 // CHECK-ENCODING: [0x99,0x01,0x28,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81280199 <unknown>
 
 bfmop4s za1.h, z14.h, z30.h  // 10000001-00101110-00000001-11011001
 // CHECK-INST: bfmop4s za1.h, z14.h, z30.h
 // CHECK-ENCODING: [0xd9,0x01,0x2e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 812e01d9 <unknown>
 
 // Single and multiple vectors
@@ -122,19 +124,19 @@ bfmop4s za1.h, z14.h, z30.h  // 10000001-00101110-00000001-11011001
 bfmop4s za0.h, z0.h, {z16.h-z17.h}  // 10000001-00110000-00000000-00011000
 // CHECK-INST: bfmop4s za0.h, z0.h, { z16.h, z17.h }
 // CHECK-ENCODING: [0x18,0x00,0x30,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81300018 <unknown>
 
 bfmop4s za1.h, z12.h, {z24.h-z25.h}  // 10000001-00111000-00000001-10011001
 // CHECK-INST: bfmop4s za1.h, z12.h, { z24.h, z25.h }
 // CHECK-ENCODING: [0x99,0x01,0x38,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81380199 <unknown>
 
 bfmop4s za1.h, z14.h, {z30.h-z31.h}  // 10000001-00111110-00000001-11011001
 // CHECK-INST: bfmop4s za1.h, z14.h, { z30.h, z31.h }
 // CHECK-ENCODING: [0xd9,0x01,0x3e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 813e01d9 <unknown>
 
 // Multiple and single vectors
@@ -142,19 +144,19 @@ bfmop4s za1.h, z14.h, {z30.h-z31.h}  // 10000001-00111110-00000001-11011001
 bfmop4s za0.h, {z0.h-z1.h}, z16.h  // 10000001-00100000-00000010-00011000
 // CHECK-INST: bfmop4s za0.h, { z0.h, z1.h }, z16.h
 // CHECK-ENCODING: [0x18,0x02,0x20,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81200218 <unknown>
 
 bfmop4s za1.h, {z12.h-z13.h}, z24.h  // 10000001-00101000-00000011-10011001
 // CHECK-INST: bfmop4s za1.h, { z12.h, z13.h }, z24.h
 // CHECK-ENCODING: [0x99,0x03,0x28,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81280399 <unknown>
 
 bfmop4s za1.h, {z14.h-z15.h}, z30.h  // 10000001-00101110-00000011-11011001
 // CHECK-INST: bfmop4s za1.h, { z14.h, z15.h }, z30.h
 // CHECK-ENCODING: [0xd9,0x03,0x2e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 812e03d9 <unknown>
 
 // Multiple vectors
@@ -162,17 +164,17 @@ bfmop4s za1.h, {z14.h-z15.h}, z30.h  // 10000001-00101110-00000011-11011001
 bfmop4s za0.h, {z0.h-z1.h}, {z16.h-z17.h}  // 10000001-00110000-00000010-00011000
 // CHECK-INST: bfmop4s za0.h, { z0.h, z1.h }, { z16.h, z17.h }
 // CHECK-ENCODING: [0x18,0x02,0x30,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81300218 <unknown>
 
 bfmop4s za1.h, {z12.h-z13.h}, {z24.h-z25.h}  // 10000001-00111000-00000011-10011001
 // CHECK-INST: bfmop4s za1.h, { z12.h, z13.h }, { z24.h, z25.h }
 // CHECK-ENCODING: [0x99,0x03,0x38,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81380399 <unknown>
 
 bfmop4s za1.h, {z14.h-z15.h}, {z30.h-z31.h}  // 10000001-00111110-00000011-11011001
 // CHECK-INST: bfmop4s za1.h, { z14.h, z15.h }, { z30.h, z31.h }
 // CHECK-ENCODING: [0xd9,0x03,0x3e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 813e03d9 <unknown>
diff --git a/llvm/test/MC/AArch64/SME2p2/bfmop4as-widening.s b/llvm/test/MC/AArch64/SME2p2/bfmop4as-widening.s
index 40d08e503c8bb3..b550342a71c77a 100644
--- a/llvm/test/MC/AArch64/SME2p2/bfmop4as-widening.s
+++ b/llvm/test/MC/AArch64/SME2p2/bfmop4as-widening.s
@@ -1,5 +1,7 @@
 // RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+sme2p2 < %s \
 // RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+sme-mop4 < %s \
+// RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-...
[truncated]
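
The updated tests pair each instruction with a `CHECK-ENCODING` byte list and a `CHECK-UNKNOWN` word. As a small sketch of how the two relate (A64 instruction words are stored little-endian), the hypothetical helper `encodingBytes` below splits a 32-bit instruction word into the byte order used in the `CHECK-ENCODING` lines. It is an illustration only, not part of the patch or of llvm-mc.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 32-bit A64 instruction word into its four
// bytes, least significant byte first (little-endian), which is the order
// shown in the CHECK-ENCODING lines of the tests.
inline std::array<std::uint8_t, 4> encodingBytes(std::uint32_t Word) {
  return {static_cast<std::uint8_t>(Word & 0xff),
          static_cast<std::uint8_t>((Word >> 8) & 0xff),
          static_cast<std::uint8_t>((Word >> 16) & 0xff),
          static_cast<std::uint8_t>((Word >> 24) & 0xff)};
}
```

For example, the word `0x81200008` from the first `bfmop4a` test corresponds to the bytes `[0x08,0x00,0x20,0x81]`.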

@llvmbot
Member

llvmbot commented Jan 7, 2025

@llvm/pr-subscribers-clang

Author: None (CarolineConcatto)

diff --git a/clang/test/Driver/print-supported-extensions-aarch64.c b/clang/test/Driver/print-supported-extensions-aarch64.c
index 09d499548aa565..77812189f12d0d 100644
--- a/clang/test/Driver/print-supported-extensions-aarch64.c
+++ b/clang/test/Driver/print-supported-extensions-aarch64.c
@@ -51,6 +51,8 @@
 // CHECK-NEXT:     pcdphint            FEAT_PCDPHINT                                          Enable Armv9.6-A Producer Consumer Data Placement hints
 // CHECK-NEXT:     pmuv3               FEAT_PMUv3                                             Enable Armv8.0-A PMUv3 Performance Monitors extension
 // CHECK-NEXT:     pops                FEAT_PoPS                                              Enable Armv9.6-A Point Of Physical Storage (PoPS) DC instructions
+// CHECK-NEXT:     sme-mop4            FEAT_SME_MOP4                                          Enable SME Quarter-tile outer product instructions
+// CHECK-NEXT:     sme-tmop            FEAT_SME_TMOP                                          Enable SME Structured sparsity outer product instructions
 // CHECK-NEXT:     predres             FEAT_SPECRES                                           Enable Armv8.5-A execution and data prediction invalidation instructions
 // CHECK-NEXT:     rng                 FEAT_RNG                                               Enable Random Number generation instructions
 // CHECK-NEXT:     ras                 FEAT_RAS, FEAT_RASv1p1                                 Enable Armv8.0-A Reliability, Availability and Serviceability Extensions
diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td
index e3dd334e7b098b..67cc98965b5799 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -74,7 +74,8 @@ def SVEUnsupported : AArch64Unsupported {
 }
 
 let F = [HasSME2p2, HasSVE2p2orSME2p2, HasNonStreamingSVEorSME2p2,
-         HasNonStreamingSVE2p2orSME2p2] in
+         HasNonStreamingSVE2p2orSME2p2, HasSME2p2orSME_MOP4,
+         HasSME2p2orSME_TMOP] in
 def SME2p2Unsupported : AArch64Unsupported;
 
 def SME2p1Unsupported : AArch64Unsupported {
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index 41eb9a73bd013d..e9606c2db60673 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -565,6 +565,12 @@ def FeaturePCDPHINT: ExtensionWithMArch<"pcdphint", "PCDPHINT", "FEAT_PCDPHINT",
 def FeaturePoPS: ExtensionWithMArch<"pops", "PoPS", "FEAT_PoPS",
   "Enable Armv9.6-A Point Of Physical Storage (PoPS) DC instructions">;
 
+def FeatureSME_MOP4: ExtensionWithMArch<"sme-mop4", "SME_MOP4", "FEAT_SME_MOP4",
+  "Enable SME Quarter-tile outer product instructions", [FeatureSME2]>;
+
+def FeatureSME_TMOP: ExtensionWithMArch<"sme-tmop", "SME_TMOP", "FEAT_SME_TMOP",
+  "Enable SME Structured sparsity outer product instructions.", [FeatureSME2]>;
+
 //===----------------------------------------------------------------------===//
 //  Other Features
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index c6f5cdcd1d5fe7..3983665c69b2cb 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -286,6 +286,14 @@ def HasNonStreamingSVE2p2orSME2p2
                 "(Subtarget->isSVEorStreamingSVEAvailable() && Subtarget->hasSME2p2())">,
                 AssemblerPredicateWithAll<(any_of FeatureSVE2p2, FeatureSME2p2),
                 "sme2p2 or sve2p2">;
+def HasSME2p2orSME_MOP4
+    : Predicate<"(Subtarget->isStreaming() &&"
+                "(Subtarget->hasSME2p2() || Subtarget->hasSME_MOP4()))">,
+                AssemblerPredicateWithAll<(any_of FeatureSME2p2, FeatureSME_MOP4), "sme2p2 or sme-mop4">;
+def HasSME2p2orSME_TMOP
+    : Predicate<"(Subtarget->isStreaming() &&"
+                "(Subtarget->hasSME2p2() || Subtarget->hasSME_TMOP()))">,
+                AssemblerPredicateWithAll<(any_of FeatureSME2p2, FeatureSME_TMOP), "sme2p2 or sme-tmop">;
 
 // A subset of NEON instructions are legal in Streaming SVE execution mode,
 // so don't need the additional check for 'isNeonAvailable'.
diff --git a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
index aee54ed47a3ab4..33a032974c4594 100644
--- a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
@@ -147,7 +147,7 @@ defm USMOPA_MPPZZ_D : sme_int_outer_product_i64<0b100, "usmopa", int_aarch64_sme
 defm USMOPS_MPPZZ_D : sme_int_outer_product_i64<0b101, "usmops", int_aarch64_sme_usmops_wide>;
 }
 
-let Predicates = [HasSME2p2] in {
+let Predicates = [HasSME2p2orSME_MOP4] in {
   defm SMOP4A  : sme_quarter_outer_product_i8_i32<0b0, 0b0, 0b0, "smop4a">;
   defm SMOP4S  : sme_quarter_outer_product_i8_i32<0b0, 0b0, 0b1, "smop4s">;
   defm SUMOP4A : sme_quarter_outer_product_i8_i32<0b0, 0b1, 0b0, "sumop4a">;
@@ -163,7 +163,7 @@ let Predicates = [HasSME2p2] in {
   defm UMOP4S : sme_quarter_outer_product_i16_i32<0b1, 0b1, "umop4s">;
 }
 
-let Predicates = [HasSME2p2, HasSMEI16I64] in {
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEI16I64] in {
   defm SMOP4A : sme_quarter_outer_product_i64<0b0, 0b0, 0b0, "smop4a">;
   defm SMOP4S : sme_quarter_outer_product_i64<0b0, 0b0, 0b1, "smop4s">;
   defm SUMOP4A : sme_quarter_outer_product_i64<0b0, 0b1, 0b0, "sumop4a">;
@@ -174,7 +174,7 @@ let Predicates = [HasSME2p2, HasSMEI16I64] in {
   defm USMOP4S : sme_quarter_outer_product_i64<0b1, 0b0, 0b1, "usmop4s">;
 }
 
-let Predicates = [HasSME2p2] in {
+let Predicates = [HasSME2p2orSME_TMOP] in {
 def STMOPA_M2ZZZI_BtoS  : sme_int_sparse_outer_product_i32<0b00100, ZZ_b_mul_r, ZPR8,  "stmopa">;
 def STMOPA_M2ZZZI_HtoS  : sme_int_sparse_outer_product_i32<0b00101, ZZ_h_mul_r, ZPR16, "stmopa">;
 def UTMOPA_M2ZZZI_BtoS  : sme_int_sparse_outer_product_i32<0b11100, ZZ_b_mul_r, ZPR8,  "utmopa">;
@@ -1053,41 +1053,52 @@ let Predicates = [HasSME2, HasSVEBFSCALE] in {
   defm BFSCALE : sme2_bfscale_multi<"bfscale">;
 }
 
-let Predicates = [HasSME2p2] in {
+let Predicates = [HasSME2p2orSME_MOP4] in {
+  defm BFMOP4A : sme2_bfmop4as_widening<0, "bfmop4a">;
+  defm BFMOP4S : sme2_bfmop4as_widening<1, "bfmop4s">;
+
+  defm FMOP4A : sme2_fmop4as_fp16_fp32_widening<0, "fmop4a">;
+  defm FMOP4S : sme2_fmop4as_fp16_fp32_widening<1, "fmop4s">;
+
+  defm FMOP4A : sme2_fmop4as_fp32_non_widening<0, "fmop4a">;
+  defm FMOP4S : sme2_fmop4as_fp32_non_widening<1, "fmop4s">;
+}
+
+let Predicates = [HasSME2p2orSME_TMOP] in {
   def FTMOPA_M2ZZZI_HtoS  : sme_tmopa_32b<0b11000, ZZ_h_mul_r, ZPR16, "ftmopa">;
   def FTMOPA_M2ZZZI_StoS  : sme_tmopa_32b<0b00000, ZZ_s_mul_r, ZPR32, "ftmopa">;
   def BFTMOPA_M2ZZZI_HtoS : sme_tmopa_32b<0b10000, ZZ_h_mul_r, ZPR16, "bftmopa">;
+}
 
-  defm BFMOP4A : sme2_bfmop4as_widening<0, "bfmop4a">;
-  defm BFMOP4S : sme2_bfmop4as_widening<1, "bfmop4s">;
-
+let Predicates = [HasSME2p2] in {
   defm FMUL_2ZZ  : sme2_multi2_fmul_sm<"fmul">;
   defm FMUL_2Z2Z : sme2_multi2_fmul_mm< "fmul">;
   defm FMUL_4ZZ  : sme2_multi4_fmul_sm<"fmul">;
   defm FMUL_4Z4Z : sme2_multi4_fmul_mm< "fmul">;
 
-  defm FMOP4A : sme2_fmop4as_fp32_non_widening<0, "fmop4a">;
-  defm FMOP4S : sme2_fmop4as_fp32_non_widening<1, "fmop4s">;
-
-  defm FMOP4A : sme2_fmop4as_fp16_fp32_widening<0, "fmop4a">;
-  defm FMOP4S : sme2_fmop4as_fp16_fp32_widening<1, "fmop4s">;
-}
+} // [HasSME2p2]
 
-let Predicates = [HasSME2p2, HasSMEB16B16] in {
+let Predicates = [HasSME2p2orSME_TMOP, HasSMEB16B16] in {
   def BFTMOPA_M2ZZZI_HtoH : sme_tmopa_16b<0b11001, ZZ_h_mul_r, ZPR16, "bftmopa">;
 }
 
-let Predicates = [HasSME2p2, HasSMEF8F32], Uses = [FPMR, FPCR] in {
+let Predicates = [HasSME2p2orSME_TMOP, HasSMEF8F32], Uses = [FPMR, FPCR] in {
   def FTMOPA_M2ZZZI_BtoS : sme_tmopa_32b<0b01000, ZZ_b_mul_r, ZPR8, "ftmopa">;
-}
+} 
 
-let Predicates = [HasSME2p2, HasSMEF8F16], Uses = [FPMR, FPCR] in {
+let Predicates = [HasSME2p2orSME_TMOP, HasSMEF8F16], Uses = [FPMR, FPCR] in {
   def FTMOPA_M2ZZZI_BtoH : sme_tmopa_16b<0b01001, ZZ_b_mul_r, ZPR8, "ftmopa">;
+}
+
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEF8F16], Uses = [FPMR, FPCR] in {
   defm FMOP4A : sme2_fmop4a_fp8_fp16_2way<"fmop4a">;
 }
 
-let Predicates = [HasSME2p2, HasSMEF16F16] in {
+let Predicates = [HasSME2p2orSME_TMOP, HasSMEF16F16] in {
   def FTMOPA_M2ZZZI_HtoH : sme_tmopa_16b<0b10001, ZZ_h_mul_r, ZPR16, "ftmopa">;
+}
+
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEF16F16] in {
   defm FMOP4A : sme2_fmop4as_fp16_non_widening<0, "fmop4a">;
   defm FMOP4S : sme2_fmop4as_fp16_non_widening<1, "fmop4s">;
 }
@@ -1098,17 +1109,17 @@ let Predicates = [HasSME2, HasSVEBFSCALE] in {
 }
 
 let Uses = [FPMR, FPCR] in {
-let Predicates = [HasSME2p2, HasSMEF8F32] in {
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEF8F32] in {
   defm FMOP4A : sme2_fmop4a_fp8_fp32_4way<"fmop4a">;
 }
 }
 
-let Predicates = [HasSME2p2, HasSMEB16B16] in {
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEB16B16] in {
   defm BFMOP4A : sme2_bfmop4as_non_widening<0, "bfmop4a">;
   defm BFMOP4S : sme2_bfmop4as_non_widening<1, "bfmop4s">;
 }
 
-let Predicates = [HasSME2p2, HasSMEF64F64] in {
+let Predicates = [HasSME2p2orSME_MOP4, HasSMEF64F64] in {
   defm FMOP4A : sme2_fmop4as_fp64_non_widening<0, "fmop4a">;
   defm FMOP4S : sme2_fmop4as_fp64_non_widening<1, "fmop4s">;
 }
diff --git a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
index f44afd804c2bde..8b0296f7c788e9 100644
--- a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+++ b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
@@ -3827,6 +3827,8 @@ static const struct Extension {
     {"lsui", {AArch64::FeatureLSUI}},
     {"occmo", {AArch64::FeatureOCCMO}},
     {"pcdphint", {AArch64::FeaturePCDPHINT}},
+    {"sme-mop4", {AArch64::FeatureSME_MOP4}},
+    {"sme-tmop", {AArch64::FeatureSME_TMOP}},
 };
 
 static void setRequiredFeatureString(FeatureBitset FBS, std::string &Str) {
diff --git a/llvm/test/MC/AArch64/SME2p2/bfmop4as-non-widening.s b/llvm/test/MC/AArch64/SME2p2/bfmop4as-non-widening.s
index b98bb99def0569..eb3382b67e9cfd 100644
--- a/llvm/test/MC/AArch64/SME2p2/bfmop4as-non-widening.s
+++ b/llvm/test/MC/AArch64/SME2p2/bfmop4as-non-widening.s
@@ -1,5 +1,7 @@
 // RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+sme2p2,+sme-b16b16 < %s \
 // RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+sme-mop4,+sme-b16b16 < %s \
+// RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
 // RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
 // RUN:        | FileCheck %s --check-prefix=CHECK-ERROR
 // RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+sme2p2,+sme-b16b16 < %s \
@@ -19,19 +21,19 @@
 bfmop4a za0.h, z0.h, z16.h  // 10000001-00100000-00000000-00001000
 // CHECK-INST: bfmop4a za0.h, z0.h, z16.h
 // CHECK-ENCODING: [0x08,0x00,0x20,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81200008 <unknown>
 
 bfmop4a za1.h, z12.h, z24.h  // 10000001-00101000-00000001-10001001
 // CHECK-INST: bfmop4a za1.h, z12.h, z24.h
 // CHECK-ENCODING: [0x89,0x01,0x28,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81280189 <unknown>
 
 bfmop4a za1.h, z14.h, z30.h  // 10000001-00101110-00000001-11001001
 // CHECK-INST: bfmop4a za1.h, z14.h, z30.h
 // CHECK-ENCODING: [0xc9,0x01,0x2e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 812e01c9 <unknown>
 
 // Single and multiple vectors
@@ -39,19 +41,19 @@ bfmop4a za1.h, z14.h, z30.h  // 10000001-00101110-00000001-11001001
 bfmop4a za0.h, z0.h, {z16.h-z17.h}  // 10000001-00110000-00000000-00001000
 // CHECK-INST: bfmop4a za0.h, z0.h, { z16.h, z17.h }
 // CHECK-ENCODING: [0x08,0x00,0x30,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81300008 <unknown>
 
 bfmop4a za1.h, z12.h, {z24.h-z25.h}  // 10000001-00111000-00000001-10001001
 // CHECK-INST: bfmop4a za1.h, z12.h, { z24.h, z25.h }
 // CHECK-ENCODING: [0x89,0x01,0x38,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81380189 <unknown>
 
 bfmop4a za1.h, z14.h, {z30.h-z31.h}  // 10000001-00111110-00000001-11001001
 // CHECK-INST: bfmop4a za1.h, z14.h, { z30.h, z31.h }
 // CHECK-ENCODING: [0xc9,0x01,0x3e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 813e01c9 <unknown>
 
 // Multiple and single vectors
@@ -59,19 +61,19 @@ bfmop4a za1.h, z14.h, {z30.h-z31.h}  // 10000001-00111110-00000001-11001001
 bfmop4a za0.h, {z0.h-z1.h}, z16.h  // 10000001-00100000-00000010-00001000
 // CHECK-INST: bfmop4a za0.h, { z0.h, z1.h }, z16.h
 // CHECK-ENCODING: [0x08,0x02,0x20,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81200208 <unknown>
 
 bfmop4a za1.h, {z12.h-z13.h}, z24.h  // 10000001-00101000-00000011-10001001
 // CHECK-INST: bfmop4a za1.h, { z12.h, z13.h }, z24.h
 // CHECK-ENCODING: [0x89,0x03,0x28,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81280389 <unknown>
 
 bfmop4a za1.h, {z14.h-z15.h}, z30.h  // 10000001-00101110-00000011-11001001
 // CHECK-INST: bfmop4a za1.h, { z14.h, z15.h }, z30.h
 // CHECK-ENCODING: [0xc9,0x03,0x2e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 812e03c9 <unknown>
 
 // Multiple vectors
@@ -79,19 +81,19 @@ bfmop4a za1.h, {z14.h-z15.h}, z30.h  // 10000001-00101110-00000011-11001001
 bfmop4a za0.h, {z0.h-z1.h}, {z16.h-z17.h}  // 10000001-00110000-00000010-00001000
 // CHECK-INST: bfmop4a za0.h, { z0.h, z1.h }, { z16.h, z17.h }
 // CHECK-ENCODING: [0x08,0x02,0x30,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81300208 <unknown>
 
 bfmop4a za1.h, {z12.h-z13.h}, {z24.h-z25.h}  // 10000001-00111000-00000011-10001001
 // CHECK-INST: bfmop4a za1.h, { z12.h, z13.h }, { z24.h, z25.h }
 // CHECK-ENCODING: [0x89,0x03,0x38,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81380389 <unknown>
 
 bfmop4a za1.h, {z14.h-z15.h}, {z30.h-z31.h}  // 10000001-00111110-00000011-11001001
 // CHECK-INST: bfmop4a za1.h, { z14.h, z15.h }, { z30.h, z31.h }
 // CHECK-ENCODING: [0xc9,0x03,0x3e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 813e03c9 <unknown>
 
 
@@ -102,19 +104,19 @@ bfmop4a za1.h, {z14.h-z15.h}, {z30.h-z31.h}  // 10000001-00111110-00000011-11001
 bfmop4s za0.h, z0.h, z16.h  // 10000001-00100000-00000000-00011000
 // CHECK-INST: bfmop4s za0.h, z0.h, z16.h
 // CHECK-ENCODING: [0x18,0x00,0x20,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81200018 <unknown>
 
 bfmop4s za1.h, z12.h, z24.h  // 10000001-00101000-00000001-10011001
 // CHECK-INST: bfmop4s za1.h, z12.h, z24.h
 // CHECK-ENCODING: [0x99,0x01,0x28,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81280199 <unknown>
 
 bfmop4s za1.h, z14.h, z30.h  // 10000001-00101110-00000001-11011001
 // CHECK-INST: bfmop4s za1.h, z14.h, z30.h
 // CHECK-ENCODING: [0xd9,0x01,0x2e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 812e01d9 <unknown>
 
 // Single and multiple vectors
@@ -122,19 +124,19 @@ bfmop4s za1.h, z14.h, z30.h  // 10000001-00101110-00000001-11011001
 bfmop4s za0.h, z0.h, {z16.h-z17.h}  // 10000001-00110000-00000000-00011000
 // CHECK-INST: bfmop4s za0.h, z0.h, { z16.h, z17.h }
 // CHECK-ENCODING: [0x18,0x00,0x30,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81300018 <unknown>
 
 bfmop4s za1.h, z12.h, {z24.h-z25.h}  // 10000001-00111000-00000001-10011001
 // CHECK-INST: bfmop4s za1.h, z12.h, { z24.h, z25.h }
 // CHECK-ENCODING: [0x99,0x01,0x38,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81380199 <unknown>
 
 bfmop4s za1.h, z14.h, {z30.h-z31.h}  // 10000001-00111110-00000001-11011001
 // CHECK-INST: bfmop4s za1.h, z14.h, { z30.h, z31.h }
 // CHECK-ENCODING: [0xd9,0x01,0x3e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 813e01d9 <unknown>
 
 // Multiple and single vectors
@@ -142,19 +144,19 @@ bfmop4s za1.h, z14.h, {z30.h-z31.h}  // 10000001-00111110-00000001-11011001
 bfmop4s za0.h, {z0.h-z1.h}, z16.h  // 10000001-00100000-00000010-00011000
 // CHECK-INST: bfmop4s za0.h, { z0.h, z1.h }, z16.h
 // CHECK-ENCODING: [0x18,0x02,0x20,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81200218 <unknown>
 
 bfmop4s za1.h, {z12.h-z13.h}, z24.h  // 10000001-00101000-00000011-10011001
 // CHECK-INST: bfmop4s za1.h, { z12.h, z13.h }, z24.h
 // CHECK-ENCODING: [0x99,0x03,0x28,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81280399 <unknown>
 
 bfmop4s za1.h, {z14.h-z15.h}, z30.h  // 10000001-00101110-00000011-11011001
 // CHECK-INST: bfmop4s za1.h, { z14.h, z15.h }, z30.h
 // CHECK-ENCODING: [0xd9,0x03,0x2e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 812e03d9 <unknown>
 
 // Multiple vectors
@@ -162,17 +164,17 @@ bfmop4s za1.h, {z14.h-z15.h}, z30.h  // 10000001-00101110-00000011-11011001
 bfmop4s za0.h, {z0.h-z1.h}, {z16.h-z17.h}  // 10000001-00110000-00000010-00011000
 // CHECK-INST: bfmop4s za0.h, { z0.h, z1.h }, { z16.h, z17.h }
 // CHECK-ENCODING: [0x18,0x02,0x30,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81300218 <unknown>
 
 bfmop4s za1.h, {z12.h-z13.h}, {z24.h-z25.h}  // 10000001-00111000-00000011-10011001
 // CHECK-INST: bfmop4s za1.h, { z12.h, z13.h }, { z24.h, z25.h }
 // CHECK-ENCODING: [0x99,0x03,0x38,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 81380399 <unknown>
 
 bfmop4s za1.h, {z14.h-z15.h}, {z30.h-z31.h}  // 10000001-00111110-00000011-11011001
 // CHECK-INST: bfmop4s za1.h, { z14.h, z15.h }, { z30.h, z31.h }
 // CHECK-ENCODING: [0xd9,0x03,0x3e,0x81]
-// CHECK-ERROR: instruction requires: sme2p2 sme-b16b16
+// CHECK-ERROR: instruction requires: sme2p2 or sme-mop4 sme-b16b16
 // CHECK-UNKNOWN: 813e03d9 <unknown>
diff --git a/llvm/test/MC/AArch64/SME2p2/bfmop4as-widening.s b/llvm/test/MC/AArch64/SME2p2/bfmop4as-widening.s
index 40d08e503c8bb3..b550342a71c77a 100644
--- a/llvm/test/MC/AArch64/SME2p2/bfmop4as-widening.s
+++ b/llvm/test/MC/AArch64/SME2p2/bfmop4as-widening.s
@@ -1,5 +1,7 @@
 // RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+sme2p2 < %s \
 // RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+sme-mop4 < %s \
+// RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-...
[truncated]
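
The assembler-side gating in the hunks above can be read as "every predicate in the list must hold, and an `any_of` predicate holds if at least one of its features is enabled". The sketch below models only that reading in plain C++. It is not LLVM code: `FeatureSet`, `AnyOfGroup`, `isAccepted`, and `describe` are hypothetical names chosen for illustration, and the feature strings are the `-mattr` spellings that appear in the tests.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model (not LLVM's implementation): a requirement list is a
// vector of any_of groups; every group must be satisfied by at least one
// enabled feature for the instruction to be accepted.
using FeatureSet = std::set<std::string>;
using AnyOfGroup = std::vector<std::string>;

bool isAccepted(const FeatureSet &enabled,
                const std::vector<AnyOfGroup> &groups) {
  for (const AnyOfGroup &group : groups) {
    bool satisfied = false;
    for (const std::string &feature : group) {
      if (enabled.count(feature) != 0) {
        satisfied = true;
        break;
      }
    }
    if (!satisfied)
      return false; // One unsatisfied group is enough to reject.
  }
  return true;
}

// Renders the requirement in the style of the CHECK-ERROR lines:
// alternatives joined by " or ", groups joined by a single space.
std::string describe(const std::vector<AnyOfGroup> &groups) {
  std::string out;
  for (std::size_t g = 0; g < groups.size(); ++g) {
    if (g != 0)
      out += " ";
    for (std::size_t i = 0; i < groups[g].size(); ++i) {
      if (i != 0)
        out += " or ";
      out += groups[g][i];
    }
  }
  return out;
}

// Requirement list assumed for the non-widening bfmop4a/bfmop4s forms,
// mirroring [HasSME2p2orSME_MOP4, HasSMEB16B16] from the diff.
std::vector<AnyOfGroup> bfmop4NonWideningRequirements() {
  return {{"sme2p2", "sme-mop4"}, {"sme-b16b16"}};
}
```

Under this model, `isAccepted({"sme-mop4", "sme-b16b16"}, bfmop4NonWideningRequirements())` is true, `isAccepted({"sme-mop4"}, ...)` is false because the `sme-b16b16` group is unsatisfied, and `describe(...)` yields the string `sme2p2 or sme-mop4 sme-b16b16` that the updated `CHECK-ERROR` lines expect after `instruction requires:`.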

@jthackray jthackray left a comment

LGTM

github-actions bot commented Jan 13, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Lukacma Lukacma left a comment

Thank you for the patch, Carol, it looks good! I think we should also add dependency tests for the new features to both TargetParserTest.cpp and aarch64-implied-sme-features.c.

@CarolineConcatto
Thank you @Lukacma,
the tests are there now.
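
As a rough illustration of what such a dependency test exercises: both new features are declared with `[FeatureSME2]` in AArch64Features.td, so enabling either one should also enable `sme2`. The sketch below is a minimal stand-alone model of that expansion, not the TargetParser API. `DepMap`, `expandFeatures`, and `patchDependencies` are hypothetical names, and only the two edges visible in this patch are modelled (whatever `sme2` itself implies is deliberately left out).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: feature name -> features it directly depends on.
using DepMap = std::map<std::string, std::vector<std::string>>;

// Returns the requested features plus everything they transitively imply.
std::set<std::string> expandFeatures(const DepMap &deps,
                                     const std::vector<std::string> &requested) {
  std::set<std::string> enabled;
  std::vector<std::string> worklist(requested.begin(), requested.end());
  while (!worklist.empty()) {
    std::string feature = worklist.back();
    worklist.pop_back();
    if (!enabled.insert(feature).second)
      continue; // Already expanded; avoids repeated work and cycles.
    auto it = deps.find(feature);
    if (it != deps.end())
      for (const std::string &dep : it->second)
        worklist.push_back(dep);
  }
  return enabled;
}

// Only the dependencies added by this patch; other edges are not modelled.
DepMap patchDependencies() {
  return {{"sme-mop4", {"sme2"}}, {"sme-tmop", {"sme2"}}};
}
```

With this model, `expandFeatures(patchDependencies(), {"sme-mop4"})` contains `sme-mop4` and `sme2` but not `sme-tmop`, which is the kind of implication the requested tests would pin down for the real feature tables.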

@CarolineConcatto CarolineConcatto merged commit 5ec7ecd into llvm:main Jan 14, 2025
5 of 7 checks passed