-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[mlir][ArmSME] Add test-lower-to-arm-sme pipeline #81732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The ArmSME compilation pipeline has evolved significantly and is now sufficiently complex enough that it warrants a proper lowering pipeline that encapsulates the various passes and orderings. Currently the pipeline is loosely defined in our integration tests, but these have diverged and are not using the same passes or ordering everywhere. This patch introduces a test-lower-to-arm-sme pipeline mirroring test-lower-to-llvm that provides some sanity when running e2e examples and can be used a reference for targeting ArmSME in MLIR. All the integration tests are updated to use this pipeline. The intention is to productize the pipeline once it becomes more mature.
@llvm/pr-subscribers-mlir-linalg @llvm/pr-subscribers-mlir Author: Cullen Rhodes (c-rhodes) ChangesThe ArmSME compilation pipeline has evolved significantly and is now sufficiently complex enough that it warrants a proper lowering pipeline that encapsulates the various passes and orderings. Currently the pipeline is loosely defined in our integration tests, but these have diverged and are not using the same passes or ordering everywhere. This patch introduces a test-lower-to-arm-sme pipeline mirroring test-lower-to-llvm that provides some sanity when running e2e examples and can be used a reference for targeting ArmSME in MLIR. All the integration tests are updated to use this pipeline. The intention is to productize the pipeline once it becomes more mature. Patch is 26.89 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81732.diff 23 Files Affected:
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/fill-2d.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/fill-2d.mlir
index 44ff1afe76d383..8724e09ba0bd98 100644
--- a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/fill-2d.mlir
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/fill-2d.mlir
@@ -1,13 +1,6 @@
// RUN: mlir-opt %s \
-// RUN: -transform-interpreter \
-// RUN: -test-transform-dialect-erase-schedule \
-// RUN: -lower-vector-mask \
-// RUN: -one-shot-bufferize="bufferize-function-boundaries" \
-// RUN: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za" \
-// RUN: -convert-vector-to-arm-sme -convert-arith-to-arm-sme \
-// RUN: -allocate-arm-sme-tiles -convert-arm-sme-to-scf \
-// RUN: -convert-arm-sme-to-llvm -cse -canonicalize \
-// RUN: -test-lower-to-llvm | \
+// RUN: -transform-interpreter -test-transform-dialect-erase-schedule \
+// RUN: -test-lower-to-arm-sme | \
// RUN: %mcr_aarch64_cmd \
// RUN: -e=entry -entry-point-result=void \
// RUN: -march=aarch64 -mattr="+sve,+sme" \
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul-transpose-a.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul-transpose-a.mlir
index c781d5e0af846e..656b04815c8562 100644
--- a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul-transpose-a.mlir
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul-transpose-a.mlir
@@ -1,12 +1,6 @@
// RUN: mlir-opt %s \
// RUN: -transform-interpreter -test-transform-dialect-erase-schedule \
-// RUN: -one-shot-bufferize="bufferize-function-boundaries" -canonicalize \
-// RUN: -convert-vector-to-arm-sme -allocate-arm-sme-tiles -convert-arm-sme-to-scf \
-// RUN: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za only-if-required-by-ops" \
-// RUN: -convert-vector-to-scf -cse -arm-sve-legalize-vector-storage \
-// RUN: -convert-arm-sme-to-llvm \
-// RUN: -convert-vector-to-llvm=enable-arm-sve \
-// RUN: -cse -canonicalize -test-lower-to-llvm | \
+// RUN: -test-lower-to-arm-sme | \
// RUN: %mcr_aarch64_cmd \
// RUN: -e=main -entry-point-result=void \
// RUN: -march=aarch64 -mattr="+sve,+sme" \
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul.mlir
index 31c3202c3fc57b..2ac0bb591e442a 100644
--- a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul.mlir
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul.mlir
@@ -1,12 +1,6 @@
// RUN: mlir-opt %s \
// RUN: -transform-interpreter -test-transform-dialect-erase-schedule \
-// RUN: -canonicalize \
-// RUN: -convert-vector-to-arm-sme -allocate-arm-sme-tiles -convert-arm-sme-to-scf \
-// RUN: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za only-if-required-by-ops" \
-// RUN: -convert-vector-to-scf -cse -arm-sve-legalize-vector-storage \
-// RUN: -convert-arm-sme-to-llvm \
-// RUN: -convert-vector-to-llvm=enable-arm-sve \
-// RUN: -cse -canonicalize -test-lower-to-llvm | \
+// RUN: -test-lower-to-arm-sme | \
// RUN: %mcr_aarch64_cmd \
// RUN: -e=main -entry-point-result=void \
// RUN: -march=aarch64 -mattr="+sve,+sme" \
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/multi-tile-matmul.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/multi-tile-matmul.mlir
index d5c35068ccb32e..2a55e91b316043 100644
--- a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/multi-tile-matmul.mlir
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/multi-tile-matmul.mlir
@@ -1,11 +1,6 @@
// RUN: mlir-opt %s \
// RUN: -transform-interpreter -test-transform-dialect-erase-schedule \
-// RUN: -one-shot-bufferize="bufferize-function-boundaries" -canonicalize \
-// RUN: -arm-sme-vector-legalization -canonicalize -cse \
-// RUN: -convert-vector-to-arm-sme -allocate-arm-sme-tiles -convert-arm-sme-to-scf \
-// RUN: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za only-if-required-by-ops" \
-// RUN: -convert-vector-to-scf=full-unroll -convert-arm-sme-to-llvm \
-// RUN: -test-lower-to-llvm | \
+// RUN: -test-lower-to-arm-sme | \
// RUN: %mcr_aarch64_cmd \
// RUN: -e=main -entry-point-result=void \
// RUN: -march=aarch64 -mattr="+sve,+sme" \
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/use-too-many-tiles.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/use-too-many-tiles.mlir
index 42fe21cccd48a7..9c84a36fb17a22 100644
--- a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/use-too-many-tiles.mlir
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/use-too-many-tiles.mlir
@@ -1,10 +1,5 @@
// RUN: mlir-opt %s \
-// RUN: -convert-vector-to-arm-sme -convert-arith-to-arm-sme \
-// RUN: -allocate-arm-sme-tiles -convert-arm-sme-to-scf \
-// RUN: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za only-if-required-by-ops" \
-// RUN: -convert-vector-to-scf -cse -arm-sve-legalize-vector-storage \
-// RUN: -convert-arm-sme-to-llvm -convert-vector-to-llvm=enable-arm-sve -cse \
-// RUN: -canonicalize -test-lower-to-llvm -verify-diagnostics | \
+// RUN: -test-lower-to-arm-sme -verify-diagnostics | \
// RUN: %mcr_aarch64_cmd \
// RUN: -e=main -entry-point-result=void \
// RUN: -march=aarch64 -mattr="+sve,+sme" \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/load-store-128-bit-tile.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/load-store-128-bit-tile.mlir
index 59b4a7e6a52f9b..8bf2c87b848a1a 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/load-store-128-bit-tile.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/load-store-128-bit-tile.mlir
@@ -1,9 +1,5 @@
// DEFINE: %{entry_point} = test_load_store_zaq0
-// DEFINE: %{compile} = mlir-opt %s \
-// DEFINE: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za" \
-// DEFINE: -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
-// DEFINE: -convert-arm-sme-to-llvm -cse -canonicalize \
-// DEFINE: -allocate-arm-sme-tiles -test-lower-to-llvm
+// DEFINE: %{compile} = mlir-opt %s -test-lower-to-arm-sme
// DEFINE: %{run} = %mcr_aarch64_cmd \
// DEFINE: -march=aarch64 -mattr=+sve,+sme \
// DEFINE: -e %{entry_point} -entry-point-result=void \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-load-vertical.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-load-vertical.mlir
index 064141c349241e..0ef15353e9248e 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-load-vertical.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-load-vertical.mlir
@@ -1,9 +1,5 @@
// DEFINE: %{entry_point} = entry
-// DEFINE: %{compile} = mlir-opt %s \
-// DEFINE: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za" \
-// DEFINE: -convert-vector-to-arm-sme -convert-arm-sme-to-scf -allocate-arm-sme-tiles \
-// DEFINE: -convert-arm-sme-to-llvm -cse -canonicalize \
-// DEFINE: -test-lower-to-llvm
+// DEFINE: %{compile} = mlir-opt %s -test-lower-to-arm-sme
// DEFINE: %{run} = %mcr_aarch64_cmd \
// DEFINE: -march=aarch64 -mattr=+sve,+sme \
// DEFINE: -e %{entry_point} -entry-point-result=void \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-multi-tile-transpose.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-multi-tile-transpose.mlir
index 0827d9b7464add..b088cd135b36a4 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-multi-tile-transpose.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-multi-tile-transpose.mlir
@@ -1,10 +1,4 @@
-// RUN: mlir-opt %s -arm-sme-vector-legalization -cse -canonicalize \
-// RUN: -convert-vector-to-arm-sme -allocate-arm-sme-tiles -convert-arm-sme-to-scf \
-// RUN: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za only-if-required-by-ops" \
-// RUN: -convert-vector-to-scf -cse -arm-sve-legalize-vector-storage \
-// RUN: -convert-arm-sme-to-llvm \
-// RUN: -convert-vector-to-llvm=enable-arm-sve \
-// RUN: -cse -canonicalize -test-lower-to-llvm | \
+// RUN: mlir-opt %s -test-lower-to-arm-sme | \
// RUN: %mcr_aarch64_cmd \
// RUN: -e=main -entry-point-result=void \
// RUN: -march=aarch64 -mattr="+sve,+sme" \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f16f16f32.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f16f16f32.mlir
index f081838300a9a4..d2153e8241d572 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f16f16f32.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f16f16f32.mlir
@@ -1,11 +1,6 @@
+// DEFINE: %{opts} =
// DEFINE: %{entry} = main
-// DEFINE: %{fusion_opts} = -arm-sme-outer-product-fusion
-// DEFINE: %{compile} = mlir-opt %s \
-// DEFINE: -convert-vector-to-arm-sme -convert-arith-to-arm-sme %{fusion_opts} \
-// DEFINE: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za only-if-required-by-ops" \
-// DEFINE: -convert-arm-sme-to-scf -allocate-arm-sme-tiles \
-// DEFINE: -convert-arm-sme-to-llvm -cse -canonicalize \
-// DEFINE: -test-lower-to-llvm -o %t
+// DEFINE: %{compile} = mlir-opt %s -test-lower-to-arm-sme=%{opts} -o %t
// DEFINE: %{run} = %mcr_aarch64_cmd %t \
// DEFINE: -march=aarch64 -mattr=+sve,+sme \
// DEFINE: -e %{entry} -entry-point-result=void \
@@ -18,7 +13,7 @@
// Check result is the same when outerproducts are not combined into widening
// variant.
-// REDEFINE: %{fusion_opts} =
+// REDEFINE: %{opts} = fuse-outer-products=false
// RUN: %{run} | FileCheck %s
func.func @main() {
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f32.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f32.mlir
index 5f41b37560e760..0e07e2299bdc81 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f32.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f32.mlir
@@ -1,10 +1,5 @@
// DEFINE: %{entry_point} = test_outerproduct_no_accumulator_4x4xf32
-// DEFINE: %{compile} = mlir-opt %s \
-// DEFINE: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za" \
-// DEFINE: -convert-vector-to-arm-sme -convert-arith-to-arm-sme \
-// DEFINE: -convert-arm-sme-to-scf -allocate-arm-sme-tiles \
-// DEFINE: -convert-arm-sme-to-llvm -cse -canonicalize \
-// DEFINE: -test-lower-to-llvm -o %t
+// DEFINE: %{compile} = mlir-opt %s -test-lower-to-arm-sme -o %t
// DEFINE: %{run} = %mcr_aarch64_cmd %t \
// DEFINE: -march=aarch64 -mattr=+sve,+sme \
// DEFINE: -e %{entry_point} -entry-point-result=void \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f64.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f64.mlir
index a1bb9b7d6f80ec..8fb4864895b63a 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f64.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-f64.mlir
@@ -1,10 +1,5 @@
// DEFINE: %{entry_point} = test_outerproduct_no_accumulator_2x2xf64
-// DEFINE: %{compile} = mlir-opt %s \
-// DEFINE: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za" \
-// DEFINE: -convert-vector-to-arm-sme -convert-arith-to-arm-sme \
-// DEFINE: -convert-arm-sme-to-scf -allocate-arm-sme-tiles \
-// DEFINE: -convert-arm-sme-to-llvm -cse -canonicalize \
-// DEFINE: -test-lower-to-llvm -o %t
+// DEFINE: %{compile} = mlir-opt %s -test-lower-to-arm-sme -o %t
// DEFINE: %{run} = %mcr_aarch64_cmd %t \
// DEFINE: -march=aarch64 -mattr=+sve,+sme-f64f64 \
// DEFINE: -e %{entry_point} -entry-point-result=void \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-i8i8i32.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-i8i8i32.mlir
index 1770e579f0bd68..befd78e92a161a 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-i8i8i32.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-outerproduct-i8i8i32.mlir
@@ -1,11 +1,5 @@
// DEFINE: %{entry} = main
-// DEFINE: %{compile} = mlir-opt %s \
-// DEFINE: -convert-vector-to-arm-sme -convert-arith-to-arm-sme \
-// DEFINE: -arm-sme-outer-product-fusion \
-// DEFINE: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za only-if-required-by-ops" \
-// DEFINE: -convert-arm-sme-to-scf -allocate-arm-sme-tiles \
-// DEFINE: -convert-arm-sme-to-llvm -cse -canonicalize \
-// DEFINE: -test-lower-to-llvm
+// DEFINE: %{compile} = mlir-opt %s -test-lower-to-arm-sme
// DEFINE: %{run} = %mcr_aarch64_cmd \
// DEFINE: -march=aarch64 -mattr=+sve,+sme \
// DEFINE: -e %{entry} -entry-point-result=void \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transfer-read-2d.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transfer-read-2d.mlir
index 6e028d5fb83614..387531122dcc16 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transfer-read-2d.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transfer-read-2d.mlir
@@ -1,9 +1,5 @@
// DEFINE: %{entry_point} = entry
-// DEFINE: %{compile} = mlir-opt %s \
-// DEFINE: -convert-vector-to-arm-sme -convert-arm-sme-to-scf -allocate-arm-sme-tiles \
-// DEFINE: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za only-if-required-by-ops" \
-// DEFINE: -convert-arm-sme-to-llvm -cse -canonicalize \
-// DEFINE: -test-lower-to-llvm
+// DEFINE: %{compile} = mlir-opt %s -test-lower-to-arm-sme
// DEFINE: %{run} = %mcr_aarch64_cmd \
// DEFINE: -march=aarch64 -mattr=+sve,+sme \
// DEFINE: -e %{entry_point} -entry-point-result=void \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transfer-write-2d.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transfer-write-2d.mlir
index c0c1f55d7ddd1a..415dd2ffae97a2 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transfer-write-2d.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transfer-write-2d.mlir
@@ -1,10 +1,5 @@
// DEFINE: %{entry_point} = entry
-// DEFINE: %{compile} = mlir-opt %s \
-// DEFINE: -convert-vector-to-arm-sme -convert-arith-to-arm-sme \
-// DEFINE: -convert-arm-sme-to-scf -allocate-arm-sme-tiles \
-// DEFINE: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za only-if-required-by-ops" \
-// DEFINE: -convert-arm-sme-to-llvm -cse -canonicalize \
-// DEFINE: -test-lower-to-llvm
+// DEFINE: %{compile} = mlir-opt %s -test-lower-to-arm-sme
// DEFINE: %{run} = %mcr_aarch64_cmd \
// DEFINE: -march=aarch64 -mattr=+sve,+sme \
// DEFINE: -e %{entry_point} -entry-point-result=void \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transpose.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transpose.mlir
index eee3c56351d81e..ec344caeb24cad 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transpose.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/test-transpose.mlir
@@ -1,9 +1,5 @@
// DEFINE: %{entry_point} = entry
-// DEFINE: %{compile} = mlir-opt %s \
-// DEFINE: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za" \
-// DEFINE: -convert-vector-to-arm-sme -convert-arm-sme-to-scf -allocate-arm-sme-tiles \
-// DEFINE: -convert-arm-sme-to-llvm -cse -canonicalize \
-// DEFINE: -test-lower-to-llvm
+// DEFINE: %{compile} = mlir-opt %s -test-lower-to-arm-sme
// DEFINE: %{run} = %mcr_aarch64_cmd \
// DEFINE: -march=aarch64 -mattr=+sve,+sme \
// DEFINE: -e %{entry_point} -entry-point-result=void \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir
index 223bc8ce74343b..85509a88ca8964 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir
@@ -1,8 +1,4 @@
-// RUN: mlir-opt %s -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za" \
-// RUN: -convert-vector-to-arm-sme -convert-arith-to-arm-sme \
-// RUN: -convert-arm-sme-to-scf -allocate-arm-sme-tiles \
-// RUN: -convert-arm-sme-to-llvm -cse -canonicalize \
-// RUN: -test-lower-to-llvm | \
+// RUN: mlir-opt %s -test-lower-to-arm-sme | \
// RUN: %mcr_aarch64_cmd \
// RUN: -march=aarch64 -mattr=+sve,+sme \
// RUN: -e entry -entry-point-result=i32 \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
index 2f151e2ec72fb7..6a1f3b49889f00 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
@@ -1,9 +1,5 @@
// DEFINE: %{entry_point} = za0_d_f64
-// DEFINE: %{compile} = mlir-opt %s \
-// DEFINE: -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za" \
-// DEFINE: -convert-vector-to-arm-sme -convert-arm-sme-to-scf -allocate-arm-sme-tiles \
-// DEFINE: -convert-arm-sme-to-llvm -cse -canonicalize \
-// DEFINE: -test-lower-to-llvm
+// DEFINE: %{compile} = mlir-opt %s -test-lower-to-arm-sme
// DEFINE: %{run} = %mcr_aarch64_cmd \
// DEFINE: -march=aarch64 -mattr=+sve,+sme \
// DEFINE: -e %{entry_point} -entry-point-result=i32 \
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir
index f28bf19b299934..15c892a5b57294 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir
@@ -1,8 +1,5 @@
// DEFINE: %{entry_point} = entry
-// DEFINE: %{compile} = mlir-opt %s -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za" \
-// DEFINE: -convert-vector-to-arm-sme -convert-arith-to-arm-sme \
-// DEFINE: -convert-arm-sme-to-scf -allocate-arm-sme-tiles \
-// DEFINE: -convert-arm-sme-to-llvm -test-lower-to-llvm
+// DEFINE: %{compile} = mlir-opt %s -test-lower-to-arm-sme
// DEFINE: %{run} = %mcr_aarch64_cmd \
// DEFINE: -march=aarch64 -mattr=+sve,+sme \
// DEFINE: -e %{entry_point} -entry-point-result=i32 \
diff --git a/mlir/test/lib/Dialect/ArmSME/CMakeLists.txt b/mlir/test/lib/Dialect/ArmSME/CMakeLists.txt
new file mode 100644
index 00000000000000..40442d9a0405dd
--- /dev/null
+++ b/mlir/test/lib/Dialect/ArmSME/CMakeLists.txt
@@ -0,0 +1,27 @@
+# Exclude tests from libMLIR.so
+add_mlir_library(MLIRArmSMETestPasses
+ TestLowerToArmSME.cpp
+
+ EXCLUDE_FROM_LIBMLIR
+
+ LINK_LIBS PUBLIC
+ MLIRAffineToStandard
+ MLIRArithToArmSME
+ MLIRArmSMEToLLVM
+ MLIRArmSMEToSCF
+ MLIRFuncDialect
+ MLIRFuncToLLVM
+ MLIRIR
+ MLIRIndexToLLVM
+ MLIRLLVMDialect
+ MLIRLinalgTransforms
+ MLIRMathToLLVM
+ MLIRMemRefToLLVM
+ MLIRMemRefTransforms
+ MLIRPass
+ MLIRReconcileUnrealizedCasts
+ MLIRSCFToControlFlow
+ MLIRTransforms
+ MLIRVectorToLLVMPass
+ MLIRVectorToSCF
+ )
diff --git a/mlir/test/lib/Dialect/ArmSME/TestLowerToArmSME.cpp b/mlir/test/lib/Dialect/ArmSME/TestLowerToArmSME.cpp
new file mode 100644
index 00000000000000..bbeb5ca2b4c532
--- /dev/null
+++ b/mlir/test/lib/Dialect/ArmSME/TestLowerToArmSME.cpp
@@ -0,0 +1,157 @@
+//===- TestLowerToArmSME.cpp - Test lowering to ArmSME as a sink pass -----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements a pass for testing the lowering to ArmSME as a
+// generally usable sink pass.
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Conversion/AffineToStandard/AffineToStandard.h"
+#include "mlir/Conversion/ArithToArmSME/ArithToArmSME.h"
+#include "mlir/Conversion/ArmSMEToLLVM/ArmSMEToLLVM.h"
+#include "mlir/Conversion/ArmSMEToSCF/ArmSMEToSCF.h"
+#include "mlir/Conversion/FuncToLLVM/ConvertFuncToLLVMPass.h"
+#include "mlir/Conversion/IndexToLLVM/IndexToLLVM.h"
+#include "mlir/Conversion/MathToLLVM/MathToLLVM.h"
+#include "mlir/Conversion/MemRefToLLVM/MemRefToLLVM.h"
+#include "mlir/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.h"
+...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one thing:
"'-arm-sme-outer-product-fusion' pass"), | ||
llvm::cl::init(true)}; | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you leave a comment with the full pipeline in a form that can be passed mlir-opt
? I'm pretty sure in the future we'll want to try different orders/passes, so I think that'd be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-dump-pass-pipeline
can be used for that? This is what it looks like
mlir-opt input.mlir -test-lower-to-arm-sme -dump-pass-pipeline
Pass Manager with 13 passes:
builtin.module(arm-sme-vector-legalization,canonicalize{ max-iterations=10 max-num-rewrites=-1 region-simplify=true test-convergence=false top-down=true},cse,convert-arith-to-arm-sme,convert-vector-to-arm-sme,func.func(arm-sme-outer-product-fusion),convert-arm-sme-to-scf,convert-vector-to-scf{full-unroll=true lower-tensors=false target-rank=1},func.func(allocate-arm-sme-tiles),func.func(enable-arm-streaming{only-if-required-by-ops=true streaming-mode=streaming-locally za-mode=new-za}),convert-arm-sme-to-llvm,canonicalize{ max-iterations=10 max-num-rewrites=-1 region-simplify=true test-convergence=false top-down=true},cse)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's still more noisy than I'd like, I'd prefer the simple form (e.g. -convert-vector-to-arm-sme -convert-arith-to-arm-sme ...
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment will be formatted by clang-format and not easy to copy/paste. It'll also have to be kept in sync with the pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is this change makes it harder experiment with the passes for an ArmSME test. You either have to use -dump-pass-pipeline
which is in a pretty inconvenient form, or edit the passes in the pipeline and recompile.
I normally copy passes from a test (or from the LLVM lit logs) then play around with the order or adding/removing them, and this obscures that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy/pasting:
// -arm-sme-vector-legalization -canonicalize -cse -convert-arith-to-arm-sme
// -convert-vector-to-arm-sme -arm-sme-outer-product-fusion
// -convert-arm-sme-to-scf -convert-vector-to-scf=full-unroll
// -allocate-arm-sme-tiles
// -enable-arm-streaming="streaming-mode=streaming-locally za-mode=new-za
// only-if-required-by-ops" -convert-arm-sme-to-llvm -canonicalize -cse
seems less convenient than -dump-pass-pipeline
to me. There's a bit of noise mostly from canonicalization options but it's not the end of the world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to land this and we can see how things go 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's still more noisy than I'd like, I'd prefer the simple form (e.g. -convert-vector-to-arm-sme -convert-arith-to-arm-sme ...).
This seems hardly like a reason to encode this in the codebase IMO, this isn't really maintainable.
More importantly the textual pipeline is the only real correct form to describe a pipeline: the short one is only a convenience for running single passes but ambiguous when forming a more complex pipeline.
The option parsing for example can be very confusing when the same pass is attempted to be executed multiple time with different options.
If you would like to skip the default value for the options that aren't touched, this is likely doable: patch welcome for -dump-pass-pipeline here!
The ArmSME compilation pipeline has evolved significantly and is now sufficiently complex enough that it warrants a proper lowering pipeline that encapsulates the various passes and orderings. Currently the pipeline is loosely defined in our integration tests, but these have diverged and are not using the same passes or ordering everywhere.
This patch introduces a test-lower-to-arm-sme pipeline mirroring test-lower-to-llvm that provides some sanity when running e2e examples and can be used a reference for targeting ArmSME in MLIR.
All the integration tests are updated to use this pipeline. The intention is to productize the pipeline once it becomes more mature.