[mlir][ArmSME][test] Unroll reduction dimension in multi-tile-matmul.mlir #81160
Conversation
@llvm/pr-subscribers-mlir-linalg @llvm/pr-subscribers-mlir-sme

Author: Benjamin Maxwell (MacDue)

This tests that #80148 and #80170 work together to allow unrolling the reduction dimension of a matmul.

Full diff: https://github.com/llvm/llvm-project/pull/81160.diff

1 file affected:
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/multi-tile-matmul.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/multi-tile-matmul.mlir
index 327f237ba8948..d5c35068ccb32 100644
--- a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/multi-tile-matmul.mlir
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/multi-tile-matmul.mlir
@@ -73,14 +73,14 @@ module attributes {transform.with_named_sequence} {
%matmul = transform.structured.match ops{["linalg.matmul"]} in %module
: (!transform.any_op) -> !transform.any_op
- // Step 1: Tile for size [8] x [8], which corresponds to (2 x SVLs) x (2 x SVLs),
- // where SVLs is the number of 32-bit elements in a vector of SVL bits.
- // This uses all four 32-bit SME virtual tiles.
- %tiled_linalg_op, %loop_i, %loop_j, %loop_k = transform.structured.tile_using_for %matmul[[8], [8], 1]
+ // Step 1: Tile for size [8] x [8] (unrolled by 4), which corresponds to
+ // (2 x SVLs) x (2 x SVLs), where SVLs is the number of 32-bit elements in a
+ // vector of SVL bits. This uses all four 32-bit SME virtual tiles.
+ %tiled_linalg_op, %loop_i, %loop_j, %loop_k = transform.structured.tile_using_for %matmul[[8], [8], 4]
: (!transform.any_op) -> (!transform.any_op, !transform.op<"scf.for">, !transform.op<"scf.for">, !transform.op<"scf.for">)
// Step 2: Vectorize.
- transform.structured.vectorize %tiled_linalg_op vector_sizes [[8], [8], 1]
+ transform.structured.vectorize %tiled_linalg_op vector_sizes [[8], [8], 4]
: !transform.any_op
// Step 3: Bufferize ahead of TransferReadDropUnitDimsPattern, which
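As a rough illustration of what the tiling in Step 1 does, here is a scalar Python sketch of the resulting loop structure. This is a hedged model, not the generated IR: the fixed tile sizes `TILE_M = TILE_N = 8` and `TILE_K = 4` stand in for the scalable `[8]` sizes (which are really `8 x vscale`), and the inner tile body is plain scalar code rather than SME tile operations.

```python
# Scalar model of the loop nest produced by
#   transform.structured.tile_using_for %matmul [[8], [8], 4]
# Tile sizes are fixed constants here for illustration; in the real
# pipeline [8] is a scalable size (8 x vscale).

def tiled_matmul(A, B, C, TILE_M=8, TILE_N=8, TILE_K=4):
    M, K = len(A), len(A[0])
    N = len(B[0])
    for i in range(0, M, TILE_M):          # %loop_i
        for j in range(0, N, TILE_N):      # %loop_j
            for k in range(0, K, TILE_K):  # %loop_k: each iteration now
                # covers TILE_K = 4 reduction steps instead of 1.
                for ii in range(i, min(i + TILE_M, M)):
                    for jj in range(j, min(j + TILE_N, N)):
                        for kk in range(k, min(k + TILE_K, K)):
                            C[ii][jj] += A[ii][kk] * B[kk][jj]
    return C
```

The key point of the patch is the `%loop_k` step: bumping the reduction tile size from 1 to 4 means four K-steps are unrolled into each iteration of the reduction loop, which the subsequent `vector_sizes [[8], [8], 4]` vectorization then has to handle.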
// Step 1: Tile for size [8] x [8] (unrolled by 4), which corresponds to
// (2 x SVLs) x (2 x SVLs), where SVLs is the number of 32-bit elements in a
// vector of SVL bits. This uses all four 32-bit SME virtual tiles.
%tiled_linalg_op, %loop_i, %loop_j, %loop_k = transform.structured.tile_using_for %matmul[[8], [8], 4]
IIUC, the K dim controls the number of SVE regs that we would use here, right? And we need 4 regs per 1 K dim? So the possible max before spilling happens would be 8, right?
That's tangential to this PR - I'm just curious about the optimal set-up :)
Yep, that's about right 👍
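The back-of-the-envelope arithmetic in the question above can be written out explicitly. This is a hedged sketch based purely on the numbers in the review discussion: it assumes each unit of the K tile size needs 4 SVE vector registers for operands, and that AArch64 provides 32 SVE vector registers (Z0–Z31) in total.

```python
# Register-budget estimate for the reduction-dim tile size, under the
# assumptions from the review discussion: 4 SVE vector registers per
# K step, 32 Z registers available in total.

SVE_VECTOR_REGS = 32   # Z0-Z31 on AArch64
REGS_PER_K_STEP = 4    # assumed operand registers per reduction step

def max_k_tile_before_spilling(total=SVE_VECTOR_REGS,
                               per_step=REGS_PER_K_STEP):
    """Largest K tile size whose operands fit in the register file."""
    return total // per_step

print(max_k_tile_before_spilling())  # prints 8
```

With these assumptions the chosen K tile size of 4 uses half the vector register file, leaving headroom before spilling at the estimated maximum of 8.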
LGTM, thanks!
LGTM cheers