[mlir][linalg] Block pack matmul pass #89782

Merged 26 commits on May 9, 2024
59 changes: 59 additions & 0 deletions mlir/include/mlir/Dialect/Linalg/Passes.td
@@ -141,4 +141,63 @@ def LinalgDetensorizePass : InterfacePass<"linalg-detensorize", "FunctionOpInter
];
}

def LinalgBlockPackMatmul : Pass<"linalg-block-pack-matmul"> {
Contributor commented:

I am not going to hold this up, but I really dislike having such passes. I think this is just dead weight that can never fit all downstream uses. We should really need only transformation methods/patterns, let the passes live downstream, and keep just a test pass for testing. Such passes in core do not age well.

@rengolin (Member) commented on May 9, 2024:

> Such passes in core do not age well.

That's true, but I think it's orthogonal. We need the functionality upstream so that we can all use and share it, and it will only age if no one uses it. Being in a test pass, a transform, or a dialect pass won't change that equation much.

I'd love it if IREE and other compilers could bring more functionality to this pass and start using the upstream stuff (parametrized, cost modelled, etc). This is what keeps such passes from aging. If we don't provide a way for people to use it in their projects, then MLIR gets fewer "batteries" (as @joker-eph usually says).

let summary = "Convert linalg matmul ops to block layout and back";
let description = [{
Pack a matmul operation into blocked layout with two levels of subdivision:
- major 2D blocks - outer dimensions, consist of minor blocks
- minor 2D blocks - inner dimensions, consist of scalar elements

A 2D matmul MxNxK gets reshaped into blocked 4D representation
as: [MB][NB][mb][nb] += [MB][KB][mb][kb] * [NB][KB][nb][kb]
where the (MB, NB, KB) dimensions represent the major blocks,
and the (mb, nb, kb) are the minor blocks of their respective
original 2D dimensions (M, N, K).

Depending on the initial operands' data layout and the specified
packing options, the major blocks dimensions might get transposed
e.g., [MB][KB] -> [KB][MB]. The minor blocks can also be transposed
e.g., [mb][kb] -> [kb][mb].
Any present batch dimensions remain unchanged.
The final result is unpacked back to the original shape.

For example, given a matmul operation:
```mlir
%res = linalg.matmul ins(%A, %B) outs(%C)
```
the default transformation result can be represented as:
```mlir
%A_packed = pack %A : 2D <MxK> -> 4D <MBxKBxmbxkb>
%B_packed = pack %B : 2D <KxN> -> 4D <NBxKBxnbxkb>
%C_packed = pack %C : 2D <MxN> -> 4D <MBxNBxmbxnb>
%res_packed = linalg.mmt4d ins(%A_packed, %B_packed) outs(%C_packed)
%res = unpack %res_packed : 4D <MBxNBxmbxnb> -> 2D <MxN>
```
}];
let dependentDialects = ["linalg::LinalgDialect", "tensor::TensorDialect"];
let options = [
ListOption<"blockFactors", "block-factors", "int64_t",
"Block factors (mb, nb, kb) for relayout">,
Option<"allowPadding", "allow-padding", "bool",
/*default=*/"true",
"Allow packing padding">,
ListOption<"mnkPaddedSizesNextMultipleOf", "mnk-padded-multiples", "int64_t",
"Next multiples of the packing sizes">,
ListOption<"mnkOrder", "mnk-order", "int64_t",
"Permutation of matmul (M, N, K) dimensions order">,
Option<"lhsTransposeOuterBlocks", "lhs-transpose-outer-blocks", "bool",
/*default=*/"false",
"Transpose LHS outer block layout [MB][KB] -> [KB][MB]">,
Option<"lhsTransposeInnerBlocks", "lhs-transpose-inner-blocks", "bool",
/*default=*/"false",
"Transpose LHS inner block layout [mb][kb] -> [kb][mb]">,
Option<"rhsTransposeOuterBlocks", "rhs-transpose-outer-blocks", "bool",
/*default=*/"true",
"Transpose RHS outer block layout [KB][NB] -> [NB][KB]">,
Option<"rhsTransposeInnerBlocks", "rhs-transpose-inner-blocks", "bool",
/*default=*/"true",
"Transpose RHS inner block layout [kb][nb] -> [nb][kb]">
];
}

#endif // MLIR_DIALECT_LINALG_PASSES
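
To make the pass description above more concrete, here is a minimal standalone sketch of how the packed 4D shapes follow from the matmul sizes, the block factors, and the transpose flags. It does not use MLIR; `ceilDiv` and `packedShape` are hypothetical helpers introduced only for illustration, and the rounding-up behaviour is an assumption about what padding implies for partially filled blocks.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical helper (not an MLIR API): number of blocks needed to cover
// `size` elements with blocks of `block` elements. A partially filled last
// block counts as a full block, which is what padding would produce.
inline int64_t ceilDiv(int64_t size, int64_t block) {
  return (size + block - 1) / block;
}

// Hypothetical helper: shape of a rows x cols operand packed with inner
// tiles of rowBlock x colBlock. The untransposed order is
// [rowBlocks][colBlocks][rowBlock][colBlock]; the two flags swap the outer
// pair and the inner pair independently.
inline std::array<int64_t, 4> packedShape(int64_t rows, int64_t cols,
                                          int64_t rowBlock, int64_t colBlock,
                                          bool transposeOuter,
                                          bool transposeInner) {
  std::array<int64_t, 4> shape = {ceilDiv(rows, rowBlock),
                                  ceilDiv(cols, colBlock), rowBlock, colBlock};
  if (transposeOuter)
    std::swap(shape[0], shape[1]);
  if (transposeInner)
    std::swap(shape[2], shape[3]);
  return shape;
}
```

With M=128, N=64, K=512 and block factors (mb, nb, kb) = (32, 16, 64), `packedShape(128, 512, 32, 64, false, false)` gives the LHS layout [MB][KB][mb][kb], and `packedShape(512, 64, 64, 16, true, true)` gives the RHS layout [NB][KB][nb][kb], matching the default option values listed in the pass definition.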
64 changes: 64 additions & 0 deletions mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
@@ -1162,6 +1162,66 @@ packMatmulGreedily(RewriterBase &rewriter, LinalgOp linalgOp,
ArrayRef<int64_t> mnkPaddedSizesNextMultipleOf,
ArrayRef<int64_t> mnkOrder);

struct BlockPackMatmulOptions {
/// Minor block factors (mb, nb, kb) for packing relayout where mb, nb are
/// the parallel dimensions and kb is the reduction dimension.
SmallVector<int64_t, 3> blockFactors;

/// If true, allows packing of dimensions that only partially fit into the
/// block factors.
bool allowPadding = true;

/// Next multiples of the packing sizes.
SmallVector<int64_t, 3> mnkPaddedSizesNextMultipleOf;

/// Permutation of matmul (M, N, K) dimensions order.
SmallVector<int64_t, 3> mnkOrder = {0, 1, 2};

/// Transpose LHS outer block layout [MB][KB] -> [KB][MB].
bool lhsTransposeOuterBlocks = false;

/// Transpose LHS inner block layout [mb][kb] -> [kb][mb].
bool lhsTransposeInnerBlocks = false;

/// Transpose RHS outer block layout [KB][NB] -> [NB][KB].
bool rhsTransposeOuterBlocks = true;

/// Transpose RHS inner block layout [kb][nb] -> [nb][kb].
bool rhsTransposeInnerBlocks = true;
};

/// Function type which is used to control matmul packing.
/// It is expected to return valid packing configuration for each operation.
/// Lack of packing options indicates that no valid configuration could be
/// assigned and the operation will not be packed.
using ControlBlockPackMatmulFn =
std::function<std::optional<BlockPackMatmulOptions>(linalg::LinalgOp)>;
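
The contract of the control function (return options to pack, return nothing to skip the op) can be sketched without MLIR as follows. `MockMatmulOp`, `MockBlockPackOptions`, `MockControlFn`, and `makeExactFitControl` are stand-ins invented for this illustration; a real control function would take a `linalg::LinalgOp` and return `std::optional<BlockPackMatmulOptions>`. The divisibility policy shown is only one possible choice, not something this patch prescribes.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Stand-in for linalg::LinalgOp: only the static matmul sizes are modelled.
struct MockMatmulOp {
  int64_t m, n, k;
};

// Stand-in for BlockPackMatmulOptions with just the two fields used here.
struct MockBlockPackOptions {
  std::vector<int64_t> blockFactors; // (mb, nb, kb)
  bool allowPadding = true;
};

// Stand-in for ControlBlockPackMatmulFn.
using MockControlFn =
    std::function<std::optional<MockBlockPackOptions>(const MockMatmulOp &)>;

// Build a control function that only packs matmuls whose sizes are exact
// multiples of the requested block factors. Any other op gets no options,
// which signals that it should be left unpacked.
inline MockControlFn makeExactFitControl(int64_t mb, int64_t nb, int64_t kb) {
  return [=](const MockMatmulOp &op) -> std::optional<MockBlockPackOptions> {
    if (op.m % mb != 0 || op.n % nb != 0 || op.k % kb != 0)
      return std::nullopt; // no valid configuration for this op
    MockBlockPackOptions options;
    options.blockFactors = {mb, nb, kb};
    options.allowPadding = false; // sizes fit exactly, so padding is not needed
    return options;
  };
}
```

Returning options per operation is what lets a downstream project plug in its own heuristics or cost model while reusing the same rewrite.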

/// Pack a matmul operation into blocked 4D layout.
///
/// Relayout a matmul operation into blocked layout with two levels of
/// subdivision:
/// - major 2D blocks - outer dimensions, consist of minor blocks
/// - minor 2D blocks - inner dimensions, consist of scalar elements
///
/// A 2D matmul MxNxK gets reshaped into blocked 4D representation
/// as: [MB][NB][mb][nb] += [MB][KB][mb][kb] * [NB][KB][nb][kb]
/// where the (MB, NB, KB) dimensions represent the major blocks,
/// and the (mb, nb, kb) are the minor blocks of their respective
/// original 2D dimensions (M, N, K).
///
/// Depending on the initial operands' data layout and the specified
/// packing options, the major blocks dimensions might get transposed
/// e.g., [MB][KB] -> [KB][MB]. The minor blocks can also be transposed
/// e.g., [mb][kb] -> [kb][mb].
/// Any present batch dimensions remain unchanged.
/// The final result is unpacked back to the original shape.
///
/// Return failure if no valid packing options are provided.
FailureOr<PackResult>
blockPackMatmul(RewriterBase &rewriter, linalg::LinalgOp linalgOp,
const ControlBlockPackMatmulFn &controlPackMatmul);
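
The blocked formula `[MB][NB][mb][nb] += [MB][KB][mb][kb] * [NB][KB][nb][kb]` from the comment above can be checked numerically with a small standalone sketch. This is plain C++ on flat row-major buffers, not the MLIR implementation; all function names are hypothetical, and the sketch assumes the block factors divide M, N, and K exactly, so no padding is involved.

```cpp
#include <cstdint>
#include <vector>

using Matrix = std::vector<int64_t>; // dense row-major storage

// Pack a rows x cols matrix into [rows/rb][cols/cb][rb][cb].
// Assumes rb divides rows and cb divides cols.
inline Matrix pack(const Matrix &src, int64_t rows, int64_t cols, int64_t rb,
                   int64_t cb) {
  const int64_t colBlocks = cols / cb;
  Matrix dst(rows * cols);
  for (int64_t r = 0; r < rows; ++r)
    for (int64_t c = 0; c < cols; ++c)
      dst[(((r / rb) * colBlocks + c / cb) * rb + r % rb) * cb + c % cb] =
          src[r * cols + c];
  return dst;
}

// Inverse of pack: restore the original rows x cols row-major layout.
inline Matrix unpack(const Matrix &src, int64_t rows, int64_t cols, int64_t rb,
                     int64_t cb) {
  const int64_t colBlocks = cols / cb;
  Matrix dst(rows * cols);
  for (int64_t r = 0; r < rows; ++r)
    for (int64_t c = 0; c < cols; ++c)
      dst[r * cols + c] =
          src[(((r / rb) * colBlocks + c / cb) * rb + r % rb) * cb + c % cb];
  return dst;
}

// Transpose a rows x cols matrix into a cols x rows matrix.
inline Matrix transpose(const Matrix &src, int64_t rows, int64_t cols) {
  Matrix dst(rows * cols);
  for (int64_t r = 0; r < rows; ++r)
    for (int64_t c = 0; c < cols; ++c)
      dst[c * rows + r] = src[r * cols + c];
  return dst;
}

// Reference: C += A * B on unpacked operands (A is MxK, B is KxN).
inline Matrix plainMatmul(const Matrix &a, const Matrix &b, Matrix c,
                          int64_t m, int64_t n, int64_t k) {
  for (int64_t i = 0; i < m; ++i)
    for (int64_t j = 0; j < n; ++j)
      for (int64_t p = 0; p < k; ++p)
        c[i * n + j] += a[i * k + p] * b[p * n + j];
  return c;
}

// Pack all operands, accumulate in the blocked layout, then unpack the result.
inline Matrix blockedMatmul(const Matrix &a, const Matrix &b, const Matrix &c,
                            int64_t m, int64_t n, int64_t k, int64_t mb,
                            int64_t nb, int64_t kb) {
  const int64_t MB = m / mb, NB = n / nb, KB = k / kb;
  const Matrix ap = pack(a, m, k, mb, kb);                  // [MB][KB][mb][kb]
  const Matrix bp = pack(transpose(b, k, n), n, k, nb, kb); // [NB][KB][nb][kb]
  Matrix cp = pack(c, m, n, mb, nb);                        // [MB][NB][mb][nb]
  for (int64_t i = 0; i < MB; ++i)
    for (int64_t j = 0; j < NB; ++j)
      for (int64_t p = 0; p < KB; ++p)
        for (int64_t ii = 0; ii < mb; ++ii)
          for (int64_t jj = 0; jj < nb; ++jj)
            for (int64_t pp = 0; pp < kb; ++pp)
              cp[((i * NB + j) * mb + ii) * nb + jj] +=
                  ap[((i * KB + p) * mb + ii) * kb + pp] *
                  bp[((j * KB + p) * nb + jj) * kb + pp];
  return unpack(cp, m, n, mb, nb);
}
```

For any sizes that satisfy the divisibility assumption, `blockedMatmul` should return the same values as `plainMatmul`, which is the property the pack, compute, unpack sequence is meant to preserve.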

/// Rewrite tensor.from_elements to linalg.generic.
FailureOr<Operation *>
rewriteInDestinationPassingStyle(RewriterBase &rewriter,
@@ -1628,6 +1688,10 @@ void populateSplitReductionPattern(
void populateTransposeMatmulPatterns(RewritePatternSet &patterns,
bool transposeLHS = true);

/// Patterns to block pack Linalg matmul ops.
void populateBlockPackMatmulPatterns(RewritePatternSet &patterns,
const ControlBlockPackMatmulFn &controlFn);

} // namespace linalg
} // namespace mlir
