[mlir][vector] Add scalable lowering for transfer_write(transpose)
#101353
@llvm/pr-subscribers-mlir
Author: Benjamin Maxwell (MacDue)
Changes: This specifically handles the case of a transpose from a vector type like `vector<8x[4]xf32>` to `vector<[4]x8xf32>`.
Patch is 23.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101353.diff 5 Files Affected:
Left a couple of minor comments, but otherwise LGTM. Cheers!
Force-pushed from 6ef6af3 to c83f20c.
LG % some minor suggestions
Force-pushed from ae9adad to 3359eb3.
This enables a general scalable lowering for `transfer_write(transpose)` when ArmSME is _not_ available. The ArmSME dialect already had its own (more specific) lowerings for cases like this, which is why these lowerings are disabled when SME is available. Depends on: llvm/llvm-project#101353
LGTM, thanks! Just a couple of minor nits.
Force-pushed from 3359eb3 to 8545557.
This specifically handles the case of a transpose from a vector type like `vector<8x[4]xf32>` to `vector<[4]x8xf32>`. Such transposes occur fairly frequently when scalably vectorizing `linalg.generic`s. There is no direct lowering for these (as types like `vector<[4]x8xf32>` cannot be represented in LLVM-IR). However, if the only use of the transpose is a write, then it is possible to lower the `transfer_write(transpose)` as a VLA loop.

Example:

```mlir
%transpose = vector.transpose %vec, [1, 0]
  : vector<4x[4]xf32> to vector<[4]x4xf32>
vector.transfer_write %transpose, %dest[%i, %j] {in_bounds = [true, true]}
  : vector<[4]x4xf32>, memref<?x?xf32>
```

Becomes:

```mlir
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
%c0 = arith.constant 0 : index
%0 = vector.extract %arg0[0] : vector<[4]xf32> from vector<4x[4]xf32>
%1 = vector.extract %arg0[1] : vector<[4]xf32> from vector<4x[4]xf32>
%2 = vector.extract %arg0[2] : vector<[4]xf32> from vector<4x[4]xf32>
%3 = vector.extract %arg0[3] : vector<[4]xf32> from vector<4x[4]xf32>
%vscale = vector.vscale
%c4_vscale = arith.muli %vscale, %c4 : index
scf.for %idx = %c0 to %c4_vscale step %c1 {
  %4 = vector.extract %0[%idx] : f32 from vector<[4]xf32>
  %5 = vector.extract %1[%idx] : f32 from vector<[4]xf32>
  %6 = vector.extract %2[%idx] : f32 from vector<[4]xf32>
  %7 = vector.extract %3[%idx] : f32 from vector<[4]xf32>
  %slice_i = affine.apply #map(%idx)[%i]
  %slice = vector.from_elements %4, %5, %6, %7 : vector<4xf32>
  vector.transfer_write %slice, %arg1[%slice_i, %j] {in_bounds = [true]}
    : vector<4xf32>, memref<?x?xf32>
}
```
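To make the semantics of the lowered loop easier to check, here is a minimal Python reference model. It is only a sketch and not the patch's implementation. Nested lists stand in for vectors and the memref, `vscale` is passed in explicitly, and all writes are assumed to be in bounds. The function names are hypothetical. `#map` is not defined in the example above, so the model assumes it computes `%idx + %i`.

```python
def transposed_write_reference(vec, dest, i, j):
    """Direct semantics of transfer_write(transpose(vec)) at dest[i, j].

    The transposed value satisfies transposed[c][r] == vec[r][c], so element
    vec[r][c] lands at dest[i + c][j + r].
    """
    for r, row in enumerate(vec):
        for c, value in enumerate(row):
            dest[i + c][j + r] = value


def transposed_write_as_vla_loop(vec, dest, i, j, vscale, scalable_base=4):
    """Loop form mirroring the lowered IR.

    vec models vector<4x[4]xf32>: a fixed number of rows, each holding
    scalable_base * vscale elements.
    """
    # Hoisted vector.extract ops: one scalable row per fixed-dimension index.
    rows = [vec[r] for r in range(len(vec))]
    # %c4_vscale = %vscale * 4: trip count of the scf.for loop.
    trip_count = scalable_base * vscale
    for idx in range(trip_count):
        # vector.extract %row[%idx] followed by vector.from_elements.
        fixed_slice = [row[idx] for row in rows]
        # Assumption: #map(%idx)[%i] is %idx + %i.
        slice_i = idx + i
        # 1-D vector.transfer_write of the fixed-size slice.
        dest[slice_i][j:j + len(fixed_slice)] = fixed_slice
```

Running both functions on the same input and comparing the destination buffers gives a quick check that the per-iteration 1-D writes reproduce the 2-D transposed write for a given `vscale`.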
Force-pushed from 8545557 to 7b50608.