[mlir][vector] Decouple unrolling gather and gather to llvm lowering #132206

Merged
merged 2 commits into from
Mar 24, 2025
12 changes: 8 additions & 4 deletions mlir/include/mlir/Dialect/Vector/Transforms/LoweringPatterns.h
@@ -241,16 +241,20 @@ void populateVectorStepLoweringPatterns(RewritePatternSet &patterns,

/// Populate the pattern set with the following patterns:
///
/// [FlattenGather]
/// Flattens 2 or more dimensional `vector.gather` ops by unrolling the
/// [UnrollGather]
/// Unrolls 2 or more dimensional `vector.gather` ops by unrolling the
/// outermost dimension.
void populateVectorGatherLoweringPatterns(RewritePatternSet &patterns,
PatternBenefit benefit = 1);

/// Populate the pattern set with the following patterns:
///
/// [Gather1DToConditionalLoads]
/// Turns 1-d `vector.gather` into a scalarized sequence of `vector.loads` or
/// `tensor.extract`s. To avoid out-of-bounds memory accesses, these
/// loads/extracts are made conditional using `scf.if` ops.
void populateVectorGatherLoweringPatterns(RewritePatternSet &patterns,
PatternBenefit benefit = 1);
void populateVectorGatherToConditionalLoadPatterns(RewritePatternSet &patterns,
PatternBenefit benefit = 1);

/// Populates instances of `MaskOpRewritePattern` to lower masked operations
/// with `vector.mask`. Patterns should rewrite the `vector.mask` operation and
47 changes: 14 additions & 33 deletions mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
@@ -269,49 +269,30 @@ class VectorGatherOpConversion
if (failed(isMemRefTypeSupported(memRefType, *this->getTypeConverter())))
return failure();

auto loc = gather->getLoc();
VectorType vType = gather.getVectorType();
if (vType.getRank() > 1)
return failure();

Location loc = gather->getLoc();

// Resolve alignment.
unsigned align;
if (failed(getMemRefAlignment(*getTypeConverter(), memRefType, align)))
return failure();

// Resolve address.
Value ptr = getStridedElementPtr(loc, memRefType, adaptor.getBase(),
adaptor.getIndices(), rewriter);
Value base = adaptor.getBase();
Value ptrs =
getIndexedPtrs(rewriter, loc, *this->getTypeConverter(), memRefType,
base, ptr, adaptor.getIndexVec(), vType);

auto llvmNDVectorTy = adaptor.getIndexVec().getType();
// Handle the simple case of 1-D vector.
if (!isa<LLVM::LLVMArrayType>(llvmNDVectorTy)) {
auto vType = gather.getVectorType();
// Resolve address.
Value ptrs =
getIndexedPtrs(rewriter, loc, *this->getTypeConverter(), memRefType,
base, ptr, adaptor.getIndexVec(), vType);
// Replace with the gather intrinsic.
rewriter.replaceOpWithNewOp<LLVM::masked_gather>(
gather, typeConverter->convertType(vType), ptrs, adaptor.getMask(),
adaptor.getPassThru(), rewriter.getI32IntegerAttr(align));
return success();
}

const LLVMTypeConverter &typeConverter = *this->getTypeConverter();
auto callback = [align, memRefType, base, ptr, loc, &rewriter,
&typeConverter](Type llvm1DVectorTy,
ValueRange vectorOperands) {
// Resolve address.
Value ptrs = getIndexedPtrs(
rewriter, loc, typeConverter, memRefType, base, ptr,
/*index=*/vectorOperands[0], cast<VectorType>(llvm1DVectorTy));
// Create the gather intrinsic.
return rewriter.create<LLVM::masked_gather>(
loc, llvm1DVectorTy, ptrs, /*mask=*/vectorOperands[1],
/*passThru=*/vectorOperands[2], rewriter.getI32IntegerAttr(align));
};
SmallVector<Value> vectorOperands = {
adaptor.getIndexVec(), adaptor.getMask(), adaptor.getPassThru()};
return LLVM::detail::handleMultidimensionalVectors(
gather, vectorOperands, *getTypeConverter(), callback, rewriter);
// Replace with the gather intrinsic.
rewriter.replaceOpWithNewOp<LLVM::masked_gather>(
gather, typeConverter->convertType(vType), ptrs, adaptor.getMask(),
adaptor.getPassThru(), rewriter.getI32IntegerAttr(align));
return success();
}
};

@@ -81,6 +81,7 @@ void ConvertVectorToLLVMPass::runOnOperation() {
populateVectorInsertExtractStridedSliceTransforms(patterns);
populateVectorStepLoweringPatterns(patterns);
populateVectorRankReducingFMAPattern(patterns);
populateVectorGatherLoweringPatterns(patterns);
(void)applyPatternsGreedily(getOperation(), std::move(patterns));
}

18 changes: 12 additions & 6 deletions mlir/lib/Dialect/Vector/Transforms/LowerVectorGather.cpp
@@ -38,7 +38,7 @@ using namespace mlir;
using namespace mlir::vector;

namespace {
/// Flattens 2 or more dimensional `vector.gather` ops by unrolling the
/// Unrolls 2 or more dimensional `vector.gather` ops by unrolling the
/// outermost dimension. For example:
/// ```
/// %g = vector.gather %base[%c0][%v], %mask, %pass_thru :
@@ -56,14 +56,14 @@ namespace {
/// When applied exhaustively, this will produce a sequence of 1-d gather ops.
///
/// Supports vector types with a fixed leading dimension.
struct FlattenGather : OpRewritePattern<vector::GatherOp> {
struct UnrollGather : OpRewritePattern<vector::GatherOp> {
using OpRewritePattern::OpRewritePattern;

LogicalResult matchAndRewrite(vector::GatherOp op,
PatternRewriter &rewriter) const override {
VectorType resultTy = op.getType();
if (resultTy.getRank() < 2)
return rewriter.notifyMatchFailure(op, "already flat");
return rewriter.notifyMatchFailure(op, "already 1-D");

// Unrolling doesn't take vscale into account. Pattern is disabled for
// vectors with leading scalable dim(s).
@@ -107,7 +107,8 @@ struct FlattenGather : OpRewritePattern<vector::GatherOp> {
/// ```mlir
/// %subview = memref.subview %M (...)
/// : memref<100x3xf32> to memref<100xf32, strided<[3]>>
/// %gather = vector.gather %subview[%idxs] (...) : memref<100xf32, strided<[3]>>
/// %gather = vector.gather %subview[%idxs] (...)
/// : memref<100xf32, strided<[3]>>
/// ```
/// ==>
/// ```mlir
@@ -269,6 +270,11 @@ struct Gather1DToConditionalLoads : OpRewritePattern<vector::GatherOp> {

void mlir::vector::populateVectorGatherLoweringPatterns(
RewritePatternSet &patterns, PatternBenefit benefit) {
patterns.add<FlattenGather, RemoveStrideFromGatherSource,
Gather1DToConditionalLoads>(patterns.getContext(), benefit);
patterns.add<UnrollGather>(patterns.getContext(), benefit);
}

void mlir::vector::populateVectorGatherToConditionalLoadPatterns(
RewritePatternSet &patterns, PatternBenefit benefit) {
patterns.add<RemoveStrideFromGatherSource, Gather1DToConditionalLoads>(
patterns.getContext(), benefit);
}
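
To make the split between the two pattern groups easier to follow, here is a minimal standalone C++ sketch of what each group computes on plain arrays. It is a scalar model only, not MLIR code and not the PR's implementation. The names `gather1DConditional`, `gather2DUnrolled`, `base`, and `offset` are hypothetical and chosen for illustration. The first function models the conditional-load form of a 1-D gather, the second models unrolling the outermost dimension into 1-D gathers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a 1-D masked gather after lowering to
// conditional loads. `base` stands in for the memref and `offset` for the
// scalar base index; neither name comes from the PR.
std::vector<float> gather1DConditional(const std::vector<float> &base,
                                       std::size_t offset,
                                       const std::vector<std::int32_t> &indices,
                                       const std::vector<bool> &mask,
                                       const std::vector<float> &passThru) {
  std::vector<float> result = passThru;
  for (std::size_t lane = 0; lane < indices.size(); ++lane) {
    // Plays the role of the `scf.if` guard: memory is read only for enabled
    // lanes, so a masked-off lane may hold an out-of-bounds index.
    if (mask[lane])
      result[lane] = base[offset + static_cast<std::size_t>(indices[lane])];
  }
  return result;
}

using Matrix = std::vector<std::vector<float>>;

// Hypothetical model of unrolling the outermost dimension: one 1-D gather per
// row of the index, mask, and pass-through operands.
Matrix gather2DUnrolled(const std::vector<float> &base, std::size_t offset,
                        const std::vector<std::vector<std::int32_t>> &indices,
                        const std::vector<std::vector<bool>> &mask,
                        const Matrix &passThru) {
  Matrix result;
  result.reserve(indices.size());
  for (std::size_t row = 0; row < indices.size(); ++row)
    result.push_back(gather1DConditional(base, offset, indices[row], mask[row],
                                         passThru[row]));
  return result;
}
```

In this model the 2-D case never touches memory itself. It only slices its operands by row and delegates, which roughly mirrors why the unrolling pattern can be registered independently of whichever 1-D lowering (conditional loads or the LLVM masked gather intrinsic) runs afterwards.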
46 changes: 0 additions & 46 deletions mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
@@ -2074,52 +2074,6 @@ func.func @gather_index_scalable(%arg0: memref<?xindex>, %arg1: vector<[3]xindex

// -----

func.func @gather_2d_from_1d(%arg0: memref<?xf32>, %arg1: vector<2x3xi32>, %arg2: vector<2x3xi1>, %arg3: vector<2x3xf32>) -> vector<2x3xf32> {
%0 = arith.constant 0: index
%1 = vector.gather %arg0[%0][%arg1], %arg2, %arg3 : memref<?xf32>, vector<2x3xi32>, vector<2x3xi1>, vector<2x3xf32> into vector<2x3xf32>
return %1 : vector<2x3xf32>
}

// CHECK-LABEL: func @gather_2d_from_1d
// CHECK: %[[B:.*]] = llvm.getelementptr %{{.*}} : (!llvm.ptr, i64) -> !llvm.ptr, f32
// CHECK: %[[I0:.*]] = llvm.extractvalue %{{.*}}[0] : !llvm.array<2 x vector<3xi32>>
// CHECK: %[[M0:.*]] = llvm.extractvalue %{{.*}}[0] : !llvm.array<2 x vector<3xi1>>
// CHECK: %[[S0:.*]] = llvm.extractvalue %{{.*}}[0] : !llvm.array<2 x vector<3xf32>>
// CHECK: %[[P0:.*]] = llvm.getelementptr %[[B]][%[[I0]]] : (!llvm.ptr, vector<3xi32>) -> !llvm.vec<3 x ptr>, f32
// CHECK: %[[G0:.*]] = llvm.intr.masked.gather %[[P0]], %[[M0]], %[[S0]] {alignment = 4 : i32} : (!llvm.vec<3 x ptr>, vector<3xi1>, vector<3xf32>) -> vector<3xf32>
// CHECK: %{{.*}} = llvm.insertvalue %[[G0]], %{{.*}}[0] : !llvm.array<2 x vector<3xf32>>
// CHECK: %[[I1:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.array<2 x vector<3xi32>>
// CHECK: %[[M1:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.array<2 x vector<3xi1>>
// CHECK: %[[S1:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.array<2 x vector<3xf32>>
// CHECK: %[[P1:.*]] = llvm.getelementptr %[[B]][%[[I1]]] : (!llvm.ptr, vector<3xi32>) -> !llvm.vec<3 x ptr>, f32
// CHECK: %[[G1:.*]] = llvm.intr.masked.gather %[[P1]], %[[M1]], %[[S1]] {alignment = 4 : i32} : (!llvm.vec<3 x ptr>, vector<3xi1>, vector<3xf32>) -> vector<3xf32>
// CHECK: %{{.*}} = llvm.insertvalue %[[G1]], %{{.*}}[1] : !llvm.array<2 x vector<3xf32>>

// -----

func.func @gather_2d_from_1d_scalable(%arg0: memref<?xf32>, %arg1: vector<2x[3]xi32>, %arg2: vector<2x[3]xi1>, %arg3: vector<2x[3]xf32>) -> vector<2x[3]xf32> {
%0 = arith.constant 0: index
%1 = vector.gather %arg0[%0][%arg1], %arg2, %arg3 : memref<?xf32>, vector<2x[3]xi32>, vector<2x[3]xi1>, vector<2x[3]xf32> into vector<2x[3]xf32>
return %1 : vector<2x[3]xf32>
}

// CHECK-LABEL: func @gather_2d_from_1d_scalable
// CHECK: %[[B:.*]] = llvm.getelementptr %{{.*}} : (!llvm.ptr, i64) -> !llvm.ptr, f32
// CHECK: %[[I0:.*]] = llvm.extractvalue %{{.*}}[0] : !llvm.array<2 x vector<[3]xi32>>
// CHECK: %[[M0:.*]] = llvm.extractvalue %{{.*}}[0] : !llvm.array<2 x vector<[3]xi1>>
// CHECK: %[[S0:.*]] = llvm.extractvalue %{{.*}}[0] : !llvm.array<2 x vector<[3]xf32>>
// CHECK: %[[P0:.*]] = llvm.getelementptr %[[B]][%[[I0]]] : (!llvm.ptr, vector<[3]xi32>) -> !llvm.vec<? x 3 x ptr>, f32
// CHECK: %[[G0:.*]] = llvm.intr.masked.gather %[[P0]], %[[M0]], %[[S0]] {alignment = 4 : i32} : (!llvm.vec<? x 3 x ptr>, vector<[3]xi1>, vector<[3]xf32>) -> vector<[3]xf32>
// CHECK: %{{.*}} = llvm.insertvalue %[[G0]], %{{.*}}[0] : !llvm.array<2 x vector<[3]xf32>>
// CHECK: %[[I1:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.array<2 x vector<[3]xi32>>
// CHECK: %[[M1:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.array<2 x vector<[3]xi1>>
// CHECK: %[[S1:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.array<2 x vector<[3]xf32>>
// CHECK: %[[P1:.*]] = llvm.getelementptr %[[B]][%[[I1]]] : (!llvm.ptr, vector<[3]xi32>) -> !llvm.vec<? x 3 x ptr>, f32
// CHECK: %[[G1:.*]] = llvm.intr.masked.gather %[[P1]], %[[M1]], %[[S1]] {alignment = 4 : i32} : (!llvm.vec<? x 3 x ptr>, vector<[3]xi1>, vector<[3]xf32>) -> vector<[3]xf32>
// CHECK: %{{.*}} = llvm.insertvalue %[[G1]], %{{.*}}[1] : !llvm.array<2 x vector<[3]xf32>>

// -----


func.func @gather_1d_from_2d(%arg0: memref<4x4xf32>, %arg1: vector<4xi32>, %arg2: vector<4xi1>, %arg3: vector<4xf32>) -> vector<4xf32> {
%0 = arith.constant 3 : index
6 changes: 3 additions & 3 deletions mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
@@ -1663,7 +1663,7 @@ func.func @flat_transpose(%arg0: vector<16xf32>) -> vector<16xf32> {

func.func @gather_with_mask(%arg0: memref<?xf32>, %arg1: vector<2x3xi32>, %arg2: vector<2x3xf32>) -> vector<2x3xf32> {
%0 = arith.constant 0: index
%1 = vector.constant_mask [1, 2] : vector<2x3xi1>
%1 = vector.constant_mask [2, 2] : vector<2x3xi1>
Contributor

Why is this test modified? I ask because one of the outer lanes was previously masked out and now it isn't. Is this significant?

Member Author

@Groverkss Mar 24, 2025

Hi, this is mentioned in the PR summary:

There are still tests for 2D `vector.gather`, but the constant mask for these tests is modified. This is because, with the updated lowering, one of the unrolled `vector.gather` ops disappears because it is masked off (also demonstrating why this is a better lowering path).

It's to make sure that unrolling is actually producing 2 unrolled gathers.

Contributor

Ah ok, now I see what you meant, thanks!
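
The effect discussed in this thread can be illustrated with a small standalone C++ sketch. It models `vector.constant_mask [d0, d1]` on a fixed 2x3 shape and counts how many rows still have at least one enabled lane after unrolling the outermost dimension. The helper names `constantMask` and `liveUnrolledGathers` are hypothetical. The assumption that a 1-D gather with an all-false mask is removed is taken from the author's explanation above, not verified against the folder implementation.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kRows = 2;
constexpr std::size_t kCols = 3;
using Mask2D = std::array<std::array<bool, kCols>, kRows>;

// Model of `vector.constant_mask [d0, d1] : vector<2x3xi1>`:
// element (i, j) is set iff i < d0 and j < d1.
Mask2D constantMask(std::size_t d0, std::size_t d1) {
  Mask2D mask{};
  for (std::size_t i = 0; i < kRows; ++i)
    for (std::size_t j = 0; j < kCols; ++j)
      mask[i][j] = i < d0 && j < d1;
  return mask;
}

// Counts rows with at least one enabled lane. Under the assumption stated
// above, each such row corresponds to an unrolled 1-D gather that survives,
// while an all-false row corresponds to one that disappears.
std::size_t liveUnrolledGathers(const Mask2D &mask) {
  std::size_t live = 0;
  for (const auto &row : mask) {
    bool anySet = false;
    for (bool bit : row)
      anySet = anySet || bit;
    if (anySet)
      ++live;
  }
  return live;
}
```

With the old mask `[1, 2]`, the second row is entirely false, so `liveUnrolledGathers(constantMask(1, 2))` counts a single row. With the new mask `[2, 2]`, both rows keep two enabled lanes, so the test exercises two unrolled gathers.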

%2 = vector.gather %arg0[%0][%arg1], %1, %arg2 : memref<?xf32>, vector<2x3xi32>, vector<2x3xi1>, vector<2x3xf32> into vector<2x3xf32>
return %2 : vector<2x3xf32>
}
@@ -1677,9 +1677,9 @@ func.func @gather_with_mask(%arg0: memref<?xf32>, %arg1: vector<2x3xi32>, %arg2:
func.func @gather_with_mask_scalable(%arg0: memref<?xf32>, %arg1: vector<2x[3]xi32>, %arg2: vector<2x[3]xf32>) -> vector<2x[3]xf32> {
%0 = arith.constant 0: index
// vector.constant_mask only supports 'none set' or 'all set' scalable
// dimensions, hence [1, 3] rather than [1, 2] as in the example for fixed
// dimensions, hence [2, 3] rather than [2, 2] as in the example for fixed
// width vectors above.
%1 = vector.constant_mask [1, 3] : vector<2x[3]xi1>
%1 = vector.constant_mask [2, 3] : vector<2x[3]xi1>
%2 = vector.gather %arg0[%0][%arg1], %1, %arg2 : memref<?xf32>, vector<2x[3]xi32>, vector<2x[3]xi1>, vector<2x[3]xf32> into vector<2x[3]xf32>
return %2 : vector<2x[3]xf32>
}
1 change: 1 addition & 0 deletions mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp
@@ -782,6 +782,7 @@ struct TestVectorGatherLowering
void runOnOperation() override {
RewritePatternSet patterns(&getContext());
populateVectorGatherLoweringPatterns(patterns);
populateVectorGatherToConditionalLoadPatterns(patterns);
(void)applyPatternsGreedily(getOperation(), std::move(patterns));
}
};