Skip to content

[SLP][REVEC] Expand getelementptr into vector form. #103704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 27, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13760,6 +13760,27 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;
}
if (isa<FixedVectorType>(ScalarTy)) {
assert(SLPReVec && "FixedVectorType is not expected.");
// CreateMaskedGather expects VecTy and VecPtr have same size. We need
// to expand VecPtr if ScalarTy is a vector type.
unsigned ScalarTyNumElements =
cast<FixedVectorType>(ScalarTy)->getNumElements();
unsigned VecTyNumElements =
cast<FixedVectorType>(VecTy)->getNumElements();
assert(VecTyNumElements % ScalarTyNumElements == 0 &&
"Cannot expand getelementptr.");
unsigned VF = VecTyNumElements / ScalarTyNumElements;
SmallVector<Constant *> Indices(VecTyNumElements);
transform(seq(VecTyNumElements), Indices.begin(), [=](unsigned I) {
return Builder.getInt64(I % ScalarTyNumElements);
});
Comment on lines +13775 to +13777
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we rereading the same data here several times? Can it be replaced with subvector insert instead?

Copy link
Contributor Author

@HanKuanChen HanKuanChen Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this dea6a00?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean read the data once and then just copy it several times in IR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can. Each time the ScalarTyNumElements is different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? It may vary between calls, but it is not modified here. Instead of re-reading the same data several times, better just read it once and then insert subvector(s)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do you have the cost model for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not represent it as 2 loads + insert?

This test has -slp-threshold=-100. Basically, scalar version (loads + insert) cost is smaller than vector version (masked gather) cost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do you have the cost model for this change?

GEPs are calculated by TTI.getGEPCost in getGEPCosts.
TTI->getGatherScatterOpCost correctly calculates the cost if ScalarTy is FixedVectorType.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not represent it as 2 loads + insert?

This test has -slp-threshold=-100. Basically, scalar version (loads + insert) cost is smaller than vector version (masked gather) cost.

By scalar you mean small vector <2 x >?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

VecPtr = Builder.CreateGEP(
VecTy->getElementType(),
Builder.CreateShuffleVector(
VecPtr, createReplicatedMask(ScalarTyNumElements, VF)),
ConstantVector::get(Indices));
}
// Use the minimum alignment of the gathered loads.
Align CommonAlignment = computeCommonAlignment<LoadInst>(E->Scalars);
NewLI = Builder.CreateMaskedGather(VecTy, VecPtr, CommonAlignment);
Expand Down
43 changes: 43 additions & 0 deletions llvm/test/Transforms/SLPVectorizer/RISCV/revec.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -mtriple=riscv64 -mcpu=sifive-x280 -passes=slp-vectorizer -S -slp-revec -slp-max-reg-size=1024 -slp-threshold=-100 %s | FileCheck %s

define i32 @test() {
; CHECK-LABEL: @test(
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[IF_END_I87:%.*]]
; CHECK: if.end.i87:
; CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> getelementptr (i32, <4 x ptr> <ptr inttoptr (i64 64036 to ptr), ptr inttoptr (i64 64036 to ptr), ptr inttoptr (i64 64064 to ptr), ptr inttoptr (i64 64064 to ptr)>, <4 x i64> <i64 0, i64 1, i64 0, i64 1>), i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> poison)
; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.vector.insert.v4i32.v2i32(<4 x i32> poison, <2 x i32> poison, i64 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why inserting poison here?

; CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.vector.insert.v4i32.v2i32(<4 x i32> [[TMP1]], <2 x i32> zeroinitializer, i64 2)
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: switch i32 0, label [[SW_BB509_I:%.*]] [
; CHECK-NEXT: i32 1, label [[SW_BB509_I]]
; CHECK-NEXT: i32 0, label [[IF_THEN458_I:%.*]]
; CHECK-NEXT: ]
; CHECK: if.then458.i:
; CHECK-NEXT: br label [[SW_BB509_I]]
; CHECK: sw.bb509.i:
; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i32> [ [[TMP0]], [[IF_THEN458_I]] ], [ [[TMP3]], [[IF_END_I87]] ], [ [[TMP3]], [[IF_END_I87]] ]
; CHECK-NEXT: ret i32 0
;
entry:
%getelementptr0 = getelementptr i8, ptr null, i64 64036
%getelementptr1 = getelementptr i8, ptr null, i64 64064
br label %if.end.i87

if.end.i87: ; preds = %entry
%0 = load <2 x i32>, ptr %getelementptr0, align 4
%1 = load <2 x i32>, ptr %getelementptr1, align 8
switch i32 0, label %sw.bb509.i [
i32 1, label %sw.bb509.i
i32 0, label %if.then458.i
]

if.then458.i: ; preds = %if.end.i87
br label %sw.bb509.i

sw.bb509.i: ; preds = %if.then458.i, %if.end.i87, %if.end.i87
%4 = phi <2 x i32> [ %0, %if.then458.i ], [ %0, %if.end.i87 ], [ %0, %if.end.i87 ]
%5 = phi <2 x i32> [ %1, %if.then458.i ], [ zeroinitializer, %if.end.i87 ], [ zeroinitializer, %if.end.i87 ]
ret i32 0
}
Loading