Skip to content

[SLP][REVEC] Support more mask pattern usage in shufflevector. #106212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Sep 3, 2024
61 changes: 46 additions & 15 deletions llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -272,8 +272,7 @@ static void transformScalarShuffleIndiciesToVector(unsigned VecTyNumElements,
/// A group has the following features
/// 1. All of value in a group are shufflevector.
/// 2. The mask of all shufflevector is isExtractSubvectorMask.
/// 3. The mask of all shufflevector uses all of the elements of the source (and
/// the elements are used in order).
/// 3. The mask of all shufflevector uses all of the elements of the source.
/// e.g., it is 1 group (%0)
/// %1 = shufflevector <16 x i8> %0, <16 x i8> poison,
/// <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
Expand Down Expand Up @@ -309,7 +308,7 @@ static unsigned getShufflevectorNumGroups(ArrayRef<Value *> VL) {
auto *SV = cast<ShuffleVectorInst>(VL[I]);
Value *Src = SV->getOperand(0);
ArrayRef<Value *> Group = VL.slice(I, GroupSize);
SmallVector<int> ExtractionIndex(SVNumElements);
SmallBitVector UsedIndex(SVNumElements);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to account all elements here, just group indices should be enough, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I will fix it.

if (!all_of(Group, [&](Value *V) {
auto *SV = cast<ShuffleVectorInst>(V);
// From the same source.
Expand All @@ -318,12 +317,11 @@ static unsigned getShufflevectorNumGroups(ArrayRef<Value *> VL) {
int Index;
if (!SV->isExtractSubvectorMask(Index))
return false;
for (int I : seq<int>(Index, Index + SV->getShuffleMask().size()))
ExtractionIndex.push_back(I);
UsedIndex.set(Index, Index + SV->getShuffleMask().size());
return true;
}))
return 0;
if (!is_sorted(ExtractionIndex))
if (!UsedIndex.all())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we just reuse the same group several times but some other group do not use at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example? I am not sure what you mean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, <0,1> extracted several time, while <2,3> is not extracted at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot. We had a getShufflevectorNumGroups in getScalarsVectorizationState. If getShufflevectorNumGroups returns 0, then buildTree_rec will stop (a TreeEntry::NeedToGather will return).
getShufflevectorNumGroups use

    SmallVector<int> ExtractionIndex(SVNumElements);
    if (!all_of(Group, [&](Value *V) {
          auto *SV = cast<ShuffleVectorInst>(V);
          // From the same source.
          if (SV->getOperand(0) != Src)
            return false;
          int Index;
          if (!SV->isExtractSubvectorMask(Index))
            return false;
          for (int I : seq<int>(Index, Index + SV->getShuffleMask().size()))
            ExtractionIndex.push_back(I);
          return true;
        }))
      return 0;
    if (!is_sorted(ExtractionIndex))
      return 0;

to make sure each elements will be accessed.

return 0;
++NumGroup;
}
Expand Down Expand Up @@ -10195,12 +10193,38 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
return VecCost;
};
if (SLPReVec && !E->isAltShuffle())
return GetCostDiff(GetScalarCost, [](InstructionCost) {
// shufflevector will be eliminated by instcombine because the
// shufflevector masks are used in order (guaranteed by
// getShufflevectorNumGroups). The vector cost is 0.
return TTI::TCC_Free;
});
return GetCostDiff(
GetScalarCost, [&](InstructionCost) -> InstructionCost {
// If a group uses mask in order, the shufflevector can be
// eliminated by instcombine. Then the cost is 0.
bool IsIdentity = true;
auto *SV = cast<ShuffleVectorInst>(VL.front());
unsigned SVNumElements =
cast<FixedVectorType>(SV->getOperand(0)->getType())
->getNumElements();
unsigned GroupSize = SVNumElements / SV->getShuffleMask().size();
for (size_t I = 0, E = VL.size(); I != E; I += GroupSize) {
ArrayRef<Value *> Group = VL.slice(I, GroupSize);
SmallVector<int> ExtractionIndex(SVNumElements);
for_each(Group, [&](Value *V) {
auto *SV = cast<ShuffleVectorInst>(V);
int Index;
SV->isExtractSubvectorMask(Index);
for (int I :
seq<int>(Index, Index + SV->getShuffleMask().size()))
ExtractionIndex.push_back(I);
});
if (!is_sorted(ExtractionIndex)) {
IsIdentity = false;
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do a return here, no need to modify IsIdentity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rather doubt you need all this, just a heck for identity should be enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can vectorize the following instructions and eliminate shufflevector.

  %5 = shufflevector <4 x i64> %3, <4 x i64> poison, <2 x i32> <i32 0, i32 1>
  %6 = shufflevector <4 x i64> %3, <4 x i64> poison, <2 x i32> <i32 2, i32 3>
  %7 = shufflevector <4 x i64> %4, <4 x i64> poison, <2 x i32> <i32 0, i32 1>
  %8 = shufflevector <4 x i64> %4, <4 x i64> poison, <2 x i32> <i32 2, i32 3>

But this cannot.

  %5 = shufflevector <4 x i64> %3, <4 x i64> poison, <2 x i32> <i32 2, i32 3>
  %6 = shufflevector <4 x i64> %3, <4 x i64> poison, <2 x i32> <i32 0, i32 1>
  %7 = shufflevector <4 x i64> %4, <4 x i64> poison, <2 x i32> <i32 0, i32 1>
  %8 = shufflevector <4 x i64> %4, <4 x i64> poison, <2 x i32> <i32 2, i32 3>

We need to check all shufflevector and know whether there is an identity mask.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you don't need to check for sorted array, just need to check that each next extract mask starting index > than the previous one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Fixed.

return IsIdentity
? TTI::TCC_Free
: ::getShuffleCost(
*TTI, TargetTransformInfo::SK_PermuteSingleSrc,
VecTy, calculateShufflevectorMask(E->Scalars));
});
return GetCostDiff(GetScalarCost, GetVectorCost);
}
case Instruction::Freeze:
Expand Down Expand Up @@ -13971,9 +13995,16 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;
}
// The current shufflevector usage always duplicate the source.
V = Builder.CreateShuffleVector(Src,
calculateShufflevectorMask(E->Scalars));
assert(isa<ShuffleVectorInst>(Src) &&
"Not supported shufflevector usage.");
auto *SVSrc = cast<ShuffleVectorInst>(Src);
assert(isa<PoisonValue>(SVSrc->getOperand(1)) &&
"Not supported shufflevector usage.");
SmallVector<int> ThisMask(calculateShufflevectorMask(E->Scalars));
SmallVector<int> NewMask(ThisMask.size());
transform(ThisMask, NewMask.begin(),
[&SVSrc](int Mask) { return SVSrc->getShuffleMask()[Mask]; });
V = Builder.CreateShuffleVector(SVSrc->getOperand(0), NewMask);
propagateIRFlags(V, E->Scalars, VL0);
} else {
assert(E->isAltShuffle() &&
Expand Down
37 changes: 26 additions & 11 deletions llvm/test/Transforms/SLPVectorizer/revec-shufflevector.ll
Original file line number Diff line number Diff line change
Expand Up @@ -34,17 +34,9 @@ define void @test2(ptr %in, ptr %out) {
; CHECK-LABEL: @test2(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, ptr [[IN:%.*]], align 1
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[TMP3:%.*]] = zext <4 x i32> [[TMP1]] to <4 x i64>
; CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i32> [[TMP2]] to <4 x i64>
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[TMP3]], <4 x i64> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i64> [[TMP3]], <4 x i64> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, ptr [[OUT:%.*]], i64 16
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[OUT]], i64 32
; CHECK-NEXT: store <2 x i64> [[TMP5]], ptr [[OUT]], align 8
; CHECK-NEXT: store <2 x i64> [[TMP6]], ptr [[TMP7]], align 8
; CHECK-NEXT: store <4 x i64> [[TMP4]], ptr [[TMP8]], align 8
; CHECK-NEXT: [[TMP1:%.*]] = zext <8 x i32> [[TMP0]] to <8 x i64>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i64> [[TMP1]], <8 x i64> poison, <8 x i32> <i32 2, i32 3, i32 0, i32 1, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: store <8 x i64> [[TMP2]], ptr [[OUT:%.*]], align 8
; CHECK-NEXT: ret void
;
entry:
Expand All @@ -67,3 +59,26 @@ entry:
store <2 x i64> %8, ptr %12, align 8
ret void
}

define void @test3(<16 x i32> %0, ptr %out) {
; CHECK-LABEL: @test3(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[TMP0:%.*]], <16 x i32> poison, <16 x i32> <i32 12, i32 13, i32 14, i32 15, i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: store <16 x i32> [[TMP1]], ptr [[OUT:%.*]], align 4
; CHECK-NEXT: ret void
;
entry:
%1 = shufflevector <16 x i32> %0, <16 x i32> poison, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
%2 = shufflevector <16 x i32> %0, <16 x i32> poison, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
%3 = shufflevector <16 x i32> %0, <16 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%4 = shufflevector <16 x i32> %0, <16 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%5 = getelementptr inbounds i32, ptr %out, i64 0
%6 = getelementptr inbounds i32, ptr %out, i64 4
%7 = getelementptr inbounds i32, ptr %out, i64 8
%8 = getelementptr inbounds i32, ptr %out, i64 12
store <4 x i32> %1, ptr %5, align 4
store <4 x i32> %2, ptr %6, align 4
store <4 x i32> %3, ptr %7, align 4
store <4 x i32> %4, ptr %8, align 4
ret void
}
Loading