Skip to content

[RISCV][TTI] Add shuffle costing for masked slide lowering #128537

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 28, 2025

Conversation

preames
Copy link
Collaborator

@preames preames commented Feb 24, 2025

This change adds the TTI costing corresponding to the recently added isMaskedSlidePair lowering for vector shuffles. However, since the existing costing code hadn't covered either slideup, slidedown, or the (now removed) isElementRotate, the impact is larger in scope than just that new lowering.

This change adds the TTI costing corresponding to the recently added
isMaskedSlidePair lowering for vector shuffles.  However, since the
existing costing code hadn't covered either slideup, slidedown, or
the (now removed) isElementRotate, the impact is larger in scope
than just that new lowering.
@llvmbot llvmbot added backend:RISC-V llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Feb 24, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 24, 2025

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Philip Reames (preames)

Changes

This change adds the TTI costing corresponding to the recently added isMaskedSlidePair lowering for vector shuffles. However, since the existing costing code hadn't covered either slideup, slidedown, or the (now removed) isElementRotate, the impact is larger in scope than just that new lowering.


Patch is 111.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128537.diff

11 Files Affected:

  • (modified) llvm/include/llvm/Analysis/VectorUtils.h (+9)
  • (modified) llvm/lib/Analysis/VectorUtils.cpp (+30)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+4-27)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+66-2)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+6)
  • (modified) llvm/test/Analysis/CostModel/RISCV/shuffle-exact-vlen.ll (+6-6)
  • (modified) llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll (+1-1)
  • (modified) llvm/test/Analysis/CostModel/RISCV/shuffle-transpose.ll (+32-32)
  • (modified) llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll (+128-616)
  • (modified) llvm/test/Transforms/SLPVectorizer/RISCV/reductions.ll (+11-13)
  • (modified) llvm/test/Transforms/SLPVectorizer/RISCV/revec.ll (+4-5)
diff --git a/llvm/include/llvm/Analysis/VectorUtils.h b/llvm/include/llvm/Analysis/VectorUtils.h
index f21594c557e0e..3d57afcb8b6e4 100644
--- a/llvm/include/llvm/Analysis/VectorUtils.h
+++ b/llvm/include/llvm/Analysis/VectorUtils.h
@@ -203,6 +203,15 @@ bool getShuffleDemandedElts(int SrcWidth, ArrayRef<int> Mask,
                             const APInt &DemandedElts, APInt &DemandedLHS,
                             APInt &DemandedRHS, bool AllowUndefElts = false);
 
+/// Does this shuffle mask represent either one slide shuffle or a pair of
+/// two slide shuffles, combined with a select on some constant vector mask?
+/// A slide is a shuffle masks which shifts some set of elements up or down
+/// the vector, with all other elements being undefined.  An identity shuffle
+/// will be matched a slide by 0.  The output parameter provides the source
+/// (-1 means no source), and slide direction for each slide.
+bool isMaskedSlidePair(ArrayRef<int> Mask, int NumElts,
+                       std::pair<int, int> SrcInfo[2]);
+
 /// Replace each shuffle mask index with the scaled sequential indices for an
 /// equivalent mask of narrowed elements. Mask elements that are less than 0
 /// (sentinel values) are repeated in the output mask.
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index 53be7fc0bee9f..ee37ab0965a92 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -413,6 +413,36 @@ bool llvm::getShuffleDemandedElts(int SrcWidth, ArrayRef<int> Mask,
   return true;
 }
 
+bool llvm::isMaskedSlidePair(ArrayRef<int> Mask, int NumElts,
+                             std::pair<int, int> SrcInfo[2]) {
+  int SignalValue = NumElts * 2;
+  SrcInfo[0] = {-1, SignalValue};
+  SrcInfo[1] = {-1, SignalValue};
+  for (unsigned i = 0; i != Mask.size(); ++i) {
+    int M = Mask[i];
+    if (M < 0)
+      continue;
+    int Src = M >= (int)NumElts;
+    int Diff = (int)i - (M % NumElts);
+    bool Match = false;
+    for (int j = 0; j < 2; j++) {
+      if (SrcInfo[j].first == -1) {
+        assert(SrcInfo[j].second == SignalValue);
+        SrcInfo[j].first = Src;
+        SrcInfo[j].second = Diff;
+      }
+      if (SrcInfo[j].first == Src && SrcInfo[j].second == Diff) {
+        Match = true;
+        break;
+      }
+    }
+    if (!Match)
+      return false;
+  }
+  assert(SrcInfo[0].first != -1 && "Must find one slide");
+  return true;
+}
+
 void llvm::narrowShuffleMaskElts(int Scale, ArrayRef<int> Mask,
                                  SmallVectorImpl<int> &ScaledMask) {
   assert(Scale > 0 && "Unexpected scaling factor");
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 6076fe56416ad..551c814ef6dcc 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -4563,31 +4563,8 @@ static bool isInterleaveShuffle(ArrayRef<int> Mask, MVT VT, int &EvenSrc,
 /// Is this mask representing a masked combination of two slides?
 static bool isMaskedSlidePair(ArrayRef<int> Mask,
                               std::pair<int, int> SrcInfo[2]) {
-  int NumElts = Mask.size();
-  int SignalValue = NumElts * 2;
-  SrcInfo[0] = {-1, SignalValue};
-  SrcInfo[1] = {-1, SignalValue};
-  for (unsigned i = 0; i != Mask.size(); ++i) {
-    int M = Mask[i];
-    if (M < 0)
-      continue;
-    int Src = M >= (int)NumElts;
-    int Diff = (int)i - (M % NumElts);
-    bool Match = false;
-    for (int j = 0; j < 2; j++) {
-      if (SrcInfo[j].first == -1) {
-        assert(SrcInfo[j].second == SignalValue);
-        SrcInfo[j].first = Src;
-        SrcInfo[j].second = Diff;
-      }
-      if (SrcInfo[j].first == Src && SrcInfo[j].second == Diff) {
-        Match = true;
-        break;
-      }
-    }
-    if (!Match)
-      return false;
-  }
+  if (!llvm::isMaskedSlidePair(Mask, Mask.size(), SrcInfo))
+    return false;
 
   // Avoid matching vselect idioms
   if (SrcInfo[0].second == 0 && SrcInfo[1].second == 0)
@@ -5607,7 +5584,7 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
   std::pair<int, int> SrcInfo[2];
   unsigned RotateAmt;
   MVT RotateVT;
-  if (isMaskedSlidePair(Mask, SrcInfo) &&
+  if (::isMaskedSlidePair(Mask, SrcInfo) &&
       (isElementRotate(SrcInfo, NumElts) ||
        !isLegalBitRotate(Mask, VT, Subtarget, RotateVT, RotateAmt))) {
     SDValue Sources[2];
@@ -5967,7 +5944,7 @@ bool RISCVTargetLowering::isShuffleMaskLegal(ArrayRef<int> M, EVT VT) const {
   std::pair<int, int> SrcInfo[2];
   int Dummy1, Dummy2;
   return ShuffleVectorInst::isReverseMask(M, NumElts) ||
-         (isMaskedSlidePair(M, SrcInfo) && isElementRotate(SrcInfo, NumElts)) ||
+         (::isMaskedSlidePair(M, SrcInfo) && isElementRotate(SrcInfo, NumElts)) ||
          isInterleaveShuffle(M, SVT, Dummy1, Dummy2, Subtarget);
 }
 
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index dfea25e11c0b6..1b53db38b13a1 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -474,6 +474,60 @@ costShuffleViaVRegSplitting(RISCVTTIImpl &TTI, MVT LegalVT,
   return InstructionCost::getInvalid();
 }
 
+InstructionCost RISCVTTIImpl::getSlideCost(FixedVectorType *Tp, ArrayRef<int> Mask,
+                                           TTI::TargetCostKind CostKind) {
+  // Avoid missing masks
+  if (Mask.size() <= 2)
+    return InstructionCost::getInvalid();
+
+  int NumElts = Tp->getNumElements();
+  std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Tp);
+  // Avoid scalarization cases
+  if (!LT.second.isFixedLengthVector())
+    return InstructionCost::getInvalid();
+
+  // Requires moving elements between parts, which requires additional
+  // unmodeled instructions.
+  if (LT.first != 1)
+    return InstructionCost::getInvalid();
+
+  auto getSlideOpcode = [&](int SlideAmt) {
+    assert(SlideAmt != 0);
+    if (SlideAmt < 0)
+      return SlideAmt > -32 ? RISCV::VSLIDEDOWN_VI : RISCV::VSLIDEDOWN_VX;
+    return  SlideAmt < 32 ? RISCV::VSLIDEUP_VI : RISCV::VSLIDEUP_VX;
+  };
+
+  std::pair<int, int> SrcInfo[2];
+  if (!isMaskedSlidePair(Mask, NumElts, SrcInfo))
+    return InstructionCost::getInvalid();
+
+  if (SrcInfo[1].second == 0)
+    std::swap(SrcInfo[0], SrcInfo[1]);
+
+  InstructionCost FirstSlideCost = 0;
+  if (SrcInfo[0].second != 0) {
+    unsigned Opcode = getSlideOpcode(SrcInfo[0].second);
+    FirstSlideCost = getRISCVInstructionCost(Opcode, LT.second, CostKind);
+  }
+
+  if (SrcInfo[1].first == -1)
+    return FirstSlideCost;
+
+  InstructionCost SecondSlideCost = 0;
+  if (SrcInfo[1].second != 0) {
+    unsigned Opcode = getSlideOpcode(SrcInfo[1].second);
+    SecondSlideCost = getRISCVInstructionCost(Opcode, LT.second, CostKind);
+  } else {
+    SecondSlideCost = getRISCVInstructionCost(RISCV::VMERGE_VVM, LT.second, CostKind);
+  }
+
+  auto EC = Tp->getElementCount();
+  VectorType *MaskTy = VectorType::get(IntegerType::getInt1Ty(Tp->getContext()), EC);
+  InstructionCost MaskCost = getConstantPoolLoadCost(MaskTy, CostKind);
+  return FirstSlideCost + SecondSlideCost + MaskCost;
+}
+
 InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
                                              VectorType *Tp, ArrayRef<int> Mask,
                                              TTI::TargetCostKind CostKind,
@@ -487,8 +541,8 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
   // First, handle cases where having a fixed length vector enables us to
   // give a more accurate cost than falling back to generic scalable codegen.
   // TODO: Each of these cases hints at a modeling gap around scalable vectors.
-  if (ST->hasVInstructions() && isa<FixedVectorType>(Tp) &&
-      LT.second.isFixedLengthVector()) {
+  if (auto *FVTp = dyn_cast<FixedVectorType>(Tp);
+      FVTp && ST->hasVInstructions() && LT.second.isFixedLengthVector()) {
     InstructionCost VRegSplittingCost = costShuffleViaVRegSplitting(
         *this, LT.second, ST->getRealVLen(), Tp, Mask, CostKind);
     if (VRegSplittingCost.isValid())
@@ -544,6 +598,11 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
           return Cost;
         }
       }
+
+      if (InstructionCost SlideCost = getSlideCost(FVTp, Mask, CostKind);
+          SlideCost.isValid())
+        return SlideCost;
+
       // vrgather + cost of generating the mask constant.
       // We model this for an unknown mask with a single vrgather.
       if (LT.first == 1 && (LT.second.getScalarSizeInBits() != 8 ||
@@ -558,6 +617,11 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
     }
     case TTI::SK_Transpose:
     case TTI::SK_PermuteTwoSrc: {
+
+      if (InstructionCost SlideCost = getSlideCost(FVTp, Mask, CostKind);
+          SlideCost.isValid())
+        return SlideCost;
+
       // 2 x (vrgather + cost of generating the mask constant) + cost of mask
       // register for the second vrgather. We model this for an unknown
       // (shuffle) mask.
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 134a7333b9b06..3f57560d3c127 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -63,6 +63,12 @@ class RISCVTTIImpl : public BasicTTIImplBase<RISCVTTIImpl> {
   /// type.
   InstructionCost getConstantPoolLoadCost(Type *Ty,
                                           TTI::TargetCostKind CostKind);
+
+  /// If this shuffle can be lowered as a masked slide pair (at worst),
+  /// return a cost for it.
+  InstructionCost getSlideCost(FixedVectorType *Tp, ArrayRef<int> Mask,
+                               TTI::TargetCostKind CostKind);
+
 public:
   explicit RISCVTTIImpl(const RISCVTargetMachine *TM, const Function &F)
       : BaseT(TM, F.getDataLayout()), ST(TM->getSubtargetImpl(F)),
diff --git a/llvm/test/Analysis/CostModel/RISCV/shuffle-exact-vlen.ll b/llvm/test/Analysis/CostModel/RISCV/shuffle-exact-vlen.ll
index c951184a31731..06c709e4cc879 100644
--- a/llvm/test/Analysis/CostModel/RISCV/shuffle-exact-vlen.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/shuffle-exact-vlen.ll
@@ -186,7 +186,7 @@ define void @insert_subvec() vscale_range(2,2) {
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %v16i32_4_1 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %v16i32_4_2 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %v16i32_4_3 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %v16i32_4_05 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v16i32_4_05 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; CHECK-SIZE-LABEL: 'insert_subvec'
@@ -225,7 +225,7 @@ define void @insert_subvec() vscale_range(2,2) {
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %v16i32_4_1 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %v16i32_4_2 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %v16i32_4_3 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
-; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %v16i32_4_05 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v16i32_4_05 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
   %v4i8_2_0 = shufflevector <4 x i8> poison, <4 x i8> poison, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
@@ -737,8 +737,8 @@ define void @multipart() vscale_range(2,2) {
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v16b = shufflevector <8 x i16> poison, <8 x i16> poison, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %v16c = shufflevector <16 x i16> poison, <16 x i16> poison, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %v16d = shufflevector <16 x i16> poison, <16 x i16> poison, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v32a = shufflevector <4 x i32> poison, <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %v32a4 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v32a = shufflevector <4 x i32> poison, <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %v32a4 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v32idrev = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 15, i32 14, i32 13, i32 12, i32 16, i32 17, i32 18, i32 19, i32 31, i32 30, i32 29, i32 28>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 47 for instruction: %v32many = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 47 for instruction: %v32many2 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 1, i32 4, i32 8, i32 12, i32 17, i32 20, i32 24, i32 28, i32 2, i32 6, i32 11, i32 14, i32 18, i32 22, i32 27, i32 30>
@@ -757,8 +757,8 @@ define void @multipart() vscale_range(2,2) {
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v16b = shufflevector <8 x i16> poison, <8 x i16> poison, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %v16c = shufflevector <16 x i16> poison, <16 x i16> poison, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %v16d = shufflevector <16 x i16> poison, <16 x i16> poison, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
-; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v32a = shufflevector <4 x i32> poison, <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
-; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %v32a4 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1>
+; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v32a = shufflevector <4 x i32> poison, <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
+; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %v32a4 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1>
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v32idrev = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 15, i32 14, i32 13, i32 12, i32 16, i32 17, i32 18, i32 19, i32 31, i32 30, i32 29, i32 28>
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 11 for instruction: %v32many = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 11 for instruction: %v32many2 = shufflevector <16 x i32> poison, <16 x i32> poison, <16 x i32> <i32 1, i32 4, i32 8, i32 12, i32 17, i32 20, i32 24, i32 28, i32 2, i32 6, i32 11, i32 14, i32 18, i32 22, i32 27, i32 30>
diff --git a/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll b/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
index e8dd30345cc76..d2bfb61a11b00 100644
--- a/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
@@ -19,7 +19,7 @@ define void @test_vXf64(<4 x double> %src256, <8 x double> %src512) {
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V512_0123 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_2345 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_4567 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 19 for instruction: %V512_567u = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 5, i32 6, i32 7, i32 poison>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_567u = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 5, i32 6, i32 7, i32 poison>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; VLEN128-LABEL: 'test_vXf64'
diff --git a/llvm/test/Analysis/CostModel/RISCV/shuffle-transpose.ll b/llvm/test/Analysis/CostModel/RISCV/shuffle-transpose.ll
index 8f784a07d3124..ef069fee8526e 100644
--- a/llvm/test/Analysis/CostModel/RISCV/shuffle-transpose.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/shuffle-transpose.ll
@@ -10,11 +10,11 @@ target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 
 define <8 x i8> @trn1.v8i8(<8 x i8> %v0, <8 x i8> %v1) {
 ; CHECK-LABEL: 'trn1.v8i8'
-; CHECK-NEXT:  Cost Model: Found an estim...
[truncated]

Copy link

github-actions bot commented Feb 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 434 to 437
if (SrcInfo[j].first == Src && SrcInfo[j].second == Diff) {
Match = true;
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (SrcInfo[j].first == Src && SrcInfo[j].second == Diff) {
Match = true;
break;
}
if (SrcInfo[j].first != Src || SrcInfo[j].second != Diff)
return false;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work. You need to walk multiple elements of the array, and exit only if none match.

Co-authored-by: Alexey Bataev <[email protected]>
@preames preames force-pushed the pr-riscv-slidepair-tti-costing branch from e11397b to 8621324 Compare February 24, 2025 21:51
@preames
Copy link
Collaborator Author

preames commented Feb 24, 2025

FYI, still in progress of updating w.r.t. review comments.

@preames
Copy link
Collaborator Author

preames commented Feb 24, 2025

Ok, review comments addressed. Sorry for the very messy commits, I usually rebase and squash, but managed to push first this time. I rewrote the history, so hopefully that's a bit easier to follow.

Comment on lines +432 to +433
DiffE = Diff;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DiffE = Diff;
}
DiffE = Diff;
continue;
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, that would be incorrect. You need to fall into the match clause to consider this a successful match.

@@ -413,6 +413,36 @@ bool llvm::getShuffleDemandedElts(int SrcWidth, ArrayRef<int> Mask,
return true;
}

bool llvm::isMaskedSlidePair(ArrayRef<int> Mask, int NumElts,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this need to move up to VectorUtils? Couldn't it stay in the RISCV backend?

Copy link
Collaborator Author

@preames preames Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see an obvious spot to put it. If you have a suggestion, happy to move it back.

p.s. I'd meant to explicitly pose this question when posting the review, but apparently didn't. Oops.

preames added a commit to preames/llvm-project that referenced this pull request Feb 27, 2025
The getShuffleCost api expects to only deal with non-length changing
shuffles.  We were failing to extend the mask appropriately before
invoking it.  This came up in llvm#128537
in discussion of a potential invariant, but is otherwise unrelated.
Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

preames added a commit that referenced this pull request Feb 28, 2025
The getShuffleCost api, in concept, expects to only deal with non-length
changing shuffles. We were failing to extend the mask appropriately
before invoking it. This came up in
#128537 in discussion of a
potential invariant, but is otherwise unrelated.
@preames preames merged commit 4904728 into llvm:main Feb 28, 2025
9 of 11 checks passed
@preames preames deleted the pr-riscv-slidepair-tti-costing branch February 28, 2025 02:58
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 28, 2025
…(#129137)

The getShuffleCost api, in concept, expects to only deal with non-length
changing shuffles. We were failing to extend the mask appropriately
before invoking it. This came up in
llvm/llvm-project#128537 in discussion of a
potential invariant, but is otherwise unrelated.
preames added a commit that referenced this pull request Feb 28, 2025
…128537)"

This reverts commit 4904728.  Downstream
test failed, reverting during investigation.
preames added a commit that referenced this pull request Feb 28, 2025
…128537)"

With a fix for fully undef masks.  These can't reach the lowering code, but
can reach the costing code via e.g. SLP.

This change adds the TTI costing corresponding to the recently added
isMaskedSlidePair lowering for vector shuffles. However, since the
existing costing code hadn't covered either slideup, slidedown, or the
(now removed) isElementRotate, the impact is larger in scope than just
that new lowering.

---------

Co-authored-by: Alexey Bataev <[email protected]>
Co-authored-by: Luke Lau <[email protected]>
cheezeburglar pushed a commit to cheezeburglar/llvm-project that referenced this pull request Feb 28, 2025
The getShuffleCost api, in concept, expects to only deal with non-length
changing shuffles. We were failing to extend the mask appropriately
before invoking it. This came up in
llvm#128537 in discussion of a
potential invariant, but is otherwise unrelated.
cheezeburglar pushed a commit to cheezeburglar/llvm-project that referenced this pull request Feb 28, 2025
This change adds the TTI costing corresponding to the recently added
isMaskedSlidePair lowering for vector shuffles. However, since the
existing costing code hadn't covered either slideup, slidedown, or the
(now removed) isElementRotate, the impact is larger in scope than just
that new lowering.

---------

Co-authored-by: Alexey Bataev <[email protected]>
Co-authored-by: Luke Lau <[email protected]>
cheezeburglar pushed a commit to cheezeburglar/llvm-project that referenced this pull request Feb 28, 2025
…lvm#128537)"

This reverts commit 4904728.  Downstream
test failed, reverting during investigation.
cheezeburglar pushed a commit to cheezeburglar/llvm-project that referenced this pull request Feb 28, 2025
…lvm#128537)"

With a fix for fully undef masks.  These can't reach the lowering code, but
can reach the costing code via e.g. SLP.

This change adds the TTI costing corresponding to the recently added
isMaskedSlidePair lowering for vector shuffles. However, since the
existing costing code hadn't covered either slideup, slidedown, or the
(now removed) isElementRotate, the impact is larger in scope than just
that new lowering.

---------

Co-authored-by: Alexey Bataev <[email protected]>
Co-authored-by: Luke Lau <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:RISC-V llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants