[CostModel] getInstructionCost - match SK_InsertSubvector shuffle patterns before SK_Select #145920

RKSimon · 2025-06-26T15:51:21Z

More closely match improveShuffleKindFromMask's shuffle ordering by trying to match SK_InsertSubvector shuffle patterns before SK_Select - both can match many of the same patterns, but it's much easier to recognise when a SK_InsertSubvector can be converted to SK_Select than vice-versa.
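
To illustrate the overlap, here is a minimal standalone sketch. It does not use LLVM's headers: `isSelectLike`, `matchInsertLike` and `InsertMatch` are hypothetical, simplified stand-ins for the real mask predicates, and the insert matcher simply rejects masks with undefined lanes. Lane values in [0, N) read the first source, values in [N, 2N) read the second, and -1 marks an undefined lane.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model - not LLVM's implementation.
using Mask = std::vector<int>;

// Select: every defined lane I reads lane I of one of the two sources, and
// both sources are used at least once.
bool isSelectLike(const Mask &M) {
  const int N = static_cast<int>(M.size());
  bool UsesFirst = false, UsesSecond = false;
  for (int I = 0; I < N; ++I) {
    if (M[I] == -1)
      continue; // undefined lane
    if (M[I] == I)
      UsesFirst = true;
    else if (M[I] == N + I)
      UsesSecond = true;
    else
      return false;
  }
  return UsesFirst && UsesSecond;
}

struct InsertMatch {
  bool Matched;     // false when the mask is not an insert in this model
  int Index;        // first lane of the inserted window
  int NumSubElts;   // width of the inserted window
  bool BaseIsFirst; // true when the first source is the one kept in place
};

// Insert-subvector: one source is kept in place except for a single
// contiguous window that holds lanes 0..NumSubElts-1 of the other source.
// Undefined lanes never match here, so such masks are rejected.
InsertMatch matchInsertLike(const Mask &M) {
  assert(!M.empty() && "mask must contain lanes");
  const int N = static_cast<int>(M.size());
  for (bool BaseIsFirst : {true, false}) {
    const int BaseOff = BaseIsFirst ? 0 : N;
    const int SubOff = BaseIsFirst ? N : 0;
    // Find the span of lanes that do not come from the base in place.
    int First = -1, Last = -1;
    for (int I = 0; I < N; ++I) {
      if (M[I] != BaseOff + I) {
        if (First < 0)
          First = I;
        Last = I;
      }
    }
    // Reject "nothing inserted" and "nothing left of the base".
    if (First < 0 || Last - First + 1 == N)
      continue;
    // The span must hold consecutive lanes of the other source, from lane 0.
    bool Ok = true;
    for (int I = First; I <= Last && Ok; ++I)
      Ok = (M[I] == SubOff + (I - First));
    if (Ok)
      return {true, First, Last - First + 1, BaseIsFirst};
  }
  return {false, 0, 0, false};
}
```

With four lanes, `{0, 1, 6, 7}` satisfies both predicates (the low two lanes of the first source inserted at index 0 of the second), `{0, 1, 4, 5}` is only an insert (at index 2), and `{0, 5, 2, 7}` is only a select. In this model, once an insert has been matched, checking `Index == 0` is enough to tell that it is also a select, whereas going the other way means rescanning the mask for a contiguous window.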

Another step towards #145335 - which I'm hoping will allow us to generalise improveShuffleKindFromMask and remove getInstructionCost's shuffle matching entirely.


llvmbot · 2025-06-26T15:51:54Z

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-analysis

Author: Simon Pilgrim (RKSimon)

Changes

More closely match improveShuffleKindFromMask's shuffle ordering by trying to match SK_InsertSubvector shuffle patterns before SK_Select - both can match many of the same patterns, but it's much easier to recognise when a SK_InsertSubvector can be converted to SK_Select than vice-versa.

Another step towards #145335 - which I'm hoping will allow us to generalise improveShuffleKindFromMask and remove getInstructionCost's shuffle matching entirely.

Patch is 230.04 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145920.diff

12 Files Affected:

(modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+7-7)
(modified) llvm/test/Analysis/CostModel/AArch64/shuffle-other.ll (+5-5)
(modified) llvm/test/Analysis/CostModel/AMDGPU/shufflevector.ll (+24-24)
(modified) llvm/test/Analysis/CostModel/ARM/shuffle.ll (+2-2)
(modified) llvm/test/Analysis/CostModel/RISCV/shuffle-exact-vlen.ll (+12-12)
(modified) llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll (+60-60)
(modified) llvm/test/Analysis/CostModel/RISCV/shuffle-select.ll (+4-4)
(modified) llvm/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll (+13-33)
(modified) llvm/test/Analysis/CostModel/X86/shuffle-insert_subvector.ll (+51-113)
(modified) llvm/test/Analysis/CostModel/X86/shuffle-select.ll (+41-72)
(modified) llvm/test/Transforms/PhaseOrdering/X86/hadd.ll (+2-1)
(modified) llvm/test/Transforms/PhaseOrdering/X86/hsub.ll (+2-1)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index c22928c9bcd94..d513e9472a152 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -1538,7 +1538,7 @@ class TargetTransformInfoImplCRTPBase : public TargetTransformInfoImplBase {
       if (Shuffle->changesLength()) {
         // Treat a 'subvector widening' as a free shuffle.
         if (Shuffle->increasesLength() && Shuffle->isIdentityWithPadding())
-          return 0;
+          return TTI::TCC_Free;
 
         if (Shuffle->isExtractSubvectorMask(SubIndex))
           return TargetTTI->getShuffleCost(TTI::SK_ExtractSubvector, VecTy,
@@ -1599,18 +1599,13 @@ class TargetTransformInfoImplCRTPBase : public TargetTransformInfoImplBase {
       }
 
       if (Shuffle->isIdentity())
-        return 0;
+        return TTI::TCC_Free;
 
       if (Shuffle->isReverse())
         return TargetTTI->getShuffleCost(TTI::SK_Reverse, VecTy, VecSrcTy, Mask,
                                          CostKind, 0, nullptr, Operands,
                                          Shuffle);
 
-      if (Shuffle->isSelect())
-        return TargetTTI->getShuffleCost(TTI::SK_Select, VecTy, VecSrcTy, Mask,
-                                         CostKind, 0, nullptr, Operands,
-                                         Shuffle);
-
       if (Shuffle->isTranspose())
         return TargetTTI->getShuffleCost(TTI::SK_Transpose, VecTy, VecSrcTy,
                                          Mask, CostKind, 0, nullptr, Operands,
@@ -1632,6 +1627,11 @@ class TargetTransformInfoImplCRTPBase : public TargetTransformInfoImplBase {
             FixedVectorType::get(VecTy->getScalarType(), NumSubElts), Operands,
             Shuffle);
 
+      if (Shuffle->isSelect())
+        return TargetTTI->getShuffleCost(TTI::SK_Select, VecTy, VecSrcTy, Mask,
+                                         CostKind, 0, nullptr, Operands,
+                                         Shuffle);
+
       if (Shuffle->isSplice(SubIndex))
         return TargetTTI->getShuffleCost(TTI::SK_Splice, VecTy, VecSrcTy, Mask,
                                          CostKind, SubIndex, nullptr, Operands,
diff --git a/llvm/test/Analysis/CostModel/AArch64/shuffle-other.ll b/llvm/test/Analysis/CostModel/AArch64/shuffle-other.ll
index 8d68781d0c1b7..41c272291d7ca 100644
--- a/llvm/test/Analysis/CostModel/AArch64/shuffle-other.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/shuffle-other.ll
@@ -87,21 +87,21 @@ define void @concat() {
 
 define void @insert_subvec() {
 ; CHECK-LABEL: 'insert_subvec'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:12 CodeSize:6 Lat:12 SizeLat:12 for: %v4i8_2_0 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
+; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v4i8_2_0 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v4i8_2_1 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
-; CHECK-NEXT:  Cost Model: Found costs of RThru:28 CodeSize:14 Lat:28 SizeLat:28 for: %v8i8_2_0 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v8i8_2_0 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v8i8_2_1 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v8i8_2_2 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v8i8_2_3 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:6 CodeSize:3 Lat:6 SizeLat:6 for: %v8i8_2_05 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 8, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found costs of RThru:60 CodeSize:30 Lat:60 SizeLat:60 for: %v16i8_4_0 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v16i8_4_0 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v16i8_4_1 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v16i8_4_2 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v16i8_4_3 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:14 CodeSize:7 Lat:14 SizeLat:14 for: %v16i8_4_05 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v4i16_2_0 = shufflevector <4 x i16> undef, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v4i16_2_1 = shufflevector <4 x i16> undef, <4 x i16> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
-; CHECK-NEXT:  Cost Model: Found costs of RThru:28 CodeSize:14 Lat:28 SizeLat:28 for: %v8i16_2_0 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v8i16_2_0 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v8i16_2_1 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v8i16_2_2 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v8i16_2_3 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
@@ -118,7 +118,7 @@ define void @insert_subvec() {
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v8i32_2_2 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v8i32_2_3 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:6 CodeSize:3 Lat:6 SizeLat:6 for: %v8i32_2_05 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 0, i32 8, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found costs of 0 for: %v16i32_4_0 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v16i32_4_0 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v16i32_4_1 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v16i32_4_2 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v16i32_4_3 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
diff --git a/llvm/test/Analysis/CostModel/AMDGPU/shufflevector.ll b/llvm/test/Analysis/CostModel/AMDGPU/shufflevector.ll
index 1705cd2a2f091..4fc8c44e12668 100644
--- a/llvm/test/Analysis/CostModel/AMDGPU/shufflevector.ll
+++ b/llvm/test/Analysis/CostModel/AMDGPU/shufflevector.ll
@@ -16,11 +16,11 @@ define amdgpu_kernel void @shufflevector_i16(<2 x i16> %vec1, <2 x i16> %vec2) {
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf02 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 0, i32 2>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %shuf20 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 0>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf22 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 2>
-; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf03 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 0, i32 3>
+; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf03 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 0, i32 3>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf30 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 3, i32 0>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf33 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 3, i32 3>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf12 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 1, i32 2>
-; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf21 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 1>
+; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf21 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 1>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf13 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 1, i32 3>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf31 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 3, i32 1>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf23 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 3>
@@ -53,11 +53,11 @@ define amdgpu_kernel void @shufflevector_i16(<2 x i16> %vec1, <2 x i16> %vec2) {
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf02_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 0, i32 2>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %shuf20_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 2, i32 0>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf22_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 2, i32 2>
-; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf03_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 0, i32 3>
+; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf03_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 0, i32 3>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf30_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 3, i32 0>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf33_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 3, i32 3>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf12_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 1, i32 2>
-; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf21_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 2, i32 1>
+; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf21_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 2, i32 1>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf13_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 1, i32 3>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf31_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 3, i32 1>
 ; GFX9-10-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf23_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 2, i32 3>
@@ -93,11 +93,11 @@ define amdgpu_kernel void @shufflevector_i16(<2 x i16> %vec1, <2 x i16> %vec2) {
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf02 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 0, i32 2>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %shuf20 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 0>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf22 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 2>
-; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf03 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 0, i32 3>
+; VI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf03 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 0, i32 3>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf30 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 3, i32 0>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf33 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 3, i32 3>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf12 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 1, i32 2>
-; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf21 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 1>
+; VI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf21 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 1>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf13 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 1, i32 3>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf31 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 3, i32 1>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf23 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 3>
@@ -130,11 +130,11 @@ define amdgpu_kernel void @shufflevector_i16(<2 x i16> %vec1, <2 x i16> %vec2) {
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf02_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 0, i32 2>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %shuf20_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 2, i32 0>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf22_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 2, i32 2>
-; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf03_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 0, i32 3>
+; VI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf03_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 0, i32 3>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf30_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 3, i32 0>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf33_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 3, i32 3>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf12_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 1, i32 2>
-; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf21_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 2, i32 1>
+; VI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf21_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 2, i32 1>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf13_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 1, i32 3>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf31_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 3, i32 1>
 ; VI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf23_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 2, i32 3>
@@ -170,11 +170,11 @@ define amdgpu_kernel void @shufflevector_i16(<2 x i16> %vec1, <2 x i16> %vec2) {
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf02 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 0, i32 2>
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %shuf20 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 0>
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf22 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 2>
-; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf03 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 0, i32 3>
+; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf03 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 0, i32 3>
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf30 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 3, i32 0>
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf33 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 3, i32 3>
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf12 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 1, i32 2>
-; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf21 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 1>
+; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf21 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 1>
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf13 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 1, i32 3>
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf31 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 3, i32 1>
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %shuf23 = shufflevector <2 x i16> %vec1, <2 x i16> %vec1, <2 x i32> <i32 2, i32 3>
@@ -207,11 +207,11 @@ define amdgpu_kernel void @shufflevector_i16(<2 x i16> %vec1, <2 x i16> %vec2) {
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %shuf02_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 0, i32 2>
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %shuf20_2 = shufflevector <2 x i16> %vec1, <2 x i16> %vec2, <2 x i32> <i32 2, i32 0>
 ; GFX9-10-SIZE-NEXT:  Cost Model: Found an estimated...
[truncated]

github-actions · 2025-06-26T15:54:18Z

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:

git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/test/Analysis/CostModel/AArch64/shuffle-other.ll llvm/test/Analysis/CostModel/AMDGPU/shufflevector.ll llvm/test/Analysis/CostModel/ARM/shuffle.ll llvm/test/Analysis/CostModel/RISCV/shuffle-exact-vlen.ll llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll llvm/test/Analysis/CostModel/RISCV/shuffle-select.ll llvm/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll llvm/test/Analysis/CostModel/X86/shuffle-insert_subvector.ll llvm/test/Analysis/CostModel/X86/shuffle-select.ll llvm/test/Transforms/PhaseOrdering/X86/hadd.ll llvm/test/Transforms/PhaseOrdering/X86/hsub.ll

The following files introduce new uses of undef:

llvm/test/Analysis/CostModel/X86/shuffle-insert_subvector.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

preames

LGTM


mikaelholmen · 2025-06-27T05:51:47Z

Hi @RKSimon

With this patch I start seeing lots of crashes with llvm-stress generated input. E.g. this reduced reproducer:
llc -march=x86-64 -mcpu=corei7 -o /dev/null bbi-108404.ll
crashes with

llc: ../lib/IR/Instructions.cpp:2140: static bool llvm::ShuffleVectorInst::isInsertSubvectorMask(ArrayRef<int>, int, int &, int &): Assertion `!Src0Elts.isZero() && !Src1Elts.isZero() && "2-source shuffle not found"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null stress_14455.ll
1.	Running pass 'Function Pass Manager' on module 'stress_14455.ll'.
2.	Running pass 'CodeGen Prepare' on function '@autogen_SD0'
 #0 0x00005574e2cbd466 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/llc+0x76a4466)
 #1 0x00005574e2cbaf85 llvm::sys::RunSignalHandlers() (build-all/bin/llc+0x76a1f85)
 #2 0x00005574e2cbdb99 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f17eba76d10 __restore_rt (/lib64/libpthread.so.0+0x12d10)
 #4 0x00007f17e941652f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007f17e93e9e65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x00007f17e93e9d39 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d39)
 #7 0x00007f17e940ee86 (/lib64/libc.so.6+0x46e86)
 #8 0x00005574e2205d97 llvm::ShuffleVectorInst::isInsertSubvectorMask(llvm::ArrayRef<int>, int, int&, int&) (build-all/bin/llc+0x6becd97)
 #9 0x00005574e1499a21 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getInstructionCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) const X86TargetTransformInfo.cpp:0:0
#10 0x00005574e1495224 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::isExpensiveToSpeculativelyExecute(llvm::Instruction const*) const X86TargetTransformInfo.cpp:0:0
#11 0x00005574e1b3df3c (anonymous namespace)::CodeGenPrepare::optimizeSelectInst(llvm::SelectInst*) CodeGenPrepare.cpp:0:0
#12 0x00005574e1b22ebb (anonymous namespace)::CodeGenPrepare::_run(llvm::Function&) CodeGenPrepare.cpp:0:0
#13 0x00005574e1b209dc (anonymous namespace)::CodeGenPrepareLegacyPass::runOnFunction(llvm::Function&) CodeGenPrepare.cpp:0:0
#14 0x00005574e2227839 llvm::FPPassManager::runOnFunction(llvm::Function&) (build-all/bin/llc+0x6c0e839)
#15 0x00005574e222fe12 llvm::FPPassManager::runOnModule(llvm::Module&) (build-all/bin/llc+0x6c16e12)
#16 0x00005574e22282f8 llvm::legacy::PassManagerImpl::run(llvm::Module&) (build-all/bin/llc+0x6c0f2f8)
#17 0x00005574dfc6e5f0 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#18 0x00005574dfc6bd00 main (build-all/bin/llc+0x4652d00)
#19 0x00007f17e94027e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#20 0x00005574dfc6b16e _start (build-all/bin/llc+0x465216e)
Abort (core dumped)

bbi-108404.ll.gz

RKSimon · 2025-06-27T07:46:20Z

Interesting - we were matching "all undefined" shuffle masks as a SK_Select shuffle :/
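
A minimal sketch of that failure mode, as a standalone model rather than LLVM code: `looksLikeSelect`, `isAllUndef` and `classify` are hypothetical names, and the `GuardAllUndef` flag exists only to compare the two behaviours. The assumption is a select test that skips undefined lanes (-1) and never requires either source to be used, so a fully undefined mask passes it.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model - not LLVM's implementation.
using Mask = std::vector<int>;

// True when no lane reads from either source.
bool isAllUndef(const Mask &M) {
  return std::all_of(M.begin(), M.end(), [](int E) { return E == -1; });
}

// Permissive select test: undefined lanes are skipped and nothing checks
// that a source is actually used, so an all-undefined mask is accepted.
bool looksLikeSelect(const Mask &M) {
  const int N = static_cast<int>(M.size());
  for (int I = 0; I < N; ++I)
    if (M[I] != -1 && M[I] != I && M[I] != N + I)
      return false;
  return true;
}

// Classify a mask, optionally handling the degenerate all-undefined case
// before any pattern matching runs.
std::string classify(const Mask &M, bool GuardAllUndef) {
  assert(!M.empty() && "mask must contain lanes");
  if (GuardAllUndef && isAllUndef(M))
    return "free";
  if (looksLikeSelect(M))
    return "select";
  return "other";
}
```

Without the guard, `{-1, -1, -1, -1}` is classified as "select"; with it, the same mask is reported as "free" before any matcher that expects two live sources can see it.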


RKSimon · 2025-06-27T08:36:48Z

#146039 should fix this


RKSimon requested review from preames, alexey-bataev and davemgreen June 26, 2025 15:51

llvmbot added the backend:AMDGPU, backend:X86, llvm:analysis and llvm:transforms labels Jun 26, 2025

alexey-bataev approved these changes Jun 26, 2025


preames approved these changes Jun 26, 2025


Merge branch 'main' into costmodel-shuffle-select

6fce47f

RKSimon merged commit e25db2f into llvm:main Jun 26, 2025
6 of 7 checks passed

RKSimon deleted the costmodel-shuffle-select branch June 26, 2025 19:15

RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Jun 27, 2025

[TTI] getInstructionCost - consistently treat all undef/poison shuffl…

0afc45a

…e masks as free llvm#145920 exposed an issue where we were treating undef/poison shuffles as SK_Select kinds

RKSimon mentioned this pull request Jun 27, 2025

[TTI] getInstructionCost - consistently treat all undef/poison shuffle masks as free #146039

Merged

RKSimon added a commit that referenced this pull request Jun 27, 2025

[TTI] getInstructionCost - consistently treat all undef/poison shuffl…

08f074a

…e masks as free (#146039) #145920 exposed an issue where we were treating undef/poison shuffles as SK_Select kinds
