Commit 4bbc329

[SLP] Fix for the min/max intrinsic cost.
The min/max intrinsic cost is currently too low. In the cost calculation we subtract the cost of the vector compare, because the compare will not be emitted. To get that compare cost we currently pass BAD_ICMP_PREDICATE, which returns 3, the worst-case cost. I think we should pass VecPred instead, since we know the predicate of the compare instructions.

I think this is related to commit b3b993a, which introduced the predicate argument to getCmpSelInstrCost():
https://reviews.llvm.org/rGb3b993a7ad817c3c5801341fa78f34332900eb83

Differential Revision: https://reviews.llvm.org/D120439
1 parent 6136f97 commit 4bbc329

2 files changed, 7 additions and 10 deletions


llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Lines changed: 2 additions & 3 deletions
@@ -5358,9 +5358,8 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
         // If the selects are the only uses of the compares, they will be dead
         // and we can adjust the cost by removing their cost.
         if (IntrinsicAndUse.second)
-          IntrinsicCost -=
-              TTI->getCmpSelInstrCost(Instruction::ICmp, VecTy, MaskTy,
-                                      CmpInst::BAD_ICMP_PREDICATE, CostKind);
+          IntrinsicCost -= TTI->getCmpSelInstrCost(Instruction::ICmp, VecTy,
+                                                   MaskTy, VecPred, CostKind);
         VecCost = std::min(VecCost, IntrinsicCost);
       }
       LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));
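
The effect of the predicate on the adjusted cost can be illustrated with a toy model. This is a minimal sketch, not LLVM code: `Pred`, `cmpCost`, and `adjustedIntrinsicCost` are hypothetical names, the cost of 3 for an unknown predicate is taken from the commit message, and the cost of 1 for a known predicate and the raw intrinsic cost of 4 are assumed purely for illustration.

```cpp
// Hypothetical stand-ins for the predicate and the target cost query.
// None of these names exist in LLVM; they only model the arithmetic.
enum class Pred { SGT, Unknown };

// Toy compare cost: an unknown predicate is priced at the worst case (3, per
// the commit message); a known predicate is assumed to cost 1 here.
constexpr int cmpCost(Pred P) { return P == Pred::Unknown ? 3 : 1; }

// Models the adjustment in the hunk: when the compares will be dead, their
// cost is subtracted from the intrinsic cost.
constexpr int adjustedIntrinsicCost(int IntrinsicCost, bool CmpIsDead, Pred P) {
  return CmpIsDead ? IntrinsicCost - cmpCost(P) : IntrinsicCost;
}

// With an assumed raw intrinsic cost of 4, the worst-case predicate removes
// too much, so the intrinsic looks cheaper than it should.
static_assert(adjustedIntrinsicCost(4, true, Pred::Unknown) == 1,
              "unknown predicate over-subtracts");
static_assert(adjustedIntrinsicCost(4, true, Pred::SGT) == 3,
              "known predicate subtracts only the real compare cost");
```

In this model, the smaller adjusted cost is what would flow into `std::min(VecCost, IntrinsicCost)`, which is how an over-subtracted compare cost can make vectorization look more profitable than it is.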

llvm/test/Transforms/SLPVectorizer/X86/arith-max-cost.ll

Lines changed: 5 additions & 7 deletions
@@ -6,13 +6,11 @@
 ; This maps to a single PMAX instruction in x86.
 define void @smax_intrinsic_cost(i64 %arg0, i64 %arg1) {
 ; CHECK-LABEL: @smax_intrinsic_cost(
-; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> poison, i64 [[ARG0:%.*]], i32 0
-; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[ARG1:%.*]], i32 1
-; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <2 x i64> [[TMP2]], <i64 123, i64 456>
-; CHECK-NEXT: [[TMP4:%.*]] = select <2 x i1> [[TMP3]], <2 x i64> [[TMP2]], <2 x i64> <i64 123, i64 456>
-; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 0
-; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1
-; CHECK-NEXT: [[ROOT:%.*]] = icmp sle i64 [[TMP5]], [[TMP6]]
+; CHECK-NEXT: [[ICMP0:%.*]] = icmp sgt i64 [[ARG0:%.*]], 123
+; CHECK-NEXT: [[ICMP1:%.*]] = icmp sgt i64 [[ARG1:%.*]], 456
+; CHECK-NEXT: [[SELECT0:%.*]] = select i1 [[ICMP0]], i64 [[ARG0]], i64 123
+; CHECK-NEXT: [[SELECT1:%.*]] = select i1 [[ICMP1]], i64 [[ARG1]], i64 456
+; CHECK-NEXT: [[ROOT:%.*]] = icmp sle i64 [[SELECT0]], [[SELECT1]]
 ; CHECK-NEXT: ret void
 ;
   %icmp0 = icmp sgt i64 %arg0, 123
