Commit bb3c2fc

[ARM][SLP] Fix incorrect cost function for SLP Vectorization of ZExt/SExt
PR #117350 made changes to the SLP vectorizer which introduced a regression on ARM vectorization benchmarks. The regression arose because those changes assumed that SExt/ZExt vector instructions have constant cost. That behaviour is expected for RISCV but not on ARM, where we take into account the source and destination types of SExt/ZExt instructions when calculating vector cost.

Change-Id: I6f995dcde26e5aaf62b779b63e52988fb333f941
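The difference between a constant extend cost and a type-dependent one can be illustrated with a toy sketch. This is not LLVM's cost model: `VecType`, `constantExtendCost`, and `typeAwareExtendCost` are hypothetical names, and the 128-bit register width and the "one unit per result register per widening step" rule are assumptions chosen only to show why the source and destination types can matter.

```cpp
#include <cassert>

// Hypothetical vector type description: lane count and element width in bits.
struct VecType {
  unsigned lanes;
  unsigned elemBits;
};

// Assumption: a target where every vector extend counts as one unit,
// regardless of the types involved.
unsigned constantExtendCost(VecType, VecType) { return 1; }

// Assumption: a target where each doubling of the element width is one step,
// and each step costs as many units as the registers its result occupies.
unsigned typeAwareExtendCost(VecType src, VecType dst, unsigned regBits = 128) {
  assert(src.lanes == dst.lanes && "extend keeps the lane count");
  assert(dst.elemBits > src.elemBits && "extend must widen the elements");
  unsigned cost = 0;
  for (unsigned bits = src.elemBits * 2; bits <= dst.elemBits; bits *= 2) {
    unsigned totalBits = src.lanes * bits;
    cost += (totalBits + regBits - 1) / regBits; // registers for this step
  }
  return cost;
}
```

Under these assumptions, extending four 32-bit lanes to 64 bits costs more than extending four 16-bit lanes to 32 bits, because the wider result no longer fits in one register. A vectorizer that treats both as equally cheap can misjudge profitability.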
1 parent 7bd492f commit bb3c2fc

2 files changed, +29 -1 lines changed


llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

Lines changed: 0 additions & 1 deletion
@@ -1794,7 +1794,6 @@ InstructionCost ARMTTIImpl::getExtendedReductionCost(
   case ISD::ADD:
     if (ST->hasMVEIntegerOps() && ValVT.isSimple() && ResVT.isSimple()) {
       std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(ValTy);
-
       // The legal cases are:
       // VADDV u/s 8/16/32
       // VADDLV u/s 32
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=slp-vectorizer --mtriple arm-none-eabi -mattr=+mve -S -o - | FileCheck %s
+
+define i64 @vadd_32_64(ptr readonly %a) {
+; CHECK-LABEL: define i64 @vadd_32_64(
+; CHECK-SAME: ptr readonly [[A:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, ptr [[A]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = sext <4 x i32> [[TMP0]] to <4 x i64>
+; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> [[TMP1]])
+; CHECK-NEXT: ret i64 [[TMP2]]
+;
+entry:
+  %0 = load i32, ptr %a, align 4
+  %conv = sext i32 %0 to i64
+  %arrayidx1 = getelementptr inbounds nuw i8, ptr %a, i32 4
+  %1 = load i32, ptr %arrayidx1, align 4
+  %conv2 = sext i32 %1 to i64
+  %add = add nsw i64 %conv2, %conv
+  %arrayidx3 = getelementptr inbounds nuw i8, ptr %a, i32 8
+  %2 = load i32, ptr %arrayidx3, align 4
+  %conv4 = sext i32 %2 to i64
+  %add5 = add nsw i64 %add, %conv4
+  %arrayidx6 = getelementptr inbounds nuw i8, ptr %a, i32 12
+  %3 = load i32, ptr %arrayidx6, align 4
+  %conv7 = sext i32 %3 to i64
+  %add8 = add nsw i64 %add5, %conv7
+  ret i64 %add8
+}
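
For orientation, the scalar input IR in this test loads four consecutive `i32` values (byte offsets 0, 4, 8, and 12 from `%a`), sign-extends each to `i64`, and adds them. A minimal C++ sketch of the same computation follows; the function name mirrors the test, but the C++ is an illustration and not part of the commit.

```cpp
#include <cassert>
#include <cstdint>

// Scalar equivalent of the test's input IR: sum four 32-bit values after
// sign-extending each one to 64 bits.
int64_t vadd_32_64(const int32_t *a) {
  assert(a != nullptr);
  int64_t sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += static_cast<int64_t>(a[i]); // corresponds to sext i32 -> i64
  return sum;
}
```

The CHECK lines expect the SLP vectorizer to replace this pattern with one `<4 x i32>` load, one `sext` to `<4 x i64>`, and a call to `llvm.vector.reduce.add.v4i64`.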
