
Commit 94df95d

[TTI][X86] getShuffleCosts - for SK_PermuteTwoSrc, if the masks are known to be "inlane" no need to scale the costs by worst-case legalization (#117999)
SK_PermuteTwoSrc legalization has to assume that any of the legalized source registers could be referenced by each of the split shuffles. However, if we already know that each 128-bit lane only references elements from the same lane of the source operands, this worst-case scaling won't occur. Hopefully this can help with #113356 without us having to finish the full processShuffleMasks canonicalization first.
1 parent 6881c6d commit 94df95d

10 files changed, 238 additions and 255 deletions

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Lines changed: 1 addition & 1 deletion
@@ -1775,7 +1775,7 @@ InstructionCost X86TTIImpl::getShuffleCost(
   }
 
   // For 2-input shuffles, we must account for splitting the 2 inputs into many.
-  if (Kind == TTI::SK_PermuteTwoSrc && LT.first != 1) {
+  if (Kind == TTI::SK_PermuteTwoSrc && !IsInLaneShuffle && LT.first != 1) {
     // We assume that source and destination have the same vector type.
     InstructionCost NumOfDests = LT.first;
     InstructionCost NumOfShufflesPerDest = LT.first * 2 - 1;

llvm/test/Analysis/CostModel/X86/shuffle-insert_subvector-codesize.ll

Lines changed: 28 additions & 28 deletions

llvm/test/Analysis/CostModel/X86/shuffle-insert_subvector-latency.ll

Lines changed: 28 additions & 28 deletions

llvm/test/Analysis/CostModel/X86/shuffle-insert_subvector-sizelatency.ll

Lines changed: 28 additions & 28 deletions

llvm/test/Analysis/CostModel/X86/shuffle-insert_subvector.ll

Lines changed: 28 additions & 28 deletions

llvm/test/Analysis/CostModel/X86/shuffle-transpose-codesize.ll

Lines changed: 28 additions & 28 deletions

llvm/test/Analysis/CostModel/X86/shuffle-transpose-latency.ll

Lines changed: 28 additions & 28 deletions

llvm/test/Analysis/CostModel/X86/shuffle-transpose-sizelatency.ll

Lines changed: 28 additions & 28 deletions

llvm/test/Analysis/CostModel/X86/shuffle-transpose.ll

Lines changed: 28 additions & 28 deletions
Lines changed: 13 additions & 30 deletions
@@ -1,32 +1,19 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -mtriple=x86_64-- -mcpu=x86-64 -O3 -S < %s | FileCheck %s --check-prefixes=SSE,SSE2
-; RUN: opt -mtriple=x86_64-- -mcpu=x86-64-v2 -O3 -S < %s | FileCheck %s --check-prefixes=SSE,SSE4
-; RUN: opt -mtriple=x86_64-- -mcpu=btver2 -O3 -S < %s | FileCheck %s --check-prefixes=AVX,AVX1
-; RUN: opt -mtriple=x86_64-- -mcpu=x86-64-v3 -O3 -S < %s | FileCheck %s --check-prefixes=AVX,AVX2
-; RUN: opt -mtriple=x86_64-- -mcpu=x86-64 -passes="default<O3>" -S < %s | FileCheck %s --check-prefixes=SSE,SSE2
-; RUN: opt -mtriple=x86_64-- -mcpu=x86-64-v2 -passes="default<O3>" -S < %s | FileCheck %s --check-prefixes=SSE,SSE4
-; RUN: opt -mtriple=x86_64-- -mcpu=btver2 -passes="default<O3>" -S < %s | FileCheck %s --check-prefixes=AVX,AVX1
-; RUN: opt -mtriple=x86_64-- -mcpu=x86-64-v3 -passes="default<O3>" -S < %s | FileCheck %s --check-prefixes=AVX,AVX2
+; RUN: opt -mtriple=x86_64-- -mcpu=x86-64 -O3 -S < %s | FileCheck %s
+; RUN: opt -mtriple=x86_64-- -mcpu=x86-64-v2 -O3 -S < %s | FileCheck %s
+; RUN: opt -mtriple=x86_64-- -mcpu=btver2 -O3 -S < %s | FileCheck %s
+; RUN: opt -mtriple=x86_64-- -mcpu=x86-64-v3 -O3 -S < %s | FileCheck %s
+; RUN: opt -mtriple=x86_64-- -mcpu=x86-64 -passes="default<O3>" -S < %s | FileCheck %s
+; RUN: opt -mtriple=x86_64-- -mcpu=x86-64-v2 -passes="default<O3>" -S < %s | FileCheck %s
+; RUN: opt -mtriple=x86_64-- -mcpu=btver2 -passes="default<O3>" -S < %s | FileCheck %s
+; RUN: opt -mtriple=x86_64-- -mcpu=x86-64-v3 -passes="default<O3>" -S < %s | FileCheck %s
 
 define <4 x double> @PR94546(<4 x double> %a, <4 x double> %b) {
-; SSE2-LABEL: @PR94546(
-; SSE2-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 6>
-; SSE2-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 7>
-; SSE2-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
-; SSE2-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 poison, i32 poison, i32 1>
-; SSE2-NEXT: ret <4 x double> [[TMP4]]
-;
-; SSE4-LABEL: @PR94546(
-; SSE4-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 poison, i32 poison, i32 6>
-; SSE4-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 poison, i32 poison, i32 7>
-; SSE4-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]
-; SSE4-NEXT: ret <4 x double> [[TMP3]]
-;
-; AVX-LABEL: @PR94546(
-; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 poison, i32 poison, i32 6>
-; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 poison, i32 poison, i32 7>
-; AVX-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]
-; AVX-NEXT: ret <4 x double> [[TMP3]]
+; CHECK-LABEL: @PR94546(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 poison, i32 poison, i32 6>
+; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 poison, i32 poison, i32 7>
+; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]
+; CHECK-NEXT: ret <4 x double> [[TMP3]]
 ;
   %vecext = extractelement <4 x double> %a, i32 0
   %vecext1 = extractelement <4 x double> %a, i32 1
@@ -47,7 +34,3 @@ define <4 x double> @PR94546(<4 x double> %a, <4 x double> %b) {
   %shuffle = shufflevector <4 x double> %vecinit13, <4 x double> %a, <4 x i32> <i32 0, i32 poison, i32 poison, i32 3>
   ret <4 x double> %shuffle
 }
-;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
-; AVX1: {{.*}}
-; AVX2: {{.*}}
-; SSE: {{.*}}
