Commit f89f670

[CostModel][X86] Broadcast shuffles can be free if they are from a one-use load
AVX1+ can handle 32/64-bit broadcast loads, AVX2+ can handle all broadcast loads (we should be able to improve isLegalBroadcastLoad to handle more of this type matching).
1 parent a9e8730 commit f89f670

2 files changed: +83 -135 lines changed
llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Lines changed: 12 additions & 4 deletions
@@ -1490,12 +1490,20 @@ InstructionCost X86TTIImpl::getShuffleCost(
   if (Kind == TTI::SK_Transpose)
     Kind = TTI::SK_PermuteTwoSrc;

-  // For Broadcasts we are splatting the first element from the first input
-  // register, so only need to reference that input and all the output
-  // registers are the same.
-  if (Kind == TTI::SK_Broadcast)
+  if (Kind == TTI::SK_Broadcast) {
+    // For Broadcasts we are splatting the first element from the first input
+    // register, so only need to reference that input and all the output
+    // registers are the same.
     LT.first = 1;

+    // If we're broadcasting a load then AVX/AVX2 can do this for free.
+    using namespace PatternMatch;
+    if (!Args.empty() && match(Args[0], m_OneUse(m_Load(m_Value()))) &&
+        (ST->hasAVX2() ||
+         (ST->hasAVX() && LT.second.getScalarSizeInBits() >= 32)))
+      return TTI::TCC_Free;
+  }
+
   // Treat <X x bfloat> shuffles as <X x half>.
   if (LT.second.isVector() && LT.second.getScalarType() == MVT::bf16)
     LT.second = LT.second.changeVectorElementType(MVT::f16);
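
The condition added in this hunk can be read in isolation as a small predicate. The following is a minimal standalone sketch of that logic, not the LLVM code itself: `SubtargetFeatures` and `isFreeBroadcastLoad` are hypothetical names invented for illustration, and the `IsOneUseLoad` flag stands in for the `m_OneUse(m_Load(m_Value()))` match on `Args[0]`.

```cpp
// Hypothetical stand-in for the two X86Subtarget queries used by the patch
// (ST->hasAVX() and ST->hasAVX2()). In LLVM, AVX2 implies AVX; here the two
// flags are independent fields purely for illustration.
struct SubtargetFeatures {
  bool HasAVX;
  bool HasAVX2;
};

// Mirrors the condition in the hunk above: a broadcast is treated as free when
// its source is a load with a single use and either
//   - AVX2 is available (any element width), or
//   - AVX is available and the element is at least 32 bits wide.
// IsOneUseLoad is an assumed input replacing the PatternMatch check.
constexpr bool isFreeBroadcastLoad(SubtargetFeatures ST, bool IsOneUseLoad,
                                   unsigned ScalarSizeInBits) {
  return IsOneUseLoad &&
         (ST.HasAVX2 || (ST.HasAVX && ScalarSizeInBits >= 32));
}

// Compile-time checks of the cases described in the commit message.
static_assert(isFreeBroadcastLoad({true, false}, true, 32),
              "AVX1: 32-bit broadcast load is free");
static_assert(!isFreeBroadcastLoad({true, false}, true, 16),
              "AVX1: 16-bit broadcast load is not free");
static_assert(isFreeBroadcastLoad({true, true}, true, 8),
              "AVX2: 8-bit broadcast load is free");
static_assert(!isFreeBroadcastLoad({true, true}, false, 32),
              "a load with other users does not fold");
```

The one-use requirement is central: if the loaded value has other users, the load presumably has to stay as a separate instruction, so the broadcast cannot be assumed to fold into it.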
