Commit de5d588

[AArch64] Tweak the costs of experimental_cttz_elts intrinsic (#125093)
The experimental_cttz_elts intrinsic currently returns a cost of 1 for all types. However, lowering it for SVE requires two instructions, brkb and cntp, and both have half the throughput of a basic vector instruction such as a vector add. This patch raises the cost of the intrinsic to 4 to reflect two instructions of lower throughput.
1 parent eb6ca12 commit de5d588

2 files changed (+74, -64 lines)


llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Lines changed: 10 additions & 0 deletions
```diff
@@ -940,6 +940,16 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
     }
     break;
   }
+  case Intrinsic::experimental_cttz_elts: {
+    EVT ArgVT = getTLI()->getValueType(DL, ICA.getArgTypes()[0]);
+    if (!getTLI()->shouldExpandCttzElements(ArgVT)) {
+      // This will consist of a SVE brkb and a cntp instruction. These
+      // typically have the same latency and half the throughput as a vector
+      // add instruction.
+      return 4;
+    }
+    break;
+  }
   default:
     break;
   }
```
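The return value of 4 can be read as the product below. This assumes, as an interpretation rather than something the patch states, that a basic vector add has cost 1 and that cost scales with reciprocal throughput:

```latex
\[
\text{cost}(\texttt{cttz\_elts})
  = \underbrace{2}_{\texttt{brkb},\ \texttt{cntp}}
    \times
    \underbrace{\frac{\text{throughput}(\text{vector add})}
                     {\text{throughput}(\texttt{brkb}\ \text{or}\ \texttt{cntp})}}_{=\,2}
  = 2 \times 2 = 4
\]
```

This cost applies only on the path where `shouldExpandCttzElements` returns false. Otherwise the `break` falls through to the generic cost computation.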
