[AArch64] Consider negated powers of 2 when calculating throughput cost #143013
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-aarch64
Author: AZero13 (AZero13)
Changes
Negated powers of 2 have similar codegen (identical, in the case of remainder) to the power-of-2 sdiv lowering. In the case of sdiv, the lowering just negates the result at the end, so nothing else differs.
Full diff: https://github.com/llvm/llvm-project/pull/143013.diff
1 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 68aec80f07e1d..16f5f76dd0482 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4005,7 +4005,7 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
// have similar cost.
auto VT = TLI->getValueType(DL, Ty);
if (VT.isScalarInteger() && VT.getSizeInBits() <= 64) {
- if (Op2Info.isPowerOf2()) {
+ if (Op2Info.isPowerOf2() || Op2Info.isNegatedPowerOf2()) {
return ISD == ISD::SDIV ? (3 * AddCost + AsrCost)
: (3 * AsrCost + AddCost);
} else {
@@ -4013,7 +4013,7 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
}
} else if (VT.isVector()) {
InstructionCost UsraCost = 2 * AsrCost;
- if (Op2Info.isPowerOf2()) {
+ if (Op2Info.isPowerOf2() || Op2Info.isNegatedPowerOf2()) {
// Division with scalable types corresponds to native 'asrd'
// instruction when SVE is available.
// e.g. %1 = sdiv <vscale x 4 x i32> %a, splat (i32 8)
@davemgreen Thoughts? Yes, I know exact is not covered, but that is for another time.
Yeah sounds good to me for signed divides. The divide cost code could do with some work, I had a patch to try and address some of it but it didn't get very far.
Looking at the codegen, should we be adding 1 extra for the cost of the neg?
Not really, because the asr is folded into it most of the time.
I was looking into https://llvm.godbolt.org/z/xTKTddj73. The vector versions will not generally be able to fold the sub into the other instructions. For the scalar versions, sub+asr is usually a more expensive instruction than a single sub, although exact throughput costs are difficult to be precise about.
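For readers following the neg discussion, here is a rough C++ sketch of the arithmetic behind these lowerings. It is not taken from the patch or from LLVM's actual lowering code; the function names are hypothetical, and it assumes `>>` on a negative signed value is an arithmetic shift (guaranteed from C++20, and the usual behaviour before that) and that `1 <= k <= 62`. It only illustrates why dividing by `-2^k` is the power-of-2 sequence plus one negate, and why the remainder is unchanged.

```cpp
#include <cstdint>

// Signed division by 2^k, rounding toward zero, using only add and shifts.
// Hypothetical helper; assumes 1 <= k <= 62 and arithmetic right shift.
int64_t sdiv_pow2(int64_t x, unsigned k) {
  // x >> 63 is all ones for negative x and zero otherwise, so the logical
  // shift below yields a bias of 2^k - 1 for negative x and 0 otherwise.
  int64_t bias =
      static_cast<int64_t>(static_cast<uint64_t>(x >> 63) >> (64 - k));
  return (x + bias) >> k;
}

// Division by -2^k: the same sequence followed by one extra negate.
int64_t sdiv_neg_pow2(int64_t x, unsigned k) {
  return -sdiv_pow2(x, k);
}

// Remainder by 2^k. The sign of the result follows the dividend, so the
// remainder by -2^k is the same value and needs no extra instruction.
int64_t srem_pow2(int64_t x, unsigned k) {
  return x - sdiv_pow2(x, k) * (int64_t{1} << k);
}
```

Whether that trailing negate is free depends on whether it can be folded into a neighbouring instruction, which is the point being debated above for the scalar and vector cases.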
Addressed!
Thanks. LGTM.
Fixed! @davemgreen I do not have merge perms. Can you please merge?