[AArch64] Consider negated powers of 2 when calculating throughput cost #143013

Merged (2 commits, Jun 11, 2025)

AZero13 (Contributor) commented Jun 5, 2025

Negated powers of 2 have codegen similar to that of powers of 2 when lowering sdiv, and identical codegen in the case of srem. For sdiv, the result is simply negated at the end, so nothing else differs.
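The two identities this relies on can be checked directly. Below is a minimal C++ sketch, assuming 32-bit operands and a shift amount `k` in [1, 30]; the function name `negatedDivisorIdentitiesHold` is hypothetical. C++ integer division truncates toward zero and the remainder takes the sign of the dividend, which matches the semantics of LLVM IR `sdiv`/`srem`.

```cpp
#include <cstdint>

// Checks, for one dividend x and shift k (1 <= k <= 30), the two identities
// behind treating -(2^k) like 2^k:
//   sdiv: x / -(2^k) == -(x / 2^k)   (same sequence, negated at the end)
//   srem: x % -(2^k) ==   x % 2^k    (identical result)
bool negatedDivisorIdentitiesHold(int32_t x, unsigned k) {
  const int32_t p = int32_t(1) << k;  // 2^k; no overflow for k <= 30
  const bool quotientNegated = (x / -p) == -(x / p);
  const bool remainderSame = (x % -p) == (x % p);
  return quotientNegated && remainderSame;
}
```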

llvmbot (Member) commented Jun 5, 2025

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-aarch64

Author: AZero13 (AZero13)

Changes

Negated powers of 2 have codegen similar to that of powers of 2 when lowering sdiv, and identical codegen in the case of srem. For sdiv, the result is simply negated at the end, so nothing else differs.


Full diff: https://github.com/llvm/llvm-project/pull/143013.diff

1 File Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+2-2)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 68aec80f07e1d..16f5f76dd0482 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4005,7 +4005,7 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
       // have similar cost.
       auto VT = TLI->getValueType(DL, Ty);
       if (VT.isScalarInteger() && VT.getSizeInBits() <= 64) {
-        if (Op2Info.isPowerOf2()) {
+        if (Op2Info.isPowerOf2() || Op2Info.isNegatedPowerOf2()) {
           return ISD == ISD::SDIV ? (3 * AddCost + AsrCost)
                                   : (3 * AsrCost + AddCost);
         } else {
@@ -4013,7 +4013,7 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
         }
       } else if (VT.isVector()) {
         InstructionCost UsraCost = 2 * AsrCost;
-        if (Op2Info.isPowerOf2()) {
+        if (Op2Info.isPowerOf2() || Op2Info.isNegatedPowerOf2()) {
           // Division with scalable types corresponds to native 'asrd'
           // instruction when SVE is available.
           // e.g. %1 = sdiv <vscale x 4 x i32> %a, splat (i32 8)
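The diff sends a constant divisor such as -8 down the same costing path as 8. The standalone sketch below mirrors only the scalar branch shown above, assuming plain integer costs. `isPowerOf2`, `isNegatedPowerOf2`, and `scalarDivRemCost` are hypothetical stand-ins for the `Op2Info` queries and for part of `AArch64TTIImpl::getArithmeticInstrCost`; they are not LLVM API. The `-1` return for other divisors is a placeholder, because the excerpt elides that branch.

```cpp
#include <cstdint>

// Hypothetical stand-in for Op2Info.isPowerOf2(): c == 2^k for some k >= 0.
bool isPowerOf2(int64_t c) { return c > 0 && (c & (c - 1)) == 0; }

// Hypothetical stand-in for Op2Info.isNegatedPowerOf2(): c == -(2^k).
bool isNegatedPowerOf2(int64_t c) {
  if (c >= 0) return false;
  // |c| computed in unsigned arithmetic so INT64_MIN does not overflow.
  const uint64_t mag = 0 - static_cast<uint64_t>(c);
  return (mag & (mag - 1)) == 0;
}

// Scalar sdiv/srem cost for a uniform constant divisor, using the two
// formulas from the diff. addCost/asrCost are plain inputs here.
int scalarDivRemCost(bool isSDiv, int64_t divisor, int addCost, int asrCost) {
  if (isPowerOf2(divisor) || isNegatedPowerOf2(divisor))
    return isSDiv ? (3 * addCost + asrCost) : (3 * asrCost + addCost);
  return -1;  // placeholder: the non-power-of-2 path is not shown in the diff
}
```

With the added `||`, `scalarDivRemCost(true, -8, 1, 1)` returns the same value as `scalarDivRemCost(true, 8, 1, 1)`.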

AZero13 changed the title from "[AArch64] Negated powers of 2 not considered when it was meant to be" to "[AArch64] Consider negated powers of 2 when calculating throughput cost" on Jun 5, 2025
AZero13 force-pushed the powero2 branch 2 times, most recently from 4e96dc6 to 53177f3 on June 5, 2025 18:18
llvmbot added the llvm:analysis label (Includes value tracking, cost tables and constant folding) on Jun 5, 2025

AZero13 (Contributor, Author) commented Jun 5, 2025

@davemgreen Thoughts? Yes, I know exact is not covered, but that is for another time.
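Assuming "exact" here refers to the `exact` flag on `sdiv` (which guarantees the dividend is a multiple of the divisor), no rounding correction is needed and the division reduces to a single arithmetic shift, plus a negation for a negated power of 2. A minimal sketch; the function names are hypothetical, the precondition is not checked, and `>>` on negative values is assumed to be arithmetic (guaranteed since C++20).

```cpp
#include <cstdint>

// Precondition (not checked): x is a multiple of 2^k, with 1 <= k <= 30.
// Without that guarantee, x >> k rounds toward negative infinity, not zero.
int32_t sdivExactPow2(int32_t x, unsigned k) { return x >> k; }

// Exact division by -(2^k): the same shift followed by a negation.
int32_t sdivExactNegPow2(int32_t x, unsigned k) { return -(x >> k); }
```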

davemgreen requested review from davemgreen and sushgokh on June 6, 2025 08:47

davemgreen (Collaborator) left a comment

Yeah sounds good to me for signed divides. The divide cost code could do with some work, I had a patch to try and address some of it but it didn't get very far.

Looking at the codegen, should we be adding 1 extra for the cost of the neg?

AZero13 (Contributor, Author) commented Jun 6, 2025

Not really, because the asr is folded into the neg most of the time.
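For the scalar case, the folding can be illustrated by emulating, one instruction at a time, the kind of sequence AArch64 codegen typically produces for a 32-bit `sdiv` by -8. The exact instruction list is an assumption for illustration (actual output should be checked on Compiler Explorer); the point is that the final `neg` takes an `asr`-shifted operand, so the shift costs no extra instruction. `emulateSdivNegPow2` is a hypothetical name, and `>>` on negative values is assumed to be arithmetic.

```cpp
#include <cstdint>

// Hypothetical emulation of a scalar sequence AArch64 typically emits for
// a 32-bit sdiv by -(2^k), shown for k = 3 (1 <= k <= 30 assumed):
//   add  w8, w0, #7        // x + (2^k - 1)
//   cmp  w0, #0
//   csel w8, w8, w0, lt    // keep the biased value only when x < 0
//   neg  w0, w8, asr #3    // shift and negate in a single instruction
int32_t emulateSdivNegPow2(int32_t x, unsigned k) {
  // add: computed unconditionally, in 64 bits so it cannot overflow here.
  const int64_t biased = int64_t(x) + ((int64_t(1) << k) - 1);
  // cmp + csel: the bias makes the later shift round toward zero for x < 0.
  // When x < 0, biased fits in 32 bits, so the narrowing is lossless.
  const int32_t selected = x < 0 ? static_cast<int32_t>(biased) : x;
  // neg with an asr-shifted operand: 0 - (selected >> k).
  return -(selected >> k);
}
```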

AZero13 requested a review from davemgreen on June 7, 2025 15:21
davemgreen (Collaborator) commented

Not really

Because the asr is folded into it most of the time

I was looking into https://llvm.godbolt.org/z/xTKTddj73. The vector versions will not generally be able to fold the sub into the other instructions. For the scalar versions, sub+asr is usually a more expensive instruction than a single sub, although it is difficult to be precise about exact throughput costs.
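By contrast, a vector `sdiv` has no shifted-operand form to absorb the negation. The sketch below emulates, lane by lane, a NEON sequence of the kind usually generated for a `<4 x i32>` division by `splat(-(2^k))`; the instruction list is an assumption for illustration (the sign-mask step may also appear as `sshr #31`), and `emulateVecSdivNegPow2` is a hypothetical name. The trailing `neg` is a separate instruction, which is the extra cost under discussion.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<int32_t, 4>;

// Lane-wise emulation of a NEON sequence for a <4 x i32> sdiv by
// splat(-(2^k)), shown for k = 3 (1 <= k <= 30 assumed):
//   cmlt v1.4s, v0.4s, #0     // all-ones mask in negative lanes
//   usra v0.4s, v1.4s, #29    // add mask >> (32 - k), i.e. 2^k - 1 if negative
//   sshr v0.4s, v0.4s, #3     // arithmetic shift right by k
//   neg  v0.4s, v0.4s         // separate negation; nothing to fold it into
V4i32 emulateVecSdivNegPow2(const V4i32& v, unsigned k) {
  V4i32 out{};
  for (std::size_t i = 0; i < v.size(); ++i) {
    const uint32_t mask = v[i] < 0 ? 0xFFFFFFFFu : 0u;               // cmlt
    const uint32_t acc = static_cast<uint32_t>(v[i]) + (mask >> (32 - k));  // usra
    const int32_t shifted = static_cast<int32_t>(acc) >> k;          // sshr
    out[i] = -shifted;                                               // neg
  }
  return out;
}
```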

AZero13 force-pushed the powero2 branch 2 times, most recently from cceda2b to 32d9eb6 on June 8, 2025 18:53
AZero13 (Contributor, Author) commented Jun 9, 2025

Addressed!

davemgreen (Collaborator) left a comment

Thanks. LGTM.

AZero13 (Contributor, Author) commented Jun 10, 2025

Fixed! @davemgreen I do not have merge perms. Can you please merge?

AZero13 requested a review from davemgreen on June 10, 2025 21:12
davemgreen merged commit 79a72c4 into llvm:main on Jun 11, 2025
7 checks passed
AZero13 deleted the powero2 branch on June 11, 2025 13:25
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
Labels: backend:AArch64, llvm:analysis (Includes value tracking, cost tables and constant folding)

3 participants