Skip to content

[llvm][aarch64] Apple A16 & A17 had adrp-add fusion, but A14 did not #81325

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 10, 2024

Conversation

jroelofs
Copy link
Contributor

@jroelofs jroelofs commented Feb 9, 2024

No description provided.

@llvmbot
Copy link
Member

llvmbot commented Feb 9, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Jon Roelofs (jroelofs)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/81325.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64.td (+2-1)
diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td
index 02fb01caf7e80..bc5cb852e13f1 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -1120,7 +1120,6 @@ def TuneAppleA14 : SubtargetFeature<"apple-a14", "ARMProcFamily", "AppleA14",
                                     FeatureFuseArithmeticLogic,
                                     FeatureFuseCCSelect,
                                     FeatureFuseCryptoEOR,
-                                    FeatureFuseAdrpAdd,
                                     FeatureFuseLiterals,
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMove,
@@ -1149,6 +1148,7 @@ def TuneAppleA16 : SubtargetFeature<"apple-a16", "ARMProcFamily", "AppleA16",
                                     FeatureArithmeticCbzFusion,
                                     FeatureDisableLatencySchedHeuristic,
                                     FeatureFuseAddress,
+                                    FeatureFuseAdrpAdd,
                                     FeatureFuseAES,
                                     FeatureFuseArithmeticLogic,
                                     FeatureFuseCCSelect,
@@ -1165,6 +1165,7 @@ def TuneAppleA17 : SubtargetFeature<"apple-a17", "ARMProcFamily", "AppleA17",
                                     FeatureArithmeticCbzFusion,
                                     FeatureDisableLatencySchedHeuristic,
                                     FeatureFuseAddress,
+                                    FeatureFuseAdrpAdd,
                                     FeatureFuseAES,
                                     FeatureFuseArithmeticLogic,
                                     FeatureFuseCCSelect,

@jroelofs jroelofs requested a review from aemerson February 9, 2024 22:36
@jthackray jthackray self-requested a review February 9, 2024 22:41
Copy link
Contributor

@jthackray jthackray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@alexander-shaposhnikov
Copy link
Collaborator

LG

@jroelofs jroelofs merged commit eb1b428 into llvm:main Feb 10, 2024
@jroelofs jroelofs deleted the jroelofs/adrp-add-fusion branch February 10, 2024 00:52
@davemgreen
Copy link
Collaborator

FuseAdrpAdd is also used to enable linker relaxation in lld and the gnu linkers. I guess you are assuming this will make use of linker optimization hints instead and use the Apple linker? (That sounds OK if so).

Changes like this should ideally be accompanied by a test. Hopefully that is fairly simple, like the test in llvm/test/CodeGen/AArch64/misched-fusion-addadrp.ll.

@jroelofs
Copy link
Contributor Author

I was assuming Apple ld/ld-prime, and only thinking about the effect of this on instruction scheduling / macro-fusion + knowledge of the respective uarch's.

Can you tell me more about how that linker relaxation works? Is it an optimization that keys off of FeatureAdrpAdd itself, or is it merely enabled by placing the adrp next to the add?

Do you think A14 deserves:

+ // apple-a14 doesn't actually fuse these, but scheduling them 
+ // adjacent allows linker relaxation.
FeatureFuseAdrpAdd,

instead of:

- FeatureFuseAdrpAdd,

Changes like this should ideally be accompanied by a test.

I'll add one.

jroelofs added a commit that referenced this pull request Feb 11, 2024
There are a couple of open questions on what we should do for A14, so I'll
leave that off for now.

#81325 (comment)
@jroelofs
Copy link
Contributor Author

Changes like this should ideally be accompanied by a test.

I'll add one.

ffab5a0

@davemgreen
Copy link
Collaborator

I was assuming Apple ld/ld-prime, and only thinking about the effect of this on instruction scheduling / macro-fusion + knowledge of the respective uarch's.

Can you tell me more about how that linker relaxation works? Is it an optimization that keys off of FeatureAdrpAdd itself, or is it merely enabled by placing the adrp next to the add?

As far as I understand inside the linker it is done via adjacent instructions. It doesn't have much other information, other than the relocations. I think so long as the linker used will most likely be apple ld using LoH's, then it's fine to be more accurate with the fusion and not "mis-use" the feature to get adjacent instructions.

Changes like this should ideally be accompanied by a test.

I'll add one.

ffab5a0

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants