-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AArch64] Stop reserved registers from being saved in prolog/epilog #138448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-backend-aarch64 Author: None (yasuna-oribe) ChangesGCC's man page is clear on how -ffixed-reg must behave:
This implies prolog/epilog code also must not save/restore explicitly fixed registers, even when it is callee-saved. Some projects rely on this (GCC's) behavior. For example,
should not touch x28 outside of For riscv64, clang behaves the same as GCC, so I am inclined to believe this is indeed a bug. Fixes #111379. Full diff: https://github.com/llvm/llvm-project/pull/138448.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 78ac57e3e92a6..2d72f8757d7c0 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -3619,6 +3619,13 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
if (Reg == BasePointerReg)
SavedRegs.set(Reg);
+ // Don't save fixed registers specified with -ffixed-reg.
+ if (AArch64::GPR64RegClass.contains(Reg) &&
+ RegInfo->isReservedReg(MF, Reg)) {
+ SavedRegs.reset(Reg);
+ continue;
+ }
+
bool RegUsed = SavedRegs.test(Reg);
unsigned PairedReg = AArch64::NoRegister;
const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
|
This comment was marked as resolved.
This comment was marked as resolved.
Yep, tests succeeded now. I'm sorry for the force push, I just noticed I'm supposed to use |
…/epilog GCC's man page is clear on how -ffixed-reg must behave: ``` Treat the register named reg as a fixed register; generated code should never refer to it (except perhaps as a stack pointer, frame pointer or in some other fixed role). ``` This implies prolog/epilog code also must not save/restore explicitly fixed registers, even when it is callee-saved. Some projects rely on this (GCC's) behavior. For example, ``` void f() { register uint64_t x28 asm("x28") = 0xee; asm volatile("" : "+r"(x28)); // avoid mov being eliminated } ``` should not touch x28 outside of `mov w28,#0xee`. For riscv64 clang behaves the same as GCC. Fixes llvm#111379.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Would you be willing to add some tests just to verify behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some tests? I have a feeling this has come up before and there were edge cases to worry about, but don't know the history.
I'd like to check with our GCC team about the behaviour of -ffixed particularly for a Callee-saved register. The GCC docs leave a lot of room for interpretation. At one level -ffixed-xN can be seen as "I'm globally reserving xN for the whole program" In which case it would make sense to exclude a registers from the callee saved set. An alternative interpretation is "I'm globally reserving xN for the functions in this translation unit" for example I am using binary library code that I don't control so I can't globally guarantee that xN is reserved. In that case it would make sense to still callee save the register, although it would mean that only temporary registers (like x18) can be used if we require no callee saves. |
GCC for AArch64 does seem to callee save these registers https://godbolt.org/z/3oj3dMb1P so it seems like the As expected it won't save if a temporary register like x9 is used instead of a callee save register. |
@smithp35, do you mean it doesn't? (https://godbolt.org/z/6PWT4WP1h) If I'm not mistaken, I don't see x28 (callee-saved reg) being saved or restored anywhere in the GCC assembly, regardless if it's a local or global variable. This pull request is specifically for that GCC behavior, which llvm's riscv64 target follows (https://godbolt.org/z/cxeqnc6hT). @nasherm @davemgreen I've added the tests (llvm/test/CodeGen/AArch64/reserveXreg.ll). |
@@ -551,6 +563,11 @@ bool AArch64RegisterInfo::isReservedReg(const MachineFunction &MF, | |||
return getReservedRegs(MF)[Reg]; | |||
} | |||
|
|||
bool AArch64RegisterInfo::isUserReservedReg(const MachineFunction &MF, | |||
MCRegister Reg) const { | |||
return getUserReservedRegs(MF)[Reg]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This recomputes the entire BitVector every time we query for a callee-saved register? That seems expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was fully aware of this when writing the function, but take a look at line 431, // FIXME: avoid re-calculating this every time.
for getStrictlyReservedRegs, a way more expensive function returning a BitVector of registers, that is also called in the same loop. I've followed the golden rule and followed existing code.
The loop in determineCalleeSaves only loops over callee-saved regs (CSRegs) anyway, and if you insist, I can save the BitVector at the top of the function and lookup from it, but it will be inconsistent with the rest of the function that uses isReserved directly (which calls getReservedRegs which in turn calls getStrictlyReservedRegs), breaking the golden rule, or require me to overhaul things out of my scope, and I doubt it will make a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess given the actual number of GPRs on AArch64 is a fixed number, it's not a big deal to loop 1000 times.
|
||
; RUN: llc < %s -mtriple=aarch64-unknown-linux-gnu | FileCheck %s | ||
|
||
define void @t1() "target-features"="+reserve-x1" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of these tests aren't really doing anything because the register in question is caller-save anyway. But... I guess it's not a big deal; the code is pretty uniform anyway. Maybe could use preserve_most calling convention to check a few more registers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know of that. I've changed these to be preserve_mostcc but I'll remove X1-X8 and X16-X18 (never preserved) too if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -551,6 +563,11 @@ bool AArch64RegisterInfo::isReservedReg(const MachineFunction &MF, | |||
return getReservedRegs(MF)[Reg]; | |||
} | |||
|
|||
bool AArch64RegisterInfo::isUserReservedReg(const MachineFunction &MF, | |||
MCRegister Reg) const { | |||
return getUserReservedRegs(MF)[Reg]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess given the actual number of GPRs on AArch64 is a fixed number, it's not a big deal to loop 1000 times.
My apologies I must have interpreted the top window of my compiler explorer example as clang. I did ask the GCC team about the intended behaviour. There wasn't any clear specification of how it should behave and using callee-save registers is at the users own risk. |
Does this pull request have enough approvals to be merged? If so, can anyone merge it for me?
@smithp35, Well, since it indeed is the users' own risk, and GCC/riscv64 llvm exhibits the behavior this PR implements, I believe this PR is a correct decision rather than leaving behavior inconsistent across targets and compilers. |
Just to be clear, this was just reporting what the GCC Team had said, not presenting this as an argument that this should not be done. There's a defacto implementation, and unless GCC are going to change (no plans), it is better to match GCC than not. I would recommend that if a user has a choice of register to reserve, a temporary register is lower risk than a callee-save, but that's out of scope of this change. |
It's really not obvious what's supposed to happen when you mix register variables and reserved registers. You're telling the compiler to not generate code using that register... then at the same time, you're telling the compiler to generate code using that register. There isn't any obvious way to define what that means. So you get weird results in various passes. Like, here's a fun example in gcc:
The constant 0xee completely disappears. That said, this seems like an improvement; spilling is clearly not expected here. |
@yasuna-oribe Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/18873 Here is the relevant piece of the build log for the reference
|
Fixes buildbot failure https://lab.llvm.org/buildbot/#/builders/16/builds/18873 originating from llvm#138448. Normally ignored silently but fails on higher error levels.
Fixes buildbot failure https://lab.llvm.org/buildbot/#/builders/16/builds/18873 originating from llvm#138448. Normally ignored silently but fails on higher error levels. Buildbot errors: ``` /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll -mtriple=aarch64-unknown-linux-gnu | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll # RUN: at line 6 + /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll + /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=aarch64-unknown-linux-gnu '+reserve-x8' is not a recognized feature for this target (ignoring feature) '+reserve-x8' is not a recognized feature for this target (ignoring feature) '+reserve-x16' is not a recognized feature for this target (ignoring feature) '+reserve-x16' is not a recognized feature for this target (ignoring feature) '+reserve-x17' is not a recognized feature for this target (ignoring feature) '+reserve-x17' is not a recognized feature for this target (ignoring feature) ```
Fixes buildbot failure https://lab.llvm.org/buildbot/#/builders/16/builds/18873 originating from #138448. Normally ignored silently but fails on higher error levels. Buildbot errors: ``` /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll -mtriple=aarch64-unknown-linux-gnu | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll # RUN: at line 6 + /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll + /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=aarch64-unknown-linux-gnu '+reserve-x8' is not a recognized feature for this target (ignoring feature) '+reserve-x8' is not a recognized feature for this target (ignoring feature) '+reserve-x16' is not a recognized feature for this target (ignoring feature) '+reserve-x16' is not a recognized feature for this target (ignoring feature) '+reserve-x17' is not a recognized feature for this target (ignoring feature) '+reserve-x17' is not a recognized feature for this target (ignoring feature) ```
This causes a 0.2% compile-time regression for O0-g builds (https://llvm-compile-time-tracker.com/compare.php?from=64082912a500d004c53ad1b3425098b495572663&to=26f97ee9aa413db240c397f96ddd5b0553a57d30&stat=instructions:u). Is it possible to query the reserved registers in a more efficient way? |
With this patch, the reserveXreg.ll restcase fails if we run verifiers, e.g. if adding -verify-machineinstrs to the run line (or if you compiled llc with EXPENSIVE_CHECKS).
|
There's some discussion of the machine-verifier failure on #139696 . Passing -verify-machineinstrs=false should disable the machine-verifier checks on the expensive checks buildbot. |
@mikaelholmen, I'm aware of that, honestly I have no idea what that means. I've tested the following minimal reproducible example after reverting the patch in my local tree and it still gives me that error with -DLLVM_ENABLE_EXPENSIVE_CHECKS. I really don't see why a simple
The error also happens if I compile the following with
Mind you, all these errors are happening even without my changes to LLVM. If the failing test case is causing problems for others, please revert it. Or alternatively, as @efriedma-quic suggested, would passing @nikic |
…regression [AArch64] Disable machine-verifier for failing test, fix perf regression Disables machine-verifier on failing test for now for the test to pass on expensive-checks. Also fixes performance regression mentioned in llvm#138448 (https://llvm-compile-time-tracker.com/compare.php?from=64082912a500d004c53ad1b3425098b495572663&to=26f97ee9aa413db240c397f96ddd5b0553a57d30&stat=instructions:u) by computing reserved registers upfront and not every loop iteration.
I've opened #140005 that disables machine-verifier for the test (apparently there are quite some tests that do this, so I assume it's OK to do) and hopefully fixes the performance regression by computing the BitVectors upfront. I've tested it locally with expensive-checks so it shouldn't cause issues this time. |
…ion (#140005) Disables machine-verifier on failing test for now for the test to pass on expensive-checks. Also fixes performance regression (https://llvm-compile-time-tracker.com/compare.php?from=64082912a500d004c53ad1b3425098b495572663&to=26f97ee9aa413db240c397f96ddd5b0553a57d30&stat=instructions:u) mentioned in #138448 by not computing reserved registers every loop iteration.
For why the machine verifier is complaining: there's a rule that every register has to be defined before it is used. Internally, "X1" and "W1" are different registers, so defining "W1" doesn't define "X1". (This is how it works for all registers with subregisters; it's not specific to what kind of register is involved.) The usual way to represent this is adding an implicit-def operand to the instruction; this reflects the general aarch64 rule that defining "W1" also defines "X1". |
GCC's documentation is clear on how -ffixed-reg must behave:
This implies prolog/epilog code also must not save/restore explicitly fixed registers, even when it is callee-saved. Some projects rely on this (GCC's) behavior.
For example,
should not touch x28 outside of
mov w28,#0xee
.For riscv64, clang behaves the same as GCC, so I am inclined to believe this is indeed a bug.
Fixes #111379.