Skip to content

[AArch64] Stop reserved registers from being saved in prolog/epilog #138448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 12, 2025

Conversation

yasuna-oribe
Copy link
Contributor

@yasuna-oribe yasuna-oribe commented May 4, 2025

GCC's documentation is clear on how -ffixed-reg must behave:

  Treat the register named reg as a fixed register; generated
  code should never refer to it (except perhaps as a stack pointer,
  frame pointer or in some other fixed role).

This implies prolog/epilog code also must not save/restore explicitly fixed registers, even when it is callee-saved. Some projects rely on this (GCC's) behavior.

For example,

void f() {
  register uint64_t x28 asm("x28") = 0xee;
  asm volatile("" : "+r"(x28)); // avoid mov being eliminated
}

should not touch x28 outside of mov w28,#0xee.

For riscv64, clang behaves the same as GCC, so I am inclined to believe this is indeed a bug.

Fixes #111379.

Copy link

github-actions bot commented May 4, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented May 4, 2025

@llvm/pr-subscribers-backend-aarch64

Author: None (yasuna-oribe)

Changes

GCC's man page is clear on how -ffixed-reg must behave:

  Treat the register named reg as a fixed register; generated
  code should never refer to it (except perhaps as a stack pointer,
  frame pointer or in some other fixed role).

This implies prolog/epilog code also must not save/restore explicitly fixed registers, even when it is callee-saved. Some projects rely on this (GCC's) behavior.

For example,

void f() {
  register uint64_t x28 asm("x28") = 0xee;
  asm volatile("" : "+r"(x28)); // avoid mov being eliminated
}

should not touch x28 outside of mov w28,#<!-- -->0xee.

For riscv64, clang behaves the same as GCC, so I am inclined to believe this is indeed a bug.

Fixes #111379.


Full diff: https://github.com/llvm/llvm-project/pull/138448.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64FrameLowering.cpp (+7)
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 78ac57e3e92a6..2d72f8757d7c0 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -3619,6 +3619,13 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
     if (Reg == BasePointerReg)
       SavedRegs.set(Reg);
 
+    // Don't save fixed registers specified with -ffixed-reg.
+    if (AArch64::GPR64RegClass.contains(Reg) &&
+        RegInfo->isReservedReg(MF, Reg)) {
+      SavedRegs.reset(Reg);
+      continue;
+    }
+
     bool RegUsed = SavedRegs.test(Reg);
     unsigned PairedReg = AArch64::NoRegister;
     const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);

@yasuna-oribe yasuna-oribe changed the title [AArch64] Fix reserved registers being saved in prolog/epilog [AArch64] Stop reserved registers from being saved in prolog/epilog May 4, 2025
@yasuna-oribe yasuna-oribe marked this pull request as draft May 4, 2025 13:17
@yasuna-oribe

This comment was marked as resolved.

@yasuna-oribe yasuna-oribe marked this pull request as ready for review May 4, 2025 15:40
@yasuna-oribe
Copy link
Contributor Author

yasuna-oribe commented May 4, 2025

Yep, tests succeeded now. I'm sorry for the force push, I just noticed I'm supposed to use git --fixup.
ping @davemgreen @sjoerdmeijer @nasherm @smithp35 (?)

…/epilog

GCC's man page is clear on how -ffixed-reg must behave:
```
  Treat the register named reg as a fixed register; generated
  code should never refer to it (except perhaps as a stack pointer,
  frame pointer or in some other fixed role).
```

This implies prolog/epilog code also must not save/restore explicitly
fixed registers, even when it is callee-saved. Some projects rely on
this (GCC's) behavior.

For example,
```
void f() {
  register uint64_t x28 asm("x28") = 0xee;
  asm volatile("" : "+r"(x28)); // avoid mov being eliminated
}
```
should not touch x28 outside of `mov w28,#0xee`.

For riscv64 clang behaves the same as GCC.

Fixes llvm#111379.
@yasuna-oribe yasuna-oribe reopened this May 6, 2025
Copy link
Contributor

@nasherm nasherm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Would you be willing to add some tests just to verify behavior?

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some tests? I have a feeling this has come up before and there were edge cases to worry about, but don't know the history.

@davemgreen davemgreen requested a review from efriedma-quic May 6, 2025 13:29
@smithp35
Copy link
Collaborator

smithp35 commented May 6, 2025

I'd like to check with our GCC team about the behaviour of -ffixed particularly for a Callee-saved register. The GCC docs leave a lot of room for interpretation.

At one level -ffixed-xN can be seen as "I'm globally reserving xN for the whole program" In which case it would make sense to exclude a registers from the callee saved set. An alternative interpretation is "I'm globally reserving xN for the functions in this translation unit" for example I am using binary library code that I don't control so I can't globally guarantee that xN is reserved. In that case it would make sense to still callee save the register, although it would mean that only temporary registers (like x18) can be used if we require no callee saves.

@smithp35
Copy link
Collaborator

smithp35 commented May 6, 2025

I'd like to check with our GCC team about the behaviour of -ffixed particularly for a Callee-saved register. The GCC docs leave a lot of room for interpretation.

GCC for AArch64 does seem to callee save these registers https://godbolt.org/z/3oj3dMb1P so it seems like the (except perhaps as a stack pointer, frame pointer or in some other fixed role). is applying in this case.

As expected it won't save if a temporary register like x9 is used instead of a callee save register.

@yasuna-oribe
Copy link
Contributor Author

yasuna-oribe commented May 6, 2025

GCC for AArch64 does seem to callee save these registers

@smithp35, do you mean it doesn't? (https://godbolt.org/z/6PWT4WP1h) If I'm not mistaken, I don't see x28 (callee-saved reg) being saved or restored anywhere in the GCC assembly, regardless if it's a local or global variable. This pull request is specifically for that GCC behavior, which llvm's riscv64 target follows (https://godbolt.org/z/cxeqnc6hT).

@nasherm @davemgreen I've added the tests (llvm/test/CodeGen/AArch64/reserveXreg.ll).

@@ -551,6 +563,11 @@ bool AArch64RegisterInfo::isReservedReg(const MachineFunction &MF,
return getReservedRegs(MF)[Reg];
}

bool AArch64RegisterInfo::isUserReservedReg(const MachineFunction &MF,
MCRegister Reg) const {
return getUserReservedRegs(MF)[Reg];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This recomputes the entire BitVector every time we query for a callee-saved register? That seems expensive.

Copy link
Contributor Author

@yasuna-oribe yasuna-oribe May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was fully aware of this when writing the function, but take a look at line 431, // FIXME: avoid re-calculating this every time. for getStrictlyReservedRegs, a way more expensive function returning a BitVector of registers, that is also called in the same loop. I've followed the golden rule and followed existing code.

The loop in determineCalleeSaves only loops over callee-saved regs (CSRegs) anyway, and if you insist, I can save the BitVector at the top of the function and lookup from it, but it will be inconsistent with the rest of the function that uses isReserved directly (which calls getReservedRegs which in turn calls getStrictlyReservedRegs), breaking the golden rule, or require me to overhaul things out of my scope, and I doubt it will make a difference.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess given the actual number of GPRs on AArch64 is a fixed number, it's not a big deal to loop 1000 times.


; RUN: llc < %s -mtriple=aarch64-unknown-linux-gnu | FileCheck %s

define void @t1() "target-features"="+reserve-x1" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of these tests aren't really doing anything because the register in question is caller-save anyway. But... I guess it's not a big deal; the code is pretty uniform anyway. Maybe could use preserve_most calling convention to check a few more registers.

Copy link
Contributor Author

@yasuna-oribe yasuna-oribe May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know of that. I've changed these to be preserve_mostcc but I'll remove X1-X8 and X16-X18 (never preserved) too if you want.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -551,6 +563,11 @@ bool AArch64RegisterInfo::isReservedReg(const MachineFunction &MF,
return getReservedRegs(MF)[Reg];
}

bool AArch64RegisterInfo::isUserReservedReg(const MachineFunction &MF,
MCRegister Reg) const {
return getUserReservedRegs(MF)[Reg];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess given the actual number of GPRs on AArch64 is a fixed number, it's not a big deal to loop 1000 times.

@smithp35
Copy link
Collaborator

smithp35 commented May 9, 2025

GCC for AArch64 does seem to callee save these registers

@smithp35, do you mean it doesn't? (https://godbolt.org/z/6PWT4WP1h) If I'm not mistaken, I don't see x28 (callee-saved reg) being saved or restored anywhere in the GCC assembly, regardless if it's a local or global variable. This pull request is specifically for that GCC behavior, which llvm's riscv64 target follows (https://godbolt.org/z/cxeqnc6hT).

My apologies I must have interpreted the top window of my compiler explorer example as clang.

I did ask the GCC team about the intended behaviour. There wasn't any clear specification of how it should behave and using callee-save registers is at the users own risk.

@yasuna-oribe
Copy link
Contributor Author

yasuna-oribe commented May 10, 2025

Does this pull request have enough approvals to be merged? If so, can anyone merge it for me?

There wasn't any clear specification of how it should behave and using callee-save registers is at the users own risk.

@smithp35, Well, since it indeed is the users' own risk, and GCC/riscv64 llvm exhibits the behavior this PR implements, I believe this PR is a correct decision rather than leaving behavior inconsistent across targets and compilers.

@smithp35
Copy link
Collaborator

There wasn't any clear specification of how it should behave and using callee-save registers is at the users own risk.

@smithp35, Well, since it indeed is the users' own risk, and GCC/riscv64 llvm exhibits the behavior this PR implements, I believe this PR is a correct decision rather than leaving behavior inconsistent across targets and compilers.

Just to be clear, this was just reporting what the GCC Team had said, not presenting this as an argument that this should not be done. There's a defacto implementation, and unless GCC are going to change (no plans), it is better to match GCC than not.

I would recommend that if a user has a choice of register to reserve, a temporary register is lower risk than a callee-save, but that's out of scope of this change.

@efriedma-quic
Copy link
Collaborator

It's really not obvious what's supposed to happen when you mix register variables and reserved registers. You're telling the compiler to not generate code using that register... then at the same time, you're telling the compiler to generate code using that register. There isn't any obvious way to define what that means. So you get weird results in various passes.

Like, here's a fun example in gcc:

void g(void);
void f(void) { register long x28 asm("x28") = 0xee; g(); asm volatile("" : "+r"(x28)); }

The constant 0xee completely disappears.


That said, this seems like an improvement; spilling is clearly not expected here.

@efriedma-quic efriedma-quic merged commit 26f97ee into llvm:main May 12, 2025
11 checks passed
Copy link

@yasuna-oribe Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@llvm-ci
Copy link
Collaborator

llvm-ci commented May 13, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/18873

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AArch64/reserveXreg.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll -mtriple=aarch64-unknown-linux-gnu | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll # RUN: at line 6
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=aarch64-unknown-linux-gnu
'+reserve-x8' is not a recognized feature for this target (ignoring feature)
'+reserve-x8' is not a recognized feature for this target (ignoring feature)
'+reserve-x16' is not a recognized feature for this target (ignoring feature)
'+reserve-x16' is not a recognized feature for this target (ignoring feature)
'+reserve-x17' is not a recognized feature for this target (ignoring feature)
'+reserve-x17' is not a recognized feature for this target (ignoring feature)

# After Register Coalescer
********** INTERVALS **********


yasuna-oribe added a commit to yasuna-oribe/llvm-project that referenced this pull request May 13, 2025
Fixes buildbot failure https://lab.llvm.org/buildbot/#/builders/16/builds/18873
originating from llvm#138448. Normally ignored silently but fails on higher
error levels.
yasuna-oribe added a commit to yasuna-oribe/llvm-project that referenced this pull request May 13, 2025
Fixes buildbot failure https://lab.llvm.org/buildbot/#/builders/16/builds/18873
originating from llvm#138448. Normally ignored silently but fails on higher
error levels.

Buildbot errors:
```
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll -mtriple=aarch64-unknown-linux-gnu | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll # RUN: at line 6
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=aarch64-unknown-linux-gnu
'+reserve-x8' is not a recognized feature for this target (ignoring feature)
'+reserve-x8' is not a recognized feature for this target (ignoring feature)
'+reserve-x16' is not a recognized feature for this target (ignoring feature)
'+reserve-x16' is not a recognized feature for this target (ignoring feature)
'+reserve-x17' is not a recognized feature for this target (ignoring feature)
'+reserve-x17' is not a recognized feature for this target (ignoring feature)
```
davemgreen pushed a commit that referenced this pull request May 13, 2025
Fixes buildbot failure
https://lab.llvm.org/buildbot/#/builders/16/builds/18873 originating
from #138448. Normally ignored silently but fails on higher error
levels.

Buildbot errors:
```
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll -mtriple=aarch64-unknown-linux-gnu | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll # RUN: at line 6
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=aarch64-unknown-linux-gnu
'+reserve-x8' is not a recognized feature for this target (ignoring feature)
'+reserve-x8' is not a recognized feature for this target (ignoring feature)
'+reserve-x16' is not a recognized feature for this target (ignoring feature)
'+reserve-x16' is not a recognized feature for this target (ignoring feature)
'+reserve-x17' is not a recognized feature for this target (ignoring feature)
'+reserve-x17' is not a recognized feature for this target (ignoring feature)
```
@nikic
Copy link
Contributor

nikic commented May 13, 2025

This causes a 0.2% compile-time regression for O0-g builds (https://llvm-compile-time-tracker.com/compare.php?from=64082912a500d004c53ad1b3425098b495572663&to=26f97ee9aa413db240c397f96ddd5b0553a57d30&stat=instructions:u). Is it possible to query the reserved registers in a more efficient way?

@mikaelholmen
Copy link
Collaborator

Hi @yasuna-oribe

With this patch, the reserveXreg.ll restcase fails if we run verifiers, e.g. if adding -verify-machineinstrs to the run line (or if you compiled llc with EXPENSIVE_CHECKS).
It then fails like

# After Register Coalescer
********** INTERVALS **********
W1 [16r,16d:1)[64r,64d:0) 0@64r 1@16r
W1_HI [16r,16d:1)[64r,64d:0) 0@64r 1@16r
RegMasks:
********** MACHINEINSTRS **********
# Machine code for function t1: NoPHIs, TracksLiveness, TiedOpsRewritten

0B	bb.0 (%ir-block.0):
16B	  $w1 = MOVi32imm 256
64B	  INLINEASM &"" [sideeffect] [attdialect], $0:[regdef], implicit-def $x1, $1:[reguse], $x1
80B	  RET_ReallyLR

# End machine code for function t1.

*** Bad machine code: Defining instruction does not modify register ***
- function:    t1
- basic block: %bb.0  (0x55d733a4b4b0) [0B;96B)
- instruction: 16B	$w1 = MOVi32imm 256
- liverange:   [16r,16d:1)[64r,64d:0) 0@64r 1@16r
- regunit:     W1_HI
- ValNo:       1 (def 16r)
LLVM ERROR: Found 1 machine code errors.

@efriedma-quic
Copy link
Collaborator

There's some discussion of the machine-verifier failure on #139696 .

Passing -verify-machineinstrs=false should disable the machine-verifier checks on the expensive checks buildbot.

@yasuna-oribe
Copy link
Contributor Author

yasuna-oribe commented May 15, 2025

@mikaelholmen, I'm aware of that, honestly I have no idea what that means. I've tested the following minimal reproducible example after reverting the patch in my local tree and it still gives me that error with -DLLVM_ENABLE_EXPENSIVE_CHECKS. I really don't see why a simple mov w1,#256 would fail verification of any kind, can anyone more well-versed in LLVM explain to me what's wrong?

define preserve_mostcc void @t1() "target-features"="+reserve-x1" {
  call i64 asm sideeffect "", "={x1},{x1}"(i64 256)
  ret void
}

The error also happens if I compile the following with -ffixed-x14 -S -emit-llvm and llc it:

register long a asm("x14");
void f(void) { a = 256; }

Mind you, all these errors are happening even without my changes to LLVM. If the failing test case is causing problems for others, please revert it. Or alternatively, as @efriedma-quic suggested, would passing -verify-machineinstrs=false be an adequate temporary fix?

@nikic Take a look at the above discussion, if you want I'll move getReservedRegs and getUserReservedRegs to the top of the function. EDIT: (wow, that's much more than I thought.) I'll move getReservedRegs and getUserReservedRegs to the top of the function in the PR that fixes all this, it'll probably reduce compile times back to normal.

yasuna-oribe added a commit to yasuna-oribe/llvm-project that referenced this pull request May 15, 2025
…regression

[AArch64] Disable machine-verifier for failing test, fix perf regression

Disables machine-verifier on failing test for now for the test to pass on
expensive-checks. Also fixes performance regression mentioned in llvm#138448
(https://llvm-compile-time-tracker.com/compare.php?from=64082912a500d004c53ad1b3425098b495572663&to=26f97ee9aa413db240c397f96ddd5b0553a57d30&stat=instructions:u)
by computing reserved registers upfront and not every loop iteration.
@yasuna-oribe
Copy link
Contributor Author

I've opened #140005 that disables machine-verifier for the test (apparently there are quite some tests that do this, so I assume it's OK to do) and hopefully fixes the performance regression by computing the BitVectors upfront. I've tested it locally with expensive-checks so it shouldn't cause issues this time.

efriedma-quic pushed a commit that referenced this pull request May 15, 2025
…ion (#140005)

Disables machine-verifier on failing test for now for the test to pass
on expensive-checks. Also fixes performance regression
(https://llvm-compile-time-tracker.com/compare.php?from=64082912a500d004c53ad1b3425098b495572663&to=26f97ee9aa413db240c397f96ddd5b0553a57d30&stat=instructions:u)
mentioned in #138448 by not computing reserved registers every loop
iteration.
@efriedma-quic
Copy link
Collaborator

For why the machine verifier is complaining: there's a rule that every register has to be defined before it is used. Internally, "X1" and "W1" are different registers, so defining "W1" doesn't define "X1". (This is how it works for all registers with subregisters; it's not specific to what kind of register is involved.)

The usual way to represent this is adding an implicit-def operand to the instruction; this reflects the general aarch64 rule that defining "W1" also defines "X1".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

-ffixed-reg option not respected in function prolog/epilog
9 participants