Skip to content

[Aarch64] [ISel] Don't save vaargs registers if vaargs are unused #126780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 5, 2025

Conversation

pskrgag
Copy link
Contributor

@pskrgag pskrgag commented Feb 11, 2025

If vaargs are not used there is no need to save them. LLVM already implements such optimization for x86, as well as gcc [1].

Some ABI tests are kept almost as-is, except for stack offsets, by just adding llvm.va_start. Only laapcs_vararg_frame.ll test was rewritten to match new behavior.

[1] https://godbolt.org/z/GWWKr8xMd

@llvmbot
Copy link
Member

llvmbot commented Feb 11, 2025

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: Pavel Skripkin (pskrgag)

Changes

If vaargs are not used there is no need to save them. LLVM already implements such optimization for x86, as well as gcc [1].

Some ABI tests are kept almost as-is, except for stack offsets, by just adding llvm.va_start. Only laapcs_vararg_frame.ll test was rewritten to match new behavior.

[1] https://godbolt.org/z/GWWKr8xMd


Full diff: https://github.com/llvm/llvm-project/pull/126780.diff

7 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/aapcs_vararg_frame.ll (+4-14)
  • (modified) llvm/test/CodeGen/AArch64/alloca.ll (+7-6)
  • (modified) llvm/test/CodeGen/AArch64/arm64ec-hybrid-patchable.ll (-4)
  • (modified) llvm/test/CodeGen/AArch64/darwinpcs-tail.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/vararg-tallcall.ll (+4-2)
  • (modified) llvm/test/CodeGen/AArch64/win64_vararg2.ll (+4-7)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 34464d317beaf..5103374e14d80 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -8189,7 +8189,7 @@ SDValue AArch64TargetLowering::LowerFormalArguments(
   }
 
   // varargs
-  if (isVarArg) {
+  if (isVarArg && DAG.getMachineFunction().getFrameInfo().hasVAStart()) {
     if (!Subtarget->isTargetDarwin() || IsWin64) {
       // The AAPCS variadic function ABI is identical to the non-variadic
       // one. As a result there may be more arguments in registers and we should
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/aapcs_vararg_frame.ll b/llvm/test/CodeGen/AArch64/GlobalISel/aapcs_vararg_frame.ll
index 7892e892a2412..ddc4347950b27 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/aapcs_vararg_frame.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/aapcs_vararg_frame.ll
@@ -3,22 +3,12 @@
 ; RUN: llc < %s --global-isel=1 -mtriple=aarch64-linux-gnu -mattr=+fp-armv8 | FileCheck %s --check-prefix=GISEL
 
 define void @va(i32 %count, half %f, ...) nounwind {
-; CHECK-LABEL: va:
-; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    sub sp, sp, #176
-; CHECK-NEXT:    stp x2, x3, [sp, #128]
-; CHECK-NEXT:    str x1, [sp, #120]
-; CHECK-NEXT:    stp x4, x5, [sp, #144]
-; CHECK-NEXT:    stp x6, x7, [sp, #160]
-; CHECK-NEXT:    stp q1, q2, [sp]
-; CHECK-NEXT:    stp q3, q4, [sp, #32]
-; CHECK-NEXT:    stp q5, q6, [sp, #64]
-; CHECK-NEXT:    str q7, [sp, #96]
-; CHECK-NEXT:    add sp, sp, #176
+; CHECK-LABEL: va:                                     // @va
+; CHECK:       // %bb.0:                               // %entry
 ; CHECK-NEXT:    ret
 ;
-; GISEL-LABEL: va:
-; GISEL:       // %bb.0: // %entry
+; GISEL-LABEL: va:                                     // @va
+; GISEL:       // %bb.0:                               // %entry
 ; GISEL-NEXT:    sub sp, sp, #176
 ; GISEL-NEXT:    stp x1, x2, [sp, #120]
 ; GISEL-NEXT:    stp x3, x4, [sp, #136]
diff --git a/llvm/test/CodeGen/AArch64/alloca.ll b/llvm/test/CodeGen/AArch64/alloca.ll
index ca3a500d79f32..76b50d62ab600 100644
--- a/llvm/test/CodeGen/AArch64/alloca.ll
+++ b/llvm/test/CodeGen/AArch64/alloca.ll
@@ -82,21 +82,22 @@ define void @test_variadic_alloca(i64 %n, ...) {
 ; [...]
 ; CHECK-DAG: stp     q0, q1, [x29, #-192]
 
-; CHECK-DAG: stp     x5, x6, [x29, #-24]
+; CHECK-DAG: stp     x5, x6, [x29, #-32]
 ; [...]
-; CHECK-DAG: stp     x1, x2, [x29, #-56]
+; CHECK-DAG: stp     x1, x2, [x29, #-64]
 
 ; CHECK-NOFP-ARM64: stp     x29, x30, [sp, #-16]!
 ; CHECK-NOFP-ARM64: mov     x29, sp
 ; CHECK-NOFP-ARM64: sub     sp, sp, #64
-; CHECK-NOFP-ARM64-DAG: stp     x5, x6, [x29, #-24]
+; CHECK-NOFP-ARM64-DAG: stp     x5, x6, [x29, #-32]
 ; [...]
-; CHECK-NOFP-ARM64-DAG: stp     x3, x4, [x29, #-40]
+; CHECK-NOFP-ARM64-DAG: stp     x3, x4, [x29, #-48]
 ; [...]
-; CHECK-NOFP-ARM64-DAG: stp     x1, x2, [x29, #-56]
+; CHECK-NOFP-ARM64-DAG: stp     x1, x2, [x29, #-64]
 ; [...]
 ; CHECK-NOFP-ARM64: mov     x8, sp
-
+  %valist = alloca i8
+  call void @llvm.va_start(ptr %valist)
   %addr = alloca i8, i64 %n
 
   call void @use_addr(ptr %addr)
diff --git a/llvm/test/CodeGen/AArch64/arm64ec-hybrid-patchable.ll b/llvm/test/CodeGen/AArch64/arm64ec-hybrid-patchable.ll
index 1ed6a273338ab..9e2633f3f63ea 100644
--- a/llvm/test/CodeGen/AArch64/arm64ec-hybrid-patchable.ll
+++ b/llvm/test/CodeGen/AArch64/arm64ec-hybrid-patchable.ll
@@ -24,10 +24,6 @@ define void @has_varargs(...) hybrid_patchable nounwind {
 ; CHECK-NEXT:      .p2align 2
 ; CHECK-NEXT:  "#has_varargs$hp_target":               // @"#has_varargs$hp_target"
 ; CHECK-NEXT:  // %bb.0:
-; CHECK-NEXT:      sub     sp, sp, #32
-; CHECK-NEXT:      stp     x0, x1, [x4, #-32]
-; CHECK-NEXT:      stp     x2, x3, [x4, #-16]
-; CHECK-NEXT:      add     sp, sp, #32
 ; CHECK-NEXT:      ret
   ret void
 }
diff --git a/llvm/test/CodeGen/AArch64/darwinpcs-tail.ll b/llvm/test/CodeGen/AArch64/darwinpcs-tail.ll
index 5d3c755d0d73d..da176894c48a9 100644
--- a/llvm/test/CodeGen/AArch64/darwinpcs-tail.ll
+++ b/llvm/test/CodeGen/AArch64/darwinpcs-tail.ll
@@ -8,8 +8,8 @@
 ; CHECK-LABEL: _tailTest:
 ; CHECK:       b __ZN1C3addEPKcz
 ; CHECK-LABEL: __ZThn8_N1C1fEiiiiiiiiiz:
-; CHECK:       ldr     w9, [sp, #4]
-; CHECK:       str     w9, [sp, #4]
+; CHECK:       ldr     w8, [sp, #4]
+; CHECK:       str     w8, [sp, #4]
 ; CHECK:       b __ZN1C1fEiiiiiiiiiz
 
 %class.C = type { %class.A.base, [4 x i8], %class.B.base, [4 x i8] }
diff --git a/llvm/test/CodeGen/AArch64/vararg-tallcall.ll b/llvm/test/CodeGen/AArch64/vararg-tallcall.ll
index 812837639196e..6abe33ac6816e 100644
--- a/llvm/test/CodeGen/AArch64/vararg-tallcall.ll
+++ b/llvm/test/CodeGen/AArch64/vararg-tallcall.ll
@@ -14,6 +14,8 @@ $"??_9B@@$BA@AA" = comdat any
 ; Function Attrs: noinline optnone
 define linkonce_odr void @"??_9B@@$BA@AA"(ptr %this, ...) #1 comdat align 2  {
 entry:
+  %valist = alloca i8
+  call void @llvm.va_start.p0(ptr %valist)
   %this.addr = alloca ptr, align 8
   store ptr %this, ptr %this.addr, align 8
   %this1 = load ptr, ptr %this.addr, align 8
@@ -37,6 +39,6 @@ attributes #1 = { noinline optnone "thunk" }
 ; CHECK-EC: ldr     x9, [x0]
 ; CHECK-EC: ldr     x11, [x9]
 ; CHECK-EC: mov     v0.16b, v7.16b
-; CHECK-EC: add     x4, sp, #64
-; CHECK-EC: add     sp, sp, #64
+; CHECK-EC: add     x4, sp, #80
+; CHECK-EC: add     sp, sp, #80
 ; CHECK-EC: br      x11
diff --git a/llvm/test/CodeGen/AArch64/win64_vararg2.ll b/llvm/test/CodeGen/AArch64/win64_vararg2.ll
index dff49148fb772..27d7d4786e4df 100644
--- a/llvm/test/CodeGen/AArch64/win64_vararg2.ll
+++ b/llvm/test/CodeGen/AArch64/win64_vararg2.ll
@@ -7,17 +7,14 @@ define i1 @va_func(i32 %a, i8 %b, i8 %c, ...) {
 ; CHECK-LABEL: va_func:
 ; CHECK:       .seh_proc va_func
 ; CHECK-NEXT:  // %bb.0:
-; CHECK-NEXT:    sub sp, sp, #80
-; CHECK-NEXT:    .seh_stackalloc 80
+; CHECK-NEXT:    sub sp, sp, #32
+; CHECK-NEXT:    .seh_stackalloc 32
 ; CHECK-NEXT:    str x19, [sp, #16] // 8-byte Folded Spill
 ; CHECK-NEXT:    .seh_save_reg x19, 16
 ; CHECK-NEXT:    str x30, [sp, #24] // 8-byte Folded Spill
 ; CHECK-NEXT:    .seh_save_reg x30, 24
 ; CHECK-NEXT:    .seh_endprologue
 ; CHECK-NEXT:    mov w19, w0
-; CHECK-NEXT:    stp x3, x4, [sp, #40]
-; CHECK-NEXT:    stp x5, x6, [sp, #56]
-; CHECK-NEXT:    str x7, [sp, #72]
 ; CHECK-NEXT:    str w0, [sp, #12]
 ; CHECK-NEXT:    strb w1, [sp, #11]
 ; CHECK-NEXT:    strb w2, [sp, #10]
@@ -29,8 +26,8 @@ define i1 @va_func(i32 %a, i8 %b, i8 %c, ...) {
 ; CHECK-NEXT:    .seh_save_reg x30, 24
 ; CHECK-NEXT:    ldr x19, [sp, #16] // 8-byte Folded Reload
 ; CHECK-NEXT:    .seh_save_reg x19, 16
-; CHECK-NEXT:    add sp, sp, #80
-; CHECK-NEXT:    .seh_stackalloc 80
+; CHECK-NEXT:    add sp, sp, #32
+; CHECK-NEXT:    .seh_stackalloc 32
 ; CHECK-NEXT:    .seh_endepilogue
 ; CHECK-NEXT:    ret
 ; CHECK-NEXT:    .seh_endfunclet

@pskrgag pskrgag requested review from cjacek and mstorsjo February 11, 2025 19:49
@pskrgag
Copy link
Contributor Author

pskrgag commented Feb 21, 2025

gentle ping

@@ -7,17 +7,14 @@ define i1 @va_func(i32 %a, i8 %b, i8 %c, ...) {
; CHECK-LABEL: va_func:
; CHECK: .seh_proc va_func
; CHECK-NEXT: // %bb.0:
; CHECK-NEXT: sub sp, sp, #80
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another test which is trying to check vararg behaviour, so should have a va_start call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So looks like GISel does not support vastart for win calling convention

bool AArch64InstructionSelector::selectVaStartAAPCS(
    MachineInstr &I, MachineFunction &MF, MachineRegisterInfo &MRI) const {

  if (STI.isCallingConvWin64(MF.getFunction().getCallingConv(),
                             MF.getFunction().isVarArg()))
    return false;

@@ -8189,7 +8189,7 @@ SDValue AArch64TargetLowering::LowerFormalArguments(
}

// varargs
if (isVarArg) {
if (isVarArg && DAG.getMachineFunction().getFrameInfo().hasVAStart()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this path is already handled in globalisel that should also be updated to match

Copy link
Contributor Author

@pskrgag pskrgag Feb 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I don't think I get what you mean, sorry.

IIUC, GlobalIsel is quite new approach for generating machine code, which is supposed to be a successor for SelectionDAG. It can be turned on using --global-isel=1.

So, I didn't find any hasVAStart() references in GISel code and tried simple examples to see if such optimization exists.

After some tests I've observed that llc actually crashes on vaargs, when GISel is turned on. Here is godbolt link for x86 https://godbolt.org/z/q66feEb65. And locally (after addressing @ostannard review) I got similar failure on CodeGen/AArch64/win64_vararg2.ll test.

+ /home/paskripkin/Documents/git/llvm-project/build/bin/llc -global-isel -mtriple=aarch64-pc-win32
+ /home/paskripkin/Documents/git/llvm-project/build/bin/FileCheck /home/paskripkin/Documents/git/llvm-project/llvm/test/CodeGen/AArch64/win64_vararg2.ll --check-prefix=GISEL
LLVM ERROR: cannot select: G_VASTART %17:gpr(p0) :: (store (s64) into %ir.valist) (in function: va_func)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/paskripkin/Documents/git/llvm-project/build/bin/llc -global-isel -mtriple=aarch64-pc-win32
1.	Running pass 'Function Pass Manager' on module '<stdin>'.
2.	Running pass 'InstructionSelect' on function '@va_func'

Is it known? If not, I will try to look deeper

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as it works with -global-isel-abort=0, you're fine; you don't need to implement G_VASTART on any target where it isn't already implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Added -global-isel-abort=0 to failing test.

@pskrgag
Copy link
Contributor Author

pskrgag commented Mar 3, 2025

gentle ping

Copy link
Collaborator

@ostannard ostannard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pskrgag
Copy link
Contributor Author

pskrgag commented Mar 5, 2025

thanks for the review!

should I wait for second approval, or I can merge it?

@ostannard
Copy link
Collaborator

We only require one approval, and I think you've addressed the other reviewers comments, so you can merge this now.

@pskrgag pskrgag merged commit e122483 into llvm:main Mar 5, 2025
11 checks passed
@zmodem
Copy link
Collaborator

zmodem commented Mar 14, 2025

We're hitting a large amount of test failures on Windows arm64 after this change.

+cc @rnk for windows and calling convention expertise. I don't really understand what "Don't save vaargs registers if vaargs are unused" means, but maybe Windows is different somehow?

For example, in the changed file I see several instances of if (IsWin64) <do something based on isVarArg>, and this PR is only changing one use of isVarArg, so maybe things were getting out of sync?

I'll prepare a revert of this for now.

zmodem added a commit that referenced this pull request Mar 14, 2025
…used (#126780)"

This caused miscompiles on windows arm64, see comment on the PR.

This reverts commit e122483.
@DavidSpickett
Copy link
Collaborator

Linaro's 2 stage Windows arm64 build also started failing on a build including this: https://lab.llvm.org/buildbot/#/builders/124/builds/490

(I made a mistake analysing the results, that's why we're delayed so much)

We are investigating on our side.

(FYI @luporl)

@pskrgag
Copy link
Contributor Author

pskrgag commented Mar 14, 2025

Sorry for breakage.

I am not Windows ABI expert, so I am fine with revert, since it's just an optimization for very odd corner case.

@rnk
Copy link
Collaborator

rnk commented Mar 14, 2025

I actually don't know the win64 arm vararg convention, but historically Microsoft has taken an approach much more similar to Apple, where variadic arguments all get pass in memory on the stack, for simplicity, if that helps explain the failures.

@mstorsjo
Copy link
Member

I actually don't know the win64 arm vararg convention, but historically Microsoft has taken an approach much more similar to Apple, where variadic arguments all get pass in memory on the stack, for simplicity, if that helps explain the failures.

On Windows/arm64, normally floats are passed in vector registers and ints in GPRs, but for vararg functions, both floats and ints are passed in GPRs. Normally, up to 8 arguments are passed in registers. For vararg functions, the prologue dumps all non-fixed argument registers on the stack, so that it lines up with the following arguments on the stack. (E.g. for a function with 3 fixed parameters, passed in x0, x1, x2, the function dumps x3, x4, ... x7 on the stack, so that varargs can iterate over them plus any following arguments on the stack in one sequence.)

I presume the purpose of this patch is to skip the couple of instructions for dumping these registers on the stack, if nobody actually is going to use the varargs (no va_start etc). Sounds like a reasonable idea (and a quick test with one case seems like it roughly does what it's supposed to), but apparently it does break something somewhere.

@pskrgag
Copy link
Contributor Author

pskrgag commented Mar 14, 2025

Is there any way to reproduce these results locally? At least to understand what got broken and try to fix it

(I guess, i will need to find windows vm...)

@pskrgag
Copy link
Contributor Author

pskrgag commented Mar 15, 2025

So i decided to simply drop win64 from the patch. See #131459

@zmodem
Copy link
Collaborator

zmodem commented Mar 17, 2025

Is there any way to reproduce these results locally? At least to understand what got broken and try to fix it

I don't have a great setup for debugging win/arm64 right now. Maybe the Linaro folks have a better idea of what's happening.

I think I could share the binary that's failing though, in case that's useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants