-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Aarch64] [ISel] Don't save vaargs registers if vaargs are unused #126780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-globalisel Author: Pavel Skripkin (pskrgag) ChangesIf vaargs are not used there is no need to save them. LLVM already implements such optimization for x86, as well as gcc [1]. Some ABI tests are kept almost as-is, except for stack offsets, by just adding llvm.va_start. Only laapcs_vararg_frame.ll test was rewritten to match new behavior. [1] https://godbolt.org/z/GWWKr8xMd Full diff: https://github.com/llvm/llvm-project/pull/126780.diff 7 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 34464d317beaf..5103374e14d80 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -8189,7 +8189,7 @@ SDValue AArch64TargetLowering::LowerFormalArguments(
}
// varargs
- if (isVarArg) {
+ if (isVarArg && DAG.getMachineFunction().getFrameInfo().hasVAStart()) {
if (!Subtarget->isTargetDarwin() || IsWin64) {
// The AAPCS variadic function ABI is identical to the non-variadic
// one. As a result there may be more arguments in registers and we should
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/aapcs_vararg_frame.ll b/llvm/test/CodeGen/AArch64/GlobalISel/aapcs_vararg_frame.ll
index 7892e892a2412..ddc4347950b27 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/aapcs_vararg_frame.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/aapcs_vararg_frame.ll
@@ -3,22 +3,12 @@
; RUN: llc < %s --global-isel=1 -mtriple=aarch64-linux-gnu -mattr=+fp-armv8 | FileCheck %s --check-prefix=GISEL
define void @va(i32 %count, half %f, ...) nounwind {
-; CHECK-LABEL: va:
-; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: sub sp, sp, #176
-; CHECK-NEXT: stp x2, x3, [sp, #128]
-; CHECK-NEXT: str x1, [sp, #120]
-; CHECK-NEXT: stp x4, x5, [sp, #144]
-; CHECK-NEXT: stp x6, x7, [sp, #160]
-; CHECK-NEXT: stp q1, q2, [sp]
-; CHECK-NEXT: stp q3, q4, [sp, #32]
-; CHECK-NEXT: stp q5, q6, [sp, #64]
-; CHECK-NEXT: str q7, [sp, #96]
-; CHECK-NEXT: add sp, sp, #176
+; CHECK-LABEL: va: // @va
+; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ret
;
-; GISEL-LABEL: va:
-; GISEL: // %bb.0: // %entry
+; GISEL-LABEL: va: // @va
+; GISEL: // %bb.0: // %entry
; GISEL-NEXT: sub sp, sp, #176
; GISEL-NEXT: stp x1, x2, [sp, #120]
; GISEL-NEXT: stp x3, x4, [sp, #136]
diff --git a/llvm/test/CodeGen/AArch64/alloca.ll b/llvm/test/CodeGen/AArch64/alloca.ll
index ca3a500d79f32..76b50d62ab600 100644
--- a/llvm/test/CodeGen/AArch64/alloca.ll
+++ b/llvm/test/CodeGen/AArch64/alloca.ll
@@ -82,21 +82,22 @@ define void @test_variadic_alloca(i64 %n, ...) {
; [...]
; CHECK-DAG: stp q0, q1, [x29, #-192]
-; CHECK-DAG: stp x5, x6, [x29, #-24]
+; CHECK-DAG: stp x5, x6, [x29, #-32]
; [...]
-; CHECK-DAG: stp x1, x2, [x29, #-56]
+; CHECK-DAG: stp x1, x2, [x29, #-64]
; CHECK-NOFP-ARM64: stp x29, x30, [sp, #-16]!
; CHECK-NOFP-ARM64: mov x29, sp
; CHECK-NOFP-ARM64: sub sp, sp, #64
-; CHECK-NOFP-ARM64-DAG: stp x5, x6, [x29, #-24]
+; CHECK-NOFP-ARM64-DAG: stp x5, x6, [x29, #-32]
; [...]
-; CHECK-NOFP-ARM64-DAG: stp x3, x4, [x29, #-40]
+; CHECK-NOFP-ARM64-DAG: stp x3, x4, [x29, #-48]
; [...]
-; CHECK-NOFP-ARM64-DAG: stp x1, x2, [x29, #-56]
+; CHECK-NOFP-ARM64-DAG: stp x1, x2, [x29, #-64]
; [...]
; CHECK-NOFP-ARM64: mov x8, sp
-
+ %valist = alloca i8
+ call void @llvm.va_start(ptr %valist)
%addr = alloca i8, i64 %n
call void @use_addr(ptr %addr)
diff --git a/llvm/test/CodeGen/AArch64/arm64ec-hybrid-patchable.ll b/llvm/test/CodeGen/AArch64/arm64ec-hybrid-patchable.ll
index 1ed6a273338ab..9e2633f3f63ea 100644
--- a/llvm/test/CodeGen/AArch64/arm64ec-hybrid-patchable.ll
+++ b/llvm/test/CodeGen/AArch64/arm64ec-hybrid-patchable.ll
@@ -24,10 +24,6 @@ define void @has_varargs(...) hybrid_patchable nounwind {
; CHECK-NEXT: .p2align 2
; CHECK-NEXT: "#has_varargs$hp_target": // @"#has_varargs$hp_target"
; CHECK-NEXT: // %bb.0:
-; CHECK-NEXT: sub sp, sp, #32
-; CHECK-NEXT: stp x0, x1, [x4, #-32]
-; CHECK-NEXT: stp x2, x3, [x4, #-16]
-; CHECK-NEXT: add sp, sp, #32
; CHECK-NEXT: ret
ret void
}
diff --git a/llvm/test/CodeGen/AArch64/darwinpcs-tail.ll b/llvm/test/CodeGen/AArch64/darwinpcs-tail.ll
index 5d3c755d0d73d..da176894c48a9 100644
--- a/llvm/test/CodeGen/AArch64/darwinpcs-tail.ll
+++ b/llvm/test/CodeGen/AArch64/darwinpcs-tail.ll
@@ -8,8 +8,8 @@
; CHECK-LABEL: _tailTest:
; CHECK: b __ZN1C3addEPKcz
; CHECK-LABEL: __ZThn8_N1C1fEiiiiiiiiiz:
-; CHECK: ldr w9, [sp, #4]
-; CHECK: str w9, [sp, #4]
+; CHECK: ldr w8, [sp, #4]
+; CHECK: str w8, [sp, #4]
; CHECK: b __ZN1C1fEiiiiiiiiiz
%class.C = type { %class.A.base, [4 x i8], %class.B.base, [4 x i8] }
diff --git a/llvm/test/CodeGen/AArch64/vararg-tallcall.ll b/llvm/test/CodeGen/AArch64/vararg-tallcall.ll
index 812837639196e..6abe33ac6816e 100644
--- a/llvm/test/CodeGen/AArch64/vararg-tallcall.ll
+++ b/llvm/test/CodeGen/AArch64/vararg-tallcall.ll
@@ -14,6 +14,8 @@ $"??_9B@@$BA@AA" = comdat any
; Function Attrs: noinline optnone
define linkonce_odr void @"??_9B@@$BA@AA"(ptr %this, ...) #1 comdat align 2 {
entry:
+ %valist = alloca i8
+ call void @llvm.va_start.p0(ptr %valist)
%this.addr = alloca ptr, align 8
store ptr %this, ptr %this.addr, align 8
%this1 = load ptr, ptr %this.addr, align 8
@@ -37,6 +39,6 @@ attributes #1 = { noinline optnone "thunk" }
; CHECK-EC: ldr x9, [x0]
; CHECK-EC: ldr x11, [x9]
; CHECK-EC: mov v0.16b, v7.16b
-; CHECK-EC: add x4, sp, #64
-; CHECK-EC: add sp, sp, #64
+; CHECK-EC: add x4, sp, #80
+; CHECK-EC: add sp, sp, #80
; CHECK-EC: br x11
diff --git a/llvm/test/CodeGen/AArch64/win64_vararg2.ll b/llvm/test/CodeGen/AArch64/win64_vararg2.ll
index dff49148fb772..27d7d4786e4df 100644
--- a/llvm/test/CodeGen/AArch64/win64_vararg2.ll
+++ b/llvm/test/CodeGen/AArch64/win64_vararg2.ll
@@ -7,17 +7,14 @@ define i1 @va_func(i32 %a, i8 %b, i8 %c, ...) {
; CHECK-LABEL: va_func:
; CHECK: .seh_proc va_func
; CHECK-NEXT: // %bb.0:
-; CHECK-NEXT: sub sp, sp, #80
-; CHECK-NEXT: .seh_stackalloc 80
+; CHECK-NEXT: sub sp, sp, #32
+; CHECK-NEXT: .seh_stackalloc 32
; CHECK-NEXT: str x19, [sp, #16] // 8-byte Folded Spill
; CHECK-NEXT: .seh_save_reg x19, 16
; CHECK-NEXT: str x30, [sp, #24] // 8-byte Folded Spill
; CHECK-NEXT: .seh_save_reg x30, 24
; CHECK-NEXT: .seh_endprologue
; CHECK-NEXT: mov w19, w0
-; CHECK-NEXT: stp x3, x4, [sp, #40]
-; CHECK-NEXT: stp x5, x6, [sp, #56]
-; CHECK-NEXT: str x7, [sp, #72]
; CHECK-NEXT: str w0, [sp, #12]
; CHECK-NEXT: strb w1, [sp, #11]
; CHECK-NEXT: strb w2, [sp, #10]
@@ -29,8 +26,8 @@ define i1 @va_func(i32 %a, i8 %b, i8 %c, ...) {
; CHECK-NEXT: .seh_save_reg x30, 24
; CHECK-NEXT: ldr x19, [sp, #16] // 8-byte Folded Reload
; CHECK-NEXT: .seh_save_reg x19, 16
-; CHECK-NEXT: add sp, sp, #80
-; CHECK-NEXT: .seh_stackalloc 80
+; CHECK-NEXT: add sp, sp, #32
+; CHECK-NEXT: .seh_stackalloc 32
; CHECK-NEXT: .seh_endepilogue
; CHECK-NEXT: ret
; CHECK-NEXT: .seh_endfunclet
|
gentle ping |
@@ -7,17 +7,14 @@ define i1 @va_func(i32 %a, i8 %b, i8 %c, ...) { | |||
; CHECK-LABEL: va_func: | |||
; CHECK: .seh_proc va_func | |||
; CHECK-NEXT: // %bb.0: | |||
; CHECK-NEXT: sub sp, sp, #80 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another test which is trying to check vararg behaviour, so should have a va_start
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So looks like GISel does not support vastart for win calling convention
bool AArch64InstructionSelector::selectVaStartAAPCS(
MachineInstr &I, MachineFunction &MF, MachineRegisterInfo &MRI) const {
if (STI.isCallingConvWin64(MF.getFunction().getCallingConv(),
MF.getFunction().isVarArg()))
return false;
@@ -8189,7 +8189,7 @@ SDValue AArch64TargetLowering::LowerFormalArguments( | |||
} | |||
|
|||
// varargs | |||
if (isVarArg) { | |||
if (isVarArg && DAG.getMachineFunction().getFrameInfo().hasVAStart()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this path is already handled in globalisel that should also be updated to match
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I don't think I get what you mean, sorry.
IIUC, GlobalIsel is quite new approach for generating machine code, which is supposed to be a successor for SelectionDAG. It can be turned on using --global-isel=1
.
So, I didn't find any hasVAStart()
references in GISel code and tried simple examples to see if such optimization exists.
After some tests I've observed that llc actually crashes on vaargs, when GISel is turned on. Here is godbolt link for x86 https://godbolt.org/z/q66feEb65. And locally (after addressing @ostannard review) I got similar failure on CodeGen/AArch64/win64_vararg2.ll
test.
+ /home/paskripkin/Documents/git/llvm-project/build/bin/llc -global-isel -mtriple=aarch64-pc-win32
+ /home/paskripkin/Documents/git/llvm-project/build/bin/FileCheck /home/paskripkin/Documents/git/llvm-project/llvm/test/CodeGen/AArch64/win64_vararg2.ll --check-prefix=GISEL
LLVM ERROR: cannot select: G_VASTART %17:gpr(p0) :: (store (s64) into %ir.valist) (in function: va_func)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: /home/paskripkin/Documents/git/llvm-project/build/bin/llc -global-isel -mtriple=aarch64-pc-win32
1. Running pass 'Function Pass Manager' on module '<stdin>'.
2. Running pass 'InstructionSelect' on function '@va_func'
Is it known? If not, I will try to look deeper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as it works with -global-isel-abort=0
, you're fine; you don't need to implement G_VASTART on any target where it isn't already implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Added -global-isel-abort=0
to failing test.
gentle ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
thanks for the review! should I wait for second approval, or I can merge it? |
We only require one approval, and I think you've addressed the other reviewers comments, so you can merge this now. |
We're hitting a large amount of test failures on Windows arm64 after this change. +cc @rnk for windows and calling convention expertise. I don't really understand what "Don't save vaargs registers if vaargs are unused" means, but maybe Windows is different somehow? For example, in the changed file I see several instances of I'll prepare a revert of this for now. |
Linaro's 2 stage Windows arm64 build also started failing on a build including this: https://lab.llvm.org/buildbot/#/builders/124/builds/490 (I made a mistake analysing the results, that's why we're delayed so much) We are investigating on our side. (FYI @luporl) |
Sorry for breakage. I am not Windows ABI expert, so I am fine with revert, since it's just an optimization for very odd corner case. |
I actually don't know the win64 arm vararg convention, but historically Microsoft has taken an approach much more similar to Apple, where variadic arguments all get pass in memory on the stack, for simplicity, if that helps explain the failures. |
On Windows/arm64, normally floats are passed in vector registers and ints in GPRs, but for vararg functions, both floats and ints are passed in GPRs. Normally, up to 8 arguments are passed in registers. For vararg functions, the prologue dumps all non-fixed argument registers on the stack, so that it lines up with the following arguments on the stack. (E.g. for a function with 3 fixed parameters, passed in x0, x1, x2, the function dumps x3, x4, ... x7 on the stack, so that varargs can iterate over them plus any following arguments on the stack in one sequence.) I presume the purpose of this patch is to skip the couple of instructions for dumping these registers on the stack, if nobody actually is going to use the varargs (no |
Is there any way to reproduce these results locally? At least to understand what got broken and try to fix it (I guess, i will need to find windows vm...) |
So i decided to simply drop win64 from the patch. See #131459 |
I don't have a great setup for debugging win/arm64 right now. Maybe the Linaro folks have a better idea of what's happening. I think I could share the binary that's failing though, in case that's useful. |
If vaargs are not used there is no need to save them. LLVM already implements such optimization for x86, as well as gcc [1].
Some ABI tests are kept almost as-is, except for stack offsets, by just adding llvm.va_start. Only laapcs_vararg_frame.ll test was rewritten to match new behavior.
[1] https://godbolt.org/z/GWWKr8xMd