-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[Runtime] Add register-specific entrypoints for retain/release calls on ARM64. #62103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
include/swift/Runtime/CustomRRABI.h
Outdated
|
||
namespace swift { | ||
|
||
#if __arm64__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add in defined(_M_ARM64)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're doing that, we should presumably also add in x18
(I'm assuming it isn't reserved on Windows?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That also begs the question of whether we're doing this on Windows at all. If we are, we should check how Windows uses the registers — the Windows ABI is very likely different to the System V ABI that gets used elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprisingly (at least to me), it looks like Windows is the same in this respect. x18 is reserved (here, it's apparently used to point to data for the current thread), and x16/x17 are scratch registers for procedure calls. https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-) Well that simplifies things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice to have everything on the same page for once. I added || defined(_M_ARM64)
so this will be available on Windows as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't worry, LP64 vs LLP64 will still be there to defeat us @mikeash :-p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a relief.
#define NUM_REGS 30 | ||
|
||
// Apply `macro` to "all" registers. Skip x18 since it's reserved, and x30 since | ||
// it's the link register. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to worry about other -ffixed-x??
and any thing like x29 being reserved for FP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matters so much for ALL_REGS
as it does for FUNCTION_REGS
; I think even the ones we skip would be OK in the former. The latter has to be a little more careful. We don't need to worry about -ffixed-x??
, I don't think, because this is just for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is just for testing, so unless someone is going to build the tests that way, it's fine.
For the actual runtime entrypoints, any entrypoint corresponding to a reserved register wouldn't be called, so it would be harmless other than a bit of wasted code space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, given that this is about code size, that would be kinda nice to handle. I don't see a good way to do this though as there is no macro that we have to check that and I am loathe to add a user controlled define for this. I suppose that if you are truly concerned about space, LTO is a thing, and at that point the DCE should kick in if you are statically linking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds about right. If somebody really needs to squeeze every byte out of a dynamically-linked runtime with extra reserved registers, we can figure it out then.
// Emit declarations for variables called xN stored in xN, initialized with | ||
// regs[N]. | ||
#define PASS_REGS_HELPER(num) \ | ||
register void *x ## num asm ("x" #num) = regs[num]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
register
is deprecated, we should avoid that. Do we want to use uintpt_t
instead of void *
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't use asm("<register>")
on a variable without also using register
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the type, I don't think it matters whether you use void *
or uintptr_t
or, actually, uint64_t
here, but void *
is convenient because it lets us declare regs
on the Swift side as [UnsafeMutableRawPointer?]
, which then also lets us put object references into it without having to do an unsafeBitCast()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I do slightly worry that @compnerd might have a point here and that we should contemplate what happens if someone makes register ... asm("<register>")
not work in the future. But in that case, the test will fail, so we'll notice that it's happened and can rewrite this code.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we'll notice. Worst case we can rewrite this whole thing in inline assembly instead of convincing the compiler to load the registers for us.
static inline void call_##func##_x##reg(void **regs) { \ | ||
PASS_REGS \ | ||
asm("bl _" #func "_x" #reg : : REG_INPUTS "i"(0)); \ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can ensure that this doesn't setup a frame? This is also a guaranteed tail call right? Can we attribute that somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's important, given that this is just for a test. I suppose if you wanted to avoid using the register ... asm("<reg>")
annotation, you could write this bit entirely in assembly language, in which case it could do a tail call. It couldn't be inline
then though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't actually care about whether this sets up a frame or does a tail call. We will overwrite the frame pointer, but that's OK. Otherwise, we just need it to call the requested runtime function with the registers set to the appropriate values. Whether it does that as a tail or standard call, and with or without a frame, is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; is this being enabled on Windows? If so, we should check the Windows ABI to see which registers we should support there.
…on ARM64. This will allow emitted code to condense a sequence like: mov x0, x21 bl swift_retain To: bl swift_retain_x21 Saving code size. rdar://98884198
2f89033
to
b677b56
Compare
Sadly we do not have the ARM64 builds setup on CI, but the official builds do build that (both toolchain and runtiem). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that I'm still a bit on the fence - I wish there was a nicer way to do this, but I don't really see one. Most of the immediate concerns can be mostly ignored it seems.
@swift-ci please test |
This will allow emitted code to condense a sequence like:
To:
Saving code size.
rdar://98884198