Skip to content

[Runtime] Add register-specific entrypoints for retain/release calls on ARM64. #62103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 28, 2022

Conversation

mikeash
Copy link
Contributor

@mikeash mikeash commented Nov 14, 2022

This will allow emitted code to condense a sequence like:

mov x0, x21
bl swift_retain

To:

bl swift_retain_x21

Saving code size.

rdar://98884198


namespace swift {

#if __arm64__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add in defined(_M_ARM64).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're doing that, we should presumably also add in x18 (I'm assuming it isn't reserved on Windows?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That also begs the question of whether we're doing this on Windows at all. If we are, we should check how Windows uses the registers — the Windows ABI is very likely different to the System V ABI that gets used elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly (at least to me), it looks like Windows is the same in this respect. x18 is reserved (here, it's apparently used to point to data for the current thread), and x16/x17 are scratch registers for procedure calls. https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-) Well that simplifies things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to have everything on the same page for once. I added || defined(_M_ARM64) so this will be available on Windows as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry, LP64 vs LLP64 will still be there to defeat us @mikeash :-p

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a relief.

#define NUM_REGS 30

// Apply `macro` to "all" registers. Skip x18 since it's reserved, and x30 since
// it's the link register.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to worry about other -ffixed-x?? and any thing like x29 being reserved for FP?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters so much for ALL_REGS as it does for FUNCTION_REGS; I think even the ones we skip would be OK in the former. The latter has to be a little more careful. We don't need to worry about -ffixed-x??, I don't think, because this is just for testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is just for testing, so unless someone is going to build the tests that way, it's fine.

For the actual runtime entrypoints, any entrypoint corresponding to a reserved register wouldn't be called, so it would be harmless other than a bit of wasted code space.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, given that this is about code size, that would be kinda nice to handle. I don't see a good way to do this though as there is no macro that we have to check that and I am loathe to add a user controlled define for this. I suppose that if you are truly concerned about space, LTO is a thing, and at that point the DCE should kick in if you are statically linking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds about right. If somebody really needs to squeeze every byte out of a dynamically-linked runtime with extra reserved registers, we can figure it out then.

// Emit declarations for variables called xN stored in xN, initialized with
// regs[N].
#define PASS_REGS_HELPER(num) \
register void *x ## num asm ("x" #num) = regs[num];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

register is deprecated, we should avoid that. Do we want to use uintpt_t instead of void *?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't use asm("<register>") on a variable without also using register.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the type, I don't think it matters whether you use void * or uintptr_t or, actually, uint64_t here, but void * is convenient because it lets us declare regs on the Swift side as [UnsafeMutableRawPointer?], which then also lets us put object references into it without having to do an unsafeBitCast().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I do slightly worry that @compnerd might have a point here and that we should contemplate what happens if someone makes register ... asm("<register>") not work in the future. But in that case, the test will fail, so we'll notice that it's happened and can rewrite this code.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we'll notice. Worst case we can rewrite this whole thing in inline assembly instead of convincing the compiler to load the registers for us.

static inline void call_##func##_x##reg(void **regs) { \
PASS_REGS \
asm("bl _" #func "_x" #reg : : REG_INPUTS "i"(0)); \
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can ensure that this doesn't setup a frame? This is also a guaranteed tail call right? Can we attribute that somehow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's important, given that this is just for a test. I suppose if you wanted to avoid using the register ... asm("<reg>") annotation, you could write this bit entirely in assembly language, in which case it could do a tail call. It couldn't be inline then though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually care about whether this sets up a frame or does a tail call. We will overwrite the frame pointer, but that's OK. Otherwise, we just need it to call the requested runtime function with the registers set to the appropriate values. Whether it does that as a tail or standard call, and with or without a frame, is fine.

Copy link
Contributor

@al45tair al45tair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; is this being enabled on Windows? If so, we should check the Windows ABI to see which registers we should support there.

…on ARM64.

This will allow emitted code to condense a sequence like:

    mov x0, x21
    bl swift_retain

To:

    bl swift_retain_x21

Saving code size.

rdar://98884198
@compnerd
Copy link
Member

Sadly we do not have the ARM64 builds setup on CI, but the official builds do build that (both toolchain and runtiem).

Copy link
Member

@compnerd compnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that I'm still a bit on the fence - I wish there was a nicer way to do this, but I don't really see one. Most of the immediate concerns can be mostly ignored it seems.

@mikeash
Copy link
Contributor Author

mikeash commented Nov 18, 2022

@swift-ci please test

2 similar comments
@mikeash
Copy link
Contributor Author

mikeash commented Nov 28, 2022

@swift-ci please test

@mikeash
Copy link
Contributor Author

mikeash commented Nov 28, 2022

@swift-ci please test

@mikeash mikeash merged commit f3feea5 into swiftlang:main Nov 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants