-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Compiler-rt] Add AArch64 routines for __arm_agnostic("sme_za_state") #120059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The specification of these routines can be found here: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-routines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added two more minor comments, but otherwise this LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more recommendations but otherwise this looks good to me.
There's still some room for improvement not least regarding __arm_sme_state_size
where we go to pains to ensure the returned value is the minimum necessary.
I wonder if we need to be this strict because we could for example always return the current maximum when SME is available regardless of the state of the lazy save or SME2 availability. With that said, I've no evidence these routines have performance concerns so am happy to see how things play out.
stur wzr, [x0, #10] | ||
strh wzr, [x0, #14] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an alignment perspective it would be better as:
strh wzr, [x0, #10]
str wzr, [x0, #12]
// Store ZT0 | ||
str zt0, [x18] | ||
add x18, x18, #64 | ||
b 1f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch is not necessary because we should fall through.
757a7f9
to
c0dc2a4
Compare
This broke our our chromium/android build: https://issues.chromium.org/issues/385758019
Is that expected? Are we holding something wrong? Or is that too little information to say something useful? |
I don't think it's expected (looks like the chrome build is clang trunk). It looks like the // Fails to build on clang trunk (+sme2 not set)
void foo(void) __arm_streaming_compatible {
asm(".arch armv9-a+sme2");
asm("smstart");
asm("ldr zt0, [sp]");
}
// (side note: It seems smstart does not require +sme to be assembled) See: https://godbolt.org/z/osdfGd6dn It looks like either of these would work: // Builds successfully on clang trunk.
__attribute__((target("arch=armv9-a+sme2")))
void foo(void) __arm_streaming_compatible {
asm("smstart");
asm("ldr zt0, [sp]");
}
// OR:
void foo2(void) __arm_streaming_compatible {
asm(".arch armv9-a+sme2\n"
"smstart\n"
"ldr zt0, [sp]\n");
} |
Is it a regression introduced by this change? If so, please revert it if it takes a while to fix it. |
The specification of these routines can be found here:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-routines