-
Notifications
You must be signed in to change notification settings - Fork 292
Add shift-and-insert Arm intrinsics. #936
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add shift-and-insert Arm intrinsics. #936
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Amanieu (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
"vsli_n_s8", | ||
"vsliq_n_s8", | ||
"vsli_n_s16", | ||
"vsliq_n_s16", | ||
"vsli_n_s32", | ||
"vsliq_n_s32", | ||
"vsli_n_s64", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seem to have to add these for the verify_all_signatures
tests to pass, I'm guessing because I've put the shift-and-insert tests in their own file (mimicking what's done for the table lookup tests, which also appear in this list of signatures to skip).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that they are not being detected because you are generating the tests within a macro.
#[rustc_args_required_const(2)] | ||
pub unsafe fn vsli_n_s8(a: int8x8_t, b: int8x8_t, n: i32) -> int8x8_t { | ||
if n < 0 || n > 7 { | ||
unreachable_unchecked(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a panic here for out-of-bounds values. It's optimized away in practice since this function is going to be inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will do 👍
☔ The latest upstream changes (presumably dd981c0) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:
|
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author |
Error: The feature Please let |
This PR adds Arm and Aarch64 Neon intrinsics for shift-left-and-insert and shift-right-and-insert.
The LLVM codegen for these instructions is quite different on Arm vs Aarch64. In the former case, both
vsli
andvsri
get transformed tollvm.arm.neon.vshiftins
, which takes three equal-length vector arguments, where the last argument contains the per-element shift (which can be negative, for a right shift). In the latter case,vsli
andvsri
are transformed tollvm.aarch64.neon.vsli
andllvm.aarch64.neon.vsri
respectively, which takes two vector inputs and a constant for the shift.