Skip to content

Add vcvtq_u32_f32 and vcvtq_s32_f32 #902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 13, 2020
Merged

Conversation

jrmuizel
Copy link
Contributor

These intrinsics are implemented differently for aarch64 and arm
in clang. i.e. aarch64 uses the llvm.aarch64.neon.fcvtzs.v4i32.v4f32
intrinsic. However, there didn't seem to be any advantage to using
that intrinsic instead of just sharing code.

@rust-highfive
Copy link

r? @Amanieu

(rust_highfive has picked a reviewer for you, use r? to override)

These intrinsics are implemented differently for aarch64 and arm
in clang. i.e. aarch64 uses the llvm.aarch64.neon.fcvtzs.v4i32.v4f32
intrinsic. However, there didn't seem to be any advantage to using
that intrinsic instead of just sharing code.
@jrmuizel jrmuizel marked this pull request as ready for review September 10, 2020 15:01
@Amanieu
Copy link
Member

Amanieu commented Sep 10, 2020

Can you add tests for out-of-range values? For example passing a negative value to the unsigned conversion should produce a result of 0.

@jrmuizel
Copy link
Contributor Author

It looks like the out of range values will throw a floating point exception: https://developer.arm.com/docs/ddi0596/h/shared-pseudocode-functions/shared-functionsfloat-pseudocode#impl-shared.FPToFixed.5

@Amanieu
Copy link
Member

Amanieu commented Sep 10, 2020

That just sets an exception bit in the FP status register and returns a NaN. I'm more worried about LLVM optimizing it to undef and returning an arbitrary value.

@jrmuizel
Copy link
Contributor Author

It does seem like there's some badness going on with llvm converting to undef. This shows the difference between the aarch64 and ARM approaches:
https://gcc.godbolt.org/z/5Ws1P8

It feels like the ARM behaviour is not really what you'd when using this intrinsic.

@Amanieu
Copy link
Member

Amanieu commented Sep 11, 2020

I checked the behavior with GCC and with the ACLE spec and it seems that the AArch64 behavior is the correct one. It seems that this is a bug in Clang.

I don't think LLVM currently exposes an intrinsic for this on ARM, but you should definitely open a bug report for Clang on https://bugs.llvm.org/.

In the meantime for this PR you can just keep Clang's behavior but with a clear comment explaining that the behavior isn't 100% correct.

@Amanieu
Copy link
Member

Amanieu commented Sep 12, 2020

Can you report the Clang bug on bugs.llvm.org and add a link to the bug report in a comment?

@jrmuizel
Copy link
Contributor Author

It looks like CI is broken because of a missing apple nightly

@Amanieu
Copy link
Member

Amanieu commented Sep 12, 2020

It's being worked on, should be fixed for tomorrow's nightlies.

@Amanieu
Copy link
Member

Amanieu commented Sep 13, 2020

CI is now fixed but there are errors.

@jrmuizel
Copy link
Contributor Author

@jrmuizel jrmuizel force-pushed the vcvtq branch 2 times, most recently from 1472efa to 08e4e5e Compare September 13, 2020 19:06
The ARM implementation uses fptoi that has undefined
behaviour for out of range data. Clang has the same problem:
https://llvm.org/PR47510
@Amanieu Amanieu merged commit 016eff9 into rust-lang:master Sep 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants