Add vcvtq_u32_f32 and vcvtq_s32_f32 #902

jrmuizel · 2020-09-10T14:10:27Z

These intrinsics are implemented differently for aarch64 and arm in clang: aarch64 uses the llvm.aarch64.neon.fcvtzs.v4i32.v4f32 intrinsic. However, there didn't seem to be any advantage to using that intrinsic instead of just sharing code.

rust-highfive · 2020-09-10T14:10:30Z

r? @Amanieu

(rust_highfive has picked a reviewer for you, use r? to override)


Amanieu · 2020-09-10T15:30:19Z

Can you add tests for out-of-range values? For example passing a negative value to the unsigned conversion should produce a result of 0.
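As a rough illustration of the out-of-range cases being asked for, here is a scalar model of the per-lane results, assuming the conversion rounds toward zero, saturates at the integer bounds, and maps NaN to 0. Rust's `as` cast has had exactly those semantics since 1.45, so it is used as a stand-in here. The names `model_vcvtq_u32_f32` and `model_vcvtq_s32_f32` are hypothetical helpers for this sketch, not stdarch functions, and the code does not call the real NEON intrinsics.

```rust
/// Hypothetical scalar model of the unsigned conversion: each lane is
/// truncated toward zero and saturated to the `u32` range (NaN becomes 0).
fn model_vcvtq_u32_f32(v: [f32; 4]) -> [u32; 4] {
    // `as` on float -> int saturates and maps NaN to 0 (Rust 1.45+).
    v.map(|x| x as u32)
}

/// Hypothetical scalar model of the signed conversion, same assumptions.
fn model_vcvtq_s32_f32(v: [f32; 4]) -> [i32; 4] {
    v.map(|x| x as i32)
}
```

Under these assumptions, `model_vcvtq_u32_f32([-1.0, 2.9, 5e9, f32::NAN])` yields `[0, 2, u32::MAX, 0]` and `model_vcvtq_s32_f32([-3e9, -2.9, 3e9, f32::NAN])` yields `[i32::MIN, -2, i32::MAX, 0]`. A real test would compare the intrinsic's output against values like these.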

jrmuizel · 2020-09-10T17:36:48Z

It looks like out-of-range values will raise a floating-point exception: https://developer.arm.com/docs/ddi0596/h/shared-pseudocode-functions/shared-functionsfloat-pseudocode#impl-shared.FPToFixed.5

Amanieu · 2020-09-10T18:00:59Z

That just sets an exception bit in the FP status register and returns a NaN. I'm more worried about LLVM optimizing it to undef and returning an arbitrary value.

jrmuizel · 2020-09-11T02:41:21Z

It does seem like there's some badness going on with llvm converting to undef. This shows the difference between the aarch64 and ARM approaches:
https://gcc.godbolt.org/z/5Ws1P8

It feels like the ARM behaviour is not really what you'd want when using this intrinsic.

Amanieu · 2020-09-11T05:47:53Z

I checked the behavior with GCC and with the ACLE spec and it seems that the AArch64 behavior is the correct one. It seems that this is a bug in Clang.

I don't think LLVM currently exposes an intrinsic for this on ARM, but you should definitely open a bug report for Clang on https://bugs.llvm.org/.

In the meantime for this PR you can just keep Clang's behavior but with a clear comment explaining that the behavior isn't 100% correct.

Amanieu · 2020-09-12T02:37:35Z

Can you report the Clang bug on bugs.llvm.org and add a link to the bug report in a comment?

jrmuizel · 2020-09-12T13:22:36Z

It looks like CI is broken because of a missing Apple nightly.

Amanieu · 2020-09-12T13:48:35Z

It's being worked on, should be fixed for tomorrow's nightlies.

Amanieu · 2020-09-13T02:49:20Z

CI is now fixed but there are errors.

jrmuizel · 2020-09-13T16:06:00Z

https://bugs.llvm.org/show_bug.cgi?id=47510

The ARM implementation uses fptosi/fptoui, which have undefined behaviour for out-of-range data. Clang has the same problem: https://llvm.org/PR47510
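To make the hazard concrete, here is a minimal scalar sketch, not the code from this PR. It uses `f32::to_int_unchecked`, the standard-library conversion that is undefined for NaN, infinities, and out-of-range inputs, and guards it with an explicit range check. The helper name `checked_f32_to_i32` is hypothetical.

```rust
/// Hypothetical helper: converts with round-toward-zero, returning `None`
/// when the input is NaN, infinite, or outside the `i32` range, instead of
/// reaching the undefined-behaviour path of an unchecked conversion.
fn checked_f32_to_i32(x: f32) -> Option<i32> {
    const LO: f32 = -2147483648.0; // -2^31, exactly representable in f32
    const HI: f32 = 2147483648.0; // 2^31, used as an exclusive upper bound
    if x >= LO && x < HI {
        // SAFETY: NaN fails both comparisons, infinities fail one of them,
        // and every finite value in [LO, HI) truncates to a valid i32.
        Some(unsafe { x.to_int_unchecked::<i32>() })
    } else {
        None
    }
}
```

For example, `checked_f32_to_i32(-1.9)` gives `Some(-1)`, while `checked_f32_to_i32(3e9)` and `checked_f32_to_i32(f32::NAN)` both give `None`. Calling `to_int_unchecked` on those last two inputs without the guard would be undefined behaviour.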

rust-highfive assigned Amanieu Sep 10, 2020

Add vcvtq_u32_f32 and vcvtq_s32_f32

259ea51


jrmuizel force-pushed the vcvtq branch from e5e85f3 to 259ea51 Compare September 10, 2020 14:23

jrmuizel marked this pull request as ready for review September 10, 2020 15:01

jrmuizel force-pushed the vcvtq branch from 0f8f03e to 1ef9819 Compare September 10, 2020 21:14

Expand the test cases

8da5575

jrmuizel force-pushed the vcvtq branch from 1ef9819 to 8da5575 Compare September 10, 2020 21:54

jrmuizel force-pushed the vcvtq branch from 62da00d to 7744912 Compare September 12, 2020 02:34

jrmuizel force-pushed the vcvtq branch from 7744912 to 46aba9c Compare September 12, 2020 13:12

jrmuizel force-pushed the vcvtq branch 2 times, most recently from 1472efa to 08e4e5e Compare September 13, 2020 19:06

Split out a separate implementation for aarch64

2ccf731


jrmuizel force-pushed the vcvtq branch from 08e4e5e to 2ccf731 Compare September 13, 2020 19:41

Amanieu merged commit 016eff9 into rust-lang:master Sep 13, 2020
