Add AVX 512f gather instructions #862

Daniel-B-Smith · 2020-05-30T22:44:45Z

Adds the gather intrinsics for the AVX 512f instruction set.

rust-highfive · 2020-05-30T22:44:48Z

(rust_highfive has picked a reviewer for you, use r? to override)

Amanieu · 2020-05-30T23:52:56Z

crates/core_arch/src/x86/avx512f.rs

+#[target_feature(enable = "avx512f")]
+#[cfg_attr(test, assert_instr(vpgatherdq))]
+pub unsafe fn _mm512_i32gather_epi64(offsets: __m256i, slice: *const u8, scale: i32) -> __m512i {
+    let zero = _mm512_setzero_si512().as_i64x8();


You should use _mm512_undefined here instead to match what Clang is doing.

Hmm actually it seems that Clang defines _mm512_undefined as zero-initialization, so it doesn't matter either way.

Are you sure? I see it defined as a particular builtin, but _mm512_setzero is explicitly defined as zero initialization. I'm not sure of the behavior of __builtin_ia32_undef512, however.

https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L189
https://github.com/llvm/llvm-project/blob/a3dc9490004ce1601fb1bc67cf218b86a6fdf652/clang/include/clang/Basic/BuiltinsX86.def#L40
https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L259
https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L253

LLVM should be able to optimize away the dead store, but I'm happy to change the code regardless. I'm not quite sure how/if I can implement _mm512_undefined, since my reading of the std::mem::MaybeUninit docs is that I couldn't create an uninitialized __m512i without inviting UB. Assuming the calling convention allows it, I should be able to create a MaybeUninit<__m512i> and pass that to vpgatherdq.
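For readers following the MaybeUninit concern, here is a minimal sketch of the distinction being discussed. It is not code from this PR: `Lanes` is a hypothetical stand-in for `__m512i` (a plain `[i64; 8]`) so that it builds on any target, and both helper names are made up for illustration.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a 512-bit integer vector (assumption: the real
/// `__m512i` is avoided so the sketch does not need AVX-512 support).
type Lanes = [i64; 8];

/// Zero-initialized value. All-zero bits are a valid `[i64; 8]`, so calling
/// `assume_init` here is sound.
fn zeroed_lanes() -> Lanes {
    // SAFETY: the all-zero bit pattern is a valid value for every i64 lane.
    unsafe { MaybeUninit::<Lanes>::zeroed().assume_init() }
}

/// Start from uninitialized storage and write every lane before reading.
/// Calling `assume_init` *without* these writes would be undefined behavior.
fn filled_lanes(value: i64) -> Lanes {
    let mut slot = MaybeUninit::<Lanes>::uninit();
    let base = slot.as_mut_ptr() as *mut i64;
    for i in 0..8 {
        // SAFETY: `base` points at 8 contiguous i64 slots owned by `slot`.
        unsafe { base.add(i).write(value) };
    }
    // SAFETY: all 8 lanes were written in the loop above.
    unsafe { slot.assume_init() }
}
```

The first helper mirrors the zero-initialization route; the second shows that uninitialized storage only becomes a usable value once something has fully written it.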

Daniel-B-Smith · 2020-05-31T00:27:54Z

Do you have any suggestions about the CI failure? I see the following assembly repeated over and over:

kxnorw %k0,%k0,%k1
vpxor %xmm1,%xmm1,%xmm1
vpgatherdq (%eax,%ymm0,2),%zmm1{%k1}
vmovdqa64 %zmm1,%zmm0
ret

I can't tell if I'm doing something wrong, hit a compiler bug, or if vpgatherdq just isn't supported on 32 bit systems.
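To make the disassembly above easier to read, here is a scalar sketch of what an unmasked gather with 32-bit indices and 64-bit elements computes: each lane loads 8 bytes from `base + offset * scale`. The function name and the slice-based signature are assumptions for illustration; this is not the intrinsic from the PR, and unlike the hardware instruction it rejects negative offsets.

```rust
/// Scalar model of a gather with eight 32-bit indices and 64-bit elements.
/// Assumption: `base` is a byte buffer and each lane reads 8 little-endian
/// bytes starting at byte position `offset * scale`.
fn gather_epi64_model(offsets: [i32; 8], base: &[u8], scale: usize) -> [i64; 8] {
    let mut out = [0i64; 8];
    for (lane, &off) in offsets.iter().enumerate() {
        // The real instruction accepts negative indices; a slice cannot.
        assert!(off >= 0, "this model only handles non-negative offsets");
        let start = off as usize * scale;
        let mut bytes = [0u8; 8];
        // Panics if the read would run past the end of `base`.
        bytes.copy_from_slice(&base[start..start + 8]);
        out[lane] = i64::from_le_bytes(bytes);
    }
    out
}
```

With a buffer holding the values 0 through 15 as little-endian `i64`s, offsets `[0, 2, 4, ...]` and a scale of 8 select every other element, while the same offsets with a scale of 4 select consecutive elements.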

Amanieu · 2020-05-31T00:36:03Z

You need to provide a value for imm8 in the assert_instr macro. This avoids expanding the entire const_imm8 for the disassembly. Search the code for other uses of assert_instr.

Daniel-B-Smith · 2020-05-31T18:26:47Z

I've finished all 12 AVX512f gather intrinsics and the intrinsics needed to test them. Unfortunately, the __m512d comparison assertion is not using a cmpeq intrinsic like all of the other assertions: https://github.com/rust-lang/stdarch/pull/862/files#diff-927c7e8bc826b00593557eb6928a092eR149 I did it that way because _mm512_cmpeq_pd_mask requires Knights Corner instructions. It seems that KNCNI is more complicated than just adding another feature detection, so I punted on adding that.

Amanieu · 2020-05-31T18:31:10Z

crates/std_detect/src/detect/arch/x86.rs

@@ -74,6 +74,7 @@ features! {
    /// * `"avx512bitalg"`
    /// * `"avx512bf16"`
    /// * `"avx512vp2intersect"`
+    /// * `"knc"`


Remove this line if you're not adding knc.

Oops. Fixed.

Amanieu · 2020-05-31T18:33:23Z

crates/core_arch/src/x86/avx512f.rs

+            vgatherdpd(zero, slice, offsets, neg_one, $imm8)
+        };
+    }
+    let r = constify_imm8!(scale, call);


According to the intrinsic documentation, only 1, 2, 4, 8 are valid values for the scale. You should use a custom constify macro that handles this.

Also you should use #[rustc_args_required_const(<arg index of scale>)] since scale is required to be a compile-time constant.

Done.

The new macro panics if it gets a value outside of 1, 2, 4, 8. Arguably, it should catch errors at compile time, but the one thing I tried, std::compile_error, would not compile at all. If you have any suggestions, I'm more than happy to fix this.

I've added the new macro to src/x86/macros.rs because we should probably change the other existing gather intrinsics to use the new macro as well. That would be a backwards incompatible change, but it would only affect broken code. I don't know what the official policy is in cases like this. I'm happy to make the change in a separate PR if you would like.
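A rough sketch of the kind of scale-restricted constify macro discussed in this thread follows. The names `constify_scale!` and `scaled_offset` are hypothetical, and the macro actually added to src/x86/macros.rs may differ; the sketch only shows the general shape: a runtime `match` that expands the callee macro with a literal for each valid scale and panics otherwise.

```rust
/// Hypothetical macro: expands `$expand!` with a literal for each valid
/// gather scale (1, 2, 4, 8) and panics at runtime for anything else.
macro_rules! constify_scale {
    ($scale:expr, $expand:ident) => {
        match $scale {
            1 => $expand!(1),
            2 => $expand!(2),
            4 => $expand!(4),
            8 => $expand!(8),
            other => panic!("invalid gather scale {}: expected 1, 2, 4, or 8", other),
        }
    };
}

/// Illustrative caller: the inner `call!` macro receives the scale as a
/// literal, which is what an intrinsic taking an immediate operand needs.
fn scaled_offset(index: i32, scale: i32) -> i32 {
    macro_rules! call {
        ($imm:expr) => {
            index * $imm
        };
    }
    constify_scale!(scale, call)
}
```

As noted in the comment above, the invalid case is only caught at runtime in this shape; `scaled_offset(3, 3)` panics rather than failing to compile.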

Daniel-B-Smith

I'll work on the macro fix later this week.

Daniel-B-Smith · 2020-06-01T02:04:13Z

crates/core_arch/src/x86/avx512f.rs

+#[target_feature(enable = "avx512f")]
+#[cfg_attr(test, assert_instr(vpgatherdq))]
+pub unsafe fn _mm512_i32gather_epi64(offsets: __m256i, slice: *const u8, scale: i32) -> __m512i {
+    let zero = _mm512_setzero_si512().as_i64x8();


Are you sure? I see it defined as a particular builtin, but _mm512_setzero is explicitly defined as zero initialization. I'm not sure of the behavior of __builtin_ia32_undef512, however.

https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L189
https://github.com/llvm/llvm-project/blob/a3dc9490004ce1601fb1bc67cf218b86a6fdf652/clang/include/clang/Basic/BuiltinsX86.def#L40
https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L259
https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L253

LLVM should be able to optimize away the dead store, but I'm happy to change the code regardless. I'm not quite sure how/if I can implement _mm512_undefined, since my reading of the std::mem::MaybeUninit docs is that I couldn't create an uninitialized __m512i without inviting UB. Assuming the calling convention allows it, I should be able to create a MaybeUninit<__m512i> and pass that to vpgatherdq.

Daniel-B-Smith · 2020-06-01T02:06:11Z

crates/std_detect/src/detect/arch/x86.rs

@@ -74,6 +74,7 @@ features! {
    /// * `"avx512bitalg"`
    /// * `"avx512bf16"`
    /// * `"avx512vp2intersect"`
+    /// * `"knc"`


Oops. Fixed.

Amanieu · 2020-06-01T02:10:33Z

undef512 is defined here. As you can see clang defines it to zero-initialize.

Daniel-B-Smith · 2020-06-01T14:48:21Z

Interesting, thanks! The discussion at llvm.org/PR32176 was also very informative. GitHub search only does exact match (by default at least), so my search for __builtin_ia32_undef512 didn't bring up that line.


Daniel-B-Smith · 2020-06-07T17:05:06Z

This should be ready for review.

Amanieu · 2020-06-09T13:54:46Z

You're missing these 4 gather intrinsics that are part of AVX512F:

_mm512_i32gather_ps
_mm512_mask_i32gather_ps
_mm512_i32gather_epi32
_mm512_mask_i32gather_epi32

Daniel-B-Smith · 2020-06-13T19:12:25Z

Added those four intrinsics and some of the helpers needed. Even though (as you pointed out elsewhere) we can use the AVX512F/KNCNI intrinsics with just AVX512F CPUs, I didn't address the TODO to make assert_eq_m512d more correct. After I finish the three open PRs, I will implement the floating point comparison intrinsics and fix that TODO.

Daniel-B-Smith · 2020-06-14T15:02:04Z

Closing since #866 also contains these changes.

Daniel Smith added 3 commits May 30, 2020 19:52

Add 64 bit AVX512f le and ge comparisons

37a37e2

Checkpointing first gather implementation

3f88738

Fix interface to be consistent

cf3e316

rust-highfive assigned gnzlbg May 30, 2020

Amanieu reviewed May 30, 2020


Daniel Smith added 9 commits May 30, 2020 22:12

Merge remote-tracking branch 'upstream/master' into avx-512-cmp

72959dd

Fix instruction assert

01102d7

Add _mm512_mask_i32gather_epi64

79dee01

Add pd gather intrinsics

0d3a19b

Add 64 bit index variants

f244d2e

Add 32 bit output gather intrinsics

9b90883

Fix comments

0238065

Fix comparison comments

d7e2afa

s/unsigned/signed/ for epi64

dcf5d47

Amanieu reviewed May 31, 2020


Daniel Smith added 3 commits May 31, 2020 18:52

Add neq integer comparisons

d9d0fc9

Remove feature that wasn't added

9a1200d

Merge branch 'master' into moar-avx512f-cmp

ed9bbe4

Daniel-B-Smith commented Jun 1, 2020


Daniel Smith added 2 commits June 6, 2020 16:07

Constanting the arguments

f70f643

Merge branch 'avx-512-cmp' of github.com:Daniel-B-Smith/stdarch into …

e29e2ba

…avx-512-cmp

bjorn3 reviewed Jun 6, 2020

crates/core_arch/src/x86/macros.rs

Daniel Smith and others added 2 commits June 6, 2020 12:15

Fix comment

c5cec2d

Co-authored-by: bjorn3 <[email protected]>

Make instruction check less specific for CI

f775ef1

Daniel Smith added 4 commits June 6, 2020 19:01

Add comparison operator integer comparisons

2957e2e

Fix comments

7538c0f

Allow non camel case types

33a4dd5

Add cmplt_ep(i|u)32

a74886b

Daniel-B-Smith mentioned this pull request Jun 7, 2020

Add AVX 512f gather, scatter and compare intrinsics #866

Merged

Daniel Smith added 7 commits June 13, 2020 16:45

Allow AVX512f or KNC intrinsics to be gated by avx512f

e8cfdb8

Add remaining 32bit integer comparisons

690a03c

Merge branch 'moar-avx512f-cmp' into avx-512-cmp

45aa0bd

Merge remote-tracking branch 'upstream/master' into moar-avx512f-cmp

475c51d

Fix verify test with updated XML

832166a

Merge branch 'moar-avx512f-cmp' into avx-512-cmp

1c81797

Add remaining gather intrinsics

c761d6f

Daniel-B-Smith closed this Jun 14, 2020

Daniel-B-Smith deleted the avx-512-cmp branch June 14, 2020 15:02
