-
Notifications
You must be signed in to change notification settings - Fork 292
Add AVX 512f gather instructions #862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
r? @gnzlbg (rust_highfive has picked a reviewer for you, use r? to override) |
#[target_feature(enable = "avx512f")] | ||
#[cfg_attr(test, assert_instr(vpgatherdq))] | ||
pub unsafe fn _mm512_i32gather_epi64(offsets: __m256i, slice: *const u8, scale: i32) -> __m512i { | ||
let zero = _mm512_setzero_si512().as_i64x8(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use _mm512_undefined
here instead to match what Clang is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm actually it seems that Clang defines _mm512_undefined
as zero-initialization, so it doesn't matter either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? I see it defined as a particular builtin, but _mm512_setzero
is explicitly defined as zero initialization. I'm not sure of the behavior of __builtin_ia32_undef512
, however.
https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L189
https://github.com/llvm/llvm-project/blob/a3dc9490004ce1601fb1bc67cf218b86a6fdf652/clang/include/clang/Basic/BuiltinsX86.def#L40
https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L259
https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L253
LLVM should be able to optimize away the dead store, but I'm happy to change the code regardless. I'm not quite sure how/if I can implement _mm512_undefined
since my reading of the std::mem::MaybeUninit
is that I couldn't create an unitialized __m512i
without inviting UB. Assuming the calling convention allows it, I should be able to create a MaybeUninit<__m512i>
and pass that to vpgatherdq
.
Do you have any suggestions about the CI failure? I see the following assembler repeated over and over: kxnorw %k0,%k0,%k1
vpxor %xmm1,%xmm1,%xmm1
vpgatherdq (%eax,%ymm0,2),%zmm1{%k1}
vmovdqa64 %zmm1,%zmm0
ret I can't tell if I'm doing something wrong, hit a compiler bug, or if |
You need to provide a value for |
I've finished all 12 AVX512f gather intrinsics and the intrinsics needed to test them. Unfortunately, the |
@@ -74,6 +74,7 @@ features! { | |||
/// * `"avx512bitalg"` | |||
/// * `"avx512bf16"` | |||
/// * `"avx512vp2intersect"` | |||
/// * `"knc"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line if you're not adding knc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Fixed.
crates/core_arch/src/x86/avx512f.rs
Outdated
vgatherdpd(zero, slice, offsets, neg_one, $imm8) | ||
}; | ||
} | ||
let r = constify_imm8!(scale, call); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the intrinsic documentation, only 1, 2, 4, 8 are valid values for the scale. You should use a custom constify
macro that handles this.
Also you should use #[rustc_args_required_const(<arg index of scale>)]
since scale
is required to be a compile-time constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
The new macro panics if it gets a value outside of 1, 2, 4, 8. Arguably, it should catch errors at compile time, but the one thing I tried std::compile_error
would not compile at all. If you have any suggestions, I'm more than happy to fix this.
I've added the new macro to src/x86/macros.rs
because we should probably change the other existing gather intrinsics to use the new macro as well. That would be a backwards incompatible change, but it would only affect broken code. I don't know what the official policy is in cases like this. I'm happy to make the change in a separate PR if you would like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll work on the macro fix later this week.
#[target_feature(enable = "avx512f")] | ||
#[cfg_attr(test, assert_instr(vpgatherdq))] | ||
pub unsafe fn _mm512_i32gather_epi64(offsets: __m256i, slice: *const u8, scale: i32) -> __m512i { | ||
let zero = _mm512_setzero_si512().as_i64x8(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? I see it defined as a particular builtin, but _mm512_setzero
is explicitly defined as zero initialization. I'm not sure of the behavior of __builtin_ia32_undef512
, however.
https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L189
https://github.com/llvm/llvm-project/blob/a3dc9490004ce1601fb1bc67cf218b86a6fdf652/clang/include/clang/Basic/BuiltinsX86.def#L40
https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L259
https://github.com/llvm/llvm-project/blob/1b02db52b79e01f038775f59193a49850a34184d/clang/lib/Headers/avx512fintrin.h#L253
LLVM should be able to optimize away the dead store, but I'm happy to change the code regardless. I'm not quite sure how/if I can implement _mm512_undefined
since my reading of the std::mem::MaybeUninit
is that I couldn't create an unitialized __m512i
without inviting UB. Assuming the calling convention allows it, I should be able to create a MaybeUninit<__m512i>
and pass that to vpgatherdq
.
@@ -74,6 +74,7 @@ features! { | |||
/// * `"avx512bitalg"` | |||
/// * `"avx512bf16"` | |||
/// * `"avx512vp2intersect"` | |||
/// * `"knc"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Fixed.
|
Interesting, thanks! The discussion at llvm.org/PR32176 was also very informative. GitHub search only does exact match (by default at least), so my search for |
Co-authored-by: bjorn3 <[email protected]>
This should be ready for review. |
You missing these 4 gather intrinsics that are part of AVX512F:
|
Added those four intrinsics and some of the helpers needed. Even though (as you pointed out elsewhere), we can use the AVX512F/KNCNI intrinsics with just AVX512F CPUs, I didn't do the TODO to make the |
Closing since #866 also contains these changes. |
Adds the gather intrinsics for the AVX 512f instruction set.