Avx512 avx512vl #999
r? @Amanieu (rust-highfive has picked a reviewer for you, use r? to override)
crates/core_arch/src/x86/avx512f.rs
        };
    }
    let r = constify_imm8_gather!(scale, call);
    transmute(simd_select_bitmask(mask, r.as_f32x8(), src.as_f32x8()))
`simd_select_bitmask` is incorrect in this case. We need to use the special LLVM intrinsic `llvm.x86.avx512.mask.gather3siv4.sf`.
The difference here is that the LLVM intrinsic will only access memory addresses which are not masked out. However your version will access memory addresses that are masked out, which could cause a crash if a masked value is an invalid offset.
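To make the difference concrete, here is a minimal scalar sketch of the two behaviours. It is not the stdarch implementation and does not call any LLVM intrinsic: the function names `gather_then_select` and `masked_gather` are hypothetical, the 8 lanes are modelled with plain arrays, and an out-of-bounds `slice::get` returning `None` stands in for a faulting memory access.

```rust
/// Hypothetical model of "gather everything, then blend with the mask".
/// Every offset is dereferenced, including those in masked-out lanes.
fn gather_then_select(
    base: &[f32],
    offsets: [usize; 8],
    mask: u8,
    src: [f32; 8],
) -> Option<[f32; 8]> {
    let mut gathered = [0.0f32; 8];
    for lane in 0..8 {
        // `None` here models a crash on an invalid address.
        gathered[lane] = *base.get(offsets[lane])?;
    }
    let mut out = src;
    for lane in 0..8 {
        if mask & (1 << lane) != 0 {
            out[lane] = gathered[lane];
        }
    }
    Some(out)
}

/// Hypothetical model of a true masked gather.
/// Only lanes whose mask bit is set touch memory; the rest keep `src`.
fn masked_gather(
    base: &[f32],
    offsets: [usize; 8],
    mask: u8,
    src: [f32; 8],
) -> Option<[f32; 8]> {
    let mut out = src;
    for lane in 0..8 {
        if mask & (1 << lane) != 0 {
            out[lane] = *base.get(offsets[lane])?;
        }
    }
    Some(out)
}
```

With valid offsets in the low four lanes, an invalid offset in the high four lanes, and a mask of `0b0000_1111`, `masked_gather` succeeds and leaves the high lanes equal to `src`, while `gather_then_select` fails because it still reads the masked-out offsets.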
True, thanks for the reminder.
`llvm.x86.avx512.mask.gather3siv4.sf` requires i1, so I need to remove those mask gather functions.
permute_ps,pd: mm256,mm; permutevar_ps,pd: mm256,mm
permutex_epi64,pd: mm256
permutexvar_epi32,epi64,ps,pd: mm256
permutex2var_epi32,epi64,ps,pd: mm256,mm
i32gather_epi32,epi64,ps,pd: mm256,mm
i64gather_epi32,epi64,ps,pd: mm256,mm