Avx512 avx512vl #999

Merged
merged 10 commits into from
Feb 10, 2021

minybot
Contributor

@minybot minybot commented Feb 9, 2021

permute_ps,pd: mm256,mm; permutevar_ps,pd: mm256,mm
permutex_epi64,pd: mm256
permutexvar_epi32,epi64,ps,pd: mm256
permutex2var_epi32,epi64,ps,pd: mm256,mm
i32gather_epi32,epi64,ps,pd: mm256,mm
i64gather_epi32,epi64,ps,pd: mm256,mm

@rust-highfive

r? @Amanieu

(rust-highfive has picked a reviewer for you, use r? to override)

};
}
let r = constify_imm8_gather!(scale, call);
transmute(simd_select_bitmask(mask, r.as_f32x8(), src.as_f32x8()))
Member

simd_select_bitmask is incorrect in this case. We need to use the special LLVM intrinsic llvm.x86.avx512.mask.gather3siv4.sf.

The difference is that the LLVM intrinsic only accesses memory addresses that are not masked out. Your version, however, also accesses the addresses that are masked out, which could cause a crash if a masked-out lane holds an invalid offset.
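
To make the difference concrete, here is a scalar sketch in Rust. It is only a model, not the stdarch implementation and not what the LLVM intrinsic does internally. The names `load`, `gather_then_select`, and `masked_gather` are hypothetical, indices are treated as element offsets (the `scale` argument is ignored), and an out-of-bounds access is modelled as `None` instead of a real fault.

```rust
/// Hypothetical stand-in for one lane's memory load.
/// Returns `None` where real hardware could fault on an invalid offset.
fn load(mem: &[f32], idx: i32) -> Option<f32> {
    if idx < 0 {
        None
    } else {
        mem.get(idx as usize).copied()
    }
}

/// Models "gather all lanes, then blend with `src` by mask".
/// Every index is dereferenced, including lanes whose mask bit is 0.
fn gather_then_select(src: [f32; 8], mask: u8, idx: [i32; 8], mem: &[f32]) -> Option<[f32; 8]> {
    let mut gathered = [0.0f32; 8];
    for lane in 0..8 {
        gathered[lane] = load(mem, idx[lane])?;
    }
    let mut out = src;
    for lane in 0..8 {
        if (mask >> lane) & 1 == 1 {
            out[lane] = gathered[lane];
        }
    }
    Some(out)
}

/// Models a true masked gather.
/// Only lanes whose mask bit is 1 touch memory; the rest keep `src`.
fn masked_gather(src: [f32; 8], mask: u8, idx: [i32; 8], mem: &[f32]) -> Option<[f32; 8]> {
    let mut out = src;
    for lane in 0..8 {
        if (mask >> lane) & 1 == 1 {
            out[lane] = load(mem, idx[lane])?;
        }
    }
    Some(out)
}
```

With a four-element `mem`, indices `[0, 1, 2, 3, 0, 1, 99, -1]`, and mask `0b0011_1111`, `masked_gather` succeeds because the two invalid lanes are masked out, while `gather_then_select` returns `None` because it still loads from them. When every index is valid, the two functions return the same result.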

Contributor Author

True, thanks for the reminder.
llvm.x86.avx512.mask.gather3siv4.sf requires i1, so I need to remove those mask gather functions.

@Amanieu Amanieu merged commit 11fd33d into rust-lang:master Feb 10, 2021
@minybot minybot deleted the avx512_avx512vl branch February 10, 2021 13:10
3 participants