Avx512f #933
merge from base
merge base
… cvtss_i32, cvt_roundss_u32, cvt_roundss_u64, cvtss_i64, cvtss_u32, cvtss_u64
… cvtsd_i32, cvt_roundsd_u32, cvt_roundsd_u64, cvtsd_i64, cvtsd_u32, cvtsd_u64
… cvt_roundsi64_sd, cvt_roundi64_sd, cvt_roundu32_ss, cvt_roundu64_ss, cvt_roundu64_sd
…i64, cvttsd_i32, cvtt_roundsd_u32, cvtt_roundsd_u64, cvttsd_i64, cvttsd_u32, cvttsd_u64, cvtt_roundss_si32, cvtt_roundss_i32, cvtt_roundss_si64, cvtt_roundss_i64, cvttss_i32, cvtt_roundss_u32, cvtt_roundss_u64, cvttss_i64, cvttss_u32, cvttss_u64;
r? @Amanieu (rust_highfive has picked a reviewer for you, use r? to override)
It seems that "mm_cvt_roundss_si64()" ..., and the other i64/si64 variants cause a test error on i686-unknown-linux-gnu.
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=mm_mask_fmsub_ss&expand=2668)
#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(vfmadd213ss))] // should be vfmsub213ss
This seems wrong: it should be generating the vfmsub213ss instruction. Maybe you need to invoke the subtract intrinsic directly instead of negating extractc.
I followed the LLVM code, which uses llvm.fma.f32. I think vfmadd213ss and vfmsub213ss have the same latency, and vfmadd213ss comes first during code generation.
__m128 test_mm_mask3_fmsub_ss(__m128 __W, __m128 __X, __m128 __Y, __mmask8 __U){
// CHECK-LABEL: @test_mm_mask3_fmsub_ss
// CHECK: [[NEG:%.+]] = fneg <4 x float> [[ORIGC:%.+]]
// CHECK: [[A:%.+]] = extractelement <4 x float> %{{.*}}, i64 0
// CHECK-NEXT: [[B:%.+]] = extractelement <4 x float> %{{.*}}, i64 0
// CHECK-NEXT: [[C:%.+]] = extractelement <4 x float> [[NEG]], i64 0
// CHECK-NEXT: [[FMA:%.+]] = call float @llvm.fma.f32(float [[A]], float [[B]], float [[C]])
// CHECK-NEXT: [[C2:%.+]] = extractelement <4 x float> [[ORIGC]], i64 0
// CHECK-NEXT: bitcast i8 %{{.*}} to <8 x i1>
// CHECK-NEXT: extractelement <8 x i1> %{{.*}}, i64 0
// CHECK-NEXT: [[SEL:%.+]] = select i1 %{{.*}}, float [[FMA]], float [[C2]]
// CHECK-NEXT: insertelement <4 x float> [[ORIGC]], float [[SEL]], i64 0
return _mm_mask3_fmsub_ss(__W, __X, __Y, __U);
}
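For readers less familiar with LLVM IR, the pattern checked above can be modeled in plain scalar Rust. This is only a sketch under stated assumptions: the function name `mask3_fmsub_ss_model` and the `[f32; 4]` representation of `__m128` are hypothetical and are not the stdarch implementation.

```rust
/// Hypothetical scalar model of the IR pattern in the test above:
/// negate lane 0 of `c`, fuse a*b + (-c), then select on mask bit 0.
/// `[f32; 4]` stands in for `__m128`; this is not the stdarch code.
fn mask3_fmsub_ss_model(a: [f32; 4], b: [f32; 4], c: [f32; 4], k: u8) -> [f32; 4] {
    // fneg on c, then llvm.fma.f32(a, b, -c) on the lowest lane.
    let fma = a[0].mul_add(b[0], -c[0]);

    // Lanes 1..3 always come from c, and so does lane 0 when the mask bit is clear.
    let mut out = c;
    if k & 1 != 0 {
        out[0] = fma;
    }
    out
}
```

With `a[0] = 2.0`, `b[0] = 3.0`, `c[0] = 1.0`, a set mask bit gives `5.0` in lane 0, and a cleared mask bit leaves `1.0` from `c`.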
I think the problem is that you are using 0. - extractc instead of -extractc. The former is a subtraction while the latter is a negation. They have different behavior in edge cases like negative zero.
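The negative-zero edge case can be seen with a small scalar sketch. The helper name `sub_from_zero_vs_neg` is hypothetical and exists only for illustration. It assumes the default IEEE 754 round-to-nearest mode.

```rust
/// Returns `(0.0 - x, -x)` so the two forms can be compared side by side.
/// Hypothetical helper for illustration only.
fn sub_from_zero_vs_neg(x: f32) -> (f32, f32) {
    (0.0 - x, -x)
}

/// True when the two forms produce different bit patterns for `x`.
fn forms_differ(x: f32) -> bool {
    let (sub, neg) = sub_from_zero_vs_neg(x);
    sub.to_bits() != neg.to_bits()
}
```

For `x = 0.0`, the subtraction yields positive zero while the negation yields negative zero. The two values compare equal with `==`, but their sign bits differ, which is presumably why the backend cannot treat `0. - extractc` as a plain negation and fold it into an fmsub.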
mask_fmsub: ss,sd; fmsub_round: ss,sd;
fnmadd: ss,sd; fnmsub: ss,sd; fnmadd_round: ss,sd; fnmsub_round: ss,sd;
fixupimm: ss,sd; fixupimm_round: ss,sd;
cvt_roundss_sd, cvtss_sd, cvtroundsd_ss, cvtsd_ss;
cvt_roundss_si32, cvt_roundss_i32, cvt_roundss_si64, cvt_roundss_i64, cvtss_i32, cvt_roundss_u32, cvt_roundss_u64, cvtss_i64, cvtss_u32, cvtss_u64;
cvt_roundsd_si32, cvt_roundsd_i32, cvt_roundsd_si64, cvt_roundsd_i64, cvtsd_i32, cvt_roundsd_u32, cvt_roundsd_u64, cvtsd_i64, cvtsd_u32, cvtsd_u64;
cvt_roundsi32_ss, cvt_roundi32_ss, cvt_roundsi64_ss, cvt_roundi64_ss, cvt_roundsi64_sd, cvt_roundi64_sd, cvt_roundu32_ss, cvt_roundu64_ss, cvt_roundu64_sd;
cvti32_ss; cvti32_sd; cvti64_ss; cvti64_sd;
cvtt_roundsd_si32, cvtt_roundsd_i32, cvtt_roundsd_si64, cvtt_roundsd_i64, cvttsd_i32, cvtt_roundsd_u32, cvtt_roundsd_u64, cvttsd_i64, cvttsd_u32, cvttsd_u64, cvtt_roundss_si32, cvtt_roundss_i32, cvtt_roundss_si64, cvtt_roundss_i64, cvttss_i32, cvtt_roundss_u32, cvtt_roundss_u64, cvttss_i64, cvttss_u32, cvttss_u64;
cvtu32_ss, cvtu32_sd, cvtu64_ss, cvtu64_sd;
mm_comi_ss; mm_comi_sd;