
Avx512f #933

Merged
merged 25 commits into from
Nov 7, 2020


minybot
Contributor

@minybot minybot commented Oct 19, 2020

mask_fmsub: ss,sd; fmsub_round: ss,sd;
fnmadd: ss,sd; fnmsub: ss,sd; fnmadd_round: ss,sd; fnmsub_round: ss,sd;
fixupimm: ss,sd; fixupimm_round: ss,sd;
cvt_roundss_sd, cvtss_sd, cvtroundsd_ss, cvtsd_ss;
cvt_roundss_si32, cvt_roundss_i32, cvt_roundss_si64, cvt_roundss_i64, cvtss_i32, cvt_roundss_u32, cvt_roundss_u64, cvtss_i64, cvtss_u32, cvtss_u64;
cvt_roundsd_si32, cvt_roundsd_i32, cvt_roundsd_si64, cvt_roundsd_i64, cvtsd_i32, cvt_roundsd_u32, cvt_roundsd_u64, cvtsd_i64, cvtsd_u32, cvtsd_u64;
cvt_roundsi32_ss, cvt_roundi32_ss, cvt_roundsi64_ss, cvt_roundi64_ss, cvt_roundsi64_sd, cvt_roundi64_sd, cvt_roundu32_ss, cvt_roundu64_ss, cvt_roundu64_sd;
cvti32_ss; cvti32_sd; cvti64_ss; cvti64_sd;
cvtt_roundsd_si32, cvtt_roundsd_i32, cvtt_roundsd_si64, cvtt_roundsd_i64, cvttsd_i32, cvtt_roundsd_u32, cvtt_roundsd_u64, cvttsd_i64, cvttsd_u32, cvttsd_u64, cvtt_roundss_si32, cvtt_roundss_i32, cvtt_roundss_si64, cvtt_roundss_i64, cvttss_i32, cvtt_roundss_u32, cvtt_roundss_u64, cvttss_i64, cvttss_u32, cvttss_u64;
cvtu32_ss, cvtu32_sd, cvtu64_ss, cvtu64_sd;
mm_comi_ss; mm_comi_sd;

@rust-highfive

r? @Amanieu

(rust_highfive has picked a reviewer for you, use r? to override)

@minybot
Contributor Author

minybot commented Oct 20, 2020

It seems that "mm_cvt_roundss_si64()" and the other i64/si64 variants cause test errors on i686-unknown-linux-gnu.

/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=mm_mask_fmsub_ss&expand=2668)
#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(vfmadd213ss))] //should be vfmsub213ss
Member


This seems wrong: it should be generating the vfmsub213ss instruction. Maybe you need to invoke the subtract intrinsic directly instead of negating extractc.

Contributor Author


I followed the LLVM code, which uses llvm.fma.f32. I think vfmadd213ss and vfmsub213ss have the same latency, and the vfmadd form is selected first during code generation.

__m128 test_mm_mask3_fmsub_ss(__m128 __W, __m128 __X, __m128 __Y, __mmask8 __U){
// CHECK-LABEL: @test_mm_mask3_fmsub_ss
// CHECK: [[NEG:%.+]] = fneg <4 x float> [[ORIGC:%.+]]
// CHECK: [[A:%.+]] = extractelement <4 x float> %{{.*}}, i64 0
// CHECK-NEXT: [[B:%.+]] = extractelement <4 x float> %{{.*}}, i64 0
// CHECK-NEXT: [[C:%.+]] = extractelement <4 x float> [[NEG]], i64 0
// CHECK-NEXT: [[FMA:%.+]] = call float @llvm.fma.f32(float [[A]], float [[B]], float [[C]])
// CHECK-NEXT: [[C2:%.+]] = extractelement <4 x float> [[ORIGC]], i64 0
// CHECK-NEXT: bitcast i8 %{{.*}} to <8 x i1>
// CHECK-NEXT: extractelement <8 x i1> %{{.*}}, i64 0
// CHECK-NEXT: [[SEL:%.+]] = select i1 %{{.*}}, float [[FMA]], float [[C2]]
// CHECK-NEXT: insertelement <4 x float> [[ORIGC]], float [[SEL]], i64 0
return _mm_mask3_fmsub_ss(__W, __X, __Y, __U);
}
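
Read as scalar arithmetic, the CHECK lines above describe lane 0 only: negate c, compute a fused multiply-add, then use bit 0 of the mask to choose between that result and the original c. A minimal Rust sketch of that model follows. It uses a plain `[f32; 4]` in place of `__m128`, and the name `mask3_fmsub_ss_model` is hypothetical; it is an illustration of the IR, not the stdarch implementation.

```rust
/// Scalar model of the IR checked above; `[f32; 4]` stands in for `__m128`.
/// Hypothetical helper for illustration, not the stdarch intrinsic.
fn mask3_fmsub_ss_model(a: [f32; 4], b: [f32; 4], c: [f32; 4], k: u8) -> [f32; 4] {
    // fneg + llvm.fma.f32: a[0] * b[0] + (-c[0]) with a single rounding.
    let fma = a[0].mul_add(b[0], -c[0]);
    // select on mask bit 0, then insertelement into the original c.
    let mut out = c;
    if k & 1 != 0 {
        out[0] = fma;
    }
    out
}

fn main() {
    let a = [2.0, 9.0, 9.0, 9.0];
    let b = [3.0, 8.0, 8.0, 8.0];
    let c = [1.0, 7.0, 7.0, 7.0];
    // Mask bit 0 set: lane 0 becomes a[0] * b[0] - c[0]; upper lanes come from c.
    println!("{:?}", mask3_fmsub_ss_model(a, b, c, 1));
    // Mask bit 0 clear: c is returned unchanged.
    println!("{:?}", mask3_fmsub_ss_model(a, b, c, 0));
}
```

The negation here is written as `-c[0]`, matching the `fneg` in the IR.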

Member


I think the problem is that you are using 0. - extractc instead of -extractc. The former is a subtraction while the latter is a negation. They have different behavior in edge cases like negative zero.
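
The negative-zero edge case can be reproduced with plain `f32` values. The sketch below is only an illustration: `negate` and `zero_minus` are hypothetical helper names, and the code assumes the default round-to-nearest mode. It shows that `-c` and `0.0 - c` agree for ordinary inputs but differ in sign when `c` is `+0.0`, and that the difference survives a fused multiply-add.

```rust
/// Negation: only flips the sign bit, so +0.0 becomes -0.0.
fn negate(c: f32) -> f32 {
    -c
}

/// Subtraction from zero: 0.0 - 0.0 is +0.0 under round-to-nearest.
fn zero_minus(c: f32) -> f32 {
    0.0 - c
}

fn main() {
    let c = 0.0_f32;
    assert!(negate(c).is_sign_negative());
    assert!(zero_minus(c).is_sign_positive());

    // The difference propagates through an FMA when a * b is -0.0.
    let (a, b) = (0.0_f32, -1.0_f32);
    let with_neg = a.mul_add(b, negate(c)); // (-0.0) + (-0.0)
    let with_sub = a.mul_add(b, zero_minus(c)); // (-0.0) + (+0.0)
    assert!(with_neg.is_sign_negative());
    assert!(with_sub.is_sign_positive());

    // For ordinary non-zero inputs the two forms agree.
    assert_eq!(negate(1.5), zero_minus(1.5));
}
```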

@Amanieu Amanieu merged commit 2acca02 into rust-lang:master Nov 7, 2020
@minybot minybot deleted the avx512f branch November 9, 2020 11:22