Skip to content

[clang][CodeGen] Propagate pragma set fast-math flags to floating point builtins #90377

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 4, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 5 additions & 4 deletions clang/lib/CodeGen/CGBuiltin.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -513,8 +513,8 @@ static Value *emitBinaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));

CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
if (CGF.Builder.getIsFPConstrained()) {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID, Src0->getType());
return CGF.Builder.CreateConstrainedFPCall(F, { Src0, Src1 });
} else {
Expand All @@ -530,8 +530,8 @@ static Value *emitBinaryExpMaybeConstrainedFPBuiltin(
llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));

CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
if (CGF.Builder.getIsFPConstrained()) {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID,
{Src0->getType(), Src1->getType()});
return CGF.Builder.CreateConstrainedFPCall(F, {Src0, Src1});
Expand All @@ -551,8 +551,8 @@ static Value *emitTernaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
llvm::Value *Src2 = CGF.EmitScalarExpr(E->getArg(2));

CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
if (CGF.Builder.getIsFPConstrained()) {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID, Src0->getType());
return CGF.Builder.CreateConstrainedFPCall(F, { Src0, Src1, Src2 });
} else {
Expand Down Expand Up @@ -704,6 +704,7 @@ static Value *EmitSignBit(CodeGenFunction &CGF, Value *V) {

static RValue emitLibraryCall(CodeGenFunction &CGF, const FunctionDecl *FD,
const CallExpr *E, llvm::Constant *calleeValue) {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
CGCallee callee = CGCallee::forDirect(calleeValue, GlobalDecl(FD));
return CGF.EmitCall(E->getCallee()->getType(), callee, E, ReturnValueSlot());
}
Expand Down Expand Up @@ -2660,7 +2661,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
if (OP.hasMathErrnoOverride())
ErrnoOverriden = OP.getMathErrnoOverride();
}
// True if 'atttibute__((optnone)) is used. This attibute overrides
// True if 'attribute__((optnone))' is used. This attribute overrides
// fast-math which implies math-errno.
bool OptNone = CurFuncDecl && CurFuncDecl->hasAttr<OptimizeNoneAttr>();

Expand Down
6 changes: 3 additions & 3 deletions clang/test/CodeGen/math-errno.c
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ float f1(float x) {
// CHECK: tail call float @sqrtf(float noundef {{.*}}) #[[ATTR4_O2:[0-9]+]]

// FAST-LABEL: define {{.*}} nofpclass(nan inf) float @f1
// FAST: call fast nofpclass(nan inf) float @sqrtf(float noundef nofpclass(nan inf) {{.*}}) #[[ATTR3_FAST:[0-9]+]]
// FAST: call nofpclass(nan inf) float @sqrtf(float noundef nofpclass(nan inf) {{.*}}) #[[ATTR3_FAST:[0-9]+]]

// NOOPT-LABEL: define {{.*}} float @f1
// NOOPT: call float @sqrtf(float noundef {{.*}}) #[[ATTR4_NOOPT:[0-9]+]]
Expand All @@ -44,7 +44,7 @@ float f2(float x) {
// FAST: call fast float @llvm.sqrt.f32(float {{.*}})

// NOOPT-LABEL: define {{.*}} float @f2
// NOOPT: call float @sqrtf(float {{.*}}) #[[ATTR4_NOOPT:[0-9]+]]
// NOOPT: call fast float @sqrtf(float {{.*}}) #[[ATTR4_NOOPT:[0-9]+]]

__attribute__((optnone))
float f3(float x) {
Expand All @@ -56,7 +56,7 @@ float f3(float x) {
// CHECK: call float @sqrtf(float noundef {{.*}})

// FAST-LABEL: define {{.*}} nofpclass(nan inf) float @f3
// FAST: call fast nofpclass(nan inf) float @sqrtf(float noundef nofpclass(nan inf) {{.*}}) #[[ATTR4_FAST:[0-9]+]]
// FAST: call nofpclass(nan inf) float @sqrtf(float noundef nofpclass(nan inf) {{.*}}) #[[ATTR4_FAST:[0-9]+]]

// NOOPT-LABEL: define {{.*}} float @f3
// NOOPT: call float @sqrtf(float noundef %0) #[[ATTR4_NOOPT:[0-9]+]]
Expand Down
76 changes: 76 additions & 0 deletions clang/test/CodeGen/pr87758.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s

// precise mode
// RUN: %clang_cc1 -triple x86_64-linux-gnu -fmath-errno -ffp-contract=on \
// RUN: -fno-rounding-math -emit-llvm -o - %s | FileCheck \
// RUN: --check-prefix=CHECK-PRECISE %s

// fast mode
// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffast-math -ffp-contract=fast \
// RUN: -emit-llvm -o - %s | FileCheck --check-prefix=CHECK-FAST %s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a run line with ffast-math option?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will add that.

// Reproducer for issue #87758
// The testcase below verifies that the "fast" flag are set on the calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using these run lines?
// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s

// precise mode
// RUN: %clang_cc1 -triple x86_64-linux-gnu -fmath-errno -ffp-contract=on \
// RUN: -fno-rounding-math -emit-llvm -o - %s | FileCheck %s

// fast mode
// RUN %clang_cc1 -triple x86_64-linux-gnu -ffast-math -ffp-contract=fast \
// RUN: -emit-llvm -o - %s | FileCheck %s
You might need to use different prefixes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I update the testcase with only those run lines.

float sqrtf(float x); // unary fp builtin
float powf(float x, float y); // binary fp builtin
float fmaf(float x, float y, float z); // ternary fp builtin
char *rindex(const char *s, int c); // not a fp builtin

#pragma float_control(push)
#pragma float_control(precise, off)
// CHECK: define dso_local float @fp_precise_off_libm_calls(
// CHECK: call fast float @llvm.sqrt.f32(
// CHECK: call fast float @llvm.pow.f32(
// CHECK: call fast float @llvm.fma.f32(
// CHECK: call ptr @rindex(

// CHECK-PRECISE: define dso_local float @fp_precise_off_libm_calls(
// CHECK-PRECISE: call fast float @sqrtf(
// CHECK-PRECISE: call fast float @powf(
// CHECK-PRECISE: call fast float @llvm.fma.f32(
// CHECK-PRECISE: call ptr @rindex(

// CHECK-FAST: define dso_local nofpclass(nan inf) float @fp_precise_off_libm_calls(
// CHECK-FAST: call fast float @llvm.sqrt.f32(
// CHECK-FAST: call fast float @llvm.pow.f32(
// CHECK-FAST: call fast float @llvm.fma.f32(
// CHECK-FAST: call ptr @rindex(

float fp_precise_off_libm_calls(float a, float b, float c, const char *d, char *e, unsigned char f) {
a = sqrtf(a);
a = powf(a,b);
a = fmaf(a,b,c);
e = rindex(d, 75);
return a;
}
#pragma float_control(pop)

#pragma float_control(push)
#pragma float_control(precise, on)
// CHECK: define dso_local float @fp_precise_on_libm_calls(
// CHECK: call float @sqrtf(
// CHECK: call float @powf(
// CHECK: call float @llvm.fma.f32(
// CHECK: call ptr @rindex(

// CHECK-PRECISE: define dso_local float @fp_precise_on_libm_calls(
// CHECK-PRECISE: call float @sqrtf(
// CHECK-PRECISE: call float @powf(
// CHECK-PRECISE: call float @llvm.fma.f32(
// CHECK-PRECISE: call ptr @rindex(

// CHECK-FAST: define dso_local nofpclass(nan inf) float @fp_precise_on_libm_calls(
// CHECK-FAST: call nofpclass(nan inf) float @sqrtf(
// CHECK-FAST: call nofpclass(nan inf) float @powf(
// CHECK-FAST: call float @llvm.fma.f32(
// CHECK-FAST: call ptr @rindex(

float fp_precise_on_libm_calls(float a, float b, float c, const char *d, char *e, unsigned char f) {
a = sqrtf(a);
a = powf(a,b);
a = fmaf(a,b,c);
e = rindex(d, 75);
return a;
}
#pragma float_control(pop)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also use the same test with #pragma float_control(precise,on) and confirm that the fast flags are not set.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By combining #pragma float_control(precise,on) and -ffast-math to the testcase it revealed additional bugs, as the "fast" flag was still set. I tried to fix this by changing the function emitLibraryCall() but I'm a bit unsure if this is a correct fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the patch with your change in emitLibraryCall() and it seems to be fixing at least one known issue.
For the code here (using your 3 functions from this PR): https://godbolt.org/z/fzsGce76x

We can see that with -ffp-model=fast all function calls in foof function are correctly marked with the fast flag, but in foop they are incorrectly marked with the fast flag.
When compiled with ffp-model=precise some function calls are missing the fast flag in foof.
With your patch all these issues are corrected. Please confirm that. And if it's the case, then I think you can keep your last change and add more testing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the testcase with more RUN: lines and also added a libc call to rindex that have nothing to do with floats to verify that it never get the "fast" flag.