-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[clang][CodeGen] Propagate pragma set fast-math flags to floating point builtins #90377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Adding test case related to issue llvm#87758 The testcase show the faulty behavior where the calls to llvm.pow.f32 and llvm.fma.f32 are not attributed with the "fast" flag.
…nt builtins This is a fix for the issue llvm#87758 where fast-math flags are not propagated all builtins. It seems like pragmas with fast math flags was only propagated to calls of unary floating point builtins. This patch propagate them also for binary and ternary floating point builtins.
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Karl-Johan Karlsson (karka228) ChangesThis is a fix for the issue #87758 where fast-math flags are not propagated all builtins. It seems like pragmas with fast math flags was only propagated to calls of unary floating point builtins. This patch propagate them also for binary and ternary floating point builtins. Full diff: https://github.com/llvm/llvm-project/pull/90377.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index d08ab539148914..d61a63ac61572d 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -513,8 +513,8 @@ static Value *emitBinaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
+ CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
if (CGF.Builder.getIsFPConstrained()) {
- CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID, Src0->getType());
return CGF.Builder.CreateConstrainedFPCall(F, { Src0, Src1 });
} else {
@@ -530,8 +530,8 @@ static Value *emitBinaryExpMaybeConstrainedFPBuiltin(
llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
+ CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
if (CGF.Builder.getIsFPConstrained()) {
- CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID,
{Src0->getType(), Src1->getType()});
return CGF.Builder.CreateConstrainedFPCall(F, {Src0, Src1});
@@ -551,8 +551,8 @@ static Value *emitTernaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
llvm::Value *Src2 = CGF.EmitScalarExpr(E->getArg(2));
+ CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
if (CGF.Builder.getIsFPConstrained()) {
- CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID, Src0->getType());
return CGF.Builder.CreateConstrainedFPCall(F, { Src0, Src1, Src2 });
} else {
diff --git a/clang/test/CodeGen/pr87758.c b/clang/test/CodeGen/pr87758.c
new file mode 100644
index 00000000000000..05b3232986e0a6
--- /dev/null
+++ b/clang/test/CodeGen/pr87758.c
@@ -0,0 +1,28 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -emit-llvm -o - %s \
+// RUN: | FileCheck -check-prefix=CHECK %s
+
+// Reproducer for issue #87758
+// The testcase below verifies that the "fast" flag are set on the calls.
+
+float sqrtf(float x);
+float powf(float x, float y);
+float fmaf(float x, float y, float z);
+
+#pragma float_control(push)
+#pragma float_control(precise, off)
+// CHECK-LABEL: define dso_local float @fp_precise_libm_calls(
+// CHECK-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call fast float @llvm.sqrt.f32(float [[A]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.pow.f32(float [[TMP0]], float [[B]])
+// CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.fma.f32(float [[TMP1]], float [[B]], float [[C]])
+// CHECK-NEXT: ret float [[TMP2]]
+//
+float fp_precise_libm_calls(float a, float b, float c) {
+ a = sqrtf(a);
+ a = powf(a,b);
+ a = fmaf(a,b,c);
+ return a;
+}
+#pragma float_control(pop)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks right to me, but @zahiraam is more familiar with this code than I am. Zahira, do you have any comments?
clang/test/CodeGen/pr87758.c
Outdated
@@ -0,0 +1,28 @@ | |||
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4 | |||
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -emit-llvm -o - %s \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to use -O3 here to prove what you want, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason for using -O3
was to get rid of all the allocas, loads and stores in the output that didn't contributed to the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to put every // CHECK line. Since we are interested only in the flag associated with the calls to the builtins you can just put those line.
a = fmaf(a,b,c); | ||
return a; | ||
} | ||
#pragma float_control(pop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also use the same test with #pragma float_control(precise,on)
and confirm that the fast flags are not set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By combining #pragma float_control(precise,on)
and -ffast-math
to the testcase it revealed additional bugs, as the "fast" flag was still set. I tried to fix this by changing the function emitLibraryCall() but I'm a bit unsure if this is a correct fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the patch with your change in emitLibraryCall()
and it seems to be fixing at least one known issue.
For the code here (using your 3 functions from this PR): https://godbolt.org/z/fzsGce76x
We can see that with -ffp-model=fast
all function calls in foof
function are correctly marked with the fast
flag, but in foop
they are incorrectly marked with the fast
flag.
When compiled with ffp-model=precise
some function calls are missing the fast
flag in foof
.
With your patch all these issues are corrected. Please confirm that. And if it's the case, then I think you can keep your last change and add more testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the testcase with more RUN: lines and also added a libc call to rindex that have nothing to do with floats to verify that it never get the "fast" flag.
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4 | ||
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -emit-llvm -o - %s \ | ||
// RUN: | FileCheck -check-prefix=CHECK %s | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a run line with ffast-math
option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will add that.
…nt builtins Minor fix in emitLibraryCall() and updated testcase.
…nt builtins Updated testcase.
clang/test/CodeGen/pr87758.c
Outdated
@@ -0,0 +1,55 @@ | |||
// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s | |||
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s | |||
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -fmath-errno -ffp-contract=on -fno-rounding-math -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is equivalent to line 7.
clang/test/CodeGen/pr87758.c
Outdated
// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s | ||
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s | ||
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -fmath-errno -ffp-contract=on -fno-rounding-math -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s | ||
// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffast-math -ffp-contract=fast -emit-llvm -o - %s | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is equivalent to line 8.
|
||
// Reproducer for issue #87758 | ||
// The testcase below verifies that the "fast" flag are set on the calls. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using these run lines?
// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s
// precise mode
// RUN: %clang_cc1 -triple x86_64-linux-gnu -fmath-errno -ffp-contract=on \
// RUN: -fno-rounding-math -emit-llvm -o - %s | FileCheck %s
// fast mode
// RUN %clang_cc1 -triple x86_64-linux-gnu -ffast-math -ffp-contract=fast \
// RUN: -emit-llvm -o - %s | FileCheck %s
You might need to use different prefixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I update the testcase with only those run lines.
…nt builtins Updated testcase.
clang/test/CodeGen/pr87758.c
Outdated
#pragma float_control(push) | ||
#pragma float_control(precise, off) | ||
// CHECK: define dso_local float @fp_precise_off_libm_calls( | ||
// CHECK: %{{.*}} = call fast float @llvm.sqrt.f32( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the %{{.*}} = before these calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not any longer. I'll change the testcase.
…nt builtins Updated testcase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
I will close the issue #87758. Thanks @andykaylor and @zahiraam for reviewing. |
This is a fix for the issue #87758 where fast-math flags are not propagated all builtins.
It seems like pragmas with fast math flags was only propagated to calls of unary floating point builtins. This patch propagate them also for binary and ternary floating point builtins.