Skip to content

[clang][CodeGen] Propagate pragma set fast-math flags to floating point builtins #90377

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 4, 2024

Conversation

karka228
Copy link
Collaborator

This is a fix for the issue #87758 where fast-math flags are not propagated all builtins.

It seems like pragmas with fast math flags was only propagated to calls of unary floating point builtins. This patch propagate them also for binary and ternary floating point builtins.

Adding test case related to issue llvm#87758

The testcase show the faulty behavior where the calls to llvm.pow.f32
and llvm.fma.f32 are not attributed with the "fast" flag.
…nt builtins

This is a fix for the issue llvm#87758 where fast-math flags are not propagated
all builtins.

It seems like pragmas with fast math flags was only propagated to calls of
unary floating point builtins. This patch propagate them also for binary and
ternary floating point builtins.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Apr 28, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 28, 2024

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Karl-Johan Karlsson (karka228)

Changes

This is a fix for the issue #87758 where fast-math flags are not propagated all builtins.

It seems like pragmas with fast math flags was only propagated to calls of unary floating point builtins. This patch propagate them also for binary and ternary floating point builtins.


Full diff: https://github.com/llvm/llvm-project/pull/90377.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+3-3)
  • (added) clang/test/CodeGen/pr87758.c (+28)
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index d08ab539148914..d61a63ac61572d 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -513,8 +513,8 @@ static Value *emitBinaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
 
+  CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
   if (CGF.Builder.getIsFPConstrained()) {
-    CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
     Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID, Src0->getType());
     return CGF.Builder.CreateConstrainedFPCall(F, { Src0, Src1 });
   } else {
@@ -530,8 +530,8 @@ static Value *emitBinaryExpMaybeConstrainedFPBuiltin(
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
 
+  CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
   if (CGF.Builder.getIsFPConstrained()) {
-    CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
     Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID,
                                        {Src0->getType(), Src1->getType()});
     return CGF.Builder.CreateConstrainedFPCall(F, {Src0, Src1});
@@ -551,8 +551,8 @@ static Value *emitTernaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
   llvm::Value *Src2 = CGF.EmitScalarExpr(E->getArg(2));
 
+  CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
   if (CGF.Builder.getIsFPConstrained()) {
-    CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
     Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID, Src0->getType());
     return CGF.Builder.CreateConstrainedFPCall(F, { Src0, Src1, Src2 });
   } else {
diff --git a/clang/test/CodeGen/pr87758.c b/clang/test/CodeGen/pr87758.c
new file mode 100644
index 00000000000000..05b3232986e0a6
--- /dev/null
+++ b/clang/test/CodeGen/pr87758.c
@@ -0,0 +1,28 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -emit-llvm -o - %s \
+// RUN: | FileCheck -check-prefix=CHECK %s
+
+// Reproducer for issue #87758
+// The testcase below verifies that the "fast" flag are set on the calls.
+
+float sqrtf(float x);
+float powf(float x, float y);
+float fmaf(float x, float y, float z);
+
+#pragma float_control(push)
+#pragma float_control(precise, off)
+// CHECK-LABEL: define dso_local float @fp_precise_libm_calls(
+// CHECK-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call fast float @llvm.sqrt.f32(float [[A]])
+// CHECK-NEXT:    [[TMP1:%.*]] = tail call fast float @llvm.pow.f32(float [[TMP0]], float [[B]])
+// CHECK-NEXT:    [[TMP2:%.*]] = tail call fast float @llvm.fma.f32(float [[TMP1]], float [[B]], float [[C]])
+// CHECK-NEXT:    ret float [[TMP2]]
+//
+float fp_precise_libm_calls(float a, float b, float c) {
+  a = sqrtf(a);
+  a = powf(a,b);
+  a = fmaf(a,b,c);
+  return a;
+}
+#pragma float_control(pop)

@karka228 karka228 requested a review from andykaylor April 28, 2024 08:45
@andykaylor andykaylor requested a review from zahiraam April 29, 2024 17:15
Copy link
Contributor

@andykaylor andykaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks right to me, but @zahiraam is more familiar with this code than I am. Zahira, do you have any comments?

@@ -0,0 +1,28 @@
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -emit-llvm -o - %s \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to use -O3 here to prove what you want, I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason for using -O3 was to get rid of all the allocas, loads and stores in the output that didn't contributed to the test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to put every // CHECK line. Since we are interested only in the flag associated with the calls to the builtins you can just put those line.

a = fmaf(a,b,c);
return a;
}
#pragma float_control(pop)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also use the same test with #pragma float_control(precise,on) and confirm that the fast flags are not set.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By combining #pragma float_control(precise,on) and -ffast-math to the testcase it revealed additional bugs, as the "fast" flag was still set. I tried to fix this by changing the function emitLibraryCall() but I'm a bit unsure if this is a correct fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the patch with your change in emitLibraryCall() and it seems to be fixing at least one known issue.
For the code here (using your 3 functions from this PR): https://godbolt.org/z/fzsGce76x

We can see that with -ffp-model=fast all function calls in foof function are correctly marked with the fast flag, but in foop they are incorrectly marked with the fast flag.
When compiled with ffp-model=precise some function calls are missing the fast flag in foof.
With your patch all these issues are corrected. Please confirm that. And if it's the case, then I think you can keep your last change and add more testing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the testcase with more RUN: lines and also added a libc call to rindex that have nothing to do with floats to verify that it never get the "fast" flag.

// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -emit-llvm -o - %s \
// RUN: | FileCheck -check-prefix=CHECK %s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a run line with ffast-math option?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will add that.

…nt builtins

Minor fix in emitLibraryCall() and updated testcase.
@@ -0,0 +1,55 @@
// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -fmath-errno -ffp-contract=on -fno-rounding-math -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to line 7.

// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s
// RUN: %clang_cc1 -triple x86_64-linux-gnu -O3 -fmath-errno -ffp-contract=on -fno-rounding-math -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s
// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffast-math -ffp-contract=fast -emit-llvm -o - %s | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to line 8.


// Reproducer for issue #87758
// The testcase below verifies that the "fast" flag are set on the calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using these run lines?
// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s

// precise mode
// RUN: %clang_cc1 -triple x86_64-linux-gnu -fmath-errno -ffp-contract=on \
// RUN: -fno-rounding-math -emit-llvm -o - %s | FileCheck %s

// fast mode
// RUN %clang_cc1 -triple x86_64-linux-gnu -ffast-math -ffp-contract=fast \
// RUN: -emit-llvm -o - %s | FileCheck %s
You might need to use different prefixes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I update the testcase with only those run lines.

#pragma float_control(push)
#pragma float_control(precise, off)
// CHECK: define dso_local float @fp_precise_off_libm_calls(
// CHECK: %{{.*}} = call fast float @llvm.sqrt.f32(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the %{{.*}} = before these calls?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not any longer. I'll change the testcase.

Copy link
Contributor

@zahiraam zahiraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@karka228 karka228 merged commit cb015b9 into llvm:main May 4, 2024
@karka228
Copy link
Collaborator Author

karka228 commented May 5, 2024

LGTM. Thanks.

I will close the issue #87758.

Thanks @andykaylor and @zahiraam for reviewing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants