Skip to content

[PartiallyInlineLibCalls] Emit missed- and passed-optimization remarks when partially inlining sqrt #123966

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

TiborGY
Copy link
Contributor

@TiborGY TiborGY commented Jan 22, 2025

PR 2/2, closes #78785

Introduction

This builds on PR #122654 and adds both passed- and missed-optimization remarks to the PartiallyInlineLibCalls pass. This pass is currently only responsible for the partial inlining of sqrt/sqrtf C library calls, on targets that have a fast HW instruction for square root but demand that errno is set on invalid input. One such target is x86_64-unknown-linux-gnu.

About this PR

Before this PR, the pass was silent, which can be confusing if someone is looking at the asm output and finds the branch it is inserting. So I have added a passed-optimization remark.
At the same time this branch may also impede vectorization in some cases, the sqrt is often found in scientific codes with nested loops, etc. Therefore I have also added a missed-optimization remark alerting the user to the fact that the errno requirement is having an effect on optimizations.

I have also added some further missed-optimization remarks that are emitted when a C library call cannot be optimized by the pass due to strict FP exception behavior or a tail call circumstance.

Testing

I have added some regression testing to the existing tests that I could find for this pass. I am not 100% sure they are good/sufficient tests, but they are passing. Let me know if more testing is required for new opt-remarks.

Copy link

github-actions bot commented Jan 22, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@TiborGY TiborGY force-pushed the msg_pt2 branch 13 times, most recently from e8d7b20 to 4bc7e6c Compare January 23, 2025 16:23
@TiborGY TiborGY changed the title [WIP] Emit missed- and passed-optimization messages when partially inlining sqrt [PartiallyInlineLibCalls] Emit missed- and passed-optimization messages when partially inlining sqrt Jan 23, 2025
@TiborGY TiborGY changed the title [PartiallyInlineLibCalls] Emit missed- and passed-optimization messages when partially inlining sqrt [PartiallyInlineLibCalls] Emit missed- and passed-optimization remarks when partially inlining sqrt Jan 23, 2025
@TiborGY TiborGY marked this pull request as ready for review January 23, 2025 20:09
@llvmbot
Copy link
Member

llvmbot commented Jan 23, 2025

@llvm/pr-subscribers-llvm-transforms

Author: None (TiborGY)

Changes

PR 2/2

Introduction

This build on PR #122654 and adds both passed- and missed-optimization remarks to the PartiallyInlineLibCalls pass. This pass is currently only responsible for the partial inlining of sqrt/sqrtf C library calls, on targets that have a fast HW instruction for square root but demand that errno is set on invalid input. One such target is x86_64-unknown-linux-gnu.

About this PR

Before this PR, the pass was silent, which can be confusing if someone is looking at the asm output and finds the branch it is inserting. So I have added a passed-optimization remark.
At the same time this branch may also impede vectorization in some cases, sqrt is often found in scientific code with nested loops, etc. Therefore I have also added a missed-optimization remark alerting the user to the fact that the errno requirement is having an effect on optimizations.

I have also added some further missed-optimization remarks that are emitted when a C library call cannot be optimized by the pass due to strict FP exception behavior or a tail call circumstance.

Testing

I have added some regression testing to the existing tests that I could find for this pass. I am not 100% sure they are good/sufficient tests, but they are passing. Let me know if more testing is required for new opt-remarks.


Full diff: https://github.com/llvm/llvm-project/pull/123966.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/PartiallyInlineLibCalls.cpp (+35-3)
  • (modified) llvm/test/Transforms/PartiallyInlineLibCalls/X86/good-prototype.ll (+5-1)
  • (modified) llvm/test/Transforms/PartiallyInlineLibCalls/X86/musttail.ll (+3-1)
  • (modified) llvm/test/Transforms/PartiallyInlineLibCalls/strictfp.ll (+3-1)
diff --git a/llvm/lib/Transforms/Scalar/PartiallyInlineLibCalls.cpp b/llvm/lib/Transforms/Scalar/PartiallyInlineLibCalls.cpp
index 2b50ccdc2eeb4f..e56b561ef1f591 100644
--- a/llvm/lib/Transforms/Scalar/PartiallyInlineLibCalls.cpp
+++ b/llvm/lib/Transforms/Scalar/PartiallyInlineLibCalls.cpp
@@ -56,6 +56,20 @@ static bool optimizeSQRT(CallInst *Call, Function *CalledFunc,
   // dst = phi(v0, v1)
   //
 
+  ORE->emit([&]() {
+    return OptimizationRemark(DEBUG_TYPE, "SqrtPartiallyInlined",
+                              Call->getDebugLoc(), &CurrBB)
+           << "Partially inlined call to sqrt function despite having to use "
+              "errno for error handling: target has fast sqrt instruction";
+  });
+  ORE->emit([&]() {
+    return OptimizationRemarkMissed(DEBUG_TYPE, "BranchInserted",
+                                    Call->getDebugLoc(), &CurrBB)
+           << "Branch to library sqrt fn had to be inserted to satisfy the "
+              "current target's requirement for math functions to set errno on "
+              "invalid inputs";
+  });
+
   Type *Ty = Call->getType();
   IRBuilder<> Builder(Call->getNextNode());
 
@@ -125,11 +139,29 @@ static bool runPartiallyInlineLibCalls(Function &F, TargetLibraryInfo *TLI,
       if (!Call || !(CalledFunc = Call->getCalledFunction()))
         continue;
 
-      if (Call->isNoBuiltin() || Call->isStrictFP())
+      if (Call->isNoBuiltin())
         continue;
-
-      if (Call->isMustTailCall())
+      if (Call->isStrictFP()) {
+        ORE->emit([&]() {
+          return OptimizationRemarkMissed(DEBUG_TYPE, "StrictFloat",
+                                          Call->getDebugLoc(), &*CurrBB)
+                 << "Could not consider library function for partial inlining:"
+                    " strict FP exception behavior is active";
+        });
         continue;
+      }
+      // Partially inlining a libcall that has the musttail attribute leads to
+      // broken LLVM IR, triggering an assertion in the IR verifier.
+      // Work around that by forgoing this optimization for musttail calls.
+      if (Call->isMustTailCall()) {
+        ORE->emit([&]() {
+          return OptimizationRemarkMissed(DEBUG_TYPE, "MustTailCall",
+                                          Call->getDebugLoc(), &*CurrBB)
+                 << "Could not consider library function for partial inlining:"
+                    " must tail call";
+        });
+        continue;
+      }
 
       // Skip if function either has local linkage or is not a known library
       // function.
diff --git a/llvm/test/Transforms/PartiallyInlineLibCalls/X86/good-prototype.ll b/llvm/test/Transforms/PartiallyInlineLibCalls/X86/good-prototype.ll
index e6c2a7e629a5d6..49a34d9d4a0a6f 100644
--- a/llvm/test/Transforms/PartiallyInlineLibCalls/X86/good-prototype.ll
+++ b/llvm/test/Transforms/PartiallyInlineLibCalls/X86/good-prototype.ll
@@ -1,8 +1,12 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -S -passes=partially-inline-libcalls -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
+; RUN: opt -S -passes=partially-inline-libcalls -mtriple=x86_64-unknown-linux-gnu -pass-remarks=partially-inline-libcalls \
+; RUN:     -pass-remarks-missed=partially-inline-libcalls 2>%t < %s | FileCheck %s
+; RUN: cat %t | FileCheck %s -check-prefix=CHECK-REMARK
 
 define float @f(float %val) {
 ; CHECK-LABEL: @f(
+; CHECK-REMARK: Partially inlined call to sqrt function despite having to use errno for error handling: target has fast sqrt instruction
+; CHECK-REMARK: Branch to library sqrt fn had to be inserted to satisfy the current target's requirement for math functions to set errno on invalid inputs
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[RES:%.*]] = tail call float @sqrtf(float [[VAL:%.*]]) #[[READNONE:.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = fcmp oge float [[VAL]], 0.000000e+00
diff --git a/llvm/test/Transforms/PartiallyInlineLibCalls/X86/musttail.ll b/llvm/test/Transforms/PartiallyInlineLibCalls/X86/musttail.ll
index 65dd616b43ea69..c877990df309a0 100644
--- a/llvm/test/Transforms/PartiallyInlineLibCalls/X86/musttail.ll
+++ b/llvm/test/Transforms/PartiallyInlineLibCalls/X86/musttail.ll
@@ -1,8 +1,10 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -S -passes=partially-inline-libcalls -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
+; RUN: opt -S -passes=partially-inline-libcalls -mtriple=x86_64-unknown-linux-gnu -pass-remarks-missed=partially-inline-libcalls 2>%t < %s | FileCheck %s
+; RUN: cat %t | FileCheck %s -check-prefix=CHECK-REMARK
 
 define double @foo(double %x) {
 ; CHECK-LABEL: @foo(
+; CHECK-REMARK: Could not consider library function for partial inlining: must tail call
 ; CHECK-NEXT:    [[R:%.*]] = musttail call double @sqrt(double [[X:%.*]])
 ; CHECK-NEXT:    ret double [[R]]
 ;
diff --git a/llvm/test/Transforms/PartiallyInlineLibCalls/strictfp.ll b/llvm/test/Transforms/PartiallyInlineLibCalls/strictfp.ll
index 8d18d1969b04d3..6a58de096b84a5 100644
--- a/llvm/test/Transforms/PartiallyInlineLibCalls/strictfp.ll
+++ b/llvm/test/Transforms/PartiallyInlineLibCalls/strictfp.ll
@@ -1,7 +1,9 @@
-; RUN: opt -S -passes=partially-inline-libcalls -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
+; RUN: opt -S -passes=partially-inline-libcalls -mtriple=x86_64-unknown-linux-gnu -pass-remarks-missed=partially-inline-libcalls 2>%t < %s | FileCheck %s
+; RUN: cat %t | FileCheck %s -check-prefix=CHECK-REMARK
 
 define float @f(float %val) strictfp {
 ; CHECK-LABEL: @f
+; CHECK-REMARK: Could not consider library function for partial inlining: strict FP exception behavior is active
 ; CHECK: call{{.*}}@sqrtf
 ; CHECK-NOT: call{{.*}}@sqrtf
   %res = tail call float @sqrtf(float %val) strictfp

@TiborGY
Copy link
Contributor Author

TiborGY commented Jan 25, 2025

Tagging @arsenm since they have reviewed the first PR in the series.

@TiborGY
Copy link
Contributor Author

TiborGY commented Feb 1, 2025

Also tagging @kazutakahirata since they were suggested as a reviewer for the previous PR in the series, and are listed as an inliner maintainer.

@TiborGY
Copy link
Contributor Author

TiborGY commented Feb 14, 2025

Ping

Comment on lines 66 to 70
return OptimizationRemarkMissed(DEBUG_TYPE, "BranchInserted",
Call->getDebugLoc(), &CurrBB)
<< "Branch to library sqrt fn had to be inserted to satisfy the "
"current target's requirement for math functions to set errno on "
"invalid inputs";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if the OptimizationRemarkMissed is the best category for this. Maybe include it in the remark above, as it isn't really a missed optimization BranchInserted only happens ifSqrtPartiallyInlined also happens ?

Or maybe just use OptimizationRemark

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I can see how this could be a matter of interpretation. I have felt the need to separate the two messages as they convey two different aspects of the same event.
First, the OptimizationRemark signals that a transformation which is expected to be profitable has been done.
Second, the OptimizationRemarkMissed signals that that there is a condition inhibiting this optimization from being even better.
Splitting these enables users to have more relevant information with filtering, eg. if they are only interested in spots where LLVM is struggling to optimize something, they can look at OptimizationRemarkMissed only.
Combining the two messages or turning the second one into a second OptimizationRemark would defeat this.

Perhaps my mental model is wrong, but to me OptimizationRemark means "LLVM did something it believes to be useful" and OptimizationRemarkMissed means "LLVM cannot or will not do something that (hypothetically) could have been useful". Hence the separation.

Copy link
Contributor Author

@TiborGY TiborGY Mar 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my explanation is satisfactory, and there are no further concerns to be discussed, may I request a merge?
I do not have write access to the LLVM repo.

@TiborGY
Copy link
Contributor Author

TiborGY commented May 2, 2025

ping

return OptimizationRemark(DEBUG_TYPE, "SqrtPartiallyInlined",
Call->getDebugLoc(), &CurrBB)
<< "Partially inlined call to sqrt function despite having to use "
"errno for error handling: target has fast sqrt instruction";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't know if the target has a "fast sqrt instruction" and this shouldn't be considering it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But doesn't optimizeSQRT only get called if TTI->haveFastSqrt(Call->getType()) == true?
This is its only call site:

        if (TTI->haveFastSqrt(Call->getType()) &&
            optimizeSQRT(Call, CalledFunc, *CurrBB, BB, TTI,
                         DTU ? &*DTU : nullptr, ORE))
          break;

If the target does not have fast sqrt, the && gets short-circuited to false and optimizeSQRT is never called.
Therefore optimizeSQRT implicitly knows the HW sqrt is fast, in the context of the current codebase.
But you are right, this is quite brittle, so I have moved the remark after the condition that may call optimizeSQRT.

FWIW, I have taken the "fast sqrt instruction" language from here:
https://github.com/llvm/llvm-project/blob/9c60431b673c478606e63ff1e47860eb4eb6af09/llvm/include/llvm/Analysis/TargetTransformInfo.h#L1048C1-L1049C37

return OptimizationRemarkMissed(DEBUG_TYPE, "BranchInserted",
Call->getDebugLoc(), &CurrBB)
<< "Branch to library sqrt fn had to be inserted to satisfy the "
"current target's requirement for math functions to set errno on "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a property of the target. I also don't see why these two remarks are emitted under the same exact conditions

Copy link
Contributor Author

@TiborGY TiborGY May 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, target may not be the best word for it indeed, but the docs are using a similar language:

On some targets, math library functions never set errno, and so -fno-math-errno is the default. This includes most BSD-derived systems, including Darwin.

In guess I could say target triplet instead of target?

PS. I have created two remarks so I can split them into an optimization passed/missed pair. In my mental model passed means "LLVM did some profitable transformation" and missed means "something is inhibiting an optimization that could have been profitable" or "LLVM thinks something is not profitable". In this case, applying the partial inlining with the branch is simultaneously both: profitable but could have been better if setting errno wasn't a requirement.
Separating the two aspects enables the user to filter the remark output and e.g. look at only the missed optimization remarks. I often use Compiler Explorer to look at what LLVM says about a piece of code, and I find the ability to filter by remark type useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Partially-inline-libcalls should probably issue a missed-optimization message if it inserts a branch
4 participants