Skip to content

[llvm-profgen] Filter out ambiguous cold profiles during profile generation #81803

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 16, 2024

Conversation

wlei-llvm
Copy link
Contributor

@wlei-llvm wlei-llvm commented Feb 14, 2024

For the built-in local initialization function(__cxx_global_var_init, __tls_init prefix), there could be multiple versions of the functions in the final binary, e.g. __cxx_global_var_init, which is a wrapper of global variable ctors, the compiler could assign suffixes like __cxx_global_var_init.N for different ctors.
However, in the profile generation, we call getCanonicalFnName to canonicalize the names which strip the suffixes. Therefore, samples from different functions queries the same profile(only __cxx_global_var_init) and the counts are merged. As the functions are essentially different, entries of the merged profile are ambiguous. In sample loading, for each version of this function, the IR from one version would be attributed towards a merged entries, which is inaccurate, especially for fuzzy profile matching, it gets multiple callsites(from different function) but using to match one callsite, which mislead the matching and report a lot of false positives.
Hence, we want to filter them out from the profile map during the profile generation time. The profiles are all cold functions, it won't have perf impact.

@wlei-llvm wlei-llvm requested a review from WenleiHe February 14, 2024 23:09
@llvmbot llvmbot added PGO Profile Guided Optimizations llvm:transforms labels Feb 14, 2024
@llvmbot
Copy link
Member

llvmbot commented Feb 14, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-pgo

Author: Lei Wang (wlei-llvm)

Changes

For the local pre-defined initialization function("__cxx_global_var_init", "__tls_init" prefix), the profile is generated either from one version or the merged version, which is ambiguous. We can't attribute the profile for each version accurately, and it could mislead the profile matching, report a lot of false positives, so filter them out from the profile map during the profile generation time. The profiles are all cold functions, it won't have perf impact.


Full diff: https://github.com/llvm/llvm-project/pull/81803.diff

6 Files Affected:

  • (modified) llvm/include/llvm/ProfileData/SampleProf.h (+4-2)
  • (modified) llvm/lib/Transforms/IPO/SampleProfile.cpp (+1-1)
  • (added) llvm/test/tools/llvm-profgen/Inputs/filter-ambiguous-profile.prof (+16)
  • (added) llvm/test/tools/llvm-profgen/filter-ambiguous-profile.test (+8)
  • (modified) llvm/tools/llvm-profgen/ProfileGenerator.cpp (+35)
  • (modified) llvm/tools/llvm-profgen/ProfileGenerator.h (+4)
diff --git a/llvm/include/llvm/ProfileData/SampleProf.h b/llvm/include/llvm/ProfileData/SampleProf.h
index 8ac84d4b933f20..5530abe37fc2a9 100644
--- a/llvm/include/llvm/ProfileData/SampleProf.h
+++ b/llvm/include/llvm/ProfileData/SampleProf.h
@@ -466,7 +466,7 @@ struct SampleContextFrame {
   LineLocation Location;
 
   SampleContextFrame() : Location(0, 0) {}
-  
+
   SampleContextFrame(FunctionId Func, LineLocation Location)
       : Func(Func), Location(Location) {}
 
@@ -527,7 +527,7 @@ class SampleContext {
       : Func(Name), State(UnknownContext), Attributes(ContextNone) {
         assert(!Name.empty() && "Name is empty");
       }
-  
+
   SampleContext(FunctionId Func)
       : Func(Func), State(UnknownContext), Attributes(ContextNone) {}
 
@@ -975,6 +975,8 @@ class FunctionSamples {
     return CallsiteSamples;
   }
 
+  CallsiteSampleMap &getCallsiteSamples() { return CallsiteSamples; }
+
   /// Return the maximum of sample counts in a function body. When SkipCallSite
   /// is false, which is the default, the return count includes samples in the
   /// inlined functions. When SkipCallSite is true, the return count only
diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index 2fd8668d15e200..5d75f52f985161 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -1129,7 +1129,7 @@ void SampleProfileLoader::findExternalInlineCandidate(
         CalleeSample->getContext().hasAttribute(ContextShouldBeInlined);
     if (!PreInline && CalleeSample->getHeadSamplesEstimate() < Threshold)
       continue;
-    
+
     Function *Func = SymbolMap.lookup(CalleeSample->getFunction());
     // Add to the import list only when it's defined out of module.
     if (!Func || Func->isDeclaration())
diff --git a/llvm/test/tools/llvm-profgen/Inputs/filter-ambiguous-profile.prof b/llvm/test/tools/llvm-profgen/Inputs/filter-ambiguous-profile.prof
new file mode 100644
index 00000000000000..a63d151e247c75
--- /dev/null
+++ b/llvm/test/tools/llvm-profgen/Inputs/filter-ambiguous-profile.prof
@@ -0,0 +1,16 @@
+foo:12345:1000
+ 1: 1000
+ 4: bar:1000
+  1: 1000
+  2: __tls_init.llvm.123:1
+   1: 1
+  3: goo:300
+   1: 300
+ 8: __cxx_global_var_init.4:4
+  1: 1
+  2: goo:3
+   1: 3
+__cxx_global_var_init.1:1:1
+ 1: 1
+__tls_init.llvm.345:1:1
+ 1: 1
diff --git a/llvm/test/tools/llvm-profgen/filter-ambiguous-profile.test b/llvm/test/tools/llvm-profgen/filter-ambiguous-profile.test
new file mode 100644
index 00000000000000..3a264e3b1108bf
--- /dev/null
+++ b/llvm/test/tools/llvm-profgen/filter-ambiguous-profile.test
@@ -0,0 +1,8 @@
+; RUN: llvm-profgen --format=text --llvm-sample-profile=%S/Inputs/filter-ambiguous-profile.prof --binary=%S/Inputs/inline-cs-noprobe.perfbin --csspgo-preinliner=0 --output=%t1 || FileCheck %s --input-file %t1
+
+;CHECK:    foo:12345:1000
+;CHECK-NEXT  1: 1000
+;CHECK-NEXT  4: bar:1000
+;CHECK-NEXT   1: 1000
+;CHECK-NEXT   3: goo:300
+;CHECK-NEXT    1: 300
diff --git a/llvm/tools/llvm-profgen/ProfileGenerator.cpp b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
index c4028e6b132871..86b0bb4749e79f 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.cpp
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
@@ -196,6 +196,39 @@ void ProfileGeneratorBase::showDensitySuggestion(double Density) {
            << "% total samples: " << format("%.1f", Density) << "\n";
 }
 
+// Those functions are usually cold but could have multiple versions, the
+// profile are either from one version or the merged version, which is
+// ambiguous. We can't attribute the profile for each version accurately, so
+// filter them out from the profile map.
+static constexpr const char *FuncPrefixsToFilter[] = {"__cxx_global_var_init",
+                                                      "__tls_init"};
+
+bool ProfileGeneratorBase::filterAmbiguousProfile(FunctionSamples &FS) {
+  for (const auto &Prefix : FuncPrefixsToFilter) {
+    if (FS.getFuncName().starts_with(Prefix))
+      return true;
+  }
+
+  // Filter the function profiles for the inlinees.
+  for (auto &Callees : FS.getCallsiteSamples()) {
+    auto &CalleesMap = Callees.second;
+    for (auto I = CalleesMap.begin(); I != CalleesMap.end();) {
+      auto Tmp = I++;
+      if (filterAmbiguousProfile(Tmp->second))
+        CalleesMap.erase(Tmp);
+    }
+  }
+  return false;
+}
+
+void ProfileGeneratorBase::filterAmbiguousProfile(SampleProfileMap &Profiles) {
+  for (auto I = ProfileMap.begin(); I != ProfileMap.end();) {
+    auto Tmp = I++;
+    if (filterAmbiguousProfile(Tmp->second))
+      ProfileMap.erase(Tmp);
+  }
+}
+
 double ProfileGeneratorBase::calculateDensity(const SampleProfileMap &Profiles,
                                               uint64_t HotCntThreshold) {
   double Density = DBL_MAX;
@@ -492,6 +525,7 @@ void ProfileGenerator::postProcessProfiles() {
   computeSummaryAndThreshold(ProfileMap);
   trimColdProfiles(ProfileMap, ColdCountThreshold);
   calculateAndShowDensity(ProfileMap);
+  filterAmbiguousProfile(ProfileMap);
 }
 
 void ProfileGenerator::trimColdProfiles(const SampleProfileMap &Profiles,
@@ -1024,6 +1058,7 @@ void CSProfileGenerator::postProcessProfiles() {
     CSConverter.convertCSProfiles();
     FunctionSamples::ProfileIsCS = false;
   }
+  filterAmbiguousProfile(ProfileMap);
 }
 
 void ProfileGeneratorBase::computeSummaryAndThreshold(
diff --git a/llvm/tools/llvm-profgen/ProfileGenerator.h b/llvm/tools/llvm-profgen/ProfileGenerator.h
index 88cf2dc0d49f37..3970876c5bcfb7 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.h
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.h
@@ -108,6 +108,10 @@ class ProfileGeneratorBase {
 
   void updateCallsiteSamples();
 
+  void filterAmbiguousProfile(SampleProfileMap &Profiles);
+
+  bool filterAmbiguousProfile(FunctionSamples &FS);
+
   StringRef getCalleeNameForAddress(uint64_t TargetAddress);
 
   void computeSummaryAndThreshold(SampleProfileMap &ProfileMap);

@@ -975,6 +975,8 @@ class FunctionSamples {
return CallsiteSamples;
}

CallsiteSampleMap &getCallsiteSamples() { return CallsiteSamples; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember we intentionally not exposing this as non-const (it was added once, and later cleaned up and removed). In generally we should be careful about exposing non-const accessor to private members.

OTOH, I'm wondering do we actually need to filter call site (inlined) samples? I think only top level functions are indistinguishable, and inlinee's samples can be distinguished and still used by their inliner?

Also __cxx_global_var_init can inlined only into tls_init, but tls_ininit probably will never be inlined anywhere?

If we really need to do this, it's probably better to use the const accessor and then cast away the constness, rather than adding an non-const API which is essentially saying it's expected to be modified in general and it's likely subject to misuse in the future.

Copy link
Contributor Author

@wlei-llvm wlei-llvm Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we intentionally removed this.. I got it, will remove this.

OTOH, I'm wondering do we actually need to filter call site (inlined) samples?
Also __cxx_global_var_init can inlined only into tls_init, but tls_ininit probably will never be inlined anywhere?

I do see tls_init appearing in the CallsiteSamples, presumably it can be inlined.
Another thing which is specific to profile matching, we use the flattened profile for profile matching, so the inlinee's samples will be merged into top-level function, then used as the top-level function in profile matching(causing the issues we hit)

I think only top level functions are indistinguishable, and inlinee's samples can be distinguished and still used by their inliner?

I think you are right, so we can leave it only for top level function if we can ensure it's also fine for profile matching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I just checked the profile matching, so we only check against FSFlattened (not FS) https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L2433
The reason is that there could be function that all the call are inlined, so there is no top-level version, but it's still can be checksum mismatched, we should make it possible to run stale matching for those functions.

Given this, we probably needs the filtering for inlined samples.

@@ -1024,6 +1058,7 @@ void CSProfileGenerator::postProcessProfiles() {
CSConverter.convertCSProfiles();
FunctionSamples::ProfileIsCS = false;
}
filterAmbiguousProfile(ProfileMap);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have it both here and in postProcessProfiles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The postProcessProfiles is not a shared function for non-cs and cs, i,e, we have CSProfileGenerator::postProcessProfiles() and ProfileGenerator::postProcessProfiles() .

@wlei-llvm wlei-llvm force-pushed the filter-ambiguous-profile branch from d924501 to 5ff2685 Compare February 15, 2024 01:33
Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the profile is generated either from one version or the merged version, which is ambiguous.

Please make the change description clearer. A few things to clarify in the change description: 1) what are the profiles we can't differentiate, 2) why we have these different functions and profiles. 3) why we can't attribute profiles accurately.

@@ -491,6 +518,7 @@ void ProfileGenerator::generateProfile() {
void ProfileGenerator::postProcessProfiles() {
computeSummaryAndThreshold(ProfileMap);
trimColdProfiles(ProfileMap, ColdCountThreshold);
filterAmbiguousProfile(ProfileMap);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment why we need to filter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more comments to the function.

@wlei-llvm
Copy link
Contributor Author

the profile is generated either from one version or the merged version, which is ambiguous.

Please make the change description clearer. A few things to clarify in the change description: 1) what are the profiles we can't differentiate, 2) why we have these different functions and profiles. 3) why we can't attribute profiles accurately.

Updated the description.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed may worth trying to see if we can filter it earlier, if not, the current change looks good.

@wlei-llvm
Copy link
Contributor Author

As we discussed may worth trying to see if we can filter it earlier, if not, the current change looks good.

Thanks for the review. I checked that it's probably not easy to do it earlier. There are two places I can think in earlier phrase, both have diverged code to support.

  1. Parsing symbol time in profiled binary, it's diverged for pseudo-probe and non-probe path:
    For non-probe path, we needs to filter the symbol from debug info symbol.
    For pseudo-probe path. we need to filter the symbol decoded from pseudo-probe sections, e.g. in buildGUID2FuncDescMap (https://github.com/llvm/llvm-project/blob/main/llvm/lib/MC/MCPseudoProbe.cpp#L379) filter the entry and then as the guid for those filtered function can still be decoded, we also have to either filter those guid or support to make sure all the guid lookup like getProbeFNameForGUID not break.

  2. In the middle of profile generation, it's also diverged for pseudo-probe and non-probe path:
    For non-probe path, it uses getTopLevelFunctionProfile to create a FunctionSamples(https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-profgen/ProfileGenerator.cpp#L469)
    For probe path, it uses getFunctionProfileForLeafProbe (https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-profgen/ProfileGenerator.cpp#L1264)
    And bothe of function's return type is "FunctionSamples &" not pointer, so there needs some downstream support for this.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into alternatives. Actually how about we add another function FunctionSamples::RemoveCallSamples to do this? We already have removeCalledTargetAndBodySample.

@wlei-llvm
Copy link
Contributor Author

Thanks for looking into alternatives. Actually how about we add another function FunctionSamples::RemoveCallSamples to do this? We already have removeCalledTargetAndBodySample.

It'd probably also be hacky if we use an internal RemoveCallSamples,

  1. The current filter is to match a prefix not a exact callsite. the input parameter is a prefix/ or a list of prefix..

  2. Or we can do a round of lookup for CallSamples to generate a list of ToBeRemovedFS Pointer... then pass them to an internal RemoveCallSamplesPointer ... which looks too complicated..

@WenleiHe
Copy link
Member

Thanks for looking into alternatives. Actually how about we add another function FunctionSamples::RemoveCallSamples to do this? We already have removeCalledTargetAndBodySample.

It'd probably also be hacky if we use an internal RemoveCallSamples,

  1. The current filter is to match a prefix not a exact callsite. the input parameter is a prefix/ or a list of prefix..
  2. Or we can do a round of lookup for CallSamples to generate a list of ToBeRemovedFS Pointer... then pass them to an internal RemoveCallSamplesPointer ... which looks too complicated..

Ok, let's just use the current for now then.

@wlei-llvm wlei-llvm merged commit 24f0251 into llvm:main Feb 16, 2024
@wlei-llvm wlei-llvm deleted the filter-ambiguous-profile branch March 18, 2024 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants