-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[llvm-profgen] Filter out ambiguous cold profiles during profile generation #81803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-pgo Author: Lei Wang (wlei-llvm) ChangesFor the local pre-defined initialization function("__cxx_global_var_init", "__tls_init" prefix), the profile is generated either from one version or the merged version, which is ambiguous. We can't attribute the profile for each version accurately, and it could mislead the profile matching, report a lot of false positives, so filter them out from the profile map during the profile generation time. The profiles are all cold functions, it won't have perf impact. Full diff: https://github.com/llvm/llvm-project/pull/81803.diff 6 Files Affected:
diff --git a/llvm/include/llvm/ProfileData/SampleProf.h b/llvm/include/llvm/ProfileData/SampleProf.h
index 8ac84d4b933f20..5530abe37fc2a9 100644
--- a/llvm/include/llvm/ProfileData/SampleProf.h
+++ b/llvm/include/llvm/ProfileData/SampleProf.h
@@ -466,7 +466,7 @@ struct SampleContextFrame {
LineLocation Location;
SampleContextFrame() : Location(0, 0) {}
-
+
SampleContextFrame(FunctionId Func, LineLocation Location)
: Func(Func), Location(Location) {}
@@ -527,7 +527,7 @@ class SampleContext {
: Func(Name), State(UnknownContext), Attributes(ContextNone) {
assert(!Name.empty() && "Name is empty");
}
-
+
SampleContext(FunctionId Func)
: Func(Func), State(UnknownContext), Attributes(ContextNone) {}
@@ -975,6 +975,8 @@ class FunctionSamples {
return CallsiteSamples;
}
+ CallsiteSampleMap &getCallsiteSamples() { return CallsiteSamples; }
+
/// Return the maximum of sample counts in a function body. When SkipCallSite
/// is false, which is the default, the return count includes samples in the
/// inlined functions. When SkipCallSite is true, the return count only
diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index 2fd8668d15e200..5d75f52f985161 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -1129,7 +1129,7 @@ void SampleProfileLoader::findExternalInlineCandidate(
CalleeSample->getContext().hasAttribute(ContextShouldBeInlined);
if (!PreInline && CalleeSample->getHeadSamplesEstimate() < Threshold)
continue;
-
+
Function *Func = SymbolMap.lookup(CalleeSample->getFunction());
// Add to the import list only when it's defined out of module.
if (!Func || Func->isDeclaration())
diff --git a/llvm/test/tools/llvm-profgen/Inputs/filter-ambiguous-profile.prof b/llvm/test/tools/llvm-profgen/Inputs/filter-ambiguous-profile.prof
new file mode 100644
index 00000000000000..a63d151e247c75
--- /dev/null
+++ b/llvm/test/tools/llvm-profgen/Inputs/filter-ambiguous-profile.prof
@@ -0,0 +1,16 @@
+foo:12345:1000
+ 1: 1000
+ 4: bar:1000
+ 1: 1000
+ 2: __tls_init.llvm.123:1
+ 1: 1
+ 3: goo:300
+ 1: 300
+ 8: __cxx_global_var_init.4:4
+ 1: 1
+ 2: goo:3
+ 1: 3
+__cxx_global_var_init.1:1:1
+ 1: 1
+__tls_init.llvm.345:1:1
+ 1: 1
diff --git a/llvm/test/tools/llvm-profgen/filter-ambiguous-profile.test b/llvm/test/tools/llvm-profgen/filter-ambiguous-profile.test
new file mode 100644
index 00000000000000..3a264e3b1108bf
--- /dev/null
+++ b/llvm/test/tools/llvm-profgen/filter-ambiguous-profile.test
@@ -0,0 +1,8 @@
+; RUN: llvm-profgen --format=text --llvm-sample-profile=%S/Inputs/filter-ambiguous-profile.prof --binary=%S/Inputs/inline-cs-noprobe.perfbin --csspgo-preinliner=0 --output=%t1 || FileCheck %s --input-file %t1
+
+;CHECK: foo:12345:1000
+;CHECK-NEXT 1: 1000
+;CHECK-NEXT 4: bar:1000
+;CHECK-NEXT 1: 1000
+;CHECK-NEXT 3: goo:300
+;CHECK-NEXT 1: 300
diff --git a/llvm/tools/llvm-profgen/ProfileGenerator.cpp b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
index c4028e6b132871..86b0bb4749e79f 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.cpp
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
@@ -196,6 +196,39 @@ void ProfileGeneratorBase::showDensitySuggestion(double Density) {
<< "% total samples: " << format("%.1f", Density) << "\n";
}
+// Those functions are usually cold but could have multiple versions, the
+// profile are either from one version or the merged version, which is
+// ambiguous. We can't attribute the profile for each version accurately, so
+// filter them out from the profile map.
+static constexpr const char *FuncPrefixsToFilter[] = {"__cxx_global_var_init",
+ "__tls_init"};
+
+bool ProfileGeneratorBase::filterAmbiguousProfile(FunctionSamples &FS) {
+ for (const auto &Prefix : FuncPrefixsToFilter) {
+ if (FS.getFuncName().starts_with(Prefix))
+ return true;
+ }
+
+ // Filter the function profiles for the inlinees.
+ for (auto &Callees : FS.getCallsiteSamples()) {
+ auto &CalleesMap = Callees.second;
+ for (auto I = CalleesMap.begin(); I != CalleesMap.end();) {
+ auto Tmp = I++;
+ if (filterAmbiguousProfile(Tmp->second))
+ CalleesMap.erase(Tmp);
+ }
+ }
+ return false;
+}
+
+void ProfileGeneratorBase::filterAmbiguousProfile(SampleProfileMap &Profiles) {
+ for (auto I = ProfileMap.begin(); I != ProfileMap.end();) {
+ auto Tmp = I++;
+ if (filterAmbiguousProfile(Tmp->second))
+ ProfileMap.erase(Tmp);
+ }
+}
+
double ProfileGeneratorBase::calculateDensity(const SampleProfileMap &Profiles,
uint64_t HotCntThreshold) {
double Density = DBL_MAX;
@@ -492,6 +525,7 @@ void ProfileGenerator::postProcessProfiles() {
computeSummaryAndThreshold(ProfileMap);
trimColdProfiles(ProfileMap, ColdCountThreshold);
calculateAndShowDensity(ProfileMap);
+ filterAmbiguousProfile(ProfileMap);
}
void ProfileGenerator::trimColdProfiles(const SampleProfileMap &Profiles,
@@ -1024,6 +1058,7 @@ void CSProfileGenerator::postProcessProfiles() {
CSConverter.convertCSProfiles();
FunctionSamples::ProfileIsCS = false;
}
+ filterAmbiguousProfile(ProfileMap);
}
void ProfileGeneratorBase::computeSummaryAndThreshold(
diff --git a/llvm/tools/llvm-profgen/ProfileGenerator.h b/llvm/tools/llvm-profgen/ProfileGenerator.h
index 88cf2dc0d49f37..3970876c5bcfb7 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.h
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.h
@@ -108,6 +108,10 @@ class ProfileGeneratorBase {
void updateCallsiteSamples();
+ void filterAmbiguousProfile(SampleProfileMap &Profiles);
+
+ bool filterAmbiguousProfile(FunctionSamples &FS);
+
StringRef getCalleeNameForAddress(uint64_t TargetAddress);
void computeSummaryAndThreshold(SampleProfileMap &ProfileMap);
|
@@ -975,6 +975,8 @@ class FunctionSamples { | |||
return CallsiteSamples; | |||
} | |||
|
|||
CallsiteSampleMap &getCallsiteSamples() { return CallsiteSamples; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember we intentionally not exposing this as non-const (it was added once, and later cleaned up and removed). In generally we should be careful about exposing non-const accessor to private members.
OTOH, I'm wondering do we actually need to filter call site (inlined) samples? I think only top level functions are indistinguishable, and inlinee's samples can be distinguished and still used by their inliner?
Also __cxx_global_var_init can inlined only into tls_init, but tls_ininit probably will never be inlined anywhere?
If we really need to do this, it's probably better to use the const accessor and then cast away the constness, rather than adding an non-const API which is essentially saying it's expected to be modified in general and it's likely subject to misuse in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, we intentionally removed this.. I got it, will remove this.
OTOH, I'm wondering do we actually need to filter call site (inlined) samples?
Also __cxx_global_var_init can inlined only into tls_init, but tls_ininit probably will never be inlined anywhere?
I do see tls_init
appearing in the CallsiteSamples
, presumably it can be inlined.
Another thing which is specific to profile matching, we use the flattened profile for profile matching, so the inlinee's samples will be merged into top-level function, then used as the top-level function in profile matching(causing the issues we hit)
I think only top level functions are indistinguishable, and inlinee's samples can be distinguished and still used by their inliner?
I think you are right, so we can leave it only for top level function if we can ensure it's also fine for profile matching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I just checked the profile matching, so we only check against FSFlattened
(not FS) https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L2433
The reason is that there could be function that all the call are inlined, so there is no top-level version, but it's still can be checksum mismatched, we should make it possible to run stale matching for those functions.
Given this, we probably needs the filtering for inlined samples.
@@ -1024,6 +1058,7 @@ void CSProfileGenerator::postProcessProfiles() { | |||
CSConverter.convertCSProfiles(); | |||
FunctionSamples::ProfileIsCS = false; | |||
} | |||
filterAmbiguousProfile(ProfileMap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have it both here and in postProcessProfiles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The postProcessProfiles
is not a shared function for non-cs and cs, i,e, we have CSProfileGenerator::postProcessProfiles()
and ProfileGenerator::postProcessProfiles()
.
d924501
to
5ff2685
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the profile is generated either from one version or the merged version, which is ambiguous.
Please make the change description clearer. A few things to clarify in the change description: 1) what are the profiles we can't differentiate, 2) why we have these different functions and profiles. 3) why we can't attribute profiles accurately.
@@ -491,6 +518,7 @@ void ProfileGenerator::generateProfile() { | |||
void ProfileGenerator::postProcessProfiles() { | |||
computeSummaryAndThreshold(ProfileMap); | |||
trimColdProfiles(ProfileMap, ColdCountThreshold); | |||
filterAmbiguousProfile(ProfileMap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please comment why we need to filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more comments to the function.
Updated the description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed may worth trying to see if we can filter it earlier, if not, the current change looks good.
Thanks for the review. I checked that it's probably not easy to do it earlier. There are two places I can think in earlier phrase, both have diverged code to support.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into alternatives. Actually how about we add another function FunctionSamples::RemoveCallSamples
to do this? We already have removeCalledTargetAndBodySample
.
It'd probably also be hacky if we use an internal
|
Ok, let's just use the current for now then. |
For the built-in local initialization function(
__cxx_global_var_init
,__tls_init
prefix), there could be multiple versions of the functions in the final binary, e.g.__cxx_global_var_init
, which is a wrapper of global variable ctors, the compiler could assign suffixes like__cxx_global_var_init.N
for different ctors.However, in the profile generation, we call
getCanonicalFnName
to canonicalize the names which strip the suffixes. Therefore, samples from different functions queries the same profile(only__cxx_global_var_init
) and the counts are merged. As the functions are essentially different, entries of the merged profile are ambiguous. In sample loading, for each version of this function, the IR from one version would be attributed towards a merged entries, which is inaccurate, especially for fuzzy profile matching, it gets multiple callsites(from different function) but using to match one callsite, which mislead the matching and report a lot of false positives.Hence, we want to filter them out from the profile map during the profile generation time. The profiles are all cold functions, it won't have perf impact.