Commit 5b6f151

[SampleFDO] Improve stale profile matching by diff algorithm (#87375)
This change improves the matching algorithm by using a diff algorithm. The current matching algorithm only processes callsites grouped by function name; it does not consider the order relationships between functions with different names, so it sometimes fails to handle the ambiguous anchor case. For example (`foo:1` denotes a callsite, written as `callee_name:callsite_location`):

```
IR :      foo:1  bar:2  foo:4  bar:5
Profile :        bar:3  foo:5  bar:6
```

With the current algorithm, `foo:1` is matched to the 2nd `foo:5`. Using a diff algorithm (finding the longest common subsequence) helps with this issue.

One well-known diff algorithm is the Myers diff algorithm (paper: "An O(ND) Difference Algorithm and Its Variations", Eugene W. Myers). Its variations have been implemented and used in many well-known tools, such as GNU diff and git diff. It provides an efficient way to find the longest common subsequence or the shortest edit script through graph searching.

There are several variations and refinements of the algorithm, but in our case the number of function callsites is usually very small, so this change implements the basic greedy version, which should be good enough.

We observed better matchings and a positive performance improvement on our internal services.
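To make the idea concrete, here is a minimal standalone sketch of the basic greedy Myers search applied to the example above. It is not the code from this commit: `Anchor`, `LocMap`, and `lcsAnchorMatch` are hypothetical stand-ins, with a plain `int` line number and a `std::string` callee name in place of LLVM's `LineLocation` and `FunctionId`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for LLVM's LineLocation and FunctionId.
using Anchor = std::pair<int, std::string>; // (callsite line, callee name)
using LocMap = std::map<int, int>;          // IR line -> profile line

// Greedy Myers search: returns the IR-line -> profile-line pairs that lie on
// one longest common subsequence of callee names.
LocMap lcsAnchorMatch(const std::vector<Anchor> &IR,
                      const std::vector<Anchor> &Prof) {
  const int N = static_cast<int>(IR.size());
  const int M = static_cast<int>(Prof.size());
  const int Max = N + M;
  LocMap Matched;
  if (Max == 0)
    return Matched;

  // V[Off + K] is the furthest X reached on diagonal K (K = X - Y).
  const int Off = Max + 1;
  std::vector<int> V(2 * Max + 3, -1);
  V[Off + 1] = 0;
  std::vector<std::vector<int>> Trace; // snapshot of V before each depth

  // True when diagonal K at depth D is best reached from diagonal K + 1.
  auto FromAbove = [&](const std::vector<int> &Vec, int K, int D) {
    return K == -D || (K != D && Vec[Off + K - 1] < Vec[Off + K + 1]);
  };

  bool Done = false;
  for (int D = 0; D <= Max && !Done; ++D) {
    Trace.push_back(V);
    for (int K = -D; K <= D; K += 2) {
      int X = FromAbove(V, K, D) ? V[Off + K + 1] : V[Off + K - 1] + 1;
      int Y = X - K;
      // Follow the "snake": consume equal callee names for free.
      while (X < N && Y < M && IR[X].second == Prof[Y].second) {
        ++X;
        ++Y;
      }
      V[Off + K] = X;
      if (X >= N && Y >= M) {
        Done = true;
        break;
      }
    }
  }
  assert(Done && "search always terminates by depth N + M");

  // Walk the snapshots backwards and record every diagonal (equal) step.
  int X = N, Y = M;
  for (int D = static_cast<int>(Trace.size()) - 1;
       D >= 0 && (X > 0 || Y > 0); --D) {
    const std::vector<int> &P = Trace[D];
    const int K = X - Y;
    const int PrevK = FromAbove(P, K, D) ? K + 1 : K - 1;
    const int PrevX = P[Off + PrevK];
    const int PrevY = PrevX - PrevK;
    while (X > PrevX && Y > PrevY) {
      --X;
      --Y;
      Matched[IR[X].first] = Prof[Y].first;
    }
    X = PrevX;
    Y = PrevY;
  }
  return Matched;
}
```

Called with the IR anchors `foo:1 bar:2 foo:4 bar:5` and the profile anchors `bar:3 foo:5 bar:6`, this sketch pairs IR lines 2, 4 and 5 with profile lines 3, 5 and 6 and leaves `foo:1` unmatched, which is the order-preserving alignment the commit message argues for.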
1 parent b342d18 commit 5b6f151

5 files changed: +457 -126 lines changed


llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h

Lines changed: 29 additions & 17 deletions
@@ -19,6 +19,9 @@
 
 namespace llvm {
 
+using AnchorList = std::vector<std::pair<LineLocation, FunctionId>>;
+using AnchorMap = std::map<LineLocation, FunctionId>;
+
 // Sample profile matching - fuzzy match.
 class SampleProfileMatcher {
   Module &M;
@@ -27,8 +30,8 @@ class SampleProfileMatcher {
   const ThinOrFullLTOPhase LTOPhase;
   SampleProfileMap FlattenedProfiles;
   // For each function, the matcher generates a map, of which each entry is a
-  // mapping from the source location of current build to the source location in
-  // the profile.
+  // mapping from the source location of current build to the source location
+  // in the profile.
   StringMap<LocToLocMap> FuncMappings;
 
   // Match state for an anchor/callsite.
@@ -95,18 +98,13 @@ class SampleProfileMatcher {
     return nullptr;
   }
   void runOnFunction(Function &F);
-  void findIRAnchors(const Function &F,
-                     std::map<LineLocation, StringRef> &IRAnchors);
-  void findProfileAnchors(
-      const FunctionSamples &FS,
-      std::map<LineLocation, std::unordered_set<FunctionId>> &ProfileAnchors);
+  void findIRAnchors(const Function &F, AnchorMap &IRAnchors);
+  void findProfileAnchors(const FunctionSamples &FS, AnchorMap &ProfileAnchors);
   // Record the callsite match states for profile staleness report, the result
   // is saved in FuncCallsiteMatchStates.
-  void recordCallsiteMatchStates(
-      const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
-      const std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors,
-      const LocToLocMap *IRToProfileLocationMap);
+  void recordCallsiteMatchStates(const Function &F, const AnchorMap &IRAnchors,
+                                 const AnchorMap &ProfileAnchors,
+                                 const LocToLocMap *IRToProfileLocationMap);
 
   bool isMismatchState(const enum MatchState &State) {
     return State == MatchState::InitialMismatch ||
@@ -143,11 +141,25 @@ class SampleProfileMatcher {
   }
   void distributeIRToProfileLocationMap();
   void distributeIRToProfileLocationMap(FunctionSamples &FS);
-  void runStaleProfileMatching(
-      const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
-      const std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors,
-      LocToLocMap &IRToProfileLocationMap);
+  // This function implements the Myers diff algorithm used for stale profile
+  // matching. The algorithm provides a simple and efficient way to find the
+  // Longest Common Subsequence(LCS) or the Shortest Edit Script(SES) of two
+  // sequences. For more details, refer to the paper 'An O(ND) Difference
+  // Algorithm and Its Variations' by Eugene W. Myers.
+  // In the scenario of profile fuzzy matching, the two sequences are the IR
+  // callsite anchors and profile callsite anchors. The subsequence equivalent
+  // parts from the resulting SES are used to remap the IR locations to the
+  // profile locations. As the number of function callsite is usually not big,
+  // we currently just implements the basic greedy version(page 6 of the paper).
+  LocToLocMap
+  longestCommonSequence(const AnchorList &IRCallsiteAnchors,
+                        const AnchorList &ProfileCallsiteAnchors) const;
+  void matchNonCallsiteLocs(const LocToLocMap &AnchorMatchings,
+                            const AnchorMap &IRAnchors,
+                            LocToLocMap &IRToProfileLocationMap);
+  void runStaleProfileMatching(const Function &F, const AnchorMap &IRAnchors,
+                               const AnchorMap &ProfileAnchors,
+                               LocToLocMap &IRToProfileLocationMap);
   void reportOrPersistProfileStats();
 };
 } // end namespace llvm
