Commit 5740bb8

[CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that are either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside is a longer build time, due to parsing and indexing the full CS contexts.

For a CS flat profile, although only the context profiles relevant to a module are loaded when that module is compiled, the cost of figuring out which profiles are relevant is noticeably high when there are many contexts, since the sample reader needs to scan all context strings anyway. By contrast, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore, when compiling a set of functions, unrelated contexts never need to be scanned.

In this change we explore using the nested profile format for CSSPGO. This is expected to work based on the assumption that, with a preinliner-computed profile, all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined are cut off and returned to the corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler is less likely to optimize on derived contexts that are not precomputed.

A CS-nested profile looks exactly the same as a regular nested profile, except that each nested profile can carry attributes. With pseudo probes, a nested profile such as the one shown below can also have a CFG checksum.

```
main:1968679:12
 2: 24
 3: 28 _Z5funcAi:18
 3.1: 28 _Z5funcBi:30
 3: _Z5funcAi:1467398
  0: 10
  1: 10 _Z8funcLeafi:11
  3: 24
  1: _Z8funcLeafi:1467299
   0: 6
   1: 6
   3: 287884
   4: 287864 _Z3fibi:315608
   15: 23
   !CFGChecksum: 138828622701
   !Attributes: 2
  !CFGChecksum: 281479271677951
  !Attributes: 2
```

Specific work included in this change:
- A recursive profile converter to convert a CS flat profile to a nested profile.
- Extend function checksum and attribute metadata to be stored in a nested way for text profiles and extbinary profiles.
- Unify the sample loader inliner path for CS and preinlined nested profiles.
- Changes in the sample loader to support probe-based nested profiles.

I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keeping on-par performance. This is with -duplicate-contexts-into-base=1.

Test Plan:

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D115205
1 parent ea15b86 commit 5740bb8

30 files changed: +730 -206 lines changed

llvm/include/llvm/ProfileData/SampleProf.h

Lines changed: 60 additions & 7 deletions
@@ -206,7 +206,8 @@ enum class SecProfSummaryFlags : uint32_t {
 enum class SecFuncMetadataFlags : uint32_t {
   SecFlagInvalid = 0,
   SecFlagIsProbeBased = (1 << 0),
-  SecFlagHasAttribute = (1 << 1)
+  SecFlagHasAttribute = (1 << 1),
+  SecFlagIsCSNested = (1 << 2),
 };

 enum class SecFuncOffsetFlags : uint32_t {
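
The new `SecFlagIsCSNested` value occupies its own bit, so it can be combined with the existing flags. A minimal sketch of how such bit flags compose and are queried; the enum is a local copy of the values in this hunk, and `addFlag` / `hasFlag` are hypothetical helpers written for illustration, not LLVM's own API:

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the flag values shown in the hunk above; the real enum lives
// in llvm/ProfileData/SampleProf.h.
enum class SecFuncMetadataFlags : uint32_t {
  SecFlagInvalid = 0,
  SecFlagIsProbeBased = (1 << 0),
  SecFlagHasAttribute = (1 << 1),
  SecFlagIsCSNested = (1 << 2),
};

// Hypothetical helper: set one flag bit in a raw section-flags word.
constexpr uint32_t addFlag(uint32_t Flags, SecFuncMetadataFlags F) {
  return Flags | static_cast<uint32_t>(F);
}

// Hypothetical helper: test whether one flag bit is set.
constexpr bool hasFlag(uint32_t Flags, SecFuncMetadataFlags F) {
  return (Flags & static_cast<uint32_t>(F)) != 0;
}
```

Because each flag is a distinct power of two, a probe-based nested profile without attributes would carry bits 0 and 2 only.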
@@ -591,11 +592,11 @@ class SampleContext {
                         : hash_value(getName());
   }

-  /// Set the name of the function.
+  /// Set the name of the function and clear the current context.
   void setName(StringRef FunctionName) {
-    assert(FullContext.empty() &&
-           "setName should only be called for non-CS profile");
     Name = FunctionName;
+    FullContext = SampleContextFrames();
+    State = UnknownContext;
   }

   void setContext(SampleContextFrames Context,
@@ -745,6 +746,16 @@ class FunctionSamples {
     }
   }

+  // Set current context and all callee contexts to be synthetic.
+  void SetContextSynthetic() {
+    Context.setState(SyntheticContext);
+    for (auto &I : CallsiteSamples) {
+      for (auto &CS : I.second) {
+        CS.second.SetContextSynthetic();
+      }
+    }
+  }
+
   /// Return the number of samples collected at the given location.
   /// Each location is specified by \p LineOffset and \p Discriminator.
   /// If the location is not found in profile, return error.
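
`SetContextSynthetic` walks the two-level callsite map recursively, so the state change reaches every nested inlinee. A minimal sketch of that traversal on a toy type; `ToyProfile`, its `unsigned` callsite key and the two-value state enum are simplifications assumed for illustration and stand in for `FunctionSamples`, `LineLocation` and the real context states:

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for the context state; only two values are modelled.
enum ContextState { UnknownContext, SyntheticContext };

// Toy stand-in for FunctionSamples: a state plus nested callee profiles.
struct ToyProfile {
  ContextState State = UnknownContext;
  // Callsite line offset -> callee name -> profile of the inlined callee.
  std::map<unsigned, std::map<std::string, ToyProfile>> CallsiteSamples;

  // Mark this profile and, recursively, every nested callee profile.
  void setContextSynthetic() {
    State = SyntheticContext;
    for (auto &Site : CallsiteSamples)
      for (auto &Callee : Site.second)
        Callee.second.setContextSynthetic();
  }
};
```

The recursion depth equals the inline nesting depth of the profile, which is bounded by the deepest context in the original flat profile.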
@@ -816,7 +827,7 @@ class FunctionSamples {
   /// Return the sample count of the first instruction of the function.
   /// The function can be either a standalone symbol or an inlined function.
   uint64_t getEntrySamples() const {
-    if (FunctionSamples::ProfileIsCS && getHeadSamples()) {
+    if (FunctionSamples::ProfileIsCSFlat && getHeadSamples()) {
       // For CS profile, if we already have more accurate head samples
       // counted by branch sample from caller, use them as entry samples.
       return getHeadSamples();
@@ -1008,7 +1019,13 @@ class FunctionSamples {
   /// instruction. This is wrapper of two scenarios, the probe-based profile and
   /// regular profile, to hide implementation details from the sample loader and
   /// the context tracker.
-  static LineLocation getCallSiteIdentifier(const DILocation *DIL);
+  static LineLocation getCallSiteIdentifier(const DILocation *DIL,
+                                            bool ProfileIsFS = false);
+
+  /// Returns a unique hash code for a combination of a callsite location and
+  /// the callee function name.
+  static uint64_t getCallSiteHash(StringRef CalleeName,
+                                  const LineLocation &Callsite);

   /// Get the FunctionSamples of the inline instance where DIL originates
   /// from.
@@ -1027,7 +1044,9 @@ class FunctionSamples {

   static bool ProfileIsProbeBased;

-  static bool ProfileIsCS;
+  static bool ProfileIsCSFlat;
+
+  static bool ProfileIsCSNested;

   SampleContext &getContext() const { return Context; }

@@ -1161,6 +1180,40 @@ class SampleContextTrimmer {
   SampleProfileMap &ProfileMap;
 };

+// CSProfileConverter converts a full context-sensitive flat sample profile into
+// a nested context-sensitive sample profile.
+class CSProfileConverter {
+public:
+  CSProfileConverter(SampleProfileMap &Profiles);
+  void convertProfiles();
+  struct FrameNode {
+    FrameNode(StringRef FName = StringRef(),
+              FunctionSamples *FSamples = nullptr,
+              LineLocation CallLoc = {0, 0})
+        : FuncName(FName), FuncSamples(FSamples), CallSiteLoc(CallLoc){};
+
+    // Map line+discriminator location to child frame
+    std::map<uint64_t, FrameNode> AllChildFrames;
+    // Function name for current frame
+    StringRef FuncName;
+    // Function Samples for current frame
+    FunctionSamples *FuncSamples;
+    // Callsite location in parent context
+    LineLocation CallSiteLoc;
+
+    FrameNode *getOrCreateChildFrame(const LineLocation &CallSite,
+                                     StringRef CalleeName);
+  };
+
+private:
+  // Nest all children profiles into the profile of Node.
+  void convertProfiles(FrameNode &Node);
+  FrameNode *getOrCreateContextPath(const SampleContext &Context);
+
+  SampleProfileMap &ProfileMap;
+  FrameNode RootFrame;
+};
+
 /// ProfileSymbolList records the list of function symbols shown up
 /// in the binary used to generate the profile. It is useful to
 /// to discriminate a function being so cold as not to shown up

llvm/include/llvm/ProfileData/SampleProfReader.h

Lines changed: 12 additions & 4 deletions
@@ -473,8 +473,11 @@ class SampleProfileReader {
   /// Whether input profile is based on pseudo probes.
   bool profileIsProbeBased() const { return ProfileIsProbeBased; }

-  /// Whether input profile is fully context-sensitive
-  bool profileIsCS() const { return ProfileIsCS; }
+  /// Whether input profile is fully context-sensitive and flat.
+  bool profileIsCSFlat() const { return ProfileIsCSFlat; }
+
+  /// Whether input profile is fully context-sensitive and nested.
+  bool profileIsCSNested() const { return ProfileIsCSNested; }

   virtual std::unique_ptr<ProfileSymbolList> getProfileSymbolList() {
     return nullptr;
@@ -533,8 +536,11 @@ class SampleProfileReader {
   /// \brief Whether samples are collected based on pseudo probes.
   bool ProfileIsProbeBased = false;

-  /// Whether function profiles are context-sensitive.
-  bool ProfileIsCS = false;
+  /// Whether function profiles are context-sensitive flat profiles.
+  bool ProfileIsCSFlat = false;
+
+  /// Whether function profiles are context-sensitive nested profiles.
+  bool ProfileIsCSNested = false;

   /// Number of context-sensitive profiles.
   uint32_t CSProfileCount = 0;
@@ -698,6 +704,8 @@ class SampleProfileReaderExtBinaryBase : public SampleProfileReaderBinary {
   std::error_code readSecHdrTable();

   std::error_code readFuncMetadata(bool ProfileHasAttribute);
+  std::error_code readFuncMetadata(bool ProfileHasAttribute,
+                                   FunctionSamples *FProfile);
   std::error_code readFuncOffsetTable();
   std::error_code readFuncProfiles();
   std::error_code readMD5NameTable();

llvm/include/llvm/ProfileData/SampleProfWriter.h

Lines changed: 1 addition & 0 deletions
@@ -269,6 +269,7 @@ class SampleProfileWriterExtBinaryBase : public SampleProfileWriterBinary {
   std::error_code writeCSNameTableSection();

   std::error_code writeFuncMetadata(const SampleProfileMap &Profiles);
+  std::error_code writeFuncMetadata(const FunctionSamples &Profile);

   // Functions to write various kinds of sections.
   std::error_code writeNameTableSection(const SampleProfileMap &ProfileMap);

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h

Lines changed: 2 additions & 1 deletion
@@ -68,7 +68,8 @@ class ProfiledCallGraph {

   // Constructor for non-CS profile.
   ProfiledCallGraph(SampleProfileMap &ProfileMap) {
-    assert(!FunctionSamples::ProfileIsCS && "CS profile is not handled here");
+    assert(!FunctionSamples::ProfileIsCSFlat &&
+           "CS flat profile is not handled here");
     for (const auto &Samples : ProfileMap) {
       addProfiledCalls(Samples.second);
     }

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h

Lines changed: 0 additions & 2 deletions
@@ -66,8 +66,6 @@ class ContextTrieNode {
   void dumpTree();

 private:
-  static uint64_t nodeHash(StringRef ChildName, const LineLocation &Callsite);
-
   // Map line+discriminator location to child context
   std::map<uint64_t, ContextTrieNode> AllChildContext;


llvm/lib/ProfileData/ProfileSummaryBuilder.cpp

Lines changed: 1 addition & 1 deletion
@@ -194,7 +194,7 @@ SampleProfileSummaryBuilder::computeSummaryForProfiles(
   // more function profiles each with lower counts, which in turn leads to lower
   // hot thresholds. To compensate for that, by default we merge context
   // profiles before computing profile summary.
-  if (UseContextLessSummary || (sampleprof::FunctionSamples::ProfileIsCS &&
+  if (UseContextLessSummary || (sampleprof::FunctionSamples::ProfileIsCSFlat &&
                                 !UseContextLessSummary.getNumOccurrences())) {
     for (const auto &I : Profiles) {
       ContextLessProfiles[I.second.getName()].merge(I.second);
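
The comment in this hunk explains that a flat CS profile splits one function's samples across many context profiles, so the contexts are merged per function before the summary is computed. A rough sketch of that merge on plain sample counts; the `"caller:line @ callee"` key format, the bare `uint64_t` counts and the `mergeByLeafName` helper are all assumptions made for illustration, whereas the real code merges whole `FunctionSamples` objects keyed by `getName()`:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Merge flat context profiles into one entry per leaf function.
// Keys are assumed to look like "main:3 @ funcA": frames separated by " @ ",
// with the leaf function last. Values are total sample counts.
std::map<std::string, uint64_t>
mergeByLeafName(const std::map<std::string, uint64_t> &Flat) {
  std::map<std::string, uint64_t> Merged;
  for (const auto &Entry : Flat) {
    const std::string &Ctx = Entry.first;
    // The leaf function name is whatever follows the last " @ " separator.
    std::string::size_type Pos = Ctx.rfind(" @ ");
    std::string Leaf = Pos == std::string::npos ? Ctx : Ctx.substr(Pos + 3);
    Merged[Leaf] += Entry.second;
  }
  return Merged;
}
```

With the merged view, a function that is hot overall is not mistaken for a cold one merely because its samples were spread over several contexts.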

llvm/lib/ProfileData/SampleProf.cpp

Lines changed: 106 additions & 15 deletions
@@ -35,11 +35,18 @@ static cl::opt<uint64_t> ProfileSymbolListCutOff(
     cl::desc("Cutoff value about how many symbols in profile symbol list "
              "will be used. This is very useful for performance debugging"));

+cl::opt<bool> GenerateMergedBaseProfiles(
+    "generate-merged-base-profiles", cl::init(true), cl::ZeroOrMore,
+    cl::desc("When generating nested context-sensitive profiles, always "
+             "generate extra base profile for function with all its context "
+             "profiles merged into it."));
+
 namespace llvm {
 namespace sampleprof {
 SampleProfileFormat FunctionSamples::Format;
 bool FunctionSamples::ProfileIsProbeBased = false;
-bool FunctionSamples::ProfileIsCS = false;
+bool FunctionSamples::ProfileIsCSFlat = false;
+bool FunctionSamples::ProfileIsCSNested = false;
 bool FunctionSamples::UseMD5 = false;
 bool FunctionSamples::HasUniqSuffix = true;
 bool FunctionSamples::ProfileIsFS = false;
@@ -218,18 +225,29 @@ unsigned FunctionSamples::getOffset(const DILocation *DIL) {
          0xffff;
 }

-LineLocation FunctionSamples::getCallSiteIdentifier(const DILocation *DIL) {
-  if (FunctionSamples::ProfileIsProbeBased)
+LineLocation FunctionSamples::getCallSiteIdentifier(const DILocation *DIL,
+                                                    bool ProfileIsFS) {
+  if (FunctionSamples::ProfileIsProbeBased) {
     // In a pseudo-probe based profile, a callsite is simply represented by the
     // ID of the probe associated with the call instruction. The probe ID is
     // encoded in the Discriminator field of the call instruction's debug
     // metadata.
     return LineLocation(PseudoProbeDwarfDiscriminator::extractProbeIndex(
                             DIL->getDiscriminator()),
                         0);
-  else
-    return LineLocation(FunctionSamples::getOffset(DIL),
-                        DIL->getBaseDiscriminator());
+  } else {
+    unsigned Discriminator =
+        ProfileIsFS ? DIL->getDiscriminator() : DIL->getBaseDiscriminator();
+    return LineLocation(FunctionSamples::getOffset(DIL), Discriminator);
+  }
+}
+
+uint64_t FunctionSamples::getCallSiteHash(StringRef CalleeName,
+                                          const LineLocation &Callsite) {
+  uint64_t NameHash = std::hash<std::string>{}(CalleeName.str());
+  uint64_t LocId =
+      (((uint64_t)Callsite.LineOffset) << 32) | Callsite.Discriminator;
+  return NameHash + (LocId << 5) + LocId;
 }

 const FunctionSamples *FunctionSamples::findFunctionSamples(
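
`getCallSiteHash` packs the line offset and discriminator into one 64-bit location id and mixes it with a hash of the callee name. A standalone sketch of the same arithmetic, using `std::string` in place of `StringRef` and a local two-field `LineLocation` (both substitutions are assumptions made so the example compiles without LLVM headers):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Local stand-in for sampleprof::LineLocation.
struct LineLocation {
  uint32_t LineOffset;
  uint32_t Discriminator;
};

// Same mixing as in the hunk above: name hash + 33 * location id.
uint64_t callSiteHash(const std::string &CalleeName,
                      const LineLocation &Callsite) {
  uint64_t NameHash = std::hash<std::string>{}(CalleeName);
  // High 32 bits hold the line offset, low 32 bits the discriminator.
  uint64_t LocId =
      (static_cast<uint64_t>(Callsite.LineOffset) << 32) | Callsite.Discriminator;
  return NameHash + (LocId << 5) + LocId;
}
```

Since `(LocId << 5) + LocId` equals `33 * LocId` and 33 is odd, two different locations never collide for the same callee name; collisions between different names remain possible, which is why the converter asserts on `FuncName` after a lookup.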
@@ -239,21 +257,16 @@ const FunctionSamples *FunctionSamples::findFunctionSamples(

   const DILocation *PrevDIL = DIL;
   for (DIL = DIL->getInlinedAt(); DIL; DIL = DIL->getInlinedAt()) {
-    unsigned Discriminator;
-    if (ProfileIsFS)
-      Discriminator = DIL->getDiscriminator();
-    else
-      Discriminator = DIL->getBaseDiscriminator();
-
     // Use C++ linkage name if possible.
     StringRef Name = PrevDIL->getScope()->getSubprogram()->getLinkageName();
     if (Name.empty())
       Name = PrevDIL->getScope()->getSubprogram()->getName();
-
-    S.push_back(
-        std::make_pair(LineLocation(getOffset(DIL), Discriminator), Name));
+    S.emplace_back(FunctionSamples::getCallSiteIdentifier(
+                       DIL, FunctionSamples::ProfileIsFS),
+                   Name);
     PrevDIL = DIL;
   }
+
   if (S.size() == 0)
     return this;
   const FunctionSamples *FS = this;
@@ -454,3 +467,81 @@ void ProfileSymbolList::dump(raw_ostream &OS) const {
   for (auto &Sym : SortedList)
     OS << Sym << "\n";
 }
+
+CSProfileConverter::FrameNode *
+CSProfileConverter::FrameNode::getOrCreateChildFrame(
+    const LineLocation &CallSite, StringRef CalleeName) {
+  uint64_t Hash = FunctionSamples::getCallSiteHash(CalleeName, CallSite);
+  auto It = AllChildFrames.find(Hash);
+  if (It != AllChildFrames.end()) {
+    assert(It->second.FuncName == CalleeName &&
+           "Hash collision for child context node");
+    return &It->second;
+  }
+
+  AllChildFrames[Hash] = FrameNode(CalleeName, nullptr, CallSite);
+  return &AllChildFrames[Hash];
+}
+
+CSProfileConverter::CSProfileConverter(SampleProfileMap &Profiles)
+    : ProfileMap(Profiles) {
+  for (auto &FuncSample : Profiles) {
+    FunctionSamples *FSamples = &FuncSample.second;
+    auto *NewNode = getOrCreateContextPath(FSamples->getContext());
+    assert(!NewNode->FuncSamples && "New node cannot have sample profile");
+    NewNode->FuncSamples = FSamples;
+  }
+}
+
+CSProfileConverter::FrameNode *
+CSProfileConverter::getOrCreateContextPath(const SampleContext &Context) {
+  auto Node = &RootFrame;
+  LineLocation CallSiteLoc(0, 0);
+  for (auto &Callsite : Context.getContextFrames()) {
+    Node = Node->getOrCreateChildFrame(CallSiteLoc, Callsite.FuncName);
+    CallSiteLoc = Callsite.Location;
+  }
+  return Node;
+}
+
+void CSProfileConverter::convertProfiles(CSProfileConverter::FrameNode &Node) {
+  // Process each child profile. Add each child profile to callsite profile map
+  // of the current node `Node` if `Node` comes with a profile. Otherwise
+  // promote the child profile to a standalone profile.
+  auto *NodeProfile = Node.FuncSamples;
+  for (auto &It : Node.AllChildFrames) {
+    auto &ChildNode = It.second;
+    convertProfiles(ChildNode);
+    auto *ChildProfile = ChildNode.FuncSamples;
+    if (!ChildProfile)
+      continue;
+    SampleContext OrigChildContext = ChildProfile->getContext();
+    // Reset the child context to be contextless.
+    ChildProfile->getContext().setName(OrigChildContext.getName());
+    if (NodeProfile) {
+      // Add child profile to the callsite profile map.
+      auto &SamplesMap = NodeProfile->functionSamplesAt(ChildNode.CallSiteLoc);
+      SamplesMap.emplace(OrigChildContext.getName(), *ChildProfile);
+      NodeProfile->addTotalSamples(ChildProfile->getTotalSamples());
+    }
+
+    // Separate child profile to be a standalone profile, if the current parent
+    // profile doesn't exist. This is a duplicating operation when the child
+    // profile is already incorporated into the parent which is still useful and
+    // thus done optionally. It is seen that duplicating context profiles into
+    // base profiles improves the code quality for thinlto build by allowing a
+    // profile in the prelink phase for to-be-fully-inlined functions.
+    if (!NodeProfile || GenerateMergedBaseProfiles)
+      ProfileMap[ChildProfile->getContext()].merge(*ChildProfile);
+
+    // Contexts coming with a `ContextShouldBeInlined` attribute indicate this
+    // is a preinliner-computed profile.
+    if (OrigChildContext.hasAttribute(ContextShouldBeInlined))
+      FunctionSamples::ProfileIsCSNested = true;
+
+    // Remove the original child profile.
+    ProfileMap.erase(OrigChildContext);
+  }
+}
+
+void CSProfileConverter::convertProfiles() { convertProfiles(RootFrame); }
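
The core idea of the converter is to fold flat context keys into a tree in which each inlined callee hangs off its caller's callsite, with callee totals added into every enclosing profile. The sketch below illustrates only that folding step on toy types: `Frame`, `NestedProfile` and `nestProfiles` are hypothetical names, counts are bare totals, and it assumes every caller prefix is meant to exist. It deliberately omits what the real converter also does (promotion of orphaned children, merged base profiles, attribute handling, erasing the flat entries).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One frame of a calling context: a function name and the line offset of the
// call site inside that function (ignored for the leaf frame).
struct Frame {
  std::string Func;
  unsigned CallLine;
};
using Context = std::vector<Frame>;

// Toy nested profile: a total count plus inlined callee profiles per callsite.
struct NestedProfile {
  uint64_t TotalSamples = 0;
  // Callsite line offset -> callee name -> nested callee profile.
  std::map<unsigned, std::map<std::string, NestedProfile>> Callsites;
};

// Fold (context, samples) pairs into nested profiles keyed by the top-level
// function. Samples of a context are added to its own node and to every
// ancestor, so a parent's total includes all of its inlined callees.
std::map<std::string, NestedProfile>
nestProfiles(const std::vector<std::pair<Context, uint64_t>> &Flat) {
  std::map<std::string, NestedProfile> TopLevel;
  for (const auto &Entry : Flat) {
    const Context &Ctx = Entry.first;
    if (Ctx.empty())
      continue;
    NestedProfile *Node = &TopLevel[Ctx[0].Func];
    Node->TotalSamples += Entry.second;
    for (std::size_t I = 1; I < Ctx.size(); ++I) {
      // Descend through the caller's call site to the callee's profile.
      Node = &Node->Callsites[Ctx[I - 1].CallLine][Ctx[I].Func];
      Node->TotalSamples += Entry.second;
    }
  }
  return TopLevel;
}
```

After folding, looking up `main` yields its whole inline tree at once, which is the property the commit message relies on to avoid scanning unrelated contexts.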
