
Commit 7624de5

[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements to the sample profile loader. The major change is to use the MD5 hash code (instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles use MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext to use as the map's key, because that is extremely slow.

Several changes to note:

(1) For a non-CS SampleContext, if the name is already an MD5 string, the hash value is its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function names (without converting them to strings) and regular function names.

(2) The SampleProfileMap is a wrapper around *map<uint64_t, FunctionSamples> that also provides an interface allowing SampleContext to be used as the key, so that existing code still works. It checks for MD5 collisions (unlikely, but not negligible, since we only take the lower 64 bits) and handles them to at least guarantee compilation correctness (the conflicting old profile is dropped, instead of returning an old profile with an inconsistent context). Other code should not try to use MD5 as the key to access the map directly, because it will not be able to handle MD5 collisions at all (see the exception at (5)).

(3) Any SampleProfileMap::emplace() followed by a SampleContext assignment if newly inserted should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensured the invariant that in a SampleProfileMap the key is equal to the Context of the value, for profile maps that are eventually used for output (as in llvm-profdata/llvm-profgen). Since the key is now an MD5 hash, only the value keeps the context. In the several places where an intermediate SampleProfileMap is created, each new FunctionSamples' context is set immediately after insertion, which is necessary to "remember" a context that is otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (once to index into FuncOffsetTable, once into SampleProfileMap, and more if there are additional sections). In this case the SampleProfileMap is accessed directly with the MD5 value so that we don't recalculate it each time, which is expensive.

Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non-CS functions, each function has a varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
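To put a number on "unlikely" in item (2): the following is a rough birthday-bound estimate, assuming the 64-bit keys are uniformly distributed (an idealization; the symbols $n$ for the number of distinct functions and $N = 2^{64}$ for the key space are introduced here for illustration only).

```latex
\begin{align*}
P(\text{no collision}) &= \prod_{i=0}^{n-1}\left(1 - \frac{i}{N}\right)
                        = \frac{N!}{(N-n)!\,N^{n}}, \qquad N = 2^{64},\\
P(\text{collision})    &= 1 - P(\text{no collision})
                        \approx \frac{n(n-1)}{2N} \quad (n \ll \sqrt{N}),\\
n = 10^{6}:\quad P(\text{collision}) &\approx \frac{10^{12}}{2 \cdot 1.84 \times 10^{19}} \approx 2.7 \times 10^{-8},\\
n = 10^{7}:\quad P(\text{collision}) &\approx \frac{10^{14}}{2 \cdot 1.84 \times 10^{19}} \approx 2.7 \times 10^{-6}.
\end{align*}
```

The first figure agrees with the ~3*10^-8 quoted in the HashKeyMap comment in SampleProf.h. Note that the closed form written in that comment is the no-collision product, so the collision probability is its complement. The second figure uses the 10 million function names from the performance example above.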
1 parent 638865c commit 7624de5


14 files changed

+513
-145
lines changed


llvm/include/llvm/ProfileData/SampleProf.h

Lines changed: 196 additions & 16 deletions
@@ -318,6 +318,14 @@ struct LineLocationHash {

 raw_ostream &operator<<(raw_ostream &OS, const LineLocation &Loc);

+static inline uint64_t hashFuncName(StringRef F) {
+  // If function name is already MD5 string, do not hash again.
+  uint64_t Hash;
+  if (F.getAsInteger(10, Hash))
+    Hash = MD5Hash(F);
+  return Hash;
+}
+
 /// Representation of a single sample record.
 ///
 /// A sample record is represented by a positive integer value, which
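The hunk above relies on `StringRef::getAsInteger` returning true on failure, so the MD5 is only computed when the name is not already a decimal integer. A minimal standalone sketch of the same control flow follows, assuming plain C++17 instead of LLVM: `hashFuncNameSketch` and `placeholderHash` are hypothetical names, and `placeholderHash` uses `std::hash` as a stand-in, not the real `MD5Hash`.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <functional>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for llvm::MD5Hash. This is NOT MD5; it only gives
// the sketch something deterministic to fall back on.
inline uint64_t placeholderHash(std::string_view Name) {
  return static_cast<uint64_t>(std::hash<std::string_view>{}(Name));
}

// Same shape as hashFuncName in the diff: a name that parses entirely as a
// decimal uint64_t is treated as an already-hashed value and returned as-is;
// any other name (including one that overflows 64 bits) is hashed.
inline uint64_t hashFuncNameSketch(std::string_view Name) {
  uint64_t Value = 0;
  const char *Begin = Name.data();
  const char *End = Begin + Name.size();
  auto [Ptr, Ec] = std::from_chars(Begin, End, Value, 10);
  bool ParsedWholeString = !Name.empty() && Ec == std::errc() && Ptr == End;
  return ParsedWholeString ? Value : placeholderHash(Name);
}
```

With this sketch, `hashFuncNameSketch("12345")` yields the integer 12345 directly, while `hashFuncNameSketch("main")` goes through the fallback hash. Edge cases such as sign prefixes may be handled differently by the real `getAsInteger`.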
@@ -631,8 +639,12 @@ class SampleContext {
   }

   uint64_t getHashCode() const {
-    return hasContext() ? hash_value(getContextFrames())
-                        : hash_value(getName());
+    if (hasContext())
+      return hash_value(getContextFrames());
+
+    // For non-context function name, use its MD5 as hash value, so that it is
+    // consistent with the profile map's key.
+    return hashFuncName(getName());
   }

   /// Set the name of the function and clear the current context.
@@ -710,9 +722,12 @@ class SampleContext {
   uint32_t Attributes;
 };

-static inline hash_code hash_value(const SampleContext &arg) {
-  return arg.hasContext() ? hash_value(arg.getContextFrames())
-                          : hash_value(arg.getName());
+static inline hash_code hash_value(const SampleContext &Context) {
+  return Context.getHashCode();
+}
+
+inline raw_ostream &operator<<(raw_ostream &OS, const SampleContext &Context) {
+  return OS << Context.toString();
 }

 class FunctionSamples;
@@ -1206,6 +1221,9 @@ class FunctionSamples {
     return !(*this == Other);
   }

+  template <typename T>
+  const T &getKey() const;
+
 private:
   /// CFG hash value for the function.
   uint64_t FunctionHash = 0;
@@ -1269,12 +1287,176 @@ class FunctionSamples {
   const LocToLocMap *IRToProfileLocationMap = nullptr;
 };

+template <>
+inline const SampleContext &FunctionSamples::getKey<SampleContext>() const {
+  return getContext();
+}
+
 raw_ostream &operator<<(raw_ostream &OS, const FunctionSamples &FS);

-using SampleProfileMap =
-    std::unordered_map<SampleContext, FunctionSamples, SampleContext::Hash>;
+/// This class is a wrapper to associative container MapT<KeyT, ValueT> using
+/// the hash value of the original key as the new key. This greatly improves the
+/// performance of insert and query operations especially when hash values of
+/// keys are available a priori, and reduces memory usage if KeyT has a large
+/// size.
+/// When performing any action, if an existing entry with a given key is found,
+/// and the interface "KeyT ValueT::getKey<KeyT>() const" to retrieve a value's
+/// original key exists, this class checks if the given key actually matches
+/// the existing entry's original key. If they do not match, this class behaves
+/// as if the entry did not exist (for insertion, this means the new value will
+/// replace the existing entry's value, as if it is newly inserted). If
+/// ValueT::getKey<KeyT>() is not available, all keys with the same hash value
+/// are considered equivalent (i.e. hash collision is silently ignored). Given
+/// such feature this class should only be used where it does not affect
+/// compilation correctness, for example, when loading a sample profile.
+/// Assuming the hashing algorithm is uniform, the probability of hash collision
+/// with 1,000,000 entries is
+/// (2^64)!/((2^64-1000000)!*(2^64)^1000000) ~= 3*10^-8.
+template <template <typename, typename, typename...> typename MapT,
+          typename KeyT, typename ValueT, typename... MapTArgs>
+class HashKeyMap : public MapT<hash_code, ValueT, MapTArgs...> {
+public:
+  using base_type = MapT<hash_code, ValueT, MapTArgs...>;
+  using key_type = hash_code;
+  using original_key_type = KeyT;
+  using mapped_type = ValueT;
+  using value_type = typename base_type::value_type;

-using NameFunctionSamples = std::pair<SampleContext, const FunctionSamples *>;
+  using iterator = typename base_type::iterator;
+  using const_iterator = typename base_type::const_iterator;
+
+private:
+  // If the value type has getKey(), retrieve its original key for comparison.
+  template <typename U = mapped_type,
+            typename = decltype(U().template getKey<original_key_type>())>
+  static bool
+  CheckKeyMatch(const original_key_type &Key, const mapped_type &ExistingValue,
+                original_key_type *ExistingKeyIfDifferent = nullptr) {
+    const original_key_type &ExistingKey =
+        ExistingValue.template getKey<original_key_type>();
+    bool Result = (Key == ExistingKey);
+    if (!Result && ExistingKeyIfDifferent)
+      *ExistingKeyIfDifferent = ExistingKey;
+    return Result;
+  }
+
+  // If getKey() does not exist, this overload is selected, which assumes all
+  // keys with the same hash are equivalent.
+  static bool CheckKeyMatch(...) { return true; }
+
+public:
+  template <typename... Ts>
+  std::pair<iterator, bool> try_emplace(const key_type &Hash,
+                                        const original_key_type &Key,
+                                        Ts &&...Args) {
+    assert(Hash == hash_value(Key));
+    auto Ret = base_type::try_emplace(Hash, std::forward<Ts>(Args)...);
+    if (!Ret.second) {
+      original_key_type ExistingKey;
+      if (LLVM_UNLIKELY(!CheckKeyMatch(Key, Ret.first->second, &ExistingKey))) {
+        dbgs() << "MD5 collision detected: " << Key << " and " << ExistingKey
+               << " has same hash value " << Hash << "\n";
+        Ret.second = true;
+        Ret.first->second = mapped_type(std::forward<Ts>(Args)...);
+      }
+    }
+    return Ret;
+  }
+
+  template <typename... Ts>
+  std::pair<iterator, bool> try_emplace(const original_key_type &Key,
+                                        Ts &&...Args) {
+    key_type Hash = hash_value(Key);
+    return try_emplace(Hash, Key, std::forward<Ts>(Args)...);
+  }
+
+  template <typename... Ts> std::pair<iterator, bool> emplace(Ts &&...Args) {
+    return try_emplace(std::forward<Ts>(Args)...);
+  }
+
+  mapped_type &operator[](const original_key_type &Key) {
+    return try_emplace(Key, mapped_type()).first->second;
+  }
+
+  iterator find(const original_key_type &Key) {
+    key_type Hash = hash_value(Key);
+    auto It = base_type::find(Hash);
+    if (It != base_type::end())
+      if (LLVM_LIKELY(CheckKeyMatch(Key, It->second)))
+        return It;
+    return base_type::end();
+  }
+
+  const_iterator find(const original_key_type &Key) const {
+    key_type Hash = hash_value(Key);
+    auto It = base_type::find(Hash);
+    if (It != base_type::end())
+      if (LLVM_LIKELY(CheckKeyMatch(Key, It->second)))
+        return It;
+    return base_type::end();
+  }
+
+  size_t erase(const original_key_type &Ctx) {
+    auto It = find(Ctx);
+    if (It != base_type::end()) {
+      base_type::erase(It);
+      return 1;
+    }
+    return 0;
+  }
+};
+
+/// This class provides operator overloads to the map container using MD5 as the
+/// key type, so that existing code can still work in most cases using
+/// SampleContext as key.
+/// Note: when populating container, make sure to assign the SampleContext to
+/// the mapped value immediately because the key no longer holds it.
+class SampleProfileMap
+    : public HashKeyMap<DenseMap, SampleContext, FunctionSamples> {
+public:
+  // Convenience method because this is being used in many places. Set the
+  // FunctionSamples' context if its newly inserted.
+  mapped_type &Create(const SampleContext &Ctx) {
+    auto Ret = try_emplace(Ctx, FunctionSamples());
+    if (Ret.second)
+      Ret.first->second.setContext(Ctx);
+    return Ret.first->second;
+  }
+
+  iterator find(const SampleContext &Ctx) {
+    return HashKeyMap<llvm::DenseMap, SampleContext, FunctionSamples>::find(
+        Ctx);
+  }
+
+  const_iterator find(const SampleContext &Ctx) const {
+    return HashKeyMap<llvm::DenseMap, SampleContext, FunctionSamples>::find(
+        Ctx);
+  }
+
+  // Overloaded find() to lookup a function by name. This is called by IPO
+  // passes with an actual function name, and it is possible that the profile
+  // reader converted function names in the profile to MD5 strings, so we need
+  // to check if either representation matches.
+  iterator find(StringRef Fname) {
+    uint64_t Hash = hashFuncName(Fname);
+    auto It = base_type::find(hash_code(Hash));
+    if (It != end()) {
+      StringRef CtxName = It->second.getContext().getName();
+      if (LLVM_LIKELY(CtxName == Fname || CtxName == std::to_string(Hash)))
+        return It;
+    }
+    return end();
+  }
+
+  size_t erase(const SampleContext &Ctx) {
+    return HashKeyMap<llvm::DenseMap, SampleContext, FunctionSamples>::erase(
+        Ctx);
+  }
+
+  size_t erase(const key_type &Key) { return base_type::erase(Key); }
+};
+
+using NameFunctionSamples = std::pair<hash_code, const FunctionSamples *>;

 void sortFuncProfiles(const SampleProfileMap &ProfileMap,
                       std::vector<NameFunctionSamples> &SortedProfiles);
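The collision policy described in the HashKeyMap comment (a lookup misses when the stored original key differs, and an insertion replaces the stale entry) can be illustrated outside LLVM. The following is a simplified, non-template sketch using `std::unordered_map`; `Profile`, `HashKeyedProfiles`, and `lengthHash` are hypothetical names, and the deliberately weak hash exists only to make collisions easy to trigger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical value type that remembers its original key, playing the role
// that FunctionSamples::getKey<SampleContext>() plays in the diff.
struct Profile {
  std::string Name;
  uint64_t Samples = 0;
};

// Map keyed by the hash of the name only; the name itself lives in the value.
class HashKeyedProfiles {
public:
  using Hasher = uint64_t (*)(const std::string &);
  explicit HashKeyedProfiles(Hasher H) : Hash(H) {}

  // Returns the entry for Name, creating it if absent. If the slot is held
  // by a different name (hash collision), the old entry is dropped and
  // replaced by a fresh one, as if newly inserted.
  Profile &create(const std::string &Name) {
    auto [It, Inserted] = Map.try_emplace(Hash(Name));
    if (Inserted || It->second.Name != Name)
      It->second = Profile{Name, 0};
    return It->second;
  }

  // Returns nullptr if the hash is absent or the slot belongs to another key.
  Profile *find(const std::string &Name) {
    auto It = Map.find(Hash(Name));
    if (It == Map.end() || It->second.Name != Name)
      return nullptr;
    return &It->second;
  }

  std::size_t size() const { return Map.size(); }

private:
  Hasher Hash;
  std::unordered_map<uint64_t, Profile> Map;
};

// Deliberately weak hash (string length) so that "foo" and "bar" collide.
inline uint64_t lengthHash(const std::string &S) { return S.size(); }
```

With `lengthHash`, creating "foo" and then "bar" leaves a single entry that belongs to "bar", and a later `find("foo")` returns nullptr rather than the wrong profile. That is the trade-off the class comment describes: a profile may be lost on collision, but a mismatched one is never returned.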
@@ -1320,8 +1502,6 @@ class SampleContextTrimmer {
                                        bool MergeColdContext,
                                        uint32_t ColdContextFrameLength,
                                        bool TrimBaseProfileOnly);
-  // Canonicalize context profile name and attributes.
-  void canonicalizeContextProfiles();

 private:
   SampleProfileMap &ProfileMap;
@@ -1367,12 +1547,12 @@ class ProfileConverter {
                              SampleProfileMap &OutputProfiles,
                              bool ProfileIsCS = false) {
     if (ProfileIsCS) {
-      for (const auto &I : InputProfiles)
-        OutputProfiles[I.second.getName()].merge(I.second);
-      // Retain the profile name and clear the full context for each function
-      // profile.
-      for (auto &I : OutputProfiles)
-        I.second.setContext(SampleContext(I.first));
+      for (const auto &I : InputProfiles) {
+        // Retain the profile name and clear the full context for each function
+        // profile.
+        FunctionSamples &FS = OutputProfiles.Create(I.second.getName());
+        FS.merge(I.second);
+      }
     } else {
       for (const auto &I : InputProfiles)
         flattenNestedProfile(OutputProfiles, I.second);

llvm/include/llvm/ProfileData/SampleProfReader.h

Lines changed: 25 additions & 15 deletions
@@ -347,7 +347,7 @@ class SampleProfileReader {
 public:
   SampleProfileReader(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
                       SampleProfileFormat Format = SPF_None)
-      : Profiles(0), Ctx(C), Buffer(std::move(B)), Format(Format) {}
+      : Profiles(), Ctx(C), Buffer(std::move(B)), Format(Format) {}

   virtual ~SampleProfileReader() = default;
@@ -383,8 +383,8 @@ class SampleProfileReader {
   /// The implementaion to read sample profiles from the associated file.
   virtual std::error_code readImpl() = 0;

-  /// Print the profile for \p FContext on stream \p OS.
-  void dumpFunctionProfile(SampleContext FContext, raw_ostream &OS = dbgs());
+  /// Print the profile for \p FunctionSamples on stream \p OS.
+  void dumpFunctionProfile(const FunctionSamples &FS, raw_ostream &OS = dbgs());

   /// Collect functions with definitions in Module M. For reader which
   /// support loading function profiles on demand, return true when the
@@ -408,9 +408,7 @@ class SampleProfileReader {
   }

   /// Return the samples collected for function \p F.
-  virtual FunctionSamples *getSamplesFor(StringRef Fname) {
-    std::string FGUID;
-    Fname = getRepInFormat(Fname, useMD5(), FGUID);
+  FunctionSamples *getSamplesFor(StringRef Fname) {
     auto It = Profiles.find(Fname);
     if (It != Profiles.end())
       return &It->second;
@@ -638,15 +636,16 @@ class SampleProfileReaderBinary : public SampleProfileReader {
   /// Read the whole name table.
   std::error_code readNameTable();

-  /// Read a string indirectly via the name table.
-  ErrorOr<StringRef> readStringFromTable();
+  /// Read a string indirectly via the name table. Optionally return the index.
+  ErrorOr<StringRef> readStringFromTable(size_t *RetIdx = nullptr);

-  /// Read a context indirectly via the CSNameTable.
-  ErrorOr<SampleContextFrames> readContextFromTable();
+  /// Read a context indirectly via the CSNameTable. Optionally return the
+  /// index.
+  ErrorOr<SampleContextFrames> readContextFromTable(size_t *RetIdx = nullptr);

   /// Read a context indirectly via the CSNameTable if the profile has context,
-  /// otherwise same as readStringFromTable.
-  ErrorOr<SampleContext> readSampleContextFromTable();
+  /// otherwise same as readStringFromTable, also return its hash value.
+  ErrorOr<std::pair<SampleContext, uint64_t>> readSampleContextFromTable();

   /// Points to the current location in the buffer.
   const uint8_t *Data = nullptr;
@@ -666,13 +665,24 @@ class SampleProfileReaderBinary : public SampleProfileReader {
   /// the lifetime of MD5StringBuf is not shorter than that of NameTable.
   std::vector<std::string> MD5StringBuf;

-  /// The starting address of NameTable containing fixed length MD5.
+  /// The starting address of fixed length MD5 name table section.
   const uint8_t *MD5NameMemStart = nullptr;

   /// CSNameTable is used to save full context vectors. It is the backing buffer
   /// for SampleContextFrames.
   std::vector<SampleContextFrameVector> CSNameTable;

+  /// Table to cache MD5 values of sample contexts corresponding to
+  /// readSampleContextFromTable(), used to index into Profiles or
+  /// FuncOffsetTable.
+  std::vector<uint64_t> MD5SampleContextTable;
+
+  /// The starting address of the table of MD5 values of sample contexts. For
+  /// fixed length MD5 non-CS profile it is same as MD5NameMemStart because
+  /// hashes of non-CS contexts are already in the profile. Otherwise it points
+  /// to the start of MD5SampleContextTable.
+  const uint64_t *MD5SampleContextStart = nullptr;
+
 private:
   std::error_code readSummaryEntry(std::vector<ProfileSummaryEntry> &Entries);
   virtual std::error_code verifySPMagic(uint64_t Magic) = 0;
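The two new members above exist so that a context's hash is computed once and reused for both the FuncOffsetTable and Profiles lookups. The reader's actual caching code lives in the .cpp file and is not shown in this excerpt, so the following is only a sketch of the general idea under stated assumptions: `NameHashCache` is a hypothetical class, hashes are computed lazily per name-table index, and `std::hash` stands in for MD5.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-index hash cache over a name table. The first request for
// an index computes and stores the hash; later requests reuse the stored one.
class NameHashCache {
public:
  explicit NameHashCache(std::vector<std::string> NameTable)
      : Names(std::move(NameTable)), Hashes(Names.size(), 0),
        Valid(Names.size(), false) {}

  // Returns the cached hash for the name at Idx, computing it on first use.
  // std::hash is a placeholder here, not the MD5 used by the real reader.
  uint64_t hashAt(std::size_t Idx) {
    if (!Valid[Idx]) {
      Hashes[Idx] = static_cast<uint64_t>(std::hash<std::string>{}(Names[Idx]));
      Valid[Idx] = true;
      ++ComputeCount;
    }
    return Hashes[Idx];
  }

  // Number of times a hash was actually computed (for illustration only).
  unsigned computeCount() const { return ComputeCount; }

private:
  std::vector<std::string> Names;
  std::vector<uint64_t> Hashes;
  std::vector<bool> Valid;
  unsigned ComputeCount = 0;
};
```

A caller that needs the hash twice, once for an offset-table lookup and once for a profile-map lookup, calls `hashAt(Idx)` both times but pays for the hash computation only once. For fixed length MD5 non-CS profiles the diff's comment indicates no separate table is needed at all, because the hashes are already stored in the profile.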
@@ -746,10 +756,10 @@ class SampleProfileReaderExtBinaryBase : public SampleProfileReaderBinary {

   std::unique_ptr<ProfileSymbolList> ProfSymList;

-  /// The table mapping from function context to the offset of its
+  /// The table mapping from a function context's MD5 to the offset of its
   /// FunctionSample towards file start.
   /// At most one of FuncOffsetTable and FuncOffsetList is populated.
-  DenseMap<SampleContext, uint64_t> FuncOffsetTable;
+  DenseMap<hash_code, uint64_t> FuncOffsetTable;

   /// The list version of FuncOffsetTable. This is used if every entry is
   /// being accessed.

llvm/lib/ProfileData/ProfileSummaryBuilder.cpp

Lines changed: 1 addition & 3 deletions
@@ -204,9 +204,7 @@ SampleProfileSummaryBuilder::computeSummaryForProfiles(
   // profiles before computing profile summary.
   if (UseContextLessSummary || (sampleprof::FunctionSamples::ProfileIsCS &&
                                 !UseContextLessSummary.getNumOccurrences())) {
-    for (const auto &I : Profiles) {
-      ContextLessProfiles[I.second.getName()].merge(I.second);
-    }
+    ProfileConverter::flattenProfile(Profiles, ContextLessProfiles, true);
     ProfilesToUse = &ContextLessProfiles;
   }
