Skip to content

Commit 7bd4f1a

Browse files
[memprof] Use BLAKE for FrameId (llvm#109501)
This patch uses a stronger hash function for FrameId. Without this patch, I've observed hash collisions in fairly common scenarios. Specifically, for a given GUID, I see three pairs of hash collisions in the two-dimensional range 1 <= LineOffset <= 70 and 1 <= Column <= 50, which may be a bit too frequent. With the new function, I don't see collisions at all even for a large profile. Impact on serialization/deserialization should be as follows: - Up to indexed memprof format Version 2, inclusive: The FrameIds computed with the new hash function will show up as part of the profile file. However, the deserializer only treats FrameIds as keys (but not hash values) to retrieve Frames from the on-disk hash table, so a change of the hash function shouldn't matter. - For indexed memprof format Version 3, FrameIds do not show up at all in the resulting profile file. FrameIds are only used to break ties when we sort Frames in writeMemProfFrameArray.
1 parent aeb18eb commit 7bd4f1a

File tree

1 file changed

+14
-17
lines changed

1 file changed

+14
-17
lines changed

llvm/include/llvm/ProfileData/MemProf.h

Lines changed: 14 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -7,8 +7,10 @@
77
#include "llvm/ADT/SmallVector.h"
88
#include "llvm/IR/GlobalValue.h"
99
#include "llvm/ProfileData/MemProfData.inc"
10+
#include "llvm/Support/BLAKE3.h"
1011
#include "llvm/Support/Endian.h"
1112
#include "llvm/Support/EndianStream.h"
13+
#include "llvm/Support/HashBuilder.h"
1214
#include "llvm/Support/raw_ostream.h"
1315

1416
#include <bitset>
@@ -308,24 +310,19 @@ struct Frame {
308310
<< " Inline: " << IsInlineFrame << "\n";
309311
}
310312

311-
// Return a hash value based on the contents of the frame. Here we don't use
312-
// hashing from llvm ADT since we are going to persist the hash id, the hash
313-
// combine algorithm in ADT uses a new randomized seed each time.
313+
// Return a hash value based on the contents of the frame. Here we use a
314+
// cryptographic hash function to minimize the chance of hash collisions. We
315+
// do persist FrameIds as part of memprof formats up to Version 2, inclusive.
316+
// However, the deserializer never calls this function; it uses FrameIds
317+
// merely as keys to look up Frames proper.
314318
inline FrameId hash() const {
315-
auto HashCombine = [](auto Value, size_t Seed) {
316-
std::hash<decltype(Value)> Hasher;
317-
// The constant used below is the 64 bit representation of the fractional
318-
// part of the golden ratio. Used here for the randomness in their bit
319-
// pattern.
320-
return Hasher(Value) + 0x9e3779b97f4a7c15 + (Seed << 6) + (Seed >> 2);
321-
};
322-
323-
size_t Result = 0;
324-
Result ^= HashCombine(Function, Result);
325-
Result ^= HashCombine(LineOffset, Result);
326-
Result ^= HashCombine(Column, Result);
327-
Result ^= HashCombine(IsInlineFrame, Result);
328-
return static_cast<FrameId>(Result);
319+
llvm::HashBuilder<llvm::TruncatedBLAKE3<8>, llvm::endianness::little>
320+
HashBuilder;
321+
HashBuilder.add(Function, LineOffset, Column, IsInlineFrame);
322+
llvm::BLAKE3Result<8> Hash = HashBuilder.final();
323+
FrameId Id;
324+
std::memcpy(&Id, Hash.data(), sizeof(Hash));
325+
return Id;
329326
}
330327
};
331328

0 commit comments

Comments
 (0)