-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[memprof] Add IndexedMemProfReader::getMemProfCallerCalleePairs #115807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[memprof] Add IndexedMemProfReader::getMemProfCallerCalleePairs #115807
Conversation
Undrifting the MemProf profile requires two sets of information: - caller-callee pairs from the profile - callee-callee pairs from the IR This patch adds a function to do the former. The latter has been addressed by extractCallsFromIR. Unfortunately, the current MemProf format does not directly give us the caller-callee pairs from the profile. "struct Frame" just tells us where the call site is -- Caller GUID and line/column numbers; it doesn't tell us what function a given Frame is calling. To extract caller-callee pairs, we need to scan each call stack, look at two adjacent Frames, and extract a caller-callee pair. Conceptually, we would extract caller-callee pairs with: for each MemProfRecord in the profile: for each call stack in AllocSites: extract caller-callee pairs from adjacent pairs of Frames However, this is highly inefficient. Obtaining MemProfRecord involves looking up the OnDiskHashTable, allocating several vectors on the heap, and populating fields that are irrelevant to us, such as MIB and CallSites. This patch adds an efficient way of doing the above. Specifically, we - go though all IndexedMemProfRecords, - look at each linear call stack ID - extract caller-callee pairs from each call stack The extraction is done by a new class CallerCalleePairExtractor, modified from LinearCallStackIdConverter, which reconstructs a call stack from the radix tree array. For our purposes, we skip the reconstruction and immediately populates the data structure for caller-callee pairs. The resulting caller-callee-pairs is of the type: DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs; which can be passed directly to longestCommonSequence just like the result of extractCallsFromIR. Further performance optimizations are possible for the new functions in this patch. I'll address those in follow-up patches.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-pgo Author: Kazu Hirata (kazutakahirata) ChangesUndrifting the MemProf profile requires two sets of information:
This patch adds a function to do the former. The latter has been Unfortunately, the current MemProf format does not directly give us Conceptually, we would extract caller-callee pairs with: for each MemProfRecord in the profile: However, this is highly inefficient. Obtaining MemProfRecord involves This patch adds an efficient way of doing the above. Specifically, we
The extraction is done by a new class CallerCalleePairExtractor, The resulting caller-callee-pairs is of the type: DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs; which can be passed directly to longestCommonSequence just like the Further performance optimizations are possible for the new functions Full diff: https://github.com/llvm/llvm-project/pull/115807.diff 5 Files Affected:
diff --git a/llvm/include/llvm/ProfileData/InstrProfReader.h b/llvm/include/llvm/ProfileData/InstrProfReader.h
index 6be3fad41824a9..42414bc193bc84 100644
--- a/llvm/include/llvm/ProfileData/InstrProfReader.h
+++ b/llvm/include/llvm/ProfileData/InstrProfReader.h
@@ -695,6 +695,9 @@ class IndexedMemProfReader {
Expected<memprof::MemProfRecord>
getMemProfRecord(const uint64_t FuncNameHash) const;
+
+ DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>>
+ getMemProfCallerCalleePairs() const;
};
/// Reader for the indexed binary instrprof format.
@@ -793,6 +796,11 @@ class IndexedInstrProfReader : public InstrProfReader {
return MemProfReader.getMemProfRecord(FuncNameHash);
}
+ DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>>
+ getMemProfCallerCalleePairs() {
+ return MemProfReader.getMemProfCallerCalleePairs();
+ }
+
/// Fill Counts with the profile data for the given function name.
Error getFunctionCounts(StringRef FuncName, uint64_t FuncHash,
std::vector<uint64_t> &Counts);
diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index da2cc807370095..0d47e12cb2b5de 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -931,6 +931,80 @@ struct LinearCallStackIdConverter {
}
};
+struct LineLocation {
+ LineLocation(uint32_t L, uint32_t D) : LineOffset(L), Column(D) {}
+
+ bool operator<(const LineLocation &O) const {
+ return LineOffset < O.LineOffset ||
+ (LineOffset == O.LineOffset && Column < O.Column);
+ }
+
+ bool operator==(const LineLocation &O) const {
+ return LineOffset == O.LineOffset && Column == O.Column;
+ }
+
+ bool operator!=(const LineLocation &O) const {
+ return LineOffset != O.LineOffset || Column != O.Column;
+ }
+
+ uint64_t getHashCode() const { return ((uint64_t)Column << 32) | LineOffset; }
+
+ uint32_t LineOffset;
+ uint32_t Column;
+};
+
+// A pair of a call site location and its corresponding callee GUID.
+using CallEdgeTy = std::pair<LineLocation, uint64_t>;
+
+// Used to extract caller-callee pairs from the call stack array. The leaf
+// frame is assumed to call a heap allocation function with GUID 0. The
+// resulting pairs are accumulated in CallerCalleePairs. Users can take it
+// with:
+//
+// auto Pairs = std::move(Extractor.CallerCalleePairs);
+struct CallerCalleePairExtractor {
+ const unsigned char *CallStackBase;
+ std::function<Frame(LinearFrameId)> FrameIdToFrame;
+ DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs;
+
+ CallerCalleePairExtractor() = delete;
+ CallerCalleePairExtractor(const unsigned char *CallStackBase,
+ std::function<Frame(LinearFrameId)> FrameIdToFrame)
+ : CallStackBase(CallStackBase), FrameIdToFrame(FrameIdToFrame) {}
+
+ void operator()(LinearCallStackId LinearCSId) {
+ const unsigned char *Ptr =
+ CallStackBase +
+ static_cast<uint64_t>(LinearCSId) * sizeof(LinearFrameId);
+ uint32_t NumFrames =
+ support::endian::readNext<uint32_t, llvm::endianness::little>(Ptr);
+ // The leaf frame calls a function with GUID 0.
+ uint64_t CalleeGUID = 0;
+ for (; NumFrames; --NumFrames) {
+ LinearFrameId Elem =
+ support::endian::read<LinearFrameId, llvm::endianness::little>(Ptr);
+ // Follow a pointer to the parent, if any. See comments below on
+ // CallStackRadixTreeBuilder for the description of the radix tree format.
+ if (static_cast<std::make_signed_t<LinearFrameId>>(Elem) < 0) {
+ Ptr += (-Elem) * sizeof(LinearFrameId);
+ Elem =
+ support::endian::read<LinearFrameId, llvm::endianness::little>(Ptr);
+ }
+ // We shouldn't encounter another pointer.
+ assert(static_cast<std::make_signed_t<LinearFrameId>>(Elem) >= 0);
+
+ // Add a new caller-callee pair.
+ Frame F = FrameIdToFrame(Elem);
+ uint64_t CallerGUID = F.Function;
+ LineLocation Loc(F.LineOffset, F.Column);
+ CallerCalleePairs[CallerGUID].emplace_back(Loc, CalleeGUID);
+
+ Ptr += sizeof(LinearFrameId);
+ CalleeGUID = CallerGUID;
+ }
+ }
+};
+
struct IndexedMemProfData {
// A map to hold memprof data per function. The lower 64 bits obtained from
// the md5 hash of the function name is used to index into the map.
diff --git a/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h b/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h
index f168ffc4fdb1ef..2b8debd872c124 100644
--- a/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h
+++ b/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h
@@ -14,6 +14,7 @@
#include "llvm/ADT/IntrusiveRefCntPtr.h"
#include "llvm/IR/PassManager.h"
+#include "llvm/ProfileData/MemProf.h"
namespace llvm {
class Function;
@@ -59,31 +60,6 @@ class MemProfUsePass : public PassInfoMixin<MemProfUsePass> {
namespace memprof {
-struct LineLocation {
- LineLocation(uint32_t L, uint32_t D) : LineOffset(L), Column(D) {}
-
- bool operator<(const LineLocation &O) const {
- return LineOffset < O.LineOffset ||
- (LineOffset == O.LineOffset && Column < O.Column);
- }
-
- bool operator==(const LineLocation &O) const {
- return LineOffset == O.LineOffset && Column == O.Column;
- }
-
- bool operator!=(const LineLocation &O) const {
- return LineOffset != O.LineOffset || Column != O.Column;
- }
-
- uint64_t getHashCode() const { return ((uint64_t)Column << 32) | LineOffset; }
-
- uint32_t LineOffset;
- uint32_t Column;
-};
-
-// A pair of a call site location and its corresponding callee GUID.
-using CallEdgeTy = std::pair<LineLocation, uint64_t>;
-
// Extract all calls from the IR. Arrange them in a map from caller GUIDs to a
// list of call sites, each of the form {LineLocation, CalleeGUID}.
DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> extractCallsFromIR(Module &M);
diff --git a/llvm/lib/ProfileData/InstrProfReader.cpp b/llvm/lib/ProfileData/InstrProfReader.cpp
index b90617c74f6d13..034ae14b39bdd7 100644
--- a/llvm/lib/ProfileData/InstrProfReader.cpp
+++ b/llvm/lib/ProfileData/InstrProfReader.cpp
@@ -1666,6 +1666,35 @@ IndexedMemProfReader::getMemProfRecord(const uint64_t FuncNameHash) const {
memprof::MaximumSupportedVersion));
}
+DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>>
+IndexedMemProfReader::getMemProfCallerCalleePairs() const {
+ assert(MemProfRecordTable);
+ assert(Version == memprof::Version3);
+
+ memprof::LinearFrameIdConverter FrameIdConv(FrameBase);
+ memprof::CallerCalleePairExtractor Extractor(CallStackBase, FrameIdConv);
+
+ // Collect the set of linear call stack IDs. Since we expect a lot of
+ // duplicates, we first collect them in the form a bit vector before
+ // processing them.
+ for (const memprof::IndexedMemProfRecord &IndexedRecord :
+ MemProfRecordTable->data())
+ for (const memprof::IndexedAllocationInfo &IndexedAI :
+ IndexedRecord.AllocSites)
+ Extractor(IndexedAI.CSId);
+
+ DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>> Pairs =
+ std::move(Extractor.CallerCalleePairs);
+
+ // Sort each call list by the source location.
+ for (auto &[CallerGUID, CallList] : Pairs) {
+ llvm::sort(CallList);
+ CallList.erase(llvm::unique(CallList), CallList.end());
+ }
+
+ return Pairs;
+}
+
Error IndexedInstrProfReader::getFunctionCounts(StringRef FuncName,
uint64_t FuncHash,
std::vector<uint64_t> &Counts) {
diff --git a/llvm/unittests/ProfileData/InstrProfTest.cpp b/llvm/unittests/ProfileData/InstrProfTest.cpp
index 7fdfd15e7bc990..cf3cf7fb952738 100644
--- a/llvm/unittests/ProfileData/InstrProfTest.cpp
+++ b/llvm/unittests/ProfileData/InstrProfTest.cpp
@@ -580,6 +580,68 @@ TEST_F(InstrProfTest, test_memprof_v2_partial_schema) {
EXPECT_THAT(WantRecord, EqualsRecord(Record));
}
+TEST_F(InstrProfTest, test_caller_callee_pairs) {
+ const MemInfoBlock MIB = makePartialMIB();
+
+ Writer.setMemProfVersionRequested(memprof::Version3);
+ Writer.setMemProfFullSchema(false);
+
+ ASSERT_THAT_ERROR(Writer.mergeProfileKind(InstrProfKind::MemProf),
+ Succeeded());
+
+ // Call Hierarchy
+ //
+ // Function GUID:0x123
+ // Line: 1, Column: 2
+ // Function GUID: 0x234
+ // Line: 3, Column: 4
+ // new(...)
+ // Line: 5, Column: 6
+ // Function GUID: 0x345
+ // Line: 7, Column: 8
+ // new(...)
+
+ const std::pair<memprof::FrameId, memprof::Frame> Frames[] = {
+ {0, {0x123, 1, 2, false}},
+ {1, {0x234, 3, 4, true}},
+ {2, {0x123, 5, 6, false}},
+ {3, {0x345, 7, 8, true}}};
+ for (const auto &[FrameId, Frame] : Frames)
+ Writer.addMemProfFrame(FrameId, Frame, Err);
+
+ const std::pair<memprof::CallStackId, SmallVector<memprof::FrameId>>
+ CallStacks[] = {{0x111, {1, 0}}, {0x222, {3, 2}}};
+ for (const auto &[CSId, CallStack] : CallStacks)
+ Writer.addMemProfCallStack(CSId, CallStack, Err);
+
+ const IndexedMemProfRecord IndexedMR = makeRecordV2(
+ /*AllocFrames=*/{0x111, 0x222},
+ /*CallSiteFrames=*/{}, MIB, memprof::getHotColdSchema());
+ Writer.addMemProfRecord(/*Id=*/0x9999, IndexedMR);
+
+ auto Profile = Writer.writeBuffer();
+ readProfile(std::move(Profile));
+
+ auto Pairs = Reader->getMemProfCallerCalleePairs();
+ ASSERT_THAT(Pairs, SizeIs(3));
+
+ auto It = Pairs.find(0x123);
+ ASSERT_NE(It, Pairs.end());
+ ASSERT_THAT(It->second, SizeIs(2));
+ EXPECT_THAT(It->second[0], testing::Pair(testing::FieldsAre(1U, 2U), 0x234U));
+ EXPECT_THAT(It->second[1], testing::Pair(testing::FieldsAre(5U, 6U), 0x345U));
+
+ It = Pairs.find(0x234);
+ ASSERT_NE(It, Pairs.end());
+ ASSERT_THAT(It->second, SizeIs(1));
+ EXPECT_THAT(It->second[0], testing::Pair(testing::FieldsAre(3U, 4U), 0U));
+
+ It = Pairs.find(0x345);
+ ASSERT_NE(It, Pairs.end());
+ ASSERT_THAT(It->second, SizeIs(1));
+ EXPECT_THAT(It->second[0], testing::Pair(testing::FieldsAre(7U, 8U), 0U));
+}
+
TEST_F(InstrProfTest, test_memprof_getrecord_error) {
ASSERT_THAT_ERROR(Writer.mergeProfileKind(InstrProfKind::MemProf),
Succeeded());
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/8551 Here is the relevant piece of the build log for the reference
|
@snehasish Thank you for the feedback! Could I address it after #116184? The follow-up PR touches the same area. |
Undrifting the MemProf profile requires two sets of information:
This patch adds a function to do the former. The latter has been
addressed by extractCallsFromIR.
Unfortunately, the current MemProf format does not directly give us
the caller-callee pairs from the profile. "struct Frame" just tells
us where the call site is -- Caller GUID and line/column numbers; it
doesn't tell us what function a given Frame is calling. To extract
caller-callee pairs, we need to scan each call stack, look at two
adjacent Frames, and extract a caller-callee pair.
Conceptually, we would extract caller-callee pairs with:
for each MemProfRecord in the profile:
for each call stack in AllocSites:
extract caller-callee pairs from adjacent pairs of Frames
However, this is highly inefficient. Obtaining MemProfRecord involves
looking up the OnDiskHashTable, allocating several vectors on the
heap, and populating fields that are irrelevant to us, such as MIB and
CallSites.
This patch adds an efficient way of doing the above. Specifically, we
The extraction is done by a new class CallerCalleePairExtractor,
modified from LinearCallStackIdConverter, which reconstructs a call
stack from the radix tree array. For our purposes, we skip the
reconstruction and immediately populates the data structure for
caller-callee pairs.
The resulting caller-callee-pairs is of the type:
DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs;
which can be passed directly to longestCommonSequence just like the
result of extractCallsFromIR.
Further performance optimizations are possible for the new functions
in this patch. I'll address those in follow-up patches.