-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[memprof] Add YAML-based deserialization for MemProf profile #117829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This patch adds YAML-based deserialization for MemProf profile. It's been painful to write tests for MemProf passes because we do not have a text format for the MemProf profile. We would write a test case in C++, run it for a binary MemProf profile, and then finally run a test written in LLVM IR with the binary profile. This patch paves the way toward YAML-based MemProf profile. Specifically, it adds new class YAMLMemProfReader derived from MemProfReader. For now, it only adds a function to parse StringRef pointing to YAML data. Subseqeunt patches will wire it to llvm-profdata and read from a file. The field names are based on various printYAML functions in MemProf.h. I'm not aiming for compatibility with the format used in printYAML, but I don't see a point in changing the field names.
@llvm/pr-subscribers-pgo Author: Kazu Hirata (kazutakahirata) ChangesThis patch adds YAML-based deserialization for MemProf profile. It's been painful to write tests for MemProf passes because we do not This patch paves the way toward YAML-based MemProf profile. The field names are based on various printYAML functions in MemProf.h. Full diff: https://github.com/llvm/llvm-project/pull/117829.diff 4 Files Affected:
diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index d64addcfe3ed06..3b631fad4029e7 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -19,6 +19,10 @@
#include <optional>
namespace llvm {
+namespace yaml {
+template <typename T> struct CustomMappingTraits;
+} // namespace yaml
+
namespace memprof {
struct MemProfRecord;
@@ -193,6 +197,9 @@ struct PortableMemInfoBlock {
return Result;
}
+ // Give YAML access to the individual MIB fields.
+ friend struct yaml::CustomMappingTraits<memprof::PortableMemInfoBlock>;
+
private:
// The set of available fields, indexed by Meta::Name.
std::bitset<llvm::to_underlying(Meta::Size)> Schema;
@@ -362,6 +369,8 @@ struct IndexedAllocationInfo {
IndexedAllocationInfo(CallStackId CSId, const MemInfoBlock &MB,
const MemProfSchema &Schema = getFullSchema())
: CSId(CSId), Info(MB, Schema) {}
+ IndexedAllocationInfo(CallStackId CSId, const PortableMemInfoBlock &MB)
+ : CSId(CSId), Info(MB) {}
// Returns the size in bytes when this allocation info struct is serialized.
size_t serializedSize(const MemProfSchema &Schema,
@@ -498,6 +507,19 @@ struct MemProfRecord {
}
};
+// Helper struct for AllMemProfData. In YAML, we treat the GUID and the fields
+// within MemProfRecord at the same level as if the GUID were part of
+// MemProfRecord.
+struct GUIDMemProfRecordPair {
+ GlobalValue::GUID GUID;
+ MemProfRecord Record;
+};
+
+// The top-level data structure, only used with YAML for now.
+struct AllMemProfData {
+ std::vector<GUIDMemProfRecordPair> HeapProfileRecords;
+};
+
// Reads a memprof schema from a buffer. All entries in the buffer are
// interpreted as uint64_t. The first entry in the buffer denotes the number of
// ids in the schema. Subsequent entries are integers which map to memprof::Meta
diff --git a/llvm/include/llvm/ProfileData/MemProfReader.h b/llvm/include/llvm/ProfileData/MemProfReader.h
index bf2f91548fac07..0529f794606465 100644
--- a/llvm/include/llvm/ProfileData/MemProfReader.h
+++ b/llvm/include/llvm/ProfileData/MemProfReader.h
@@ -209,6 +209,12 @@ class RawMemProfReader final : public MemProfReader {
// A mapping of the hash to symbol name, only used if KeepSymbolName is true.
llvm::DenseMap<uint64_t, std::string> GuidToSymbolName;
};
+
+class YAMLMemProfReader final : public MemProfReader {
+public:
+ YAMLMemProfReader() = default;
+ void parse(StringRef YAMLData);
+};
} // namespace memprof
} // namespace llvm
diff --git a/llvm/lib/ProfileData/MemProfReader.cpp b/llvm/lib/ProfileData/MemProfReader.cpp
index 921ff3429fed74..e0b8207020dced 100644
--- a/llvm/lib/ProfileData/MemProfReader.cpp
+++ b/llvm/lib/ProfileData/MemProfReader.cpp
@@ -40,6 +40,71 @@
#include "llvm/Support/Path.h"
#define DEBUG_TYPE "memprof"
+
+namespace llvm {
+namespace yaml {
+template <> struct MappingTraits<memprof::Frame> {
+ static void mapping(IO &Io, memprof::Frame &F) {
+ Io.mapRequired("Function", F.Function);
+ Io.mapRequired("LineOffset", F.LineOffset);
+ Io.mapRequired("Column", F.Column);
+ Io.mapRequired("Inline", F.IsInlineFrame);
+ }
+};
+
+template <> struct CustomMappingTraits<memprof::PortableMemInfoBlock> {
+ static void inputOne(IO &Io, StringRef KeyStr,
+ memprof::PortableMemInfoBlock &MIB) {
+ // PortableMemInfoBlock keeps track of the set of fields that actually have
+ // values. We update the set here as we receive a key-value pair from the
+ // YAML document.
+#define MIBEntryDef(NameTag, Name, Type) \
+ if (KeyStr == #Name) { \
+ Io.mapRequired(KeyStr.str().c_str(), MIB.Name); \
+ MIB.Schema.set(llvm::to_underlying(memprof::Meta::Name)); \
+ return; \
+ }
+#include "llvm/ProfileData/MIBEntryDef.inc"
+#undef MIBEntryDef
+ Io.setError("Key is not a valid validation event");
+ }
+
+ static void output(IO &Io, memprof::PortableMemInfoBlock &VI) {
+ llvm_unreachable("To be implemented");
+ }
+};
+
+template <> struct MappingTraits<memprof::AllocationInfo> {
+ static void mapping(IO &Io, memprof::AllocationInfo &AI) {
+ Io.mapRequired("Callstack", AI.CallStack);
+ Io.mapRequired("MemInfoBlock", AI.Info);
+ }
+};
+
+// In YAML, we use GUIDMemProfRecordPair instead of MemProfRecord so that we can
+// treat the GUID and the fields within MemProfRecord at the same level as if
+// the GUID were part of MemProfRecord.
+template <> struct MappingTraits<memprof::GUIDMemProfRecordPair> {
+ static void mapping(IO &Io, memprof::GUIDMemProfRecordPair &Pair) {
+ Io.mapRequired("GUID", Pair.GUID);
+ Io.mapRequired("AllocSites", Pair.Record.AllocSites);
+ Io.mapRequired("CallSites", Pair.Record.CallSites);
+ }
+};
+
+template <> struct MappingTraits<memprof::AllMemProfData> {
+ static void mapping(IO &Io, memprof::AllMemProfData &Data) {
+ Io.mapRequired("HeapProfileRecords", Data.HeapProfileRecords);
+ }
+};
+} // namespace yaml
+} // namespace llvm
+
+LLVM_YAML_IS_SEQUENCE_VECTOR(memprof::Frame)
+LLVM_YAML_IS_SEQUENCE_VECTOR(std::vector<memprof::Frame>)
+LLVM_YAML_IS_SEQUENCE_VECTOR(memprof::AllocationInfo)
+LLVM_YAML_IS_SEQUENCE_VECTOR(memprof::GUIDMemProfRecordPair)
+
namespace llvm {
namespace memprof {
namespace {
@@ -756,5 +821,46 @@ Error RawMemProfReader::readNextRecord(
};
return MemProfReader::readNextRecord(GuidRecord, IdToFrameCallback);
}
+
+void YAMLMemProfReader::parse(StringRef YAMLData) {
+ memprof::AllMemProfData Doc;
+ yaml::Input Yin(YAMLData);
+
+ Yin >> Doc;
+ if (Yin.error())
+ return;
+
+ // Add a call stack to MemProfData.CallStacks and return its CallStackId.
+ auto AddCallStack = [&](ArrayRef<Frame> CallStack) -> CallStackId {
+ SmallVector<FrameId> IndexedCallStack;
+ IndexedCallStack.reserve(CallStack.size());
+ for (const Frame &F : CallStack) {
+ FrameId Id = F.hash();
+ MemProfData.Frames.try_emplace(Id, F);
+ IndexedCallStack.push_back(Id);
+ }
+ CallStackId CSId = hashCallStack(IndexedCallStack);
+ MemProfData.CallStacks.try_emplace(CSId, std::move(IndexedCallStack));
+ return CSId;
+ };
+
+ for (const auto &[GUID, Record] : Doc.HeapProfileRecords) {
+ IndexedMemProfRecord IndexedRecord;
+
+ // Convert AllocationInfo to IndexedAllocationInfo.
+ for (const AllocationInfo &AI : Record.AllocSites) {
+ CallStackId CSId = AddCallStack(AI.CallStack);
+ IndexedRecord.AllocSites.emplace_back(CSId, AI.Info);
+ }
+
+ // Populate CallSiteIds.
+ for (const auto &CallSite : Record.CallSites) {
+ CallStackId CSId = AddCallStack(CallSite);
+ IndexedRecord.CallSiteIds.push_back(CSId);
+ }
+
+ MemProfData.Records.try_emplace(GUID, std::move(IndexedRecord));
+ }
+}
} // namespace memprof
} // namespace llvm
diff --git a/llvm/unittests/ProfileData/MemProfTest.cpp b/llvm/unittests/ProfileData/MemProfTest.cpp
index b3b6249342d630..53a4c8a09472ae 100644
--- a/llvm/unittests/ProfileData/MemProfTest.cpp
+++ b/llvm/unittests/ProfileData/MemProfTest.cpp
@@ -742,4 +742,81 @@ TEST(MemProf, RadixTreeBuilderSuccessiveJumps) {
EXPECT_THAT(Mappings, testing::Contains(testing::Pair(
llvm::memprof::hashCallStack(CS4), 10U)));
}
+
+// Verify that we can parse YAML and retrieve IndexedMemProfData as expected.
+TEST(MemProf, YAMLParser) {
+ StringRef YAMLData = R"YAML(
+---
+HeapProfileRecords:
+- GUID: 0xdeadbeef12345678
+ AllocSites:
+ - Callstack:
+ - {Function: 0x100, LineOffset: 11, Column: 10, Inline: true}
+ - {Function: 0x200, LineOffset: 22, Column: 20, Inline: false}
+ MemInfoBlock:
+ AllocCount: 777
+ TotalSize: 888
+ - Callstack:
+ - {Function: 0x300, LineOffset: 33, Column: 30, Inline: false}
+ - {Function: 0x400, LineOffset: 44, Column: 40, Inline: true}
+ MemInfoBlock:
+ AllocCount: 666
+ TotalSize: 555
+ CallSites:
+ - - {Function: 0x500, LineOffset: 55, Column: 50, Inline: true}
+ - {Function: 0x600, LineOffset: 66, Column: 60, Inline: false}
+ - - {Function: 0x700, LineOffset: 77, Column: 70, Inline: true}
+ - {Function: 0x800, LineOffset: 88, Column: 80, Inline: false}
+)YAML";
+
+ llvm::memprof::YAMLMemProfReader YAMLReader;
+ YAMLReader.parse(YAMLData);
+ llvm::memprof::IndexedMemProfData MemProfData = YAMLReader.takeMemProfData();
+
+ Frame F1(0x100, 11, 10, true);
+ Frame F2(0x200, 22, 20, false);
+ Frame F3(0x300, 33, 30, false);
+ Frame F4(0x400, 44, 40, true);
+ Frame F5(0x500, 55, 50, true);
+ Frame F6(0x600, 66, 60, false);
+ Frame F7(0x700, 77, 70, true);
+ Frame F8(0x800, 88, 80, false);
+
+ llvm::SmallVector<FrameId> CS1 = {F1.hash(), F2.hash()};
+ llvm::SmallVector<FrameId> CS2 = {F3.hash(), F4.hash()};
+ llvm::SmallVector<FrameId> CS3 = {F5.hash(), F6.hash()};
+ llvm::SmallVector<FrameId> CS4 = {F7.hash(), F8.hash()};
+
+ // Verify the entire contents of MemProfData.Frames.
+ EXPECT_THAT(
+ MemProfData.Frames,
+ ::testing::UnorderedElementsAre(
+ ::testing::Pair(F1.hash(), F1), ::testing::Pair(F2.hash(), F2),
+ ::testing::Pair(F3.hash(), F3), ::testing::Pair(F4.hash(), F4),
+ ::testing::Pair(F5.hash(), F5), ::testing::Pair(F6.hash(), F6),
+ ::testing::Pair(F7.hash(), F7), ::testing::Pair(F8.hash(), F8)));
+
+ // Verify the entire contents of MemProfData.Frames.
+ EXPECT_THAT(MemProfData.CallStacks,
+ ::testing::UnorderedElementsAre(
+ ::testing::Pair(llvm::memprof::hashCallStack(CS1), CS1),
+ ::testing::Pair(llvm::memprof::hashCallStack(CS2), CS2),
+ ::testing::Pair(llvm::memprof::hashCallStack(CS3), CS3),
+ ::testing::Pair(llvm::memprof::hashCallStack(CS4), CS4)));
+
+ // Verify the entire contents of MemProfData.Records.
+ ASSERT_THAT(MemProfData.Records, SizeIs(1));
+ const auto &[GUID, Record] = *MemProfData.Records.begin();
+ EXPECT_EQ(GUID, 0xdeadbeef12345678ULL);
+ ASSERT_THAT(Record.AllocSites, SizeIs(2));
+ EXPECT_EQ(Record.AllocSites[0].CSId, llvm::memprof::hashCallStack(CS1));
+ EXPECT_EQ(Record.AllocSites[0].Info.getAllocCount(), 777U);
+ EXPECT_EQ(Record.AllocSites[0].Info.getTotalSize(), 888U);
+ EXPECT_EQ(Record.AllocSites[1].CSId, llvm::memprof::hashCallStack(CS2));
+ EXPECT_EQ(Record.AllocSites[1].Info.getAllocCount(), 666U);
+ EXPECT_EQ(Record.AllocSites[1].Info.getTotalSize(), 555U);
+ EXPECT_THAT(Record.CallSiteIds,
+ ::testing::ElementsAre(llvm::memprof::hashCallStack(CS3),
+ llvm::memprof::hashCallStack(CS4)));
+}
} // namespace
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/5070 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/23/builds/5252 Here is the relevant piece of the build log for the reference
|
THis breaks building on macOS with a similar error. |
…117829)" This reverts commit c00e532. It looks like this breaks building LLVM on macOS and some other platform/compiler combos https://lab.llvm.org/buildbot/#/builders/23/builds/5252 https://green.lab.llvm.org/job/llvm.org/job/clang-san-iossim/5356/console In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/lib/ProfileData/MemProfReader.cpp:34: In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/MemProfReader.h:24: In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/InstrProfReader.h:22: In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/InstrProfCorrelator.h:21: /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:1173:36: error: implicit instantiation of undefined template 'llvm::yaml::MissingTrait<unsigned long>' char missing_yaml_trait_for_type[sizeof(MissingTrait<T>)]; ^ /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:961:7: note: in instantiation of function template specialization 'llvm::yaml::yamlize<unsigned long>' requested here yamlize(*this, Val, Required, Ctx); ^ /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:883:11: note: in instantiation of function template specialization 'llvm::yaml::IO::processKey<unsigned long, llvm::yaml::EmptyContext>' requested here this->processKey(Key, Val, true, Ctx); ^ /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/MIBEntryDef.inc:55:1: note: in instantiation of function template specialization 'llvm::yaml::IO::mapRequired<unsigned long>' requested here MIBEntryDef(AccessHistogram = 27, AccessHistogram, uintptr_t) ^ /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/lib/ProfileData/MemProfReader.cpp:77:8: note: expanded from macro 'MIBEntryDef' Io.mapRequired(KeyStr.str().c_str(), MIB.Name); \ ^ /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:310:8: note: template is declared here struct MissingTrait; ^ 1 error generated.
Reverted for now to bring builders back to green in 7e312c3 |
@fhahn Thanks for the revert. I'll take a look. |
…117829) This patch adds YAML-based deserialization for MemProf profile. It's been painful to write tests for MemProf passes because we do not have a text format for the MemProf profile. We would write a test case in C++, run it for a binary MemProf profile, and then finally run a test written in LLVM IR with the binary profile. This patch paves the way toward YAML-based MemProf profile. Specifically, it adds new class YAMLMemProfReader derived from MemProfReader. For now, it only adds a function to parse StringRef pointing to YAML data. Subseqeunt patches will wire it to llvm-profdata and read from a file. The field names are based on various printYAML functions in MemProf.h. I'm not aiming for compatibility with the format used in printYAML, but I don't see a point in changing the field names. This iteration works around the unavailability of ScalarTraits<uintptr_t> on macOS.
@kazutakahirata I'm seeing MSVC build warnings due to e98396f:
|
Thank you for reporting this to me! I've reproduced the problem with a small test case on Compiler Explorer. I'll put |
This patch adds YAML-based deserialization for MemProf profile.
It's been painful to write tests for MemProf passes because we do not
have a text format for the MemProf profile. We would write a test
case in C++, run it for a binary MemProf profile, and then finally run
a test written in LLVM IR with the binary profile.
This patch paves the way toward YAML-based MemProf profile.
Specifically, it adds new class YAMLMemProfReader derived from
MemProfReader. For now, it only adds a function to parse StringRef
pointing to YAML data. Subseqeunt patches will wire it to
llvm-profdata and read from a file.
The field names are based on various printYAML functions in MemProf.h.
I'm not aiming for compatibility with the format used in printYAML,
but I don't see a point in changing the field names.