Commit d3ce4aa

[BOLT] DataAggregator supports binaries with multiple text segments
When a binary has multiple text segments, the Size is computed as the difference between the last address of these segments and the BaseAddress. The base addresses of all text segments must be the same.

Background: Larger binaries get two text segments mapped when loaded in memory. BOLT processes only the first, which does not have a correct BaseAddress, so the size of a BinaryMMapInfo is computed incorrectly. Consequently, BOLT wrongly concludes that many of the samples fall outside the binary and ignores them. As a result, heatmap output excludes all those entries and the section hotness statistics are wrong. This bug is present in both the AArch64 and x86 backends.

---

This patch introduces the flag 'perf-script-events', which allows passing perf events without BOLT having to parse them using 'perf script'. The flag is used to pass a mock perf profile that has two memory mappings for a mock binary that has two text segments. The size of the mapping is updated, as `parseMMapEvents` now processes all text segments.

---

Example used in unit tests:

From `/proc/<BINARY PID>/maps`, we have 2 text mappings, say A and B.

```
abc0000000-abc1000000 r-xp 011c0000 103:01 1573523 BINARY
abc2000000-abca000000 r-xp 031d0000 103:01 1573523 BINARY
```

Size of text mappings:

| Mapping | Size   |
| ------- | ------ |
| A       | ~15MB  |
| B       | ~135MB |

---

Example on a real program:

```
2f7200000-2fabca000 r--p 00000000 bolted-binary
2fabd9000-2fe47c000 r-xp 039c9000 bolted-binary <- 1st text segment
2fe48b000-2fe61d000 r--p 0727b000 bolted-binary
2fe62c000-2fe660000 rw-p 0740c000 bolted-binary
2fe660000-2fea4c000 rw-p 00000000
2fec00000-303dad000 r-xp 07a00000 bolted-binary <- 2nd (appears only on the bolted binary)
```
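To make the failure mode described in the background concrete, here is a minimal sketch of a range check over mappings A and B from the unit-test example. The `Mapping` struct, `containsSample`, and the choice of a base address equal to the start of A are assumptions made for illustration; they are not BOLT's actual types or its real base-address computation.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for a recorded mapping (not BOLT's type).
struct Mapping {
  uint64_t BaseAddress; // address the binary is considered loaded at
  uint64_t Size;        // bytes covered, starting at BaseAddress
};

// A sample is attributed to the binary only if it lies in [Base, Base + Size).
constexpr bool containsSample(const Mapping &M, uint64_t Addr) {
  return Addr >= M.BaseAddress && Addr - M.BaseAddress < M.Size;
}

// Mappings A and B from the unit-test example.
constexpr uint64_t AStart = 0xabc0000000, AEnd = 0xabc1000000;
constexpr uint64_t BStart = 0xabc2000000, BEnd = 0xabca000000;

// Assumption for illustration only: the base address is the start of A.
constexpr uint64_t Base = AStart;

constexpr Mapping FirstOnly{Base, AEnd - Base};   // size taken from A alone
constexpr Mapping AllSegments{Base, BEnd - Base}; // size up to the end of B

// A sample that lands inside the second text segment.
constexpr uint64_t SampleInB = BStart + 0x1000;
```

Under these assumptions, `containsSample(FirstOnly, SampleInB)` is false while `containsSample(AllSegments, SampleInB)` is true, which matches the symptom of samples being ignored. Note that a size computed up to the end of B also spans the gap between the two mappings.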
1 parent 8e1eb4a commit d3ce4aa

1 file changed: +8 -11 lines changed

bolt/lib/Profile/DataAggregator.cpp

Lines changed: 8 additions & 11 deletions
@@ -2026,15 +2026,6 @@ std::error_code DataAggregator::parseMMapEvents() {
       if (FileMMapInfo.first == "(deleted)")
         continue;
 
-      // Consider only the first mapping of the file for any given PID
-      auto Range = GlobalMMapInfo.equal_range(FileMMapInfo.first);
-      bool PIDExists = llvm::any_of(make_range(Range), [&](const auto &MI) {
-        return MI.second.PID == FileMMapInfo.second.PID;
-      });
-
-      if (PIDExists)
-        continue;
-
       GlobalMMapInfo.insert(FileMMapInfo);
     }
 
@@ -2086,12 +2077,18 @@ std::error_code DataAggregator::parseMMapEvents() {
                << " using file offset 0x" << Twine::utohexstr(MMapInfo.Offset)
                << ". Ignoring profile data for this mapping\n";
         continue;
-      } else {
-        MMapInfo.BaseAddress = *BaseAddress;
       }
+      MMapInfo.BaseAddress = *BaseAddress;
     }
 
+    // Try to add MMapInfo to the map and update its size. Large binaries
+    // may span multiple text segments, so the mapping is inserted only on the
+    // first occurrence. If a larger section size is found, it will be updated.
     BinaryMMapInfo.insert(std::make_pair(MMapInfo.PID, MMapInfo));
+    uint64_t EndAddress = MMapInfo.MMapAddress + MMapInfo.Size;
+    uint64_t Size = EndAddress - BinaryMMapInfo[MMapInfo.PID].BaseAddress;
+    if (Size > BinaryMMapInfo[MMapInfo.PID].Size)
+      BinaryMMapInfo[MMapInfo.PID].Size = Size;
   }
 
   if (BinaryMMapInfo.empty()) {
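
The insert-then-grow logic added in the second hunk can be sketched in isolation as follows. This is a simplified illustration, not the code in `DataAggregator.cpp`: the trimmed `MMapInfo` struct, the `MMapTable` alias, and the `recordMapping` helper are hypothetical names, and the base address used in any call is an assumed value.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical, trimmed-down per-mapping record (assumed fields only).
struct MMapInfo {
  uint64_t BaseAddress;
  uint64_t MMapAddress;
  uint64_t Size;
  int32_t PID;
};

using MMapTable = std::unordered_map<int32_t, MMapInfo>;

// Record one text mapping, mirroring the insert-then-grow steps of the patch.
inline void recordMapping(MMapTable &Table, const MMapInfo &Info) {
  // insert() leaves an existing entry untouched, so the first mapping seen
  // for a PID is the one that supplies the stored BaseAddress.
  Table.insert(std::make_pair(Info.PID, Info));

  MMapInfo &Entry = Table.at(Info.PID);
  const uint64_t EndAddress = Info.MMapAddress + Info.Size;
  const uint64_t Size = EndAddress - Entry.BaseAddress;

  // Grow the recorded size when this mapping ends further from the base.
  if (Size > Entry.Size)
    Entry.Size = Size;
}
```

Calling `recordMapping` first with mapping A (address `0xabc0000000`, size `0x1000000`) and then with mapping B (address `0xabc2000000`, size `0x8000000`), both with an assumed base of `0xabc0000000`, leaves a single entry whose size runs from the base to the end of B.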
