Commit 7c21d30

[𝘀𝗽𝗿] changes to main this commit is based on
Created using spr 1.3.4 [skip ci]
1 parent 155d584 commit 7c21d30

7 files changed: +412 -99 lines changed

bolt/docs/BAT.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
# BOLT Address Translation (BAT)
# Purpose
A regular profile collection for BOLT involves collecting samples from an
unoptimized binary. BOLT Address Translation makes it possible to collect a
profile from a BOLT-optimized binary and use it to optimize the input (pre-BOLT)
binary.

# Overview
BOLT Address Translation is an extra section (`.note.bolt_bat`) that BOLT inserts
into the output binary. It contains translation tables and linkage information
for split functions. This information enables mapping the profile from the
optimized binary back onto the original binary.

# Usage
The `--enable-bat` flag controls the generation of the BAT section. The sampled
profile needs to be passed, along with the optimized binary containing the BAT
section, to `perf2bolt`, which reads the BAT section and produces an fdata
profile for the original binary. Note that YAML profile generation is not
supported since BAT doesn't contain the metadata for input functions.

# Internals
## Section contents
The section is organized as follows:
- Hot functions table
- Address translation tables
- Cold functions table

## Construction and parsing
The BAT section is created from the `BoltAddressTranslation` class, which
captures the address translation information provided by the BOLT linker. It is
then encoded as a note section in the output binary.

During profile conversion, when a BAT-enabled binary is passed to `perf2bolt`,
the `BoltAddressTranslation` class is populated from the BAT section. The class
is then queried by `DataAggregator` during sample processing to reconstruct
addresses/offsets in the input binary.
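
The query step can be pictured as a search in a sorted offset table. The
following is a minimal sketch, not BOLT's actual implementation: it assumes a
per-function map from output offset to input offset with any flag bits already
stripped, and the names `OffsetMap` and `translateOffset` are hypothetical.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical per-function table: output offset -> input offset.
using OffsetMap = std::map<uint32_t, uint32_t>;

// Return the input offset recorded for the closest entry at or before
// OutputOffset, or std::nullopt if no entry precedes it.
std::optional<uint32_t> translateOffset(const OffsetMap &Map,
                                        uint32_t OutputOffset) {
  auto It = Map.upper_bound(OutputOffset); // first key > OutputOffset
  if (It == Map.begin())
    return std::nullopt; // nothing at or before OutputOffset
  return std::prev(It)->second;
}
```

A sorted map keeps the lookup logarithmic in the number of entries, and a
sample that lands between two entries resolves to the entry that precedes it.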

## Encoding format
The encoding is specified in `bolt/include/bolt/Profile/BoltAddressTranslation.h`
and `bolt/lib/Profile/BoltAddressTranslation.cpp`.

### Layout
The general layout is as follows:
```
Hot functions table header
|------------------|
|  Function entry  |
| |--------------| |
| | OutOff InOff | |
| |--------------| |
~~~~~~~~~~~~~~~~~~~~

Cold functions table header
|------------------|
|  Function entry  |
| |--------------| |
| | OutOff InOff | |
| |--------------| |
~~~~~~~~~~~~~~~~~~~~
```

### Functions table
The hot and cold functions tables share the same encoding, except for the
differences marked below.
Header:

| Entry  | Encoding | Description |
| ------ | ----- | ----------- |
| `NumFuncs` | ULEB128 | Number of functions in the functions table |

The header is followed by the functions table with `NumFuncs` entries.
Output binary addresses are delta encoded, meaning that only the difference from
the previous output address is stored. Addresses implicitly start at zero.
Output addresses are continuous through function start addresses and function
internal offsets, and between hot and cold fragments, to better spread deltas
and save space.
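
To illustrate why delta encoding combined with ULEB128 saves space, here is a
small sketch. It is not the code from `BoltAddressTranslation.cpp`: the helper
names `encodeULEB128` and `encodeAddresses` are illustrative, and the sketch
assumes the addresses are already sorted in ascending order so that every delta
is non-negative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append Value to Out as ULEB128: 7 payload bits per byte, with the high bit
// set on every byte except the last.
void encodeULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
}

// Delta-encode ascending addresses; the first delta is taken against zero.
std::vector<uint8_t> encodeAddresses(const std::vector<uint64_t> &Addrs) {
  std::vector<uint8_t> Out;
  uint64_t Prev = 0;
  for (uint64_t Addr : Addrs) {
    assert(Addr >= Prev && "sketch assumes ascending addresses");
    encodeULEB128(Addr - Prev, Out);
    Prev = Addr;
  }
  return Out;
}
```

With this sketch, the addresses `0x400000`, `0x400010`, and `0x400100` encode
as seven bytes (`80 80 80 02`, `10`, `F0 01`), compared with 24 bytes for three
raw 64-bit values.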

Hot indices are delta encoded, implicitly starting at zero.

| Entry  | Encoding | Description |
| ------ | ------| ----------- |
| `Address` | Continuous, Delta, ULEB128 | Function address in the output binary |
| `HotIndex` | Delta, ULEB128 | Cold functions only: index of corresponding hot function in hot functions table |
| `FuncHash` | 8b | Hot functions only: function hash for input function |
| `NumEntries` | ULEB128 | Number of address translation entries for a function |
| `EqualElems` | ULEB128 | Hot functions only: number of equal offsets in the beginning of a function |
| `BranchEntries` | Bitmask, `alignTo(EqualElems, 8)` bits | Hot functions only: if `EqualElems` is non-zero, bitmask denoting entries with `BRANCHENTRY` bit |

The function header is followed by `EqualElems` offsets (hot functions only) and
`NumEntries-EqualElems` (`NumEntries` for cold functions) pairs of offsets for
the current function.
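
The arithmetic behind `EqualElems`, `BranchEntries`, and the number of offset
pairs can be sketched as follows. This is an illustration under stated
assumptions rather than BOLT's implementation: the table is modeled as a map
from output offset to input offset with flag bits already removed, and the
names `countEqualOffsets`, `branchEntriesBytes`, and `numOffsetPairs` are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical table: output offset -> input offset (flag bit removed).
using OffsetMap = std::map<uint32_t, uint32_t>;

// Count the leading entries whose output and input offsets are equal.
size_t countEqualOffsets(const OffsetMap &Map) {
  size_t EqualElems = 0;
  for (const auto &Entry : Map) {
    if (Entry.first != Entry.second)
      break; // only the run at the beginning of the function counts
    ++EqualElems;
  }
  return EqualElems;
}

// Bytes taken by the BranchEntries bitmask: EqualElems bits rounded up to a
// multiple of 8, i.e. alignTo(EqualElems, 8) / 8.
size_t branchEntriesBytes(size_t EqualElems) { return (EqualElems + 7) / 8; }

// Number of explicit (output, input) offset pairs that follow the equal run.
size_t numOffsetPairs(size_t NumEntries, size_t EqualElems) {
  assert(EqualElems <= NumEntries);
  return NumEntries - EqualElems;
}
```

For a function with four entries of which the first two have equal offsets,
the sketch yields `EqualElems = 2`, a one-byte bitmask, and two explicit pairs.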

### Address translation table
Delta encoding means that only the difference from the previous corresponding
entry is encoded. Input offsets implicitly start at zero.

| Entry  | Encoding | Description |
| ------ | ------| ----------- |
| `OutputAddr` | Continuous, Delta, ULEB128 | Function offset in output binary |
| `InputAddr` | Optional, Delta, SLEB128 | Function offset in input binary, with the `BRANCHENTRY` flag in the LSB |
| `BBHash` | Optional, 8b | Basic block entries only: basic block hash in input binary |

The `BRANCHENTRY` bit denotes whether a given offset pair is a control flow source
(branch or call instruction). If not set, it signifies a control flow target
(basic block offset).
`InputAddr` is omitted for equal offsets in the input and output function. In this
case, `BRANCHENTRY` bits are encoded separately in the `BranchEntries` bitmask.
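
Storing the flag in the LSB can be sketched as below. The constant value `0x1`
comes from `BoltAddressTranslation.h`. Everything else is an assumption made
for illustration: the helper names are hypothetical, and the sketch assumes the
input offset is shifted left by one bit to make room for the flag. The
`inputDelta` helper shows why a signed encoding suits `InputAddr`: input
offsets can decrease from one entry to the next when code has been reordered.

```cpp
#include <cstdint>

// Flag value as declared in BoltAddressTranslation.h.
constexpr uint32_t BRANCHENTRY = 0x1;

// Pack an input offset and the flag into one value: offset in the upper bits,
// flag in the least significant bit. (Assumed packing, for illustration.)
constexpr uint32_t packInputOffset(uint32_t Offset, bool IsBranch) {
  return (Offset << 1) | (IsBranch ? BRANCHENTRY : 0u);
}

// Recover the input offset from a packed value.
constexpr uint32_t inputOffset(uint32_t Packed) { return Packed >> 1; }

// Check whether a packed value marks a control flow source.
constexpr bool isBranchEntry(uint32_t Packed) {
  return (Packed & BRANCHENTRY) != 0;
}

// Signed difference between consecutive packed values; it is negative when
// the current entry maps to an earlier input offset than the previous one.
constexpr int64_t inputDelta(uint32_t Prev, uint32_t Cur) {
  return static_cast<int64_t>(Cur) - static_cast<int64_t>(Prev);
}
```

Keeping the flag in the lowest bit keeps packed values small for small offsets,
which suits a variable-length encoding better than a flag in the top bit.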

bolt/include/bolt/Profile/BoltAddressTranslation.h

Lines changed: 32 additions & 2 deletions
@@ -11,6 +11,7 @@
 
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Support/DataExtractor.h"
 #include <cstdint>
 #include <map>
 #include <optional>
@@ -78,10 +79,21 @@ class BoltAddressTranslation {
 
   BoltAddressTranslation() {}
 
+  /// Write the serialized address translation table for a function.
+  template <bool Cold>
+  void writeMaps(std::map<uint64_t, MapTy> &Maps, uint64_t &PrevAddress,
+                 raw_ostream &OS);
+
   /// Write the serialized address translation tables for each reordered
   /// function
   void write(const BinaryContext &BC, raw_ostream &OS);
 
+  /// Read the serialized address translation table for a function.
+  /// Return a parse error if failed.
+  template <bool Cold>
+  void parseMaps(std::vector<uint64_t> &HotFuncs, uint64_t &PrevAddress,
+                 DataExtractor &DE, uint64_t &Offset, Error &Err);
+
   /// Read the serialized address translation tables and load them internally
   /// in memory. Return a parse error if failed.
   std::error_code parse(StringRef Buf);
@@ -110,22 +122,40 @@ class BoltAddressTranslation {
   /// addresses when aggregating profile
   bool enabledFor(llvm::object::ELFObjectFileBase *InputFile) const;
 
+  /// Save function and basic block hashes used for metadata dump.
+  void saveMetadata(BinaryContext &BC);
+
 private:
   /// Helper to update \p Map by inserting one or more BAT entries reflecting
   /// \p BB for function located at \p FuncAddress. At least one entry will be
   /// emitted for the start of the BB. More entries may be emitted to cover
   /// the location of calls or any instruction that may change control flow.
   void writeEntriesForBB(MapTy &Map, const BinaryBasicBlock &BB,
-                         uint64_t FuncAddress);
+                         uint64_t FuncAddress, uint64_t FuncInputAddress);
+
+  /// Returns the bitmask with set bits corresponding to indices of BRANCHENTRY
+  /// entries in function address translation map.
+  APInt calculateBranchEntriesBitMask(MapTy &Map, size_t EqualElems);
+
+  /// Calculate the number of equal offsets (output = input) in the beginning
+  /// of the function.
+  size_t getNumEqualOffsets(const MapTy &Map) const;
 
   std::map<uint64_t, MapTy> Maps;
+  std::map<uint64_t, MapTy> ColdMaps;
+
+  using BBHashMap = std::unordered_map<uint32_t, size_t>;
+  std::unordered_map<uint64_t, std::pair<size_t, BBHashMap>> FuncHashes;
 
   /// Links outlined cold bocks to their original function
   std::map<uint64_t, uint64_t> ColdPartSource;
 
+  /// Links output address of a main fragment back to input address.
+  std::unordered_map<uint64_t, uint64_t> ReverseMap;
+
   /// Identifies the address of a control-flow changing instructions in a
   /// translation map entry
-  const static uint32_t BRANCHENTRY = 0x80000000;
+  const static uint32_t BRANCHENTRY = 0x1;
 };
 } // namespace bolt
 

bolt/include/bolt/Profile/DataAggregator.h

Lines changed: 7 additions & 7 deletions
@@ -225,6 +225,10 @@ class DataAggregator : public DataReader {
   /// Aggregation statistics
   uint64_t NumInvalidTraces{0};
   uint64_t NumLongRangeTraces{0};
+  /// Specifies how many samples were recorded in cold areas if we are dealing
+  /// with profiling data collected in a bolted binary. For LBRs, incremented
+  /// for the source of the branch to avoid counting cold activity twice (one
+  /// for source and another for destination).
   uint64_t NumColdSamples{0};
 
   /// Looks into system PATH for Linux Perf and set up the aggregator to use it
@@ -246,13 +250,9 @@ class DataAggregator : public DataReader {
   BinaryFunction *getBinaryFunctionContainingAddress(uint64_t Address) const;
 
   /// Retrieve the location name to be used for samples recorded in \p Func.
-  /// If doing BAT translation, link cold parts to the hot part names (used by
-  /// the original binary).  \p Count specifies how many samples were recorded
-  /// at that location, so we can tally total activity in cold areas if we are
-  /// dealing with profiling data collected in a bolted binary. For LBRs,
-  /// \p Count should only be used for the source of the branch to avoid
-  /// counting cold activity twice (one for source and another for destination).
-  StringRef getLocationName(BinaryFunction &Func, uint64_t Count);
+  /// If doing BAT translation, link cold parts to the hot part names (used by
+  /// the original binary) and return true as second member.
+  std::pair<StringRef, bool> getLocationName(const BinaryFunction &Func) const;
 
   /// Semantic actions - parser hooks to interpret parsed perf samples
   /// Register a sample (non-LBR mode), i.e. a new hit at \p Address
