Commit 01c1231

[𝘀𝗽𝗿] changes to main this commit is based on
Created using spr 1.3.4 [skip ci]
1 parent 155d584 commit 01c1231

5 files changed: +307 -66 lines changed

bolt/docs/BAT.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
# BOLT Address Translation (BAT)
# Purpose
A regular profile collection for BOLT involves collecting samples from an
unoptimized binary. BOLT Address Translation allows collecting a profile
from a BOLT-optimized binary and using it to optimize the input (pre-BOLT)
binary.

# Overview
BOLT Address Translation is an extra section (`.note.bolt_bat`) that BOLT inserts
into the output binary. It contains translation tables and split-function linkage
information, which together enable mapping the profile from the optimized
binary back onto the original binary.

# Usage
The `--enable-bat` flag controls the generation of the BAT section. The sampled
profile needs to be passed, along with the optimized binary containing the BAT
section, to `perf2bolt`, which reads the BAT section and produces an fdata
profile for the original binary. Note that YAML profile generation is not
supported since BAT doesn't contain the metadata for input functions.

# Internals
## Section contents
The section is organized as follows:
- Hot functions table
- Address translation tables
- Cold functions table

## Construction and parsing
The BAT section is created from the `BoltAddressTranslation` class, which
captures the address translation information provided by the BOLT linker. It is
then encoded as a note section in the output binary.

During profile conversion, when a BAT-enabled binary is passed to `perf2bolt`,
the `BoltAddressTranslation` class is populated from the BAT section. The class
is then queried by `DataAggregator` during sample processing to reconstruct
addresses and offsets in the input binary.
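
The query step can be pictured as an ordered-map lookup. The following is a minimal sketch, not BOLT's actual implementation: the `OffsetMap` alias and the `translateOffset` helper are hypothetical names, and the sketch assumes that an output offset falling between two table entries is attributed to the closest preceding entry.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical per-function table: output offset -> input offset.
using OffsetMap = std::map<uint32_t, uint32_t>;

// Map an offset in the optimized function back to the input function by
// finding the last entry whose output offset is <= the queried offset.
// Returns std::nullopt when no entry precedes the queried offset.
std::optional<uint32_t> translateOffset(const OffsetMap &Map,
                                        uint32_t OutputOffset) {
  auto It = Map.upper_bound(OutputOffset); // first entry > OutputOffset
  if (It == Map.begin())
    return std::nullopt; // nothing at or before the queried offset
  --It;
  return It->second;
}
```

With entries `{0 -> 0, 16 -> 48, 40 -> 8}`, a sample at output offset 20 would be attributed to input offset 48 under these assumptions.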

## Encoding format
The encoding is specified in `bolt/include/bolt/Profile/BoltAddressTranslation.h`
and `bolt/lib/Profile/BoltAddressTranslation.cpp`.

### Layout
The general layout is as follows:
```
Hot functions table header
|------------------|
|  Function entry  |
| |--------------| |
| | OutOff InOff | |
| |--------------| |
~~~~~~~~~~~~~~~~~~~~

Cold functions table header
|------------------|
|  Function entry  |
| |--------------| |
| | OutOff InOff | |
| |--------------| |
~~~~~~~~~~~~~~~~~~~~
```

### Functions table
The hot and cold functions tables share the same encoding, except for the
differences marked below.

Header:

| Entry | Encoding | Description |
| ------ | ----- | ----------- |
| `NumFuncs` | ULEB128 | Number of functions in the functions table |

The header is followed by the functions table with `NumFuncs` entries.
Output binary addresses are delta encoded, meaning that only the difference from
the previous output address is stored. Addresses implicitly start at zero.
Output addresses are continuous through function start addresses and function
internal offsets, and between hot and cold fragments, to better spread deltas
and save space.
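
To illustrate why delta encoding combined with ULEB128 saves space, here is a minimal sketch over a single sorted address sequence. The helpers `appendULEB128` and `encodeAddressDeltas` are hypothetical names written for this example (they are not the functions BOLT calls), and the sketch assumes addresses are non-decreasing so that every delta is unsigned.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append Value to Out as ULEB128: 7 payload bits per byte, least significant
// group first, with the high bit set on every byte except the last.
void appendULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
}

// Delta-encode a non-decreasing sequence of output addresses. The running
// "previous" address implicitly starts at zero.
std::vector<uint8_t> encodeAddressDeltas(const std::vector<uint64_t> &Addrs) {
  std::vector<uint8_t> Out;
  uint64_t Prev = 0;
  for (uint64_t Addr : Addrs) {
    assert(Addr >= Prev && "addresses must be sorted for unsigned deltas");
    appendULEB128(Addr - Prev, Out);
    Prev = Addr;
  }
  return Out;
}
```

For the addresses `0x400000`, `0x400010`, `0x400040`, the deltas are `0x400000`, `0x10`, and `0x30`, which this sketch encodes in 4 + 1 + 1 = 6 bytes, compared with 24 bytes for three raw 64-bit values.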

Hot indices are delta encoded, implicitly starting at zero.

| Entry | Encoding | Description |
| ------ | ------| ----------- |
| `Address` | Continuous, Delta, ULEB128 | Function address in the output binary |
| `HotIndex` | Delta, ULEB128 | Cold functions only: index of the corresponding hot function in the hot functions table |
| `NumEntries` | ULEB128 | Number of address translation entries for a function |
| `EqualElems` | ULEB128 | Hot functions only: number of equal offsets at the beginning of a function |
| `BranchEntries` | Bitmask, `alignTo(EqualElems, 8)` bits | Hot functions only: if `EqualElems` is non-zero, bitmask denoting entries with the `BRANCHENTRY` bit set |

The function header is followed by `EqualElems` offsets (hot functions only) and
`NumEntries-EqualElems` (`NumEntries` for cold functions) pairs of offsets for
the current function.
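
The `EqualElems` count and the `BranchEntries` bitmask can be sketched as follows. This is an illustration under stated assumptions rather than BOLT's code: the `Entry` alias, `BranchFlag`, and both helper names are hypothetical; each entry is assumed to hold the input offset shifted left by one with the branch flag in bit 0; and the bit order within each mask byte (least significant bit first) is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumption for this sketch: each entry is (output offset, packed input),
// where packed input = (input offset << 1) | branch flag.
using Entry = std::pair<uint32_t, uint32_t>;
constexpr uint32_t BranchFlag = 0x1;

// Count leading entries whose input offset equals the output offset.
size_t countEqualLeadingOffsets(const std::vector<Entry> &Entries) {
  size_t Count = 0;
  for (const Entry &E : Entries) {
    if (E.first != (E.second >> 1))
      break;
    ++Count;
  }
  return Count;
}

// Build a bitmask over the first EqualElems entries, one bit per entry,
// padded up to a whole number of bytes (EqualElems rounded up to 8 bits).
std::vector<uint8_t> buildBranchMask(const std::vector<Entry> &Entries,
                                     size_t EqualElems) {
  assert(EqualElems <= Entries.size());
  std::vector<uint8_t> Mask((EqualElems + 7) / 8, 0);
  for (size_t I = 0; I < EqualElems; ++I)
    if (Entries[I].second & BranchFlag)
      Mask[I / 8] |= static_cast<uint8_t>(1u << (I % 8));
  return Mask;
}
```

Because the input offset is implied for these leading entries, only their branch flags need to be stored, which is what the separate bitmask provides.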

### Address translation table
Delta encoding means that only the difference from the previous corresponding
entry is encoded. Input offsets implicitly start at zero.

| Entry | Encoding | Description |
| ------ | ------| ----------- |
| `OutputAddr` | Continuous, Delta, ULEB128 | Function offset in the output binary |
| `InputAddr` | Optional, Delta, SLEB128 | Function offset in the input binary, with the `BRANCHENTRY` flag in the least significant bit |

The `BRANCHENTRY` bit denotes whether a given offset pair is a control flow source
(branch or call instruction). If not set, it signifies a control flow target
(basic block offset).
`InputAddr` is omitted when the input and output offsets are equal. In this
case, `BRANCHENTRY` bits are encoded separately in the `BranchEntries` bitmask.
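
The `InputAddr` column can be sketched as a packed value followed by a signed delta. The code below is a hedged illustration, not BOLT's implementation: `packInput`, `appendSLEB128`, and `encodeInputDeltas` are hypothetical helpers, the shift-by-one packing is an assumption consistent with the flag living in the least significant bit, and taking the delta over the packed value (rather than the bare offset) is likewise assumed. A signed encoding is presumably needed because consecutive input offsets can decrease.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t BranchFlag = 0x1; // mirrors BRANCHENTRY = 0x1

// Assumed packing: input offset in the upper bits, branch flag in bit 0.
uint32_t packInput(uint32_t InputOffset, bool IsBranch) {
  assert(InputOffset < (1u << 31) && "offset must fit in 31 bits");
  return (InputOffset << 1) | (IsBranch ? BranchFlag : 0);
}

// Append Value to Out as SLEB128: 7 bits per byte, least significant group
// first; stop once the remaining bits are pure sign extension.
void appendSLEB128(int64_t Value, std::vector<uint8_t> &Out) {
  bool More = true;
  while (More) {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7; // arithmetic shift on mainstream compilers (and in C++20)
    bool SignBit = (Byte & 0x40) != 0;
    if ((Value == 0 && !SignBit) || (Value == -1 && SignBit))
      More = false;
    else
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  }
}

// Delta-encode packed input values. The previous value starts at zero.
std::vector<uint8_t> encodeInputDeltas(const std::vector<uint32_t> &Packed) {
  std::vector<uint8_t> Out;
  int64_t Prev = 0;
  for (uint32_t P : Packed) {
    appendSLEB128(static_cast<int64_t>(P) - Prev, Out);
    Prev = P;
  }
  return Out;
}
```

For example, packing input offset 24 (a block start) and then input offset 8 (a branch) gives 48 and 17, so the second delta is -31 and each entry still fits in a single byte.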

bolt/include/bolt/Profile/BoltAddressTranslation.h

Lines changed: 28 additions & 1 deletion
@@ -11,6 +11,7 @@

 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Support/DataExtractor.h"
 #include <cstdint>
 #include <map>
 #include <optional>
@@ -78,10 +79,21 @@ class BoltAddressTranslation {

   BoltAddressTranslation() {}

+  /// Write the serialized address translation table for a function.
+  template <bool Cold>
+  void writeMaps(std::map<uint64_t, MapTy> &Maps, uint64_t &PrevAddress,
+                 raw_ostream &OS);
+
   /// Write the serialized address translation tables for each reordered
   /// function
   void write(const BinaryContext &BC, raw_ostream &OS);

+  /// Read the serialized address translation table for a function.
+  /// Return a parse error if failed.
+  template <bool Cold>
+  void parseMaps(std::vector<uint64_t> &HotFuncs, uint64_t &PrevAddress,
+                 DataExtractor &DE, uint64_t &Offset, Error &Err);
+
   /// Read the serialized address translation tables and load them internally
   /// in memory. Return a parse error if failed.
   std::error_code parse(StringRef Buf);
@@ -110,6 +122,9 @@ class BoltAddressTranslation {
   /// addresses when aggregating profile
   bool enabledFor(llvm::object::ELFObjectFileBase *InputFile) const;

+  /// Save function and basic block hashes used for metadata dump.
+  void saveMetadata(BinaryContext &BC);
+
 private:
   /// Helper to update \p Map by inserting one or more BAT entries reflecting
   /// \p BB for function located at \p FuncAddress. At least one entry will be
@@ -118,14 +133,26 @@ class BoltAddressTranslation {
   void writeEntriesForBB(MapTy &Map, const BinaryBasicBlock &BB,
                          uint64_t FuncAddress);

+  /// Returns the bitmask with set bits corresponding to indices of BRANCHENTRY
+  /// entries in function address translation map.
+  APInt calculateBranchEntriesBitMask(MapTy &Map, size_t EqualElems);
+
+  /// Calculate the number of equal offsets (output = input) in the beginning
+  /// of the function.
+  size_t getNumEqualOffsets(const MapTy &Map) const;
+
   std::map<uint64_t, MapTy> Maps;
+  std::map<uint64_t, MapTy> ColdMaps;
+
+  using BBHashMap = std::unordered_map<uint32_t, size_t>;
+  std::unordered_map<uint64_t, std::pair<size_t, BBHashMap>> FuncHashes;

   /// Links outlined cold bocks to their original function
   std::map<uint64_t, uint64_t> ColdPartSource;

   /// Identifies the address of a control-flow changing instructions in a
   /// translation map entry
-  const static uint32_t BRANCHENTRY = 0x80000000;
+  const static uint32_t BRANCHENTRY = 0x1;
 };
 } // namespace bolt
