Commit 67d00eb

[𝘀𝗽𝗿] changes to main this commit is based on
Created using spr 1.3.4 [skip ci]
1 parent 155d584 commit 67d00eb

11 files changed: +615 -123 lines changed
bolt/docs/BAT.md

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
# BOLT Address Translation (BAT)
# Purpose
Regular profile collection for BOLT involves collecting samples from an
unoptimized binary. BOLT Address Translation (BAT) allows collecting a profile
from a BOLT-optimized binary and using it to optimize the input (pre-BOLT)
binary.

# Overview
BAT is stored in an extra section (`.note.bolt_bat`) that BOLT inserts into the
output binary. The section contains translation tables and linkage information
for split functions. This information enables mapping a profile collected on
the optimized binary back onto the original binary.

# Usage
The `--enable-bat` flag controls generation of the BAT section. The sampled
profile must be passed, along with the optimized binary containing the BAT
section, to `perf2bolt`, which reads the BAT section and produces a profile for
the original binary.

# Internals
## Section contents
The section is organized as follows:
- Hot functions table
- Address translation tables
- Cold functions table

## Construction and parsing
The BAT section is created from the `BoltAddressTranslation` class, which
captures address translation information provided by the BOLT linker. It is
then encoded as a note section in the output binary.

During profile conversion, when a BAT-enabled binary is passed to `perf2bolt`,
the `BoltAddressTranslation` class is populated from the BAT section. The class
is then queried by `DataAggregator` during sample processing to reconstruct
addresses/offsets in the input binary.
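
The query step can be pictured as a lookup in a sorted map. The sketch below is a simplified illustration, not the actual `BoltAddressTranslation` code: it assumes a per-function table keyed by output offset and returns the input offset of the closest entry at or below the queried offset. The names `OffsetMap` and `translateOffset` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical per-function table: output offset -> input offset.
using OffsetMap = std::map<uint32_t, uint32_t>;

// Returns the input offset recorded for the closest entry at or below
// OutputOffset, or std::nullopt if no entry covers it.
std::optional<uint32_t> translateOffset(const OffsetMap &Map,
                                        uint32_t OutputOffset) {
  auto It = Map.upper_bound(OutputOffset); // first key > OutputOffset
  if (It == Map.begin())
    return std::nullopt; // nothing at or below the queried offset
  --It;                  // greatest key <= OutputOffset
  return It->second;
}
```

A sample that lands between two recorded offsets is attributed to the preceding entry, which is why the lookup uses `upper_bound` and then steps back by one.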

## Encoding format
The encoding is specified in `bolt/include/bolt/Profile/BoltAddressTranslation.h`
and `bolt/lib/Profile/BoltAddressTranslation.cpp`.

### Layout
The general layout is as follows:
```
Hot functions table header
|------------------|
|  Function entry  |
| |--------------| |
| | OutOff InOff | |
| |--------------| |
~~~~~~~~~~~~~~~~~~~~

Cold functions table header
|------------------|
|  Function entry  |
| |--------------| |
| | OutOff InOff | |
| |--------------| |
~~~~~~~~~~~~~~~~~~~~
```

### Functions table
The hot and cold functions tables share the same encoding, except for the
differences marked below.
Header:
| Entry  | Encoding | Description |
| ------ | ----- | ----------- |
| `NumFuncs` | ULEB128 | Number of functions in the functions table |

The header is followed by the functions table with `NumFuncs` entries.
Output binary addresses are delta encoded, meaning that only the difference
from the previous output address is stored. Addresses implicitly start at zero.
Output addresses are continuous through function start addresses and function
internal offsets, and between hot and cold fragments, to better spread deltas
and save space.
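
To illustrate why delta encoding combines well with ULEB128, here is a minimal, self-contained sketch. It is not the BOLT implementation: the helper names (`encodeULEB128`, `decodeULEB128`, `encodeAddresses`, `decodeAddresses`) are hypothetical, and the input is assumed to be a list of ascending output addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append Value as ULEB128: 7 payload bits per byte, with the high bit set on
// every byte except the last one.
void encodeULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
}

// Decode one ULEB128 value starting at Offset and advance Offset past it.
uint64_t decodeULEB128(const std::vector<uint8_t> &In, size_t &Offset) {
  assert(Offset < In.size() && "truncated input");
  uint64_t Value = 0;
  unsigned Shift = 0;
  while (Offset < In.size() && Shift < 64) {
    uint8_t Byte = In[Offset++];
    Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      break;
    Shift += 7;
  }
  return Value;
}

// Store only the difference from the previous address; the running
// "previous" value implicitly starts at zero. Assumes ascending input.
std::vector<uint8_t> encodeAddresses(const std::vector<uint64_t> &Addresses) {
  std::vector<uint8_t> Out;
  uint64_t Prev = 0;
  for (uint64_t Addr : Addresses) {
    encodeULEB128(Addr - Prev, Out);
    Prev = Addr;
  }
  return Out;
}

// Inverse of encodeAddresses: accumulate Count deltas back into addresses.
std::vector<uint64_t> decodeAddresses(const std::vector<uint8_t> &In,
                                      size_t Count) {
  std::vector<uint64_t> Addresses;
  size_t Offset = 0;
  uint64_t Prev = 0;
  for (size_t I = 0; I < Count; ++I) {
    Prev += decodeULEB128(In, Offset);
    Addresses.push_back(Prev);
  }
  return Addresses;
}
```

Because nearby addresses produce small deltas, most values fit in one or two bytes instead of a fixed eight.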

Hot indices are delta encoded, implicitly starting at zero.
| Entry  | Encoding | Description |
| ------ | ------| ----------- |
| `Address` | Continuous, Delta, ULEB128 | Function address in the output binary |
| `HotIndex` | Delta, ULEB128 | Cold functions only: index of corresponding hot function in hot functions table |
| `FuncHash` | 8b | Hot functions only: function hash for input function |
| `NumEntries` | ULEB128 | Number of address translation entries for a function |
| `EqualElems` | ULEB128 | Hot functions only: number of equal offsets in the beginning of a function |
| `BranchEntries` | Bitmask, `alignTo(EqualElems, 8)` bits | Hot functions only: if `EqualElems` is non-zero, bitmask denoting entries with `BRANCHENTRY` bit |
The function header is followed by `EqualElems` offsets (hot functions only)
and `NumEntries - EqualElems` (`NumEntries` for cold functions) pairs of
offsets for the current function.
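
The relationship between `EqualElems` and the size of the `BranchEntries` bitmask can be sketched as follows. This is an illustrative approximation only: the `Entry` type and the helper names are hypothetical, and the `BRANCHENTRY` bit is ignored when comparing offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical entry: {output offset, input offset}, sorted by output offset.
using Entry = std::pair<uint32_t, uint32_t>;

// Number of leading entries whose output offset equals the input offset.
size_t countLeadingEqualOffsets(const std::vector<Entry> &Entries) {
  size_t N = 0;
  while (N < Entries.size() && Entries[N].first == Entries[N].second)
    ++N;
  return N;
}

// Round Value up to the next multiple of Align.
size_t alignToMultiple(size_t Value, size_t Align) {
  assert(Align > 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// A bitmask of alignTo(EqualElems, 8) bits occupies this many whole bytes.
size_t branchEntriesBytes(size_t EqualElems) {
  return alignToMultiple(EqualElems, 8) / 8;
}
```

For example, a function with nine leading equal offsets would need a 16-bit (two-byte) bitmask under this reading of the table.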

### Address translation table
Delta encoding means that only the difference from the previous corresponding
entry is encoded. Input offsets implicitly start at zero.
| Entry  | Encoding | Description |
| ------ | ------| ----------- |
| `OutputAddr` | Continuous, Delta, ULEB128 | Function offset in output binary |
| `InputAddr` | Optional, Delta, SLEB128 | Function offset in input binary with `BRANCHENTRY` LSB bit |
| `BBHash` | Optional, 8b | Basic block entries only: basic block hash in input binary |

The `BRANCHENTRY` bit denotes whether a given offset pair is a control flow
source (branch or call instruction). If not set, it signifies a control flow
target (basic block offset).
`InputAddr` is omitted for equal offsets in the input and output function. In
this case, `BRANCHENTRY` bits are encoded separately in a `BranchEntries`
bitvector.
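
One way to read "input offset with `BRANCHENTRY` LSB bit" is that the offset is shifted left by one and the flag occupies bit 0. The sketch below illustrates that reading; the shift-by-one layout and the helper names are assumptions, and only the constant value `0x1` comes from the header change in this commit.

```cpp
#include <cassert>
#include <cstdint>

// Value taken from this commit's header change.
constexpr uint32_t BRANCHENTRY = 0x1;

// Assumed layout: offset in the upper 31 bits, BRANCHENTRY flag in bit 0.
uint32_t packInputOffset(uint32_t Offset, bool IsBranch) {
  assert(Offset <= 0x7fffffffu && "offset must fit in 31 bits");
  return (Offset << 1) | (IsBranch ? BRANCHENTRY : 0u);
}

// Recover the plain input offset from a packed value.
uint32_t inputOffset(uint32_t Packed) { return Packed >> 1; }

// True if the packed value marks a control flow source (branch or call).
bool isBranchEntry(uint32_t Packed) { return (Packed & BRANCHENTRY) != 0; }

// Difference between consecutive packed values. It can be negative when an
// entry maps to a lower input offset than the previous one, which would be
// consistent with the table specifying a signed (SLEB128) encoding.
int64_t packedDelta(uint32_t Prev, uint32_t Cur) {
  return static_cast<int64_t>(Cur) - static_cast<int64_t>(Prev);
}
```

Keeping the flag in the least significant bit, rather than the most significant one, keeps packed values small so that their deltas stay compact under LEB128.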

bolt/include/bolt/Profile/BoltAddressTranslation.h

Lines changed: 35 additions & 2 deletions
@@ -11,6 +11,7 @@
 
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Support/DataExtractor.h"
 #include <cstdint>
 #include <map>
 #include <optional>
@@ -78,10 +79,21 @@ class BoltAddressTranslation {
 
   BoltAddressTranslation() {}
 
+  /// Write the serialized address translation table for a function.
+  template <bool Cold>
+  void writeMaps(std::map<uint64_t, MapTy> &Maps, uint64_t &PrevAddress,
+                 raw_ostream &OS);
+
   /// Write the serialized address translation tables for each reordered
   /// function
   void write(const BinaryContext &BC, raw_ostream &OS);
 
+  /// Read the serialized address translation table for a function.
+  /// Return a parse error if failed.
+  template <bool Cold>
+  void parseMaps(std::vector<uint64_t> &HotFuncs, uint64_t &PrevAddress,
+                 DataExtractor &DE, uint64_t &Offset, Error &Err);
+
   /// Read the serialized address translation tables and load them internally
   /// in memory. Return a parse error if failed.
   std::error_code parse(StringRef Buf);
@@ -110,22 +122,43 @@ class BoltAddressTranslation {
   /// addresses when aggregating profile
   bool enabledFor(llvm::object::ELFObjectFileBase *InputFile) const;
 
+  /// Save function and basic block hashes used for metadata dump.
+  void saveMetadata(BinaryContext &BC);
+
+  /// True if a given \p Address is a function with translation table entry.
+  bool isBATFunction(uint64_t Address) const { return Maps.count(Address); }
+
 private:
   /// Helper to update \p Map by inserting one or more BAT entries reflecting
   /// \p BB for function located at \p FuncAddress. At least one entry will be
   /// emitted for the start of the BB. More entries may be emitted to cover
   /// the location of calls or any instruction that may change control flow.
   void writeEntriesForBB(MapTy &Map, const BinaryBasicBlock &BB,
-                         uint64_t FuncAddress);
+                         uint64_t FuncAddress, uint64_t FuncInputAddress);
+
+  /// Returns the bitmask with set bits corresponding to indices of BRANCHENTRY
+  /// entries in function address translation map.
+  APInt calculateBranchEntriesBitMask(MapTy &Map, size_t EqualElems);
+
+  /// Calculate the number of equal offsets (output = input) in the beginning
+  /// of the function.
+  size_t getNumEqualOffsets(const MapTy &Map) const;
 
   std::map<uint64_t, MapTy> Maps;
+  std::map<uint64_t, MapTy> ColdMaps;
+
+  using BBHashMap = std::unordered_map<uint32_t, size_t>;
+  std::unordered_map<uint64_t, std::pair<size_t, BBHashMap>> FuncHashes;
 
   /// Links outlined cold bocks to their original function
   std::map<uint64_t, uint64_t> ColdPartSource;
 
+  /// Links output address of a main fragment back to input address.
+  std::unordered_map<uint64_t, uint64_t> ReverseMap;
+
   /// Identifies the address of a control-flow changing instructions in a
   /// translation map entry
-  const static uint32_t BRANCHENTRY = 0x80000000;
+  const static uint32_t BRANCHENTRY = 0x1;
 };
 } // namespace bolt
 

bolt/include/bolt/Profile/DataAggregator.h

Lines changed: 11 additions & 7 deletions
@@ -225,6 +225,10 @@ class DataAggregator : public DataReader {
   /// Aggregation statistics
   uint64_t NumInvalidTraces{0};
   uint64_t NumLongRangeTraces{0};
+  /// Specifies how many samples were recorded in cold areas if we are dealing
+  /// with profiling data collected in a bolted binary. For LBRs, incremented
+  /// for the source of the branch to avoid counting cold activity twice (one
+  /// for source and another for destination).
   uint64_t NumColdSamples{0};
 
   /// Looks into system PATH for Linux Perf and set up the aggregator to use it
@@ -246,13 +250,9 @@ class DataAggregator : public DataReader {
   BinaryFunction *getBinaryFunctionContainingAddress(uint64_t Address) const;
 
   /// Retrieve the location name to be used for samples recorded in \p Func.
-  /// If doing BAT translation, link cold parts to the hot part names (used by
-  /// the original binary). \p Count specifies how many samples were recorded
-  /// at that location, so we can tally total activity in cold areas if we are
-  /// dealing with profiling data collected in a bolted binary. For LBRs,
-  /// \p Count should only be used for the source of the branch to avoid
-  /// counting cold activity twice (one for source and another for destination).
-  StringRef getLocationName(BinaryFunction &Func, uint64_t Count);
+  /// If doing BAT translation, link cold parts to the hot part names (used by
+  /// the original binary) and return true as second member.
+  std::pair<StringRef, bool> getLocationName(const BinaryFunction &Func) const;
 
   /// Semantic actions - parser hooks to interpret parsed perf samples
   /// Register a sample (non-LBR mode), i.e. a new hit at \p Address
@@ -463,6 +463,10 @@ class DataAggregator : public DataReader {
   /// Dump data structures into a file readable by llvm-bolt
   std::error_code writeAggregatedFile(StringRef OutputFilename) const;
 
+  /// Dump translated data structures into YAML
+  std::error_code writeBATYAML(BinaryContext &BC,
+                               StringRef OutputFilename) const;
+
   /// Filter out binaries based on PID
   void filterBinaryMMapInfo();
 

bolt/include/bolt/Profile/YAMLProfileWriter.h

Lines changed: 4 additions & 0 deletions
@@ -9,6 +9,7 @@
 #ifndef BOLT_PROFILE_YAML_PROFILE_WRITER_H
 #define BOLT_PROFILE_YAML_PROFILE_WRITER_H
 
+#include "bolt/Profile/ProfileYAMLMapping.h"
 #include "llvm/Support/raw_ostream.h"
 #include <system_error>
 

@@ -29,6 +30,9 @@ class YAMLProfileWriter {
 
   /// Save execution profile for that instance.
   std::error_code writeProfile(const RewriteInstance &RI);
+
+  static yaml::bolt::BinaryFunctionProfile convert(const BinaryFunction &BF,
+                                                   bool UseDFS);
 };
 
 } // namespace bolt
