Commit d19c017

[𝘀𝗽𝗿] changes to main this commit is based on
Created using spr 1.3.4 [skip ci]
1 parent 155d584 commit d19c017

5 files changed: +189 -66 lines changed

bolt/docs/BAT.md

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
+# BOLT Address Translation (BAT)
+# Purpose
+A regular profile collection for BOLT involves collecting samples from an
+unoptimized binary. BOLT Address Translation allows collecting a profile
+from a BOLT-optimized binary and using it to optimize the input (pre-BOLT)
+binary.
+
+# Overview
+BOLT Address Translation is an extra section (`.note.bolt_bat`) inserted by BOLT
+into the output binary. It contains translation tables and split-function linkage
+information. This information enables mapping the profile back from the optimized
+binary onto the original binary.
+
+# Usage
+The `--enable-bat` flag controls the generation of the BAT section. The sampled
+profile needs to be passed, along with the optimized binary containing the BAT
+section, to `perf2bolt`, which reads the BAT section and produces an fdata
+profile for the original binary. Note that YAML profile generation is not
+supported since BAT doesn't contain the metadata for input functions.
+
+# Internals
+## Section contents
+The section is organized as follows:
+- Hot functions table
+- Address translation tables
+- Cold functions table
+
+## Construction and parsing
+The BAT section is created from the `BoltAddressTranslation` class, which captures
+address translation information provided by the BOLT linker. It is then encoded
+as a note section in the output binary.
+
+During profile conversion, when a BAT-enabled binary is passed to perf2bolt, the
+`BoltAddressTranslation` class is populated from the BAT section. The class is
+then queried by `DataAggregator` during sample processing to reconstruct
+addresses/offsets in the input binary.
+
+## Encoding format
+The encoding is specified in bolt/include/bolt/Profile/BoltAddressTranslation.h
+and bolt/lib/Profile/BoltAddressTranslation.cpp.
+
+### Layout
+The general layout is as follows:
+```
+Hot functions table header
+|------------------|
+|  Function entry  |
+| |--------------| |
+| | OutOff InOff | |
+| |--------------| |
+~~~~~~~~~~~~~~~~~~~~
+
+Cold functions table header
+|------------------|
+|  Function entry  |
+| |--------------| |
+| | OutOff InOff | |
+| |--------------| |
+~~~~~~~~~~~~~~~~~~~~
+```
+
+### Functions table
+Hot and cold functions tables share the encoding, except for the difference marked below.
+Header:
+| Entry  | Encoding | Description |
+| ------ | ----- | ----------- |
+| `NumFuncs` | ULEB128 | Number of functions in the functions table |
+
+The header is followed by the functions table with `NumFuncs` entries.
+Output binary addresses are delta encoded, meaning that only the difference from
+the previous output address is stored. Addresses implicitly start at zero.
+Output addresses are continuous through function start addresses and function
+internal offsets, and between hot and cold fragments, to better spread deltas
+and save space.
+
+Hot indices are delta encoded, implicitly starting at zero.
+| Entry  | Encoding | Description |
+| ------ | ------| ----------- |
+| `Address` | Continuous, Delta, ULEB128 | Function address in the output binary |
+| `HotIndex` | Delta, ULEB128 | Cold functions only: index of corresponding hot function in hot functions table |
+| `NumEntries` | ULEB128 | Number of address translation entries for a function |
+The function header is followed by `NumEntries` pairs of offsets for the current
+function.
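The continuous delta scheme described above can be sketched in standalone C++. This is an illustration only, not the BOLT implementation: it does not use LLVM's `encodeULEB128`, the names `putULEB128`, `FuncSketch`, and `encodeTable` are hypothetical, it covers a hot functions table only (no `HotIndex`), and it omits the input-offset half of each pair.

```cpp
#include <cstdint>
#include <vector>

// Append Value to Out as ULEB128: 7 payload bits per byte, with the high bit
// set on every byte except the last.
inline void putULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
}

// Hypothetical in-memory form of one function: its output address and the
// output offsets of its translation entries (input offsets are left out).
struct FuncSketch {
  uint64_t Address;
  std::vector<uint32_t> OutputOffsets; // sorted ascending
};

// Emit NumFuncs, then per function: address delta, NumEntries, and one
// output-address delta per entry. PrevAddress is carried across entries and
// across functions, which is what "continuous" refers to.
inline std::vector<uint8_t> encodeTable(const std::vector<FuncSketch> &Funcs) {
  std::vector<uint8_t> Out;
  uint64_t PrevAddress = 0; // addresses implicitly start at zero
  putULEB128(Funcs.size(), Out);
  for (const FuncSketch &F : Funcs) {
    putULEB128(F.Address - PrevAddress, Out);
    PrevAddress = F.Address;
    putULEB128(F.OutputOffsets.size(), Out);
    for (uint32_t Off : F.OutputOffsets) {
      const uint64_t OutputAddress = F.Address + Off;
      putULEB128(OutputAddress - PrevAddress, Out);
      PrevAddress = OutputAddress;
    }
  }
  return Out;
}
```

For two functions at 0x1000 and 0x1040 with three entries in total, this sketch emits 9 bytes. Only the first function address needs a two-byte ULEB128 value, because every later value is a small delta from the previous output address.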
+
+### Address translation table
+Delta encoding means that only the difference from the previous corresponding
+entry is encoded. Input offsets implicitly start at zero.
+| Entry  | Encoding | Description |
+| ------ | ------| ----------- |
+| `OutputAddr` | Continuous, Delta, ULEB128 | Function offset in output binary |
+| `InputAddr` | Delta, SLEB128 | Function offset in input binary, with the `BRANCHENTRY` flag in the LSB |
+
+The `BRANCHENTRY` bit denotes whether a given offset pair is a control flow source
+(branch or call instruction). If not set, it signifies a control flow target
+(basic block offset).
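A minimal sketch of how an `InputAddr` value could be packed and delta-encoded, assuming the layout in the table above (offset shifted left by one, flag in the least significant bit). The names `packInput`, `inputOffsetOf`, `isBranchSource`, `putSLEB128`, and `encodeInputs` are hypothetical; the real code relies on LLVM's `encodeSLEB128`.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t BranchEntryBit = 0x1; // mirrors BRANCHENTRY in this commit

// Pack an input offset and the branch flag: offset in the upper bits, flag in
// the least significant bit.
inline uint32_t packInput(uint32_t InputOffset, bool IsBranch) {
  return (InputOffset << 1) | (IsBranch ? BranchEntryBit : 0);
}

inline uint32_t inputOffsetOf(uint32_t Packed) { return Packed >> 1; }
inline bool isBranchSource(uint32_t Packed) { return Packed & BranchEntryBit; }

// Append Value to Out as SLEB128. Relies on arithmetic right shift of
// negative values, which mainstream compilers provide.
inline void putSLEB128(int64_t Value, std::vector<uint8_t> &Out) {
  bool More;
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    // Stop once the remaining bits are pure sign extension of bit 6.
    More = !((Value == 0 && (Byte & 0x40) == 0) ||
             (Value == -1 && (Byte & 0x40) != 0));
    if (More)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (More);
}

// Delta-encode one function's packed input values, starting from zero.
inline std::vector<uint8_t> encodeInputs(const std::vector<uint32_t> &Packed) {
  std::vector<uint8_t> Out;
  int64_t Prev = 0;
  for (uint32_t P : Packed) {
    putSLEB128(static_cast<int64_t>(P) - Prev, Out);
    Prev = P;
  }
  return Out;
}
```

Input offsets need not increase in output order (for example, after basic blocks are reordered), so the deltas can be negative, which is presumably why the table specifies a signed encoding. Keeping the flag in the low bit also keeps packed values small, which suits a variable-length encoding better than a flag in bit 31.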

bolt/include/bolt/Profile/BoltAddressTranslation.h

Lines changed: 14 additions & 1 deletion
@@ -11,6 +11,7 @@
 
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Support/DataExtractor.h"
 #include <cstdint>
 #include <map>
 #include <optional>
@@ -78,10 +79,21 @@ class BoltAddressTranslation {
 
   BoltAddressTranslation() {}
 
+  /// Write the serialized address translation table for a function.
+  template <bool Cold>
+  void writeMaps(std::map<uint64_t, MapTy> &Maps, uint64_t &PrevAddress,
+                 raw_ostream &OS);
+
   /// Write the serialized address translation tables for each reordered
   /// function
   void write(const BinaryContext &BC, raw_ostream &OS);
 
+  /// Read the serialized address translation table for a function.
+  /// Return a parse error if failed.
+  template <bool Cold>
+  void parseMaps(std::vector<uint64_t> &HotFuncs, uint64_t &PrevAddress,
+                 DataExtractor &DE, uint64_t &Offset, Error &Err);
+
   /// Read the serialized address translation tables and load them internally
   /// in memory. Return a parse error if failed.
   std::error_code parse(StringRef Buf);
@@ -119,13 +131,14 @@ class BoltAddressTranslation {
       uint64_t FuncAddress);
 
   std::map<uint64_t, MapTy> Maps;
+  std::map<uint64_t, MapTy> ColdMaps;
 
   /// Links outlined cold blocks to their original function
   std::map<uint64_t, uint64_t> ColdPartSource;
 
   /// Identifies the address of a control-flow changing instruction in a
   /// translation map entry
-  const static uint32_t BRANCHENTRY = 0x80000000;
+  const static uint32_t BRANCHENTRY = 0x1;
 };
 } // namespace bolt
 
bolt/lib/Profile/BoltAddressTranslation.cpp

Lines changed: 78 additions & 64 deletions
@@ -8,8 +8,9 @@
 
 #include "bolt/Profile/BoltAddressTranslation.h"
 #include "bolt/Core/BinaryFunction.h"
-#include "llvm/Support/DataExtractor.h"
 #include "llvm/Support/Errc.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/LEB128.h"
 
 #define DEBUG_TYPE "bolt-bat"
 
@@ -44,7 +45,7 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
   // and this deleted block will both share the same output address (the same
   // key), and we need to map back. We choose here to privilege the successor by
   // allowing it to overwrite the previously inserted key in the map.
-  Map[BBOutputOffset] = BBInputOffset;
+  Map[BBOutputOffset] = BBInputOffset << 1;
 
   const auto &IOAddressMap =
       BB.getFunction()->getBinaryContext().getIOAddressMap();
@@ -61,8 +62,8 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
 
     LLVM_DEBUG(dbgs() << " Key: " << Twine::utohexstr(OutputOffset) << " Val: "
                       << Twine::utohexstr(InputOffset) << " (branch)\n");
-    Map.insert(
-        std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset | BRANCHENTRY));
+    Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset,
+                                             (InputOffset << 1) | BRANCHENTRY));
   }
 }
 

@@ -96,41 +97,53 @@ void BoltAddressTranslation::write(const BinaryContext &BC, raw_ostream &OS) {
       for (const BinaryBasicBlock *const BB : FF)
         writeEntriesForBB(Map, *BB, FF.getAddress());
 
-      Maps.emplace(FF.getAddress(), std::move(Map));
+      ColdMaps.emplace(FF.getAddress(), std::move(Map));
       ColdPartSource.emplace(FF.getAddress(), Function.getOutputAddress());
     }
   }
 
+  // Output addresses are delta-encoded
+  uint64_t PrevAddress = 0;
+  writeMaps</*Cold=*/false>(Maps, PrevAddress, OS);
+  writeMaps</*Cold=*/true>(ColdMaps, PrevAddress, OS);
+
+  outs() << "BOLT-INFO: Wrote " << Maps.size() + ColdMaps.size()
+         << " BAT maps\n";
+}
+
+template <bool Cold>
+void BoltAddressTranslation::writeMaps(std::map<uint64_t, MapTy> &Maps,
+                                       uint64_t &PrevAddress, raw_ostream &OS) {
   const uint32_t NumFuncs = Maps.size();
-  OS.write(reinterpret_cast<const char *>(&NumFuncs), 4);
-  LLVM_DEBUG(dbgs() << "Writing " << NumFuncs << " functions for BAT.\n");
+  encodeULEB128(NumFuncs, OS);
+  LLVM_DEBUG(dbgs() << "Writing " << NumFuncs << (Cold ? " cold" : "")
+                    << " functions for BAT.\n");
+  size_t PrevIndex = 0;
   for (auto &MapEntry : Maps) {
     const uint64_t Address = MapEntry.first;
     MapTy &Map = MapEntry.second;
     const uint32_t NumEntries = Map.size();
     LLVM_DEBUG(dbgs() << "Writing " << NumEntries << " entries for 0x"
                       << Twine::utohexstr(Address) << ".\n");
-    OS.write(reinterpret_cast<const char *>(&Address), 8);
-    OS.write(reinterpret_cast<const char *>(&NumEntries), 4);
+    encodeULEB128(Address - PrevAddress, OS);
+    PrevAddress = Address;
+    if (Cold) {
+      size_t HotIndex =
+          std::distance(ColdPartSource.begin(), ColdPartSource.find(Address));
+      encodeULEB128(HotIndex - PrevIndex, OS);
+      PrevIndex = HotIndex;
+    }
+    encodeULEB128(NumEntries, OS);
+    uint64_t InOffset = 0;
+    // Output and Input addresses are delta-encoded
     for (std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
-      OS.write(reinterpret_cast<const char *>(&KeyVal.first), 4);
-      OS.write(reinterpret_cast<const char *>(&KeyVal.second), 4);
+      const uint64_t OutputAddress = KeyVal.first + Address;
+      encodeULEB128(OutputAddress - PrevAddress, OS);
+      PrevAddress = OutputAddress;
+      encodeSLEB128(KeyVal.second - InOffset, OS);
+      InOffset = KeyVal.second;
     }
   }
-  const uint32_t NumColdEntries = ColdPartSource.size();
-  LLVM_DEBUG(dbgs() << "Writing " << NumColdEntries
-                    << " cold part mappings.\n");
-  OS.write(reinterpret_cast<const char *>(&NumColdEntries), 4);
-  for (std::pair<const uint64_t, uint64_t> &ColdEntry : ColdPartSource) {
-    OS.write(reinterpret_cast<const char *>(&ColdEntry.first), 8);
-    OS.write(reinterpret_cast<const char *>(&ColdEntry.second), 8);
-    LLVM_DEBUG(dbgs() << " " << Twine::utohexstr(ColdEntry.first) << " -> "
-                      << Twine::utohexstr(ColdEntry.second) << "\n");
-  }
-
-  outs() << "BOLT-INFO: Wrote " << Maps.size() << " BAT maps\n";
-  outs() << "BOLT-INFO: Wrote " << NumColdEntries
-         << " BAT cold-to-hot entries\n";
 }
 
 std::error_code BoltAddressTranslation::parse(StringRef Buf) {
@@ -152,53 +165,54 @@ std::error_code BoltAddressTranslation::parse(StringRef Buf) {
   if (Name.substr(0, 4) != "BOLT")
     return make_error_code(llvm::errc::io_error);
 
-  if (Buf.size() - Offset < 4)
-    return make_error_code(llvm::errc::io_error);
+  Error Err(Error::success());
+  std::vector<uint64_t> HotFuncs;
+  uint64_t PrevAddress = 0;
+  parseMaps</*Cold=*/false>(HotFuncs, PrevAddress, DE, Offset, Err);
+  parseMaps</*Cold=*/true>(HotFuncs, PrevAddress, DE, Offset, Err);
+  outs() << "BOLT-INFO: Parsed " << Maps.size() << " BAT entries\n";
+  return errorToErrorCode(std::move(Err));
+}
 
-  const uint32_t NumFunctions = DE.getU32(&Offset);
-  LLVM_DEBUG(dbgs() << "Parsing " << NumFunctions << " functions\n");
+template <bool Cold>
+void BoltAddressTranslation::parseMaps(std::vector<uint64_t> &HotFuncs,
+                                       uint64_t &PrevAddress, DataExtractor &DE,
+                                       uint64_t &Offset, Error &Err) {
+  const uint32_t NumFunctions = DE.getULEB128(&Offset, &Err);
+  LLVM_DEBUG(dbgs() << "Parsing " << NumFunctions << (Cold ? " cold" : "")
+                    << " functions\n");
+  size_t HotIndex = 0;
   for (uint32_t I = 0; I < NumFunctions; ++I) {
-    if (Buf.size() - Offset < 12)
-      return make_error_code(llvm::errc::io_error);
-
-    const uint64_t Address = DE.getU64(&Offset);
-    const uint32_t NumEntries = DE.getU32(&Offset);
+    const uint64_t Address = PrevAddress + DE.getULEB128(&Offset, &Err);
+    PrevAddress = Address;
+    if (Cold) {
+      HotIndex += DE.getULEB128(&Offset, &Err);
+      ColdPartSource.emplace(Address, HotFuncs[HotIndex]);
+    } else {
+      HotFuncs.push_back(Address);
+    }
+    const uint32_t NumEntries = DE.getULEB128(&Offset, &Err);
     MapTy Map;
 
     LLVM_DEBUG(dbgs() << "Parsing " << NumEntries << " entries for 0x"
                       << Twine::utohexstr(Address) << "\n");
-    if (Buf.size() - Offset < 8 * NumEntries)
-      return make_error_code(llvm::errc::io_error);
+    uint64_t InputOffset = 0;
     for (uint32_t J = 0; J < NumEntries; ++J) {
-      const uint32_t OutputAddr = DE.getU32(&Offset);
-      const uint32_t InputAddr = DE.getU32(&Offset);
-      Map.insert(std::pair<uint32_t, uint32_t>(OutputAddr, InputAddr));
-      LLVM_DEBUG(dbgs() << Twine::utohexstr(OutputAddr) << " -> "
-                        << Twine::utohexstr(InputAddr) << "\n");
+      const uint64_t OutputDelta = DE.getULEB128(&Offset, &Err);
+      const uint64_t OutputAddress = PrevAddress + OutputDelta;
+      const uint64_t OutputOffset = OutputAddress - Address;
+      PrevAddress = OutputAddress;
+      const int64_t InputDelta = DE.getSLEB128(&Offset, &Err);
+      InputOffset += InputDelta;
+      Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset));
+      LLVM_DEBUG(
+          dbgs() << formatv("{0:x} -> {1:x} ({2}/{3}b -> {4}/{5}b), {6:x}\n",
+                            OutputOffset, InputOffset, OutputDelta,
+                            encodeULEB128(OutputDelta, nulls()), InputDelta,
+                            encodeSLEB128(InputDelta, nulls()), OutputAddress));
     }
     Maps.insert(std::pair<uint64_t, MapTy>(Address, Map));
   }
-
-  if (Buf.size() - Offset < 4)
-    return make_error_code(llvm::errc::io_error);
-
-  const uint32_t NumColdEntries = DE.getU32(&Offset);
-  LLVM_DEBUG(dbgs() << "Parsing " << NumColdEntries << " cold part mappings\n");
-  for (uint32_t I = 0; I < NumColdEntries; ++I) {
-    if (Buf.size() - Offset < 16)
-      return make_error_code(llvm::errc::io_error);
-    const uint32_t ColdAddress = DE.getU64(&Offset);
-    const uint32_t HotAddress = DE.getU64(&Offset);
-    ColdPartSource.insert(
-        std::pair<uint64_t, uint64_t>(ColdAddress, HotAddress));
-    LLVM_DEBUG(dbgs() << Twine::utohexstr(ColdAddress) << " -> "
-                      << Twine::utohexstr(HotAddress) << "\n");
-  }
-  outs() << "BOLT-INFO: Parsed " << Maps.size() << " BAT entries\n";
-  outs() << "BOLT-INFO: Parsed " << NumColdEntries
-         << " BAT cold-to-hot entries\n";
-
-  return std::error_code();
 }
 
 void BoltAddressTranslation::dump(raw_ostream &OS) {
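To see how the hot and cold tables link up during parsing, here is a standalone sketch that mirrors the control flow of `parseMaps` without LLVM. It is an approximation under stated assumptions: `getULEB128`, `skipLEB128`, and `parseTableSketch` are hypothetical helpers, bounds and overflow checks are omitted, and translation entries are skipped rather than stored.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Read one ULEB128 value from Buf at Pos and advance Pos.
inline uint64_t getULEB128(const std::vector<uint8_t> &Buf, size_t &Pos) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  uint8_t Byte;
  do {
    Byte = Buf[Pos++];
    Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  return Value;
}

// Skip one LEB128 value (signed or unsigned) without decoding it.
inline void skipLEB128(const std::vector<uint8_t> &Buf, size_t &Pos) {
  while (Buf[Pos++] & 0x80) {
  }
}

// Parse one functions table. The hot pass records function addresses; the
// cold pass resolves each delta-encoded HotIndex against them.
template <bool Cold>
void parseTableSketch(const std::vector<uint8_t> &Buf, size_t &Pos,
                      uint64_t &PrevAddress, std::vector<uint64_t> &HotFuncs,
                      std::map<uint64_t, uint64_t> &ColdToHot) {
  const uint64_t NumFuncs = getULEB128(Buf, Pos);
  size_t HotIndex = 0; // hot indices implicitly start at zero
  for (uint64_t I = 0; I < NumFuncs; ++I) {
    const uint64_t Address = PrevAddress + getULEB128(Buf, Pos);
    PrevAddress = Address;
    if (Cold) {
      HotIndex += getULEB128(Buf, Pos);
      ColdToHot.emplace(Address, HotFuncs.at(HotIndex)); // throws if invalid
    } else {
      HotFuncs.push_back(Address);
    }
    const uint64_t NumEntries = getULEB128(Buf, Pos);
    for (uint64_t J = 0; J < NumEntries; ++J) {
      PrevAddress += getULEB128(Buf, Pos); // output address delta
      skipLEB128(Buf, Pos);                // input offset delta (SLEB128)
    }
  }
}
```

Calling `parseTableSketch<false>` and then `parseTableSketch<true>` on the same buffer, with a shared `Pos` and `PrevAddress`, follows the same order as the two `parseMaps` calls in `parse`. The sketch uses `HotFuncs.at()` so that an out-of-range index throws instead of reading past the vector; that bounds check is a choice made for this example, not something the commit does.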
@@ -209,7 +223,7 @@ void BoltAddressTranslation::dump(raw_ostream &OS) {
     OS << "BB mappings:\n";
     for (const auto &Entry : MapEntry.second) {
       const bool IsBranch = Entry.second & BRANCHENTRY;
-      const uint32_t Val = Entry.second & ~BRANCHENTRY;
+      const uint32_t Val = Entry.second >> 1; // dropping BRANCHENTRY bit
       OS << "0x" << Twine::utohexstr(Entry.first) << " -> "
          << "0x" << Twine::utohexstr(Val);
       if (IsBranch)
@@ -244,7 +258,7 @@ uint64_t BoltAddressTranslation::translate(uint64_t FuncAddress,
 
   --KeyVal;
 
-  const uint32_t Val = KeyVal->second & ~BRANCHENTRY;
+  const uint32_t Val = KeyVal->second >> 1; // dropping BRANCHENTRY bit
   // Branch source addresses are translated to the first instruction of the
   // source BB to avoid accounting for modifications BOLT may have made in the
   // BB regarding deletion/addition of instructions.
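The hunk above shows only the middle of `translate`. As a rough illustration of how a lookup over such a map might work with the new bit layout, here is a sketch that assumes the surrounding code finds the entry with `upper_bound` and steps back by one, which the diff does not show. `MapSketch` and `translateSketch` are hypothetical names.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for one function's translation map:
// output offset -> (input offset << 1 | branch flag).
using MapSketch = std::map<uint32_t, uint32_t>;

inline uint64_t translateSketch(const MapSketch &Map, uint64_t Offset,
                                bool IsBranchSrc) {
  // Find the last entry whose output offset is <= Offset.
  auto KeyVal = Map.upper_bound(static_cast<uint32_t>(Offset));
  if (KeyVal == Map.begin())
    return Offset; // no entry at or below Offset; leave it unchanged
  --KeyVal;
  const uint32_t Val = KeyVal->second >> 1; // drop the flag bit
  if (IsBranchSrc)
    return Val; // map to the recorded input offset of the matched entry
  // Otherwise keep the distance from the matched entry's output offset.
  return Offset - KeyVal->first + Val;
}
```

With the flag in the low bit, recovering the offset is a single right shift, whereas the previous layout masked off bit 31.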

bolt/lib/Rewrite/RewriteInstance.cpp

Lines changed: 1 addition & 0 deletions
@@ -4112,6 +4112,7 @@ void RewriteInstance::encodeBATSection() {
       copyByteArray(BoltInfo), BoltInfo.size(),
       /*Alignment=*/1,
       /*IsReadOnly=*/true, ELF::SHT_NOTE);
+  outs() << "BOLT-INFO: BAT section size (bytes): " << BoltInfo.size() << '\n';
 }
 
 template <typename ELFShdrTy>

bolt/test/X86/bolt-address-translation.test

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@
 #
 # CHECK: BOLT: 3 out of 7 functions were overwritten.
 # CHECK: BOLT-INFO: Wrote 6 BAT maps
-# CHECK: BOLT-INFO: Wrote 3 BAT cold-to-hot entries
+# CHECK: BOLT-INFO: BAT section size (bytes): 404
 #
 # usqrt mappings (hot part). We match against any key (left side containing
 # the bolted binary offsets) because BOLT may change where it puts instructions
