
Commit 5883f6e

Merge branch 'main' into avl-llvm/dwarfstreamer-dependence
2 parents 30a603f + 9d8e538 commit 5883f6e

430 files changed: +8994 -5458 lines changed


bolt/docs/BAT.md

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
# BOLT Address Translation (BAT)
# Purpose
A regular profile collection for BOLT involves collecting samples from an
unoptimized binary. BOLT Address Translation allows collecting a profile
from a BOLT-optimized binary and using it to optimize the input (pre-BOLT)
binary.

# Overview
BOLT Address Translation is an extra section (`.note.bolt_bat`) that BOLT inserts
into the output binary. It contains translation tables and split-function linkage
information, which enables mapping the profile from the optimized
binary back onto the original binary.

# Usage
The `--enable-bat` flag controls the generation of the BAT section. The sampled
profile must be passed, along with the optimized binary containing the BAT section,
to `perf2bolt`, which reads the BAT section and produces an fdata profile for the
original binary. Note that YAML profile generation is not supported since BAT doesn't
contain the metadata for input functions.

# Internals
## Section contents
The section is organized as follows:
- Functions table
- Address translation tables
- Fragment linkage table

## Construction and parsing
The BAT section is created from the `BoltAddressTranslation` class, which captures
address translation information provided by the BOLT linker. It is then encoded as a
note section in the output binary.

During profile conversion, when a BAT-enabled binary is passed to `perf2bolt`, the
`BoltAddressTranslation` class is populated from the BAT section. The class is then
queried by `DataAggregator` during sample processing to reconstruct addresses and
offsets in the input binary.

## Encoding format
The encoding is specified in
[BoltAddressTranslation.h](/bolt/include/bolt/Profile/BoltAddressTranslation.h)
and [BoltAddressTranslation.cpp](/bolt/lib/Profile/BoltAddressTranslation.cpp).

### Layout
The general layout is as follows:
```
Functions table header
|------------------|
|  Function entry  |
| |--------------| |
| | OutOff InOff | |
| |--------------| |
~~~~~~~~~~~~~~~~~~~~

Fragment linkage header
|------------------|
| ColdAddr HotAddr |
~~~~~~~~~~~~~~~~~~~~
```

### Functions table
Header:
| Entry  | Encoding | Description |
| ------ | ----- | ----------- |
| `NumFuncs` | ULEB128 | Number of functions in the functions table |

The header is followed by the Functions table with `NumFuncs` entries.
Output binary addresses are delta encoded, meaning that only the difference from
the previous output address is stored. Addresses implicitly start at zero.
| Entry  | Encoding | Description |
| ------ | ------| ----------- |
| `Address` | Delta, ULEB128 | Function address in the output binary |
| `NumEntries` | ULEB128 | Number of address translation entries for a function |

Each function header is followed by `NumEntries` pairs of offsets for the current
function.
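
The function headers described above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions: `appendULEB128`, `readULEB128`, `FuncHeader`, `encodeFuncHeaders` and `decodeFuncHeaders` are hypothetical names that operate on a plain byte vector, whereas the code in this commit uses LLVM's `encodeULEB128` and `DataExtractor::getULEB128`. The offset pairs that follow each header in the real section are left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: append Value as ULEB128, i.e. 7 payload bits per byte
// with the high bit set on every byte except the last.
void appendULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = static_cast<uint8_t>(Value & 0x7f);
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
}

// Hypothetical helper: read one ULEB128 value at Pos and advance Pos past it.
uint64_t readULEB128(const std::vector<uint8_t> &In, size_t &Pos) {
  uint64_t Value = 0;
  for (unsigned Shift = 0;; Shift += 7) {
    assert(Pos < In.size() && Shift < 64 && "malformed ULEB128");
    const uint8_t Byte = In[Pos++];
    Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return Value;
  }
}

// One function header: (Address in the output binary, NumEntries).
using FuncHeader = std::pair<uint64_t, uint32_t>;

// Encodes NumFuncs followed by delta-encoded headers. Funcs must be sorted by
// ascending address so that every delta is non-negative.
std::vector<uint8_t> encodeFuncHeaders(const std::vector<FuncHeader> &Funcs) {
  std::vector<uint8_t> Out;
  appendULEB128(Funcs.size(), Out);
  uint64_t PrevAddress = 0; // addresses implicitly start at zero
  for (const FuncHeader &F : Funcs) {
    appendULEB128(F.first - PrevAddress, Out);
    PrevAddress = F.first;
    appendULEB128(F.second, Out);
  }
  return Out;
}

// Reverses encodeFuncHeaders by accumulating the address deltas.
std::vector<FuncHeader> decodeFuncHeaders(const std::vector<uint8_t> &In) {
  size_t Pos = 0;
  const uint64_t NumFuncs = readULEB128(In, Pos);
  std::vector<FuncHeader> Funcs;
  uint64_t PrevAddress = 0;
  for (uint64_t I = 0; I < NumFuncs; ++I) {
    PrevAddress += readULEB128(In, Pos);
    const uint32_t NumEntries = static_cast<uint32_t>(readULEB128(In, Pos));
    Funcs.emplace_back(PrevAddress, NumEntries);
  }
  return Funcs;
}
```

Because neighbouring functions tend to sit close together in the output binary, most address deltas fit in one or two bytes, where a fixed-width field would spend eight bytes per address.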

### Address translation table
Delta encoding means that only the difference from the previous corresponding
entry is encoded. Offsets implicitly start at zero.
| Entry  | Encoding | Description |
| ------ | ------| ----------- |
| `OutputOffset` | Delta, ULEB128 | Function offset in output binary |
| `InputOffset` | Delta, SLEB128 | Function offset in input binary with the `BRANCHENTRY` flag in the LSB |

The `BRANCHENTRY` bit denotes whether a given offset pair is a control flow source
(branch or call instruction). If not set, it signifies a control flow target
(basic block offset).

### Fragment linkage table
Following the Functions table, the fragment linkage table is encoded to link split
cold fragments with the main (hot) fragment.
Header:
| Entry  | Encoding | Description |
| ------ | ------------ | ----------- |
| `NumColdEntries` | ULEB128 | Number of split functions in the functions table |

`NumColdEntries` pairs of addresses follow:
| Entry  | Encoding | Description |
| ------ | ------| ----------- |
| `ColdAddress` | ULEB128 | Cold fragment address in output binary |
| `HotAddress` | ULEB128 | Hot fragment address in output binary |
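
The fragment linkage table can be sketched the same way. The example assumes the layout above: a ULEB128 count followed by that many `ColdAddress`/`HotAddress` pairs, each stored in full rather than as a delta. `ColdToHotMap`, `encodeLinkage`, `decodeLinkage` and `hotAddressFor` are hypothetical names for illustration only, and the sketch keeps both addresses as 64-bit values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical ULEB128 helpers over a byte vector.
void appendULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = static_cast<uint8_t>(Value & 0x7f);
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
}

uint64_t readULEB128(const std::vector<uint8_t> &In, size_t &Pos) {
  uint64_t Value = 0;
  for (unsigned Shift = 0;; Shift += 7) {
    assert(Pos < In.size() && Shift < 64 && "malformed ULEB128");
    const uint8_t Byte = In[Pos++];
    Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return Value;
  }
}

// Cold fragment address -> hot (main) fragment address, both in the output
// binary.
using ColdToHotMap = std::map<uint64_t, uint64_t>;

std::vector<uint8_t> encodeLinkage(const ColdToHotMap &ColdToHot) {
  std::vector<uint8_t> Out;
  appendULEB128(ColdToHot.size(), Out); // NumColdEntries
  for (const auto &KV : ColdToHot) {
    appendULEB128(KV.first, Out);  // ColdAddress, stored in full
    appendULEB128(KV.second, Out); // HotAddress, stored in full
  }
  return Out;
}

ColdToHotMap decodeLinkage(const std::vector<uint8_t> &In, size_t Pos = 0) {
  ColdToHotMap ColdToHot;
  const uint64_t NumColdEntries = readULEB128(In, Pos);
  for (uint64_t I = 0; I < NumColdEntries; ++I) {
    const uint64_t Cold = readULEB128(In, Pos);
    const uint64_t Hot = readULEB128(In, Pos);
    ColdToHot.emplace(Cold, Hot);
  }
  return ColdToHot;
}

// Returns the hot fragment address when Address names a cold fragment, and
// Address itself otherwise.
uint64_t hotAddressFor(const ColdToHotMap &ColdToHot, uint64_t Address) {
  const auto It = ColdToHot.find(Address);
  return It == ColdToHot.end() ? Address : It->second;
}
```

A lookup such as `hotAddressFor` is one way a profile reader could attribute a sample that lands in a cold fragment to its parent function.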

bolt/include/bolt/Profile/BoltAddressTranslation.h

Lines changed: 1 addition & 1 deletion
@@ -125,7 +125,7 @@ class BoltAddressTranslation {

   /// Identifies the address of a control-flow changing instructions in a
   /// translation map entry
-  const static uint32_t BRANCHENTRY = 0x80000000;
+  const static uint32_t BRANCHENTRY = 0x1;
 };
 } // namespace bolt

bolt/lib/Profile/BoltAddressTranslation.cpp

Lines changed: 39 additions & 38 deletions
@@ -10,6 +10,8 @@
 #include "bolt/Core/BinaryFunction.h"
 #include "llvm/Support/DataExtractor.h"
 #include "llvm/Support/Errc.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/LEB128.h"

 #define DEBUG_TYPE "bolt-bat"

@@ -44,7 +46,7 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
   // and this deleted block will both share the same output address (the same
   // key), and we need to map back. We choose here to privilege the successor by
   // allowing it to overwrite the previously inserted key in the map.
-  Map[BBOutputOffset] = BBInputOffset;
+  Map[BBOutputOffset] = BBInputOffset << 1;

   const auto &IOAddressMap =
       BB.getFunction()->getBinaryContext().getIOAddressMap();
@@ -61,8 +63,8 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,

     LLVM_DEBUG(dbgs() << " Key: " << Twine::utohexstr(OutputOffset) << " Val: "
                       << Twine::utohexstr(InputOffset) << " (branch)\n");
-    Map.insert(
-        std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset | BRANCHENTRY));
+    Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset,
+                                             (InputOffset << 1) | BRANCHENTRY));
   }
 }

@@ -102,28 +104,33 @@ void BoltAddressTranslation::write(const BinaryContext &BC, raw_ostream &OS) {
   }

   const uint32_t NumFuncs = Maps.size();
-  OS.write(reinterpret_cast<const char *>(&NumFuncs), 4);
+  encodeULEB128(NumFuncs, OS);
   LLVM_DEBUG(dbgs() << "Writing " << NumFuncs << " functions for BAT.\n");
+  uint64_t PrevAddress = 0;
   for (auto &MapEntry : Maps) {
     const uint64_t Address = MapEntry.first;
     MapTy &Map = MapEntry.second;
     const uint32_t NumEntries = Map.size();
     LLVM_DEBUG(dbgs() << "Writing " << NumEntries << " entries for 0x"
                       << Twine::utohexstr(Address) << ".\n");
-    OS.write(reinterpret_cast<const char *>(&Address), 8);
-    OS.write(reinterpret_cast<const char *>(&NumEntries), 4);
+    encodeULEB128(Address - PrevAddress, OS);
+    PrevAddress = Address;
+    encodeULEB128(NumEntries, OS);
+    uint64_t InOffset = 0, OutOffset = 0;
+    // Output and Input addresses and delta-encoded
     for (std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
-      OS.write(reinterpret_cast<const char *>(&KeyVal.first), 4);
-      OS.write(reinterpret_cast<const char *>(&KeyVal.second), 4);
+      encodeULEB128(KeyVal.first - OutOffset, OS);
+      encodeSLEB128(KeyVal.second - InOffset, OS);
+      std::tie(OutOffset, InOffset) = KeyVal;
     }
   }
   const uint32_t NumColdEntries = ColdPartSource.size();
   LLVM_DEBUG(dbgs() << "Writing " << NumColdEntries
                     << " cold part mappings.\n");
-  OS.write(reinterpret_cast<const char *>(&NumColdEntries), 4);
+  encodeULEB128(NumColdEntries, OS);
   for (std::pair<const uint64_t, uint64_t> &ColdEntry : ColdPartSource) {
-    OS.write(reinterpret_cast<const char *>(&ColdEntry.first), 8);
-    OS.write(reinterpret_cast<const char *>(&ColdEntry.second), 8);
+    encodeULEB128(ColdEntry.first, OS);
+    encodeULEB128(ColdEntry.second, OS);
     LLVM_DEBUG(dbgs() << " " << Twine::utohexstr(ColdEntry.first) << " -> "
                       << Twine::utohexstr(ColdEntry.second) << "\n");
   }
@@ -152,43 +159,37 @@ std::error_code BoltAddressTranslation::parse(StringRef Buf) {
   if (Name.substr(0, 4) != "BOLT")
     return make_error_code(llvm::errc::io_error);

-  if (Buf.size() - Offset < 4)
-    return make_error_code(llvm::errc::io_error);
-
-  const uint32_t NumFunctions = DE.getU32(&Offset);
+  Error Err(Error::success());
+  const uint32_t NumFunctions = DE.getULEB128(&Offset, &Err);
   LLVM_DEBUG(dbgs() << "Parsing " << NumFunctions << " functions\n");
+  uint64_t PrevAddress = 0;
   for (uint32_t I = 0; I < NumFunctions; ++I) {
-    if (Buf.size() - Offset < 12)
-      return make_error_code(llvm::errc::io_error);
-
-    const uint64_t Address = DE.getU64(&Offset);
-    const uint32_t NumEntries = DE.getU32(&Offset);
+    const uint64_t Address = PrevAddress + DE.getULEB128(&Offset, &Err);
+    PrevAddress = Address;
+    const uint32_t NumEntries = DE.getULEB128(&Offset, &Err);
     MapTy Map;

     LLVM_DEBUG(dbgs() << "Parsing " << NumEntries << " entries for 0x"
                       << Twine::utohexstr(Address) << "\n");
-    if (Buf.size() - Offset < 8 * NumEntries)
-      return make_error_code(llvm::errc::io_error);
+    uint64_t InputOffset = 0, OutputOffset = 0;
     for (uint32_t J = 0; J < NumEntries; ++J) {
-      const uint32_t OutputAddr = DE.getU32(&Offset);
-      const uint32_t InputAddr = DE.getU32(&Offset);
-      Map.insert(std::pair<uint32_t, uint32_t>(OutputAddr, InputAddr));
-      LLVM_DEBUG(dbgs() << Twine::utohexstr(OutputAddr) << " -> "
-                        << Twine::utohexstr(InputAddr) << "\n");
+      const uint64_t OutputDelta = DE.getULEB128(&Offset, &Err);
+      const int64_t InputDelta = DE.getSLEB128(&Offset, &Err);
+      OutputOffset += OutputDelta;
+      InputOffset += InputDelta;
+      Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset));
+      LLVM_DEBUG(dbgs() << Twine::utohexstr(OutputOffset) << " -> "
+                        << Twine::utohexstr(InputOffset) << " (" << OutputDelta
+                        << ", " << InputDelta << ")\n");
     }
     Maps.insert(std::pair<uint64_t, MapTy>(Address, Map));
   }

-  if (Buf.size() - Offset < 4)
-    return make_error_code(llvm::errc::io_error);
-
-  const uint32_t NumColdEntries = DE.getU32(&Offset);
+  const uint32_t NumColdEntries = DE.getULEB128(&Offset, &Err);
   LLVM_DEBUG(dbgs() << "Parsing " << NumColdEntries << " cold part mappings\n");
   for (uint32_t I = 0; I < NumColdEntries; ++I) {
-    if (Buf.size() - Offset < 16)
-      return make_error_code(llvm::errc::io_error);
-    const uint32_t ColdAddress = DE.getU64(&Offset);
-    const uint32_t HotAddress = DE.getU64(&Offset);
+    const uint32_t ColdAddress = DE.getULEB128(&Offset, &Err);
+    const uint32_t HotAddress = DE.getULEB128(&Offset, &Err);
     ColdPartSource.insert(
         std::pair<uint64_t, uint64_t>(ColdAddress, HotAddress));
     LLVM_DEBUG(dbgs() << Twine::utohexstr(ColdAddress) << " -> "
@@ -198,7 +199,7 @@ std::error_code BoltAddressTranslation::parse(StringRef Buf) {
   outs() << "BOLT-INFO: Parsed " << NumColdEntries
          << " BAT cold-to-hot entries\n";

-  return std::error_code();
+  return errorToErrorCode(std::move(Err));
 }

 void BoltAddressTranslation::dump(raw_ostream &OS) {
@@ -209,7 +210,7 @@ void BoltAddressTranslation::dump(raw_ostream &OS) {
     OS << "BB mappings:\n";
     for (const auto &Entry : MapEntry.second) {
       const bool IsBranch = Entry.second & BRANCHENTRY;
-      const uint32_t Val = Entry.second & ~BRANCHENTRY;
+      const uint32_t Val = Entry.second >> 1; // dropping BRANCHENTRY bit
       OS << "0x" << Twine::utohexstr(Entry.first) << " -> "
          << "0x" << Twine::utohexstr(Val);
       if (IsBranch)
@@ -244,7 +245,7 @@ uint64_t BoltAddressTranslation::translate(uint64_t FuncAddress,

   --KeyVal;

-  const uint32_t Val = KeyVal->second & ~BRANCHENTRY;
+  const uint32_t Val = KeyVal->second >> 1; // dropping BRANCHENTRY bit
   // Branch source addresses are translated to the first instruction of the
   // source BB to avoid accounting for modifications BOLT may have made in the
   // BB regarding deletion/addition of instructions.

bolt/lib/Rewrite/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -5,8 +5,8 @@ set(LLVM_LINK_COMPONENTS
   MC
   Object
   Support
-  DWARFLinkerBase
   DWARFLinker
+  DWARFLinkerClassic
   AsmPrinter
   TargetParser
   )

bolt/test/X86/bolt-address-translation.test

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@
 # CHECK: BOLT: 3 out of 7 functions were overwritten.
 # CHECK: BOLT-INFO: Wrote 6 BAT maps
 # CHECK: BOLT-INFO: Wrote 3 BAT cold-to-hot entries
-# CHECK: BOLT-INFO: BAT section size (bytes): 1436
+# CHECK: BOLT-INFO: BAT section size (bytes): 428
 #
 # usqrt mappings (hot part). We match against any key (left side containing
 # the bolted binary offsets) because BOLT may change where it puts instructions

clang-tools-extra/clangd/CompileCommands.cpp

Lines changed: 13 additions & 10 deletions
@@ -313,26 +313,29 @@ void CommandMangler::operator()(tooling::CompileCommand &Command,

   tooling::addTargetAndModeForProgramName(Cmd, Cmd.front());

-  // Check whether the flag exists, either as -flag or -flag=*
-  auto Has = [&](llvm::StringRef Flag) {
-    for (llvm::StringRef Arg : Cmd) {
-      if (Arg.consume_front(Flag) && (Arg.empty() || Arg[0] == '='))
-        return true;
-    }
-    return false;
+  // Check whether the flag exists in the command.
+  auto HasExact = [&](llvm::StringRef Flag) {
+    return llvm::any_of(Cmd, [&](llvm::StringRef Arg) { return Arg == Flag; });
+  };
+
+  // Check whether the flag appears in the command as a prefix.
+  auto HasPrefix = [&](llvm::StringRef Flag) {
+    return llvm::any_of(
+        Cmd, [&](llvm::StringRef Arg) { return Arg.starts_with(Flag); });
   };

   llvm::erase_if(Cmd, [](llvm::StringRef Elem) {
     return Elem.starts_with("--save-temps") || Elem.starts_with("-save-temps");
   });

   std::vector<std::string> ToAppend;
-  if (ResourceDir && !Has("-resource-dir"))
+  if (ResourceDir && !HasExact("-resource-dir") && !HasPrefix("-resource-dir="))
     ToAppend.push_back(("-resource-dir=" + *ResourceDir));

   // Don't set `-isysroot` if it is already set or if `--sysroot` is set.
   // `--sysroot` is a superset of the `-isysroot` argument.
-  if (Sysroot && !Has("-isysroot") && !Has("--sysroot")) {
+  if (Sysroot && !HasPrefix("-isysroot") && !HasExact("--sysroot") &&
+      !HasPrefix("--sysroot=")) {
     ToAppend.push_back("-isysroot");
     ToAppend.push_back(*Sysroot);
   }
@@ -343,7 +346,7 @@ void CommandMangler::operator()(tooling::CompileCommand &Command,
   }

   if (!Cmd.empty()) {
-    bool FollowSymlink = !Has("-no-canonical-prefixes");
+    bool FollowSymlink = !HasExact("-no-canonical-prefixes");
     Cmd.front() =
         (FollowSymlink ? ResolvedDrivers : ResolvedDriversNoFollow)
             .get(Cmd.front(), [&, this] {
