
[BOLT] Delta-encode offsets in BAT #76900

Merged
merged 12 commits into from
Jan 11, 2024
99 changes: 99 additions & 0 deletions bolt/docs/BAT.md
@@ -0,0 +1,99 @@
# BOLT Address Translation (BAT)
# Purpose
A regular profile collection for BOLT involves collecting samples from an
unoptimized binary. BOLT Address Translation allows collecting a profile
from a BOLT-optimized binary and using it to optimize the input (pre-BOLT)
binary.

# Overview
BOLT Address Translation is an extra section (`.note.bolt_bat`) that BOLT
inserts into the output binary. It contains translation tables and linkage
information for split functions, which enable mapping the profile from the
optimized binary back onto the original binary.

# Usage
The `--enable-bat` flag controls generation of the BAT section. The sampled
profile needs to be passed, along with the optimized binary containing the BAT
section, to `perf2bolt`, which reads the BAT section and produces an fdata
profile for the original binary. Note that YAML profile generation is not
supported, since BAT doesn't contain the metadata for input functions.

# Internals
## Section contents
The section is organized as follows:
- Functions table
- Address translation tables
- Fragment linkage table

## Construction and parsing
The BAT section is created by the `BoltAddressTranslation` class, which
captures address translation information provided by the BOLT linker. It is
then encoded as a note section in the output binary.

During profile conversion, when a BAT-enabled binary is passed to `perf2bolt`,
the `BoltAddressTranslation` class is populated from the BAT section. The class
is then queried by `DataAggregator` during sample processing to reconstruct
addresses and offsets in the input binary.
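As a rough illustration of what such a query involves, the sketch below looks up an output offset in one function's map: it finds the closest entry at or before the queried offset, drops the `BRANCHENTRY` flag bit, and then either returns that entry's input offset as-is (for branch sources) or adds the distance from the entry. This is a standalone approximation, not BOLT's `translate` implementation; `OffsetMap`, `translateOffset`, and `kBranchEntry` are hypothetical names, and the map layout assumes the packing used after this change (input offset shifted left by one, flag in bit 0).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical per-function map: output offset -> (input offset << 1) | flag.
using OffsetMap = std::map<uint32_t, uint32_t>;

// Flag stored in bit 0, matching BRANCHENTRY after this change.
constexpr uint32_t kBranchEntry = 0x1;

// Translates an output-binary offset to an input-binary offset, or returns
// std::nullopt when no entry lies at or before OutOffset.
std::optional<uint32_t> translateOffset(const OffsetMap &Map,
                                        uint32_t OutOffset, bool IsBranchSrc) {
  auto It = Map.upper_bound(OutOffset); // first entry strictly after OutOffset
  if (It == Map.begin())
    return std::nullopt;                // nothing at or before OutOffset
  --It;                                 // closest entry at or before OutOffset
  assert(It->first <= OutOffset);
  const uint32_t InBase = It->second >> 1; // drop the flag bit
  if (IsBranchSrc)
    return InBase;                      // no adjustment within the block
  return InBase + (OutOffset - It->first);
}
```

For example, with entries at output offsets `0x0`, `0x10`, and `0x14`, a non-branch query for `0x12` resolves against the `0x10` entry and yields that entry's input offset plus 2.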

## Encoding format
The encoding is specified in
[BoltAddressTranslation.h](/bolt/include/bolt/Profile/BoltAddressTranslation.h)
and [BoltAddressTranslation.cpp](/bolt/lib/Profile/BoltAddressTranslation.cpp).

### Layout
The general layout is as follows:
```
Functions table header
|------------------|
| Function entry |
| |--------------| |
| | OutOff InOff | |
| |--------------| |
~~~~~~~~~~~~~~~~~~~~

Fragment linkage header
|------------------|
| ColdAddr HotAddr |
~~~~~~~~~~~~~~~~~~~~
```

### Functions table
Header:
| Entry | Encoding | Description |
| ------ | ----- | ----------- |
| `NumFuncs` | ULEB128 | Number of functions in the functions table |

The header is followed by the Functions table with `NumFuncs` entries.
| Entry | Encoding | Description |
| ------ | ------| ----------- |
| `Address` | ULEB128 | Function address in the output binary |
| `NumEntries` | ULEB128 | Number of address translation entries for a function |

Each function header is followed by `NumEntries` pairs of offsets for that
function.
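All fields above are ULEB128-encoded: seven payload bits per byte, least significant group first, with the top bit set on every byte except the last. BOLT itself relies on LLVM's `encodeULEB128` from `llvm/Support/LEB128.h` for this. The standalone sketch below, assuming only the C++17 standard library, shows the encoding and writes the table header plus the per-function headers; `encodeULEB`, `decodeULEB`, `FuncEntryHeader`, and `encodeFunctionHeaders` are hypothetical names, and the offset pairs are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Appends Value in ULEB128 form: 7 payload bits per byte, least significant
// group first, with the high bit set on every byte except the last.
void encodeULEB(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
}

// Reads one ULEB128 value starting at Pos and advances Pos past it.
uint64_t decodeULEB(const std::vector<uint8_t> &In, size_t &Pos) {
  uint64_t Result = 0;
  for (unsigned Shift = 0;; Shift += 7) {
    if (Pos >= In.size())
      throw std::runtime_error("truncated ULEB128 value");
    assert(Shift < 64 && "ULEB128 value does not fit in 64 bits");
    const uint8_t Byte = In[Pos++];
    Result |= uint64_t(Byte & 0x7f) << Shift;
    if (!(Byte & 0x80))
      return Result;
  }
}

// Hypothetical record for one Functions table entry (offset pairs omitted).
struct FuncEntryHeader {
  uint64_t Address;    // function address in the output binary
  uint64_t NumEntries; // number of offset pairs that follow
};

// Emits NumFuncs, then each function's Address and NumEntries.
std::vector<uint8_t>
encodeFunctionHeaders(const std::vector<FuncEntryHeader> &Funcs) {
  std::vector<uint8_t> Out;
  encodeULEB(Funcs.size(), Out);
  for (const FuncEntryHeader &F : Funcs) {
    encodeULEB(F.Address, Out);
    encodeULEB(F.NumEntries, Out);
    // The function's NumEntries offset pairs would be written here.
  }
  return Out;
}
```

With a single function at `0x401000` that has three entries, this header data takes 6 bytes, compared with 16 bytes in the previous fixed-width layout (4-byte `NumFuncs`, 8-byte address, 4-byte entry count).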

### Address translation table
Delta encoding means that only the difference from the corresponding field of
the previous entry is encoded. For the first entry of each function, the
previous offsets are implicitly zero.
| Entry | Encoding | Description |
| ------ | ------| ----------- |
| `OutputAddr` | Delta, ULEB128 | Function offset in output binary |
| `InputAddr` | Delta, SLEB128 | Function offset in input binary with `BRANCHENTRY` LSB bit |
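To make the delta scheme concrete, here is a standalone sketch that writes one function's entries as (ULEB128 output delta, SLEB128 input delta) pairs and reads them back by accumulating the deltas from zero. It assumes only the C++17 standard library; `OffsetMap`, `encodeULEB`, `encodeSLEB`, `decodeLEB`, `encodeEntries`, and `decodeEntries` are hypothetical names, not BOLT functions. The output delta can be unsigned because the map is ordered by output offset, while the input delta must be signed since input offsets need not increase in the same order once BOLT reorders code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical per-function map: output offset -> packed input offset.
using OffsetMap = std::map<uint32_t, uint32_t>;

// ULEB128: 7 bits per byte, high bit set while more bytes follow.
void encodeULEB(uint64_t V, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = V & 0x7f;
    V >>= 7;
    if (V != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (V != 0);
}

// SLEB128: stop once the remaining bits only repeat the sign held in bit 6.
void encodeSLEB(int64_t V, std::vector<uint8_t> &Out) {
  while (true) {
    uint8_t Byte = V & 0x7f;
    V >>= 7; // arithmetic shift (guaranteed since C++20, universal in practice)
    const bool Done = (V == 0 && !(Byte & 0x40)) || (V == -1 && (Byte & 0x40));
    if (!Done)
      Byte |= 0x80;
    Out.push_back(Byte);
    if (Done)
      return;
  }
}

// Reads one LEB128 value at Pos; sign-extends it when Signed is true.
uint64_t decodeLEB(const std::vector<uint8_t> &In, size_t &Pos, bool Signed) {
  uint64_t Result = 0;
  unsigned Shift = 0;
  uint8_t Byte;
  do {
    if (Pos >= In.size())
      throw std::runtime_error("truncated LEB128 value");
    assert(Shift < 64 && "LEB128 value does not fit in 64 bits");
    Byte = In[Pos++];
    Result |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  if (Signed && Shift < 64 && (Byte & 0x40))
    Result |= ~uint64_t(0) << Shift; // sign-extend
  return Result;
}

// Writes each pair as (ULEB output delta, SLEB input delta).
void encodeEntries(const OffsetMap &Map, std::vector<uint8_t> &Out) {
  uint32_t PrevOut = 0, PrevIn = 0; // offsets implicitly start at zero
  for (const auto &[OutOff, InOff] : Map) {
    encodeULEB(OutOff - PrevOut, Out);                 // keys ascend
    encodeSLEB(int64_t(InOff) - int64_t(PrevIn), Out); // may be negative
    PrevOut = OutOff;
    PrevIn = InOff;
  }
}

// Rebuilds NumEntries pairs by accumulating the deltas.
OffsetMap decodeEntries(const std::vector<uint8_t> &In, size_t &Pos,
                        uint64_t NumEntries) {
  OffsetMap Map;
  uint64_t OutOff = 0, InOff = 0;
  for (uint64_t I = 0; I < NumEntries; ++I) {
    OutOff += decodeLEB(In, Pos, /*Signed=*/false);
    InOff += decodeLEB(In, Pos, /*Signed=*/true); // modular add of the delta
    Map.emplace(uint32_t(OutOff), uint32_t(InOff));
  }
  return Map;
}
```

For example, the entries `{0x0: 0x0, 0x8: 0x41, 0x10: 0x8}` encode to 7 bytes (`00 00 08 C1 00 08 47`), compared with 24 bytes under the previous fixed 4 + 4 bytes per entry.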

The `BRANCHENTRY` bit denotes whether a given offset pair is a control flow
source (branch or call instruction). If it is not set, the pair is a control
flow target (basic block offset).
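The flag packing used with this format can be sketched as follows: the input offset is shifted left by one and the flag occupies the freed bit 0, so decoding is a right shift. The helper names (`packInputOffset`, `isBranchSource`, `unpackInputOffset`) are hypothetical, and the assertion documents the assumption that input offsets fit in 31 bits.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the BRANCHENTRY value after this change (bit 0 instead of bit 31).
constexpr uint32_t BranchEntryBit = 0x1;

// Frees bit 0 by shifting the input offset left, then stores the flag there.
uint32_t packInputOffset(uint32_t InputOffset, bool IsBranchSource) {
  assert(InputOffset < (1u << 31) && "offset must fit in 31 bits");
  return (InputOffset << 1) | (IsBranchSource ? BranchEntryBit : 0u);
}

// True when the packed value marks a control flow source.
bool isBranchSource(uint32_t Packed) { return Packed & BranchEntryBit; }

// Drops the flag bit to recover the original input offset.
uint32_t unpackInputOffset(uint32_t Packed) { return Packed >> 1; }
```

A plausible reason for moving the flag from bit 31 (the previous `0x80000000`) to bit 0 is the delta encoding: with a high-bit flag, consecutive entries that differ in branch status would produce input deltas near ±2^31, which take five bytes in SLEB128, while a low-bit flag changes the packed value only slightly.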

### Fragment linkage table
Following the Functions table, the fragment linkage table is encoded to link
split cold fragments with their main (hot) fragments.

Header:
| Entry | Encoding | Description |
| ------ | ------------ | ----------- |
| `NumColdEntries` | ULEB128 | Number of split functions in the functions table |

`NumColdEntries` pairs of addresses follow:
| Entry | Encoding | Description |
| ------ | ------| ----------- |
| `ColdAddress` | ULEB128 | Cold fragment address in output binary |
| `HotAddress` | ULEB128 | Hot fragment address in output binary |
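Unlike the offset pairs, both fragment linkage fields are plain ULEB128 values rather than deltas. A minimal standalone sketch, under the same C++17-only assumptions, that writes and reads the table; `ColdToHotMap`, `encodeULEB`, `decodeULEB`, `encodeFragmentLinkage`, and `decodeFragmentLinkage` are hypothetical names, and addresses are kept as 64-bit values throughout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Assumed shape: cold fragment address -> hot (main) fragment address.
using ColdToHotMap = std::map<uint64_t, uint64_t>;

void encodeULEB(uint64_t V, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = V & 0x7f;
    V >>= 7;
    if (V != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (V != 0);
}

uint64_t decodeULEB(const std::vector<uint8_t> &In, size_t &Pos) {
  uint64_t Result = 0;
  for (unsigned Shift = 0;; Shift += 7) {
    if (Pos >= In.size())
      throw std::runtime_error("truncated ULEB128 value");
    assert(Shift < 64 && "ULEB128 value does not fit in 64 bits");
    const uint8_t Byte = In[Pos++];
    Result |= uint64_t(Byte & 0x7f) << Shift;
    if (!(Byte & 0x80))
      return Result;
  }
}

// NumColdEntries, then one (ColdAddress, HotAddress) pair per split function.
std::vector<uint8_t> encodeFragmentLinkage(const ColdToHotMap &Links) {
  std::vector<uint8_t> Out;
  encodeULEB(Links.size(), Out);
  for (const auto &[Cold, Hot] : Links) {
    encodeULEB(Cold, Out); // absolute address, not a delta
    encodeULEB(Hot, Out);
  }
  return Out;
}

ColdToHotMap decodeFragmentLinkage(const std::vector<uint8_t> &In,
                                   size_t &Pos) {
  ColdToHotMap Links;
  const uint64_t NumColdEntries = decodeULEB(In, Pos);
  for (uint64_t I = 0; I < NumColdEntries; ++I) {
    const uint64_t Cold = decodeULEB(In, Pos);
    const uint64_t Hot = decodeULEB(In, Pos);
    Links.emplace(Cold, Hot);
  }
  return Links;
}
```

For example, two pairs with addresses around `0x40xxxx` take 17 bytes here, versus 36 bytes in the previous fixed-width layout (4-byte count plus 8 + 8 bytes per pair).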
2 changes: 1 addition & 1 deletion bolt/include/bolt/Profile/BoltAddressTranslation.h
@@ -125,7 +125,7 @@ class BoltAddressTranslation {

/// Identifies the address of a control-flow changing instructions in a
/// translation map entry
-const static uint32_t BRANCHENTRY = 0x80000000;
+const static uint32_t BRANCHENTRY = 0x1;
};
} // namespace bolt

73 changes: 35 additions & 38 deletions bolt/lib/Profile/BoltAddressTranslation.cpp
@@ -10,6 +10,8 @@
#include "bolt/Core/BinaryFunction.h"
#include "llvm/Support/DataExtractor.h"
#include "llvm/Support/Errc.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/LEB128.h"

#define DEBUG_TYPE "bolt-bat"

@@ -44,7 +46,7 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
// and this deleted block will both share the same output address (the same
// key), and we need to map back. We choose here to privilege the successor by
// allowing it to overwrite the previously inserted key in the map.
-Map[BBOutputOffset] = BBInputOffset;
+Map[BBOutputOffset] = BBInputOffset << 1;

const auto &IOAddressMap =
BB.getFunction()->getBinaryContext().getIOAddressMap();
@@ -61,8 +63,8 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,

LLVM_DEBUG(dbgs() << " Key: " << Twine::utohexstr(OutputOffset) << " Val: "
<< Twine::utohexstr(InputOffset) << " (branch)\n");
-Map.insert(
-std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset | BRANCHENTRY));
+Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset,
+(InputOffset << 1) | BRANCHENTRY));
}
}

@@ -102,28 +104,31 @@ void BoltAddressTranslation::write(const BinaryContext &BC, raw_ostream &OS) {
}

const uint32_t NumFuncs = Maps.size();
-OS.write(reinterpret_cast<const char *>(&NumFuncs), 4);
+encodeULEB128(NumFuncs, OS);
LLVM_DEBUG(dbgs() << "Writing " << NumFuncs << " functions for BAT.\n");
for (auto &MapEntry : Maps) {
const uint64_t Address = MapEntry.first;
MapTy &Map = MapEntry.second;
const uint32_t NumEntries = Map.size();
LLVM_DEBUG(dbgs() << "Writing " << NumEntries << " entries for 0x"
<< Twine::utohexstr(Address) << ".\n");
-OS.write(reinterpret_cast<const char *>(&Address), 8);
-OS.write(reinterpret_cast<const char *>(&NumEntries), 4);
+encodeULEB128(Address, OS);
+encodeULEB128(NumEntries, OS);
+uint64_t InOffset = 0, OutOffset = 0;
+// Output and Input addresses are delta-encoded
for (std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
-OS.write(reinterpret_cast<const char *>(&KeyVal.first), 4);
-OS.write(reinterpret_cast<const char *>(&KeyVal.second), 4);
+encodeULEB128(KeyVal.first - OutOffset, OS);
+encodeSLEB128(KeyVal.second - InOffset, OS);
+std::tie(OutOffset, InOffset) = KeyVal;
}
}
const uint32_t NumColdEntries = ColdPartSource.size();
LLVM_DEBUG(dbgs() << "Writing " << NumColdEntries
<< " cold part mappings.\n");
-OS.write(reinterpret_cast<const char *>(&NumColdEntries), 4);
+encodeULEB128(NumColdEntries, OS);
for (std::pair<const uint64_t, uint64_t> &ColdEntry : ColdPartSource) {
-OS.write(reinterpret_cast<const char *>(&ColdEntry.first), 8);
-OS.write(reinterpret_cast<const char *>(&ColdEntry.second), 8);
+encodeULEB128(ColdEntry.first, OS);
+encodeULEB128(ColdEntry.second, OS);
LLVM_DEBUG(dbgs() << " " << Twine::utohexstr(ColdEntry.first) << " -> "
<< Twine::utohexstr(ColdEntry.second) << "\n");
}
@@ -152,43 +157,35 @@ std::error_code BoltAddressTranslation::parse(StringRef Buf) {
if (Name.substr(0, 4) != "BOLT")
return make_error_code(llvm::errc::io_error);

-if (Buf.size() - Offset < 4)
-return make_error_code(llvm::errc::io_error);

-const uint32_t NumFunctions = DE.getU32(&Offset);
+Error Err(Error::success());
+const uint32_t NumFunctions = DE.getULEB128(&Offset, &Err);
LLVM_DEBUG(dbgs() << "Parsing " << NumFunctions << " functions\n");
for (uint32_t I = 0; I < NumFunctions; ++I) {
-if (Buf.size() - Offset < 12)
-return make_error_code(llvm::errc::io_error);

-const uint64_t Address = DE.getU64(&Offset);
-const uint32_t NumEntries = DE.getU32(&Offset);
+const uint64_t Address = DE.getULEB128(&Offset, &Err);
+const uint32_t NumEntries = DE.getULEB128(&Offset, &Err);
MapTy Map;

LLVM_DEBUG(dbgs() << "Parsing " << NumEntries << " entries for 0x"
<< Twine::utohexstr(Address) << "\n");
-if (Buf.size() - Offset < 8 * NumEntries)
-return make_error_code(llvm::errc::io_error);
+uint64_t InputOffset = 0, OutputOffset = 0;
for (uint32_t J = 0; J < NumEntries; ++J) {
-const uint32_t OutputAddr = DE.getU32(&Offset);
-const uint32_t InputAddr = DE.getU32(&Offset);
-Map.insert(std::pair<uint32_t, uint32_t>(OutputAddr, InputAddr));
-LLVM_DEBUG(dbgs() << Twine::utohexstr(OutputAddr) << " -> "
-<< Twine::utohexstr(InputAddr) << "\n");
+const uint64_t OutputDelta = DE.getULEB128(&Offset, &Err);
+const int64_t InputDelta = DE.getSLEB128(&Offset, &Err);
+OutputOffset += OutputDelta;
+InputOffset += InputDelta;
+Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset));
+LLVM_DEBUG(dbgs() << Twine::utohexstr(OutputOffset) << " -> "
+<< Twine::utohexstr(InputOffset) << " (" << OutputDelta
+<< ", " << InputDelta << ")\n");
}
Maps.insert(std::pair<uint64_t, MapTy>(Address, Map));
}

-if (Buf.size() - Offset < 4)
-return make_error_code(llvm::errc::io_error);

-const uint32_t NumColdEntries = DE.getU32(&Offset);
+const uint32_t NumColdEntries = DE.getULEB128(&Offset, &Err);
LLVM_DEBUG(dbgs() << "Parsing " << NumColdEntries << " cold part mappings\n");
for (uint32_t I = 0; I < NumColdEntries; ++I) {
-if (Buf.size() - Offset < 16)
-return make_error_code(llvm::errc::io_error);
-const uint32_t ColdAddress = DE.getU64(&Offset);
-const uint32_t HotAddress = DE.getU64(&Offset);
+const uint32_t ColdAddress = DE.getULEB128(&Offset, &Err);
+const uint32_t HotAddress = DE.getULEB128(&Offset, &Err);
ColdPartSource.insert(
std::pair<uint64_t, uint64_t>(ColdAddress, HotAddress));
LLVM_DEBUG(dbgs() << Twine::utohexstr(ColdAddress) << " -> "
@@ -198,7 +195,7 @@ std::error_code BoltAddressTranslation::parse(StringRef Buf) {
outs() << "BOLT-INFO: Parsed " << NumColdEntries
<< " BAT cold-to-hot entries\n";

-return std::error_code();
+return errorToErrorCode(std::move(Err));
}

void BoltAddressTranslation::dump(raw_ostream &OS) {
@@ -209,7 +206,7 @@ void BoltAddressTranslation::dump(raw_ostream &OS) {
OS << "BB mappings:\n";
for (const auto &Entry : MapEntry.second) {
const bool IsBranch = Entry.second & BRANCHENTRY;
-const uint32_t Val = Entry.second & ~BRANCHENTRY;
+const uint32_t Val = Entry.second >> 1; // dropping BRANCHENTRY bit
OS << "0x" << Twine::utohexstr(Entry.first) << " -> "
<< "0x" << Twine::utohexstr(Val);
if (IsBranch)
@@ -244,7 +241,7 @@ uint64_t BoltAddressTranslation::translate(uint64_t FuncAddress,

--KeyVal;

-const uint32_t Val = KeyVal->second & ~BRANCHENTRY;
+const uint32_t Val = KeyVal->second >> 1; // dropping BRANCHENTRY bit
// Branch source addresses are translated to the first instruction of the
// source BB to avoid accounting for modifications BOLT may have made in the
// BB regarding deletion/addition of instructions.
2 changes: 1 addition & 1 deletion bolt/test/X86/bolt-address-translation.test
@@ -37,7 +37,7 @@
# CHECK: BOLT: 3 out of 7 functions were overwritten.
# CHECK: BOLT-INFO: Wrote 6 BAT maps
# CHECK: BOLT-INFO: Wrote 3 BAT cold-to-hot entries
-# CHECK: BOLT-INFO: BAT section size (bytes): 1436
+# CHECK: BOLT-INFO: BAT section size (bytes): 436
#
# usqrt mappings (hot part). We match against any key (left side containing
# the bolted binary offsets) because BOLT may change where it puts instructions