Commit bbe0798

[BOLT] Delta-encode offsets in BAT (llvm#76900)
This change further reduces the size of BAT:
- large binary: to 13073904 bytes (0.34x original),
- medium binary: to 1703116 bytes (0.29x original),
- small binary: to 436 bytes (0.30x original).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test
1 parent b3981ed commit bbe0798

3 files changed: 19 additions, 10 deletions

bolt/docs/BAT.md

Lines changed: 4 additions & 2 deletions
@@ -73,10 +73,12 @@ Function header is followed by `NumEntries` pairs of offsets for current
 function.
 
 ### Address translation table
+Delta encoding means that only the difference with the previous corresponding
+entry is encoded. Offsets implicitly start at zero.
 | Entry | Encoding | Description |
 | ------ | ------| ----------- |
-| `OutputAddr` | ULEB128 | Function offset in output binary |
-| `InputAddr` | ULEB128 | Function offset in input binary with `BRANCHENTRY` LSB bit |
+| `OutputOffset` | Delta, ULEB128 | Function offset in output binary |
+| `InputOffset` | Delta, SLEB128 | Function offset in input binary with `BRANCHENTRY` LSB bit |
 
 `BRANCHENTRY` bit denotes whether a given offset pair is a control flow source
 (branch or call instruction). If not set, it signifies a control flow target
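
To make the size effect of the documented format concrete, here is a minimal standalone C++ sketch that compares the pre-change layout (both offsets as absolute ULEB128 values) with the delta layout from the table. It is an illustration only and does not use LLVM: `appendULEB128`, `appendSLEB128`, `encodeAbsolute`, `encodeDelta`, and the `OffsetPairs` alias are hypothetical names, and the `BRANCHENTRY` bit is ignored for simplicity.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Append V as ULEB128: 7 payload bits per byte, low bits first; the high bit
// is set on every byte except the last.
inline void appendULEB128(uint64_t V, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = static_cast<uint8_t>(V & 0x7f);
    V >>= 7;
    if (V != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (V != 0);
}

// Append V as SLEB128. Encoding stops once the remaining value is pure sign
// extension of bit 6 of the byte just produced.
inline void appendSLEB128(int64_t V, std::vector<uint8_t> &Out) {
  bool More = true;
  while (More) {
    uint8_t Byte = static_cast<uint8_t>(V & 0x7f);
    V >>= 7; // arithmetic shift (guaranteed since C++20, universal in practice)
    const bool SignBit = (Byte & 0x40) != 0;
    More = !((V == 0 && !SignBit) || (V == -1 && SignBit));
    if (More)
      Byte |= 0x80;
    Out.push_back(Byte);
  }
}

// (output offset, input offset) pairs, sorted by output offset.
using OffsetPairs = std::vector<std::pair<uint32_t, uint32_t>>;

// Hypothetical helper: pre-change layout, both offsets as absolute ULEB128.
inline std::vector<uint8_t> encodeAbsolute(const OffsetPairs &Pairs) {
  std::vector<uint8_t> Out;
  for (const auto &P : Pairs) {
    appendULEB128(P.first, Out);
    appendULEB128(P.second, Out);
  }
  return Out;
}

// Hypothetical helper: delta layout. Output deltas are unsigned because the
// pairs are sorted by output offset; input deltas are signed.
inline std::vector<uint8_t> encodeDelta(const OffsetPairs &Pairs) {
  std::vector<uint8_t> Out;
  uint64_t PrevOut = 0, PrevIn = 0; // offsets implicitly start at zero
  for (const auto &P : Pairs) {
    assert(P.first >= PrevOut && "pairs must be sorted by output offset");
    appendULEB128(P.first - PrevOut, Out);
    appendSLEB128(static_cast<int64_t>(P.second) -
                      static_cast<int64_t>(PrevIn),
                  Out);
    PrevOut = P.first;
    PrevIn = P.second;
  }
  return Out;
}
```

For the made-up pairs `{0x0, 0x0}`, `{0x1000, 0x1004}`, `{0x1008, 0x100d}`, `{0x1010, 0x1000}`, `{0x1018, 0x1021}`, `encodeAbsolute` produces 18 bytes and `encodeDelta` produces 12: once the offsets exceed 127, every absolute value needs two bytes, while most deltas still fit in one. The fourth pair has a negative input delta (-13), which is why that column needs a signed encoding.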

bolt/lib/Profile/BoltAddressTranslation.cpp

Lines changed: 14 additions & 7 deletions
@@ -114,9 +114,12 @@ void BoltAddressTranslation::write(const BinaryContext &BC, raw_ostream &OS) {
                       << Twine::utohexstr(Address) << ".\n");
     encodeULEB128(Address, OS);
     encodeULEB128(NumEntries, OS);
+    uint64_t InOffset = 0, OutOffset = 0;
+    // Output and Input addresses and delta-encoded
     for (std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
-      encodeULEB128(KeyVal.first, OS);
-      encodeULEB128(KeyVal.second, OS);
+      encodeULEB128(KeyVal.first - OutOffset, OS);
+      encodeSLEB128(KeyVal.second - InOffset, OS);
+      std::tie(OutOffset, InOffset) = KeyVal;
     }
   }
   const uint32_t NumColdEntries = ColdPartSource.size();
@@ -164,12 +167,16 @@ std::error_code BoltAddressTranslation::parse(StringRef Buf) {
 
     LLVM_DEBUG(dbgs() << "Parsing " << NumEntries << " entries for 0x"
                       << Twine::utohexstr(Address) << "\n");
+    uint64_t InputOffset = 0, OutputOffset = 0;
     for (uint32_t J = 0; J < NumEntries; ++J) {
-      const uint32_t OutputAddr = DE.getULEB128(&Offset, &Err);
-      const uint32_t InputAddr = DE.getULEB128(&Offset, &Err);
-      Map.insert(std::pair<uint32_t, uint32_t>(OutputAddr, InputAddr));
-      LLVM_DEBUG(dbgs() << Twine::utohexstr(OutputAddr) << " -> "
-                        << Twine::utohexstr(InputAddr) << "\n");
+      const uint64_t OutputDelta = DE.getULEB128(&Offset, &Err);
+      const int64_t InputDelta = DE.getSLEB128(&Offset, &Err);
+      OutputOffset += OutputDelta;
+      InputOffset += InputDelta;
+      Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset));
+      LLVM_DEBUG(dbgs() << Twine::utohexstr(OutputOffset) << " -> "
+                        << Twine::utohexstr(InputOffset) << " (" << OutputDelta
+                        << ", " << InputDelta << ")\n");
     }
     Maps.insert(std::pair<uint64_t, MapTy>(Address, Map));
   }
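
The parse side of this hunk keeps running sums and adds each delta to them. The standalone sketch below mirrors that loop without LLVM's `DataExtractor`, so it can be compiled and stepped through on its own. `readULEB128`, `readSLEB128`, and `decodeDelta` are hypothetical helpers written for this illustration, and the byte buffer in the usage note is made-up sample data, not real BAT output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Read one ULEB128 value starting at Buf[Pos]; advances Pos past it.
inline uint64_t readULEB128(const std::vector<uint8_t> &Buf, size_t &Pos) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  while (true) {
    assert(Pos < Buf.size() && Shift < 64 && "truncated or oversized ULEB128");
    const uint8_t Byte = Buf[Pos++];
    Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0)
      return Value;
  }
}

// Read one SLEB128 value starting at Buf[Pos]; advances Pos past it.
inline int64_t readSLEB128(const std::vector<uint8_t> &Buf, size_t &Pos) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  uint8_t Byte = 0;
  do {
    assert(Pos < Buf.size() && Shift < 64 && "truncated or oversized SLEB128");
    Byte = Buf[Pos++];
    Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  // Sign-extend from bit 6 of the final byte.
  if (Shift < 64 && (Byte & 0x40))
    Value |= ~uint64_t(0) << Shift;
  return static_cast<int64_t>(Value); // two's complement reinterpretation
}

// Hypothetical helper: rebuild absolute (output, input) offset pairs from
// NumEntries delta-encoded pairs, in the same order as the parse loop above.
inline std::vector<std::pair<uint32_t, uint32_t>>
decodeDelta(const std::vector<uint8_t> &Buf, size_t NumEntries) {
  std::vector<std::pair<uint32_t, uint32_t>> Pairs;
  size_t Pos = 0;
  uint64_t OutputOffset = 0, InputOffset = 0; // offsets start at zero
  for (size_t J = 0; J < NumEntries; ++J) {
    const uint64_t OutputDelta = readULEB128(Buf, Pos);
    const int64_t InputDelta = readSLEB128(Buf, Pos);
    OutputOffset += OutputDelta;
    // Unsigned wraparound makes adding a negative delta a subtraction.
    InputOffset += static_cast<uint64_t>(InputDelta);
    Pairs.emplace_back(static_cast<uint32_t>(OutputOffset),
                       static_cast<uint32_t>(InputOffset));
  }
  return Pairs;
}
```

Decoding the 12 sample bytes `00 00 80 20 84 20 08 09 08 73 08 21` as five entries yields `(0x0, 0x0)`, `(0x1000, 0x1004)`, `(0x1008, 0x100d)`, `(0x1010, 0x1000)`, `(0x1018, 0x1021)`. The byte `0x73` is the SLEB128 encoding of -13. Input offsets can decrease between consecutive entries when the map is walked in output order, which is presumably why the commit reads them with `getSLEB128` while output deltas stay unsigned.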

bolt/test/X86/bolt-address-translation.test

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@
 # CHECK: BOLT: 3 out of 7 functions were overwritten.
 # CHECK: BOLT-INFO: Wrote 6 BAT maps
 # CHECK: BOLT-INFO: Wrote 3 BAT cold-to-hot entries
-# CHECK: BOLT-INFO: BAT section size (bytes): 680
+# CHECK: BOLT-INFO: BAT section size (bytes): 436
 #
 # usqrt mappings (hot part). We match against any key (left side containing
 # the bolted binary offsets) because BOLT may change where it puts instructions