| 1 | +# BOLT Address Translation (BAT) |
| 2 | +# Purpose |
| 3 | +A regular profile collection for BOLT involves collecting samples from an |
| 4 | +unoptimized binary. BOLT Address Translation allows collecting a profile |
| 5 | +from a BOLT-optimized binary and using it to optimize the input (pre-BOLT) |
| 6 | +binary. |
| 7 | + |
| 8 | +# Overview |
| 9 | +BOLT Address Translation is an extra section (`.note.bolt_bat`) inserted by BOLT |
| 10 | +into the output binary. It contains translation tables and linkage information |
| 11 | +for split functions. This information enables mapping the profile from the |
| 12 | +optimized binary back onto the original binary. |
| 13 | + |
| 14 | +# Usage |
| 15 | +The `--enable-bat` flag controls generation of the BAT section. The sampled |
| 16 | +profile must be passed, together with the optimized binary containing the BAT |
| 17 | +section, to `perf2bolt`, which reads the BAT section and produces an fdata |
| 18 | +profile for the original binary. Note that YAML profile generation is not |
| 19 | +supported, since BAT doesn't contain the metadata for input functions. |
| 20 | + |
| 21 | +# Internals |
| 22 | +## Section contents |
| 23 | +The section is organized as follows: |
| 24 | +- Hot functions table |
| 25 | + - Address translation tables |
| 26 | +- Cold functions table |
| 27 | + |
| 28 | +## Construction and parsing |
| 29 | +The BAT section is created from the `BoltAddressTranslation` class, which captures |
| 30 | +address translation information provided by the BOLT linker. It is then encoded |
| 31 | +as a note section in the output binary. |
| 32 | + |
| 33 | +During profile conversion, when a BAT-enabled binary is passed to `perf2bolt`, |
| 34 | +the `BoltAddressTranslation` class is populated from the BAT section. The class |
| 35 | +is then queried by `DataAggregator` during sample processing to reconstruct |
| 36 | +addresses and offsets in the input binary. |
| 37 | + |
| 38 | +## Encoding format |
| 39 | +The encoding is specified in `bolt/include/bolt/Profile/BoltAddressTranslation.h` |
| 40 | +and `bolt/lib/Profile/BoltAddressTranslation.cpp`. |
| 41 | + |
| 42 | +### Layout |
| 43 | +The general layout is as follows: |
| 44 | +``` |
| 45 | +Hot functions table header |
| 46 | +|------------------| |
| 47 | +| Function entry | |
| 48 | +| |--------------| | |
| 49 | +| | OutOff InOff | | |
| 50 | +| |--------------| | |
| 51 | +~~~~~~~~~~~~~~~~~~~~ |
| 52 | + |
| 53 | +Cold functions table header |
| 54 | +|------------------| |
| 55 | +| Function entry | |
| 56 | +| |--------------| | |
| 57 | +| | OutOff InOff | | |
| 58 | +| |--------------| | |
| 59 | +~~~~~~~~~~~~~~~~~~~~ |
| 60 | +``` |
| 61 | + |
| 62 | +### Functions table |
| 63 | +Hot and cold functions tables share the same encoding, except for the differences marked below. |
| 64 | +Header: |
| 65 | +| Entry | Encoding | Description | |
| 66 | +| ------ | ----- | ----------- | |
| 67 | +| `NumFuncs` | ULEB128 | Number of functions in the functions table | |
| 68 | + |
| 69 | +The header is followed by the functions table with `NumFuncs` entries. |
| 70 | +Output binary addresses are delta encoded, meaning that only the difference from |
| 71 | +the previous output address is stored. Addresses implicitly start at zero. |
| 72 | +The delta chain for output addresses is continuous: it runs through function start |
| 73 | +addresses and function-internal offsets, and carries over between hot and cold |
| 74 | +fragments, to better spread deltas and save space. |
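The delta scheme for output addresses can be illustrated with a minimal Python sketch. It is not the BOLT implementation: the helper names (`decode_uleb128`, `encode_uleb128`, `delta_decode_addresses`) are hypothetical, and the byte stream is simplified to contain only address deltas, whereas the real section interleaves other fields between them.

```python
def decode_uleb128(data, pos=0):
    """Decode one ULEB128 value starting at `pos`; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # low 7 bits carry payload
        if not (byte & 0x80):             # high bit clear: last byte
            return value, pos
        shift += 7


def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)       # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def delta_decode_addresses(data, count, prev=0):
    """Turn `count` ULEB128 deltas into absolute addresses.

    `prev` is the last address seen so far; it starts at zero and would be
    carried over from one table to the next to keep the chain continuous.
    SIMPLIFICATION: `data` holds only address deltas.
    """
    addresses = []
    pos = 0
    for _ in range(count):
        delta, pos = decode_uleb128(data, pos)
        prev += delta
        addresses.append(prev)
    return addresses, prev
```

Because the chain is continuous, the second return value of `delta_decode_addresses` would be passed as `prev` when decoding the next group of addresses, so that only small differences need to be stored.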
| 75 | + |
| 76 | +Hot indices are delta encoded, implicitly starting at zero. |
| 77 | +| Entry | Encoding | Description | |
| 78 | +| ------ | ------| ----------- | |
| 79 | +| `Address` | Continuous, Delta, ULEB128 | Function address in the output binary | |
| 80 | +| `HotIndex` | Delta, ULEB128 | Cold functions only: index of corresponding hot function in hot functions table | |
| 81 | +| `NumEntries` | ULEB128 | Number of address translation entries for a function | |
| 82 | +| `EqualElems` | ULEB128 | Hot functions only: number of equal offsets in the beginning of a function | |
| 83 | +| `BranchEntries` | Bitmask, `alignTo(EqualElems, 8)` bits | Hot functions only: if `EqualElems` is non-zero, bitmask denoting entries with `BRANCHENTRY` bit | |
| 84 | +The function header is followed by `EqualElems` offsets (hot functions only) and |
| 85 | +`NumEntries-EqualElems` (`NumEntries` for cold functions) pairs of offsets for |
| 86 | +the current function. |
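As a rough illustration of the `BranchEntries` field size, the Python sketch below computes how many bytes `alignTo(EqualElems, 8)` bits occupy and reads one flag per equal-offset entry. The function names are hypothetical, and the bit order (bit `i` of a little-endian integer corresponds to entry `i`) is an assumption made for the example, not something stated in the table above.

```python
def align_to(value, align):
    """Round `value` up to the next multiple of `align`."""
    return (value + align - 1) // align * align


def branch_entries_size_bytes(equal_elems):
    """Size of the BranchEntries bitmask in bytes: alignTo(EqualElems, 8) bits."""
    return align_to(equal_elems, 8) // 8


def read_branch_entries(data, pos, equal_elems):
    """Read the bitmask at `pos`; return (flags, next_pos).

    ASSUMPTION: LSB-first bit order within a little-endian byte sequence.
    When `equal_elems` is zero, the bitmask is absent and no bytes are consumed.
    """
    size = branch_entries_size_bytes(equal_elems)
    mask = int.from_bytes(data[pos:pos + size], "little")
    flags = [bool((mask >> i) & 1) for i in range(equal_elems)]
    return flags, pos + size
```

For instance, a function with nine equal offsets would need a two-byte bitmask, of which only the low nine bits are meaningful under this assumption.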
| 87 | + |
| 88 | +### Address translation table |
| 89 | +Delta encoding means that only the difference with the previous corresponding |
| 90 | +entry is encoded. Input offsets implicitly start at zero. |
| 91 | +| Entry | Encoding | Description | |
| 92 | +| ------ | ------| ----------- | |
| 93 | +| `OutputAddr` | Continuous, Delta, ULEB128 | Function offset in output binary | |
| 94 | +| `InputAddr` | Optional, Delta, SLEB128 | Function offset in input binary, with the `BRANCHENTRY` bit as the LSB | |
| 95 | + |
| 96 | +The `BRANCHENTRY` bit denotes whether a given offset pair is a control flow source |
| 97 | +(branch or call instruction). If not set, the pair is a control flow target |
| 98 | +(basic block offset). |
| 99 | +`InputAddr` is omitted when the input and output function offsets are equal. In this |
| 100 | +case, `BRANCHENTRY` bits are encoded separately in the `BranchEntries` bitmask. |
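The following Python sketch shows one way a reader might decode a run of `InputAddr` values. It rests on two assumptions that the text above does not spell out: that the stored value packs the input offset as `(offset << 1) | BRANCHENTRY`, and that the SLEB128 delta is applied to that packed value. All function names are hypothetical.

```python
BRANCHENTRY = 0x1  # flag carried in the least significant bit


def decode_sleb128(data, pos=0):
    """Decode one SLEB128 value starting at `pos`; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):             # last byte of this value
            if byte & 0x40:               # sign bit set: value is negative
                value -= 1 << shift
            return value, pos


def split_input_addr(packed):
    """Split a packed value into (input_offset, is_branch_entry).

    ASSUMPTION: packed == (input_offset << 1) | BRANCHENTRY.
    """
    return packed >> 1, bool(packed & BRANCHENTRY)


def decode_input_addrs(data, count):
    """Decode `count` signed deltas; input offsets implicitly start at zero.

    ASSUMPTION: deltas apply to the packed value, flag bit included.
    """
    prev = 0
    pos = 0
    result = []
    for _ in range(count):
        delta, pos = decode_sleb128(data, pos)
        prev += delta
        result.append(split_input_addr(prev))
    return result
```

A signed encoding is needed here because input offsets are not monotonic once BOLT reorders basic blocks, so a delta can be negative even though output offsets only grow.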