Commit 7085065

[BOLT] Support pre-aggregated returns (llvm#143296)
Intel's Architectural LBR supports capturing branch type information as part of LBR stack (SDM Vol 3B, part 2, October 2024):

```
20.1.3.2 Branch Types
The IA32_LBR_x_INFO.BR_TYPE and IA32_LER_INFO.BR_TYPE fields encode the
branch types as shown in Table 20-3.

Table 20-3. IA32_LBR_x_INFO and IA32_LER_INFO Branch Type Encodings
Encoding | Branch Type
0000B    | COND
0001B    | NEAR_IND_JMP
0010B    | NEAR_REL_JMP
0011B    | NEAR_IND_CALL
0100B    | NEAR_REL_CALL
0101B    | NEAR_RET
011xB    | Reserved
1xxxB    | OTHER_BRANCH

For a list of branch operations that fall into the categories above, see
Table 20-2.

Table 20-2. Branch Type Filtering Details
Branch Type   | Operations Recorded
COND          | Jcc, J*CXZ, and LOOP*
NEAR_IND_JMP  | JMP r/m*
NEAR_REL_JMP  | JMP rel*
NEAR_IND_CALL | CALL r/m*
NEAR_REL_CALL | CALL rel* (excluding CALLs to the next sequential IP)
NEAR_RET      | RET (0C3H)
OTHER_BRANCH  | JMP/CALL ptr*, JMP/CALL m*, RET (0C8H), SYS*, interrupts,
              | exceptions (other than debug exceptions), IRET, INT3, INTn,
              | INTO, TSX Abort, EENTER, ERESUME, EEXIT, AEX, INIT, SIPI, RSM
```

Linux kernel can preserve branch type when `save_type` is enabled, even if CPU does not support Architectural LBR:
https://github.com/torvalds/linux/blob/f09079bd04a924c72d555cd97942d5f8d7eca98c/tools/perf/Documentation/perf-record.txt#L457-L460

> - save_type: save branch type during sampling in case binary is not available later. For the platforms with Intel Arch LBR support (12th-Gen+ client or 4th-Gen Xeon+ server), the save branch type is unconditionally enabled when the taken branch stack sampling is enabled.

Kernel-reported branch type values:
https://github.com/torvalds/linux/blob/8c6bc74c7f8910ed4c969ccec52e98716f98700a/include/uapi/linux/perf_event.h#L251-L269

This information is needed to disambiguate external returns (from DSO/JIT) to an entry point or a landing pad, when BOLT can't disassemble the branch source.

This patch adds new pre-aggregated types:
- return trace (R),
- external return fall-through (r).

For such types, the checks for fall-through start (not an entry or a landing pad) are relaxed.

Depends on llvm#143295.

Test Plan: updated callcont-fallthru.s
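
As a rough illustration of the Table 20-3 encoding quoted in the commit message, the Python sketch below maps a 4-bit BR_TYPE value to its branch type name. The function name `decode_br_type` is hypothetical; it is not part of BOLT, perf, or the kernel, and the patch itself does not decode this field directly.

```python
# Exact encodings from the quoted Table 20-3.
_EXACT = {
    0b0000: "COND",
    0b0001: "NEAR_IND_JMP",
    0b0010: "NEAR_REL_JMP",
    0b0011: "NEAR_IND_CALL",
    0b0100: "NEAR_REL_CALL",
    0b0101: "NEAR_RET",
}


def decode_br_type(encoding: int) -> str:
    """Return the branch type name for a 4-bit BR_TYPE value (hypothetical helper)."""
    if not 0 <= encoding <= 0b1111:
        raise ValueError("BR_TYPE is a 4-bit field")
    if encoding & 0b1000:
        return "OTHER_BRANCH"  # 1xxxB
    # Anything left without an exact entry is 011xB.
    return _EXACT.get(encoding, "Reserved")
```

Only `NEAR_RET` (0101B) matters for the return types added here: it is the case where the branch source is known to be a return even when its code cannot be disassembled.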
1 parent a5fa5bd commit 7085065

4 files changed, +77 -32 lines changed

bolt/include/bolt/Profile/DataAggregator.h

Lines changed: 16 additions & 7 deletions
@@ -101,16 +101,17 @@ class DataAggregator : public DataReader {
     uint64_t Addr;
   };
 
-  /// Container for the unit of branch data.
-  /// Backwards compatible with legacy use for branches and fall-throughs:
-  /// - if \p Branch is FT_ONLY or FT_EXTERNAL_ORIGIN, the trace only
-  /// contains fall-through data,
-  /// - if \p To is BR_ONLY, the trace only contains branch data.
+  /// Container for the unit of branch data, matching pre-aggregated trace type.
+  /// Backwards compatible with branch and fall-through types:
+  /// - if \p To is < 0, the trace only contains branch data (BR_ONLY),
+  /// - if \p Branch is < 0, the trace only contains fall-through data
+  /// (FT_ONLY, FT_EXTERNAL_ORIGIN, or FT_EXTERNAL_RETURN).
   struct Trace {
     static constexpr const uint64_t EXTERNAL = 0ULL;
     static constexpr const uint64_t BR_ONLY = -1ULL;
     static constexpr const uint64_t FT_ONLY = -1ULL;
     static constexpr const uint64_t FT_EXTERNAL_ORIGIN = -2ULL;
+    static constexpr const uint64_t FT_EXTERNAL_RETURN = -3ULL;
 
     uint64_t Branch;
     uint64_t From;
@@ -390,9 +391,9 @@ class DataAggregator : public DataReader {
   /// File format syntax:
   /// E <event>
   /// S <start> <count>
-  /// T <start> <end> <ft_end> <count>
+  /// [TR] <start> <end> <ft_end> <count>
   /// B <start> <end> <count> <mispred_count>
-  /// [Ff] <start> <end> <count>
+  /// [Ffr] <start> <end> <count>
   ///
   /// where <start>, <end>, <ft_end> have the format [<id>:]<offset>
   ///
@@ -403,8 +404,11 @@ class DataAggregator : public DataReader {
   /// f - an aggregated fall-through with external origin - used to disambiguate
   /// between a return hitting a basic block head and a regular internal
   /// jump to the block
+  /// r - an aggregated fall-through originating at an external return, no
+  /// checks are performed for a fallthrough start
   /// T - an aggregated trace: branch from <start> to <end> with a fall-through
   /// to <ft_end>
+  /// R - an aggregated trace originating at a return
   ///
   /// <id> - build id of the object containing the address. We can skip it for
   /// the main binary and use "X" for an unknown object. This will save some
@@ -532,7 +536,12 @@ inline raw_ostream &operator<<(raw_ostream &OS,
                               const DataAggregator::Trace &T) {
   switch (T.Branch) {
   case DataAggregator::Trace::FT_ONLY:
+    break;
   case DataAggregator::Trace::FT_EXTERNAL_ORIGIN:
+    OS << "X:0 -> ";
+    break;
+  case DataAggregator::Trace::FT_EXTERNAL_RETURN:
+    OS << "X:R -> ";
     break;
   default:
     OS << Twine::utohexstr(T.Branch) << " -> ";
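
The file format syntax documented in the DataAggregator.h comments above can be sketched as a small Python parser. This is a minimal illustration and not BOLT's parser: the names `parse_location` and `parse_record` are hypothetical, offsets are assumed to be hexadecimal and counts decimal, and `E <event>` records are left out because they carry a name instead of locations.

```python
# Number of [<id>:]<offset> locations per record type, from the syntax comment.
ADDR_COUNT = {"T": 3, "R": 3, "S": 1, "B": 2, "F": 2, "f": 2, "r": 2}
# B carries <count> <mispred_count>; the other listed types carry one count.
COUNTER_COUNT = {"B": 2}


def parse_location(text):
    """Split [<id>:]<offset> into (id or None, offset). Hex offset is an assumption."""
    build_id, sep, offset = text.rpartition(":")
    return (build_id if sep else None, int(offset, 16))


def parse_record(line):
    """Parse one pre-aggregated record into (type, locations, counts)."""
    kind, *fields = line.split()
    if kind not in ADDR_COUNT:
        raise ValueError(f"unsupported record type: {kind!r}")
    n_addr = ADDR_COUNT[kind]
    n_count = COUNTER_COUNT.get(kind, 1)
    if len(fields) != n_addr + n_count:
        raise ValueError(f"{kind} expects {n_addr + n_count} fields")
    locations = [parse_location(f) for f in fields[:n_addr]]
    counts = [int(f) for f in fields[n_addr:]]  # decimal counts are an assumption
    return kind, locations, counts
```

For example, `parse_record("R X:0 4005f0 400610 1")` yields a return trace whose source is offset 0 in the unknown object `X`, and `parse_record("r 4005f0 400610 1")` yields the two-location external-return fall-through; the addresses are made up for illustration.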

bolt/lib/Profile/DataAggregator.cpp

Lines changed: 36 additions & 23 deletions
@@ -537,8 +537,7 @@ Error DataAggregator::preprocessProfile(BinaryContext &BC) {
 
 heatmap:
   // Sort parsed traces for faster processing.
-  if (!opts::BasicAggregation)
-    llvm::sort(Traces, llvm::less_first());
+  llvm::sort(Traces, llvm::less_first());
 
   if (!opts::HeatmapMode)
     return Error::success();
@@ -883,13 +882,9 @@ DataAggregator::getFallthroughsInTrace(BinaryFunction &BF, const Trace &Trace,
 
   // Adjust FromBB if the first LBR is a return from the last instruction in
   // the previous block (that instruction should be a call).
-  if (IsReturn) {
-    if (From)
-      FromBB = BF.getBasicBlockContainingOffset(From - 1);
-    else
-      LLVM_DEBUG(dbgs() << "return to the function start: " << Trace << '\n');
-  } else if (Trace.Branch == Trace::EXTERNAL && From == FromBB->getOffset() &&
-             !FromBB->isEntryPoint() && !FromBB->isLandingPad()) {
+  if (Trace.Branch != Trace::FT_ONLY && !BF.containsAddress(Trace.Branch) &&
+      From == FromBB->getOffset() &&
+      (IsReturn ? From : !(FromBB->isEntryPoint() || FromBB->isLandingPad()))) {
     const BinaryBasicBlock *PrevBB =
         BF.getLayout().getBlock(FromBB->getIndex() - 1);
     if (PrevBB->getSuccessor(FromBB->getLabel())) {
@@ -1228,12 +1223,14 @@ ErrorOr<Location> DataAggregator::parseLocationOrOffset() {
 std::error_code DataAggregator::parseAggregatedLBREntry() {
   enum AggregatedLBREntry : char {
     INVALID = 0,
-    EVENT_NAME,        // E
-    TRACE,             // T
-    SAMPLE,            // S
-    BRANCH,            // B
-    FT,                // F
-    FT_EXTERNAL_ORIGIN // f
+    EVENT_NAME,         // E
+    TRACE,              // T
+    RETURN,             // R
+    SAMPLE,             // S
+    BRANCH,             // B
+    FT,                 // F
+    FT_EXTERNAL_ORIGIN, // f
+    FT_EXTERNAL_RETURN  // r
   } Type = INVALID;
 
   /// The number of fields to parse, set based on \p Type.
@@ -1261,20 +1258,22 @@ std::error_code DataAggregator::parseAggregatedLBREntry() {
 
     Type = StringSwitch<AggregatedLBREntry>(Str)
                .Case("T", TRACE)
+               .Case("R", RETURN)
                .Case("S", SAMPLE)
                .Case("E", EVENT_NAME)
                .Case("B", BRANCH)
                .Case("F", FT)
                .Case("f", FT_EXTERNAL_ORIGIN)
+               .Case("r", FT_EXTERNAL_RETURN)
                .Default(INVALID);
 
     if (Type == INVALID) {
-      reportError("expected T, S, E, B, F or f");
+      reportError("expected T, R, S, E, B, F, f or r");
       return make_error_code(llvm::errc::io_error);
     }
 
     using SSI = StringSwitch<int>;
-    AddrNum = SSI(Str).Case("T", 3).Case("S", 1).Case("E", 0).Default(2);
+    AddrNum = SSI(Str).Cases("T", "R", 3).Case("S", 1).Case("E", 0).Default(2);
     CounterNum = SSI(Str).Case("B", 2).Case("E", 0).Default(1);
   }
 
@@ -1331,17 +1330,30 @@ std::error_code DataAggregator::parseAggregatedLBREntry() {
   if (ToFunc)
     ToFunc->setHasProfileAvailable();
 
-  /// For legacy fall-through types, adjust locations to match Trace container.
-  if (Type == FT || Type == FT_EXTERNAL_ORIGIN) {
+  /// For fall-through types, adjust locations to match Trace container.
+  if (Type == FT || Type == FT_EXTERNAL_ORIGIN || Type == FT_EXTERNAL_RETURN) {
     Addr[2] = Location(Addr[1]->Offset); // Trace To
     Addr[1] = Location(Addr[0]->Offset); // Trace From
-    // Put a magic value into Trace Branch to differentiate from a full trace.
-    Addr[0] = Location(Type == FT ? Trace::FT_ONLY : Trace::FT_EXTERNAL_ORIGIN);
+    // Put a magic value into Trace Branch to differentiate from a full trace:
+    if (Type == FT)
+      Addr[0] = Location(Trace::FT_ONLY);
+    else if (Type == FT_EXTERNAL_ORIGIN)
+      Addr[0] = Location(Trace::FT_EXTERNAL_ORIGIN);
+    else if (Type == FT_EXTERNAL_RETURN)
+      Addr[0] = Location(Trace::FT_EXTERNAL_RETURN);
+    else
+      llvm_unreachable("Unexpected fall-through type");
   }
 
-  /// For legacy branch type, mark Trace To to differentite from a full trace.
-  if (Type == BRANCH) {
+  /// For branch type, mark Trace To to differentiate from a full trace.
+  if (Type == BRANCH)
     Addr[2] = Location(Trace::BR_ONLY);
+
+  if (Type == RETURN) {
+    if (!Addr[0]->Offset)
+      Addr[0]->Offset = Trace::FT_EXTERNAL_RETURN;
+    else
+      Returns.emplace(Addr[0]->Offset);
   }
 
   /// Record a trace.
@@ -1602,6 +1614,7 @@ void DataAggregator::processBranchEvents() {
   NamedRegionTimer T("processBranch", "Processing branch events",
                      TimerGroupName, TimerGroupDesc, opts::TimeAggregator);
 
+  Returns.emplace(Trace::FT_EXTERNAL_RETURN);
   for (const auto &[Trace, Info] : Traces) {
     bool IsReturn = checkReturn(Trace.Branch);
     // Ignore returns.
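
The way `parseAggregatedLBREntry` folds each record type into the `(Branch, From, To)` triple of a `Trace` can be sketched in Python as follows. This mirrors the diff above under simplifying assumptions: locations are reduced to bare integer offsets, the helper `to_trace` and the `returns` set are hypothetical stand-ins for the C++ code and the `Returns` container, and `S` and `E` records are not handled.

```python
MASK64 = (1 << 64) - 1

# Sentinels from DataAggregator::Trace, written as wrapped uint64_t values.
BR_ONLY = -1 & MASK64
FT_ONLY = -1 & MASK64
FT_EXTERNAL_ORIGIN = -2 & MASK64
FT_EXTERNAL_RETURN = -3 & MASK64

# Fall-through record types and the magic value stored in Trace Branch.
FT_SENTINEL = {"F": FT_ONLY, "f": FT_EXTERNAL_ORIGIN, "r": FT_EXTERNAL_RETURN}


def to_trace(kind, offsets, returns):
    """Return (Branch, From, To) for one record; `returns` collects return sources."""
    if kind in FT_SENTINEL:
        start, end = offsets
        # <start> <end> shift into From/To; Branch holds the sentinel.
        return (FT_SENTINEL[kind], start, end)
    if kind == "B":
        start, end = offsets
        # Branch-only record: To is marked with BR_ONLY.
        return (start, end, BR_ONLY)
    branch, frm, to = offsets  # T and R carry three locations
    if kind == "R":
        if branch == 0:
            # Return with a zero source offset becomes an external return.
            branch = FT_EXTERNAL_RETURN
        else:
            returns.add(branch)
    return (branch, frm, to)
```

With this mapping, `R X:0 ...` and `r ...` records end up with the same `Branch` sentinel, which is consistent with `processBranchEvents` seeding `Returns` with `Trace::FT_EXTERNAL_RETURN` so that both are treated as returns.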

bolt/test/X86/callcont-fallthru.s

Lines changed: 23 additions & 0 deletions
@@ -10,6 +10,10 @@
 # RUN: link_fdata %s %t %t.pa-ret PREAGG-RET
 # Trace from an external location to a landing pad/entry point call continuation
 # RUN: link_fdata %s %t %t.pa-ext PREAGG-EXT
+# Return trace to a landing pad/entry point call continuation
+# RUN: link_fdata %s %t %t.pa-pret PREAGG-PRET
+# External return to a landing pad/entry point call continuation
+# RUN: link_fdata %s %t %t.pa-eret PREAGG-ERET
 # RUN-DISABLED: link_fdata %s %t %t.pa-plt PREAGG-PLT
 
 # RUN: llvm-strip --strip-unneeded %t -o %t.strip
@@ -38,6 +42,21 @@
 # RUN: llvm-bolt %t.strip --pa -p %t.pa-ext -o %t.out \
 # RUN: --print-cfg --print-only=main | FileCheck %s --check-prefix=CHECK-SKIP
 
+## Check pre-aggregated return traces from external location attach call
+## continuation fallthrough count to secondary entry point (unstripped)
+# RUN: llvm-bolt %t --pa -p %t.pa-pret -o %t.out \
+# RUN: --print-cfg --print-only=main | FileCheck %s --check-prefix=CHECK-ATTACH
+## Check pre-aggregated return traces from external location attach call
+## continuation fallthrough count to landing pad (stripped, landing pad)
+# RUN: llvm-bolt %t.strip --pa -p %t.pa-pret -o %t.out \
+# RUN: --print-cfg --print-only=main | FileCheck %s --check-prefix=CHECK-ATTACH
+
+## Same for external return type
+# RUN: llvm-bolt %t --pa -p %t.pa-eret -o %t.out \
+# RUN: --print-cfg --print-only=main | FileCheck %s --check-prefix=CHECK-ATTACH
+# RUN: llvm-bolt %t.strip --pa -p %t.pa-eret -o %t.out \
+# RUN: --print-cfg --print-only=main | FileCheck %s --check-prefix=CHECK-ATTACH
+
 ## Check pre-aggregated traces don't report zero-sized PLT fall-through as
 ## invalid trace
 # RUN-DISABLED: llvm-bolt %t.strip --pa -p %t.pa-plt -o %t.out | FileCheck %s \
@@ -92,6 +111,10 @@ Ltmp4_br:
 # PREAGG-RET: T #Lfoo_ret# #Ltmp3# #Ltmp3_br# 1
 ## Target is a secondary entry point (unstripped) or a landing pad (stripped)
 # PREAGG-EXT: T X:0 #Ltmp3# #Ltmp3_br# 1
+## Pre-aggregated return trace
+# PREAGG-PRET: R X:0 #Ltmp3# #Ltmp3_br# 1
+## External return
+# PREAGG-ERET: r #Ltmp3# #Ltmp3_br# 1
 
 # CHECK-ATTACH: callq foo
 # CHECK-ATTACH-NEXT: count: 1

bolt/test/link_fdata.py

Lines changed: 2 additions & 2 deletions
@@ -36,9 +36,9 @@
 fdata_pat = re.compile(r"([01].*) (?P<mispred>\d+) (?P<exec>\d+)")
 
 # Pre-aggregated profile:
-# {T|S|E|B|F|f} <start> [<end>] [<ft_end>] <count> [<mispred_count>]
+# {T|R|S|E|B|F|f|r} <start> [<end>] [<ft_end>] <count> [<mispred_count>]
 # <loc>: [<id>:]<offset>
-preagg_pat = re.compile(r"(?P<type>[TSBFf]) (?P<offsets_count>.*)")
+preagg_pat = re.compile(r"(?P<type>[TRSBFfr]) (?P<offsets_count>.*)")
 
 # No-LBR profile:
 # <is symbol?> <closest elf symbol or DSO name> <relative address> <count>
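
The effect of widening the character class in `preagg_pat` can be checked in isolation. The sketch below copies the old and new patterns from the diff; the helper `record_type` is hypothetical, and the sample lines are the payloads of the `PREAGG-PRET` and `PREAGG-ERET` directives with the prefix removed by hand, which is an assumption about what the script ends up matching.

```python
import re

# Patterns copied from the diff: before and after this commit.
old_preagg_pat = re.compile(r"(?P<type>[TSBFf]) (?P<offsets_count>.*)")
preagg_pat = re.compile(r"(?P<type>[TRSBFfr]) (?P<offsets_count>.*)")


def record_type(pattern, line):
    """Return the record type letter if the line matches, else None (hypothetical helper)."""
    m = pattern.match(line)
    return m.group("type") if m else None


return_trace = "R X:0 #Ltmp3# #Ltmp3_br# 1"
external_return = "r #Ltmp3# #Ltmp3_br# 1"

# The old pattern rejects both new record types; the new one accepts them.
old_results = [record_type(old_preagg_pat, s) for s in (return_trace, external_return)]
new_results = [record_type(preagg_pat, s) for s in (return_trace, external_return)]
```

Because `re.match` anchors at the start of the string, a line beginning with `R` or `r` simply fails to match the old pattern, so without this change the new test directives would not be recognized as pre-aggregated records.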
