[llvm-profgen] Support creating profiles of arbitrary events #99026

Merged
merged 4 commits into from
Jul 21, 2024
Binary file not shown.
39 changes: 39 additions & 0 deletions llvm/test/tools/llvm-profgen/Inputs/cmov_3.perfscript

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions llvm/test/tools/llvm-profgen/Inputs/ip-duplication.perfscript
@@ -0,0 +1,2 @@
4006b7 0x4006b7/0x40068b/P/-/-/1 0x4006c8/0x4006b0/P/-/-/1 0x400689/0x4006b9/P/-/-/1 0x40066d/0x400686/P/-/-/2 0x4007a6/0x400650/P/-/-/9 0x4007ca/0x400790/P/-/-/8 0x4007d7/0x4007bd/P/-/-/1 0x400792/0x4007d7/P/-/-/1 0x4007b8/0x400790/P/-/-/2 0x4006a2/0x4007a8/P/-/-/3
40065d 40065d/0x40068f/M/-/-/1
5 changes: 5 additions & 0 deletions llvm/test/tools/llvm-profgen/Inputs/noprobe-skid.perfscript
@@ -0,0 +1,5 @@
// Invalid perf line
40062f 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/6 0x40062f/0x4005b0/P/-/-/16 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/6 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005c8/0x4005dc/P/-/-/8 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/10 0x40062f/0x4005b0/P/-/-/14 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/7 0x40062f/0x4005b0/P/-/-/8 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005c8/0x4005dc/P/-/-/7 0x40062f/0x4005b0/P/-/-/15 0x400645/0x4005ff/P/-/-/1
4005d7 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/7 0x40062f/0x4005b0/P/-/-/11 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/5 0x40062f/0x4005b0/P/-/-/11 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/7 0x40062f/0x4005b0/P/-/-/10 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/13 0x40062f/0x4005b0/P/-/-/9
4005c8 0x4005c8/0x4005dc/P/-/-/11 0x40062f/0x4005b0/P/-/-/8 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/5 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/12 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/7 0x40062f/0x4005b0/P/-/-/10 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/12 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/8 0x40062f/0x4005b0/P/-/-/8
4005c5 0x4005c8/0x4005dc/P/-/-/11 0x40062f/0x4005b0/P/-/-/8 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/5 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/12 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/7 0x40062f/0x4005b0/P/-/-/10 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/12 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/8 0x40062f/0x4005b0/P/-/-/8
78 changes: 78 additions & 0 deletions llvm/test/tools/llvm-profgen/event-filtering.test
@@ -0,0 +1,78 @@
// RUN: llvm-profgen --format=text --perfscript=%S/Inputs/cmov_3.perfscript --binary=%S/Inputs/cmov_3.perfbin --output=%t --skip-symbolization --perf-event=br_inst_retired.near_taken:upp
// RUN: FileCheck %s --input-file %t --check-prefix=CHECK-RAW-PROFILE
// RUN: llvm-profgen --format=text --perfscript=%S/Inputs/cmov_3.perfscript --binary=%S/Inputs/cmov_3.perfbin --output=%t --perf-event=br_inst_retired.near_taken:upp
// RUN: FileCheck %s --input-file %t --check-prefix=CHECK

// RUN: llvm-profgen --format=text --perfscript=%S/Inputs/cmov_3.perfscript --binary=%S/Inputs/cmov_3.perfbin --output=%t --skip-symbolization --perf-event=br_misp_retired.all_branches:upp --leading-ip-only
// RUN: FileCheck %s --input-file %t --check-prefix=UNPRED-RAW-PROFILE
// RUN: llvm-profgen --format=text --perfscript=%S/Inputs/cmov_3.perfscript --binary=%S/Inputs/cmov_3.perfbin --output=%t --perf-event=br_misp_retired.all_branches:upp --leading-ip-only
// RUN: FileCheck %s --input-file %t --check-prefix=UNPRED

// Check that we can use perf event filtering to generate multiple types of
// source-level profiles from a single perf profile. In this case, we generate
// a typical execution frequency profile using br_inst_retired.near_taken LBRs,
// and a branch mispredict profile using br_misp_retired.all_branches sample
// IPs.

// The source example below is based on perfKernelCpp/cmov_3, except a
// misleading builtin is used to persuade the compiler not to use cmov, which
// induces branch mispredicts.

// CHECK: sel_arr:20229:0
// CHECK: 3.1: 627
// CHECK: 3.2: 627
// CHECK: 4: 615
// CHECK: 5: 627

// UNPRED: sel_arr:18:0
// UNPRED: 3.1: 0
// UNPRED: 3.2: 0
// UNPRED: 4: 9
// UNPRED: 5: 0

// CHECK-RAW-PROFILE: 3
// CHECK-RAW-PROFILE-NEXT: 2f0-2fa:303
// CHECK-RAW-PROFILE-NEXT: 2f0-310:312
// CHECK-RAW-PROFILE-NEXT: 2ff-310:315

// UNPRED-RAW-PROFILE: 1
// UNPRED-RAW-PROFILE-NEXT: 2fa-2fa:9

// original code:
// clang -O2 -gline-tables-only -fdebug-info-for-profiling lit.c
#include <stdlib.h>

#define N 20000
#define ITERS 10000

static int *m_s1, *m_s2, *m_s3, *m_dst;

void init(void) {
  m_s1 = malloc(sizeof(int)*N);
  m_s2 = malloc(sizeof(int)*N);
  m_s3 = malloc(sizeof(int)*N);
  m_dst = malloc(sizeof(int)*N);
  srand(42);

  for (int i = 0; i < N; i++) {
    m_s1[i] = rand() % N;
    m_s2[i] = 0;
    m_s3[i] = 1;
  }
}

void __attribute__((noinline)) sel_arr(int *dst, int *s1, int *s2, int *s3) {
#pragma nounroll
#pragma clang loop vectorize(disable) interleave(disable)
  for (int i = 0; i < N; i++) {
    int *p = __builtin_expect((s1[i] < 10035), 0) ? &s2[i] : &s3[i];
    dst[i] = *p;
  }
}

int main(void) {
  init();
  for(int i=0; i<ITERS; ++i)
    sel_arr(m_dst, m_s1, m_s2, m_s3);
  return 0;
}
22 changes: 22 additions & 0 deletions llvm/test/tools/llvm-profgen/iponly-nodupfactor.test
@@ -0,0 +1,22 @@
; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/ip-duplication.perfscript --binary=%S/Inputs/inline-noprobe2.perfbin --output=%t --use-offset=0 --leading-ip-only
; RUN: FileCheck %s --input-file %t --check-prefix=CHECK

; Test that we don't over-count samples for duplicated source code when
; building an IP-based profile.

; The inline-noprobe2.perfbin binary is used for this test because one of the
; partition_pivot_last+3.1 debug locations has a duplication factor of 2
; encoded into its discriminator. In IP-sample mode, a hit in one instruction
; in the duplicated code does not imply a hit to the other duplicates.

; The perfscript input includes 1 sample at a location with duplication factor
; of 2, and another sample at the same source location but with no duplication
; factor. These should be summed without duplication factors. Ensure we record
; a count of 1+1=2 (and not 2+1=3) for the 3.1 location.

;CHECK-LABEL: partition_pivot_last
;CHECK-NEXT: 1: 0
;CHECK-NEXT: 2: 0
;CHECK-NEXT: 3: 0
;CHECK-NEXT: 3.1: 2

58 changes: 58 additions & 0 deletions llvm/test/tools/llvm-profgen/iponly.test
@@ -0,0 +1,58 @@
; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/noprobe-skid.perfscript --binary=%S/Inputs/noprobe.perfbin --output=%t --skip-symbolization --leading-ip-only
; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-RAW-PROFILE
; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/noprobe-skid.perfscript --binary=%S/Inputs/noprobe.perfbin --output=%t --leading-ip-only
; RUN: FileCheck %s --input-file %t --check-prefix=CHECK

; Here we check the ability to ignore LBRs, which is useful for generating
; profiles where only the precise PMU sample IP is of interest. In general the
; IPs need not identify a branch. In this case there are exactly 4 samples, so
; we see only these 4 locations as "hot" and none of the LBR history.
; Compare with noinline-noprobe.test, which includes LBR history.

; Note that there are two different IPs (5c5 and 5c8) contributing to line
; offset 1 in bar. This tests that sample counts corresponding to the same
; debug location are summed into that location in the profile rather than the
; maximum being taken, as happens with basic block execution count profiles.

;CHECK: bar:14:0
;CHECK: 0: 0
;CHECK: 1: 2
;CHECK: 2: 1
;CHECK: 4: 0
;CHECK: 5: 0
;CHECK: foo:5:0
;CHECK: 0: 0
;CHECK: 1: 0
;CHECK: 2: 0
;CHECK: 3: 1
;CHECK: 4: 0
;CHECK: 5: 0

CHECK-RAW-PROFILE: 4
CHECK-RAW-PROFILE-NEXT: 5c5-5c5:1
CHECK-RAW-PROFILE-NEXT: 5c8-5c8:1
CHECK-RAW-PROFILE-NEXT: 5d7-5d7:1
CHECK-RAW-PROFILE-NEXT: 62f-62f:1

; original code:
; clang -O3 -g -fdebug-info-for-profiling test.c -fno-inline -o a.out
#include <stdio.h>

int bar(int x, int y) {
  if (x % 3) {
    return x - y;
  }
  return x + y;
}

void foo() {
  int s, i = 0;
  while (i++ < 4000 * 4000)
    if (i % 91) s = bar(i, s); else s += 30;
  printf("sum is %d\n", s);
}

int main() {
  foo();
  return 0;
}
116 changes: 107 additions & 9 deletions llvm/tools/llvm-profgen/PerfReader.cpp
@@ -41,6 +41,17 @@ static cl::opt<bool>
"and produce context-insensitive profile."));
cl::opt<bool> ShowDetailedWarning("show-detailed-warning",
cl::desc("Show detailed warning message."));
cl::opt<bool>
LeadingIPOnly("leading-ip-only",
Review comment (Member):

This name is a bit confusing. I think what you meant is to ignore LBRs and consume only the leading IPs?

In that case, we should name it something like ignore-lbr-samples, to be consistent with the existing ignore-stack-samples.

Review comment (Member):

Additionally, does use of profiled-event require leading-ip-only? If leading-ip-only is not specified, what is the expected behavior?

cl::desc("Form a profile based only on sample IPs"));

static cl::list<std::string> PerfEventFilter(
"perf-event",
cl::desc("Ignore samples not matching the given event names"));
Review comment (Member) on lines +49 to +50:

nit: name it profiled-event, with the description "Perf event to generate profile for, e.g. br_misp_retired.all_branches. Program execution count profile is generated when unspecified."

static cl::alias
PerfEventFilterPlural("perf-events", cl::CommaSeparated,
cl::desc("Comma-delimited version of -perf-event"),
cl::aliasopt(PerfEventFilter));

extern cl::opt<std::string> PerfTraceFilename;
extern cl::opt<bool> ShowDisassemblyOnly;
@@ -404,13 +415,18 @@ PerfScriptReader::convertPerfDataToTrace(ProfiledBinary *Binary, bool SkipPID,
}
}

// If filtering by events was requested, additionally request the "event"
// field.
const std::string FieldList =
PerfEventFilter.empty() ? "ip,brstack" : "event,ip,brstack";

// Run perf script again to retrieve events for PIDs collected above
SmallVector<StringRef, 8> ScriptSampleArgs;
ScriptSampleArgs.push_back(PerfPath);
ScriptSampleArgs.push_back("script");
ScriptSampleArgs.push_back("--show-mmap-events");
ScriptSampleArgs.push_back("-F");
ScriptSampleArgs.push_back("ip,brstack");
ScriptSampleArgs.push_back(FieldList);
ScriptSampleArgs.push_back("-i");
ScriptSampleArgs.push_back(PerfData);
if (!PIDs.empty()) {
@@ -575,14 +591,54 @@ bool PerfScriptReader::extractLBRStack(TraceStream &TraceIt,

// Skip the leading instruction pointer.
size_t Index = 0;

StringRef EventName;
// Skip a perf event name. This may or may not exist.
Review comment (Member):

"This may or may not exist." <-- I can't parse this.

Can you add a comment with the expected input format?
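
To make the input format in question concrete, here is a minimal standalone sketch of how the leading fields of one line could be recognised. It assumes the line shape implied by the checks in this hunk: an optional event token ending in ':', then an optional bare hexadecimal IP, then LBR entries containing '/'. `LeadingFields` and `parseLeadingFields` are hypothetical names used only for illustration; they are not part of llvm-profgen, and unlike the patch this sketch does not warn on a malformed IP.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical result type for illustration; not part of llvm-profgen.
struct LeadingFields {
  std::optional<std::string> Event;  // event name with trailing ':' removed
  std::optional<uint64_t> LeadingIP; // sample IP, if the line has one
  std::size_t FirstLBRIndex = 0;     // index of the first remaining token
};

// Assumed line shape: [<event>:] [<hex ip>] [<src>/<dst>/...]*
inline LeadingFields parseLeadingFields(const std::string &Line) {
  // Split on runs of whitespace, so any column padding is ignored.
  std::istringstream In(Line);
  std::vector<std::string> Records;
  for (std::string Token; In >> Token;)
    Records.push_back(Token);

  LeadingFields Out;
  std::size_t Index = 0;

  // An event token is recognised by its trailing ':'.
  if (Records.size() > Index && Records[Index].back() == ':') {
    std::string Name = Records[Index];
    while (!Name.empty() && Name.back() == ':')
      Name.pop_back();
    Out.Event = Name;
    ++Index;
  }

  // A leading IP is a bare hexadecimal token with no '/' in it.
  if (Records.size() > Index &&
      Records[Index].find('/') == std::string::npos) {
    const std::string &Tok = Records[Index];
    uint64_t Value = 0;
    std::from_chars_result R =
        std::from_chars(Tok.data(), Tok.data() + Tok.size(), Value, 16);
    if (R.ec == std::errc() && R.ptr == Tok.data() + Tok.size()) {
      Out.LeadingIP = Value;
      ++Index;
    }
  }

  assert(Index <= Records.size() && "index must stay within the token list");
  Out.FirstLBRIndex = Index;
  return Out;
}
```

With this shape, a line such as `br_misp_retired.all_branches:upp: 4006b7 0x4006b7/0x40068b/P/-/-/1` yields an event name, a leading IP, and a first LBR index of 2, while a line without an event token yields an index of 1.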

if (Records.size() > Index && Records[Index].ends_with(":")) {
EventName = Records[Index].ltrim().rtrim(':');
Review comment (Member):

Based on the format in cmov_3.perfscript, shouldn't the event name be extracted in extractCallstack rather than in extractLBRStack?

Index++;

if (PerfEventFilter.empty()) {
WithColor::warning() << "No --perf-event filter was specified, but an "
"\"event\" field was found in line "
<< TraceIt.getLineNumber() << ": "
<< TraceIt.getCurrentLine() << "\n";
} else if (std::find(PerfEventFilter.begin(), PerfEventFilter.end(),
EventName) == PerfEventFilter.end()) {
TraceIt.advance();
return false;
}

} else if (!PerfEventFilter.empty()) {
WithColor::warning() << "A --perf-event filter was specified, but no "
"\"event\" field found in line "
<< TraceIt.getLineNumber() << ": "
<< TraceIt.getCurrentLine() << "\n";
Review comment (Member) on lines +602 to +616:

Here we are emitting a warning on each sample, which is going to make the warnings super noisy.
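
One common way to address this kind of noise is to print the message for the first offending line only and report a total at the end. The sketch below illustrates that idea in isolation; `OnceWarning` is a hypothetical helper, not an existing llvm-profgen or LLVM class, and it is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical helper for illustration; not part of llvm-profgen.
class OnceWarning {
public:
  explicit OnceWarning(std::string Msg) : Message(std::move(Msg)) {
    assert(!Message.empty() && "a warning needs some text");
  }

  // Record one occurrence. Returns true only for the first one, which is
  // the only time the caller should print the message.
  bool note() { return Count++ == 0; }

  // Number of occurrences recorded so far.
  uint64_t count() const { return Count; }

  // Summary text for the end of the run; empty if never triggered.
  std::string summary() const {
    if (Count == 0)
      return "";
    return Message + " (seen on " + std::to_string(Count) + " line(s))";
  }

private:
  std::string Message;
  uint64_t Count = 0;
};
```

A reader loop would call `note()` for each line that lacks the expected field, print only when it returns true, and emit `summary()` once after the whole trace has been consumed.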

}

uint64_t LeadingAddr;
if (!Records.empty() && !Records[0].contains('/')) {
if (Records[0].getAsInteger(16, LeadingAddr)) {
if (Records.size() > Index && !Records[Index].contains('/')) {
if (Records[Index].getAsInteger(16, LeadingAddr)) {
WarnInvalidLBR(TraceIt);
TraceIt.advance();
return false;
}
Index = 1;
Index++;
}

// We assume that if we saw an event name we also saw a leading addr.
// In other words, LeadingAddr is set if Index is 1 or 2.
if (LeadingIPOnly && Index > 0) {
// Form a profile only from the sample IP. Do not assume an LBR stack
// follows, and ignore it if it does.
uint64_t SampleIP = Binary->canonicalizeVirtualAddress(LeadingAddr);
bool SampleIPIsInternal = Binary->addressIsCode(SampleIP);
if (SampleIPIsInternal) {
// Form a half LBR entry where the sample IP is the destination.
LBRStack.emplace_back(LBREntry(SampleIP, SampleIP));
Review comment (Member):

This doesn't really fit in LBRStack; it fits CallStack better. All of the special cases in warnInvalidRange can be avoided if we handle these events as part of extractCallstack.
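
For context on what the "half LBR entry" amounts to downstream: each sampled IP ends up counted as a degenerate range whose start and end are the same address, which is why the raw profiles expected in iponly.test contain entries such as `5c5-5c5:1`. The sketch below models only that counting step with a plain `std::map`; `RangeCounts` and `recordSampleIP` are hypothetical stand-ins, not the profgen data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical stand-in for a range counter keyed by [start, end].
using RangeCounts = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;

// Record a sampled IP as the degenerate range [IP, IP]. Repeat is the number
// of times the identical sample was seen.
inline void recordSampleIP(RangeCounts &Ranges, uint64_t IP, uint64_t Repeat) {
  assert(Repeat > 0 && "a recorded sample must have been seen at least once");
  Ranges[{IP, IP}] += Repeat;
}
```

Feeding the four leading IPs from noprobe-skid.perfscript through `recordSampleIP` with a repeat of 1 gives four single-address ranges, each with a count of 1.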

}
TraceIt.advance();
return !LBRStack.empty();
}

// Now extract LBR samples - note that we do not reverse the
@@ -902,6 +958,20 @@ void PerfScriptReader::computeCounterFromLBR(const PerfSample *Sample,
uint64_t Repeat) {
SampleCounter &Counter = SampleCounters.begin()->second;
uint64_t EndAddress = 0;

if (LeadingIPOnly) {
assert(Sample->LBRStack.size() == 1 &&
"Expected only half LBR entries for ip-only mode");
const LBREntry &LBR = *(Sample->LBRStack.begin());
uint64_t SourceAddress = LBR.Source;
uint64_t TargetAddress = LBR.Target;
if (SourceAddress == TargetAddress &&
Binary->addressIsCode(TargetAddress)) {
Counter.recordRangeCount(SourceAddress, TargetAddress, Repeat);
}
return;
}

for (const LBREntry &LBR : Sample->LBRStack) {
uint64_t SourceAddress = LBR.Source;
uint64_t TargetAddress = LBR.Target;
@@ -1062,6 +1132,18 @@ bool PerfScriptReader::isLBRSample(StringRef Line) {
Line.trim().split(Records, " ", 2, false);
if (Records.size() < 2)
return false;
// Check if there is an event name before the leading IP.
Review comment (Member):

Update the header comment for this function to include a representation of an LBR sample with an event name.

// If there is, it will be in Records[0]. To skip it, we'll re-split on
// Records[1], which should contain the rest of the line.
if (Records[0].contains(":")) {
// If so, consume the event name and continue processing the rest of the
// line.
StringRef IPAndLBR = Records[1].ltrim();
Records.clear();
IPAndLBR.split(Records, " ", 2, false);
if (Records.size() < 2)
return false;
}
if (Records[1].starts_with("0x") && Records[1].contains('/'))
return true;
return false;
@@ -1152,6 +1234,18 @@ void PerfScriptReader::warnInvalidRange() {
const PerfSample *Sample = Item.first.getPtr();
uint64_t Count = Item.second;
uint64_t EndAddress = 0;

if (LeadingIPOnly) {
assert(Sample->LBRStack.size() == 1 &&
"Expected only half LBR entries for ip-only mode");
const LBREntry &LBR = *(Sample->LBRStack.begin());
if (LBR.Source == LBR.Target && LBR.Source != ExternalAddr) {
// This is a leading-addr-only profile.
Ranges[{LBR.Source, LBR.Source}] += Count;
}
continue;
}

for (const LBREntry &LBR : Sample->LBRStack) {
uint64_t SourceAddress = LBR.Source;
uint64_t StartAddress = LBR.Target;
@@ -1199,11 +1293,15 @@ void PerfScriptReader::warnInvalidRange() {
!Binary->addressIsCode(EndAddress))
continue;

if (!Binary->addressIsCode(StartAddress) ||
!Binary->addressIsTransfer(EndAddress)) {
InstNotBoundary += I.second;
WarnInvalidRange(StartAddress, EndAddress, EndNotBoundaryMsg);
}
// IP samples can indicate activity on individual instructions rather than
// basic blocks/edges. In this mode, don't warn if sampled IPs aren't
// branches.
if (!LeadingIPOnly)
if (!Binary->addressIsCode(StartAddress) ||
!Binary->addressIsTransfer(EndAddress)) {
InstNotBoundary += I.second;
WarnInvalidRange(StartAddress, EndAddress, EndNotBoundaryMsg);
}

auto *FRange = Binary->findFuncRange(StartAddress);
if (!FRange) {
31 changes: 20 additions & 11 deletions llvm/tools/llvm-profgen/ProfileGenerator.cpp
@@ -104,6 +104,8 @@ cl::opt<bool> InferMissingFrames(
"Infer missing call frames due to compiler tail call elimination."),
llvm::cl::Optional);

extern cl::opt<bool> LeadingIPOnly;

using namespace llvm;
using namespace sampleprof;

@@ -388,18 +390,25 @@ void ProfileGeneratorBase::updateBodySamplesforFunctionProfile(
// Use the maximum count of samples with same line location
uint32_t Discriminator = getBaseDiscriminator(LeafLoc.Location.Discriminator);

// Use duplication factor to compensated for loop unroll/vectorization.
// Note that this is only needed when we're taking MAX of the counts at
// the location instead of SUM.
Count *= getDuplicationFactor(LeafLoc.Location.Discriminator);

ErrorOr<uint64_t> R =
FunctionProfile.findSamplesAt(LeafLoc.Location.LineOffset, Discriminator);

uint64_t PreviousCount = R ? R.get() : 0;
if (PreviousCount <= Count) {
if (LeadingIPOnly) {
// When computing an IP-based profile we take the SUM of counts at the
// location instead of applying duplication factors and taking the MAX.
Review comment (Member) on lines +394 to +395:

Regardless of whether IP sampling or LBR sampling is used, the sample profile loader always takes the max for profile annotation, so this is not correct in the general sense. I'm guessing what you meant is that when consuming a mispredict profile, the sum is used at profile use time. In that case, this needs to be narrowed to only mispredict or certain profile types, not IP profiles in general.
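
The two aggregation strategies under discussion can be compared on a toy model. The sketch below is only an illustration of the arithmetic, using hypothetical names (`AddrCount`, `aggregateMax`, `aggregateSum`) rather than the real `FunctionSamples` API: one strategy scales each count by its duplication factor and keeps the maximum, the other adds raw counts and ignores duplication factors. Which one is appropriate depends on how the profile is consumed, as the comment above points out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One instruction-level count attributed to a single source location.
// Hypothetical type for illustration only.
struct AddrCount {
  uint64_t Count;
  uint32_t DuplicationFactor; // 1 when the discriminator encodes none
};

// Scale each count by its duplication factor and keep the maximum.
inline uint64_t aggregateMax(const std::vector<AddrCount> &Samples) {
  uint64_t Result = 0;
  for (const AddrCount &S : Samples) {
    assert(S.DuplicationFactor >= 1 && "a factor of 1 means no duplication");
    Result = std::max(Result, S.Count * S.DuplicationFactor);
  }
  return Result;
}

// Add the raw counts and ignore duplication factors.
inline uint64_t aggregateSum(const std::vector<AddrCount> &Samples) {
  uint64_t Result = 0;
  for (const AddrCount &S : Samples)
    Result += S.Count;
  return Result;
}
```

For the case described in iponly-nodupfactor.test (one sample at a location with duplication factor 2 and one at the same location without), the sum is 1+1=2. For the case in iponly.test (two distinct IPs, one sample each, same debug location), the sum is 2 while the max is 1.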

FunctionProfile.addBodySamples(LeafLoc.Location.LineOffset, Discriminator,
Count - PreviousCount);
Count);
} else {
// Otherwise, use duplication factor to compensate for loop
// unroll/vectorization. Note that this is only needed when we're taking
// MAX of the counts at the location instead of SUM.
Count *= getDuplicationFactor(LeafLoc.Location.Discriminator);

ErrorOr<uint64_t> R = FunctionProfile.findSamplesAt(
LeafLoc.Location.LineOffset, Discriminator);

uint64_t PreviousCount = R ? R.get() : 0;
if (PreviousCount <= Count) {
FunctionProfile.addBodySamples(LeafLoc.Location.LineOffset, Discriminator,
Count - PreviousCount);
}
}
}
