[llvm-profgen] Support creating profiles of arbitrary events #99026
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
4006b7 0x4006b7/0x40068b/P/-/-/1 0x4006c8/0x4006b0/P/-/-/1 0x400689/0x4006b9/P/-/-/1 0x40066d/0x400686/P/-/-/2 0x4007a6/0x400650/P/-/-/9 0x4007ca/0x400790/P/-/-/8 0x4007d7/0x4007bd/P/-/-/1 0x400792/0x4007d7/P/-/-/1 0x4007b8/0x400790/P/-/-/2 0x4006a2/0x4007a8/P/-/-/3 | ||
40065d 40065d/0x40068f/M/-/-/1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
// Invalid perf line | ||
40062f 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/6 0x40062f/0x4005b0/P/-/-/16 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/6 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005c8/0x4005dc/P/-/-/8 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/10 0x40062f/0x4005b0/P/-/-/14 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/7 0x40062f/0x4005b0/P/-/-/8 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005c8/0x4005dc/P/-/-/7 0x40062f/0x4005b0/P/-/-/15 0x400645/0x4005ff/P/-/-/1 | ||
4005d7 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/7 0x40062f/0x4005b0/P/-/-/11 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/5 0x40062f/0x4005b0/P/-/-/11 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/7 0x40062f/0x4005b0/P/-/-/10 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/13 0x40062f/0x4005b0/P/-/-/9 | ||
4005c8 0x4005c8/0x4005dc/P/-/-/11 0x40062f/0x4005b0/P/-/-/8 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/5 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/12 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/7 0x40062f/0x4005b0/P/-/-/10 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/12 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/8 0x40062f/0x4005b0/P/-/-/8 | ||
4005c5 0x4005c8/0x4005dc/P/-/-/11 0x40062f/0x4005b0/P/-/-/8 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/5 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/12 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/7 0x40062f/0x4005b0/P/-/-/10 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/1 0x4005d7/0x4005e5/P/-/-/12 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 0x4005e9/0x400634/P/-/-/2 0x4005c8/0x4005dc/P/-/-/8 0x40062f/0x4005b0/P/-/-/8 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
// RUN: llvm-profgen --format=text --perfscript=%S/Inputs/cmov_3.perfscript --binary=%S/Inputs/cmov_3.perfbin --output=%t --skip-symbolization --perf-event=br_inst_retired.near_taken:upp | ||
// RUN: FileCheck %s --input-file %t --check-prefix=CHECK-RAW-PROFILE | ||
// RUN: llvm-profgen --format=text --perfscript=%S/Inputs/cmov_3.perfscript --binary=%S/Inputs/cmov_3.perfbin --output=%t --perf-event=br_inst_retired.near_taken:upp | ||
// RUN: FileCheck %s --input-file %t --check-prefix=CHECK | ||
|
||
// RUN: llvm-profgen --format=text --perfscript=%S/Inputs/cmov_3.perfscript --binary=%S/Inputs/cmov_3.perfbin --output=%t --skip-symbolization --perf-event=br_misp_retired.all_branches:upp --leading-ip-only | ||
// RUN: FileCheck %s --input-file %t --check-prefix=UNPRED-RAW-PROFILE | ||
// RUN: llvm-profgen --format=text --perfscript=%S/Inputs/cmov_3.perfscript --binary=%S/Inputs/cmov_3.perfbin --output=%t --perf-event=br_misp_retired.all_branches:upp --leading-ip-only | ||
// RUN: FileCheck %s --input-file %t --check-prefix=UNPRED | ||
|
||
// Check that we can use perf event filtering to generate multiple types of | ||
// source-level profiles from a single perf profile. In this case, we generate | ||
// a typical execution frequency profile using br_inst_retired.near_taken LBRs, | ||
// and a branch mispredict profile using br_misp_retired.all_branches sample | ||
// IPs. | ||
|
||
// The source example below is based on perfKernelCpp/cmov_3, except a | ||
// misleading builtin is used to persuade the compiler not to use cmov, which | ||
// induces branch mispredicts. | ||
|
||
// CHECK: sel_arr:20229:0 | ||
// CHECK: 3.1: 627 | ||
// CHECK: 3.2: 627 | ||
// CHECK: 4: 615 | ||
// CHECK: 5: 627 | ||
|
||
// UNPRED: sel_arr:18:0 | ||
// UNPRED: 3.1: 0 | ||
// UNPRED: 3.2: 0 | ||
// UNPRED: 4: 9 | ||
// UNPRED: 5: 0 | ||
|
||
// CHECK-RAW-PROFILE: 3 | ||
// CHECK-RAW-PROFILE-NEXT: 2f0-2fa:303 | ||
// CHECK-RAW-PROFILE-NEXT: 2f0-310:312 | ||
// CHECK-RAW-PROFILE-NEXT: 2ff-310:315 | ||
|
||
// UNPRED-RAW-PROFILE: 1 | ||
// UNPRED-RAW-PROFILE-NEXT: 2fa-2fa:9 | ||
|
||
// original code: | ||
// clang -O2 -gline-tables-only -fdebug-info-for-profiling lit.c | ||
#include <stdlib.h> | ||
|
||
#define N 20000 | ||
#define ITERS 10000 | ||
|
||
static int *m_s1, *m_s2, *m_s3, *m_dst; | ||
|
||
void init(void) { | ||
m_s1 = malloc(sizeof(int)*N); | ||
m_s2 = malloc(sizeof(int)*N); | ||
m_s3 = malloc(sizeof(int)*N); | ||
m_dst = malloc(sizeof(int)*N); | ||
srand(42); | ||
|
||
for (int i = 0; i < N; i++) { | ||
m_s1[i] = rand() % N; | ||
m_s2[i] = 0; | ||
m_s3[i] = 1; | ||
} | ||
} | ||
|
||
void __attribute__((noinline)) sel_arr(int *dst, int *s1, int *s2, int *s3) { | ||
#pragma nounroll | ||
#pragma clang loop vectorize(disable) interleave(disable) | ||
for (int i = 0; i < N; i++) { | ||
int *p = __builtin_expect((s1[i] < 10035), 0) ? &s2[i] : &s3[i]; | ||
dst[i] = *p; | ||
} | ||
} | ||
|
||
int main(void) { | ||
init(); | ||
for(int i=0; i<ITERS; ++i) | ||
sel_arr(m_dst, m_s1, m_s2, m_s3); | ||
return 0; | ||
} |
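The event filtering exercised by the RUN lines above can be sketched in isolation. This is a minimal standalone sketch and not the llvm-profgen implementation: `leadingEventName` and `keepSample` are hypothetical names, and the assumed line layout (an optional first token ending in ':' ahead of the sample IP) is inferred from the `ends_with(":")` check in the reader change of this patch.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the event name if the first
// whitespace-separated token of a perf script line ends with ':'.
std::optional<std::string> leadingEventName(const std::string &Line) {
  std::istringstream In(Line);
  std::string First;
  if (!(In >> First) || First.size() < 2 || First.back() != ':')
    return std::nullopt;
  First.pop_back(); // drop the single trailing ':'
  return First;
}

// Keep a sample when no filter is set, when the line carries no event
// field (the patch only warns in that case), or when its event name is
// listed in the filter.
bool keepSample(const std::string &Line,
                const std::vector<std::string> &Filter) {
  if (Filter.empty())
    return true;
  std::optional<std::string> Event = leadingEventName(Line);
  if (!Event)
    return true;
  for (const std::string &Name : Filter)
    if (Name == *Event)
      return true;
  return false;
}
```

With a filter of `br_inst_retired.near_taken:upp`, a line tagged with `br_misp_retired.all_branches:upp:` would be dropped, which is how one perf profile can feed two different source-level profiles.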
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/ip-duplication.perfscript --binary=%S/Inputs/inline-noprobe2.perfbin --output=%t --use-offset=0 --leading-ip-only | ||
; RUN: FileCheck %s --input-file %t --check-prefix=CHECK | ||
|
||
; Test that we don't over-count samples for duplicated source code when | ||
; building an IP-based profile. | ||
|
||
; The inline-noprobe2.perfbin binary is used for this test because one of the | ||
; partition_pivot_last+3.1 debug locations has a duplication factor of 2 | ||
; encoded into its discriminator. In IP-sample mode, a hit in one instruction | ||
; in the duplicated code does not imply a hit to the other duplicates. | ||
|
||
; The perfscript input includes 1 sample at a location with duplication factor | ||
; of 2, and another sample at the same source location but with no duplication | ||
; factor. These should be summed without duplication factors. Ensure we record | ||
; a count of 1+1=2 (and not 2+1=3) for the 3.1 location. | ||
|
||
;CHECK-LABEL: partition_pivot_last | ||
;CHECK-NEXT: 1: 0 | ||
;CHECK-NEXT: 2: 0 | ||
;CHECK-NEXT: 3: 0 | ||
;CHECK-NEXT: 3.1: 2 | ||
|
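The two aggregation rules contrasted in this test can be modelled with a small standalone sketch. `LocSample`, `aggregateMax`, and `aggregateSum` are hypothetical names introduced only for illustration; the real logic lives in `updateBodySamplesforFunctionProfile` and works on `FunctionProfile` rather than on plain vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One address-level count that maps to a single source location.
struct LocSample {
  uint64_t Count;             // samples recorded at this instruction
  uint64_t DuplicationFactor; // factor decoded from the discriminator
};

// Execution-count style: scale by the duplication factor, then keep the
// MAX over all instructions that share the location.
uint64_t aggregateMax(const std::vector<LocSample> &Samples) {
  uint64_t Result = 0;
  for (const LocSample &S : Samples)
    Result = std::max(Result, S.Count * S.DuplicationFactor);
  return Result;
}

// IP-only style: ignore duplication factors and take the SUM.
uint64_t aggregateSum(const std::vector<LocSample> &Samples) {
  uint64_t Result = 0;
  for (const LocSample &S : Samples)
    Result += S.Count;
  return Result;
}
```

For the input described above (one sample at a location with duplication factor 2 and one sample with no duplication factor), `aggregateSum` yields 1+1=2, the value the CHECK lines expect for 3.1.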
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/noprobe-skid.perfscript --binary=%S/Inputs/noprobe.perfbin --output=%t --skip-symbolization --leading-ip-only | ||
; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-RAW-PROFILE | ||
; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/noprobe-skid.perfscript --binary=%S/Inputs/noprobe.perfbin --output=%t --leading-ip-only | ||
; RUN: FileCheck %s --input-file %t --check-prefix=CHECK | ||
|
||
; Here we check the ability to ignore LBRs, which is useful for generating | ||
; profiles where only the precise PMU sample IP is of interest. In general the | ||
; IPs need not identify a branch. In this case there are exactly 4 samples, so | ||
; we see only these 4 locations as "hot" and none of the LBR history. | ||
; Compare with noinline-noprobe.test, which includes LBR history. | ||
|
||
; Note that there are two different IPs (5c5 and 5c8) contributing to line | ||
; offset 1 in bar. This tests that sample counts corresponding to the same | ||
; debug location are summed into that location in the profile rather than the | ||
; maximum being taken, as happens with basic block execution count profiles. | ||
|
||
;CHECK: bar:14:0 | ||
;CHECK: 0: 0 | ||
;CHECK: 1: 2 | ||
;CHECK: 2: 1 | ||
;CHECK: 4: 0 | ||
;CHECK: 5: 0 | ||
;CHECK: foo:5:0 | ||
;CHECK: 0: 0 | ||
;CHECK: 1: 0 | ||
;CHECK: 2: 0 | ||
;CHECK: 3: 1 | ||
;CHECK: 4: 0 | ||
;CHECK: 5: 0 | ||
|
||
CHECK-RAW-PROFILE: 4 | ||
CHECK-RAW-PROFILE-NEXT: 5c5-5c5:1 | ||
CHECK-RAW-PROFILE-NEXT: 5c8-5c8:1 | ||
CHECK-RAW-PROFILE-NEXT: 5d7-5d7:1 | ||
CHECK-RAW-PROFILE-NEXT: 62f-62f:1 | ||
|
||
; original code: | ||
; clang -O3 -g -fdebug-info-for-profiling test.c -fno-inline -o a.out | ||
#include <stdio.h> | ||
|
||
int bar(int x, int y) { | ||
if (x % 3) { | ||
return x - y; | ||
} | ||
return x + y; | ||
} | ||
|
||
void foo() { | ||
int s, i = 0; | ||
while (i++ < 4000 * 4000) | ||
if (i % 91) s = bar(i, s); else s += 30; | ||
printf("sum is %d\n", s); | ||
} | ||
|
||
int main() { | ||
foo(); | ||
return 0; | ||
} |
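The raw profile checked above lists each sampled IP as a degenerate range such as 5c5-5c5:1. A minimal sketch of that bookkeeping follows, assuming the IPs have already been converted to binary offsets; `countLeadingIPs` is a hypothetical stand-in for the range counter and not a function from llvm-profgen.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Range = std::pair<uint64_t, uint64_t>;

// Record every sample IP as the range [IP, IP]. Repeated IPs accumulate,
// so the count for a range is the number of samples that hit that address.
std::map<Range, uint64_t> countLeadingIPs(const std::vector<uint64_t> &IPs) {
  std::map<Range, uint64_t> Counts;
  for (uint64_t IP : IPs)
    Counts[{IP, IP}] += 1;
  return Counts;
}
```

Feeding the four offsets from this test (5c5, 5c8, 5d7, 62f) produces four ranges with a count of 1 each, and none of the LBR history contributes.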
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,6 +41,17 @@ static cl::opt<bool> | |
"and produce context-insensitive profile.")); | ||
cl::opt<bool> ShowDetailedWarning("show-detailed-warning", | ||
cl::desc("Show detailed warning message.")); | ||
cl::opt<bool> | ||
LeadingIPOnly("leading-ip-only", | ||
cl::desc("Form a profile based only on sample IPs")); | ||
|
||
static cl::list<std::string> PerfEventFilter( | ||
"perf-event", | ||
cl::desc("Ignore samples not matching the given event names")); | ||
Comment on lines +49 to +50: nit: name it
||
static cl::alias | ||
PerfEventFilterPlural("perf-events", cl::CommaSeparated, | ||
cl::desc("Comma-delimited version of -perf-event"), | ||
cl::aliasopt(PerfEventFilter)); | ||
|
||
extern cl::opt<std::string> PerfTraceFilename; | ||
extern cl::opt<bool> ShowDisassemblyOnly; | ||
|
@@ -404,13 +415,18 @@ PerfScriptReader::convertPerfDataToTrace(ProfiledBinary *Binary, bool SkipPID, | |
} | ||
} | ||
|
||
// If filtering by events was requested, additionally request the "event" | ||
// field. | ||
const std::string FieldList = | ||
PerfEventFilter.empty() ? "ip,brstack" : "event,ip,brstack"; | ||
|
||
// Run perf script again to retrieve events for PIDs collected above | ||
SmallVector<StringRef, 8> ScriptSampleArgs; | ||
ScriptSampleArgs.push_back(PerfPath); | ||
ScriptSampleArgs.push_back("script"); | ||
ScriptSampleArgs.push_back("--show-mmap-events"); | ||
ScriptSampleArgs.push_back("-F"); | ||
- ScriptSampleArgs.push_back("ip,brstack"); | ||
+ ScriptSampleArgs.push_back(FieldList); | ||
ScriptSampleArgs.push_back("-i"); | ||
ScriptSampleArgs.push_back(PerfData); | ||
if (!PIDs.empty()) { | ||
|
@@ -575,14 +591,54 @@ bool PerfScriptReader::extractLBRStack(TraceStream &TraceIt, | |
|
||
// Skip the leading instruction pointer. | ||
size_t Index = 0; | ||
|
||
StringRef EventName; | ||
// Skip a perf event name. This may or may not exist. | ||
Review comment: "This may or may not exist." <-- I can't parse this. Can you add a comment with the expected input format?
||
if (Records.size() > Index && Records[Index].ends_with(":")) { | ||
EventName = Records[Index].ltrim().rtrim(':'); | ||
Review comment: Based on the format in
||
Index++; | ||
|
||
if (PerfEventFilter.empty()) { | ||
WithColor::warning() << "No --perf-event filter was specified, but an " | ||
"\"event\" field was found in line " | ||
<< TraceIt.getLineNumber() << ": " | ||
<< TraceIt.getCurrentLine() << "\n"; | ||
} else if (std::find(PerfEventFilter.begin(), PerfEventFilter.end(), | ||
EventName) == PerfEventFilter.end()) { | ||
TraceIt.advance(); | ||
return false; | ||
} | ||
|
||
} else if (!PerfEventFilter.empty()) { | ||
WithColor::warning() << "A --perf-event filter was specified, but no " | ||
"\"event\" field found in line " | ||
<< TraceIt.getLineNumber() << ": " | ||
<< TraceIt.getCurrentLine() << "\n"; | ||
Comment on lines +602 to +616: Here we are emitting a warning on each sample, and this is going to make the warnings super noisy.
||
} | ||
|
||
uint64_t LeadingAddr; | ||
- if (!Records.empty() && !Records[0].contains('/')) { | ||
- if (Records[0].getAsInteger(16, LeadingAddr)) { | ||
+ if (Records.size() > Index && !Records[Index].contains('/')) { | ||
+ if (Records[Index].getAsInteger(16, LeadingAddr)) { | ||
WarnInvalidLBR(TraceIt); | ||
TraceIt.advance(); | ||
return false; | ||
} | ||
- Index = 1; | ||
+ Index++; | ||
} | ||
|
||
// We assume that if we saw an event name we also saw a leading addr. | ||
// In other words, LeadingAddr is set if Index is 1 or 2. | ||
if (LeadingIPOnly && Index > 0) { | ||
// Form a profile only from the sample IP. Do not assume an LBR stack | ||
// follows, and ignore it if it does. | ||
uint64_t SampleIP = Binary->canonicalizeVirtualAddress(LeadingAddr); | ||
bool SampleIPIsInternal = Binary->addressIsCode(SampleIP); | ||
if (SampleIPIsInternal) { | ||
// Form a half LBR entry where the sample IP is the destination. | ||
LBRStack.emplace_back(LBREntry(SampleIP, SampleIP)); | ||
Review comment: This doesn't really fit in LBRStack; it fits CallStack better. All of the special case from
||
} | ||
TraceIt.advance(); | ||
return !LBRStack.empty(); | ||
} | ||
|
||
// Now extract LBR samples - note that we do not reverse the | ||
|
@@ -902,6 +958,20 @@ void PerfScriptReader::computeCounterFromLBR(const PerfSample *Sample, | |
uint64_t Repeat) { | ||
SampleCounter &Counter = SampleCounters.begin()->second; | ||
uint64_t EndAddress = 0; | ||
|
||
if (LeadingIPOnly) { | ||
assert(Sample->LBRStack.size() == 1 && | ||
"Expected only half LBR entries for ip-only mode"); | ||
const LBREntry &LBR = *(Sample->LBRStack.begin()); | ||
uint64_t SourceAddress = LBR.Source; | ||
uint64_t TargetAddress = LBR.Target; | ||
if (SourceAddress == TargetAddress && | ||
Binary->addressIsCode(TargetAddress)) { | ||
Counter.recordRangeCount(SourceAddress, TargetAddress, Repeat); | ||
} | ||
return; | ||
} | ||
|
||
for (const LBREntry &LBR : Sample->LBRStack) { | ||
uint64_t SourceAddress = LBR.Source; | ||
uint64_t TargetAddress = LBR.Target; | ||
|
@@ -1062,6 +1132,18 @@ bool PerfScriptReader::isLBRSample(StringRef Line) { | |
Line.trim().split(Records, " ", 2, false); | ||
if (Records.size() < 2) | ||
return false; | ||
// Check if there is an event name before the leading IP. | ||
Review comment: Update the header comment for this function to include a representation of an LBR sample with an event name.
||
// If there is, it will be in Records[0]. To skip it, we'll re-split on | ||
// Records[1], which should contain the rest of the line. | ||
if (Records[0].contains(":")) { | ||
// If so, consume the event name and continue processing the rest of the | ||
// line. | ||
StringRef IPAndLBR = Records[1].ltrim(); | ||
Records.clear(); | ||
IPAndLBR.split(Records, " ", 2, false); | ||
if (Records.size() < 2) | ||
return false; | ||
} | ||
if (Records[1].starts_with("0x") && Records[1].contains('/')) | ||
return true; | ||
return false; | ||
|
@@ -1152,6 +1234,18 @@ void PerfScriptReader::warnInvalidRange() { | |
const PerfSample *Sample = Item.first.getPtr(); | ||
uint64_t Count = Item.second; | ||
uint64_t EndAddress = 0; | ||
|
||
if (LeadingIPOnly) { | ||
assert(Sample->LBRStack.size() == 1 && | ||
"Expected only half LBR entries for ip-only mode"); | ||
const LBREntry &LBR = *(Sample->LBRStack.begin()); | ||
if (LBR.Source == LBR.Target && LBR.Source != ExternalAddr) { | ||
// This is a leading-addr-only profile. | ||
Ranges[{LBR.Source, LBR.Source}] += Count; | ||
} | ||
continue; | ||
} | ||
|
||
for (const LBREntry &LBR : Sample->LBRStack) { | ||
uint64_t SourceAddress = LBR.Source; | ||
uint64_t StartAddress = LBR.Target; | ||
|
@@ -1199,11 +1293,15 @@ void PerfScriptReader::warnInvalidRange() { | |
!Binary->addressIsCode(EndAddress)) | ||
continue; | ||
|
||
- if (!Binary->addressIsCode(StartAddress) || | ||
- !Binary->addressIsTransfer(EndAddress)) { | ||
- InstNotBoundary += I.second; | ||
- WarnInvalidRange(StartAddress, EndAddress, EndNotBoundaryMsg); | ||
- } | ||
+ // IP samples can indicate activity on individual instructions rather than | ||
+ // basic blocks/edges. In this mode, don't warn if sampled IPs aren't | ||
+ // branches. | ||
+ if (!LeadingIPOnly) | ||
+ if (!Binary->addressIsCode(StartAddress) || | ||
+ !Binary->addressIsTransfer(EndAddress)) { | ||
+ InstNotBoundary += I.second; | ||
+ WarnInvalidRange(StartAddress, EndAddress, EndNotBoundaryMsg); | ||
+ } | ||
|
||
auto *FRange = Binary->findFuncRange(StartAddress); | ||
if (!FRange) { | ||
|
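The updated sample-detection check in `isLBRSample` can be restated as a standalone sketch. `tokens` and `looksLikeLBRSample` are hypothetical names and the code uses only the standard library rather than `StringRef`; it assumes, as the patch does, that a first token containing ':' is an event name to be skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a line into whitespace-separated tokens.
static std::vector<std::string> tokens(const std::string &Line) {
  std::istringstream In(Line);
  std::vector<std::string> Out;
  for (std::string Tok; In >> Tok;)
    Out.push_back(Tok);
  return Out;
}

// Skip an optional event-name token, then require a leading IP followed
// by a token that starts with "0x" and contains '/'.
bool looksLikeLBRSample(const std::string &Line) {
  std::vector<std::string> Toks = tokens(Line);
  std::size_t Index = 0;
  if (!Toks.empty() && Toks[0].find(':') != std::string::npos)
    ++Index; // event name, e.g. "br_inst_retired.near_taken:upp:"
  if (Toks.size() < Index + 2)
    return false;
  const std::string &FirstLBR = Toks[Index + 1];
  return FirstLBR.rfind("0x", 0) == 0 &&
         FirstLBR.find('/') != std::string::npos;
}
```

Under these assumptions a line whose first branch record lacks the 0x prefix, such as the second line of the invalid perfscript input earlier in this diff, is not treated as an LBR sample.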
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -104,6 +104,8 @@ cl::opt<bool> InferMissingFrames( | |
"Infer missing call frames due to compiler tail call elimination."), | ||
llvm::cl::Optional); | ||
|
||
extern cl::opt<bool> LeadingIPOnly; | ||
|
||
using namespace llvm; | ||
using namespace sampleprof; | ||
|
||
|
@@ -388,18 +390,25 @@ void ProfileGeneratorBase::updateBodySamplesforFunctionProfile( | |
// Use the maximum count of samples with same line location | ||
uint32_t Discriminator = getBaseDiscriminator(LeafLoc.Location.Discriminator); | ||
|
||
- // Use duplication factor to compensated for loop unroll/vectorization. | ||
- // Note that this is only needed when we're taking MAX of the counts at | ||
- // the location instead of SUM. | ||
- Count *= getDuplicationFactor(LeafLoc.Location.Discriminator); | ||
|
||
- ErrorOr<uint64_t> R = | ||
- FunctionProfile.findSamplesAt(LeafLoc.Location.LineOffset, Discriminator); | ||
|
||
- uint64_t PreviousCount = R ? R.get() : 0; | ||
- if (PreviousCount <= Count) { | ||
+ if (LeadingIPOnly) { | ||
+ // When computing an IP-based profile we take the SUM of counts at the | ||
+ // location instead of applying duplication factors and taking the MAX. | ||
Comment on lines +394 to +395: Regardless of whether IP sampling or LBR sampling is used, the sample profile loader always takes the max for profile annotation, so this is not correct in the general sense. I'm guessing what you meant is that, when consuming a mispredict profile, sum is used at profile use time. In that case, this needs to be narrowed to only mispredict or certain profile types, not general IP profiles.
||
FunctionProfile.addBodySamples(LeafLoc.Location.LineOffset, Discriminator, | ||
- Count - PreviousCount); | ||
+ Count); | ||
+ } else { | ||
+ // Otherwise, use duplication factor to compensate for loop | ||
+ // unroll/vectorization. Note that this is only needed when we're taking | ||
+ // MAX of the counts at the location instead of SUM. | ||
+ Count *= getDuplicationFactor(LeafLoc.Location.Discriminator); | ||
|
||
+ ErrorOr<uint64_t> R = FunctionProfile.findSamplesAt( | ||
+ LeafLoc.Location.LineOffset, Discriminator); | ||
|
||
+ uint64_t PreviousCount = R ? R.get() : 0; | ||
+ if (PreviousCount <= Count) { | ||
+ FunctionProfile.addBodySamples(LeafLoc.Location.LineOffset, Discriminator, | ||
+ Count - PreviousCount); | ||
+ } | ||
} | ||
} | ||
|
||
|
Review comment: This name is a bit confusing. I think what you meant is to ignore LBRs and only consume leading IPs? In that case, we should name it something like ignore-lbr-samples, to be consistent with the existing ignore-stack-samples.
Review comment: Additionally, does use of profiled-event require leading-ip-only? If leading-ip-only is not specified, what is the expected behavior?