Skip to content

[llvm-profgen] Support creating profiles of arbitrary events #99026

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 21, 2024

Conversation

tcreech-intel
Copy link
Contributor

@tcreech-intel tcreech-intel commented Jul 16, 2024

This change introduces two options which may be used to create profiles of arbitrary PMU events.

  1. --leading-ip-only provides a simple sample-IP-based profile mode. This is not useful for building a profile of execution frequency, but it is useful for building new types of profiles.

    For example, to build a profile of unpredictable branches:

    perf record -b -e branch-misses:upp -o perf.data ... llvm-profgen --perfdata perf.data --leading-ip-only ...
    
  2. --perf-event=event enables the creation of a profile concerned with a specific event or set of events. The names given should match the "event" field as emitted by perf-script(1).

    This option has two spellings: --perf-event and --perf-events. The plural spelling accepts a comma-separated list. The singular spelling appends a single event name to the set of events which will be used. This is meant to accommodate event names containing commas.

Combined, these options allow generating multiple kinds of profiles from a single perf record collection. For example, to generate both execution frequency and branch mispredict profiles:

perf record -c 1000003 -b -e br_inst_retired.near_taken:upp,br_misp_retired.all_branches:upp ...
llvm-profgen --output execution.prof --perf-event=br_inst_retired.near_taken:upp ...
llvm-profgen --leading-ip-only --output unpredictable.prof --perf-event=br_misp_retired.all_branches:upp ...

These additions are in support of more general HWPGO1, allowing feedback from a wider range of hardware events.

Footnotes

  1. https://llvm.org/devmtg/2024-04/slides/TechnicalTalks/Xiao-EnablingHW-BasedPGO.pdf

This change introduces two options which may be used to create profiles
of arbitrary PMU events.

1. `--leading-ip-only` provides a simple sample-IP-based profile mode.
   This is not useful for building a profile of execution frequency, but
   it is useful for building new types of profiles.

   For example, to build a profile of unpredictable branches:

   ```sh
   perf record -b -e branch-misses:upp -o perf.data ...
   llvm-profgen --perfdata perf.data --leading-ip-only ...
   ```

2. `--perf-event=event` enables the creation of a profile concerned with
   a specific event or set of events. The names given should match the
   "event" field as emitted by perf-script(1).

   This option has two spellings: `--perf-event` and `--perf-events`.
   The plural spelling accepts a comma-separated list. The singular
   spelling appends a single event name to the set of events which will
   be used. This is meant to accommodate event names containing commas.

Combined, these options allow generating multiple kinds of profiles
from a single `perf record` collection. For example, to generate both
execution frequency and branch mispredict profiles:

   ```sh
   perf record -c 1000003 -b -e br_inst_retired.near_taken:upp,br_misp_retired.all_branches:upp ...
   llvm-profgen --output hotness.prof --perf-event=br_inst_retired.near_taken:upp ...
   llvm-profgen --leading-ip-only --output unpredictable.prof --perf-event=br_misp_retired.all_branches:upp ...
   ```
Copy link

github-actions bot commented Jul 16, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@tcreech-intel tcreech-intel marked this pull request as ready for review July 16, 2024 13:05
@llvmbot llvmbot added the PGO Profile Guided Optimizations label Jul 16, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 16, 2024

@llvm/pr-subscribers-pgo

Author: Tim Creech (tcreech-intel)

Changes

This change introduces two options which may be used to create profiles of arbitrary PMU events.

  1. --leading-ip-only provides a simple sample-IP-based profile mode. This is not useful for building a profile of execution frequency, but it is useful for building new types of profiles.

    For example, to build a profile of unpredictable branches:

    perf record -b -e branch-misses:upp -o perf.data ... llvm-profgen --perfdata perf.data --leading-ip-only ...
    
  2. --perf-event=event enables the creation of a profile concerned with a specific event or set of events. The names given should match the "event" field as emitted by perf-script(1).

    This option has two spellings: --perf-event and --perf-events. The plural spelling accepts a comma-separated list. The singular spelling appends a single event name to the set of events which will be used. This is meant to accommodate event names containing commas.

Combined, these options allow generating multiple kinds of profiles from a single perf record collection. For example, to generate both execution frequency and branch mispredict profiles:

perf record -c 1000003 -b -e br_inst_retired.near_taken:upp,br_misp_retired.all_branches:upp ...
llvm-profgen --output execution.prof --perf-event=br_inst_retired.near_taken:upp ...
llvm-profgen --leading-ip-only --output unpredictable.prof --perf-event=br_misp_retired.all_branches:upp ...

Patch is 59.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/99026.diff

9 Files Affected:

  • (added) llvm/test/tools/llvm-profgen/Inputs/cmov_3.perfbin ()
  • (added) llvm/test/tools/llvm-profgen/Inputs/cmov_3.perfscript (+39)
  • (added) llvm/test/tools/llvm-profgen/Inputs/ip-duplication.perfscript (+2)
  • (added) llvm/test/tools/llvm-profgen/Inputs/noprobe-skid.perfscript (+5)
  • (added) llvm/test/tools/llvm-profgen/event-filtering.test (+78)
  • (added) llvm/test/tools/llvm-profgen/iponly-nodupfactor.test (+22)
  • (added) llvm/test/tools/llvm-profgen/iponly.test (+58)
  • (modified) llvm/tools/llvm-profgen/PerfReader.cpp (+107-9)
  • (modified) llvm/tools/llvm-profgen/ProfileGenerator.cpp (+20-11)
diff --git a/llvm/test/tools/llvm-profgen/Inputs/cmov_3.perfbin b/llvm/test/tools/llvm-profgen/Inputs/cmov_3.perfbin
new file mode 100755
index 0000000000000..7a1543041f805
Binary files /dev/null and b/llvm/test/tools/llvm-profgen/Inputs/cmov_3.perfbin differ
diff --git a/llvm/test/tools/llvm-profgen/Inputs/cmov_3.perfscript b/llvm/test/tools/llvm-profgen/Inputs/cmov_3.perfscript
new file mode 100644
index 0000000000000..3d29d444d56bb
--- /dev/null
+++ b/llvm/test/tools/llvm-profgen/Inputs/cmov_3.perfscript
@@ -0,0 +1,39 @@
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/21//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/3//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/28//-  0x401310/0x4012f0/P/-/-/21//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/24//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/21//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/29//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//- 
+br_misp_retired.all_branches:upp:            4012fa 0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/24//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/4//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/21//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/22//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/2//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/21//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/21//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/3//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/28//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//- 
+br_misp_retired.all_branches:upp:            4012fa 0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/28//-  0x401310/0x4012f0/P/-/-/24//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/21//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/24//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/24//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/26//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/6//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/21//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/14//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/21//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/2//- 
+br_misp_retired.all_branches:upp:            4012fa 0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/28//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/28//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/26//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/21//-  0x4012fa/0x4012ff/M/-/-/5//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/24//-  0x401310/0x4012f0/P/-/-/2//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/28//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/28//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/28//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/28//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//- 
+br_misp_retired.all_branches:upp:            4012fa 0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/2//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/2//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/21//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/27//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/2//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/29//-  0x401310/0x4012f0/P/-/-/24//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/19//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/21//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/4//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/28//-  0x401310/0x4012f0/P/-/-/3//-  0x401310/0x4012f0/P/-/-/26//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/3//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/4//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/26//-  0x401310/0x4012f0/P/-/-/25//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/2//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/27//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/1//- 
+  br_inst_retired.near_taken:upp:            401310 0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/5//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/3//-  0x401310/0x4012f0/P/-/-/22//-  0x4012fa/0x4012ff/M/-/-/1//-  0x401310/0x4012f0/P/-/-/29//-  0x401310/0x4012f0/P/-/-/23//-  0x4012fa/0x4012ff/M/-/-/2//-  0x401310/0x4012f0/P/-/-/1//-  0x4012fa/0x4012ff/P/-/-/1//-  0x401310/0x4012f0/P/-/-/26/...
[truncated]

@williamweixiao
Copy link
Contributor

Hi @tcreech-intel you can mention that this enhancement is for HWPGO (https://llvm.org/devmtg/2024-04/slides/TechnicalTalks/Xiao-EnablingHW-BasedPGO.pdf) to support a wide range of hardware events in the future.

Copy link
Contributor

@williamweixiao williamweixiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just some minor comments.

@tcreech-intel
Copy link
Contributor Author

Thanks, @williamweixiao! I have fixed the issues you found.

@williamweixiao williamweixiao merged commit 0caf0c9 into llvm:main Jul 21, 2024
7 checks passed
@WenleiHe
Copy link
Member

@williamweixiao @tcreech-intel for this kind of changes, can you give other reviews and code owners some time to respond before merging? I think a week is what's recommended for non-trivial changes.

Also as I mentioned, for HWPGO, could you send an RFC to outline: 1) high level design (also including how this interacts with other PGO, and usage instructions), 2) performance results, 3) future plan?

HWPGO is a good exploration, but when it comes to upstreaming as part of LLVM, generally the community would like to see results (performance) first.

@@ -41,6 +41,17 @@ static cl::opt<bool>
"and produce context-insensitive profile."));
cl::opt<bool> ShowDetailedWarning("show-detailed-warning",
cl::desc("Show detailed warning message."));
cl::opt<bool>
LeadingIPOnly("leading-ip-only",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is a bit confusing. I think what you meant is to ignore LBRs and only consuming leading IPs?

In that case, we should name it something like ignore-lbr-samples, to be consistent with existing ignore-stack-samples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, does use of profiled-event require leading-ip-only? If leading-ip-only is not specified, what is the expected behavior?

@@ -575,14 +591,54 @@ bool PerfScriptReader::extractLBRStack(TraceStream &TraceIt,

// Skip the leading instruction pointer.
size_t Index = 0;

StringRef EventName;
// Skip a perf event name. This may or may not exist.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This may or may not exist." <-- I can't parse this.

Can you add comment with expected input format?

Comment on lines +602 to +616
WithColor::warning() << "No --perf-event filter was specified, but an "
"\"event\" field was found in line "
<< TraceIt.getLineNumber() << ": "
<< TraceIt.getCurrentLine() << "\n";
} else if (std::find(PerfEventFilter.begin(), PerfEventFilter.end(),
EventName) == PerfEventFilter.end()) {
TraceIt.advance();
return false;
}

} else if (!PerfEventFilter.empty()) {
WithColor::warning() << "A --perf-event filter was specified, but no "
"\"event\" field found in line "
<< TraceIt.getLineNumber() << ": "
<< TraceIt.getCurrentLine() << "\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are emitting warning on each sample, and this is going to make the warnings super noisy.

@tcreech-intel
Copy link
Contributor Author

Hi @WenleiHe; my apologies. We will certainly allow more time for feedback moving forward.

Thanks for your guidance and comments here. I will prepare another PR to address your review comments.

We plan to prepare an RFC in the near future as requested. In the very short term our intent was to provide the branch mispredict parts of HWPGO with LLVM19 as a follow-up to our EuroLLVM presentation on HWPGO. However, that is not to say that we mean to rush the review process.

The current usage is very much opt-in and relatively ad-hoc: one needs to run llvm-profgen a second time to produce a mispredict profile, then name this secondary profile when compiling to annotate with !mispredict metadata. (Similar to https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86InsertPrefetch.cpp .) The RFC will seek to improve this usage model so that it could potentially scale to many profile types.

@WenleiHe
Copy link
Member

@tcreech-intel @williamweixiao I was there at the EuroLLVM talk. While I appreciate the work you guys are doing, I think what's missing is practical result beyond hand-crafted microbenchmark. It's one thing to explore ideas with prototype leveraging LLVM project, and then a different discussion when it comes to making prototypes into LLVM project itself, which means the entire community will be committed to carrying and maintaining such features/optimizations going forward.

For that to happen, generally we need 1) practical benefit demonstrated beyond novelty, beyond toy-program showcasing the idea, 2) we need to have a coherent design including its interaction with other existing features. We we have a ton of prototypes that we didn't upstream too, including PEBS value profiling work.

We plan to prepare an RFC in the near future as requested. In the very short term our intent was to provide the branch mispredict parts of HWPGO with LLVM19 as a follow-up to our EuroLLVM presentation on HWPGO. However, that is not to say that we mean to rush the review process.

It's pointless to send RFC after the fact. The purpose of RFC is to seek feedback, build consensus on worthiness of feature/optimization, and also on design/implementation, all before anything is committed upstream. This is a convention of how everyone works in LLVM community.

I'd like to suggest pausing committing any of these changes related to HWPGO before RFC is out and consensus is built.

StringRef EventName;
// Skip a perf event name. This may or may not exist.
if (Records.size() > Index && Records[Index].ends_with(":")) {
EventName = Records[Index].ltrim().rtrim(':');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the format in cmov_3.perfscript, event name should be extracted from extractCallstack, rather than extractLBRStack?

@@ -1062,6 +1132,18 @@ bool PerfScriptReader::isLBRSample(StringRef Line) {
Line.trim().split(Records, " ", 2, false);
if (Records.size() < 2)
return false;
// Check if there is an event name before the leading IP.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the header comment for this function to include a representation of LBR sample with event name

bool SampleIPIsInternal = Binary->addressIsCode(SampleIP);
if (SampleIPIsInternal) {
// Form a half LBR entry where the sample IP is the destination.
LBRStack.emplace_back(LBREntry(SampleIP, SampleIP));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really fit in LBRStack, instead, it fits CallStack better. All of the special case from warnInvalidRange can be avoided if we handle these events as part of extractCallstack.

Comment on lines +394 to +395
// When computing an IP-based profile we take the SUM of counts at the
// location instead of applying duplication factors and taking the MAX.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of using IP sampling or LBR sampling, sample profile loader always take the max for profile annotation, so this is not correct in the general sense. I'm guessing what you meant is when consuming mispredict profile, sum is used at profile use time. In that case, this needs to be narrowed to only mispredict or certain profile type, not general IP profile.

Comment on lines +49 to +50
"perf-event",
cl::desc("Ignore samples not matching the given event names"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: name it profiled-event with description Perf event to generate profile for, e.g. br_misp_retired.all_branches. Program execution count profile is generated when unspecified.

@@ -41,6 +41,17 @@ static cl::opt<bool>
"and produce context-insensitive profile."));
cl::opt<bool> ShowDetailedWarning("show-detailed-warning",
cl::desc("Show detailed warning message."));
cl::opt<bool>
LeadingIPOnly("leading-ip-only",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, does use of profiled-event require leading-ip-only? If leading-ip-only is not specified, what is the expected behavior?

@WenleiHe
Copy link
Member

I will prepare another PR to address your review comments.

Thanks. I went over the patch and I think we'd have a cleaner and more compatible implementation if this is all handled as part of callstack sample rather than lbr sample. If we do that, it might actually be easier to unland this first and review/reland a reworked version.

tcreech-intel added a commit to tcreech-intel/llvm-project that referenced this pull request Jul 23, 2024
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
This change introduces two options which may be used to create profiles
of arbitrary PMU events.

1. `--leading-ip-only` provides a simple sample-IP-based profile mode.
This is not useful for building a profile of execution frequency, but it
is useful for building new types of profiles.

   For example, to build a profile of unpredictable branches:

perf record -b -e branch-misses:upp -o perf.data ... llvm-profgen
--perfdata perf.data --leading-ip-only ...

2. `--perf-event=event` enables the creation of a profile concerned with
a specific event or set of events. The names given should match the
"event" field as emitted by perf-script(1).

This option has two spellings: `--perf-event` and `--perf-events`. The
plural spelling accepts a comma-separated list. The singular spelling
appends a single event name to the set of events which will be used.
This is meant to accommodate event names containing commas.

Combined, these options allow generating multiple kinds of profiles from
a single `perf record` collection. For example, to generate both
execution frequency and branch mispredict profiles:

perf record -c 1000003 -b -e
br_inst_retired.near_taken:upp,br_misp_retired.all_branches:upp ...
llvm-profgen --output execution.prof
--perf-event=br_inst_retired.near_taken:upp ...
llvm-profgen --leading-ip-only --output unpredictable.prof
--perf-event=br_misp_retired.all_branches:upp ...

These additions are in support of more general HWPGO[^1], allowing
feedback from a wider range of hardware events.

[^1]:
https://llvm.org/devmtg/2024-04/slides/TechnicalTalks/Xiao-EnablingHW-BasedPGO.pdf

---------

Co-authored-by: Tim Creech <[email protected]>

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251204
williamweixiao pushed a commit that referenced this pull request Aug 2, 2024
Revert #99826 and #99026 to allow for additional input.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants