Commit 66274eb

Improve documented sampling profiler steps to best known methods (#88438)

1. Add `-fdebug-info-for-profiling -funique-internal-linkage-names`, which improve the usefulness of debug info for profiling.
2. Recommend the use of `br_inst_retired.near_taken:uppp`, which provides the most precise results on supporting hardware. Mention `branches:u` as a more portable backup. Both should portray execution counts better than the default event (`cycles`) and have a better chance of working as an unprivileged user due to the `:u` modifier.

1 parent ec6c0a2 commit 66274eb
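
The event choice described in the commit message can be sketched as a small shell helper. This is only an illustration, not part of the commit: `pick_branch_event` is a hypothetical name, and the sketch assumes that the event listing it reads (for example the output of `perf list`) mentions `br_inst_retired.near_taken` when the hardware supports it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): reads an event listing on
# stdin and prints the event name to pass to `perf record -e`.
pick_branch_event() {
    # Assumption: the listing names the event when the CPU supports it.
    if grep -qi 'br_inst_retired\.near_taken'; then
        # Taken-branch event, user mode only (u), highest precision (ppp).
        echo 'br_inst_retired.near_taken:uppp'
    else
        # More portable fallback named in the commit message.
        echo 'branches:u'
    fi
}

# Possible usage (not run here, since it needs perf and a built ./code):
#   event=$(perf list 2>/dev/null | pick_branch_event)
#   perf record -b -e "$event" ./code
```

Taking the listing on stdin keeps the selection logic separate from the `perf` invocation, so it can be tried on a machine without `perf` installed.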

1 file changed: +55 -16 lines changed

clang/docs/UsersManual.rst

Lines changed: 55 additions & 16 deletions
@@ -2319,6 +2319,8 @@ are listed below.
    on ELF targets when using the integrated assembler. This flag currently
    only has an effect on ELF targets.

+.. _funique_internal_linkage_names:
+
 .. option:: -f[no]-unique-internal-linkage-names

    Controls whether Clang emits a unique (best-effort) symbol name for internal
@@ -2448,27 +2450,41 @@ usual build cycle when using sample profilers for optimization:
    usual build flags that you always build your application with. The only
    requirement is that DWARF debug info including source line information is
    generated. This DWARF information is important for the profiler to be able
-   to map instructions back to source line locations.
+   to map instructions back to source line locations. The usefulness of this
+   DWARF information can be improved with the ``-fdebug-info-for-profiling``
+   and ``-funique-internal-linkage-names`` options.

-   On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:
+   On Linux:

    .. code-block:: console

-     $ clang++ -O2 -gline-tables-only code.cc -o code
+     $ clang++ -O2 -gline-tables-only \
+       -fdebug-info-for-profiling -funique-internal-linkage-names \
+       code.cc -o code

    While MSVC-style targets default to CodeView debug information, DWARF debug
    information is required to generate source-level LLVM profiles. Use
    ``-gdwarf`` to include DWARF debug information:

-   .. code-block:: console
+   .. code-block:: winbatch
+
+     > clang-cl /O2 -gdwarf -gline-tables-only ^
+       /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
+       code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
+
+   .. note::

-     $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
+      :ref:`-funique-internal-linkage-names <funique_internal_linkage_names>`
+      generates unique names based on given command-line source file paths. If
+      your build system uses absolute source paths and these paths may change
+      between steps 1 and 4, then the uniqued function names may change and result
+      in unused profile data. Consider omitting this option in such cases.

 2. Run the executable under a sampling profiler. The specific profiler
    you use does not really matter, as long as its output can be converted
    into the format that the LLVM optimizer understands.

-   Two such profilers are the the Linux Perf profiler
+   Two such profilers are the Linux Perf profiler
    (https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
    available as part of `Intel VTune
    <https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html>`_.
@@ -2482,7 +2498,9 @@ usual build cycle when using sample profilers for optimization:

    .. code-block:: console

-     $ perf record -b ./code
+     $ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code
+
+   If the event above is unavailable, ``branches:u`` is probably next-best.

    Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
    Record (LBR) to record call chains. While this is not strictly required,
@@ -2532,21 +2550,42 @@ usual build cycle when using sample profilers for optimization:
    that executes faster than the original one. Note that you are not
    required to build the code with the exact same arguments that you
    used in the first step. The only requirement is that you build the code
-   with ``-gline-tables-only`` and ``-fprofile-sample-use``.
+   with the same debug info options and ``-fprofile-sample-use``.
+
+   On Linux:

    .. code-block:: console

-     $ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof code.cc -o code
+     $ clang++ -O2 -gline-tables-only \
+       -fdebug-info-for-profiling -funique-internal-linkage-names \
+       -fprofile-sample-use=code.prof code.cc -o code

-   [OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
-   edge counters. The profile inference algorithm (profi) can be used to infer
-   missing blocks and edge counts, and improve the quality of profile data.
-   Enable it with ``-fsample-profile-use-profi``.
+   On Windows:

-   .. code-block:: console
+   .. code-block:: winbatch
+
+     > clang-cl /O2 -gdwarf -gline-tables-only ^
+       /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
+       /fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
+
+   [OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
+   edge counters. The profile inference algorithm (profi) can be used to infer
+   missing blocks and edge counts, and improve the quality of profile data.
+   Enable it with ``-fsample-profile-use-profi``. For example, on Linux:
+
+   .. code-block:: console
+
+     $ clang++ -fsample-profile-use-profi -O2 -gline-tables-only \
+       -fdebug-info-for-profiling -funique-internal-linkage-names \
+       -fprofile-sample-use=code.prof code.cc -o code
+
+   On Windows:
+
+   .. code-block:: winbatch

-     $ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof \
-       -fsample-profile-use-profi code.cc -o code
+     > clang-cl /clang:-fsample-profile-use-profi /O2 -gdwarf -gline-tables-only ^
+       /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
+       /fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf

 Sample Profile Formats
 """"""""""""""""""""""

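The note in the diff about `-funique-internal-linkage-names` says the uniqued names are derived from the source file path given on the command line. The sketch below illustrates why a path that changes between steps 1 and 4 matters. It is only an illustration under stated assumptions: `path_suffix` is a hypothetical helper, the paths are made up, and `cksum` is a stand-in, not the hash Clang actually uses.

```shell
#!/bin/sh
# Stand-in for the compiler's path hashing. `cksum` is NOT what Clang uses;
# it only shows that a different path string yields a different suffix.
path_suffix() {
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# Hypothetical absolute checkout locations for step 1 and step 4.
step1=$(path_suffix /home/alice/src/code.cc)
step4=$(path_suffix /tmp/ci-build/src/code.cc)

# The same relative path passed on the command line in both steps.
relative1=$(path_suffix code.cc)
relative4=$(path_suffix code.cc)

if [ "$step1" != "$step4" ]; then
    echo "absolute paths: suffix differs between builds"
fi
if [ "$relative1" = "$relative4" ]; then
    echo "relative path: suffix is stable"
fi
```

If the build system can pass a stable (for example relative) path in both steps, the derived names stay the same; otherwise the note's advice to omit the option applies.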