Improve documented sampling profiler steps to best known methods #88438

Merged
merged 4 commits into from
Apr 29, 2024
71 changes: 55 additions & 16 deletions clang/docs/UsersManual.rst
@@ -2314,6 +2314,8 @@ are listed below.
on ELF targets when using the integrated assembler. This flag currently
only has an effect on ELF targets.

.. _funique_internal_linkage_names:

.. option:: -f[no]-unique-internal-linkage-names

Controls whether Clang emits a unique (best-effort) symbol name for internal
@@ -2443,27 +2445,41 @@ usual build cycle when using sample profilers for optimization:
usual build flags that you always build your application with. The only
requirement is that DWARF debug info including source line information is
generated. This DWARF information is important for the profiler to be able
to map instructions back to source line locations.
to map instructions back to source line locations. The usefulness of this
DWARF information can be improved with the ``-fdebug-info-for-profiling``
and ``-funique-internal-linkage-names`` options.
Contributor

Using -funique-internal-linkage-names produces an identifier based on the source file name passed to the compiler. If the build system passes absolute paths, compiling the same file source.c yields different unique identifiers depending on whether it is compiled as:
clang /home/dev_foo/source.c …
or
clang /home/dev_bar/source.c …

Profile data collected in one workspace may then fail to apply in a different workspace because the function names no longer match, unless SampleProfileLoaderPass is changed to relax its default function name matching. Should a note be added that, when this option is used with absolute paths, the build for data collection needs to use the same base directory as the feedback run?
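
To make the concern concrete, here is a minimal shell sketch. It assumes, for illustration only, that the uniquifying suffix behaves like a hash of the source path exactly as given on the command line. `path_hash` is a hypothetical helper, not part of Clang, and the real suffix computation may differ in its details.

```shell
# Illustration only: stand-in for a suffix derived from the source path.
# Assumption: the suffix depends on the path string passed to the compiler.
path_hash() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# The same file reached through two different workspace roots.
hash_foo=$(path_hash /home/dev_foo/source.c)
hash_bar=$(path_hash /home/dev_bar/source.c)

# The same file named by a workspace-relative path in both builds.
hash_rel_1=$(path_hash source.c)
hash_rel_2=$(path_hash source.c)

if [ "$hash_foo" != "$hash_bar" ]; then
  echo "absolute paths: suffixes differ between workspaces"
fi
if [ "$hash_rel_1" = "$hash_rel_2" ]; then
  echo "relative path: suffix is stable across workspaces"
fi
```

Under that assumption, a build that passes workspace-relative paths (or keeps the same base directory for the profiling build and the feedback build) would keep the uniqued names stable.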

Contributor

Do we also need -fdebug-info-for-profiling and -funique-internal-linkage-names for step 4 ("-fprofile-sample-use=code.prof")?

Contributor Author

Thanks, @chrulski-intel -- good point. I'll add a brief note.

@williamweixiao, I think you're right that they should match. I'll update those steps.


On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:
On Linux:

.. code-block:: console

$ clang++ -O2 -gline-tables-only code.cc -o code
$ clang++ -O2 -gline-tables-only \
-fdebug-info-for-profiling -funique-internal-linkage-names \
code.cc -o code

While MSVC-style targets default to CodeView debug information, DWARF debug
information is required to generate source-level LLVM profiles. Use
``-gdwarf`` to include DWARF debug information:

.. code-block:: console
.. code-block:: winbatch

> clang-cl /O2 -gdwarf -gline-tables-only ^
/clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf

.. note::

$ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
:ref:`-funique-internal-linkage-names <funique_internal_linkage_names>`
generates unique names based on given command-line source file paths. If
your build system uses absolute source paths and these paths may change
between steps 1 and 4, then the uniqued function names may change and result
in unused profile data. Consider omitting this option in such cases.

2. Run the executable under a sampling profiler. The specific profiler
you use does not really matter, as long as its output can be converted
into the format that the LLVM optimizer understands.

Two such profilers are the the Linux Perf profiler
Two such profilers are the Linux Perf profiler
(https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
available as part of `Intel VTune
<https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html>`_.
@@ -2477,7 +2493,9 @@ usual build cycle when using sample profilers for optimization:

.. code-block:: console

$ perf record -b ./code
$ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code

If the event above is unavailable, ``branches:u`` is probably next-best.

Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
Record (LBR) to record call chains. While this is not strictly required,
@@ -2527,21 +2545,42 @@ usual build cycle when using sample profilers for optimization:
that executes faster than the original one. Note that you are not
required to build the code with the exact same arguments that you
used in the first step. The only requirement is that you build the code
with ``-gline-tables-only`` and ``-fprofile-sample-use``.
with the same debug info options and ``-fprofile-sample-use``.

On Linux:

.. code-block:: console

$ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof code.cc -o code
$ clang++ -O2 -gline-tables-only \
-fdebug-info-for-profiling -funique-internal-linkage-names \
-fprofile-sample-use=code.prof code.cc -o code

[OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
edge counters. The profile inference algorithm (profi) can be used to infer
missing blocks and edge counts, and improve the quality of profile data.
Enable it with ``-fsample-profile-use-profi``.
On Windows:

.. code-block:: console
.. code-block:: winbatch

> clang-cl /O2 -gdwarf -gline-tables-only ^
/clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
/fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf

[OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
edge counters. The profile inference algorithm (profi) can be used to infer
missing blocks and edge counts, and improve the quality of profile data.
Enable it with ``-fsample-profile-use-profi``. For example, on Linux:

.. code-block:: console

$ clang++ -fsample-profile-use-profi -O2 -gline-tables-only \
-fdebug-info-for-profiling -funique-internal-linkage-names \
-fprofile-sample-use=code.prof code.cc -o code

On Windows:

.. code-block:: winbatch

$ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof \
-fsample-profile-use-profi code.cc -o code
> clang-cl /clang:-fsample-profile-use-profi /O2 -gdwarf -gline-tables-only ^
/clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
/fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf

Sample Profile Formats
""""""""""""""""""""""