[analyzer][docs] Document how to use perf and uftrace to debug performance issues #126724

steakhal · 2025-02-11T12:53:37Z

No description provided.

…mance issues

llvmbot · 2025-02-11T12:53:59Z

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/126724.diff

3 Files Affected:

(modified) clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst (+93-3)
(added) clang/docs/analyzer/images/flamegraph.png ()
(added) clang/docs/analyzer/images/uftrace_detailed.png ()

diff --git a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
index 3ee6e117a846528..6d1a5f126223d93 100644
--- a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
+++ b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
@@ -5,6 +5,9 @@ Performance Investigation
 Multiple factors contribute to the time it takes to analyze a file with Clang Static Analyzer.
 A translation unit contains multiple entry points, each of which take multiple steps to analyze.
 
+Performance analysis using ``-ftime-trace``
+===========================================
+
 You can add the ``-ftime-trace=file.json`` option to break down the analysis time into individual entry points and steps within each entry point.
 You can explore the generated JSON file in a Chromium browser using the ``chrome://tracing`` URL,
 or using `speedscope <https://speedscope.app>`_.
@@ -19,9 +22,8 @@ Here is an example of a time trace produced with
 .. code-block:: bash
    :caption: Clang Static Analyzer invocation to generate a time trace of string.c analysis.
 
-   clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
-         -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
-         -verify ./clang/test/Analysis/string.c \
+   clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+         -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
          -ftime-trace=trace.json -ftime-trace-granularity=1
 
 .. image:: ../images/speedscope.png
@@ -45,3 +47,91 @@ Note: Both Chrome-tracing and speedscope tools might struggle with time traces a
 Luckily, in most cases the default max-steps boundary of 225 000 produces the traces of approximately that size
 for a single entry point.
 You can use ``-analyze-function=get_global_options`` together with ``-ftime-trace`` to narrow down analysis to a specific entry point.
+
+
+Performance analysis using ``perf``
+===================================
+
+`Perf <https://perfwiki.github.io/main/>`_ is a tool for conducting sampling-based profiling.
+It's easy to start profiling; there are only two prerequisites:
+build with ``-fno-omit-frame-pointer`` and with debug info (``-g``).
+You can use release builds, but the easiest option is probably to set ``CMAKE_BUILD_TYPE=RelWithDebInfo``
+along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring ``llvm``.
+If you need help with the configuration, see the LLVM CMake `quick start guide <https://llvm.org/docs/CMake.html#quick-start>`_.
+
+.. code-block:: bash
+   :caption: Running the Clang Static Analyzer through ``perf`` to gather samples of the execution.
+
+   # -F: Sampling frequency, use `-F max` for maximal frequency
+   # -g: Enable call-graph recording for both kernel and user space
+   perf record -F 99 -g --  clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+         -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection
+
+Once you have the profile data, you can use it to produce a Flame graph.
+A Flame graph is a visual representation of the stack frames of the samples.
+Common stack frame prefixes are squashed together, making up a wider bar.
+The wider the bar, the more time was spent under that particular stack frame,
+giving a sense of how the overall execution time was spent.
+
+Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git repository,
+as we will use some scripts from there to convert the ``perf`` samples into a Flame graph.
+It's also useful to check out Brendan Gregg's (the author of FlameGraph)
+`homepage <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_.
+
+
+.. code-block:: bash
+   :caption: Converting the ``perf`` profile into a Flame graph, then opening it in Firefox.
+
+   perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded
+   /path/to/FlameGraph/flamegraph.pl perf.folded  > perf.svg
+   firefox perf.svg
+
+.. image:: ../images/flamegraph.png
+
+
+Performance analysis using ``uftrace``
+======================================
+
+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you can use to focus and drill down into the timeline of your application.
+We will use it to generate Chromium trace JSON.
+In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and thorough than sampling-based approaches.
+In contrast to ``-ftime-trace``, functions don't need to opt in to profiling with ``llvm::TimeTraceScope``:
+static instrumentation covers all functions.
+
+There is only one prerequisite to use this tool.
+You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
+This makes the binary run substantially slower but allows rich instrumentation.
+A single trace will also consume many gigabytes of storage unless filter flags are used during recording.
+
+.. code-block:: bash
+   :caption: Recording with ``uftrace``, then dumping the result as a Chrome trace JSON.
+
+   uftrace record  clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+         -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection
+   uftrace dump --filter=".*::AnalysisConsumer::HandleTranslationUnit" --time-filter=300 --chrome > trace.json
+
+.. image:: ../images/uftrace_detailed.png
+
+In this picture, you can see the functions below the Static Analyzer's entry point that take at least 300 nanoseconds to run, visualized by Chrome's ``about:tracing`` page.
+You can also see how deep the call stacks can get because of the AST visitors.
+
+Using different filters can reduce the number of functions to record.
+For the common options, refer to the ``uftrace`` `documentation <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_.
+
+Similar filters can be applied for dumping too. That way you can reuse the same (detailed)
+recording to selectively focus on some special part using a refinement of the filter flags.
+Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
+thus it needs to be of a limited size.
+If you do not apply filters when recording, you will collect a large trace, and every dump operation
+will need to sift through that much larger recording, which gets tedious when done repeatedly.
+
+If the trace JSON is still too large to load, have a look at the dump as plain text and look for frequent entries that refer to non-interesting parts.
+Once you have some of those, add them as ``--hide`` flags to the ``uftrace dump`` call.
+To see what functions appear frequently in the trace, use this command:
+
+.. code-block:: bash
+
+   grep -o '"name":"[^"]*"' trace.json | sort | uniq -c | sort -nr | head -n 50
+
+``uftrace`` can also dump the report as a Flame graph using ``uftrace dump --flame-graph``.
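
The frequency check at the end of the patch above counts every trace event, so a function shows up once for its begin event and once for its end event. A minimal sketch of a variant that counts calls instead is shown here. The helper name `top_called_functions` is hypothetical, and the sketch assumes that the dump holds one event object per line and that function entry is marked with `"ph":"B"` (the begin phase of the Chrome Trace Event format); adjust it if your dump is laid out differently.

```shell
# Hypothetical helper: list the most frequently called functions in a
# Chrome trace JSON file.
# Assumption: one event object per line, function entry marked by "ph":"B".
# Usage: top_called_functions trace.json [N]   (N defaults to 50)
top_called_functions() {
  grep '"ph":"B"' "$1" \
    | grep -o '"name":"[^"]*"' \
    | sed 's/^"name":"//; s/"$//' \
    | sort | uniq -c | sort -k1,1nr -k2 \
    | awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c, $0 }' \
    | head -n "${2:-50}"
}
```

Each output line has the form `count name`, with the highest counts first. Frequent entries that are not interesting for the investigation are candidates for the `--hide` flags mentioned in the patch.
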
diff --git a/clang/docs/analyzer/images/flamegraph.png b/clang/docs/analyzer/images/flamegraph.png
new file mode 100644
index 000000000000000..b16ec90b9e600db
Binary files /dev/null and b/clang/docs/analyzer/images/flamegraph.png differ
diff --git a/clang/docs/analyzer/images/uftrace_detailed.png b/clang/docs/analyzer/images/uftrace_detailed.png
new file mode 100644
index 000000000000000..fcf681909d07068
Binary files /dev/null and b/clang/docs/analyzer/images/uftrace_detailed.png differ
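
The `perf.folded` file produced by the `stackcollapse-perf.pl` step in the patch above can also be inspected directly, without rendering an SVG. The following is a minimal sketch, assuming the usual folded format of one `frame;frame;... count` line per unique stack. The helper name `top_self_frames` is hypothetical. It sums the sample counts by leaf frame, which approximates where the CPU time was spent in the function itself rather than in its callees.

```shell
# Hypothetical helper: sum sample counts per leaf frame of a folded-stack file.
# Assumption: each line looks like "frame1;frame2;...;leaf COUNT".
# Usage: top_self_frames perf.folded [N]   (N defaults to 20)
top_self_frames() {
  awk '
    {
      count = $NF                    # the last field is the sample count
      stack = $0
      sub(/ [0-9]+$/, "", stack)     # strip the trailing count
      n = split(stack, frames, ";")  # frames[n] is the leaf frame
      self[frames[n]] += count
    }
    END { for (f in self) printf "%d %s\n", self[f], f }
  ' "$1" | sort -k1,1nr -k2 | head -n "${2:-20}"
}
```

For example, `top_self_frames perf.folded 10` prints the ten leaf frames with the most samples, one `count frame` pair per line.
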

steakhal · 2025-02-11T12:54:01Z

This is the continuation of #126520
Sorry for the complications again. -.-


steakhal added 10 commits February 10, 2025 15:12

[analyzer][docs] Document how to use perf and uftrace to debug perfor…

94f0b3c

…mance issues

Reduce the quality of the images, harmonize using png format

cd8e88f

Simplify and harmonize clang invocations in the docs

169664c

Rephrase the Perf introduction

aa5a285

s/could/can substitution

1b105e0

Fix through -> thorough typo

3b2d323

Highlight high disk usage for uftrace

7a76bd5

Move doc link

4aa1f34

Accept reviewer suggestion

196dd50

Accept reviewer suggestion

004b8a6

steakhal added the clang:static analyzer label Feb 11, 2025

steakhal requested review from Xazax-hun and NagyDonat February 11, 2025 12:53

Xazax-hun approved these changes Feb 11, 2025


Fix file extension in the image tag

6f91f3c

necto approved these changes Feb 11, 2025



Accept reviewer suggestion

4ad385a

steakhal merged commit 1337b0f into llvm:main Feb 11, 2025
6 of 8 checks passed

steakhal deleted the bb/nfc-extend-perf-debugging-docs branch February 11, 2025 17:41

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025

[analyzer][docs] Document how to use perf and uftrace to debug perfor…

66de8d7

…mance issues (llvm#126724)

flovent pushed a commit to flovent/llvm-project that referenced this pull request Feb 13, 2025

[analyzer][docs] Document how to use perf and uftrace to debug perfor…

1e002aa

…mance issues (llvm#126724)

joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025

[analyzer][docs] Document how to use perf and uftrace to debug perfor…

69160ca

…mance issues (llvm#126724)

sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025

[analyzer][docs] Document how to use perf and uftrace to debug perfor…

bdb2818

…mance issues (llvm#126724)
