
Commit 1337b0f

[analyzer][docs] Document how to use perf and uftrace to debug performance issues (#126724)
1 parent 6d58dd4 commit 1337b0f


3 files changed (+93, -3 lines)


clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

Lines changed: 93 additions & 3 deletions
@@ -5,6 +5,9 @@ Performance Investigation
Multiple factors contribute to the time it takes to analyze a file with the Clang Static Analyzer.
A translation unit contains multiple entry points, each of which takes multiple steps to analyze.

Performance analysis using ``-ftime-trace``
===========================================

You can add the ``-ftime-trace=file.json`` option to break down the analysis time into individual entry points and steps within each entry point.
You can explore the generated JSON file in a Chromium browser using the ``chrome://tracing`` URL,
or using `speedscope <https://speedscope.app>`_.
@@ -19,9 +22,8 @@ Here is an example of a time trace produced with
.. code-block:: bash
   :caption: Clang Static Analyzer invocation to generate a time trace of string.c analysis.

   clang -cc1 -analyze -verify clang/test/Analysis/string.c \
     -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
     -ftime-trace=trace.json -ftime-trace-granularity=1

.. image:: ../images/speedscope.png
@@ -45,3 +47,91 @@ Note: Both Chrome-tracing and speedscope tools might struggle with time traces a
Luckily, in most cases the default max-steps boundary of 225 000 produces traces of approximately that size
for a single entry point.
You can use ``-analyze-function=get_global_options`` together with ``-ftime-trace`` to narrow down the analysis to a specific entry point.
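
If you want one small trace per entry point, you can wrap that combination in a shell function.
This is a minimal sketch: the helper name ``trace_entry_point`` and the ``trace-<function>.json`` naming scheme are hypothetical,
and ``-verify`` is left out on the assumption that analyzing only one function would leave some of the file's ``expected-*`` diagnostics unmatched.

.. code-block:: bash

   # Hypothetical helper: analyze a single entry point of a source file and
   # write its time trace to trace-<function>.json in the current directory.
   # Usage: trace_entry_point FILE FUNCTION
   trace_entry_point() {
     local file="$1" fn="$2"
     clang -cc1 -analyze "$file" \
       -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
       -analyze-function="$fn" \
       -ftime-trace="trace-$fn.json" -ftime-trace-granularity=1
   }

For example, ``trace_entry_point path/to/file.c get_global_options`` would write ``trace-get_global_options.json``.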


Performance analysis using ``perf``
===================================

`Perf <https://perfwiki.github.io/main/>`_ is a tool for conducting sampling-based profiling.
It's easy to start profiling; there are only two prerequisites:
build with ``-fno-omit-frame-pointer`` and with debug info (``-g``).
You can use release builds, but the easiest option is probably to set ``CMAKE_BUILD_TYPE=RelWithDebInfo``
along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring ``llvm``.
If you run into trouble, here is how to `get started <https://llvm.org/docs/CMake.html#quick-start>`_.
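
For example, a configuration along these lines should produce a suitable build.
It is a sketch under assumptions: it expects to be run from the root of an ``llvm-project`` checkout and uses the Ninja generator;
the helper name ``configure_for_perf`` and the default ``build-perf`` directory are hypothetical.

.. code-block:: bash

   # Hypothetical helper: configure a Clang build suitable for profiling with perf.
   # RelWithDebInfo adds debug info (-g) on top of optimizations, and
   # -fno-omit-frame-pointer lets perf unwind call stacks via frame pointers.
   # Usage: configure_for_perf [BUILD_DIR]
   configure_for_perf() {
     local build_dir="${1:-build-perf}"
     cmake -G Ninja -S llvm -B "$build_dir" \
       -DLLVM_ENABLE_PROJECTS=clang \
       -DCMAKE_BUILD_TYPE=RelWithDebInfo \
       -DCMAKE_CXX_FLAGS="-fno-omit-frame-pointer"
   }

After running ``configure_for_perf``, ``ninja -C build-perf clang`` builds the ``clang`` binary to profile.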

.. code-block:: bash
   :caption: Running the Clang Static Analyzer through ``perf`` to gather samples of the execution.

   # -F: Sampling frequency, use `-F max` for maximal frequency
   # -g: Enable call-graph recording for both kernel and user space
   perf record -F 99 -g -- clang -cc1 -analyze -verify clang/test/Analysis/string.c \
     -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection

Once you have the profile data, you can use it to produce a Flame graph.
A Flame graph is a visual representation of the sampled call stacks.
Common stack frame prefixes are merged together, making up a wider bar.
The wider the bar, the more time was spent under that particular stack frame,
giving a sense of how the overall execution time was spent.
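
To illustrate how merging common prefixes determines the bar widths, consider stacks in the folded format produced by FlameGraph's ``stackcollapse-perf.pl`` script:
one line per distinct stack, with frames separated by ``;`` and followed by a sample count.
The hypothetical ``prefix_widths`` helper below (not part of FlameGraph) is only a sketch that sums the sample counts of every stack prefix;
each sum corresponds to the relative width of one bar.

.. code-block:: bash

   # Hypothetical helper: read folded stacks ("frame1;frame2;... count") from
   # stdin and print "total prefix" for every stack prefix. The total is the
   # number of samples under that frame, i.e. the relative width of its bar.
   prefix_widths() {
     awk '{
       count = $NF                  # last field: the number of samples
       stack = $0
       sub(/ [0-9]+$/, "", stack)   # strip the trailing " count"
       n = split(stack, frames, ";")
       prefix = ""
       for (i = 1; i <= n; i++) {
         prefix = (i == 1) ? frames[i] : (prefix ";" frames[i])
         width[prefix] += count
       }
     }
     END { for (p in width) print width[p], p }'
   }

For example, ``prefix_widths < perf.folded | sort -nr | head -n 20`` lists the 20 widest bars of the graph.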

Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git repository,
as we will use some of its scripts to convert the ``perf`` samples into a Flame graph.
It's also worth reading the `CPU Flame Graphs page <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_
on the homepage of Brendan Gregg, the author of FlameGraph.

.. code-block:: bash
   :caption: Converting the ``perf`` profile into a Flame graph, then opening it in Firefox.

   perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded
   /path/to/FlameGraph/flamegraph.pl perf.folded > perf.svg
   firefox perf.svg

.. image:: ../images/flamegraph.png


Performance analysis using ``uftrace``
======================================

`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool for generating rich profile data
that you can use to focus on and drill down into the timeline of your application.
We will use it to generate a Chromium trace JSON.
Unlike ``perf``, this approach statically instruments every function, so it should be more precise and thorough than sampling-based approaches.
Unlike ``-ftime-trace``, it does not require functions to opt in to profiling using ``llvm::TimeTraceScope``:
all functions are profiled thanks to the automatic static instrumentation.

There is only one prerequisite to use this tool:
you need to build the binary you are about to instrument with ``-pg`` or ``-finstrument-functions``.
This makes it run substantially slower, but allows rich instrumentation.
A single trace can also consume many gigabytes of storage unless filter flags are used during recording.
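
One way to get such a binary is to pass the flag through ``CMAKE_CXX_FLAGS`` when configuring ``llvm``.
This is a sketch under assumptions: it expects to be run from the root of an ``llvm-project`` checkout and uses the Ninja generator;
the helper name ``configure_for_uftrace``, the ``build-uftrace`` directory, and the ``RelWithDebInfo`` build type are hypothetical choices.

.. code-block:: bash

   # Hypothetical helper: configure a Clang build in which every C++ function
   # is compiled with -pg, so that uftrace can record it.
   # Usage: configure_for_uftrace [BUILD_DIR]
   configure_for_uftrace() {
     local build_dir="${1:-build-uftrace}"
     cmake -G Ninja -S llvm -B "$build_dir" \
       -DLLVM_ENABLE_PROJECTS=clang \
       -DCMAKE_BUILD_TYPE=RelWithDebInfo \
       -DCMAKE_CXX_FLAGS="-pg"
   }

Replacing ``-pg`` with ``-finstrument-functions`` selects the other instrumentation option mentioned above.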

.. code-block:: bash
   :caption: Recording with ``uftrace``, then dumping the result as a Chrome trace JSON.

   uftrace record clang -cc1 -analyze -verify clang/test/Analysis/string.c \
     -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection
   uftrace dump --filter=".*::AnalysisConsumer::HandleTranslationUnit" --time-filter=300 --chrome > trace.json

.. image:: ../images/uftrace_detailed.png

In this picture, Chrome's ``about:tracing`` page visualizes the functions below the Static Analyzer's entry point that take at least 300 nanoseconds to run.
You can also see how deeply function calls can nest due to AST visitors.

Filters can reduce the number of functions to record.
For the common options, refer to the ``uftrace`` `documentation <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_.

Similar filters can be applied when dumping, too. That way, you can reuse the same (detailed)
recording and focus on a specific part of it by refining the filter flags.
Remember that the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
so its size needs to be limited.
If you do not apply filters when recording, you will collect a large trace, and every dump operation
will need to sift through that much larger recording, which may be annoying if done repeatedly.
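
For instance, a small loop can produce several focused trace files from a single ``uftrace record`` run.
A minimal sketch: the helper name ``dump_focused_traces`` and the ``trace-<N>.json`` naming scheme are hypothetical,
and it assumes the default ``uftrace.data`` directory left by ``uftrace record`` in the current directory.
The flags are the ones from the recording example above.

.. code-block:: bash

   # Hypothetical helper: dump one Chrome trace per filter from the same
   # uftrace recording, writing trace-1.json, trace-2.json, and so on.
   # Usage: dump_focused_traces MIN_TIME FILTER...
   dump_focused_traces() {
     local time_filter="$1" i=0 filter
     shift
     for filter in "$@"; do
       i=$((i + 1))
       uftrace dump --filter="$filter" --time-filter="$time_filter" --chrome \
         > "trace-$i.json"
     done
   }

For example, ``dump_focused_traces 300 '.*::AnalysisConsumer::HandleTranslationUnit' '.*::CoreEngine::ExecuteWorkList'``
would write two traces; the second filter is only an illustration.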

If the trace JSON is still too large to load, look at the dump as plain text and search for frequent entries that refer to uninteresting parts.
Once you have found some of those, add them as ``--hide`` flags to the ``uftrace dump`` call.
To see which functions appear most frequently in the trace, use this command:

.. code-block:: bash

   grep -o '"name":"[^"]*"' trace.json | sort | uniq -c | sort -nr | head -n 50
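
To avoid retyping a growing list of ``--hide`` flags, you can keep the uninteresting function patterns in a file, one per line, and turn each line into a flag.
A minimal sketch: the helper name ``dump_with_hidden`` and the pattern file are hypothetical.

.. code-block:: bash

   # Hypothetical helper: run "uftrace dump" with one --hide flag per non-empty
   # line of PATTERN_FILE, forwarding any other arguments unchanged.
   # Usage: dump_with_hidden PATTERN_FILE [UFTRACE_DUMP_ARGS...]
   dump_with_hidden() {
     local pattern_file="$1" pattern
     shift
     # Append the --hide flags to the forwarded arguments; the "-n" check
     # also handles a last line without a trailing newline.
     while IFS= read -r pattern || [ -n "$pattern" ]; do
       [ -n "$pattern" ] && set -- "$@" "--hide=$pattern"
     done < "$pattern_file"
     uftrace dump "$@"
   }

For example: ``dump_with_hidden hide.txt --filter=".*::AnalysisConsumer::HandleTranslationUnit" --time-filter=300 --chrome > trace.json``.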

``uftrace`` can also dump the report as a Flame graph using ``uftrace dump --flame-graph``.
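
That output is meant to be processed by ``flamegraph.pl``, so it can be rendered much like the ``perf`` profile.
A minimal sketch, assuming a local clone of the FlameGraph repository; the helper name ``uftrace_flamegraph`` is hypothetical.

.. code-block:: bash

   # Hypothetical helper: render uftrace's flame-graph dump as an SVG using
   # flamegraph.pl from a local FlameGraph clone.
   # Usage: uftrace_flamegraph FLAMEGRAPH_DIR OUTPUT_SVG [UFTRACE_DUMP_ARGS...]
   uftrace_flamegraph() {
     local flamegraph_dir="$1" output="$2"
     shift 2
     uftrace dump --flame-graph "$@" | "$flamegraph_dir/flamegraph.pl" > "$output"
   }

For example, ``uftrace_flamegraph /path/to/FlameGraph uftrace.svg`` writes ``uftrace.svg``, which you can open in a browser.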