-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[analyzer][docs] Document how to use perf and uftrace to debug performance issues #126724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
steakhal
merged 12 commits into
llvm:main
from
steakhal:bb/nfc-extend-perf-debugging-docs
Feb 11, 2025
Merged
[analyzer][docs] Document how to use perf and uftrace to debug performance issues #126724
steakhal
merged 12 commits into
llvm:main
from
steakhal:bb/nfc-extend-perf-debugging-docs
Feb 11, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@llvm/pr-subscribers-clang-static-analyzer-1 Author: Balazs Benics (steakhal) ChangesFull diff: https://github.com/llvm/llvm-project/pull/126724.diff 3 Files Affected:
diff --git a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
index 3ee6e117a846528..6d1a5f126223d93 100644
--- a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
+++ b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
@@ -5,6 +5,9 @@ Performance Investigation
Multiple factors contribute to the time it takes to analyze a file with Clang Static Analyzer.
A translation unit contains multiple entry points, each of which take multiple steps to analyze.
+Performance analysis using ``-ftime-trace``
+===========================================
+
You can add the ``-ftime-trace=file.json`` option to break down the analysis time into individual entry points and steps within each entry point.
You can explore the generated JSON file in a Chromium browser using the ``chrome://tracing`` URL,
or using `speedscope <https://speedscope.app>`_.
@@ -19,9 +22,8 @@ Here is an example of a time trace produced with
.. code-block:: bash
:caption: Clang Static Analyzer invocation to generate a time trace of string.c analysis.
- clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
- -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
- -verify ./clang/test/Analysis/string.c \
+ clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+ -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
-ftime-trace=trace.json -ftime-trace-granularity=1
.. image:: ../images/speedscope.png
@@ -45,3 +47,91 @@ Note: Both Chrome-tracing and speedscope tools might struggle with time traces a
Luckily, in most cases the default max-steps boundary of 225 000 produces the traces of approximately that size
for a single entry point.
You can use ``-analyze-function=get_global_options`` together with ``-ftime-trace`` to narrow down analysis to a specific entry point.
+
+
+Performance analysis using ``perf``
+===================================
+
+`Perf <https://perfwiki.github.io/main/>`_ is a tool for conducting sampling-based profiling.
+It's easy to start profiling, you only have 2 prerequisites.
+Build with ``-fno-omit-frame-pointer`` and debug info (``-g``).
+You can use release builds, but probably the easiest is to set the ``CMAKE_BUILD_TYPE=RelWithDebInfo``
+along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring ``llvm``.
+Here is how to `get started <https://llvm.org/docs/CMake.html#quick-start>`_ if you are in trouble.
+
+.. code-block:: bash
+ :caption: Running the Clang Static Analyzer through ``perf`` to gather samples of the execution.
+
+ # -F: Sampling frequency, use `-F max` for maximal frequency
+ # -g: Enable call-graph recording for both kernel and user space
+ perf record -F 99 -g -- clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+ -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection
+
+Once you have the profile data, you can use it to produce a Flame graph.
+A Flame graph is a visual representation of the stack frames of the samples.
+Common stack frame prefixes are squashed together, making up a wider bar.
+The wider the bar, the more time was spent under that particular stack frame,
+giving a sense of how the overall execution time was spent.
+
+Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git repository,
+as we will use some scripts from there to convert the ``perf`` samples into a Flame graph.
+It's also useful to check out Brendan Gregg's (the author of FlameGraph)
+`homepage <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_.
+
+
+.. code-block:: bash
+ :caption: Converting the ``perf`` profile into a Flamegraph, then opening it in Firefox.
+
+ perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded
+ /path/to/FlameGraph/flamegraph.pl perf.folded > perf.svg
+ firefox perf.svg
+
+.. image:: ../images/flamegraph.svg
+
+
+Performance analysis using ``uftrace``
+======================================
+
+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you can use to focus and drill down into the timeline of your application.
+We will use it to generate Chromium trace JSON.
+In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and thorough than the sampling-based approaches like ``perf``.
+In contrast to using `-ftime-trace`, functions don't need to opt-in to be profiled using ``llvm::TimeTraceScope``.
+All functions are profiled due to static instrumentation.
+
+There is only one prerequisite to use this tool.
+You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
+This will make it run substantially slower but allows rich instrumentation.
+It will also consume many gigabites of storage for a single trace unless filter flags are used during recording.
+
+.. code-block:: bash
+ :caption: Recording with ``uftrace``, then dumping the result as a Chrome trace JSON.
+
+ uftrace record clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+ -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection
+ uftrace dump --filter=".*::AnalysisConsumer::HandleTranslationUnit" --time-filter=300 --chrome > trace.json
+
+.. image:: ../images/uftrace_detailed.png
+
+In this picture, you can see the functions below the Static Analyzer's entry point, which takes at least 300 nanoseconds to run, visualized by Chrome's ``about:tracing`` page
+You can also see how deep function calls we may have due to AST visitors.
+
+Using different filters can reduce the number of functions to record.
+For the common options, refer to the ``uftrace`` `documentation <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_.
+
+Similar filters can be applied for dumping too. That way you can reuse the same (detailed)
+recording to selectively focus on some special part using a refinement of the filter flags.
+Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
+thus it needs to be of a limited size.
+If you do not apply filters on recording, you will collect a large trace and every dump operation
+would need to sieve through the much larger recording which may be annoying if done repeatedly.
+
+If the trace JSON is still too large to load, have a look at the dump as plain text and look for frequent entries that refer to non-interesting parts.
+Once you have some of those, add them as ``--hide`` flags to the ``uftrace dump`` call.
+To see what functions appear frequently in the trace, use this command:
+
+.. code-block:: bash
+
+ cat trace.json | grep -Po '"name":"(.+)"' | sort | uniq -c | sort -nr | head -n 50
+
+``uftrace`` can also dump the report as a Flame graph using ``uftrace dump --framegraph``.
diff --git a/clang/docs/analyzer/images/flamegraph.png b/clang/docs/analyzer/images/flamegraph.png
new file mode 100644
index 000000000000000..b16ec90b9e600db
Binary files /dev/null and b/clang/docs/analyzer/images/flamegraph.png differ
diff --git a/clang/docs/analyzer/images/uftrace_detailed.png b/clang/docs/analyzer/images/uftrace_detailed.png
new file mode 100644
index 000000000000000..fcf681909d07068
Binary files /dev/null and b/clang/docs/analyzer/images/uftrace_detailed.png differ
|
This is the continuation of #126520 |
Xazax-hun
approved these changes
Feb 11, 2025
necto
approved these changes
Feb 11, 2025
clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
Outdated
Show resolved
Hide resolved
clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
Outdated
Show resolved
Hide resolved
Icohedron
pushed a commit
to Icohedron/llvm-project
that referenced
this pull request
Feb 11, 2025
flovent
pushed a commit
to flovent/llvm-project
that referenced
this pull request
Feb 13, 2025
joaosaffran
pushed a commit
to joaosaffran/llvm-project
that referenced
this pull request
Feb 14, 2025
sivan-shani
pushed a commit
to sivan-shani/llvm-project
that referenced
this pull request
Feb 24, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.