[3.13] gh-130861: Add clarification to the perf docs on optimization levels (GH-131098) #132687

Merged
merged 1 commit into from
Apr 18, 2025
31 changes: 23 additions & 8 deletions Doc/howto/perf_profiling.rst
@@ -254,13 +254,28 @@ files in the current directory which are ELF images for all the JIT trampolines
 that were created by Python.
 
 .. warning::
-   Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take
+   When using ``--call-graph dwarf``, the ``perf`` tool will take
    snapshots of the stack of the process being profiled and save the
-   information in the ``perf.data`` file. By default the size of the stack dump
-   is 8192 bytes but the user can change the size by passing the size after
-   comma like ``--call-graph dwarf,4096``. The size of the stack dump is
-   important because if the size is too small ``perf`` will not be able to
-   unwind the stack and the output will be incomplete. On the other hand, if
-   the size is too big, then ``perf`` won't be able to sample the process as
-   frequently as it would like as the overhead will be higher.
+   information in the ``perf.data`` file. By default, the size of the stack dump
+   is 8192 bytes, but you can change the size by passing it after
+   a comma like ``--call-graph dwarf,16384``.
+
+   The size of the stack dump is important because if the size is too small
+   ``perf`` will not be able to unwind the stack and the output will be
+   incomplete. On the other hand, if the size is too big, then ``perf`` won't
+   be able to sample the process as frequently as it would like as the overhead
+   will be higher.
+
+   The stack size is particularly important when profiling Python code compiled
+   with low optimization levels (like ``-O0``), as these builds tend to have
+   larger stack frames. If you are compiling Python with ``-O0`` and not seeing
+   Python functions in your profiling output, try increasing the stack dump
+   size to 65528 bytes (the maximum)::
+
+       $ perf record -F 9999 -g -k 1 --call-graph dwarf,65528 -o perf.data python -Xperf_jit my_script.py
+
+   Different compilation flags can significantly impact stack sizes:
+
+   - Builds with ``-O0`` typically have much larger stack frames than those with ``-O1`` or higher
+   - Adding optimizations (``-O1``, ``-O2``, etc.) typically reduces stack size
+   - Frame pointers (``-fno-omit-frame-pointer``) generally provide more reliable stack unwinding
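
To illustrate the guidance added in this change, here is a minimal Python sketch that picks a stack dump size from the interpreter's build flags and assembles the matching ``perf record`` command. The helper names ``suggest_dump_size`` and ``perf_record_command`` are hypothetical and not part of CPython or ``perf``. Reading ``CFLAGS`` through ``sysconfig`` is an assumption about how to detect ``-O0``; it may not reflect every flag the build actually used, so treat the result as a starting point rather than a rule.

```python
import shlex
import sysconfig

DEFAULT_DUMP_SIZE = 8192  # default stack dump size, per the documentation text
MAX_DUMP_SIZE = 65528     # maximum stack dump size, per the documentation text


def suggest_dump_size(cflags: str) -> int:
    """Suggest the maximum dump size when the effective -O flag is -O0.

    Hypothetical heuristic: with GCC and Clang the last -O flag wins, so
    only that one is inspected. If no -O flag is recorded, the build's
    optimization level is unknown here and the default size is kept.
    """
    opt_flags = [flag for flag in cflags.split() if flag.startswith("-O")]
    if opt_flags and opt_flags[-1] == "-O0":
        return MAX_DUMP_SIZE
    return DEFAULT_DUMP_SIZE


def perf_record_command(script: str, dump_size: int) -> list[str]:
    """Build the perf record command from the docs with a chosen dump size."""
    if not 0 < dump_size <= MAX_DUMP_SIZE:
        raise ValueError(f"dump size must be between 1 and {MAX_DUMP_SIZE}")
    return [
        "perf", "record", "-F", "9999", "-g", "-k", "1",
        "--call-graph", f"dwarf,{dump_size}",
        "-o", "perf.data",
        "python", "-Xperf_jit", script,
    ]


if __name__ == "__main__":
    # CFLAGS can be missing on some platforms, hence the fallback to "".
    cflags = sysconfig.get_config_var("CFLAGS") or ""
    size = suggest_dump_size(cflags)
    print(shlex.join(perf_record_command("my_script.py", size)))
```

Running the script prints a command line that can be pasted into a shell. Starting from the default and raising the size only for ``-O0`` builds follows the trade-off described in the warning: larger dumps unwind more reliably but add sampling overhead.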