Commit ca2e1a1

[3.13] gh-130861: Add clarification to the perf docs on optimization levels (GH-131098) (#132687)
1 parent b3d4980 commit ca2e1a1

1 file changed, with 23 additions and 8 deletions

Doc/howto/perf_profiling.rst

Lines changed: 23 additions & 8 deletions
@@ -254,13 +254,28 @@ files in the current directory which are ELF images for all the JIT trampolines
 that were created by Python.
 
 .. warning::
-   Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take
+   When using ``--call-graph dwarf``, the ``perf`` tool will take
    snapshots of the stack of the process being profiled and save the
-   information in the ``perf.data`` file. By default the size of the stack dump
-   is 8192 bytes but the user can change the size by passing the size after
-   comma like ``--call-graph dwarf,4096``. The size of the stack dump is
-   important because if the size is too small ``perf`` will not be able to
-   unwind the stack and the output will be incomplete. On the other hand, if
-   the size is too big, then ``perf`` won't be able to sample the process as
-   frequently as it would like as the overhead will be higher.
+   information in the ``perf.data`` file. By default, the size of the stack dump
+   is 8192 bytes, but you can change the size by passing it after
+   a comma like ``--call-graph dwarf,16384``.
 
+   The size of the stack dump is important because if the size is too small
+   ``perf`` will not be able to unwind the stack and the output will be
+   incomplete. On the other hand, if the size is too big, then ``perf`` won't
+   be able to sample the process as frequently as it would like as the overhead
+   will be higher.
+
+   The stack size is particularly important when profiling Python code compiled
+   with low optimization levels (like ``-O0``), as these builds tend to have
+   larger stack frames. If you are compiling Python with ``-O0`` and not seeing
+   Python functions in your profiling output, try increasing the stack dump
+   size to 65528 bytes (the maximum)::
+
+      $ perf record -F 9999 -g -k 1 --call-graph dwarf,65528 -o perf.data python -Xperf_jit my_script.py
+
+   Different compilation flags can significantly impact stack sizes:
+
+   - Builds with ``-O0`` typically have much larger stack frames than those with ``-O1`` or higher
+   - Adding optimizations (``-O1``, ``-O2``, etc.) typically reduces stack size
+   - Frame pointers (``-fno-omit-frame-pointer``) generally provide more reliable stack unwinding
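
For readers who script their profiling runs, the following is a minimal sketch of how the command line from the new warning text might be assembled with a configurable stack dump size. It is not part of this commit. The helper name ``perf_record_command`` and its parameters are hypothetical; the only values taken from the documentation change are the 8192-byte default, the 65528-byte maximum, and the ``perf record`` flags themselves.

```python
import shlex

# Values quoted in the warning text of the documentation change.
DEFAULT_DUMP_SIZE = 8192   # default stack dump size, in bytes
MAX_DUMP_SIZE = 65528      # maximum stack dump size, in bytes


def perf_record_command(script, dump_size=DEFAULT_DUMP_SIZE, output="perf.data"):
    """Build the argv list for a ``perf record`` run with DWARF call graphs.

    Hypothetical helper: it only assembles arguments and rejects sizes
    outside the range described in the docs; it does not run ``perf``.
    """
    if not 0 < dump_size <= MAX_DUMP_SIZE:
        raise ValueError(
            f"stack dump size must be between 1 and {MAX_DUMP_SIZE} bytes, "
            f"got {dump_size}"
        )
    return [
        "perf", "record", "-F", "9999", "-g", "-k", "1",
        "--call-graph", f"dwarf,{dump_size}",
        "-o", output,
        "python", "-Xperf_jit", script,
    ]


if __name__ == "__main__":
    # Reproduces the command suggested for -O0 builds.
    print(shlex.join(perf_record_command("my_script.py", dump_size=MAX_DUMP_SIZE)))
```

Starting from the default and raising ``dump_size`` only when Python frames are missing from the report keeps sampling overhead low, in line with the trade-off the warning describes.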
