bpo-46420: Use NOTRACE_DISPATCH() in specialized opcodes #30652

sweeneyde · 2022-01-17T21:41:21Z

Some benchmarks from GCC with --enable-optimizations --with-lto on WSL:

Slower (6):

pickle_dict: 27.2 us +- 0.4 us -> 28.9 us +- 0.4 us: 1.06x slower
unpickle_list: 5.01 us +- 0.11 us -> 5.12 us +- 0.10 us: 1.02x slower
json_dumps: 12.8 ms +- 0.2 ms -> 13.1 ms +- 0.3 ms: 1.02x slower
regex_dna: 208 ms +- 4 ms -> 212 ms +- 2 ms: 1.02x slower
django_template: 41.2 ms +- 2.6 ms -> 42.0 ms +- 0.9 ms: 1.02x slower
pickle_list: 4.57 us +- 0.13 us -> 4.65 us +- 0.16 us: 1.02x slower

Faster (34):

pickle: 12.5 us +- 1.2 us -> 11.3 us +- 1.3 us: 1.10x faster
nbody: 100 ms +- 2 ms -> 92.7 ms +- 2.5 ms: 1.08x faster
deltablue: 4.55 ms +- 0.06 ms -> 4.29 ms +- 0.16 ms: 1.06x faster
spectral_norm: 98.5 ms +- 2.4 ms -> 93.8 ms +- 1.1 ms: 1.05x faster
json_loads: 27.2 us +- 1.1 us -> 26.0 us +- 1.8 us: 1.05x faster
chaos: 80.5 ms +- 0.9 ms -> 77.1 ms +- 1.8 ms: 1.04x faster
hexiom: 7.01 ms +- 0.07 ms -> 6.72 ms +- 0.26 ms: 1.04x faster
nqueens: 87.3 ms +- 1.9 ms -> 83.7 ms +- 2.8 ms: 1.04x faster
crypto_pyaes: 91.0 ms +- 2.2 ms -> 88.0 ms +- 1.3 ms: 1.03x faster
scimark_lu: 112 ms +- 2 ms -> 108 ms +- 4 ms: 1.03x faster
pidigits: 187 ms +- 1 ms -> 182 ms +- 1 ms: 1.03x faster
dulwich_log: 83.8 ms +- 1.4 ms -> 81.5 ms +- 1.1 ms: 1.03x faster
logging_format: 8.19 us +- 0.10 us -> 7.98 us +- 0.14 us: 1.03x faster
logging_simple: 7.24 us +- 0.17 us -> 7.07 us +- 0.13 us: 1.02x faster
go: 152 ms +- 3 ms -> 149 ms +- 2 ms: 1.02x faster
regex_compile: 150 ms +- 3 ms -> 146 ms +- 2 ms: 1.02x faster
scimark_sor: 123 ms +- 3 ms -> 120 ms +- 2 ms: 1.02x faster
fannkuch: 406 ms +- 9 ms -> 398 ms +- 6 ms: 1.02x faster
float: 81.2 ms +- 1.3 ms -> 79.7 ms +- 1.4 ms: 1.02x faster
raytrace: 332 ms +- 8 ms -> 326 ms +- 7 ms: 1.02x faster
tornado_http: 191 ms +- 8 ms -> 188 ms +- 7 ms: 1.02x faster
mako: 11.7 ms +- 0.2 ms -> 11.6 ms +- 0.1 ms: 1.02x faster
scimark_monte_carlo: 72.5 ms +- 1.8 ms -> 71.4 ms +- 0.9 ms: 1.01x faster
unpickle_pure_python: 263 us +- 6 us -> 259 us +- 6 us: 1.01x faster
sympy_expand: 570 ms +- 14 ms -> 562 ms +- 17 ms: 1.01x faster
pickle_pure_python: 351 us +- 7 us -> 346 us +- 8 us: 1.01x faster
sympy_str: 342 ms +- 8 ms -> 338 ms +- 12 ms: 1.01x faster
regex_v8: 23.7 ms +- 0.6 ms -> 23.5 ms +- 0.3 ms: 1.01x faster
pathlib: 20.9 ms +- 0.6 ms -> 20.6 ms +- 0.5 ms: 1.01x faster
regex_effbot: 3.11 ms +- 0.08 ms -> 3.08 ms +- 0.03 ms: 1.01x faster
xml_etree_generate: 79.9 ms +- 1.0 ms -> 79.1 ms +- 1.0 ms: 1.01x faster
chameleon: 7.19 ms +- 0.13 ms -> 7.13 ms +- 0.10 ms: 1.01x faster
2to3: 272 ms +- 2 ms -> 269 ms +- 2 ms: 1.01x faster
xml_etree_process: 57.1 ms +- 0.8 ms -> 56.7 ms +- 0.7 ms: 1.01x faster

Benchmark hidden because not significant (15): logging_silent, meteor_contest, pyflate, python_startup, python_startup_no_site, richards, scimark_fft, scimark_sparse_mat_mult, sympy_integrate, sympy_sum, telco, unpack_sequence, unpickle, xml_etree_parse, xml_etree_iterparse

Geometric mean: 1.01x faster

https://bugs.python.org/issue46420

markshannon · 2022-01-18T14:35:39Z

It is good to see that removing many tracing checks can produce a measurable speedup.
Unfortunately, I don't think this is correct. It is possible for tracing to be turned on during several of these bytecodes.

Most obviously, CALL_NO_KW_BUILTIN_CLASS_1 and CALL_NO_KW_BUILTIN_FAST could end up calling sys.settrace.
More subtly, any allocation can trigger the cycle collector, which can execute arbitrary code.

The approach I want to take, although it needs more work, is to only check for tracing in the RESUME instruction by using instrumentation instead.

See faster-cpython/ideas#112 and PEP 669.
Hopefully, using instrumentation should provide at least the speedup you have observed, and reduce the performance loss when tracing is used.

sweeneyde · 2022-01-18T19:01:01Z

Since STORE_FAST__STORE_FAST calls SETLOCAL, which can call Py_DECREF, which can execute arbitrary __del__ methods, does that mean that existing usage of NOTRACE_DISPATCH is mildly incorrect as well?

markshannon · 2022-01-19T12:29:09Z

Perhaps.
I think it is OK, because the __del__ is not specified to occur at any specific time, or in any particular order.
So, the behavior is the same as if we checked for tracing on every dispatch.

Most of your changes are also probably safe. Probably. It is hard to be sure.
Although I do think the changes to CALL specializations are incorrect.

The instrumentation approach is much more robust. At least, I think it is.

…UILTIN_CLASS_1, CALL_NO_KW_BUILTIN_O, CALL_NO_KW_BUILTIN_FAST, CALL_NO_KW_LEN, CALL_NO_KW_ISINSTANCE. Use NOTRACE_DISPATCH on CALL_NO_KW_LIST_APPEND.

markshannon · 2022-01-25T12:28:02Z

Looks good. All the new uses of NOTRACE_DISPATCH() are consistent with existing usage.

Use NOTRACE_DISPATCH() in specialized opcodes

753f04c

sweeneyde requested a review from markshannon as a code owner January 17, 2022 21:41

the-knights-who-say-ni added the CLA signed label Jan 17, 2022

bedevere-bot added the awaiting review label Jan 17, 2022

sweeneyde added skip news performance Performance or resource usage labels Jan 17, 2022

Use DISPATCH() on BINARY_SUBSCR_DICT, STORE_SUBSCR_DICT, CALL_NO_KW_B…

ad47ec2

…UILTIN_CLASS_1, CALL_NO_KW_BUILTIN_O, CALL_NO_KW_BUILTIN_FAST, CALL_NO_KW_LEN, CALL_NO_KW_ISINSTANCE. Use NOTRACE_DISPATCH on CALL_NO_KW_LIST_APPEND.

markshannon merged commit 96bf84d into python:main Jan 25, 2022

bedevere-bot removed the awaiting review label Jan 25, 2022

sweeneyde deleted the notrace branch January 25, 2022 18:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

bpo-46420: Use NOTRACE_DISPATCH() in specialized opcodes #30652

bpo-46420: Use NOTRACE_DISPATCH() in specialized opcodes #30652

Uh oh!

sweeneyde commented Jan 17, 2022 •

edited by bedevere-bot

Loading

Uh oh!

markshannon commented Jan 18, 2022

Uh oh!

sweeneyde commented Jan 18, 2022

Uh oh!

markshannon commented Jan 19, 2022

Uh oh!

markshannon commented Jan 25, 2022

Uh oh!

Uh oh!

Uh oh!

bpo-46420: Use NOTRACE_DISPATCH() in specialized opcodes #30652

bpo-46420: Use NOTRACE_DISPATCH() in specialized opcodes #30652

Uh oh!

Conversation

sweeneyde commented Jan 17, 2022 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

markshannon commented Jan 18, 2022

Uh oh!

sweeneyde commented Jan 18, 2022

Uh oh!

markshannon commented Jan 19, 2022

Uh oh!

markshannon commented Jan 25, 2022

Uh oh!

Uh oh!

sweeneyde commented Jan 17, 2022 •

edited by bedevere-bot

Loading