bpo-45636: Simpler design for `BINARY_OP`/`INPLACE_OP` #29418

brandtbucher · 2021-11-05T05:06:49Z

No perf impact:

Benchmark	main	binary-op-alt
nbody	118 ms	113 ms: 1.05x faster
logging_silent	118 ns	114 ns: 1.04x faster
xml_etree_iterparse	109 ms	106 ms: 1.03x faster
go	172 ms	167 ms: 1.02x faster
richards	57.8 ms	56.8 ms: 1.02x faster
dulwich_log	67.7 ms	66.7 ms: 1.01x faster
telco	6.33 ms	6.24 ms: 1.01x faster
pickle_pure_python	369 us	364 us: 1.01x faster
scimark_sparse_mat_mult	4.74 ms	4.69 ms: 1.01x faster
hexiom	7.53 ms	7.44 ms: 1.01x faster
meteor_contest	107 ms	106 ms: 1.01x faster
xml_etree_parse	155 ms	154 ms: 1.01x faster
pickle_list	4.58 us	4.56 us: 1.01x faster
regex_compile	145 ms	144 ms: 1.00x faster
nqueens	88.5 ms	88.1 ms: 1.00x faster
sympy_sum	170 ms	169 ms: 1.00x faster
python_startup	8.53 ms	8.52 ms: 1.00x faster
python_startup_no_site	5.90 ms	5.91 ms: 1.00x slower
pickle	9.90 us	9.94 us: 1.00x slower
pidigits	194 ms	195 ms: 1.00x slower
fannkuch	417 ms	419 ms: 1.01x slower
json_dumps	12.5 ms	12.5 ms: 1.01x slower
chaos	76.5 ms	76.9 ms: 1.01x slower
unpickle	13.4 us	13.5 us: 1.01x slower
pyflate	534 ms	538 ms: 1.01x slower
mako	11.9 ms	12.0 ms: 1.01x slower
json_loads	25.4 us	25.7 us: 1.01x slower
logging_format	6.73 us	6.79 us: 1.01x slower
logging_simple	6.06 us	6.13 us: 1.01x slower
chameleon	7.53 ms	7.62 ms: 1.01x slower
scimark_fft	347 ms	352 ms: 1.01x slower
regex_dna	216 ms	219 ms: 1.01x slower
pathlib	18.7 ms	19.0 ms: 1.02x slower
unpickle_list	4.95 us	5.05 us: 1.02x slower
spectral_norm	107 ms	110 ms: 1.03x slower
scimark_monte_carlo	78.0 ms	80.4 ms: 1.03x slower
regex_v8	23.2 ms	24.2 ms: 1.04x slower
pickle_dict	27.5 us	28.7 us: 1.04x slower
regex_effbot	3.18 ms	3.34 ms: 1.05x slower
unpack_sequence	44.0 ns	48.8 ns: 1.11x slower
Geometric mean	(ref)	1.00x slower

https://bugs.python.org/issue45636

Objects/abstract.c

brandtbucher · 2021-11-05T17:39:27Z

Currently re-running benchmarks, for my own sanity…

brandtbucher · 2021-11-05T18:42:21Z

I'm glad I did. It seems that that single change contributes to the slowdown we saw on the other PR:

Slower (37):
- crypto_pyaes: 87.5 ms +- 0.6 ms -> 91.3 ms +- 1.1 ms: 1.04x slower
- pickle_dict: 26.8 us +- 0.1 us -> 27.9 us +- 4.1 us: 1.04x slower
- go: 165 ms +- 1 ms -> 171 ms +- 2 ms: 1.04x slower
- richards: 56.9 ms +- 0.9 ms -> 58.8 ms +- 0.7 ms: 1.03x slower
- chameleon: 7.48 ms +- 0.10 ms -> 7.71 ms +- 0.07 ms: 1.03x slower
- regex_compile: 144 ms +- 2 ms -> 148 ms +- 1 ms: 1.03x slower
- spectral_norm: 107 ms +- 1 ms -> 110 ms +- 2 ms: 1.02x slower
- tornado_http: 109 ms +- 3 ms -> 111 ms +- 3 ms: 1.02x slower
- scimark_monte_carlo: 77.5 ms +- 0.8 ms -> 79.1 ms +- 1.4 ms: 1.02x slower
- chaos: 76.7 ms +- 0.7 ms -> 78.3 ms +- 2.5 ms: 1.02x slower
- deltablue: 4.85 ms +- 0.06 ms -> 4.95 ms +- 0.07 ms: 1.02x slower
- nbody: 115 ms +- 2 ms -> 118 ms +- 3 ms: 1.02x slower
- sympy_expand: 501 ms +- 5 ms -> 510 ms +- 5 ms: 1.02x slower
- float: 80.5 ms +- 0.9 ms -> 81.8 ms +- 0.9 ms: 1.02x slower
- json_loads: 25.4 us +- 0.2 us -> 25.8 us +- 0.2 us: 1.02x slower
- pickle_pure_python: 367 us +- 3 us -> 373 us +- 4 us: 1.02x slower
- regex_dna: 212 ms +- 1 ms -> 215 ms +- 2 ms: 1.01x slower
- pyflate: 539 ms +- 9 ms -> 545 ms +- 4 ms: 1.01x slower
- unpickle_pure_python: 272 us +- 3 us -> 275 us +- 3 us: 1.01x slower
- sqlite_synth: 2.74 us +- 0.04 us -> 2.77 us +- 0.07 us: 1.01x slower
- sympy_integrate: 22.0 ms +- 0.2 ms -> 22.2 ms +- 0.1 ms: 1.01x slower
- django_template: 37.2 ms +- 0.5 ms -> 37.6 ms +- 0.4 ms: 1.01x slower
- scimark_sparse_mat_mult: 4.84 ms +- 0.08 ms -> 4.89 ms +- 0.14 ms: 1.01x slower
- xml_etree_iterparse: 107 ms +- 3 ms -> 108 ms +- 3 ms: 1.01x slower
- pickle_list: 4.48 us +- 0.04 us -> 4.52 us +- 0.03 us: 1.01x slower
- sympy_str: 303 ms +- 4 ms -> 306 ms +- 3 ms: 1.01x slower
- meteor_contest: 106 ms +- 2 ms -> 106 ms +- 2 ms: 1.01x slower
- unpickle_list: 5.05 us +- 0.04 us -> 5.08 us +- 0.07 us: 1.01x slower
- xml_etree_process: 59.1 ms +- 0.6 ms -> 59.5 ms +- 0.4 ms: 1.01x slower
- xml_etree_generate: 80.4 ms +- 0.7 ms -> 80.9 ms +- 1.0 ms: 1.01x slower
- 2to3: 271 ms +- 1 ms -> 273 ms +- 1 ms: 1.01x slower
- hexiom: 7.53 ms +- 0.12 ms -> 7.57 ms +- 0.07 ms: 1.01x slower
- sympy_sum: 170 ms +- 2 ms -> 171 ms +- 2 ms: 1.01x slower
- dulwich_log: 67.6 ms +- 0.3 ms -> 67.9 ms +- 0.4 ms: 1.00x slower
- regex_v8: 23.2 ms +- 0.1 ms -> 23.3 ms +- 0.2 ms: 1.00x slower
- python_startup: 8.51 ms +- 0.01 ms -> 8.54 ms +- 0.01 ms: 1.00x slower
- python_startup_no_site: 5.90 ms +- 0.00 ms -> 5.91 ms +- 0.00 ms: 1.00x slower

Faster (6):
- scimark_fft: 355 ms +- 10 ms -> 346 ms +- 2 ms: 1.03x faster
- fannkuch: 427 ms +- 4 ms -> 417 ms +- 2 ms: 1.02x faster
- pidigits: 190 ms +- 0 ms -> 187 ms +- 0 ms: 1.01x faster
- json_dumps: 12.6 ms +- 0.1 ms -> 12.5 ms +- 0.1 ms: 1.01x faster
- pathlib: 19.0 ms +- 0.3 ms -> 18.8 ms +- 0.2 ms: 1.01x faster
- mako: 11.9 ms +- 0.1 ms -> 11.9 ms +- 0.1 ms: 1.00x faster

Benchmark hidden because not significant (15): logging_format, logging_silent, logging_simple, nqueens, pickle, raytrace, regex_effbot, scimark_lu, scimark_sor, sqlalchemy_declarative, sqlalchemy_imperative, telco, unpack_sequence, unpickle, xml_etree_parse

Geometric mean: 1.01x slower

It's not clear to me why. Perhaps int is that much faster than uint16_t?

arhadthedev · 2021-11-07T09:11:00Z

@brandtbucher Maybe the change created a branching point with random targets, so CPU branch predictor just gives up with prefetching. A profiler will help to find exact lines with the longest execution time.

markshannon

I suspect the awkward size of nb_name_info may be responsible for most of the slowdown.

markshannon · 2021-11-08T11:10:54Z

Objects/abstract.c

+#undef NB_NAME_INFO
+
+PyObject *
+_PyNumber_Op(PyObject *o1, PyObject *o2, unsigned op)


This just forwards to binary_op. You might want to consider:

Make this the function that does the work (effectively inlining binary_op here).

Make the old API PyNumber_TrueDivide, etc. call _PyNumber_Op

Delete binary_op.

Objects/abstract.c

markshannon · 2021-11-08T11:34:27Z

Objects/abstract.c

+}
+
+PyObject *
+_PyNumber_InPlaceOp(PyObject *o1, PyObject *o2, unsigned op)


Same comment as for _PyNumber_Op.

bedevere-bot · 2021-11-08T11:34:46Z

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

brandtbucher · 2021-11-09T01:08:03Z

See my new (preferred) approach in #29482.

brandtbucher · 2021-11-09T02:27:52Z

You were right @markshannon: adjusting the alignment fixes the slowdown. Nice catch.

Slower (25):
- pickle_dict: 26.8 us +- 0.1 us -> 28.0 us +- 0.1 us: 1.04x slower
- go: 165 ms +- 1 ms -> 172 ms +- 2 ms: 1.04x slower
- regex_effbot: 3.15 ms +- 0.03 ms -> 3.25 ms +- 0.04 ms: 1.03x slower
- regex_dna: 212 ms +- 1 ms -> 219 ms +- 1 ms: 1.03x slower
- scimark_lu: 138 ms +- 3 ms -> 143 ms +- 5 ms: 1.03x slower
- float: 80.5 ms +- 0.9 ms -> 82.8 ms +- 1.1 ms: 1.03x slower
- logging_format: 6.73 us +- 0.09 us -> 6.89 us +- 0.08 us: 1.02x slower
- chameleon: 7.48 ms +- 0.10 ms -> 7.64 ms +- 0.09 ms: 1.02x slower
- regex_v8: 23.2 ms +- 0.1 ms -> 23.7 ms +- 0.4 ms: 1.02x slower
- tornado_http: 109 ms +- 3 ms -> 111 ms +- 3 ms: 1.02x slower
- scimark_monte_carlo: 77.5 ms +- 0.8 ms -> 79.0 ms +- 1.3 ms: 1.02x slower
- nqueens: 90.1 ms +- 0.7 ms -> 91.3 ms +- 1.0 ms: 1.01x slower
- django_template: 37.2 ms +- 0.5 ms -> 37.7 ms +- 0.6 ms: 1.01x slower
- sympy_expand: 501 ms +- 5 ms -> 507 ms +- 3 ms: 1.01x slower
- sqlalchemy_imperative: 18.6 ms +- 0.5 ms -> 18.8 ms +- 0.6 ms: 1.01x slower
- richards: 56.9 ms +- 0.9 ms -> 57.6 ms +- 1.4 ms: 1.01x slower
- regex_compile: 144 ms +- 2 ms -> 146 ms +- 2 ms: 1.01x slower
- unpack_sequence: 43.7 ns +- 0.9 ns -> 44.2 ns +- 1.1 ns: 1.01x slower
- xml_etree_generate: 80.4 ms +- 0.7 ms -> 81.2 ms +- 0.8 ms: 1.01x slower
- json_loads: 25.4 us +- 0.2 us -> 25.6 us +- 0.2 us: 1.01x slower
- mako: 11.9 ms +- 0.1 ms -> 12.0 ms +- 0.0 ms: 1.01x slower
- pickle_list: 4.48 us +- 0.04 us -> 4.52 us +- 0.04 us: 1.01x slower
- scimark_sparse_mat_mult: 4.84 ms +- 0.08 ms -> 4.88 ms +- 0.05 ms: 1.01x slower
- logging_simple: 6.15 us +- 0.08 us -> 6.19 us +- 0.07 us: 1.01x slower
- unpickle_list: 5.05 us +- 0.04 us -> 5.07 us +- 0.05 us: 1.00x slower

Faster (15):
- scimark_fft: 355 ms +- 10 ms -> 345 ms +- 2 ms: 1.03x faster
- fannkuch: 427 ms +- 4 ms -> 419 ms +- 4 ms: 1.02x faster
- xml_etree_iterparse: 107 ms +- 3 ms -> 105 ms +- 2 ms: 1.02x faster
- telco: 6.38 ms +- 0.25 ms -> 6.28 ms +- 0.11 ms: 1.02x faster
- pickle: 9.97 us +- 0.16 us -> 9.84 us +- 0.11 us: 1.01x faster
- pidigits: 190 ms +- 0 ms -> 188 ms +- 0 ms: 1.01x faster
- xml_etree_parse: 155 ms +- 3 ms -> 153 ms +- 1 ms: 1.01x faster
- crypto_pyaes: 87.5 ms +- 0.6 ms -> 86.6 ms +- 1.1 ms: 1.01x faster
- hexiom: 7.53 ms +- 0.12 ms -> 7.47 ms +- 0.06 ms: 1.01x faster
- raytrace: 331 ms +- 2 ms -> 329 ms +- 1 ms: 1.01x faster
- json_dumps: 12.6 ms +- 0.1 ms -> 12.5 ms +- 0.1 ms: 1.01x faster
- spectral_norm: 107 ms +- 1 ms -> 107 ms +- 1 ms: 1.01x faster
- python_startup: 8.51 ms +- 0.01 ms -> 8.49 ms +- 0.01 ms: 1.00x faster
- dulwich_log: 67.6 ms +- 0.3 ms -> 67.4 ms +- 0.6 ms: 1.00x faster
- python_startup_no_site: 5.90 ms +- 0.00 ms -> 5.89 ms +- 0.00 ms: 1.00x faster

Benchmark hidden because not significant (18): 2to3, chaos, deltablue, logging_silent, meteor_contest, nbody, pathlib, pickle_pure_python, pyflate, scimark_sor, sqlalchemy_declarative, sqlite_synth, sympy_integrate, sympy_sum, sympy_str, unpickle, unpickle_pure_python, xml_etree_process

Geometric mean: 1.00x slower

brandtbucher · 2021-11-11T02:26:08Z

Closing in favor of #29482.

brandtbucher added 8 commits October 27, 2021 13:16

Merge BINARY_*/INPLACE_* into BINARY_OP/INPLACE_OP

4b7125f

Clean things up a bit

3a87083

Improve the new _PyNumber_*Op implementations

a2df29a

Add docs

1b8934e

blurb add

3137973

Fix API leak

9905549

Clean up the nb_info struct

f697036

Clean up some comments

33f6e7c

brandtbucher requested a review from markshannon as a code owner November 5, 2021 05:06

the-knights-who-say-ni added the CLA signed label Nov 5, 2021

bedevere-bot added the awaiting core review label Nov 5, 2021

brandtbucher mentioned this pull request Nov 5, 2021

bpo-45636: Merge BINARY_*/INPLACE_* into BINARY_OP/INPLACE_OP #29255

Closed

markshannon reviewed Nov 5, 2021

View reviewed changes

Objects/abstract.c Outdated Show resolved Hide resolved

Split up operator slots and names

bd8e9c7

HannahGraham23 approved these changes Nov 6, 2021

View reviewed changes

python deleted a comment from HannahGraham23 Nov 6, 2021

markshannon self-assigned this Nov 8, 2021

markshannon requested changes Nov 8, 2021

View reviewed changes

bedevere-bot added awaiting changes and removed awaiting core review labels Nov 8, 2021

brandtbucher added 2 commits November 8, 2021 16:16

Apply feedback from code review

4cac17a

Fix comment

3c2957b

brandtbucher mentioned this pull request Nov 9, 2021

bpo-45636: BINARY_OP (third time's the charm) #29482

Merged

brandtbucher closed this Nov 11, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

bpo-45636: Simpler design for `BINARY_OP`/`INPLACE_OP` #29418

bpo-45636: Simpler design for `BINARY_OP`/`INPLACE_OP` #29418

Uh oh!

brandtbucher commented Nov 5, 2021 •

edited

Loading

Uh oh!

Uh oh!

brandtbucher commented Nov 5, 2021

Uh oh!

brandtbucher commented Nov 5, 2021

Uh oh!

arhadthedev commented Nov 7, 2021

Uh oh!

markshannon left a comment

Uh oh!

markshannon Nov 8, 2021

Uh oh!

Uh oh!

markshannon Nov 8, 2021

Uh oh!

bedevere-bot commented Nov 8, 2021

Uh oh!

brandtbucher commented Nov 9, 2021

Uh oh!

brandtbucher commented Nov 9, 2021

Uh oh!

brandtbucher commented Nov 11, 2021

Uh oh!

Uh oh!

Uh oh!

bpo-45636: Simpler design for BINARY_OP/INPLACE_OP #29418

bpo-45636: Simpler design for BINARY_OP/INPLACE_OP #29418

Uh oh!

Conversation

brandtbucher commented Nov 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

brandtbucher commented Nov 5, 2021

Uh oh!

brandtbucher commented Nov 5, 2021

Uh oh!

arhadthedev commented Nov 7, 2021

Uh oh!

markshannon left a comment

Choose a reason for hiding this comment

Uh oh!

markshannon Nov 8, 2021

Choose a reason for hiding this comment

Uh oh!

Uh oh!

markshannon Nov 8, 2021

Choose a reason for hiding this comment

Uh oh!

bedevere-bot commented Nov 8, 2021

Uh oh!

brandtbucher commented Nov 9, 2021

Uh oh!

brandtbucher commented Nov 9, 2021

Uh oh!

brandtbucher commented Nov 11, 2021

Uh oh!

Uh oh!

bpo-45636: Simpler design for `BINARY_OP`/`INPLACE_OP` #29418

bpo-45636: Simpler design for `BINARY_OP`/`INPLACE_OP` #29418

brandtbucher commented Nov 5, 2021 •

edited

Loading