
bpo-45636: Merge BINARY_*/INPLACE_* into BINARY_OP/INPLACE_OP #29255


Closed
wants to merge 12 commits

Conversation


@brandtbucher brandtbucher commented Oct 27, 2021

@markshannon
Member

Looks good in general. A couple of minor issues.

@brandtbucher
Member Author

brandtbucher commented Oct 29, 2021

Thinking about this a bit more... I'd like to merge this as-is, then open a follow-up PR with the more complex implementation that performs fewer memory reads. I want to at least see some movement on the benchmarks before committing to that.
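To make the shape of the change concrete, here is a small illustrative sketch of the dispatch-table idea behind merging the `BINARY_*`/`INPLACE_*` opcodes into a single `BINARY_OP`: instead of one opcode per operator, one opcode selects the operation via its oparg. This is a hypothetical Python model, not the actual `ceval.c` change (which dispatches through a C table of `binaryfunc` pointers):

```python
import operator

# Hypothetical model of BINARY_OP dispatch: the oparg indexes a table of
# operations, so one opcode handler replaces BINARY_ADD, BINARY_SUBTRACT, etc.
NB_ADD, NB_SUBTRACT, NB_MULTIPLY, NB_TRUE_DIVIDE = range(4)

BINARY_OPS = {
    NB_ADD: operator.add,
    NB_SUBTRACT: operator.sub,
    NB_MULTIPLY: operator.mul,
    NB_TRUE_DIVIDE: operator.truediv,
}

def binary_op(oparg, lhs, rhs):
    """One generic handler: look up the operation by oparg and apply it."""
    return BINARY_OPS[oparg](lhs, rhs)

print(binary_op(NB_ADD, 2, 3))       # 5
print(binary_op(NB_MULTIPLY, 4, 5))  # 20
```

The "fewer memory reads" variant mentioned above would avoid the extra table lookup per instruction; the sketch shows only the simpler table-driven form.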

@markshannon markshannon self-assigned this Oct 31, 2021
@brandtbucher brandtbucher added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 31, 2021
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @brandtbucher for commit befdc3f 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 31, 2021
@markshannon markshannon left a comment
Member


A couple more things

@bedevere-bot

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@brandtbucher
Member Author

brandtbucher commented Nov 1, 2021

The latest changes seem to have introduced a ~1% performance regression (the initial PR was "1.00x slower"). Many of the biggest slowdowns seem to be in historically sensitive benchmarks, though:

Slower (33):
- pidigits: 190 ms +- 0 ms -> 209 ms +- 0 ms: 1.10x slower
- pickle_dict: 26.8 us +- 0.1 us -> 28.6 us +- 0.1 us: 1.07x slower
- scimark_sparse_mat_mult: 4.84 ms +- 0.08 ms -> 5.06 ms +- 0.10 ms: 1.04x slower
- unpack_sequence: 43.7 ns +- 0.9 ns -> 45.7 ns +- 0.6 ns: 1.04x slower
- pickle: 9.97 us +- 0.16 us -> 10.4 us +- 0.1 us: 1.04x slower
- nbody: 115 ms +- 2 ms -> 118 ms +- 3 ms: 1.03x slower
- regex_dna: 212 ms +- 1 ms -> 217 ms +- 1 ms: 1.02x slower
- json_loads: 25.4 us +- 0.2 us -> 26.0 us +- 0.2 us: 1.02x slower
- json_dumps: 12.6 ms +- 0.1 ms -> 12.9 ms +- 0.3 ms: 1.02x slower
- regex_effbot: 3.15 ms +- 0.03 ms -> 3.21 ms +- 0.05 ms: 1.02x slower
- chaos: 76.7 ms +- 0.7 ms -> 78.3 ms +- 1.4 ms: 1.02x slower
- go: 165 ms +- 1 ms -> 168 ms +- 1 ms: 1.02x slower
- pyflate: 539 ms +- 9 ms -> 549 ms +- 7 ms: 1.02x slower
- spectral_norm: 107 ms +- 1 ms -> 109 ms +- 0 ms: 1.02x slower
- float: 80.5 ms +- 0.9 ms -> 81.9 ms +- 0.8 ms: 1.02x slower
- richards: 56.9 ms +- 0.9 ms -> 57.8 ms +- 0.5 ms: 1.01x slower
- scimark_monte_carlo: 77.5 ms +- 0.8 ms -> 78.7 ms +- 0.9 ms: 1.01x slower
- pickle_pure_python: 367 us +- 3 us -> 372 us +- 5 us: 1.01x slower
- regex_compile: 144 ms +- 2 ms -> 146 ms +- 2 ms: 1.01x slower
- deltablue: 4.85 ms +- 0.06 ms -> 4.91 ms +- 0.04 ms: 1.01x slower
- sympy_expand: 501 ms +- 5 ms -> 507 ms +- 4 ms: 1.01x slower
- scimark_lu: 138 ms +- 3 ms -> 140 ms +- 3 ms: 1.01x slower
- scimark_sor: 154 ms +- 2 ms -> 156 ms +- 3 ms: 1.01x slower
- logging_silent: 116 ns +- 3 ns -> 117 ns +- 1 ns: 1.01x slower
- tornado_http: 109 ms +- 3 ms -> 110 ms +- 3 ms: 1.01x slower
- pickle_list: 4.48 us +- 0.04 us -> 4.52 us +- 0.03 us: 1.01x slower
- regex_v8: 23.2 ms +- 0.1 ms -> 23.3 ms +- 0.3 ms: 1.01x slower
- unpickle_pure_python: 272 us +- 3 us -> 273 us +- 2 us: 1.01x slower
- unpickle_list: 5.05 us +- 0.04 us -> 5.08 us +- 0.07 us: 1.01x slower
- raytrace: 331 ms +- 2 ms -> 333 ms +- 3 ms: 1.00x slower
- sympy_integrate: 22.0 ms +- 0.2 ms -> 22.0 ms +- 0.1 ms: 1.00x slower
- python_startup: 8.51 ms +- 0.01 ms -> 8.53 ms +- 0.01 ms: 1.00x slower
- python_startup_no_site: 5.90 ms +- 0.00 ms -> 5.91 ms +- 0.00 ms: 1.00x slower

Faster (13):
- pathlib: 19.0 ms +- 0.3 ms -> 18.6 ms +- 0.1 ms: 1.02x faster
- fannkuch: 427 ms +- 4 ms -> 418 ms +- 4 ms: 1.02x faster
- xml_etree_iterparse: 107 ms +- 3 ms -> 105 ms +- 1 ms: 1.02x faster
- logging_simple: 6.15 us +- 0.08 us -> 6.08 us +- 0.07 us: 1.01x faster
- xml_etree_parse: 155 ms +- 3 ms -> 153 ms +- 2 ms: 1.01x faster
- scimark_fft: 355 ms +- 10 ms -> 351 ms +- 2 ms: 1.01x faster
- sqlalchemy_declarative: 146 ms +- 3 ms -> 144 ms +- 3 ms: 1.01x faster
- crypto_pyaes: 87.5 ms +- 0.6 ms -> 86.8 ms +- 0.5 ms: 1.01x faster
- hexiom: 7.53 ms +- 0.12 ms -> 7.48 ms +- 0.08 ms: 1.01x faster
- dulwich_log: 67.6 ms +- 0.3 ms -> 67.3 ms +- 0.4 ms: 1.00x faster
- sympy_sum: 170 ms +- 2 ms -> 169 ms +- 2 ms: 1.00x faster
- mako: 11.9 ms +- 0.1 ms -> 11.9 ms +- 0.1 ms: 1.00x faster
- 2to3: 271 ms +- 1 ms -> 271 ms +- 1 ms: 1.00x faster

Benchmark hidden because not significant (12): chameleon, django_template, logging_format, meteor_contest, nqueens, sqlalchemy_imperative, sqlite_synth, sympy_str, telco, unpickle, xml_etree_generate, xml_etree_process

Geometric mean: 1.01x slower
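For reference, the geometric mean that pyperf reports is the nth root of the product of the per-benchmark new/old timing ratios. A minimal sketch with illustrative numbers (not the actual 58-benchmark run above):

```python
import math

# Geometric mean of new/old timing ratios. Illustrative values only;
# pyperf aggregates every benchmark, significant or not.
ratios = [1.10, 1.04, 1.02, 0.98, 0.99, 1.01]

geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
print(f"{geo_mean:.2f}x {'slower' if geo_mean > 1 else 'faster'}")
```

With these sample ratios it prints "1.02x slower"; a few large outliers like pidigits can dominate many small wins, which is why the overall mean lands at 1.01x slower despite 13 benchmarks improving.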

@markshannon
Member

I can confirm the slowdown, which is a bit puzzling.
If your original design is faster, then maybe we should go with that. It is a bit cleaner.
Ultimately I expect specialization to make the performance of the base version unimportant.
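The specialization referred to here (PEP 659-style adaptive instructions) replaces the generic handler with a type-specialized fast path after observing operand types at run time. A toy Python model of the idea, with hypothetical names and none of CPython's actual machinery:

```python
# Toy model of an adaptive instruction: start generic, specialize for
# int + int once that case is observed, and deoptimize on a type mismatch.
class AdaptiveAdd:
    def __init__(self):
        self.handler = self._generic

    def _generic(self, lhs, rhs):
        if type(lhs) is int and type(rhs) is int:
            self.handler = self._add_int  # specialize for the observed types
        return lhs + rhs

    def _add_int(self, lhs, rhs):
        if type(lhs) is not int or type(rhs) is not int:
            self.handler = self._generic  # type guard failed: deoptimize
            return lhs + rhs
        return lhs + rhs  # fast path, skipping generic dispatch

    def __call__(self, lhs, rhs):
        return self.handler(lhs, rhs)

add = AdaptiveAdd()
add(1, 2)   # generic call: observes int + int and specializes
add(3, 4)   # now runs the specialized fast path
```

Under this scheme the cost of the generic `BINARY_OP` table lookup matters mainly for cold code, which is the point being made about the base version's performance.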

@brandtbucher
Member Author

Closing in favor of #29418, which is based on an earlier commit on this branch (and has no perf impact).

4 participants