bpo-36957: Speed up math.isqrt #13405

mdickinson · 2019-05-18T15:46:47Z

Speed up math.isqrt by:

- Adding a fast path for inputs smaller than 2**64.
- Performing the first 5 iterations entirely in C integer arithmetic, rather than Python long integer arithmetic, for larger inputs.
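
For readers following along, here is a minimal sketch of how a fast path for inputs below 2**64 could look in plain C. The names `approx_isqrt_u64` and `isqrt_u64` are hypothetical and the bit-length loop is a simple stand-in; the shift amounts follow the snippets quoted in the review below, but this is an illustration under those assumptions, not the code that was merged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: approximate square root of n for 2**62 <= n < 2**64.
   Returns a value a with (a - 1)**2 < n < (a + 1)**2. The first four steps
   fit in 32 bits; the last one is evaluated in 64 bits. */
static uint64_t
approx_isqrt_u64(uint64_t n)
{
    assert(n >> 62 != 0);  /* precondition: one of the top two bits is set */
    uint32_t u = 1U + (uint32_t)(n >> 62);
    u = (u << 1) + (uint32_t)((n >> 59) / u);
    u = (u << 3) + (uint32_t)((n >> 53) / u);
    u = (u << 7) + (uint32_t)((n >> 41) / u);
    return ((uint64_t)u << 15) + (n >> 17) / u;
}

/* Hypothetical fast path: floor of the square root of any n < 2**64. */
static uint32_t
isqrt_u64(uint64_t n)
{
    if (n == 0) {
        return 0;
    }
    /* c = (bit_length(n) - 1) / 2, computed with a plain loop. */
    int bits = 0;
    for (uint64_t t = n; t != 0; t >>= 1) {
        bits++;
    }
    int c = (bits - 1) / 2;
    int shift = 31 - c;

    /* Normalise n into [2**62, 2**64), approximate, then scale back. */
    uint64_t a = approx_isqrt_u64(n << (2 * shift)) >> shift;

    /* a is either the answer or one too large. Comparing a*a - 1 >= n also
       covers a == 2**32, where a*a wraps around to 0. */
    if (a * a - 1U >= n) {
        a--;
    }
    return (uint32_t)a;
}
```

The normalisation step lets a single fixed sequence of five divisions serve every nonzero 64-bit input, so no loop over iterations is needed.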

https://bugs.python.org/issue36957

mdickinson · 2019-05-18T15:48:59Z

No news entry needed, since math.isqrt hasn't yet appeared in any released version of Python.

serhiy-storchaka · 2019-05-19T11:19:31Z

Modules/mathmodule.c

+
+    /* The first 4 steps can be performed entirely in 32-bit arithmetic;
+        the last needs 64-bit arithmetic. */
+    u = 1U + (uint32_t)(m >> 62);


Could this code be shared?

if (fastpath) {
    // init fast path
}
else {
    // init general path
}
// common code
if (fastpath) {
    // return fast path
}
// general path

With new shifting functions this may be clearer.

Right, I considered this, but in the end I decided it was cleaner to keep the two paths separate.

I'll try the refactoring and see how it looks.

Sharing is a bit ugly, because the last line has to be different in the two cases:

u = (u << 15) + (uint32_t)((m >> 17) / u);

versus

v = (u << 15) + ((m >> 17) / u);

Why not always use the second one? In any case you cast u to 64-bit for the final check. The overhead is unlikely to be noticeable.

Updated to share via a helper function.
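
The two forms of the last line quoted above can be compared side by side. The sketch below assumes `u` is a `uint32_t` in the first form and the result is a `uint64_t` in the second; the function names are hypothetical. It shows only the arithmetic of the two forms, which agree except when the full result needs 33 bits.

```c
#include <assert.h>
#include <stdint.h>

/* First four steps, common to both variants; requires 2**62 <= n < 2**64. */
static uint32_t
first_four_steps(uint64_t n)
{
    assert(n >> 62 != 0);
    uint32_t u = 1U + (uint32_t)(n >> 62);
    u = (u << 1) + (uint32_t)((n >> 59) / u);
    u = (u << 3) + (uint32_t)((n >> 53) / u);
    u = (u << 7) + (uint32_t)((n >> 41) / u);
    return u;
}

/* Last step truncated to 32 bits, as in the first quoted form. */
static uint32_t
last_step_32(uint64_t n)
{
    uint32_t u = first_four_steps(n);
    return (u << 15) + (uint32_t)((n >> 17) / u);
}

/* Last step kept in 64 bits, as in the second quoted form. */
static uint64_t
last_step_64(uint64_t n)
{
    uint32_t u = first_four_steps(n);
    return ((uint64_t)u << 15) + (n >> 17) / u;
}
```

For `n = 2**62` both variants return `2**31`. For `n = 2**64 - 1` the 64-bit variant returns `2**32`, which the 32-bit variant wraps to `0`, so a single shared helper is simplest if it returns `uint64_t`.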

Modules/mathmodule.c

serhiy-storchaka · 2019-05-19T13:16:40Z

Modules/mathmodule.c

+uint64_t
+_approximate_isqrt(uint64_t n)
+{
+    uint32_t u = 1U + (n >> 62);


u = 1U + (uint32_t)(n >> 62)?

I removed the uint32_t casts on the basis that it makes the code look cleaner and the compiler will likely optimise them away anyway, given that the target type is uint32_t.
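
As a small check of that reasoning, the sketch below writes the first four steps with and without the explicit casts. In C, assigning a 64-bit unsigned value to a `uint32_t` reduces it modulo 2**32, which is what the explicit cast does as well, so the two versions should agree on every input. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Without casts: each right-hand side is evaluated in 64 bits and
   converted to uint32_t on assignment. Requires 2**62 <= n < 2**64. */
static uint32_t
steps_implicit(uint64_t n)
{
    assert(n >> 62 != 0);
    uint32_t u = 1U + (n >> 62);
    u = (u << 1) + (n >> 59) / u;
    u = (u << 3) + (n >> 53) / u;
    u = (u << 7) + (n >> 41) / u;
    return u;
}

/* With casts: each quotient is narrowed to 32 bits before the addition. */
static uint32_t
steps_explicit(uint64_t n)
{
    assert(n >> 62 != 0);
    uint32_t u = 1U + (uint32_t)(n >> 62);
    u = (u << 1) + (uint32_t)((n >> 59) / u);
    u = (u << 3) + (uint32_t)((n >> 53) / u);
    u = (u << 7) + (uint32_t)((n >> 41) / u);
    return u;
}
```

Whether the compiler then narrows the divisions themselves is a separate question that depends on the compiler and optimisation level.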

Just double checked: at least on my machine (macOS 10.14.5, clang from the OS), with -O3, the generated assembly includes one divb, two divls and one divq, so it looks as though the compiler's smart enough to optimise those divisions.

I have to confess I was a bit surprised by the divb.

And the second division could actually be done using 16-bit arithmetic. I don't know whether it's worth trying to persuade the compiler to do that. In theory these division instructions could be a bottleneck, but in practice I suspect that all the overhead of the function call, argument passing, etc. will likely outweigh any advantage from optimising one division.

Completely agree!

serhiy-storchaka · 2019-05-19T13:23:48Z

Modules/mathmodule.c

+    if (m == (uint64_t)(-1) && PyErr_Occurred()) {
+        return NULL;
+    }
+    u = _approximate_isqrt(m);


The above 5 lines (or 4 if we exclude Py_DECREF) could be shared too. Not sure that they should be.

True. I think I'd rather keep _approximate_isqrt as a pure uint64_t to uint64_t function, though.

mdickinson · 2019-05-19T15:51:23Z

@serhiy-storchaka This PR has changed substantially since you approved it. Are you still okay with me merging it?

serhiy-storchaka

If you had not created this PR, I would have created a similar one myself. 😀

bedevere-bot · 2019-05-19T16:51:59Z

@mdickinson: Please replace # with GH- in the commit message next time. Thanks!

mdickinson · 2019-05-19T16:52:28Z

@serhiy-storchaka Again, thank you for the thorough review.

mdickinson and others added 17 commits May 11, 2019 14:56

Add math.isqrt function computing the integer square root.

a95c18f

Code cleanup: remove redundant comments, rename some variables.

9fe4674

Tighten up code a bit more; use Py_XDECREF to simplify error handling.

e735bda

Update Modules/mathmodule.c

91c8dd9

Co-Authored-By: Serhiy Storchaka <[email protected]>

Update Modules/mathmodule.c

ad86923

Use real argument clinic type instead of an alias Co-Authored-By: Serhiy Storchaka <[email protected]>

Add proof sketch

505ebc3

Merge remote-tracking branch 'mdickinson/math-isqrt' into math-isqrt

d2cec04

Updates from review.

be9ba01

Correct and expand documentation.

c30b53c

Fix bad reference handling on error; make some variables block-local;…

df872e8

… other tidying.

Style and consistency fixes.

7c636db

Add missing error check; don't try to DECREF a NULL a

954588a

Simplify some error returns.

a7c6cdb

Another two test cases:

3c3a50e

- clarify that floats are rejected even if they happen to be squares of small integers - TypeError beats ValueError for a negative float

Add fast path for small inputs. Needs tests.

babde98

Merge remote-tracking branch 'origin/master' into math-isqrt-fast-path

4b23a2b

Speed up isqrt for n >= 2**64 as well; add extra tests.

64505d5

the-knights-who-say-ni added the CLA signed label May 18, 2019

bedevere-bot added the awaiting core review label May 18, 2019

mdickinson added the skip news label May 18, 2019

mdickinson added 2 commits May 18, 2019 16:51

Reduce number of test-cases to avoid dominating the run-time of test_…

8f1bd78

…math.

Don't perform unnecessary extra iterations when computing c_bit_length.

564557f

serhiy-storchaka approved these changes May 18, 2019

bedevere-bot added awaiting merge and removed awaiting core review labels May 18, 2019

mdickinson added the performance Performance or resource usage label May 18, 2019

serhiy-storchaka reviewed May 19, 2019

mdickinson added 2 commits May 19, 2019 12:28

Merge remote-tracking branch 'origin/master' into math-isqrt-fast-path

ecc0ed2

Abstract common uint64_t code out into a separate function.

ec87ce7

Cleanup.

efea0ea

mdickinson commented May 19, 2019

serhiy-storchaka reviewed May 19, 2019

mdickinson and others added 2 commits May 19, 2019 14:31

Add a missing Py_DECREF in an error branch. More cleanup.

010f4db

Update Modules/mathmodule.c

1649be8

Add missing `static` declaration to helper function. Co-Authored-By: Serhiy Storchaka <[email protected]>

serhiy-storchaka approved these changes May 19, 2019

Add missing backtick.

e5dfb40

mdickinson merged commit 5c08ce9 into python:master May 19, 2019

bedevere-bot removed the awaiting merge label May 19, 2019

mdickinson deleted the math-isqrt-fast-path branch May 19, 2019 16:52
