-
-
Notifications
You must be signed in to change notification settings - Fork 32.3k
bpo-36957: Speed up math.isqrt #13405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-Authored-By: Serhiy Storchaka <[email protected]>
Use real argument clinic type instead of an alias Co-Authored-By: Serhiy Storchaka <[email protected]>
- clarify that floats are rejected even if they happen to be squares of small integers - TypeError beats ValueError for a negative float
No news entry needed, since |
Modules/mathmodule.c
Outdated
|
||
/* The first 4 steps can be performed entirely in 32-bit arithmetic; | ||
the last needs 64-bit arithmetic. */ | ||
u = 1U + (uint32_t)(m >> 62); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this code be shared?
if (fastpath) {
// init fast path
}
else {
// init general path
}
// common code
if (fastpath) {
// return fast path
}
// general path
With new shifting functions this may be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I considered this, but in the end I decided it was cleaner to keep the two paths separate.
I'll try the refactoring and see how it looks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sharing is a bit ugly, because the last line has to be different in the two cases:
u = (u << 15) + (uint32_t)((m >> 17) / u);
versus
v = (u << 15) + ((m >> 17) / u);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to use always the second one? In any case you cast u
to 64-bit for the final check. The overhead is unlikely noticeable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to share via a helper function.
uint64_t | ||
_approximate_isqrt(uint64_t n) | ||
{ | ||
uint32_t u = 1U + (n >> 62); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u = 1U + (uint32_t)(n >> 62)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the uint32_t
casts on the basis that it makes the code look cleaner and the compiler will likely optimise them away anyway, given that the target type is uint32_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checked: at least on my machine (macOS 10.14.5, clang from the OS), with -O3
, the generated assembly includes one divb
, two divl
s and one divq
, so it looks as though the compiler's smart enough to optimise those divisions.
I have to confess I was a bit surprised by the divb
.
And the second division could actually be done using 16-bit arithmetic. I don't know whether it's worth trying to persuade the compiler to do that. In theory these division instructions could be a bottleneck, but in practise I suspect that all the overhead of the function call, argument passing, etc. will likely outweigh any advantage from optimising one division.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely agree!
Modules/mathmodule.c
Outdated
if (m == (uint64_t)(-1) && PyErr_Occurred()) { | ||
return NULL; | ||
} | ||
u = _approximate_isqrt(m); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above 5 lines (or 4 if exclude Py_DECREF) could be shared too. Not sure that they should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. I think I'd rather keep _approximate_isqrt
as a pure uint64_t
to uint64_t
function, though.
Add missing `static` declaration to helper function. Co-Authored-By: Serhiy Storchaka <[email protected]>
@serhiy-storchaka This PR has changed substantially since you approved it. Are you still okay with me merging it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you would not create this PR I would create a similar PR myself. 😀
@mdickinson: Please replace |
@serhiy-storchaka Again, thank you for the thorough review. |
Speed up math.isqrt by:
https://bugs.python.org/issue36957