-
-
Notifications
You must be signed in to change notification settings - Fork 32.3k
bpo-35696: Remove unnecessary operation in long_compare() #16146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks your PR without test case is OK, maybe somedays performance benchmark test cloud become a test suite too ;)
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Only using Also, it's not clear what's your build environment. i.e. How are you building? Is this debug or release? Are you using LTO and PGO? If you want to keep working on this, I would suggest to:
Ideally this should all be handled by a common set of benchmark machines that could be triggered by the PR. Unfortunately that's not available yet. So doing this will make it really clear if this is indeed a perf win. Cheers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes to get more data points
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it doesn't work correctly when Py_SIZE(a) - Py_SIZE(b)
is out of int range.
This comment has been minimized.
This comment has been minimized.
Optimization is achieved by removing unnecessary operations. I am not good at benchmarking.Maybe I change the title from
Good eye! Maybe it can be submitted a new PR to avoid overflow.
This has been processed upstream:
But it does have hidden dangers. For example, disable small_ints. Also need to confirm the other places where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @hongweipeng.
I would highly recommend following through with the suggestions from @eduardo-elizondo, to gather more well rounded benchmarking data to verify the optimizations are present across other environments.
Also, I have a minor formatting suggestion for the news entry.
Misc/NEWS.d/next/Core and Builtins/2019-09-15-01-30-06.bpo-35696.1iK_1H.rst
Outdated
Show resolved
Hide resolved
There should be no problem. In a 32-bit build, the entire memory space is 0~0xFFFFFFFF.
This means that
Other |
|
You are right. We can return |
But they won't be called if both There are 4 places call 1.in Already discussed, unless the 2.in
3.in
And calling
If
If I miss other palce, please tell me. I am learning how to benchmark and it will take some time. If anyone can help, it’s better. |
Yes, however we should check that result is not casted to
I don't think it makes sense. |
I think it's easier to replace |
Oh, it should be better to return |
@hongweipeng |
@sir-sigurd Thank you very much! |
I'm not sure of the purpose of the whole change since it doesn't seem to make comparison way faster. It's around 1% faster or slower... |
@sir-sigurd Can I use your code in the rewrite? I will leave your name in the |
Sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if you remove the NEWS entry.
@serhiy-storchaka: I let you decide if this change is worth it or not ;-)
Misc/NEWS.d/next/Core and Builtins/2019-09-15-01-30-06.bpo-35696.1iK_1H.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
For the one who is going to merge the change, be careful of the commit message generated by GitHub. I suggest to reuse the current PR title:
"bpo-35696: Remove unnecessary operation in long_compare() (GH-16146)"
@serhiy-storchaka, @methane: Would you mind to double check this change? I prefer to not pretend that it optimizes Python, but it should not make it slower :-)
Will sdigit diff = 0;
...
diff = (sdigit) a->ob_digit[i] - (sdigit) b->ob_digit[i];
...
sign = Py_SIZE(a) < 0 ? -diff : diff; edit: #define PyLong_BASE ((digit)1 << PyLong_SHIFT)
#define PyLong_MASK ((digit)(PyLong_BASE - 1))
This code looks fragile, but it works. |
Would removing "diff" variable and reuse sign (Py_ssize_t) instead would be less fragile? If yes, propose maybe a PR. |
In what way could it be considered fragile? The main criticism seemed to be the overflowing issue, but that doesn't seem to be a problem. Also, I'm not certain that I understand how you would get the same functionality if diff was removed, particularly from the while loop: ...
while (--i >= 0) {
diff = (sdigit) a->ob_digit[i] - (sdigit) b->ob_digit[i];
if (diff) {
break;
}
}
... Note: I'm not experienced with C, so I'm asking the above questions only for learning purposes, not as a criticism of the suggestion. |
@aeros167, please look at this code. static Py_ssize_t
long_compare(PyLongObject *a, PyLongObject *b)
{
Py_ssize_t sign = Py_SIZE(a) - Py_SIZE(b);
if (sign == 0) {
Py_ssize_t i = Py_ABS(Py_SIZE(a));
while (--i >= 0) {
sign = (Py_ssize_t) a->ob_digit[i] - b->ob_digit[i];
if (sign) {
break;
}
}
sign = Py_SIZE(a) < 0 ? -sign : sign;
}
return sign;
} Honestly speaking, this PR reduces readability a bit. |
Thanks for clearing that up! For some reason, I thought an |
Is the second cast to sign = (Py_ssize_t) a->ob_digit[i] - (Py_ssize_t) b->ob_digit[i]; |
@aeros167
The key is using it to the first operand. IIRC, the cast for the second operand can be omitted. |
Ah, I see. I've seen it used for multiple operands before, but that might be more of a styling preference to make it more explicit. I have no idea what is preferred for the C-API though, maybe @vstinner would know. Typically it seems that we tend to learn towards explicit for the most part, but it might be different in this case. Good to know though, thanks. (: |
Nobody understands C standards. I prefer to keep an explicit cast on both operands. Does anyone want to propose a PR? This change introduced a new warning: https://bugs.python.org/issue35696#msg352875 |
The warning seems to be caused by |
https://bugs.python.org/issue35696