
bpo-35696: Remove unnecessary operation in long_compare() #16146


Merged
6 commits merged into python:master on Sep 18, 2019

Conversation

@hongweipeng (Contributor) commented Sep 14, 2019

$ ./python -m pyperf timeit -s "a = 100; b = 99" "a == b" --compare-to=../cpython-master/python --duplicate=1000
/home/weapon/workspace/cpython-master/python: ..................... 13.0 ns +- 0.3 ns
/home/weapon/workspace/cpython/python: ..................... 12.7 ns +- 0.1 ns

Mean +- std dev: [/home/weapon/workspace/cpython-master/python] 13.0 ns +- 0.3 ns -> [/home/weapon/workspace/cpython/python] 12.7 ns +- 0.1 ns: 1.02x faster (-2%)

$ ./python -m pyperf timeit -s "a = 2 ** 100 + 1; b = 2 ** 100" "a == b" --compare-to=../cpython-master/python --duplicate=1000
/home/weapon/workspace/cpython-master/python: ..................... 14.1 ns +- 0.2 ns
/home/weapon/workspace/cpython/python: ..................... 13.6 ns +- 0.1 ns

Mean +- std dev: [/home/weapon/workspace/cpython-master/python] 14.1 ns +- 0.2 ns -> [/home/weapon/workspace/cpython/python] 13.6 ns +- 0.1 ns: 1.04x faster (-4%)

long_compare

https://bugs.python.org/issue35696

@shihai1991 (Member) left a comment

Your PR looks OK without a test case; maybe someday the performance benchmarks could become a test suite too ;)


@eduardo-elizondo (Contributor)

Only using pyperf will not give a full picture of why the performance is better. For instance, maybe your change created a better code alignment for your particular architecture, resulting in a slight speedup. However, that doesn't mean that it will be a perf win for all other systems.

Also, it's not clear what your build environment is. How are you building? Is this a debug or release build? Are you using LTO and PGO?

If you want to keep working on this, I would suggest:

  • Try to build an environment to get a stable benchmark. Follow @vstinner's guide: https://vstinner.github.io/journey-to-stable-benchmark-system.html
  • Clearly outline your build steps and your environment in the summary
  • Give more information than just time. For instance, what is the instruction count? Branch misses? Instruction cache misses?
  • Following up on the previous point, also gather that data on a simulated machine, for instance with Valgrind's cachegrind.

Ideally this should all be handled by a common set of benchmark machines that could be triggered by the PR. Unfortunately that's not available yet. So doing this will make it really clear if this is indeed a perf win.

Cheers,
Eddie

@eduardo-elizondo (Contributor) left a comment

Requesting changes to get more data points

@sir-sigurd (Contributor) left a comment

It looks like it doesn't work correctly when Py_SIZE(a) - Py_SIZE(b) is out of int range.


@aeros aeros added the performance Performance or resource usage label Sep 17, 2019
@hongweipeng (Contributor, Author)

The optimization comes from removing unnecessary operations. I am not good at benchmarking; maybe changing the title from "improving performance" to "removing unnecessary operations" would be better.

It looks like it doesn't work correctly when Py_SIZE(a) - Py_SIZE(b) is out of int range.

Good eye! Maybe a new PR can be submitted to avoid the overflow.

If a == 0 and b == 0, the index is out of bounds:

This case is already handled upstream:

long_richcompare(PyObject *self, PyObject *other, int op)
{
    int result;
    CHECK_BINOP(self, other);
    if (self == other)
        result = 0;
    else
        result = long_compare((PyLongObject*)self, (PyLongObject*)other);
    Py_RETURN_RICHCOMPARE(result, 0, op);
}

But it does have hidden dangers, for example if small_ints is disabled. The other places where long_compare() is called also need to be checked.

@aeros (Contributor) left a comment

Thanks for the PR @hongweipeng.

I would highly recommend following through with the suggestions from @eduardo-elizondo, to gather more well-rounded benchmarking data and verify that the optimizations are present across other environments.

Also, I have a minor formatting suggestion for the news entry.

@ghost commented Sep 17, 2019

It looks like it doesn't work correctly when Py_SIZE(a) - Py_SIZE(b) is out of int range.

Good eye! Maybe a new PR can be submitted to avoid the overflow.

There should be no problem.

In a 32-bit build, the entire memory space is 0~0xFFFFFFFF, so the memory used by two allocated int objects together cannot exceed this value.

Py_SIZE(a) is the number of digit elements, and a digit is a 2-byte unsigned short, so abs(Py_SIZE(a)) + abs(Py_SIZE(b)) is always < 0xFFFFFFFF/2.

This means that Py_SIZE(a) - Py_SIZE(b) will never overflow.
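
A back-of-the-envelope version of this bound, as a standalone sketch; the 2-byte digit size and 4 GiB address space come from the statement above, everything else is hypothetical:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* On a 32-bit build the whole address space is 4 GiB and each digit
       occupies 2 bytes, so even if every byte of memory held digits, two
       objects together could not have more than 2^31 of them. */
    uint64_t address_space_bytes = UINT64_C(1) << 32;
    uint64_t digit_size_bytes    = 2;
    uint64_t max_combined_digits = address_space_bytes / digit_size_bytes;

    /* Object headers and the rest of the process take space too, so the
       real combined count is well below 2^31, and Py_SIZE(a) - Py_SIZE(b)
       fits in a 32-bit int on such a build. */
    printf("max combined digit count: %llu\n",
           (unsigned long long)max_combined_digits);
    return 0;
}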

If a == 0 and b == 0, the index is out of bounds:

This case is already handled upstream:

long_richcompare(PyObject *self, PyObject *other, int op)
{
    int result;
    CHECK_BINOP(self, other);
    if (self == other)
        result = 0;
    else
        result = long_compare((PyLongObject*)self, (PyLongObject*)other);
    Py_RETURN_RICHCOMPARE(result, 0, op);
}

But it does have hidden dangers, for example if small_ints is disabled. The other places where long_compare() is called also need to be checked.

Other long_compare() call sites in longobject.c don't check (self == other)

@sir-sigurd (Contributor)

There should be no problem.

In a 32-bit build, the entire memory space is 0~0xFFFFFFFF, so the memory used by two allocated int objects together cannot exceed this value.

Py_SIZE(a) is the number of digit elements, and a digit is a 2-byte unsigned short, so abs(Py_SIZE(a)) + abs(Py_SIZE(b)) is always < 0xFFFFFFFF/2.

This means that Py_SIZE(a) - Py_SIZE(b) will never overflow.

long_compare() returns int. On a 64-bit build Py_SIZE(a) - Py_SIZE(b) can be out of the range of int, can't it? What will happen in such a case, for example when Py_SIZE(a) - Py_SIZE(b) == (1LL << 32)?
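
To make the failure mode concrete, a standalone sketch (hypothetical values; long long stands in for Py_ssize_t on a 64-bit build):

#include <stdio.h>

int main(void)
{
    long long size_a = 1LL << 32;   /* stands in for Py_SIZE(a) */
    long long size_b = 0;           /* stands in for Py_SIZE(b) */

    long long diff = size_a - size_b;

    /* Converting a value that does not fit into int is implementation-
       defined; common compilers simply truncate, so 2^32 becomes 0 and
       two unequal numbers would compare as equal. */
    int truncated = (int)diff;

    printf("diff = %lld, as int = %d\n", diff, truncated);
    return 0;
}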

@ghost commented Sep 17, 2019

long_compare() returns int. On a 64-bit build Py_SIZE(a) - Py_SIZE(b) can be out of the range of int, can't it? What will happen in such a case, for example when Py_SIZE(a) - Py_SIZE(b) == (1LL << 32)?

You are right. We can return Py_ssize_t instead of int.
In addition, maybe we can use an inline function here?

@hongweipeng (Contributor, Author)

Other long_compare() call sites in longobject.c don't check (self == other)

But they won't call it with both a and b being 0.

There are four places that call long_compare() in longobject.c:

1. In long_richcompare().

Already discussed, unless small_ints is disabled.

2. In long_invmod():

if (long_compare(a, (PyLongObject *)_PyLong_One)) {

_PyLong_One is the int 1.

3. In _PyLong_DivmodNear(PyObject *a, PyObject *b):

cmp = long_compare((PyLongObject *)twice_rem, (PyLongObject *)b);

And the caller of _PyLong_DivmodNear() is long_round(), which sets the parameter b to 10:

long_round(PyObject *self, PyObject *args)
{
    result = PyLong_FromLong(10L);
    ...
    temp = _PyLong_DivmodNear(self, result);
}
4. In _PyLong_GCD().

If a and b are both 0, it uses a simple path instead of calling long_compare():

if (-2 <= size_a && size_a <= 2 && -2 <= size_b && size_b <= 2) {
    Py_INCREF(a);
    Py_INCREF(b);
    goto simple;
}

If I missed any other place, please tell me. I am learning how to benchmark and it will take some time; if anyone can help, that would be great.

@sir-sigurd (Contributor) commented Sep 17, 2019

You are right. We can return Py_ssize_t instead of int.

Yes, but we should check that the result is not cast to int at the call sites.

In addition, maybe we can use an inline function here?

I don't think it makes sense.

@hongweipeng (Contributor, Author)

I think it's easier to replace Py_SIZE(a) - Py_SIZE(b) with return Py_SIZE(a) > Py_SIZE(b) ? 1 : -1.
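
A standalone sketch of that alternative (hypothetical helper; ptrdiff_t stands in for Py_ssize_t); the result stays in {-1, 0, 1}, so no overflow is possible:

#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: derive the sign from a direct comparison of the
   two sizes instead of subtracting them. */
static int
compare_sizes(ptrdiff_t size_a, ptrdiff_t size_b)
{
    if (size_a == size_b) {
        return 0;   /* the caller would fall through to the digit loop */
    }
    return size_a > size_b ? 1 : -1;
}

int main(void)
{
    printf("%d %d %d\n",
           compare_sizes(3, 2), compare_sizes(2, 3), compare_sizes(5, 5));
    return 0;
}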

@hongweipeng (Contributor, Author)

Oh, it would be better to return Py_ssize_t instead of int.

@sir-sigurd (Contributor)

@hongweipeng
I'd rewrite it like long_compare1() here: https://godbolt.org/z/djHSPX.

@hongweipeng (Contributor, Author)

@sir-sigurd Thank you very much!

@vstinner (Member)

I'm not sure of the purpose of the whole change since it doesn't seem to make comparison way faster. It's around 1% faster or slower...

cc @serhiy-storchaka

@hongweipeng (Contributor, Author)

The form

while (--i >= 0 && !(diff = (sdigit)a->ob_digit[i] - (sdigit)b->ob_digit[i]));

produces fewer instructions than

while (--i > 0 && a->ob_digit[i] == b->ob_digit[i]);
diff = (sdigit)a->ob_digit[i] - (sdigit)b->ob_digit[i];
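
For illustration, a standalone sketch contrasting the two shapes (digit and sdigit here are stand-ins for CPython's types, the arrays stand in for ob_digit, and this is not the exact PR diff):

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

typedef uint16_t digit;    /* CPython digits are 15 or 30 bits wide */
typedef int32_t  sdigit;

/* Shape with a separate subtraction after the scan (roughly the older shape). */
static sdigit
diff_two_steps(const digit *a, const digit *b, ptrdiff_t i)
{
    while (--i >= 0 && a[i] == b[i])
        ;
    return i < 0 ? 0 : (sdigit)a[i] - (sdigit)b[i];
}

/* Shape with the subtraction folded into the loop condition, as quoted above. */
static sdigit
diff_one_step(const digit *a, const digit *b, ptrdiff_t i)
{
    sdigit diff = 0;
    while (--i >= 0 && !(diff = (sdigit)a[i] - (sdigit)b[i]))
        ;
    return diff;
}

int main(void)
{
    digit a[] = {7, 5, 3};    /* least significant digit first, as in CPython */
    digit b[] = {9, 5, 3};
    printf("%d %d\n", diff_two_steps(a, b, 3), diff_one_step(a, b, 3));
    return 0;
}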

@sir-sigurd Can I use your code in the rewrite? I will leave your name in the NEWS.

@sir-sigurd (Contributor)

@sir-sigurd Can I use your code in the rewrite? I will leave your name in the NEWS.

Sure.

@vstinner (Member) left a comment

LGTM if you remove the NEWS entry.

@serhiy-storchaka: I let you decide if this change is worth it or not ;-)

@hongweipeng hongweipeng changed the title bpo-35696:Slightly improve perfomance of long_compare. bpo-35696:Remove unnecessary operation in long_compare(). Sep 18, 2019
@vstinner (Member) left a comment

LGTM.

For whoever is going to merge the change, be careful with the commit message generated by GitHub. I suggest reusing the current PR title:

"bpo-35696: Remove unnecessary operation in long_compare() (GH-16146)"

@serhiy-storchaka, @methane: Would you mind double-checking this change? I prefer not to pretend that it optimizes Python, but it should not make it slower :-)

@methane methane changed the title bpo-35696:Remove unnecessary operation in long_compare(). bpo-35696:Remove unnecessary operation in long_compare() Sep 18, 2019
@methane methane changed the title bpo-35696:Remove unnecessary operation in long_compare() bpo-35696: remove unnecessary operation in long_compare() Sep 18, 2019
@methane methane changed the title bpo-35696: remove unnecessary operation in long_compare() bpo-35696: Remove unnecessary operation in long_compare() Sep 18, 2019
@ghost commented Sep 18, 2019

Will diff overflow?

        sdigit diff = 0;
        ...
        diff = (sdigit) a->ob_digit[i] - (sdigit) b->ob_digit[i];
        ...
        sign = Py_SIZE(a) < 0 ? -diff : diff;

Edit:
It seems it will not overflow.
In a 32-bit build, the minimum possible value of the subtraction is -32766 = 1 - ((1 << 15) - 1).
FYI:

#define PyLong_BASE     ((digit)1 << PyLong_SHIFT)
#define PyLong_MASK     ((digit)(PyLong_BASE - 1))

sdigit (signed short, -32768 <= x <= 32767) just won't overflow.

This code looks fragile, but it works.
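
To make the bound concrete, a standalone sketch (assumes the 15-bit digits used when sdigit is a 16-bit signed type):

#include <stdio.h>

int main(void)
{
    /* With 15-bit digits, PyLong_MASK == 32767 and every digit lies in
       0..32767, so the difference of two digits lies in -32767..32767,
       which fits in a 16-bit signed sdigit (-32768..32767). */
    int pylong_shift = 15;
    int pylong_mask  = (1 << pylong_shift) - 1;

    printf("digit range: 0..%d, diff range: %d..%d\n",
           pylong_mask, -pylong_mask, pylong_mask);
    return 0;
}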

@methane methane merged commit 42acb7b into python:master Sep 18, 2019
@hongweipeng hongweipeng deleted the imporve_long_compare branch September 19, 2019 01:14
@vstinner (Member)

This code looks fragile, but it works.

Would removing the diff variable and reusing sign (Py_ssize_t) instead be less fragile? If so, maybe propose a PR.

@aeros (Contributor) commented Sep 20, 2019

@vstinner:

Would removing the diff variable and reusing sign (Py_ssize_t) instead be less fragile? If so, maybe propose a PR.

In what way could it be considered fragile? The main criticism seemed to be the overflow issue, but that doesn't seem to be a problem.

Also, I'm not certain that I understand how you would get the same functionality if diff was removed, particularly from the while loop:

...
        while (--i >= 0) {
            diff = (sdigit) a->ob_digit[i] - (sdigit) b->ob_digit[i];
            if (diff) {
                break;
            }
        }
...

Note: I'm not experienced with C, so I'm asking the above questions only for learning purposes, not as a criticism of the suggestion.

@ghost commented Sep 20, 2019

@aeros167, please look at this code.

static Py_ssize_t
long_compare(PyLongObject *a, PyLongObject *b)
{
    Py_ssize_t sign = Py_SIZE(a) - Py_SIZE(b);
    if (sign == 0) {
        Py_ssize_t i = Py_ABS(Py_SIZE(a));
        while (--i >= 0) {
            sign = (Py_ssize_t) a->ob_digit[i] - b->ob_digit[i];
            if (sign) {
                break;
            }
        }
        sign = Py_SIZE(a) < 0 ? -sign : sign;
    }
    return sign;
}

Honestly speaking, this PR reduces readability a bit.

@aeros (Contributor) commented Sep 20, 2019

@animalize:

@aeros167, please look at this code.

Thanks for clearing that up! For some reason, I thought an sdigit was still needed and wasn't sure how sign could fill that role (since it's of type Py_ssize_t). I'm gradually improving my knowledge of the C-API (mainly for the purpose of improving my reviews), but it's definitely not an area that I'm experienced in.

@aeros (Contributor) commented Sep 20, 2019

@animalize:

sign = (Py_ssize_t) a->ob_digit[i] - b->ob_digit[i];

Is the second cast to Py_ssize_t not needed?

            sign = (Py_ssize_t) a->ob_digit[i] - (Py_ssize_t) b->ob_digit[i];

@ghost commented Sep 20, 2019

@aeros167

Is the second cast to Py_ssize_t not needed?

The key is applying it to the first operand. IIRC, the cast for the second operand can be omitted.
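
A standalone sketch of why casting only the first operand is enough (hypothetical values; uint32_t stands in for a 30-bit digit and int64_t for Py_ssize_t on a 64-bit build):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t x = 3, y = 30000;   /* stand-ins for two digit values */

    /* With only the first operand cast, the usual arithmetic conversions
       also convert the second operand to the wider signed type, so the
       subtraction is done in int64_t and yields -29997. */
    int64_t d1 = (int64_t)x - y;

    /* Casting both operands is equivalent here, just more explicit. */
    int64_t d2 = (int64_t)x - (int64_t)y;

    printf("%lld %lld\n", (long long)d1, (long long)d2);
    return 0;
}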

@aeros (Contributor) commented Sep 20, 2019

@animalize:

IIRC, the cast for the second operand can be omitted.

Ah, I see. I've seen it used for multiple operands before, but that might be more of a styling preference to make it more explicit. I have no idea what is preferred for the C-API though; maybe @vstinner would know. Typically it seems that we tend to lean towards being explicit for the most part, but it might be different in this case. Good to know though, thanks. (:

@vstinner (Member)

Ah, I see. I've seen it used for multiple operands before, but that might be more of a styling preference to make it more explicit.

Nobody understands C standards. I prefer to keep an explicit cast on both operands. Does anyone want to propose a PR? This change introduced a new warning: https://bugs.python.org/issue35696#msg352875

@hongweipeng (Contributor, Author)

This change introduced a new warning:

The warning seems to be caused by PYLONG_FROM_UINT(unsigned long, ival), not by long_compare().

@hongweipeng hongweipeng restored the imporve_long_compare branch December 6, 2019 09:46
@hongweipeng hongweipeng deleted the imporve_long_compare branch October 14, 2020 10:04
Labels: performance (Performance or resource usage), skip news