Commit c7f20f1

bpo-46406: Faster single digit int division. (#30626)
This expresses the algorithm in a more basic manner, resulting in better instruction generation by today's compilers.
See https://mail.python.org/archives/list/[email protected]/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/#NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5
1 parent 83a0ef2 commit c7f20f1

2 files changed: 29 additions & 10 deletions
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+The integer division ``//`` implementation has been optimized to better let the
+compiler understand its constraints. It can be 20% faster on the amd64 platform
+when dividing an int by a value smaller than ``2**30``.
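
The ``2**30`` bound matches a divisor that fits in a single 30-bit digit (the digit width named in the code comment of this commit). A rough sketch of why that matters: with such a divisor, every per-step dividend stays below ``n * 2**30``, so each quotient also fits in one digit and the running remainder never needs a wider type. The type and macro names below (`DIGIT_BITS`, `DIGIT_MASK`, `worst_case_dividend`, `quotient_fits_in_digit`) are hypothetical stand-ins chosen for illustration, not CPython identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 30-bit digits stored in 32-bit words, with a 64-bit
   intermediate type, mirroring the "30-bit digit values" in the commit. */
typedef uint32_t digit;
typedef uint64_t twodigits;
#define DIGIT_BITS 30
#define DIGIT_MASK ((digit)(((digit)1 << DIGIT_BITS) - 1))

/* Largest dividend a single step can see for divisor n:
   the biggest possible remainder (n - 1) placed above a full digit. */
static twodigits
worst_case_dividend(digit n)
{
    assert(n > 0 && n <= DIGIT_MASK);
    return ((twodigits)(n - 1) << DIGIT_BITS) | DIGIT_MASK;
}

/* Returns 1 when the worst-case quotient still fits in one digit. */
static int
quotient_fits_in_digit(digit n)
{
    return worst_case_dividend(n) / n <= DIGIT_MASK;
}
```

Calling `quotient_fits_in_digit` with any divisor from 1 to `DIGIT_MASK` should return 1, since the worst-case dividend equals `n * 2**30 - 1`.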

Objects/longobject.c

Lines changed: 26 additions & 10 deletions
@@ -1617,25 +1617,41 @@ v_rshift(digit *z, digit *a, Py_ssize_t m, int d)
    in pout, and returning the remainder. pin and pout point at the LSD.
    It's OK for pin == pout on entry, which saves oodles of mallocs/frees in
    _PyLong_Format, but that should be done with great care since ints are
-   immutable. */
+   immutable.
 
+   This version of the code can be 20% faster than the pre-2022 version
+   on todays compilers on architectures like amd64. It evolved from Mark
+   Dickinson observing that a 128:64 divide instruction was always being
+   generated by the compiler despite us working with 30-bit digit values.
+   See the thread for full context:
+
+     https://mail.python.org/archives/list/[email protected]/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/#NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5
+
+   If you ever want to change this code, pay attention to performance using
+   different compilers, optimization levels, and cpu architectures. Beware of
+   PGO/FDO builds doing value specialization such as a fast path for //10. :)
+
+   Verify that 17 isn't specialized and this works as a quick test:
+     python -m timeit -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17'
+*/
 static digit
 inplace_divrem1(digit *pout, digit *pin, Py_ssize_t size, digit n)
 {
-    twodigits rem = 0;
+    digit remainder = 0;
 
     assert(n > 0 && n <= PyLong_MASK);
-    pin += size;
-    pout += size;
     while (--size >= 0) {
-        digit hi;
-        rem = (rem << PyLong_SHIFT) | *--pin;
-        *--pout = hi = (digit)(rem / n);
-        rem -= (twodigits)hi * n;
-    }
-    return (digit)rem;
+        twodigits dividend;
+        dividend = ((twodigits)remainder << PyLong_SHIFT) | pin[size];
+        digit quotient;
+        quotient = (digit)(dividend / n);
+        remainder = dividend % n;
+        pout[size] = quotient;
+    }
+    return remainder;
 }
 
+
 /* Divide an integer by a digit, returning both the quotient
    (as function result) and the remainder (through *prem).
    The sign of a is ignored; n should not be zero. */
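
To try the two loop shapes outside of CPython, the following is a minimal standalone sketch. It assumes 30-bit digits in `uint32_t` with a `uint64_t` intermediate, and uses `ptrdiff_t` in place of `Py_ssize_t`. The names `divrem1_old`, `divrem1_new`, `variants_agree`, `SKETCH_SHIFT` and `SKETCH_MASK` are hypothetical and exist only for this illustration. It checks that both variants give the same results; it does not measure speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for a 30-bit digit configuration. */
typedef uint32_t digit;
typedef uint64_t twodigits;
#define SKETCH_SHIFT 30
#define SKETCH_MASK ((digit)(((digit)1 << SKETCH_SHIFT) - 1))

/* Shape of the removed lines: a 64-bit running remainder, pointers
   walked downward, remainder recovered by multiply and subtract. */
static digit
divrem1_old(digit *pout, const digit *pin, ptrdiff_t size, digit n)
{
    twodigits rem = 0;

    assert(n > 0 && n <= SKETCH_MASK);
    pin += size;
    pout += size;
    while (--size >= 0) {
        digit hi;
        rem = (rem << SKETCH_SHIFT) | *--pin;
        *--pout = hi = (digit)(rem / n);
        rem -= (twodigits)hi * n;
    }
    return (digit)rem;
}

/* Shape of the added lines: a digit-sized remainder, array indexing,
   and an explicit / and % pair on the same dividend. */
static digit
divrem1_new(digit *pout, const digit *pin, ptrdiff_t size, digit n)
{
    digit remainder = 0;

    assert(n > 0 && n <= SKETCH_MASK);
    while (--size >= 0) {
        twodigits dividend =
            ((twodigits)remainder << SKETCH_SHIFT) | pin[size];
        digit quotient = (digit)(dividend / n);
        remainder = (digit)(dividend % n);
        pout[size] = quotient;
    }
    return remainder;
}

/* Hypothetical helper: returns 1 when both variants produce the same
   quotient digits and remainder for inputs of up to 8 digits. */
static int
variants_agree(const digit *in, ptrdiff_t size, digit n)
{
    digit a[8], b[8];

    if (size > 8) {
        return 0;
    }
    if (divrem1_old(a, in, size, n) != divrem1_new(b, in, size, n)) {
        return 0;
    }
    for (ptrdiff_t i = 0; i < size; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```

Digits are stored least significant first, so the array `{1, 3}` stands for `3 * 2**30 + 1`; dividing it by 3 with `divrem1_new` yields quotient digits `{0, 1}` and remainder 1.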
