Skip to content

bpo-37863: Optimize Fraction.__hash__() #15298

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 16, 2019

Conversation

rhettinger
Copy link
Contributor

@rhettinger rhettinger commented Aug 15, 2019

Copy link
Member

@tim-one tim-one left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow - that was fast! 😉 Just left a comment about a simplification.

dinv = pow(self._denominator, _PyHASH_MODULUS - 2, _PyHASH_MODULUS)
try:
dinv = pow(self._denominator, -1, _PyHASH_MODULUS)
except ValueError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case ValueError is raised?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueError is raised if and only if _PyHASH_MODULUS divides self._denominator. There is no modular inverse if and only if the gcd of the base and the modulus is non-trivial (not 1), but the modulus is a prime in this case, so in this case that's the same as the base being a multiple of the modulus.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. We need a test for this.

@serhiy-storchaka
Copy link
Member

Any benchmarking results?

@rhettinger
Copy link
Contributor Author

Baseline:

$ ./python.exe -r11 -s 'from fractions import Fraction as F' -s 'f=F(123412343, 58145823453)' 'hash(f)'
10000 loops, best of 11: 21.5 usec per loop
$ ./python.exe -r11 -s 'from fractions import Fraction as F' -s 'f=F(1243, 5813)' 'hash(f)'
20000 loops, best of 11: 19.8 usec per loop
$ ./python.exe -r11 -s 'from fractions import Fraction as F' -s 'f=F(122345234523453412343, 58145234523452345823453)' 'hash(f)'
10000 loops, best of 11: 23.7 usec per loop

Patched

$ ./python.exe -r11 -s 'from fractions import Fraction as F' -s 'f=F(123412343, 58145823453)' 'hash(f)'
100000 loops, best of 11: 3.17 usec per loop
$ ./python.exe -r11 -s 'from fractions import Fraction as F' -s 'f=F(1243, 5813)' 'hash(f)'
100000 loops, best of 11: 2.04 usec per loop
$ ./python.exe -r11 -s 'from fractions import Fraction as F' -s 'f=F(122345234523453412343, 58145234523452345823453)' 'hash(f)'
50000 loops, best of 11: 5.69 usec per loop

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for the denominator being a multiple of the modulus.

dinv = pow(self._denominator, _PyHASH_MODULUS - 2, _PyHASH_MODULUS)
try:
dinv = pow(self._denominator, -1, _PyHASH_MODULUS)
except ValueError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. We need a test for this.

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Copy link
Member

@tim-one tim-one left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a suggestion for something else to try, avoiding the remaining division (mod) at the cost of another hash() call. At best minor, but possibly significant.

hash_ = _PyHASH_INF
else:
hash_ = abs(self._numerator) * dinv % _PyHASH_MODULUS
result = hash_ if self >= 0 else -hash_
hash_ = hash(abs(self._numerator)) * dinv % _PyHASH_MODULUS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibility is to write this as

hash_ = hash(hash(abs(self._numerator)) * dinv)

to avoid the (in general) 122-by-61 bit % division. The int hash should compute the same thing, but doesn't do any division (it's all bit rotations and adds, one pair per internal int "digit"). (While the docs don't point this out, reduction by _PyHASH_MODULUS in binary is akin to "casting out nines" in decimal to find a remainder mod 9 - no division is really needed.)

Copy link
Member

@mdickinson mdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@rhettinger rhettinger merged commit f3cb68f into python:master Aug 16, 2019
@rhettinger rhettinger deleted the fraction_hash branch August 16, 2019 03:58
@bedevere-bot
Copy link

@rhettinger: Please replace # with GH- in the commit message next time. Thanks!

lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019
DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
websurfer5 pushed a commit to websurfer5/cpython that referenced this pull request Jul 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants