PYTHON-4663 Fix compatibility with dateutil timezones #1812

Merged
merged 11 commits into from
Aug 28, 2024

ShaneHarvey
Member

PYTHON-4663 Fix compatibility with dateutil timezones

Depends on #1811

Here's a quick benchmark to sanity check we're not regressing in performance:

import datetime
import timeit
from bson import DatetimeMS, decode, encode, CodecOptions, DatetimeConversion

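# Millisecond offsets from the Unix epoch for datetime.datetime.min and datetime.datetime.max.
MIN = -62135596800000
MAX = 253402300799999
N = 10000  # Number of datetimes in each benchmark document.
# Codec options for the timezone-aware decoding paths being measured.
clampopts = CodecOptions(datetime_conversion=DatetimeConversion.DATETIME_CLAMP, tz_aware=True, tzinfo=datetime.timezone.utc)
awareopts = CodecOptions(datetime_conversion=DatetimeConversion.DATETIME, tz_aware=True)
clampawareopts = CodecOptions(datetime_conversion=DatetimeConversion.DATETIME_CLAMP, tz_aware=True)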

def decode_in_range_default():
    return decode(data_in)
def decode_in_range_default_aware():
    return decode(data_in, awareopts)
def decode_in_range_default_aware_clamp():
    return decode(data_in, clampawareopts)
def decode_in_range():
    return decode(data_in, codec_options=clampopts)
def decode_out_of_range():
    return decode(data_out, codec_options=clampopts)
def encode_in_range():
    return encode(doc_in)
def encode_out_of_range():
    return encode(doc_out)

doc_in = {"d": [datetime.datetime(1970, 1, 1)] * N}
doc_out = {"d": [DatetimeMS(MIN-1)] * N}
data_in = encode_in_range()
data_out = encode_out_of_range()

if __name__ == '__main__':
    for f in [
        decode_in_range_default, decode_in_range_default_aware, decode_in_range_default_aware_clamp,
        decode_in_range, decode_out_of_range, encode_in_range,
    ]:
        res = timeit.timeit(f"{f.__name__}()", setup=f"from __main__ import {f.__name__}", number=100)
        print(f"{f.__name__}: {res}")
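For readers wondering where the MIN and MAX constants come from: they are the millisecond offsets of datetime.datetime.min and datetime.datetime.max from the Unix epoch, which is why DatetimeMS(MIN-1) is just out of range. Below is a minimal stdlib-only sketch of that arithmetic and of clamping a millisecond value into the naive range. The helpers to_millis and clamp_millis are hypothetical names used for illustration; they are not PyMongo functions, and the sketch ignores the timezone-dependent bounds that the real decoder has to handle.

```python
import datetime

# Naive Unix epoch; BSON datetimes are milliseconds relative to this instant.
EPOCH = datetime.datetime(1970, 1, 1)
ONE_MS = datetime.timedelta(milliseconds=1)


def to_millis(dt):
    """Hypothetical helper: whole milliseconds between a naive datetime and the epoch."""
    # timedelta // timedelta floors to an int, so sub-millisecond precision is dropped.
    return (dt - EPOCH) // ONE_MS


MIN = to_millis(datetime.datetime.min)
MAX = to_millis(datetime.datetime.max)


def clamp_millis(ms):
    """Hypothetical helper: clamp a millisecond value into [MIN, MAX]."""
    return max(MIN, min(MAX, ms))


if __name__ == "__main__":
    print(MIN, MAX)
    # One millisecond below the representable range clamps back to MIN.
    print(clamp_millis(MIN - 1) == MIN)
```

With a non-UTC tzinfo the usable bounds shift by the UTC offset, so the actual limits depend on the timezone in the codec options.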

Before:

$ python datetime-bench.py         
decode_in_range_default: 0.06987654202384874
decode_in_range_default_aware: 0.5483650420210324
decode_in_range_default_aware_clamp: 0.6285510419984348
decode_in_range: 0.9283483749604784
decode_out_of_range: 1.639427791989874
encode_in_range: 0.06142729101702571

After:

$ python datetime-bench.py         
decode_in_range_default: 0.06868383300025016
decode_in_range_default_aware: 0.5209614590276033
decode_in_range_default_aware_clamp: 0.5209896659944206
decode_in_range: 0.8219957919791341
decode_out_of_range: 1.5381669579655863
encode_in_range: 0.06304012500913814

In fact, we're now about 10% faster at decoding aware datetimes (both in and out of range). So it appears it is faster to recompute _min_datetime_ms/_max_datetime_ms in the C extensions than it is to look up the cached value.
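As a quick check on the numbers, here is a small sketch that computes the relative change for each benchmark from the timings posted above. The dictionaries simply copy the "Before" and "After" output; percent_faster is a hypothetical helper name, not part of the benchmark script.

```python
# Timings (seconds for 100 runs) copied from the Before/After output above.
before = {
    "decode_in_range_default": 0.06987654202384874,
    "decode_in_range_default_aware": 0.5483650420210324,
    "decode_in_range_default_aware_clamp": 0.6285510419984348,
    "decode_in_range": 0.9283483749604784,
    "decode_out_of_range": 1.639427791989874,
    "encode_in_range": 0.06142729101702571,
}
after = {
    "decode_in_range_default": 0.06868383300025016,
    "decode_in_range_default_aware": 0.5209614590276033,
    "decode_in_range_default_aware_clamp": 0.5209896659944206,
    "decode_in_range": 0.8219957919791341,
    "decode_out_of_range": 1.5381669579655863,
    "encode_in_range": 0.06304012500913814,
}


def percent_faster(old, new):
    """Hypothetical helper: positive means the new timing is faster."""
    return (old - new) / old * 100.0


speedups = {name: percent_faster(before[name], after[name]) for name in before}

if __name__ == "__main__":
    for name, pct in speedups.items():
        print(f"{name}: {pct:+.1f}%")
```

The four aware-decoding benchmarks improve by roughly 5% to 17%, which averages out near the 10% figure. These are single timeit runs, so small differences such as the encode_in_range change are likely within noise.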

@ShaneHarvey ShaneHarvey marked this pull request as ready for review August 24, 2024 00:49
@ShaneHarvey
Member Author

Ready for review. I opened PYTHON-4701 for the pypy test failure.

Member

@blink1073 blink1073 left a comment

LGTM!

@ShaneHarvey ShaneHarvey merged commit a2059dc into mongodb:master Aug 28, 2024
34 checks passed
2 participants