TST: query with timezone aware index & column #34021

Merged 10 commits on May 20, 2020
57 changes: 57 additions & 0 deletions pandas/tests/frame/indexing/test_column_vs_index_tz.py
@@ -0,0 +1,57 @@
import pandas as pd
import pandas._testing as tm


class TestColumnvsIndexTZEquality:
Member

Don't need an individual class for this. Writing the test as a function is sufficient.

Don't need a new file for just this test. This test can live in pandas/tests/frame/test_query_eval.py

    # https://github.com/pandas-dev/pandas/issues/29463
    def check_for_various_tz(self, tz):
Member

As mentioned by @MarcoGorelli, you can just use the tz_aware_fixture here

        df = pd.DataFrame(
            {
                "val": range(10),
                "time": pd.date_range(start="2019-01-01", freq="1d", periods=10, tz=tz),
            }
        )
        df_query = df.query('"2019-01-03 00:00:00+00" < time')
        l1 = pd.DataFrame(list(df_query["time"]))
Member

If this is the "expected" output, we'll want to construct it with a different method that does not use query.

expected = pd.DataFrame(...)

Contributor Author

@mroeschke,
I modified everything as suggested, and the tests pass locally.
But here, instead of constructing a new "expected" DataFrame, wouldn't we want to use the ones already generated by query, since we are testing that the result is the same at the index and column level?
That is what the issue expected, right?

Member

We want the "expected" DataFrame to be built by a path independent of query, so that if query breaks, "result" and "expected" don't both change in the same way.

So we want to make sure that the result of:
result = df.set_index('time').query('"2019-01-03 00:00:00+00" < time')

gives a result of:
expected = pd.DataFrame(...)
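
A minimal sketch of that idea, assuming a recent pandas and fixing tz to "UTC" for simplicity (the real test would take tz from the fixture). The "val" column and the plain asserts are illustrative choices, not part of the PR; expected is written out by hand so it never goes through query:

```python
import pandas as pd

tz = "UTC"  # assumption: one fixed zone instead of tz_aware_fixture
df = pd.DataFrame(
    {"val": range(10)},
    index=pd.date_range(start="2019-01-01", freq="1d", periods=10, tz=tz, name="time"),
)

# Path under test: filter the tz-aware index through query.
result = df.query('"2019-01-03 00:00:00+00" < time')

# Independent path: the rows we expect, written out without calling query.
expected = pd.DataFrame(
    {"val": range(3, 10)},
    index=pd.date_range(start="2019-01-04", freq="1d", periods=7, tz=tz, name="time"),
)

# Compare index and values directly (a simplification for this sketch;
# the test in the PR uses tm.assert_frame_equal).
assert result.index.equals(expected.index)
assert result["val"].tolist() == expected["val"].tolist()
```

If query regressed, only result would change, so the comparison would fail instead of silently agreeing.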

Contributor Author

When we create an expected DataFrame, the test passes for a few time zones and fails for others, because the query string always gives the timestamp in UTC (query('"2019-01-03 00:00:00+00" < time')), so there is a shape mismatch. For example, it fails for "Asia/Kolkata" but passes for "US/Eastern".

I assume we cannot change the query string because that is what we are testing. In that case, any fixed expected output will have a shape mismatch for some time zones, which is exactly what is happening now.
Any thoughts?

def test_check_column_vs_index_tz_query(self, tz_aware_fixture):
    # https://github.com/pandas-dev/pandas/issues/29463

    tz = tz_aware_fixture
    expected = pd.DataFrame(
        {"time": pd.date_range(start="2019-01-04", freq="1d", periods=7, tz=tz)}
    ).set_index("time")

    df = pd.DataFrame(
        {"time": pd.date_range(start="2019-01-01", freq="1d", periods=10, tz=tz)}
    )
    result = df.set_index("time").query('"2019-01-03 00:00:00+00" < time')
    tm.assert_frame_equal(result, expected)
AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (7, 0)
[right]: (8, 0)
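
The 7-versus-8 row counts can be reproduced without pandas. This is a plain-Python sketch, assuming fixed UTC offsets as stand-ins for named zones; rows_after_cutoff is a hypothetical helper, not something from the PR. It counts how many of the ten local midnights fall strictly after the UTC cutoff in the query string:

```python
from datetime import datetime, timedelta, timezone

# Cutoff taken from the query string: "2019-01-03 00:00:00+00" (UTC).
CUTOFF = datetime(2019, 1, 3, tzinfo=timezone.utc)


def rows_after_cutoff(offset_hours, periods=10):
    """Count local midnights from 2019-01-01 that are strictly later than CUTOFF."""
    tz = timezone(timedelta(hours=offset_hours))
    midnights = [
        datetime(2019, 1, 1, tzinfo=tz) + timedelta(days=i) for i in range(periods)
    ]
    # Aware datetimes compare by absolute instant, not by wall-clock time.
    return sum(CUTOFF < ts for ts in midnights)


print(rows_after_cutoff(0))   # UTC        -> 7
print(rows_after_cutoff(9))   # UTC+09:00  -> 7
print(rows_after_cutoff(-5))  # UTC-05:00  -> 8
```

With this data, offsets at or east of UTC keep 7 rows because local midnight on 2019-01-03 is not later than the cutoff instant; offsets west of UTC keep 8. A single hard-coded expected frame therefore cannot match every zone.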

Member

Since we're just testing that we can query with a tz offset in the query string when the index is tz-aware, we can be flexible about what the query string is.

You can then just change the test to:

    tz = tz_aware_fixture
    # Every row is later than the 2018 cutoff, so expected is the full frame.
    expected = pd.DataFrame(index=pd.date_range(start="2019-01-01", freq="1d", periods=10, tz=tz, name="time"))

    df = pd.DataFrame(index=pd.date_range(start="2019-01-01", freq="1d", periods=10, tz=tz, name="time"))
    result = df.query('"2018-01-03 00:00:00+00" < time')
    tm.assert_frame_equal(result, expected)

Contributor Author

@mroeschke, yes, this works. I am kicking myself for not thinking of this earlier.
Thank you very much; all the tests pass now.

As suggested, I have added the test to the test_query_eval.py file.


        # # This was earlier raising an exception.
        index_query = df.set_index("time").query('"2019-01-03 00:00:00+00" < time')
        l2 = pd.DataFrame(list(index_query.index))
        tm.assert_frame_equal(l1, l2)
Member

Can you rename the variables so this reads tm.assert_frame_equal(result, expected)?


    def test_check_column_vs_index_tz_query(self):
        tz_list = [
            "Africa/Abidjan",
            "Africa/Douala",
            "Africa/Mbabane",
            "America/Argentina/Catamarca",
            "America/Belize",
            "America/Curacao",
            "America/Guatemala",
            "America/Kentucky/Louisville",
            "America/Mexico_City",
            "America/Port-au-Prince",
            "America/Sitka",
            "Antarctica/Casey",
            "Asia/Ashkhabad",
            "Asia/Dubai",
            "Asia/Khandyga",
            "Asia/Qatar",
            "Asia/Tomsk",
            "Atlantic/Reykjavik",
            "Australia/Queensland",
            "Canada/Yukon",
            "Etc/GMT+7",
            "Etc/UCT",
            "Europe/Guernsey",
            "Europe/Paris",
            "Europe/Vienna",
            "Indian/Cocos",
            "NZ",
            "Pacific/Honolulu",
            "Pacific/Samoa",
            "US/Eastern",
Member

Thanks @vipulrai91

There's a fixture you can use here, tz_aware_fixture (feel free to ask for help if it's unclear how to use it)

Contributor Author

@MarcoGorelli, thank you for the feedback.

As far as I understand, the change has to be:

def test_check_column_vs_index_tz_query(self, tz_aware_fixture):
    tz_list = tz_aware_fixture

I got this snippet from test_datetime.py, but what actually is tz_aware_fixture?
Also, how do I run the test locally? I tried going through the docs, but I will need some help here.

Thanks

Member

@MarcoGorelli, May 6, 2020

tz = tz_aware_fixture should be enough; pytest will then loop through the different timezones defined in tz_aware_fixture.

> what actually is tz_aware_fixture?

It's a fixture which you can use to test different timezones.

> How do I run the test locally

See "using pytest". So here,

$ pytest pandas/tests/frame/indexing/test_column_vs_index_tz.py

should be enough
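
To illustrate how such a fixture behaves, here is a small sketch assuming pytest is installed. tz_fixture_sketch is a hypothetical stand-in with two string parameters; it is not the definition of pandas' tz_aware_fixture, which covers more zones and object types:

```python
import pytest


# Hypothetical stand-in for tz_aware_fixture, reduced to two string params.
@pytest.fixture(params=["UTC", "US/Eastern"])
def tz_fixture_sketch(request):
    # pytest runs each dependent test once per entry in `params`;
    # request.param is the entry for the current run.
    return request.param


def test_receives_one_timezone(tz_fixture_sketch):
    tz = tz_fixture_sketch  # a single value per run, not a list to loop over
    assert isinstance(tz, str)
```

Run under pytest, this file would report one test item per parameter, so the test body itself needs no loop.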

Contributor Author

Thank you for the guidance.

tz_aware_fixture is a combination of time zone names and other object types, and it also throws pytz.exceptions.UnknownTimeZoneError.

test_column_vs_index_tz.py:58: TypeError
============================================================ short test summary info =============================================================
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['UTC'] - pytz.exceptions.UnknownTimeZoneErr...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['US/Eastern'] - pytz.exceptions.UnknownTime...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['Asia/Tokyo'] - pytz.exceptions.UnknownTime...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['dateutil/US/Pacific'] - pytz.exceptions.Un...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['dateutil/Asia/Singapore'] - pytz.exception...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[tzutc()] - TypeError: 'tzutc' object is not...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[tzlocal()] - TypeError: 'tzlocal' object is...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[pytz.FixedOffset(300)] - TypeError: '_Fixed...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[<UTC>] - TypeError: 'UTC' object is not ite...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[pytz.FixedOffset(-300)] - TypeError: '_Fixe...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[datetime.timezone.utc] - TypeError: 'dateti...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[datetime.timezone(datetime.timedelta(seconds=3600))]
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[datetime.timezone(datetime.timedelta(days=-1, seconds=82800), 'foo')]
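
One likely reading of these failures, assuming the test still assigned tz_list = tz_aware_fixture and then looped with for tz in tz_list: the fixture hands the test a single value, so the loop iterates over that value itself. The sketch below uses only the standard library; iterate_like_tz_list is a hypothetical helper:

```python
from datetime import timezone


def iterate_like_tz_list(value):
    """Mimic `for tz in tz_list` when tz_list is really a single fixture value."""
    try:
        return list(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


# A string is iterable, so the loop would see one character at a time.
print(iterate_like_tz_list("UTC"))  # ['U', 'T', 'C']

# A tzinfo object is not iterable, so the loop fails before it starts.
print(iterate_like_tz_list(timezone.utc))
```

Single characters such as "U" are not valid zone names, which would explain the UnknownTimeZoneError entries, while the tzinfo objects produce the "object is not iterable" TypeError entries.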

        ]

        for tz in tz_list:
            self.check_for_various_tz(tz)