Skip to content

Fixing econdb tests #650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Aug 29, 2019
Merged

Fixing econdb tests #650

merged 18 commits into from
Aug 29, 2019

Conversation

simongarisch
Copy link
Contributor

I've made a start on fixing some of the broken tests. I also added a check to halt processing additional metadata in the EcondbReader class if this metadata does not exist. This was throwing an error as the non-existent metadata was passing back an empty string (where it expected a dict).

@simongarisch
Copy link
Contributor Author

Moving onto the next failing tests for test_eurostat.py. The Eurostat website uses dataset 'cdh_e_fos' as an example for using their API as does our test method 'test_get_cdh_e_fos'. However, this particular dataset does not exist anymore and so the test is failing. I reached out to them just to confirm:

Dear Simon,
We have received the following from the team:
Indeed the page should be updated as the dataset cdh_e_fos doesn’t exist anymore in Eurobase.

I'll update the test to reference a different (currently existing) data source.

@simongarisch
Copy link
Contributor Author

simongarisch commented Jul 9, 2019

I'm currently having a look at the yahoo finance tests. I see that test_yahoo::TestYahoo::test_yahoo_DataReader assumes that Yahoo does not by default adjust dividends for splits.

It looks like yahoo has changed this logic. I can see unadjusted dividends on the Nasdaq page and adjusted dividends on the Yahoo page.

Taking the AAPL example there was a cash dividend of 3.29 (Nasdaq site) on 8 May 2014.
There was then a 1/7 stock split on 09 Jun 2014.
Yahoo records this 8 May dividend as $0.47 (= 3.29 / 7), so yahoo appears to be making dividend adjustments for splits by default.

Within daily.py yahoo.pandas_datareader we have the parameter adjust_dividends: bool, default false. If you set adjust_dividends = True then it'll assume that Yahoo has not made this adjustment and will try to adjust for the dividends a second time. Unadjusted divi = 3.29, yahoo adjusted divi = 3.29 / 7 = 0.47, but then pandas_datareader will adjust this again to 0.47 / 7 = 0.067.

I'll look to raise a separate issue for this. The final yahoo tests can then be fixed when this is resolved.

@econdb
Copy link
Contributor

econdb commented Jul 9, 2019

Looks good on my side, thanks!

@simongarisch
Copy link
Contributor Author

There is a bug in the Yahoo finance website. If you try to collect data from 1st Jan 2019 it'll give you the 31st Dec 2018 as well.

This plays with test_yahoo.py::TestYahoo::test_get_data_interval as the expected days of 252 become 253, expected weeks of 52 become 53...

I'd suggest that we mark this test as expected fail for now. When it starts to pass (if Yahoo fixes their logic) then the expect fail will prompt users to look at it again.

@simongarisch
Copy link
Contributor Author

simongarisch commented Jul 10, 2019

For nasdaq_trader.py the retry_count -= 1 was in the wrong place -> this was leading to the RemoteDataError not being thrown and the function returning _ticker_cache = None.

symbols = None in this case and
When calling assert 'IBM' in symbols.index this was returning
AttributeError: 'NoneType' object has no attribute 'index'

@simongarisch
Copy link
Contributor Author

A quick note that the test_econdp.py tests do not always return columns in the same order. This was causing the Python 3.6 (pandas 0.23) tests to pass but the Python 3.5 (pandas 0.20.3) tests to fail.

In test_get_tourism, for example, sometimes the Total international arrivals is listed first and at other times it is the countries. This was causing an index error. I've added a swaplevel to keep them in an expected order for the test. It's perhaps better to enforce a consistent return order...

@simongarisch
Copy link
Contributor Author

simongarisch commented Jul 11, 2019

The last failing test is get_tiingo_symbols for test_tiingo.py.

import pandas as pd

url = 'https://apimedia.tiingo.com/docs/tiingo/daily/supported_tickers.zip'

df = pd.read_csv(url)

Under pandas >= 0.20.3 this appears to work fine (as per the tests above).
For the tests under pandas == 0.19.2 this raises an error
CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

I see there is a similar issue pandas-dev/pandas#11493
I've tried a few variants for the zip file pull and am happy to take suggestions.

I've marked this as skip for pandas version 0.19.2, but happy to discuss.

@simongarisch
Copy link
Contributor Author

@bashtage Hi there.

I've added changes for failing tests, details above.
This closes #651

Let me know if you have suggestions.

@bashtage bashtage merged commit 8444bb4 into pydata:master Aug 29, 2019
@bashtage
Copy link
Contributor

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants