BUG: (GH4626) Fix decoding based on a passed in non-default encoding in pd.read_stata #4643

Merged: 1 commit merged on Aug 26, 2013

Conversation

jreback
Contributor

jreback commented Aug 22, 2013

closes #4626

@jreback
Contributor Author

jreback commented Aug 22, 2013

@jseabold can you give this a once over; I included your file as a test case, FYI

It appears to me that everything is encoded with the default cp1252; the 'encoding' parameter is merely used to decode the strings afterwards.
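To illustrate the decoding step described here, a minimal sketch follows. It assumes a null-padded, fixed-width string field; the helper name `decode_fixed_width` is hypothetical and is not taken from the pandas source.

```python
def decode_fixed_width(raw: bytes, encoding: str = "cp1252") -> str:
    """Decode a null-padded fixed-width field (hypothetical helper)."""
    # Keep only the bytes before the first null terminator.
    payload = raw.split(b"\x00", 1)[0]
    # Decode with the caller-supplied encoding instead of a hard-coded one.
    return payload.decode(encoding)


# Byte 0xE9 maps to the same character in cp1252 and latin-1 ...
print(decode_fixed_width(b"caf\xe9\x00\x00\x00"))             # café
print(decode_fixed_width(b"caf\xe9\x00\x00\x00", "latin-1"))  # café
# ... but byte 0x80 does not, so honoring the passed-in encoding matters.
print(decode_fixed_width(b"\x80\x00", "cp1252"))              # €
```

Under latin-1 the byte 0x80 decodes to a C1 control character rather than the euro sign, which is the kind of difference a caller would see if the passed-in encoding were ignored.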

@jreback
Contributor Author

jreback commented Aug 23, 2013

cc @PKEuS

pls take a look as well

jreback added a commit that referenced this pull request Aug 26, 2013
BUG: (GH4626) Fix decoding based on a passed in non-default encoding in pd.read_stata
jreback merged commit 4cf2030 into pandas-dev:master Aug 26, 2013
@jseabold
Contributor

jseabold commented Sep 3, 2013

One comment. It's not a showstopper, but Stata doesn't support unicode. So really the only valid encodings for reading in data are 'ascii' and 'latin-1' or equivalent. Maybe worth a mention in the docs?
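A rough sketch of how "ascii, latin-1, or equivalent" could be checked follows. It uses only the standard-library `codecs` module to resolve aliases; the function name `is_valid_stata_encoding` and the allowed set are assumptions for illustration, not pandas behavior. Whether cp1252 should also count as "equivalent" is left open here.

```python
import codecs

# Assumption: only these two encodings (and their aliases) are acceptable,
# following the comment above. This is not taken from the pandas source.
_ALLOWED = {codecs.lookup(name).name for name in ("ascii", "latin-1")}


def is_valid_stata_encoding(encoding: str) -> bool:
    """Return True if `encoding` is an alias of ascii or latin-1."""
    try:
        # codecs.lookup normalizes aliases such as 'latin1' or 'ISO-8859-1'.
        return codecs.lookup(encoding).name in _ALLOWED
    except LookupError:
        # Unknown codec names are treated as invalid.
        return False


print(is_valid_stata_encoding("latin1"))  # True
print(is_valid_stata_encoding("utf-8"))   # False
```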

@jreback
Contributor Author

jreback commented Sep 3, 2013

So it's really not possible to actually encode the data (unless you somehow did it external to Stata, meaning you encoded the bytes themselves)?

@jseabold
Contributor

jseabold commented Sep 3, 2013

Yes, AFAIK, the user has no control over this. A great source of anguish to me lately whenever my data passes through Stata. The only extended characters supported are latin-1. Some info here (from third-party software): http://stattransfer.com/faq/encoding.html
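For the case raised earlier in the thread, where bytes were encoded outside Stata, the sketch below shows what a reader that decodes as latin-1 would see if the stored bytes were actually UTF-8, and how the text could be recovered afterwards. This is plain Python string handling under that assumption and says nothing about what Stata or pandas does internally; the function names are hypothetical.

```python
def misread_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being decoded as latin-1 (hypothetical helper)."""
    return text.encode("utf-8").decode("latin-1")


def recover_utf8(misread: str) -> str:
    """Undo the misreading by restoring the raw bytes and decoding as UTF-8."""
    # latin-1 maps every byte to exactly one code point, so this round trip
    # restores the original byte sequence without loss.
    return misread.encode("latin-1").decode("utf-8")


garbled = misread_as_latin1("café")
print(garbled)                # cafÃ©
print(recover_utf8(garbled))  # café
```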

@jreback
Contributor Author

jreback commented Sep 3, 2013

ok I'll put in a doc mention

Development

Successfully merging this pull request may close these issues.

BUG: read_stata ignoring encoding?