DOC: update the pandas.errors.DtypeWarning docstring #20208

Merged
50 changes: 47 additions & 3 deletions pandas/errors/__init__.py
@@ -38,9 +38,53 @@ class ParserError(ValueError):

class DtypeWarning(Warning):
"""
Warning that is raised for a dtype incompatibility. This
can happen whenever `pd.read_csv` encounters non-
uniform dtypes in a column(s) of a given CSV file.
Warning raised when reading different dtypes in a column from a file.

Raised for a dtype incompatibility. This can happen whenever `read_csv`
or `read_table` encounter non-uniform dtypes in a column(s) of a given
CSV file.

See Also
--------
pandas.read_csv : Read CSV (comma-separated) file into a DataFrame.
pandas.read_table : Read general delimited file into a DataFrame.

Notes
-----
This warning is issued when dealing with larger files because the dtype
checking happens per chunk read.

Despite the warning, the CSV file is read with mixed types in a single
column which will be an object type. See the examples below to better
understand this issue.

Examples
--------
This example creates and reads a large CSV file with a column that contains
`int` and `str`.

>>> df = pd.DataFrame({'a':['1']*100000 + ['X']*100000 + ['1']*100000,
... 'b':['b']*300000})
>>> df.to_csv('test.csv', index=False)
>>> df2 = pd.read_csv('test.csv')
>>> DtypeWarning: Columns (0) have mixed types... # doctest: +SKIP
Member

Can you remove the >>> on this line? (You might need to move the "# doctest: +SKIP" to the line above, after the read_csv call.)

Contributor Author

@jorisvandenbossche @TomAugspurger I've added "import os" and an "os.remove('test.csv')" at the end of the example, just as you agreed in #20302, and also put "# doctest: +SKIP" right before the warning, like this:

>>> import os
>>> df = pd.DataFrame({'a':['1']*100000 + ['X']*100000 + ['1']*100000,
...                    'b':['b']*300000})
>>> df.to_csv('test.csv', index=False)
>>> df2 = pd.read_csv('test.csv')
>>> os.remove('test.csv')
# doctest: +SKIP
DtypeWarning: Columns (0) have mixed types...

But I got this message on validation:

################################################################################
################################### Doctests ###################################
################################################################################


Line 32, in pandas.errors.DtypeWarning
Failed example:
os.remove('test.csv')
Expected:
# doctest: +SKIP
DtypeWarning: Columns (0) have mixed types...
Got nothing

And also, removing ".csv" from to_csv and read_csv, like Tom suggested, raises two possible errors:

os.remove('test.csv'): FileNotFoundError: [Errno 2] No such file or directory: 'test.csv'
os.remove('test'): FileNotFoundError: File b'test' does not exist

What do you guys recommend?
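
For reference, the validation output above is consistent with how the standard `doctest` module parses examples: a directive is only recognized when it is a comment on the example's own source line, and any non-blank text that follows a `>>>` line is treated as expected output. The snippet below is a minimal sketch of that behaviour, not the fix that was adopted in this PR; the sample strings, the file name `no_such_file.csv`, and the helper `count_failures` are made up for illustration.

```python
import doctest

# Directive placed on the example's own source line: the example is skipped.
# 'no_such_file.csv' is a hypothetical name; removing it would raise if run.
ON_SOURCE_LINE = """
>>> import os
>>> os.remove('no_such_file.csv')  # doctest: +SKIP
"""

# Directive placed on its own line after the example: doctest reads it,
# together with the warning text, as the expected output of `x = 1`.
ON_OWN_LINE = """
>>> x = 1
# doctest: +SKIP
DtypeWarning: Columns (0) have mixed types...
"""


def count_failures(text):
    """Parse `text` as a doctest and return the number of failed examples."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name="sample",
                              filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test).failed


if __name__ == "__main__":
    print(count_failures(ON_SOURCE_LINE))
    print(count_failures(ON_OWN_LINE))
```

Under this reading, the "Expected: # doctest: +SKIP ... Got nothing" report comes from the directive being parsed as output of the preceding `os.remove('test.csv')` line rather than as a directive.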


Important to notice that df2 will contain both `str` and `int` for the
same input, '1'.

>>> df2.iloc[262140, 0]
'1'
>>> type(df2.iloc[262140, 0])
<class 'str'>
>>> df2.iloc[262150, 0]
1
>>> type(df2.iloc[262150, 0])
<class 'int'>

One way to solve this issue is using the parameter `converters` in the
`read_csv` and `read_table` functions to explicit the conversion:

>>> df2 = pd.read_csv('test', sep='\t', converters={'a': str})
Contributor

FYI, this is still test, not test.csv.

Member

Another thing: I think we should recommend dtype={'a': str} instead?
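
A minimal sketch of what the `dtype={'a': str}` suggestion could look like. It assumes a small in-memory CSV (via `io.StringIO`) as a stand-in for the `test.csv` file in the docstring, so it does not reproduce the warning itself, which per the Notes section shows up on larger files read in chunks. The variable names `csv_text` and `value_types` are illustrative only.

```python
import io

import pandas as pd

# Small in-memory stand-in for the 'test.csv' file from the docstring example.
csv_text = "a,b\n1,b\nX,b\n1,b\n"

# dtype={'a': str} asks the parser to keep every value in column 'a' as a
# string, so '1' is not converted to an integer in some rows.
df2 = pd.read_csv(io.StringIO(csv_text), dtype={'a': str})

# Collect the Python types that actually appear in the column.
value_types = {type(v) for v in df2['a']}
print(value_types)
```

With the dtype given up front, the column should contain only `str` values, which avoids the mixed `str`/`int` result shown in the docstring example.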

"""

