
DOC: pd.read_csv doc-string clarification #11555 #11756


Closed

Conversation

frankcleary
Contributor

closes #11555
Updated IO Tools documentation for read_csv() and read_table() to be consistent with the doc-string.

@jreback
Contributor

jreback commented Dec 4, 2015

what exactly changed here?

@jreback jreback added the Docs label Dec 4, 2015
@jreback
Contributor

jreback commented Dec 4, 2015

can you verify that these are in numpy-doc format, see here

mostly it comes down to a parameter like:

parm : datatype, optional/required, default value
     description

@jreback
Contributor

jreback commented Dec 4, 2015

Parameters

Description of the function arguments, keywords and their respective types.

Parameters
----------
x : type
    Description of parameter `x`.

Enclose variables in single backticks. The colon must be preceded by a space, or omitted if the type is absent.

For the parameter types, be as precise as possible. Below are a few examples of parameters and their types.

Parameters
----------
filename : str
copy : bool
dtype : data-type
iterable : iterable object
shape : int or tuple of int
files : list of str

If it is not necessary to specify a keyword argument, use optional:

x : int, optional

Optional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:

Description of parameter `x` (the default is -1, which implies summation
over all axes).

When a parameter can only assume one of a fixed set of values, those values can be listed in braces, with the default appearing first:

order : {'C', 'F', 'A'}
    Description of `order`.

When two or more input parameters have exactly the same type, shape and description, they can be combined:

x1, x2 : array_like
    Input arrays, description of `x1`, `x2`.
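
To make the quoted rules concrete, here is a minimal sketch of a function documented in that style. The function `scale` and its parameters are hypothetical and exist only for illustration; they are not part of pandas or of this PR.

```python
def scale(values, factor=2.0, mode="multiply"):
    """Scale every value by a constant factor.

    Parameters
    ----------
    values : list of float
        Input numbers to scale.
    factor : float, optional
        Constant applied to each element of `values` (the default is 2.0).
    mode : {'multiply', 'divide'}
        Whether to multiply or divide each element by `factor`.

    Returns
    -------
    list of float
        The scaled values, in the same order as `values`.
    """
    if mode not in ("multiply", "divide"):
        raise ValueError("mode must be 'multiply' or 'divide'")
    if mode == "multiply":
        return [v * factor for v in values]
    return [v / factor for v in values]
```

Note the space before each colon, the `optional` marker on the keyword with a default, and the braces listing the allowed values with the default first.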

@jreback jreback added this to the 0.18.0 milestone Dec 4, 2015
@frankcleary
Contributor Author

In terms of what was changed: I updated the IO Tools description of read_csv() that appears here to match the doc string. I thought that's what the bug was getting at. I also slightly changed the wording describing the default behavior of the header= option.

I can go through and make any changes to put the doc string in numpy format, would you prefer that as a separate PR and/or should I drop the IO Tools doc changes?

@jreback
Contributor

jreback commented Dec 4, 2015

@frankcleary thanks. can you enumerate what changed (here), though. just for my edification.

yes, would like to conform to numpy-doc in this PR.

@frankcleary
Contributor Author

Makes sense, I'll do that and update this PR later this week.

@jorisvandenbossche
Member

@jreback The list is in io.rst, not in the docstring. So I think it does not need to follow the numpy-doc format?

@jreback
Contributor

jreback commented Dec 9, 2015

@jorisvandenbossche hmm, good point. I am not sure what we actually do as we don't have too many places where we go thru the args of functions. is there a consistent pattern now (anywhere)? any reason not to use a doc-string format?

@frankcleary
Contributor Author

The only other place I could find in the docs with a similar list of args is for concat in merging.rst. This also appears to have lagged behind the concat docstring in terms of updates (for example there's no copy=...). I wonder if it's better to remove these lists from the *.rst documentation in favor of links to the docs generated from the docstrings themselves?
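
One way to catch this kind of drift is to compare a hand-maintained argument list against the function signature. The following is only a sketch under stated assumptions: `concat_like` is a hypothetical stand-in rather than the real `concat`, and `RST_LIST` is an invented excerpt written in the bullet style used in the docs.

```python
import inspect
import re


def concat_like(objs, axis=0, join="outer", copy=True):
    """Hypothetical stand-in for a function whose docs list its arguments."""
    return list(objs)


# Hypothetical hand-maintained list; note that it lacks an entry for `copy`.
RST_LIST = """\
- ``objs``: list of objects to combine.
- ``axis``: axis to combine along.
- ``join``: how to handle indexes on the other axis.
"""


def undocumented_parameters(func, rst_text):
    """Return signature parameters that have no bullet in `rst_text`."""
    documented = set(re.findall(r"^- ``(\w+)``", rst_text, flags=re.MULTILINE))
    actual = list(inspect.signature(func).parameters)
    return [name for name in actual if name not in documented]


missing = undocumented_parameters(concat_like, RST_LIST)
print(missing)
```

A check like this could run as part of a docs build, but it only compares names; it says nothing about whether the descriptions themselves are current.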

@jreback
Contributor

jreback commented Dec 12, 2015

@frankcleary the ideal thing would be to render the doc-strings in-line. as the out-of-syncness is annoying. Though if you simply want to fix it we can make that another issue.

@frankcleary frankcleary force-pushed the gh11555-read_csv-docs branch from d605e50 to 8a7cf0b Compare December 14, 2015 05:19
@frankcleary
Contributor Author

Looks like sphinx can do that, updated. Here's a sample of what the result looks like in the io docs:

[Screenshot: sample of the rendered result in the io docs]

@jreback
Contributor

jreback commented Dec 16, 2015

@jorisvandenbossche how's this look to you

@jorisvandenbossche
Member

Hmm, personally, if it looks exactly the same as the generated API page, I would just link to it rather than including it.
I understand that keeping a separate list of the parameters up to date is more work, but I think it is nicer for the tutorial docs (so, as in one of the original commits in this PR: keep it as a list, but update the explanations).

@jreback
Contributor

jreback commented Dec 26, 2015

so after seeing this, I think I agree with @jorisvandenbossche

let's revert back to just an updated list (and hopefully we can keep this updated).

@frankcleary can you update.

@frankcleary
Contributor Author

Sure, I'll get to it this week.

@frankcleary frankcleary force-pushed the gh11555-read_csv-docs branch from 8a7cf0b to 5945ca3 Compare January 4, 2016 04:30
'round_trip' for the round-trip converter.
- ``filepath_or_buffer``: str or file handle /
:class:`python:io.StringIO`. The string could be a URL. Valid URL schemes
include http, ftp, s3, and file. For file URLs, a host is expected. For
Contributor


should we mention the LocalPath here as well? (it's in the above)

Contributor Author


Fixed

@jreback
Contributor

jreback commented Jan 7, 2016

@frankcleary looks good. couple of points.

I would like to have the options in the same order as io.rst sections. grouping if possible (e.g. the NA values together). I think this will be more logical.

All that said the more important options should be closer to the front if at all possible (I think that is how we had it). e.g. things like index_col, header, sep.

So ok to reorg a bit (see if you can keep the code in sync as well). everything is passed by kw so that should be fine. Add a note in the whatsnew to say that the ordering of args was changed (in API section) as well.
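
A minimal sketch of why reordering is safe for keyword callers but not for positional ones. `read_v1` and `read_v2` are hypothetical functions that differ only in argument order; they are not the actual `read_csv` signature.

```python
def read_v1(path, sep=",", header=0, index_col=None):
    """Hypothetical reader with the original argument order."""
    return {"path": path, "sep": sep, "header": header, "index_col": index_col}


def read_v2(path, sep=",", index_col=None, header=0):
    """Same hypothetical reader after swapping `header` and `index_col`."""
    return {"path": path, "sep": sep, "header": header, "index_col": index_col}


# Keyword callers are unaffected by the reordering.
kw_old = read_v1("data.csv", header=None, index_col=0)
kw_new = read_v2("data.csv", header=None, index_col=0)

# Positional callers silently bind the values to different names.
pos_old = read_v1("data.csv", ",", None, 0)  # header=None, index_col=0
pos_new = read_v2("data.csv", ",", None, 0)  # index_col=None, header=0

print(kw_old == kw_new)
print(pos_old == pos_new)
```

The second comparison is the reason a whatsnew note is worth adding: any caller that does pass arguments positionally would see changed behavior without an error.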

@frankcleary
Contributor Author

Makes sense, thanks. I will incorporate all those suggestions next weekend and post an update by Monday Jan. 18.

@frankcleary
Contributor Author

Here's the order I'm thinking of in terms of logically organizing the arguments, along with the general idea behind the grouping I put them in. Any thoughts?

Basic

filepath_or_buffer
sep
delimiter

Column and index names and location

header
names
index_col
usecols
squeeze
prefix
mangle_dupe_cols

General parsing details

dtype
engine
converters
true_values
false_values
skipinitialspace
skiprows
skipfooter
nrows

NA handling

na_values
keep_default_na
na_filter
verbose
skip_blank_lines

Date handling

parse_dates
infer_datetime_format
keep_date_col
date_parser
dayfirst
thousands
decimal

Iteration

iterator
chunksize

Quoting and file format

compression
lineterminator
quotechar
quoting
escapechar
comment
encoding
dialect
tupleize_cols

Errors

error_bad_lines
warn_bad_lines
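
For illustration only, the grouping above can be written as a mapping and flattened into a single argument order, which also makes it easy to check that no argument is listed twice. The names `ARGUMENT_GROUPS`, `flatten`, and `duplicates` are hypothetical helpers, not pandas code.

```python
# Groups and argument names copied from the proposal above.
ARGUMENT_GROUPS = {
    "Basic": ["filepath_or_buffer", "sep", "delimiter"],
    "Column and index names and location": [
        "header", "names", "index_col", "usecols", "squeeze", "prefix",
        "mangle_dupe_cols",
    ],
    "General parsing details": [
        "dtype", "engine", "converters", "true_values", "false_values",
        "skipinitialspace", "skiprows", "skipfooter", "nrows",
    ],
    "NA handling": [
        "na_values", "keep_default_na", "na_filter", "verbose",
        "skip_blank_lines",
    ],
    "Date handling": [
        "parse_dates", "infer_datetime_format", "keep_date_col",
        "date_parser", "dayfirst", "thousands", "decimal",
    ],
    "Iteration": ["iterator", "chunksize"],
    "Quoting and file format": [
        "compression", "lineterminator", "quotechar", "quoting",
        "escapechar", "comment", "encoding", "dialect", "tupleize_cols",
    ],
    "Errors": ["error_bad_lines", "warn_bad_lines"],
}


def flatten(groups):
    """Return all argument names in group order (dicts keep insertion order)."""
    return [name for names in groups.values() for name in names]


def duplicates(names):
    """Return names that appear more than once, in order of repetition."""
    seen, dupes = set(), []
    for name in names:
        if name in seen:
            dupes.append(name)
        seen.add(name)
    return dupes


ordered = flatten(ARGUMENT_GROUPS)
print(len(ordered), duplicates(ordered))
```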

@jreback
Contributor

jreback commented Jan 25, 2016

@frankcleary a reorg like that would be great!

(Note: you can probably use ^^^^ to make these subsections in the docs, I think, so that they show up on the menu bar as groupings.)

@frankcleary
Contributor Author

I've got an update almost ready with the reorg and other suggested changes, but need to fix merge conflicts in the parsers.py docstring so I won't be able to finish it tonight. I will get it done this weekend.

Is this about what you're thinking in regard to the sections:

[Screenshot: proposed sections in the docs]

@jorisvandenbossche
Member

Thanks for this!

One consideration:

Updated IO Tools documentation for read_csv() and read_table() to be consistent with the doc-string
and reordered keyword arguments to group them more logically. Also updated concat docs in merging.rst
to be consistent with doc-string.
@frankcleary frankcleary closed this Feb 8, 2016
@frankcleary frankcleary force-pushed the gh11555-read_csv-docs branch from 5945ca3 to 4112c9f Compare February 8, 2016 06:11
@frankcleary
Contributor Author

Sorry, I was trying to push a clean history and it looks like things are finished for this PR (can't reopen a closed PR with a force-pushed branch). I opened a new one here: #12256

Successfully merging this pull request may close these issues.

DOC: pd.read_csv doc-string clarification
3 participants