DOC: pd.read_csv doc-string clarification #11555 #11756

frankcleary · 2015-12-03T21:36:37Z

closes #11555
Updated IO Tools documentation for read_csv() and read_table() to be consistent with the doc-string.

jreback · 2015-12-04T18:05:17Z

what exactly changed here?

jreback · 2015-12-04T18:08:23Z

can you verify that these are in numpy-doc format, see here

mostly it comes down to a parameter like:

parm : datatype, optional/required, default value
     description

jreback · 2015-12-04T18:09:09Z

Parameters

Description of the function arguments, keywords and their respective types.

Parameters
----------
x : type
    Description of parameter `x`.

Enclose variables in single backticks. The colon must be preceded by a space, or omitted if the type is absent.

For the parameter types, be as precise as possible. Below are a few examples of parameters and their types.

Parameters
----------
filename : str
copy : bool
dtype : data-type
iterable : iterable object
shape : int or tuple of int
files : list of str

If it is not necessary to specify a keyword argument, use optional:

x : int, optional

Optional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:

Description of parameter `x` (the default is -1, which implies summation
over all axes).

When a parameter can only assume one of a fixed set of values, those values can be listed in braces, with the default appearing first:

order : {'C', 'F', 'A'}
    Description of `order`.

When two or more input parameters have exactly the same type, shape and description, they can be combined:

x1, x2 : array_like
    Input arrays, description of `x1`, `x2`.
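
For illustration, the conventions quoted above can be put together in one docstring. This is only a minimal sketch: the function `scale` and its parameters are hypothetical and are not part of pandas or of this PR.

```python
def scale(values, factor=2, order='C'):
    """Multiply each value by a constant factor.

    Parameters
    ----------
    values : list of int
        Input numbers to scale.
    factor : int, optional
        Multiplier applied to each element (the default is 2).
    order : {'C', 'F'}
        Hypothetical flag, included only to show the braces notation
        with the default listed first; 'F' reverses the output.

    Returns
    -------
    list of int
        The scaled values.
    """
    scaled = [v * factor for v in values]
    # 'F' is an invented option for this sketch: it returns the result reversed.
    return scaled[::-1] if order == 'F' else scaled
```

Note the space before each colon, the `optional` marker on the keyword argument, and the default described in the text rather than on the type line.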

frankcleary · 2015-12-04T20:35:10Z

In terms of what was changed: I updated the IO Tools description of read_csv() that appears here to match the doc string. I thought that's what the bug was getting at. I also slightly changed the wording describing the default behavior of the header= option.

I can go through and make any changes to put the doc string in numpy format, would you prefer that as a separate PR and/or should I drop the IO Tools doc changes?

jreback · 2015-12-04T22:52:16Z

@frankcleary thanks. can you enumerate what changed (here), though. just for my edification.

yes, would like to conform to numpy-doc in this PR.

frankcleary · 2015-12-09T19:13:11Z

Makes sense, I'll do that and update this PR later this week.

jorisvandenbossche · 2015-12-09T19:37:09Z

@jreback The list is in io.rst, not in the docstring. So I think it does not need to follow the numpy-doc format?

jreback · 2015-12-09T20:04:06Z

@jorisvandenbossche hmm, good point. I am not sure what we actually do as we don't have too many places where we go thru the args of functions. is there a consistent pattern now (anywhere)? any reason not to use a doc-string format?

frankcleary · 2015-12-12T00:01:12Z

The only other place I could find in the docs with a similar list of args is for concat in merging.rst. This also appears to have lagged behind the concat docstring in terms of updates (for example there's no copy=...). I wonder if it's better to remove these lists from the *.rst documentation in favor of links to the docs generated from the docstrings themselves?
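
The kind of drift described here (an argument such as `copy` present in the signature but missing from the hand-written list) could in principle be detected mechanically. The following is a rough sketch under stated assumptions: `concat_like` is a hypothetical stand-in, not the real `concat`, and `documented_in_rst` is an invented list standing in for the names in an .rst page.

```python
import inspect


def concat_like(objs, axis=0, join='outer', copy=True):
    """Hypothetical stand-in for a function whose arguments are listed by hand."""
    return list(objs)


# Names assumed to appear in a hand-maintained .rst page (illustrative only).
documented_in_rst = ['objs', 'axis', 'join']


def undocumented_params(func, documented):
    """Return signature parameters missing from `documented`, in signature order."""
    names = inspect.signature(func).parameters
    return [name for name in names if name not in documented]


missing = undocumented_params(concat_like, documented_in_rst)
print(missing)
```

A check like this only compares names; it would not catch descriptions that have gone stale.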

jreback · 2015-12-12T14:06:13Z

@frankcleary the ideal thing would be to render the doc-strings in-line, as the out-of-syncness is annoying. Though if you simply want to fix it we can make that another issue.

frankcleary · 2015-12-14T05:41:36Z

Looks like sphinx can do that, updated. Here's a sample of what the result looks like in the io docs:

jreback · 2015-12-16T13:57:19Z

@jorisvandenbossche how's this look to you

jorisvandenbossche · 2015-12-18T13:07:23Z

Hmm, personally, if it looks exactly the same as the generated API page, I would just link to it rather than including it.
I understand that keeping a separate list of the parameters up to date is more work, but I think it is nicer for the tutorial docs (so, as in one of the original commits in this PR: keep it as a list, but update the explanations).

jreback · 2015-12-26T00:49:51Z

so after seeing this, I think I agree with @jorisvandenbossche

let's revert back to just an updated list (and hopefully we can keep this updated).

@frankcleary can you update.

frankcleary · 2015-12-30T04:27:48Z

Sure, I'll get to it this week.

jreback · 2016-01-07T13:10:08Z

doc/source/io.rst

-    'round_trip' for the round-trip converter.
+  - ``filepath_or_buffer``: str or file handle /
+    :class:`python:io.StringIO`. The string could be a URL. Valid URL schemes
+    include http, ftp, s3, and file. For file URLs, a host is expected. For


should we mention the LocalPath here as well? (it's in the above)

jreback · 2016-01-07T13:18:01Z

@frankcleary looks good. couple of points.

I would like to have the options in the same order as io.rst sections. grouping if possible (e.g. the NA values together). I think this will be more logical.

All that said the more important options should be closer to the front if at all possible (I think that is how we had it). e.g. things like index_col, header, sep.

So ok to reorg a bit (see if you can keep the code in sync as well). everything is passed by kw so that should be fine. Add a note in the whatsnew to say that the ordering of args was changed (in API section) as well.
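
To make the point about keyword passing concrete, here is a small sketch of why reordering arguments is safe for keyword callers but not for positional ones. `read_old` and `read_new` are hypothetical functions with made-up signatures, not the actual read_csv signatures before and after this change.

```python
def read_old(path, sep=',', header=0, index_col=None):
    """Hypothetical reader with the original argument order."""
    return {'path': path, 'sep': sep, 'header': header, 'index_col': index_col}


def read_new(path, sep=',', index_col=None, header=0):
    """Same hypothetical reader with header and index_col swapped."""
    return {'path': path, 'sep': sep, 'header': header, 'index_col': index_col}


# Keyword callers are unaffected by the reordering.
by_keyword_old = read_old('data.csv', header=None, index_col=0)
by_keyword_new = read_new('data.csv', header=None, index_col=0)

# Positional callers silently get different results.
by_position_old = read_old('data.csv', ';', None, 0)
by_position_new = read_new('data.csv', ';', None, 0)

print(by_keyword_old == by_keyword_new)
print(by_position_old == by_position_new)
```

This is why a whatsnew note about the changed ordering is still worthwhile: any caller relying on position beyond the first argument would be affected.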

frankcleary · 2016-01-11T05:43:48Z

Makes sense, thanks. I will incorporate all those suggestions next weekend and post an update by Monday Jan. 18.

frankcleary · 2016-01-18T21:36:07Z

Here's the order I'm thinking of in terms of logically organizing the arguments, along with the general idea behind the grouping I put them in. Any thoughts?

Basic

filepath_or_buffer
sep
delimiter

Column and index names and location

header
names
index_col
usecols
squeeze
prefix
mangle_dupe_cols

General parsing details

dtype
engine
converters
true_values
false_values
skipinitialspace
skiprows
skipfooter
nrows

NA handling

na_values
keep_default_na
na_filter
verbose
skip_blank_lines

Date handling

parse_dates
infer_datetime_format
keep_date_col
date_parser
dayfirst
thousands
decimal

Iteration

iterator
chunksize

Quoting and file format

compression
lineterminator
quotechar
quoting
escapechar
comment
encoding
dialect
tupleize_cols

Errors

error_bad_lines
warn_bad_lines

jreback · 2016-01-25T17:02:19Z

@frankcleary a reorg like that would be great!

(note you can prob use the ^^^^ to make these sub-sections in the docs (I think) so that they show up on the menu bar as groupings.)

frankcleary · 2016-02-04T06:57:33Z

I've got an update almost ready with the reorg and other suggested changes, but need to fix merge conflicts in the parsers.py docstring so I won't be able to finish it tonight. I will get it done this weekend.

Is this about what you're thinking in regard to the sections:

jorisvandenbossche · 2016-02-04T10:24:12Z

Thanks for this!

One consideration:

I think we should try to make the actual option name stand out a bit more (now it is a bit lost in the dense bullet point). Maybe using something like http://docutils.sourceforge.net/docs/user/rst/quickref.html#definition-lists ? (just an idea, would have to try out to see how it looks)

Updated IO Tools documentation for read_csv() and read_table() to be consistent with the doc-string and reordered keyword arguments to group them more logically. Also updated concat docs in merging.rst to be consistent with doc-string.

frankcleary · 2016-02-08T06:35:13Z

Sorry, I was trying to push a clean history and it looks like things are finished for this PR (can't reopen a closed PR with a force-pushed branch). I opened a new one here: #12256

jreback added the Docs label Dec 4, 2015

jreback added this to the 0.18.0 milestone Dec 4, 2015

frankcleary force-pushed the gh11555-read_csv-docs branch from d605e50 to 8a7cf0b Compare December 14, 2015 05:19

frankcleary force-pushed the gh11555-read_csv-docs branch from 8a7cf0b to 5945ca3 Compare January 4, 2016 04:30

jreback reviewed Jan 7, 2016
frankcleary closed this Feb 8, 2016

frankcleary force-pushed the gh11555-read_csv-docs branch from 5945ca3 to 4112c9f Compare February 8, 2016 06:11

frankcleary mentioned this pull request Feb 8, 2016

DOC: pd.read_csv doc-string clarification #11555 #12256

Closed
