-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
DOC: improved the scatter method #20118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: improved the scatter method #20118
Conversation
pandas/plotting/_core.py
Outdated
@@ -2852,22 +2852,48 @@ def pie(self, y=None, **kwds): | |||
|
|||
def scatter(self, x, y, s=None, c=None, **kwds): | |||
""" | |||
Scatter plot | |||
A scatter plot with point size *s* and color *c*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the first line is "A scatter plot of two columns in the DataFrame."
pandas/plotting/_core.py
Outdated
Scatter plot | ||
A scatter plot with point size *s* and color *c*. | ||
|
||
The coordinates of each point *x,y* are defined by two dataframe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameter names should be in singel backticks.
pandas/plotting/_core.py
Outdated
|
||
Parameters | ||
---------- | ||
x, y : label or position, optional | ||
Coordinates for each point. | ||
x : column name or column position |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think label or position
is OK. You can remove optional, as both x
and y
are required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the description of the argument c
specifies column name or column position. My intention was to make it consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some confusion about using "label or position" as type. In other PR @datapythonista suggests using real types like "int or str" as the argument type, and mention it's the column name or position in the description of the parameter on next line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, my mistake, I realized later that label
could be any type, I guess that's why label or position
is being used.
pandas/plotting/_core.py
Outdated
.. plot:: | ||
:context: close-figs | ||
|
||
>>> from sklearn.datasets import load_iris |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't import sklearn here, since it isn't installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just make a small sample dataset instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It passed all checks and test and the plot was shown in the docs. But ok, I will write down the dataframe explicitely.
…ng a toy dataset instead
pandas/plotting/_core.py
Outdated
@@ -2852,22 +2852,46 @@ def pie(self, y=None, **kwds): | |||
|
|||
def scatter(self, x, y, s=None, c=None, **kwds): | |||
""" | |||
Scatter plot | |||
A scatter plot with point size `s` and color `c`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with the rest of the docs from the chapter I'd start this with an infinitive verb like "Create a scatter plot", "Generate a scatter plot", "Make a scatter plot"...
FTR this is pandas.DataFrame.plot.scatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the type of the parameters and don't mention them in the description and extended description.
pandas/plotting/_core.py
Outdated
@@ -2852,22 +2852,46 @@ def pie(self, y=None, **kwds): | |||
|
|||
def scatter(self, x, y, s=None, c=None, **kwds): | |||
""" | |||
Scatter plot | |||
A scatter plot with point size `s` and color `c`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More: I don't think you should mention the meaning of the arguments in the short description of the method. This is what the Parameters
section is for.
pandas/plotting/_core.py
Outdated
|
||
Parameters | ||
---------- | ||
x, y : label or position, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are missing the types for all parameters. See https://python-sprints.github.io/pandas/guide/pandas_docstring.html#section-3-parameters.
pandas/plotting/_core.py
Outdated
Scatter plot | ||
A scatter plot with point size `s` and color `c`. | ||
|
||
The coordinates of each point `x,y` are defined by two dataframe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: don't talk about the arguments, talk about what the function does, use cases, etc.
pandas/plotting/_core.py
Outdated
s : scalar or array_like, optional | ||
Size of each point. | ||
c : label or position, optional | ||
c : label, column name or column position, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this column contain anything? How does it work? How does the function choose the color of the dots, just any unique color?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think valid options are matplotlib's + a column label, which extracts the array and passes it to matplotlib.
from plt.scatter:
c : color, sequence, or sequence of color, optional, default: 'b'
`c` can be a single color format string, or a sequence of color
specifications of length `N`, or a sequence of `N` numbers to be
mapped to colors using the `cmap` and `norm` specified via kwargs
(see below). Note that `c` should not be a single numeric RGB or
RGBA sequence because that is indistinguishable from an array of
values to be colormapped. `c` can be a 2-D array in which the
rows are RGB or RGBA, however, including the case of a single
row to specify the same color for all points.
You'll to verify that. Check if / when a colorbar is added.
pandas/plotting/_core.py
Outdated
Color of each point. | ||
`**kwds` : optional | ||
kwds : optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use **kwds
instaed of kwds
, even if the validation script complains. Sorry, this have been one the biggest sources of confusion in this sprint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And remove the : optional
. Just **kwds
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger I believe we have used **kwds : optionals
in other similar docstrings. Probably we should come up with an agreement on how to document the **kwds
parameter in these cases.
pandas/plotting/_core.py
Outdated
Examples | ||
-------- | ||
|
||
.. plot:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some text above the code explaining what are you doing here?
pandas/plotting/_core.py
Outdated
>>> df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1], | ||
... [6.4, 3.2, 1], [5.9, 3.0, 2]], | ||
... columns = ['length', 'width', 'species']) | ||
>>> f = df.plot.scatter(x='length', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If what is returned is an axis, better to use ax = ...
here.
pandas/plotting/_core.py
Outdated
>>> f = df.plot.scatter(x='length', | ||
... y='width', | ||
... c='species', | ||
... colormap='viridis') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add at least one more example with different parameters to see the differences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if we recommend cmap
or colormap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger They are both accepted but the one described in pandas.DataFrame.plot is colormap.
…values, extended description with use cases, extended examples with extra case.
Hello @DaniGate! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on March 14, 2018 at 12:57 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #20118 +/- ##
=======================================
Coverage 91.76% 91.76%
=======================================
Files 150 150
Lines 49151 49151
=======================================
Hits 45102 45102
Misses 4049 4049
Continue to review full report at Codecov.
|
…values, extended description with use cases, extended examples with extra case.
… into doc-string_scatter
pandas/plotting/_core.py
Outdated
|
||
The coordinates of each point `x,y` are defined by two dataframe | ||
columns and filled circles are used to represent each point. | ||
The coordinates of each point are defined by two dataframe columns and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Muuuuch better!!!!!!
pandas/plotting/_core.py
Outdated
c : label, column name or column position, optional | ||
Color of each point. | ||
kwds : optional | ||
x : int, str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be int or str
instead, according to the guide: "If more than one type is accepted, separate them by commas, except the last two types, that need to be separated by the word ‘or’"
pandas/plotting/_core.py
Outdated
- A single scalar so all points have the same size. | ||
|
||
- A sequence of scalars, which will be used for each point's size | ||
recursively. For intance [2,14] all points will be size 2 or 14, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"For instance, using [2, 14]
all points will be of size ...". Typo in "instance", the rest is a proposal to make it more prose-English.
pandas/plotting/_core.py
Outdated
c : str, int, array_like, optional | ||
The color of each point. Possible values are: | ||
|
||
- A single color string referred to by name, RGB or RGBA code, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you verified if these bullet points render nicely in the final HTML? I'm not good at restructured text so I tend to be wary about these things. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost done I believe, just a couple of style fixes and IMO it will be good to merge. :) |
pandas/plotting/_core.py
Outdated
@@ -2881,17 +2906,22 @@ def scatter(self, x, y, s=None, c=None, **kwds): | |||
|
|||
Examples | |||
-------- | |||
Let's see how to draw a scatter plot using coordinates and color from | |||
the values in three DataFrame columns. | |||
|
|||
.. plot:: | |||
:context: close-figs | |||
|
|||
>>> df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1], | |||
... [6.4, 3.2, 1], [5.9, 3.0, 2]], | |||
... columns = ['length', 'width', 'species']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove spaces between the equal sign? columns=['length...
[ci skip]
[ci skip]
OK, also split the examples in two to be a little clearer. Merging, thanks! |
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.