DOC: improved the scatter method #20118
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2852,22 +2852,48 @@ def pie(self, y=None, **kwds): | |
|
||
def scatter(self, x, y, s=None, c=None, **kwds): | ||
""" | ||
Scatter plot | ||
A scatter plot with point size *s* and color *c*. | ||
|
||
The coordinates of each point *x,y* are defined by two dataframe | ||
Review comment: Parameter names should be in single backticks. |
||
columns and filled circles are used to represent each point. | ||
|
||
Parameters | ||
---------- | ||
x, y : label or position, optional | ||
Review comment: You are missing the types for all parameters. See https://python-sprints.github.io/pandas/guide/pandas_docstring.html#section-3-parameters. |
||
Coordinates for each point. | ||
x : column name or column position | ||
Review comment: I think
Review comment: But the description of the argument
Review comment: There is some confusion about using "label or position" as the type. In another PR, @datapythonista suggests using real types such as "int or str" as the argument type, and mentioning that it is the column name or position in the parameter description on the next line.
Review comment: yes, my mistake, I realized later that |
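To make the "column name or column position" wording in this thread concrete, here is a minimal sketch. The frame and its column names (`length`, `width`) are hypothetical, and it assumes an installed pandas and matplotlib in which integer positions are resolved to column labels when the labels themselves are not integers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical two-column frame; the column names are placeholders.
df = pd.DataFrame({"length": [5.1, 4.9, 7.0, 6.4],
                   "width": [3.5, 3.0, 3.2, 3.2]})

# Select the columns by name (str) ...
ax_by_name = df.plot.scatter(x="length", y="width")

# ... or by integer position (int). This relies on the column labels
# being strings, so that 0 and 1 cannot be mistaken for labels.
ax_by_pos = df.plot.scatter(x=0, y=1)
```

Both calls should plot the same two columns, which is why a type such as "int or str" with the name-or-position meaning spelled out in the description may read more clearly than "label or position".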
||
Horizontal coordinates of each point. | ||
y : column name or column position | ||
Vertical coordinates of each point. | ||
s : scalar or array_like, optional | ||
Size of each point. | ||
c : label or position, optional | ||
c : label, column name or column position, optional | ||
Review comment: Can this column contain anything? How does it work? How does the function choose the color of the dots, just any unique color?
Review comment: I think valid options are matplotlib's + a column label, which extracts the array and passes it to matplotlib. from plt.scatter:
You'll need to verify that. Check if / when a colorbar is added. |
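One way to check the question raised here is to compare a plain matplotlib color with a column label for `c`. The sketch below uses a hypothetical frame and assumes that pandas draws the colorbar on a second axes of the same figure, which is how the versions I am aware of behave; treat it as a probe, not a statement of the documented contract.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical data; "species" holds integer codes used only for coloring.
df = pd.DataFrame({"length": [5.1, 4.9, 7.0, 6.4],
                   "width": [3.5, 3.0, 3.2, 3.2],
                   "species": [0, 0, 1, 1]})

# c as a single matplotlib color: every point gets the same color.
ax_fixed = df.plot.scatter(x="length", y="width", c="red")

# c as a column label: the column values are mapped through the colormap.
ax_mapped = df.plot.scatter(x="length", y="width",
                            c="species", colormap="viridis")

# Count the axes on each figure to see whether a colorbar was added.
n_axes_fixed = len(ax_fixed.figure.axes)
n_axes_mapped = len(ax_mapped.figure.axes)
print(n_axes_fixed, n_axes_mapped)
```

If the counts differ, the extra axes on the second figure is the colorbar, which would be worth stating in the description of `c`.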
||
Color of each point. | ||
`**kwds` : optional | ||
kwds : optional | ||
Review comment: Use
Review comment: And remove the
Review comment: @TomAugspurger I believe we have used |
||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
axes : matplotlib.AxesSubplot or np.array of them | ||
|
||
See Also | ||
-------- | ||
matplotlib.pyplot.scatter : scatter plot using multiple input data | ||
formats. | ||
|
||
Examples | ||
-------- | ||
|
||
.. plot:: | ||
Review comment: Can you add some text above the code explaining what you are doing here? |
||
:context: close-figs | ||
|
||
>>> from sklearn.datasets import load_iris | ||
Review comment: We can't import sklearn here, since it isn't installed.
Review comment: Maybe just make a small sample dataset instead.
Review comment: It passed all checks and tests and the plot was shown in the docs. But ok, I will write down the dataframe explicitly. |
||
>>> iris = load_iris() | ||
>>> df = pd.DataFrame(iris.data[:,:2], | ||
... columns=iris.feature_names[:2]) | ||
>>> df['species'] = load_iris().target | ||
>>> f = df.plot.scatter(x='sepal length (cm)', | ||
... y='sepal width (cm)', | ||
... c='species', | ||
... colormap='viridis') | ||
Review comment: Can you add at least one more example with different parameters to see the differences?
Review comment: Can you check if we recommend
Review comment: @TomAugspurger They are both accepted but the one described in pandas.DataFrame.plot is colormap. |
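The two requests in this thread (drop the sklearn import, add a second example with different parameters) could be met along the following lines. This is only a sketch: the rows are hand-written illustrative values, not the real iris dataset, and the shortened column names are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Small explicit dataset, so the example needs nothing beyond pandas.
df = pd.DataFrame(
    [[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
     [6.4, 3.2, 1], [5.9, 3.0, 2]],
    columns=["length", "width", "species"],
)

# First example: color each point by the value in the "species" column.
ax1 = df.plot.scatter(x="length", y="width",
                      c="species", colormap="viridis")

# Second example: one fixed color, with the point size varying per row.
sizes_in = df["width"] * 20
ax2 = df.plot.scatter(x="length", y="width", s=sizes_in, c="DarkBlue")
```

Placing the two plots side by side shows what `c` and `s` each control, which is the difference the reviewer asked to see.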
||
""" | ||
return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds) | ||
|
||
|
Review comment: How about the first line is "A scatter plot of two columns in the DataFrame."