-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
DOC: Improve the docstring of Str.contains() #20870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Unsure best way to phrase |
Codecov Report
@@ Coverage Diff @@
## master #20870 +/- ##
==========================================
+ Coverage 91.79% 91.8% +<.01%
==========================================
Files 153 153
Lines 49411 49411
==========================================
+ Hits 45359 45361 +2
+ Misses 4052 4050 -2
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting this
pandas/core/strings.py
Outdated
@@ -295,20 +295,24 @@ def str_count(arr, pat, flags=0): | |||
|
|||
def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True): | |||
""" | |||
Return boolean Series/``array`` whether given pattern/regex is | |||
contained in each string in the Series/Index. | |||
Test if pattern or regex is contained within each string of a Series. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is used by Index
as well which was removed here - please be sure to add that back
pandas/core/strings.py
Outdated
contained in each string in the Series/Index. | ||
Test if pattern or regex is contained within each string of a Series. | ||
|
||
Return boolean Series based on whether a given pattern or regex is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to comment above, this isn't always true
index = pd.Index(['foo>>> index = pd.Index(['foo', 'bar', 'baz'])
>>> index.str.contains('a')
array([False, True, True])
pandas/core/strings.py
Outdated
regex : bool, default True | ||
If True use re.search, otherwise use Python in operator | ||
If True, assumes the passed-in pattern is a regular expression.\n | ||
If False, treats the pattern as a literal string. | ||
|
||
Returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use the Returns section from the below PR as reference
pandas/core/strings.py
Outdated
regex : bool, default True | ||
If True use re.search, otherwise use Python in operator | ||
If True, assumes the passed-in pattern is a regular expression.\n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rogue newline?
pandas/core/strings.py
Outdated
@@ -318,6 +322,48 @@ def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True): | |||
-------- | |||
match : analogous, but stricter, relying on re.match instead of re.search | |||
|
|||
Examples | |||
-------- | |||
>>> s = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.NaN]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There have been at least two issues opened on the pandas tracker recently due to a misunderstanding of how this function works - can you update the examples to account for and explain using a pattern of .0
against numbers like 40.0
?
Here are the issues I am referring to:
pandas/core/strings.py
Outdated
flags : int, default 0 (no flags) | ||
re module flags, e.g. re.IGNORECASE | ||
na : default NaN, fill value for missing values. | ||
re module flags, e.g. re.IGNORECASE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To suppress the warning just rewrite this as "Flags to pass through to the re module, e.g. re.IGNORECASE." or something similar
pandas/core/strings.py
Outdated
4 NaN | ||
dtype: object | ||
|
||
>>> s.str.contains('og', na=False, regex=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These examples are good but some text to direct the user on what to look at preceding each would be helpful. For example here I would prepend something like:
With the above examples, you'll note that NA values in the caller return `np.nan` by default and as a result cast the dtype of the returned object to `object`. You can control this behavior by specifying a an alternate `na` argument
@@ -318,6 +322,48 @@ def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True): | |||
-------- | |||
match : analogous, but stricter, relying on re.match instead of re.search | |||
|
|||
Examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add examples for the case
and flags
parameters?
Hey @WillAyd I've tried to fix your correction. What's the next step? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Blair-Young! |
Thanks @TomAugspurger looking forward to more. |
git diff upstream/master -u -- "*.py" | flake8 --diff
Achieved through PyData 2018 pandas sprint.