Commit 3afd771

Merge branch 'master' into PR_TOOL_MERGE_PR_20051
2 parents d6f4c8e + 7c14e4f commit 3afd771

48 files changed: +3515 -2873 lines changed

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 24 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,27 @@
1+
Checklist for the pandas documentation sprint (ignore this if you are doing
2+
an unrelated PR):
3+
4+
- [ ] PR title is "DOC: update the <your-function-or-method> docstring"
5+
- [ ] The validation script passes: `scripts/validate_docstrings.py <your-function-or-method>`
6+
- [ ] The PEP8 style check passes: `git diff upstream/master -u -- "*.py" | flake8 --diff`
7+
- [ ] The html version looks good: `python doc/make.py --single <your-function-or-method>`
8+
- [ ] It has been proofread on language by another sprint participant
9+
10+
Please include the output of the validation script below between the "```" ticks:
11+
12+
```
13+
# paste output of "scripts/validate_docstrings.py <your-function-or-method>" here
14+
# between the "```" (remove this comment, but keep the "```")
15+
16+
```
17+
18+
If the validation script still gives errors, but you think there is a good reason
19+
to deviate in this case (and there are certainly such cases), please state this
20+
explicitly.
21+
22+
23+
Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):
24+
125
- [ ] closes #xxxx
226
- [ ] tests added / passed
327
- [ ] passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

.gitignore

Lines changed: 3 additions & 2 deletions
@@ -88,8 +88,9 @@ scikits
8888
*.c
8989
*.cpp
9090

91-
# Performance Testing #
92-
#######################
91+
# Unit / Performance Testing #
92+
##############################
93+
.pytest_cache/
9394
asv_bench/env/
9495
asv_bench/html/
9596
asv_bench/results/

ci/environment-dev.yaml

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ channels:
55
dependencies:
66
- Cython
77
- NumPy
8+
- flake8
89
- moto
910
- pytest>=3.1
1011
- python-dateutil>=2.5.0

ci/requirements-3.6_DOC.run

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ sphinx
55
nbconvert
66
nbformat
77
notebook
8-
matplotlib
8+
matplotlib=2.1*
99
seaborn
1010
scipy
1111
lxml

ci/requirements_dev.txt

Lines changed: 2 additions & 1 deletion
@@ -2,9 +2,10 @@
22
# Do not modify directly
33
Cython
44
NumPy
5+
flake8
56
moto
67
pytest>=3.1
78
python-dateutil>=2.5.0
89
pytz
910
setuptools>=3.3
10-
sphinx
11+
sphinx

doc/source/categorical.rst

Lines changed: 1 addition & 1 deletion
@@ -177,7 +177,7 @@ are consistent among all columns.
177177
.. note::
178178

179179
To perform table-wise conversion, where all labels in the entire ``DataFrame`` are used as
180-
categories for each column, the ``categories`` parameter can be determined programatically by
180+
categories for each column, the ``categories`` parameter can be determined programmatically by
181181
``categories = pd.unique(df.values.ravel())``.
182182

183183
If you already have ``codes`` and ``categories``, you can use the

doc/source/comparison_with_sas.rst

Lines changed: 22 additions & 22 deletions
@@ -25,7 +25,7 @@ As is customary, we import pandas and NumPy as follows:
2525
This is often used in interactive work (e.g. `Jupyter notebook
2626
<https://jupyter.org/>`_ or terminal) - the equivalent in SAS would be:
2727

28-
.. code-block:: none
28+
.. code-block:: sas
2929
3030
proc print data=df(obs=5);
3131
run;
@@ -65,7 +65,7 @@ in the ``DATA`` step.
6565

6666
Every ``DataFrame`` and ``Series`` has an ``Index`` - which are labels on the
6767
*rows* of the data. SAS does not have an exactly analogous concept. A data set's
68-
row are essentially unlabeled, other than an implicit integer index that can be
68+
rows are essentially unlabeled, other than an implicit integer index that can be
6969
accessed during the ``DATA`` step (``_N_``).
7070

7171
In pandas, if no index is specified, an integer index is also used by default
@@ -87,7 +87,7 @@ A SAS data set can be built from specified values by
8787
placing the data after a ``datalines`` statement and
8888
specifying the column names.
8989

90-
.. code-block:: none
90+
.. code-block:: sas
9191
9292
data df;
9393
input x y;
@@ -121,7 +121,7 @@ will be used in many of the following examples.
121121

122122
SAS provides ``PROC IMPORT`` to read csv data into a data set.
123123

124-
.. code-block:: none
124+
.. code-block:: sas
125125
126126
proc import datafile='tips.csv' dbms=csv out=tips replace;
127127
getnames=yes;
@@ -156,7 +156,7 @@ Exporting Data
156156

157157
The inverse of ``PROC IMPORT`` in SAS is ``PROC EXPORT``
158158

159-
.. code-block:: none
159+
.. code-block:: sas
160160
161161
proc export data=tips outfile='tips2.csv' dbms=csv;
162162
run;
@@ -178,7 +178,7 @@ Operations on Columns
178178
In the ``DATA`` step, arbitrary math expressions can
179179
be used on new or existing columns.
180180

181-
.. code-block:: none
181+
.. code-block:: sas
182182
183183
data tips;
184184
set tips;
@@ -207,7 +207,7 @@ Filtering
207207
Filtering in SAS is done with an ``if`` or ``where`` statement, on one
208208
or more columns.
209209

210-
.. code-block:: none
210+
.. code-block:: sas
211211
212212
data tips;
213213
set tips;
@@ -233,7 +233,7 @@ If/Then Logic
233233

234234
In SAS, if/then logic can be used to create new columns.
235235

236-
.. code-block:: none
236+
.. code-block:: sas
237237
238238
data tips;
239239
set tips;
@@ -262,7 +262,7 @@ Date Functionality
262262
SAS provides a variety of functions to do operations on
263263
date/datetime columns.
264264

265-
.. code-block:: none
265+
.. code-block:: sas
266266
267267
data tips;
268268
set tips;
@@ -307,7 +307,7 @@ Selection of Columns
307307
SAS provides keywords in the ``DATA`` step to select,
308308
drop, and rename columns.
309309

310-
.. code-block:: none
310+
.. code-block:: sas
311311
312312
data tips;
313313
set tips;
@@ -343,7 +343,7 @@ Sorting by Values
343343

344344
Sorting in SAS is accomplished via ``PROC SORT``
345345

346-
.. code-block:: none
346+
.. code-block:: sas
347347
348348
proc sort data=tips;
349349
by sex total_bill;
@@ -369,7 +369,7 @@ SAS determines the length of a character string with the
369369
and `LENGTHC <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002283942.htm>`__
370370
functions. ``LENGTHN`` excludes trailing blanks and ``LENGTHC`` includes trailing blanks.
371371

372-
.. code-block:: none
372+
.. code-block:: sas
373373
374374
data _null_;
375375
set tips;
@@ -395,7 +395,7 @@ SAS determines the position of a character in a string with the
395395
``FINDW`` takes the string defined by the first argument and searches for the first position of the substring
396396
you supply as the second argument.
397397

398-
.. code-block:: none
398+
.. code-block:: sas
399399
400400
data _null_;
401401
set tips;
@@ -419,7 +419,7 @@ Substring
419419
SAS extracts a substring from a string based on its position with the
420420
`SUBSTR <http://www2.sas.com/proceedings/sugi25/25/cc/25p088.pdf>`__ function.
421421

422-
.. code-block:: none
422+
.. code-block:: sas
423423
424424
data _null_;
425425
set tips;
@@ -442,7 +442,7 @@ The SAS `SCAN <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/def
442442
function returns the nth word from a string. The first argument is the string you want to parse and the
443443
second argument specifies which word you want to extract.
444444

445-
.. code-block:: none
445+
.. code-block:: sas
446446
447447
data firstlast;
448448
input String $60.;
@@ -474,7 +474,7 @@ The SAS `UPCASE <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/d
474474
`PROPCASE <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/a002598106.htm>`__
475475
functions change the case of the argument.
476476

477-
.. code-block:: none
477+
.. code-block:: sas
478478
479479
data firstlast;
480480
input String $60.;
@@ -516,7 +516,7 @@ types of joins are accomplished using the ``in=`` dummy
516516
variables to track whether a match was found in one or both
517517
input frames.
518518

519-
.. code-block:: none
519+
.. code-block:: sas
520520
521521
proc sort data=df1;
522522
by key;
@@ -572,7 +572,7 @@ operations, and is ignored by default for aggregations.
572572
One difference is that missing data cannot be compared to its sentinel value.
573573
For example, in SAS you could do this to filter missing values.
574574

575-
.. code-block:: none
575+
.. code-block:: sas
576576
577577
data outer_join_nulls;
578578
set outer_join;
@@ -615,7 +615,7 @@ SAS's PROC SUMMARY can be used to group by one or
615615
more key variables and compute aggregations on
616616
numeric columns.
617617

618-
.. code-block:: none
618+
.. code-block:: sas
619619
620620
proc summary data=tips nway;
621621
class sex smoker;
@@ -640,7 +640,7 @@ In SAS, if the group aggregations need to be used with
640640
the original frame, it must be merged back together. For
641641
example, to subtract the mean for each observation by smoker group.
642642

643-
.. code-block:: none
643+
.. code-block:: sas
644644
645645
proc summary data=tips missing nway;
646646
class smoker;
@@ -679,7 +679,7 @@ replicate most other by group processing from SAS. For example,
679679
this ``DATA`` step reads the data by sex/smoker group and filters to
680680
the first entry for each.
681681

682-
.. code-block:: none
682+
.. code-block:: sas
683683
684684
proc sort data=tips;
685685
by sex smoker;
@@ -719,7 +719,7 @@ Data Interop
719719
pandas provides a :func:`read_sas` method that can read SAS data saved in
720720
the XPORT or SAS7BDAT binary format.
721721

722-
.. code-block:: none
722+
.. code-block:: sas
723723
724724
libname xportout xport 'transport-file.xpt';
725725
data xportout.tips;

doc/source/conf.py

Lines changed: 9 additions & 0 deletions
@@ -63,6 +63,7 @@
6363
'ipython_sphinxext.ipython_console_highlighting',
6464
# lowercase didn't work
6565
'IPython.sphinxext.ipython_console_highlighting',
66+
'matplotlib.sphinxext.plot_directive',
6667
'sphinx.ext.intersphinx',
6768
'sphinx.ext.coverage',
6869
'sphinx.ext.mathjax',
@@ -85,6 +86,14 @@
8586
if any(re.match("\s*api\s*", l) for l in index_rst_lines):
8687
autosummary_generate = True
8788

89+
# matplotlib plot directive
90+
plot_include_source = True
91+
plot_formats = [("png", 90)]
92+
plot_html_show_formats = False
93+
plot_html_show_source_link = False
94+
plot_pre_code = """import numpy as np
95+
import pandas as pd"""
96+
8897
# Add any paths that contain templates here, relative to this directory.
8998
templates_path = ['../_templates']
9099

doc/source/contributing.rst

Lines changed: 4 additions & 2 deletions
@@ -351,8 +351,10 @@ Some other important things to know about the docs:
351351

352352
pandoc doc/source/contributing.rst -t markdown_github > CONTRIBUTING.md
353353

354-
The utility script ``scripts/api_rst_coverage.py`` can be used to compare
355-
the list of methods documented in ``doc/source/api.rst`` (which is used to generate
354+
The utility script ``scripts/validate_docstrings.py`` can be used to get a csv
355+
summary of the API documentation, and also to validate common errors in the docstring
356+
of a specific class, function or method. The summary also compares the list of
357+
methods documented in ``doc/source/api.rst`` (which is used to generate
356358
the `API Reference <http://pandas.pydata.org/pandas-docs/stable/api.html>`_ page)
357359
and the actual public methods.
358360
This will identify methods documented in ``doc/source/api.rst`` that are not actually

doc/source/io.rst

Lines changed: 6 additions & 0 deletions
@@ -4711,6 +4711,12 @@ writes ``data`` to the database in batches of 1000 rows at a time:
47114711
47124712
data.to_sql('data_chunked', engine, chunksize=1000)
47134713
4714+
.. note::
4715+
4716+
The function :func:`~pandas.DataFrame.to_sql` will perform a multivalue
4717+
insert if the engine dialect ``supports_multivalues_insert``. This will
4718+
greatly speed up the insert in some cases.
4719+
47144720
SQL data types
47154721
++++++++++++++
47164722

doc/source/whatsnew/v0.23.0.txt

Lines changed: 5 additions & 0 deletions
@@ -338,8 +338,11 @@ Other Enhancements
338338
- For subclassed ``DataFrames``, :func:`DataFrame.apply` will now preserve the ``Series`` subclass (if defined) when passing the data to the applied function (:issue:`19822`)
339339
- :func:`DataFrame.from_dict` now accepts a ``columns`` argument that can be used to specify the column names when ``orient='index'`` is used (:issue:`18529`)
340340
- Added option ``display.html.use_mathjax`` so `MathJax <https://www.mathjax.org/>`_ can be disabled when rendering tables in ``Jupyter`` notebooks (:issue:`19856`, :issue:`19824`)
341+
- :func:`DataFrame.replace` now supports the ``method`` parameter, which can be used to specify the replacement method when ``to_replace`` is a scalar, list or tuple and ``value`` is ``None`` (:issue:`19632`)
341342
- :meth:`Timestamp.month_name`, :meth:`DatetimeIndex.month_name`, and :meth:`Series.dt.month_name` are now available (:issue:`12805`)
342343
- :meth:`Timestamp.day_name` and :meth:`DatetimeIndex.day_name` are now available to return day names with a specified locale (:issue:`12806`)
344+
- :meth:`DataFrame.to_sql` now performs a multivalue insert if the underlying connection supports it, rather than inserting row by row.
345+
``SQLAlchemy`` dialects supporting multivalue inserts include: ``mysql``, ``postgresql``, ``sqlite`` and any dialect with ``supports_multivalues_insert``. (:issue:`14315`, :issue:`8953`)
343346

344347
.. _whatsnew_0230.api_breaking:
345348

@@ -904,6 +907,7 @@ Offsets
904907

905908
Numeric
906909
^^^^^^^
910+
- Bug in :meth:`DataFrame.rank` and :meth:`Series.rank` when ``method='dense'`` and ``pct=True`` in which percentile ranks were not being used with the number of distinct observations (:issue:`15630`)
907911
- Bug in :class:`Series` constructor with an int or float list where specifying ``dtype=str``, ``dtype='str'`` or ``dtype='U'`` failed to convert the data elements to strings (:issue:`16605`)
908912
- Bug in :class:`Index` multiplication and division methods where operating with a ``Series`` would return an ``Index`` object instead of a ``Series`` object (:issue:`19042`)
909913
- Bug in the :class:`DataFrame` constructor in which data containing very large positive or very large negative numbers was causing ``OverflowError`` (:issue:`18584`)
@@ -1015,6 +1019,7 @@ Reshaping
10151019
- Bug in :func:`DataFrame.iterrows`, which would infer strings not compliant with `ISO8601 <https://en.wikipedia.org/wiki/ISO_8601>`_ as datetimes (:issue:`19671`)
10161020
- Bug in :class:`Series` constructor with ``Categorical`` where a ``ValueError`` is not raised when an index of different length is given (:issue:`19342`)
10171021
- Bug in :meth:`DataFrame.astype` where column metadata is lost when converting to categorical or a dictionary of dtypes (:issue:`19920`)
1022+
- Bug in :func:`cut` and :func:`qcut` where timezone information was dropped (:issue:`19872`)
10181023

10191024
Other
10201025
^^^^^

pandas/_libs/algos_rank_helper.pxi.in

Lines changed: 8 additions & 2 deletions
@@ -213,7 +213,10 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average', ascending=True,
213213
sum_ranks = dups = 0
214214
{{endif}}
215215
if pct:
216-
return ranks / count
216+
if tiebreak == TIEBREAK_DENSE:
217+
return ranks / total_tie_count
218+
else:
219+
return ranks / count
217220
else:
218221
return ranks
219222

@@ -385,7 +388,10 @@ def rank_2d_{{dtype}}(object in_arr, axis=0, ties_method='average',
385388
ranks[i, argsorted[i, z]] = total_tie_count
386389
sum_ranks = dups = 0
387390
if pct:
388-
ranks[i, :] /= count
391+
if tiebreak == TIEBREAK_DENSE:
392+
ranks[i, :] /= total_tie_count
393+
else:
394+
ranks[i, :] /= count
389395
if axis == 0:
390396
return ranks.T
391397
else:
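
The effect of the ``TIEBREAK_DENSE`` branch added in both hunks above can be sketched in plain Python. This is an illustration only, not the Cython template: the function names are hypothetical, and it assumes that ``total_tie_count`` ends up equal to the number of distinct values, as the whatsnew entry for :issue:`15630` describes.

```python
def dense_ranks(values):
    """Dense ranks: ties share a rank and ranks have no gaps."""
    distinct = sorted(set(values))
    rank_of = {value: i + 1 for i, value in enumerate(distinct)}
    return [rank_of[value] for value in values], len(distinct)


def dense_pct_rank_old(values):
    """Behaviour before the change: divide by the number of observations."""
    ranks, _ = dense_ranks(values)
    return [rank / len(values) for rank in ranks]


def dense_pct_rank_fixed(values):
    """Behaviour after the change: divide by the number of distinct values."""
    ranks, n_distinct = dense_ranks(values)
    return [rank / n_distinct for rank in ranks]


sample = [10, 20, 20, 30]
old_result = dense_pct_rank_old(sample)      # [0.25, 0.5, 0.5, 0.75]
fixed_result = dense_pct_rank_fixed(sample)  # largest value now maps to 1.0
```

With the old divisor the largest value never reaches 1.0 when ties are present; dividing by the distinct count restores that property.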

pandas/core/apply.py

Lines changed: 1 addition & 1 deletion
@@ -191,7 +191,7 @@ def apply_broadcast(self, target):
191191

192192
for i, col in enumerate(target.columns):
193193
res = self.f(target[col])
194-
ares = np. asarray(res).ndim
194+
ares = np.asarray(res).ndim
195195

196196
# must be a scalar or 1d
197197
if ares > 1:
