Commit 8a8f0ac

Merge branch 'master' into groupby-mean-datetimelike
2 parents fbda463 + 0072fa8 commit 8a8f0ac

88 files changed (+1528, -502 lines)

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 - [ ] closes #xxxx
 - [ ] tests added / passed
-- [ ] Ensure all linting tests pass, see [here](https://pandas.pydata.org/pandas-docs/dev/development/contributing.html#code-standards) for how to run them
+- [ ] Ensure all linting tests pass, see [here](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit) for how to run them
 - [ ] whatsnew entry

.github/workflows/ci.yml

Lines changed: 1 addition & 0 deletions
@@ -168,6 +168,7 @@ jobs:
         PANDAS_DATA_MANAGER: array
         PATTERN: ${{ matrix.pattern }}
         PYTEST_WORKERS: "auto"
+        PYTEST_TARGET: pandas
       run: |
         source activate pandas-dev
         ci/run_tests.sh

.github/workflows/posix.yml

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ jobs:
       LC_ALL: ${{ matrix.settings[4] }}
       PANDAS_TESTING_MODE: ${{ matrix.settings[5] }}
       TEST_ARGS: ${{ matrix.settings[6] }}
+      PYTEST_TARGET: pandas
     concurrency:
       group: ${{ github.ref }}-${{ matrix.settings[0] }}
       cancel-in-progress: ${{github.event_name == 'pull_request'}}

.github/workflows/python-dev.yml

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ env:
   PANDAS_CI: 1
   PATTERN: "not slow and not network and not clipboard"
   COVERAGE: true
+  PYTEST_TARGET: pandas
 
 jobs:
   build:

asv_bench/benchmarks/sparse.py

Lines changed: 14 additions & 0 deletions
@@ -91,6 +91,20 @@ def time_sparse_series_to_coo_single_level(self, sort_labels):
         self.ss_two_lvl.sparse.to_coo(sort_labels=sort_labels)
 
 
+class ToCooFrame:
+    def setup(self):
+        N = 10000
+        k = 10
+        arr = np.full((N, k), np.nan)
+        arr[0, 0] = 3.0
+        arr[12, 7] = -1.0
+        arr[0, 9] = 11.2
+        self.df = pd.DataFrame(arr, dtype=pd.SparseDtype("float"))
+
+    def time_to_coo(self):
+        self.df.sparse.to_coo()
+
+
 class Arithmetic:
 
     params = ([0.1, 0.01], [0, np.nan])

azure-pipelines.yml

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ pr:
 
 variables:
   PYTEST_WORKERS: auto
+  PYTEST_TARGET: pandas
 
 jobs:
 # Mac and Linux use the same template

ci/azure/windows.yml

Lines changed: 19 additions & 2 deletions
@@ -8,17 +8,33 @@ jobs:
     vmImage: ${{ parameters.vmImage }}
   strategy:
     matrix:
-      py38_np18:
+      py38_np18_1:
         ENV_FILE: ci/deps/azure-windows-38.yaml
         CONDA_PY: "38"
         PATTERN: "not slow and not network"
         PYTEST_WORKERS: 2 # GH-42236
+        PYTEST_TARGET: "pandas/tests/[a-i]*"
 
-      py39:
+      py38_np18_2:
+        ENV_FILE: ci/deps/azure-windows-38.yaml
+        CONDA_PY: "38"
+        PATTERN: "not slow and not network"
+        PYTEST_WORKERS: 2 # GH-42236
+        PYTEST_TARGET: "pandas/tests/[j-z]*"
+
+      py39_1:
+        ENV_FILE: ci/deps/azure-windows-39.yaml
+        CONDA_PY: "39"
+        PATTERN: "not slow and not network and not high_memory"
+        PYTEST_WORKERS: 2 # GH-42236
+        PYTEST_TARGET: "pandas/tests/[a-i]*"
+
+      py39_2:
         ENV_FILE: ci/deps/azure-windows-39.yaml
         CONDA_PY: "39"
         PATTERN: "not slow and not network and not high_memory"
         PYTEST_WORKERS: 2 # GH-42236
+        PYTEST_TARGET: "pandas/tests/[j-z]*"
 
   steps:
     - powershell: |
@@ -39,6 +55,7 @@ jobs:
       displayName: 'Build'
     - bash: |
         source activate pandas-dev
+        wmic.exe cpu get caption, deviceid, name, numberofcores, maxclockspeed
         ci/run_tests.sh
       displayName: 'Test'
     - task: PublishTestResults@2
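
The two `PYTEST_TARGET` globs in this file split `pandas/tests` by the first letter of each entry, so each Windows job runs roughly half of the suite. As a rough illustration only, the Python sketch below applies the same patterns to a hypothetical handful of directory names (the sample list is an assumption and is not read from the repository) and records which shard each one falls into. It uses `fnmatch` as a stand-in for shell globbing, which behaves the same way for these flat names.

```python
from fnmatch import fnmatchcase

# Shard patterns copied from the PYTEST_TARGET values in the diff.
SHARD_PATTERNS = ["pandas/tests/[a-i]*", "pandas/tests/[j-z]*"]


def assign_shards(paths):
    """Map each path to the list of shard patterns it matches."""
    return {
        path: [pat for pat in SHARD_PATTERNS if fnmatchcase(path, pat)]
        for path in paths
    }


# Hypothetical sample of entries under pandas/tests, for illustration only.
sample = [
    "pandas/tests/arrays",
    "pandas/tests/groupby",
    "pandas/tests/io",
    "pandas/tests/plotting",
    "pandas/tests/window",
]

shards = assign_shards(sample)
for path, matched in shards.items():
    print(path, "->", matched)
```

Names that start with a character outside `a-z` (an underscore, for example) match neither pattern, so anything that must run on Windows needs a lowercase first letter under this scheme.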

ci/run_tests.sh

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ if [[ $(uname) == "Linux" && -z $DISPLAY ]]; then
     XVFB="xvfb-run "
 fi
 
-PYTEST_CMD="${XVFB}pytest -m \"$PATTERN\" -n $PYTEST_WORKERS --dist=loadfile $TEST_ARGS $COVERAGE pandas"
+PYTEST_CMD="${XVFB}pytest -m \"$PATTERN\" -n $PYTEST_WORKERS --dist=loadfile $TEST_ARGS $COVERAGE $PYTEST_TARGET"
 
 if [[ $(uname) != "Linux" && $(uname) != "Darwin" ]]; then
     # GH#37455 windows py38 build appears to be running out of memory
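
The change above replaces the hard-coded `pandas` target with the `PYTEST_TARGET` environment variable. The following Python sketch mirrors how that command line is assembled, assuming the same variable names as the script. The helper `build_pytest_cmd` is hypothetical, the `xvfb-run` prefix is left out, and the fallback to `pandas` when `PYTEST_TARGET` is unset is an assumption of this sketch (the shell script itself defines no default).

```python
import shlex


def build_pytest_cmd(env):
    """Assemble an argument list equivalent to the PYTEST_CMD string."""
    parts = [
        "pytest",
        "-m", env["PATTERN"],
        "-n", str(env["PYTEST_WORKERS"]),
        "--dist=loadfile",
    ]
    # TEST_ARGS and COVERAGE may be empty or hold several flags.
    parts += shlex.split(env.get("TEST_ARGS", ""))
    parts += shlex.split(env.get("COVERAGE", ""))
    # Assumption: default to the whole package if no target is given.
    parts.append(env.get("PYTEST_TARGET", "pandas"))
    return parts


cmd = build_pytest_cmd(
    {
        "PATTERN": "not slow and not network",
        "PYTEST_WORKERS": 2,
        "PYTEST_TARGET": "pandas/tests/[a-i]*",
    }
)
print(shlex.join(cmd))
```

The sketch passes the glob through untouched. In the shell script the target is unquoted, so the shell is expected to expand it into the matching paths before pytest sees them.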

doc/source/development/contributing.rst

Lines changed: 6 additions & 1 deletion
@@ -331,7 +331,12 @@ can comment::
 
     @github-actions pre-commit
 
-on that pull request. This will trigger a workflow which will autofix formatting errors.
+on that pull request. This will trigger a workflow which will autofix formatting
+errors.
+
+To automatically fix formatting errors on each commit you make, you can
+set up pre-commit yourself. First, create a Python :ref:`environment
+<contributing_environment>` and then set up :ref:`pre-commit <contributing.pre-commit>`.
 
 Delete your merged branch (optional)
 ------------------------------------

doc/source/development/contributing_environment.rst

Lines changed: 0 additions & 1 deletion
@@ -133,7 +133,6 @@ compiler installation instructions.
 
 Let us know if you have any difficulties by opening an issue or reaching out on `Gitter <https://gitter.im/pydata/pandas/>`_.
 
-
 Creating a Python environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/ecosystem.rst

Lines changed: 14 additions & 0 deletions
@@ -575,3 +575,17 @@ Library Accessor Classes Description
 .. _composeml: https://github.com/alteryx/compose
 .. _datatest: https://datatest.readthedocs.io/
 .. _woodwork: https://github.com/alteryx/woodwork
+
+Development tools
+----------------------------
+
+`pandas-stubs <https://github.com/VirtusLab/pandas-stubs>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While the pandas repository is partially typed, the package itself doesn't expose this information for external use.
+Install pandas-stubs to enable basic type coverage of the pandas API.
+
+Learn more by reading through these issues `14468 <https://github.com/pandas-dev/pandas/issues/14468>`_,
+`26766 <https://github.com/pandas-dev/pandas/issues/26766>`_, `28142 <https://github.com/pandas-dev/pandas/issues/28142>`_.
+
+See installation and usage instructions on the `GitHub page <https://github.com/VirtusLab/pandas-stubs>`__.

doc/source/user_guide/io.rst

Lines changed: 5 additions & 0 deletions
@@ -1208,6 +1208,10 @@ Returning Series
 Using the ``squeeze`` keyword, the parser will return output with a single column
 as a ``Series``:
 
+.. deprecated:: 1.4.0
+   Users should append ``.squeeze("columns")`` to the DataFrame returned by
+   ``read_csv`` instead.
+
 .. ipython:: python
    :suppress:
 
@@ -1217,6 +1221,7 @@ as a ``Series``:
        fh.write(data)
 
 .. ipython:: python
+   :okwarning:
 
    print(open("tmp.csv").read())
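
The deprecation note above recommends calling `.squeeze("columns")` on the result of `read_csv` instead of passing `squeeze=True`. A minimal sketch of that replacement follows. The CSV text and the column name `level` are made up for illustration.

```python
import io

import pandas as pd

# Made-up single-column CSV, used only to illustrate the pattern.
csv_text = "level\nPatient1\nPatient2\nPatient3\n"

# Read as a DataFrame first, then squeeze the single column into a Series.
df = pd.read_csv(io.StringIO(csv_text))
series = df.squeeze("columns")

print(type(series).__name__)
print(series)
```

With more than one column, `squeeze("columns")` leaves the DataFrame unchanged, which matches the intent of the old keyword.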

doc/source/user_guide/options.rst

Lines changed: 19 additions & 15 deletions
@@ -38,11 +38,11 @@ and so passing in a substring will work - as long as it is unambiguous:
 
 .. ipython:: python
 
-   pd.get_option("display.max_rows")
-   pd.set_option("display.max_rows", 101)
-   pd.get_option("display.max_rows")
-   pd.set_option("max_r", 102)
-   pd.get_option("display.max_rows")
+   pd.get_option("display.chop_threshold")
+   pd.set_option("display.chop_threshold", 2)
+   pd.get_option("display.chop_threshold")
+   pd.set_option("chop", 4)
+   pd.get_option("display.chop_threshold")
 
 
 The following will **not work** because it matches multiple option names, e.g.
@@ -52,7 +52,7 @@ The following will **not work** because it matches multiple option names, e.g.
    :okexcept:
 
    try:
-       pd.get_option("column")
+       pd.get_option("max")
    except KeyError as e:
        print(e)
 
@@ -153,27 +153,27 @@ lines are replaced by an ellipsis.
 .. ipython:: python
 
    df = pd.DataFrame(np.random.randn(7, 2))
-   pd.set_option("max_rows", 7)
+   pd.set_option("display.max_rows", 7)
    df
-   pd.set_option("max_rows", 5)
+   pd.set_option("display.max_rows", 5)
    df
-   pd.reset_option("max_rows")
+   pd.reset_option("display.max_rows")
 
 Once the ``display.max_rows`` is exceeded, the ``display.min_rows`` option
 determines how many rows are shown in the truncated repr.
 
 .. ipython:: python
 
-   pd.set_option("max_rows", 8)
-   pd.set_option("min_rows", 4)
+   pd.set_option("display.max_rows", 8)
+   pd.set_option("display.min_rows", 4)
    # below max_rows -> all rows shown
    df = pd.DataFrame(np.random.randn(7, 2))
    df
    # above max_rows -> only min_rows (4) rows shown
    df = pd.DataFrame(np.random.randn(9, 2))
    df
-   pd.reset_option("max_rows")
-   pd.reset_option("min_rows")
+   pd.reset_option("display.max_rows")
+   pd.reset_option("display.min_rows")
 
 ``display.expand_frame_repr`` allows for the representation of
 dataframes to stretch across pages, wrapped over the full column vs row-wise.
@@ -193,13 +193,13 @@ dataframes to stretch across pages, wrapped over the full column vs row-wise.
 .. ipython:: python
 
    df = pd.DataFrame(np.random.randn(10, 10))
-   pd.set_option("max_rows", 5)
+   pd.set_option("display.max_rows", 5)
    pd.set_option("large_repr", "truncate")
    df
    pd.set_option("large_repr", "info")
    df
    pd.reset_option("large_repr")
-   pd.reset_option("max_rows")
+   pd.reset_option("display.max_rows")
 
 ``display.max_colwidth`` sets the maximum width of columns. Cells
 of this length or longer will be truncated with an ellipsis.
@@ -491,6 +491,10 @@ styler.render.repr html Standard output format for
                                                      Should be one of "html" or "latex".
 styler.render.max_elements              262144       Maximum number of datapoints that Styler will render
                                                      trimming either rows, columns or both to fit.
+styler.render.max_rows                  None         Maximum number of rows that Styler will render. By default
+                                                     this is dynamic based on ``max_elements``.
+styler.render.max_columns               None         Maximum number of columns that Styler will render. By default
+                                                     this is dynamic based on ``max_elements``.
 styler.render.encoding                  utf-8        Default encoding for output HTML or LaTeX files.
 styler.format.formatter                 None         Object to specify formatting functions to ``Styler.format``.
 styler.format.na_rep                    None         String representation for missing data.
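
The documentation hunks above describe how `display.max_rows` and `display.min_rows` interact: a frame at or below `max_rows` is shown in full, while a longer one is truncated to `min_rows` rows. The sketch below reproduces that with fully qualified option names. It uses `pd.option_context` rather than `set_option`/`reset_option` (a choice made here so the settings are restored automatically) and zero-filled frames so the output is deterministic.

```python
import numpy as np
import pandas as pd

small = pd.DataFrame(np.zeros((7, 2)))  # 7 rows: below max_rows
large = pd.DataFrame(np.zeros((9, 2)))  # 9 rows: above max_rows

# Options are restored to their previous values when the block exits.
with pd.option_context("display.max_rows", 8, "display.min_rows", 4):
    small_repr = repr(small)
    large_repr = repr(large)

print(small_repr)
print(large_repr)
```

The second repr ends with a dimensions footer and shows only the first and last two rows, since truncation is governed by `display.min_rows` once `display.max_rows` is exceeded.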

doc/source/whatsnew/v1.3.3.rst

Lines changed: 5 additions & 0 deletions
@@ -24,8 +24,13 @@ Fixed regressions
 - Fixed regression in :meth:`read_parquet` where the ``fastparquet`` engine would not work properly with fastparquet 0.7.0 (:issue:`43075`)
 - Fixed regression in :meth:`DataFrame.loc.__setitem__` raising ``ValueError`` when setting array as cell value (:issue:`43422`)
 - Fixed regression in :func:`is_list_like` where objects with ``__iter__`` set to ``None`` would be identified as iterable (:issue:`43373`)
+- Fixed regression in :meth:`DataFrame.__getitem__` raising error for slice of :class:`DatetimeIndex` when index is non monotonic (:issue:`43223`)
 - Fixed regression in :meth:`.Resampler.aggregate` when used after column selection would raise if ``func`` is a list of aggregation functions (:issue:`42905`)
 - Fixed regression in :meth:`DataFrame.corr` where Kendall correlation would produce incorrect results for columns with repeated values (:issue:`43401`)
+- Fixed regression in :meth:`DataFrame.groupby` where aggregation on columns with object types dropped results on those columns (:issue:`42395`, :issue:`43108`)
+- Fixed regression in :meth:`Series.fillna` raising ``TypeError`` when filling ``float`` ``Series`` with list-like fill value having a dtype which couldn't cast losslessly (like ``float32`` filled with ``float64``) (:issue:`43424`)
+- Fixed regression in :func:`read_csv` throwing an ``AttributeError`` when the file handle is a ``tempfile.SpooledTemporaryFile`` object (:issue:`43439`)
+-
 
 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v1.4.0.rst

Lines changed: 10 additions & 2 deletions
@@ -75,7 +75,7 @@ Styler
 - Styling of indexing has been added, with :meth:`.Styler.apply_index` and :meth:`.Styler.applymap_index`. These mirror the signature of the methods already used to style data values, and work with both HTML and LaTeX format (:issue:`41893`).
 - :meth:`.Styler.bar` introduces additional arguments to control alignment and display (:issue:`26070`, :issue:`36419`), and it also validates the input arguments ``width`` and ``height`` (:issue:`42511`).
 - :meth:`.Styler.to_latex` introduces keyword argument ``environment``, which also allows a specific "longtable" entry through a separate jinja2 template (:issue:`41866`).
-- :meth:`.Styler.to_html` introduces keyword arguments ``sparse_index``, ``sparse_columns``, ``bold_headers``, ``caption`` (:issue:`41946`, :issue:`43149`).
+- :meth:`.Styler.to_html` introduces keyword arguments ``sparse_index``, ``sparse_columns``, ``bold_headers``, ``caption``, ``max_rows`` and ``max_columns`` (:issue:`41946`, :issue:`43149`, :issue:`42972`).
 - Keyword arguments ``level`` and ``names`` added to :meth:`.Styler.hide_index` and :meth:`.Styler.hide_columns` for additional control of visibility of MultiIndexes and index names (:issue:`25475`, :issue:`43404`, :issue:`43346`)
 - Global options have been extended to configure default ``Styler`` properties including formatting and encoding and mathjax options and LaTeX (:issue:`41395`)
 - Naive sparsification is now possible for LaTeX without the multirow package (:issue:`43369`)
@@ -104,9 +104,11 @@ Other enhancements
 - :meth:`Series.ewm`, :meth:`DataFrame.ewm`, now support a ``method`` argument with a ``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`. See :ref:`Window Overview <window.overview>` for performance and functional benefits (:issue:`42273`)
 - :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` now support the argument ``skipna`` (:issue:`34047`)
 - :meth:`read_table` now supports the argument ``storage_options`` (:issue:`39167`)
+- :meth:`DataFrame.to_stata` and :meth:`StataWriter` now accept the keyword only argument ``value_labels`` to save labels for non-categorical columns
 - Methods that relied on hashmap based algos such as :meth:`DataFrameGroupBy.value_counts`, :meth:`DataFrameGroupBy.count` and :func:`factorize` ignored imaginary component for complex numbers (:issue:`17927`)
 - Add :meth:`Series.str.removeprefix` and :meth:`Series.str.removesuffix` introduced in Python 3.9 to remove pre-/suffixes from string-type :class:`Series` (:issue:`36944`)
 - :meth:`.GroupBy.mean` now supports `NaT` values (:issue:`43132`)
+
 .. ---------------------------------------------------------------------------
 
 .. _whatsnew_140.notable_bug_fixes:
@@ -277,6 +279,8 @@ Other Deprecations
 - Deprecated :meth:`Index.reindex` with a non-unique index (:issue:`42568`)
 - Deprecated :meth:`.Styler.render` in favour of :meth:`.Styler.to_html` (:issue:`42140`)
 - Deprecated passing in a string column label into ``times`` in :meth:`DataFrame.ewm` (:issue:`43265`)
+- Deprecated the 'include_start' and 'include_end' arguments in :meth:`DataFrame.between_time`; in a future version passing 'include_start' or 'include_end' will raise (:issue:`40245`)
+- Deprecated the ``squeeze`` argument to :meth:`read_csv`, :meth:`read_table`, and :meth:`read_excel`. Users should squeeze the DataFrame afterwards with ``.squeeze("columns")`` instead. (:issue:`43242`)
 
 .. ---------------------------------------------------------------------------
 
@@ -294,6 +298,8 @@ Performance improvements
 - Performance improvement in :meth:`to_datetime` with ``uint`` dtypes (:issue:`42606`)
 - Performance improvement in :meth:`Series.sparse.to_coo` (:issue:`42880`)
 - Performance improvement in indexing with a :class:`MultiIndex` indexer on another :class:`MultiIndex` (:issue:`43370`)
+- Performance improvement in :meth:`GroupBy.quantile` (:issue:`43469`)
+-
 
 .. ---------------------------------------------------------------------------
 
@@ -367,6 +373,7 @@ Indexing
 Missing
 ^^^^^^^
 - Bug in :meth:`DataFrame.fillna` with limit and no method ignores axis='columns' or ``axis = 1`` (:issue:`40989`)
+- Bug in :meth:`DataFrame.fillna` not replacing missing values when using a dict-like ``value`` and duplicate column names (:issue:`43476`)
 -
 
 MultiIndex
@@ -424,6 +431,7 @@ Reshaping
 
 Sparse
 ^^^^^^
+- Bug in :meth:`DataFrame.sparse.to_coo` raising ``AttributeError`` when column names are not unique (:issue:`29564`)
 -
 -
 
@@ -440,7 +448,7 @@ Styler
 - Bug in :meth:`Styler.apply` where functions which returned Series objects were not correctly handled in terms of aligning their index labels (:issue:`13657`, :issue:`42014`)
 - Bug when rendering an empty DataFrame with a named index (:issue:`43305`).
 - Bug when rendering a single level MultiIndex (:issue:`43383`).
-- Bug when combining non-sparse rendering and :meth:`.Styler.hide_columns` (:issue:`43464`)
+- Bug when combining non-sparse rendering and :meth:`.Styler.hide_columns` or :meth:`.Styler.hide_index` (:issue:`43464`)
 
 Other
 ^^^^^

pandas/_libs/groupby.pyi

Lines changed: 2 additions & 2 deletions
@@ -85,11 +85,11 @@ def group_ohlc(
     min_count: int = ...,
 ) -> None: ...
 def group_quantile(
-    out: np.ndarray,  # ndarray[float64_t]
+    out: np.ndarray,  # ndarray[float64_t, ndim=2]
     values: np.ndarray,  # ndarray[numeric, ndim=1]
     labels: np.ndarray,  # ndarray[int64_t]
     mask: np.ndarray,  # ndarray[uint8_t]
-    q: float,  # float64_t
+    qs: np.ndarray,  # const float64_t[:]
     interpolation: Literal["linear", "lower", "higher", "nearest", "midpoint"],
 ) -> None: ...
 def group_last(
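
The stub change above swaps the scalar `q` for an array `qs` and makes `out` two-dimensional, which suggests the kernel can fill several quantiles per group in one call. The sketch below shows the corresponding public-API usage with made-up data. Whether a given pandas version routes this call through `group_quantile` in exactly this way is an assumption and is not confirmed by the diff.

```python
import pandas as pd

# Made-up data: two groups with a handful of values each.
df = pd.DataFrame(
    {
        "key": ["a", "a", "a", "b", "b"],
        "val": [1.0, 2.0, 3.0, 10.0, 20.0],
    }
)

# Passing a list of quantiles returns one row per (group, quantile) pair.
result = df.groupby("key")["val"].quantile([0.25, 0.5])
print(result)
```

The result is indexed by group key and quantile level, so a single value can be read with `result.loc[("a", 0.5)]`.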
