Commit f94a0bf

Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
2 parents 3173270 + e2bd8e6 commit f94a0bf

12 files changed (+180, -28 lines)


.github/workflows/wheels.yml

Lines changed: 25 additions & 2 deletions
@@ -99,6 +99,7 @@ jobs:
9999
# Note: M1 images on Github Actions start from macOS 14
100100
- [macos-14, macosx_arm64]
101101
- [windows-2022, win_amd64]
102+
- [windows-11-arm, win_arm64]
102103
# TODO: support PyPy?
103104
python: [["cp310", "3.10"], ["cp311", "3.11"], ["cp312", "3.12"], ["cp313", "3.13"], ["cp313t", "3.13"]]
104105
include:
@@ -108,6 +109,12 @@ jobs:
108109
- buildplat: [ubuntu-24.04, pyodide_wasm32]
109110
python: ["cp312", "3.12"]
110111
cibw_build_frontend: 'build'
112+
exclude:
113+
- buildplat: [windows-11-arm, win_arm64]
114+
python: ["cp310", "3.10"]
115+
# BackendUnavailable: Cannot import 'mesonpy'
116+
- buildplat: [windows-11-arm, win_arm64]
117+
python: ["cp313t", "3.13"]
111118

112119
env:
113120
IS_PUSH: ${{ github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') }}
@@ -118,6 +125,12 @@ jobs:
118125
with:
119126
fetch-depth: 0
120127

128+
- name: Set up MSVC environment for ARM64
129+
if: matrix.buildplat[1] == 'win_arm64'
130+
uses: ilammy/msvc-dev-cmd@v1
131+
with:
132+
arch: arm64
133+
121134
# TODO: Build wheels from sdist again
122135
# There's some sort of weird race condition?
123136
# within Github that makes the sdist be missing files
@@ -155,9 +168,13 @@ jobs:
155168
env:
156169
CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}
157170
CIBW_BUILD_FRONTEND: ${{ matrix.cibw_build_frontend || 'pip' }}
158-
CIBW_PLATFORM: ${{ matrix.buildplat[1] == 'pyodide_wasm32' && 'pyodide' || 'auto' }}
171+
CIBW_PLATFORM: ${{ (matrix.buildplat[1] == 'pyodide_wasm32' && 'pyodide') || (matrix.buildplat[1] == 'win_arm64' && 'windows') || 'auto' }}
172+
CIBW_ARCHS: ${{ matrix.buildplat[1] == 'win_arm64' && 'ARM64' || 'auto' }}
173+
CIBW_BEFORE_BUILD_WINDOWS: 'python -m pip install delvewheel'
159174

160-
- name: Set up Python
175+
- name: Set up Python for validation/upload (non-ARM64 Windows & other OS)
176+
# micromamba is not available for ARM64 Windows
177+
if: matrix.buildplat[1] != 'win_arm64'
161178
uses: mamba-org/setup-micromamba@v2
162179
with:
163180
environment-name: wheel-env
@@ -170,6 +187,12 @@ jobs:
170187
cache-downloads: true
171188
cache-environment: true
172189

190+
- name: Install wheel for win_arm64
191+
# installing wheel here because micromamba step was skipped
192+
if: matrix.buildplat[1] == 'win_arm64'
193+
shell: bash -el {0}
194+
run: python -m pip install wheel
195+
173196
- name: Validate wheel RECORD
174197
shell: bash -el {0}
175198
run: for whl in $(ls wheelhouse); do wheel unpack wheelhouse/$whl -d /tmp; done
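
The matrix and `env` changes above can be modelled in a few lines of Python. This is a simplified sketch, not GitHub Actions itself: it assumes `exclude` drops every combination whose listed keys all match, it reproduces only the two Windows `buildplat` entries, and `cibw_env` is a hypothetical helper that mirrors the `CIBW_*` expressions. The workflow's `a && b || c` idiom only behaves like a ternary because every `b` value ('pyodide', 'windows', 'ARM64') is a non-empty, truthy string.

```python
from itertools import product

# Simplified subset of the wheels.yml matrix (assumption: Windows entries only).
BUILDPLATS = [
    ("windows-2022", "win_amd64"),
    ("windows-11-arm", "win_arm64"),
]
PYTHONS = [
    ("cp310", "3.10"), ("cp311", "3.11"), ("cp312", "3.12"),
    ("cp313", "3.13"), ("cp313t", "3.13"),
]
# The two combinations excluded for ARM64 Windows in the workflow.
EXCLUDE = [
    (("windows-11-arm", "win_arm64"), ("cp310", "3.10")),
    (("windows-11-arm", "win_arm64"), ("cp313t", "3.13")),
]


def cibw_env(buildplat, python):
    """Hypothetical helper mirroring the CIBW_* expressions in the env block."""
    tag = buildplat[1]
    if tag == "pyodide_wasm32":
        platform = "pyodide"
    elif tag == "win_arm64":
        platform = "windows"
    else:
        platform = "auto"
    return {
        "CIBW_BUILD": f"{python[0]}-{tag}",
        "CIBW_PLATFORM": platform,
        "CIBW_ARCHS": "ARM64" if tag == "win_arm64" else "auto",
    }


def expand():
    """Cross product of buildplat x python, minus the excluded pairs."""
    jobs = []
    for bp, py in product(BUILDPLATS, PYTHONS):
        if (bp, py) in EXCLUDE:
            continue
        jobs.append(cibw_env(bp, py))
    return jobs


arm_jobs = [job["CIBW_BUILD"] for job in expand() if job["CIBW_ARCHS"] == "ARM64"]
print(arm_jobs)  # the ARM64 wheels left after the exclusions
```

Under these assumptions, ARM64 Windows wheels are built for cp311, cp312 and cp313 only.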

doc/source/user_guide/10min.rst

Lines changed: 15 additions & 1 deletion
@@ -178,12 +178,26 @@ Getitem (``[]``)
178178
~~~~~~~~~~~~~~~~
179179

180180
For a :class:`DataFrame`, passing a single label selects a column and
181-
yields a :class:`Series` equivalent to ``df.A``:
181+
yields a :class:`Series`:
182182

183183
.. ipython:: python
184184
185185
df["A"]
186186
187+
If the label is a valid Python identifier (letters, numbers, and underscores, not starting with a number), you can
188+
alternatively access the column as an attribute:
189+
190+
.. ipython:: python
191+
192+
df.A
193+
194+
Passing a list of column labels selects multiple columns, which is useful
195+
for selecting a subset of columns or reordering them:
196+
197+
.. ipython:: python
198+
199+
df[["B", "A"]]
200+
187201
For a :class:`DataFrame`, passing a slice ``:`` selects matching rows:
188202

189203
.. ipython:: python
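
A short sketch of the three getitem forms documented above, using a small hypothetical frame. The `count` column is an illustrative assumption chosen to show one caveat of attribute access: when a label collides with an existing DataFrame attribute or method, `df.<label>` returns the attribute, so `df["<label>"]` remains the reliable form.

```python
import pandas as pd

# Hypothetical frame; "count" deliberately collides with DataFrame.count.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": ["x", "y", "z"], "count": [4, 5, 6]})

by_item = df["A"]        # a single label selects one column as a Series
by_attr = df.A           # attribute access returns the same column
subset = df[["B", "A"]]  # a list of labels returns a DataFrame in that order

# On a name clash, attribute access resolves to the method, not the column.
count_attr = df.count
count_item = df["count"]
```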

doc/source/user_guide/missing_data.rst

Lines changed: 1 addition & 4 deletions
@@ -258,9 +258,6 @@ will convert your data to use the nullable data types supporting :class:`NA`,
258258
such as :class:`Int64Dtype` or :class:`ArrowDtype`. This is especially helpful after reading
259259
in data sets from IO methods where data types were inferred.
260260

261-
In this example, while the dtypes of all columns are changed, we show the results for
262-
the first 10 columns.
263-
264261
.. ipython:: python
265262
266263
import io
@@ -434,7 +431,7 @@ where the index and column aligns between the original object and the filled obj
434431
435432
.. note::
436433

437-
:meth:`DataFrame.where` can also be used to fill NA values.Same result as above.
434+
:meth:`DataFrame.where` can also be used to fill NA values. Same result as above.
438435

439436
.. ipython:: python
440437
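
The note about :meth:`DataFrame.where` can be checked with a small hypothetical frame: filling each column's missing values with the column mean via `fillna` should give the same result as `where` with the means as `other`, where `axis="columns"` aligns the Series of means with the column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
dff = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 2.0, 4.0]})

# Fill each column's NaNs with that column's mean.
filled = dff.fillna(dff.mean())

# where() keeps values where the condition is True and takes the replacement
# from `other` elsewhere; axis="columns" aligns dff.mean() with the columns.
via_where = dff.where(pd.notna(dff), dff.mean(), axis="columns")
```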

doc/source/whatsnew/v2.3.0.rst

Lines changed: 1 addition & 0 deletions
@@ -120,6 +120,7 @@ Timezones
120120
Numeric
121121
^^^^^^^
122122
- Enabled :class:`Series.mode` and :class:`DataFrame.mode` with ``dropna=False`` to sort the result for all dtypes in the presence of NA values; previously only certain dtypes would sort (:issue:`60702`)
123+
- :meth:`Series.round` on ``object`` dtype now raises a ``TypeError``
123124
-
124125

125126
Conversion

doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -775,6 +775,7 @@ I/O
775775
- Bug in :meth:`DataFrame.to_stata` when writing more than 32,000 value labels. (:issue:`60107`)
776776
- Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
777777
- Bug in :meth:`HDFStore.get` was failing to save data of dtype datetime64[s] correctly (:issue:`59004`)
778+
- Bug in :meth:`HDFStore.select` causing queries on categorical string columns to return unexpected results (:issue:`57608`)
778779
- Bug in :meth:`read_csv` causing segmentation fault when ``encoding_errors`` is not a string. (:issue:`59059`)
779780
- Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
780781
- Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)

pandas/core/computation/pytables.py

Lines changed: 2 additions & 1 deletion
@@ -239,7 +239,8 @@ def stringify(value):
239239
if conv_val not in metadata:
240240
result = -1
241241
else:
242-
result = metadata.searchsorted(conv_val, side="left")
242+
# Find the index of the first match of conv_val in metadata
243+
result = np.flatnonzero(metadata == conv_val)[0]
243244
return TermValue(result, result, "integer")
244245
elif kind == "integer":
245246
try:
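
Why the `searchsorted` lookup was replaced: binary search is only correct on sorted input, while the category labels used as `metadata` here keep their original order. The NumPy-only sketch below assumes `metadata` holds the labels from the regression test in their declared, unsorted order. The two helper functions are hypothetical stand-ins for the old and new lines in `stringify`.

```python
import numpy as np

# Assumption: category labels in declared (unsorted) order, as in the test.
metadata = np.array(["name", "longname", "verylongname"], dtype=object)


def code_via_searchsorted(conv_val):
    # Old lookup: binary search, only valid when `metadata` is sorted.
    return int(metadata.searchsorted(conv_val, side="left"))


def code_via_flatnonzero(conv_val):
    # New lookup: index of the first exact match. The real code checks
    # membership beforehand, so indexing with [0] cannot fail there.
    return int(np.flatnonzero(metadata == conv_val)[0])


labels = ["name", "longname", "verylongname"]
old_codes = [code_via_searchsorted(v) for v in labels]
new_codes = [code_via_flatnonzero(v) for v in labels]
```

On this input, the binary search maps "longname" to position 0, which is the code for "name". That is the kind of mismatch that made queries on categorical string columns return unexpected rows.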

pandas/core/frame.py

Lines changed: 92 additions & 15 deletions
@@ -4481,18 +4481,58 @@ def _get_item(self, item: Hashable) -> Series:
44814481

44824482
@overload
44834483
def query(
4484-
self, expr: str, *, inplace: Literal[False] = ..., **kwargs
4484+
self,
4485+
expr: str,
4486+
*,
4487+
parser: Literal["pandas", "python"] = ...,
4488+
engine: Literal["python", "numexpr"] | None = ...,
4489+
local_dict: dict[str, Any] | None = ...,
4490+
global_dict: dict[str, Any] | None = ...,
4491+
resolvers: list[Mapping] | None = ...,
4492+
level: int = ...,
4493+
inplace: Literal[False] = ...,
44854494
) -> DataFrame: ...
44864495

44874496
@overload
4488-
def query(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...
4497+
def query(
4498+
self,
4499+
expr: str,
4500+
*,
4501+
parser: Literal["pandas", "python"] = ...,
4502+
engine: Literal["python", "numexpr"] | None = ...,
4503+
local_dict: dict[str, Any] | None = ...,
4504+
global_dict: dict[str, Any] | None = ...,
4505+
resolvers: list[Mapping] | None = ...,
4506+
level: int = ...,
4507+
inplace: Literal[True],
4508+
) -> None: ...
44894509

44904510
@overload
44914511
def query(
4492-
self, expr: str, *, inplace: bool = ..., **kwargs
4512+
self,
4513+
expr: str,
4514+
*,
4515+
parser: Literal["pandas", "python"] = ...,
4516+
engine: Literal["python", "numexpr"] | None = ...,
4517+
local_dict: dict[str, Any] | None = ...,
4518+
global_dict: dict[str, Any] | None = ...,
4519+
resolvers: list[Mapping] | None = ...,
4520+
level: int = ...,
4521+
inplace: bool = ...,
44934522
) -> DataFrame | None: ...
44944523

4495-
def query(self, expr: str, *, inplace: bool = False, **kwargs) -> DataFrame | None:
4524+
def query(
4525+
self,
4526+
expr: str,
4527+
*,
4528+
parser: Literal["pandas", "python"] = "pandas",
4529+
engine: Literal["python", "numexpr"] | None = None,
4530+
local_dict: dict[str, Any] | None = None,
4531+
global_dict: dict[str, Any] | None = None,
4532+
resolvers: list[Mapping] | None = None,
4533+
level: int = 0,
4534+
inplace: bool = False,
4535+
) -> DataFrame | None:
44964536
"""
44974537
Query the columns of a DataFrame with a boolean expression.
44984538
@@ -4511,11 +4551,41 @@ def query(self, expr: str, *, inplace: bool = False, **kwargs) -> DataFrame | No
45114551
45124552
See the documentation for :meth:`DataFrame.eval` for details on
45134553
referring to column names and variables in the query string.
4554+
parser : {'pandas', 'python'}, default 'pandas'
4555+
The parser to use to construct the syntax tree from the expression. The
4556+
default of ``'pandas'`` parses code slightly differently from standard
4557+
Python. Alternatively, you can parse an expression using the
4558+
``'python'`` parser to retain strict Python semantics. See the
4559+
:ref:`enhancing performance <enhancingperf.eval>` documentation for
4560+
more details.
4561+
engine : {'python', 'numexpr'} or None, default None
4562+
4563+
The engine used to evaluate the expression. Supported engines are:
4564+
4565+
- None : tries to use ``numexpr``, falls back to ``python``
4566+
- ``'numexpr'`` : Evaluates pandas objects using
4567+
numexpr for large speed-ups in complex expressions with large frames.
4568+
- ``'python'`` : Performs operations as if you had ``eval``'d in
4569+
top-level Python. This engine is generally not that useful.
4570+
4571+
More backends may be available in the future.
4572+
local_dict : dict or None, optional
4573+
A dictionary of local variables, taken from locals() by default.
4574+
global_dict : dict or None, optional
4575+
A dictionary of global variables, taken from globals() by default.
4576+
resolvers : list of dict-like or None, optional
4577+
A list of objects implementing the ``__getitem__`` special method that
4578+
you can use to inject an additional collection of namespaces to use for
4579+
variable lookup. For example, this is used in the
4580+
:meth:`~DataFrame.query` method to inject the
4581+
``DataFrame.index`` and ``DataFrame.columns``
4582+
variables that refer to their respective :class:`~pandas.DataFrame`
4583+
instance attributes.
4584+
level : int, optional
4585+
The number of prior stack frames to traverse and add to the current
4586+
scope. Most users will **not** need to change this parameter.
45144587
inplace : bool
45154588
Whether to modify the DataFrame rather than creating a new one.
4516-
**kwargs
4517-
See the documentation for :func:`eval` for complete details
4518-
on the keyword arguments accepted by :meth:`DataFrame.query`.
45194589
45204590
Returns
45214591
-------
@@ -4628,10 +4698,17 @@ def query(self, expr: str, *, inplace: bool = False, **kwargs) -> DataFrame | No
46284698
if not isinstance(expr, str):
46294699
msg = f"expr must be a string to be evaluated, {type(expr)} given"
46304700
raise ValueError(msg)
4631-
kwargs["level"] = kwargs.pop("level", 0) + 1
4632-
kwargs["target"] = None
46334701

4634-
res = self.eval(expr, **kwargs)
4702+
res = self.eval(
4703+
expr,
4704+
level=level + 1,
4705+
parser=parser,
4706+
target=None,
4707+
engine=engine,
4708+
local_dict=local_dict,
4709+
global_dict=global_dict,
4710+
resolvers=resolvers or (),
4711+
)
46354712

46364713
try:
46374714
result = self.loc[res]
@@ -9181,11 +9258,11 @@ def groupby(
91819258
91829259
Parameters
91839260
----------%s
9184-
columns : str or object or a list of str
9261+
columns : Hashable or a sequence of the previous
91859262
Column to use to make new frame's columns.
9186-
index : str or object or a list of str, optional
9263+
index : Hashable or a sequence of the previous, optional
91879264
Column to use to make new frame's index. If not given, uses existing index.
9188-
values : str, object or a list of the previous, optional
9265+
values : Hashable or a sequence of the previous, optional
91899266
Column(s) to use for populating new frame's values. If not
91909267
specified, all remaining columns will be used and the result will
91919268
have hierarchically indexed columns.
@@ -9324,12 +9401,12 @@ def pivot(
93249401
----------%s
93259402
values : list-like or scalar, optional
93269403
Column or columns to aggregate.
9327-
index : column, Grouper, array, or list of the previous
9404+
index : column, Grouper, array, or sequence of the previous
93289405
Keys to group by on the pivot table index. If a list is passed,
93299406
it can contain any of the other types (except list). If an array is
93309407
passed, it must be the same length as the data and will be used in
93319408
the same manner as column values.
9332-
columns : column, Grouper, array, or list of the previous
9409+
columns : column, Grouper, array, or sequence of the previous
93339410
Keys to group by on the pivot table column. If a list is passed,
93349411
it can contain any of the other types (except list). If an array is
93359412
passed, it must be the same length as the data and will be used in
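
With `DataFrame.query` now declaring its keywords explicitly instead of forwarding `**kwargs`, calls look like the sketch below. The frame and the variable names `threshold` and `limit` are illustrative assumptions. `engine="python"` is used so the example does not depend on numexpr being installed. The unknown-keyword check is expected to raise `TypeError` either way: directly from the new signature, or, on older versions, from the forwarded call to `eval`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
threshold = 1

# '@threshold' is resolved from the calling scope.
implicit = df.query("a > @threshold")

# Same query with the now-explicit keywords spelled out.
explicit = df.query("a > @threshold", parser="pandas", engine="python")

# Supplying the variable through local_dict instead of the caller's locals.
via_local_dict = df.query("a > @limit", engine="python", local_dict={"limit": 1})

# inplace=True mutates the frame and returns None.
df_copy = df.copy()
returned = df_copy.query("b >= 20", inplace=True)

# A keyword that query does not accept is rejected.
try:
    df.query("a > 1", not_a_keyword=True)
except TypeError:
    rejected = True
else:
    rejected = False
```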

pandas/core/indexes/base.py

Lines changed: 6 additions & 0 deletions
@@ -1731,10 +1731,16 @@ def name(self) -> Hashable:
17311731
"""
17321732
Return Index or MultiIndex name.
17331733
1734+
Returns
1735+
-------
1736+
label (hashable object)
1737+
The name of the Index.
1738+
17341739
See Also
17351740
--------
17361741
Index.set_names: Able to set new names partially and by level.
17371742
Index.rename: Able to set new names partially and by level.
1743+
Series.name: Corresponding Series property.
17381744
17391745
Examples
17401746
--------
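
A brief illustration of the `Index.name` property documented above, alongside the `Series.name` property it now cross-references. The labels are hypothetical. The tuple name shows that any hashable object can serve as a name.

```python
import pandas as pd

idx = pd.Index([10, 20, 30], name="id")
unnamed = pd.Index([1, 2])
# Any hashable works as a name, including a tuple.
tuple_named = pd.Index([1, 2], name=("level", 0))

# Series.name is the corresponding property on a Series.
ser = pd.Series([1, 2, 3], index=idx, name="values")
```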

pandas/core/reshape/pivot.py

Lines changed: 5 additions & 5 deletions
@@ -76,12 +76,12 @@ def pivot_table(
7676
Input pandas DataFrame object.
7777
values : list-like or scalar, optional
7878
Column or columns to aggregate.
79-
index : column, Grouper, array, or list of the previous
79+
index : column, Grouper, array, or sequence of the previous
8080
Keys to group by on the pivot table index. If a list is passed,
8181
it can contain any of the other types (except list). If an array is
8282
passed, it must be the same length as the data and will be used in
8383
the same manner as column values.
84-
columns : column, Grouper, array, or list of the previous
84+
columns : column, Grouper, array, or sequence of the previous
8585
Keys to group by on the pivot table column. If a list is passed,
8686
it can contain any of the other types (except list). If an array is
8787
passed, it must be the same length as the data and will be used in
@@ -708,11 +708,11 @@ def pivot(
708708
----------
709709
data : DataFrame
710710
Input pandas DataFrame object.
711-
columns : str or object or a list of str
711+
columns : Hashable or a sequence of the previous
712712
Column to use to make new frame's columns.
713-
index : str or object or a list of str, optional
713+
index : Hashable or a sequence of the previous, optional
714714
Column to use to make new frame's index. If not given, uses existing index.
715-
values : str, object or a list of the previous, optional
715+
values : Hashable or a sequence of the previous, optional
716716
Column(s) to use for populating new frame's values. If not
717717
specified, all remaining columns will be used and the result will
718718
have hierarchically indexed columns.
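
The docstrings now describe `index`, `columns` and `values` in terms of `Hashable` rather than `str`. The sketch below uses a hypothetical frame with integer column labels to show a non-string label being accepted by `DataFrame.pivot`. Tuples are avoided on purpose because a tuple may be treated as a list of keys rather than a single label.

```python
import pandas as pd

# Column labels are integers rather than strings.
df = pd.DataFrame(
    {0: ["one", "one", "two", "two"], 1: ["A", "B", "A", "B"], 2: [1, 2, 3, 4]}
)

# Column 0 becomes the index, column 1 the new columns, column 2 the values.
wide = df.pivot(index=0, columns=1, values=2)
```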

pandas/core/series.py

Lines changed: 2 additions & 0 deletions
@@ -2514,6 +2514,8 @@ def round(self, decimals: int = 0, *args, **kwargs) -> Series:
25142514
dtype: float64
25152515
"""
25162516
nv.validate_round(args, kwargs)
2517+
if self.dtype == "object":
2518+
raise TypeError("Expected numeric dtype, got object instead.")
25172519
new_mgr = self._mgr.round(decimals=decimals)
25182520
return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(
25192521
self, method="round"

pandas/tests/io/pytables/test_store.py

Lines changed: 23 additions & 0 deletions
@@ -23,6 +23,9 @@
2323
timedelta_range,
2424
)
2525
import pandas._testing as tm
26+
from pandas.api.types import (
27+
CategoricalDtype,
28+
)
2629
from pandas.tests.io.pytables.common import (
2730
_maybe_remove,
2831
ensure_clean_store,
@@ -1107,3 +1110,23 @@ def test_store_bool_index(tmp_path, setup_path):
11071110
df.to_hdf(path, key="a")
11081111
result = read_hdf(path, "a")
11091112
tm.assert_frame_equal(expected, result)
1113+
1114+
1115+
@pytest.mark.parametrize("model", ["name", "longname", "verylongname"])
1116+
def test_select_categorical_string_columns(tmp_path, model):
1117+
# Corresponding to BUG: 57608
1118+
1119+
path = tmp_path / "test.h5"
1120+
1121+
models = CategoricalDtype(categories=["name", "longname", "verylongname"])
1122+
df = DataFrame(
1123+
{"modelId": ["name", "longname", "longname"], "value": [1, 2, 3]}
1124+
).astype({"modelId": models, "value": int})
1125+
1126+
with HDFStore(path, "w") as store:
1127+
store.append("df", df, data_columns=["modelId"])
1128+
1129+
with HDFStore(path, "r") as store:
1130+
result = store.select("df", "modelId == model")
1131+
expected = df[df["modelId"] == model]
1132+
tm.assert_frame_equal(result, expected)

pandas/tests/series/methods/test_round.py

Lines changed: 7 additions & 0 deletions
@@ -72,3 +72,10 @@ def test_round_ea_boolean(self):
7272
tm.assert_series_equal(result, expected)
7373
result.iloc[0] = False
7474
tm.assert_series_equal(ser, expected)
75+
76+
def test_round_dtype_object(self):
77+
# GH#61206
78+
ser = Series([0.2], dtype="object")
79+
msg = "Expected numeric dtype, got object instead."
80+
with pytest.raises(TypeError, match=msg):
81+
ser.round()
