Merge remote-tracking branch 'upstream/main' into read-csv-from-directory

fangchenli · fangchenli · commit f94a0bf397d0 · 2025-05-21T22:36:06.000-07:00
diff --git a/.github/workflows/wheels.yml b/.github/workflows/wheels.yml
@@ -99,6 +99,7 @@ jobs:
         # Note: M1 images on Github Actions start from macOS 14
         - [macos-14, macosx_arm64]
         - [windows-2022, win_amd64]
+        - [windows-11-arm, win_arm64]
         # TODO: support PyPy?
         python: [["cp310", "3.10"], ["cp311", "3.11"], ["cp312", "3.12"], ["cp313", "3.13"], ["cp313t", "3.13"]]
         include:
@@ -108,6 +109,12 @@ jobs:
         - buildplat: [ubuntu-24.04, pyodide_wasm32]
           python: ["cp312", "3.12"]
           cibw_build_frontend: 'build'
+        exclude:
+          - buildplat: [windows-11-arm, win_arm64]
+            python: ["cp310", "3.10"]
+        # BackendUnavailable: Cannot import 'mesonpy'
+          - buildplat: [windows-11-arm, win_arm64]
+            python: ["cp313t", "3.13"]
 
     env:
       IS_PUSH: ${{ github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') }}
@@ -118,6 +125,12 @@ jobs:
         with:
           fetch-depth: 0
 
+      - name: Set up MSVC environment for ARM64
+        if: matrix.buildplat[1] == 'win_arm64'
+        uses: ilammy/msvc-dev-cmd@v1
+        with:
+          arch: arm64
+
       # TODO: Build wheels from sdist again
       # There's some sort of weird race condition?
       # within Github that makes the sdist be missing files
@@ -155,9 +168,13 @@ jobs:
         env:
           CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}
           CIBW_BUILD_FRONTEND: ${{ matrix.cibw_build_frontend || 'pip' }}
-          CIBW_PLATFORM: ${{ matrix.buildplat[1] == 'pyodide_wasm32' && 'pyodide' || 'auto' }}
+          CIBW_PLATFORM: ${{ (matrix.buildplat[1] == 'pyodide_wasm32' && 'pyodide') || (matrix.buildplat[1] == 'win_arm64' && 'windows') || 'auto' }}
+          CIBW_ARCHS: ${{ matrix.buildplat[1] == 'win_arm64' && 'ARM64' || 'auto' }}
+          CIBW_BEFORE_BUILD_WINDOWS: 'python -m pip install delvewheel'
 
-      - name: Set up Python
+      - name: Set up Python for validation/upload (non-ARM64 Windows & other OS)
+        # micromamba is not available for ARM64 Windows
+        if: matrix.buildplat[1] != 'win_arm64'
         uses: mamba-org/setup-micromamba@v2
         with:
           environment-name: wheel-env
@@ -170,6 +187,12 @@ jobs:
           cache-downloads: true
           cache-environment: true
 
+      - name: Install wheel for win_arm64
+        # installing wheel here because micromamba step was skipped
+        if: matrix.buildplat[1] == 'win_arm64'
+        shell: bash -el {0}
+        run: python -m pip install wheel
+
       - name: Validate wheel RECORD
         shell: bash -el {0}
         run: for whl in $(ls wheelhouse); do wheel unpack wheelhouse/$whl -d /tmp; done
diff --git a/doc/source/user_guide/10min.rst b/doc/source/user_guide/10min.rst
@@ -178,12 +178,26 @@ Getitem (``[]``)
 ~~~~~~~~~~~~~~~~
 
 For a :class:`DataFrame`, passing a single label selects a column and
-yields a :class:`Series` equivalent to ``df.A``:
+yields a :class:`Series`:
 
 .. ipython:: python
 
    df["A"]
 
+If the label only contains letters, numbers, and underscores, you can
+alternatively use the column name attribute:
+
+.. ipython:: python
+
+   df.A
+
+Passing a list of column labels selects multiple columns, which can be useful
+for getting a subset/rearranging:
+
+.. ipython:: python
+
+   df[["B", "A"]]
+
 For a :class:`DataFrame`, passing a slice ``:`` selects matching rows:
 
 .. ipython:: python
diff --git a/doc/source/user_guide/missing_data.rst b/doc/source/user_guide/missing_data.rst
@@ -258,9 +258,6 @@ will convert your data to use the nullable data types supporting :class:`NA`,
 such as :class:`Int64Dtype` or :class:`ArrowDtype`. This is especially helpful after reading
 in data sets from IO methods where data types were inferred.
 
-In this example, while the dtypes of all columns are changed, we show the results for
-the first 10 columns.
-
 .. ipython:: python
 
    import io
@@ -434,7 +431,7 @@ where the index and column aligns between the original object and the filled obj
 
 .. note::
 
-   :meth:`DataFrame.where` can also be used to fill NA values.Same result as above.
+   :meth:`DataFrame.where` can also be used to fill NA values. Same result as above.
 
    .. ipython:: python
 
diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
@@ -120,6 +120,7 @@ Timezones
 Numeric
 ^^^^^^^
 - Enabled :class:`Series.mode` and :class:`DataFrame.mode` with ``dropna=False`` to sort the result for all dtypes in the presence of NA values; previously only certain dtypes would sort (:issue:`60702`)
+- Bug in :meth:`Series.round` where object-dtype Series did not raise ``TypeError`` (:issue:`61206`)
 -
 
 Conversion
diff --git a/doc/source/whatsnew/v3.0.0.rst b/doc/source/whatsnew/v3.0.0.rst
@@ -775,6 +775,7 @@ I/O
 - Bug in :meth:`DataFrame.to_stata` when writing more than 32,000 value labels. (:issue:`60107`)
 - Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
 - Bug in :meth:`HDFStore.get` was failing to save data of dtype datetime64[s] correctly (:issue:`59004`)
+- Bug in :meth:`HDFStore.select` causing queries on categorical string columns to return unexpected results (:issue:`57608`)
 - Bug in :meth:`read_csv` causing segmentation fault when ``encoding_errors`` is not a string. (:issue:`59059`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)
diff --git a/pandas/core/computation/pytables.py b/pandas/core/computation/pytables.py
@@ -239,7 +239,8 @@ def stringify(value):
             if conv_val not in metadata:
                 result = -1
             else:
-                result = metadata.searchsorted(conv_val, side="left")
+                # Find the index of the first match of conv_val in metadata
+                result = np.flatnonzero(metadata == conv_val)[0]
             return TermValue(result, result, "integer")
         elif kind == "integer":
             try:
diff --git a/pandas/core/frame.py b/pandas/core/frame.py
@@ -4481,18 +4481,58 @@ def _get_item(self, item: Hashable) -> Series:
 
     @overload
     def query(
-        self, expr: str, *, inplace: Literal[False] = ..., **kwargs
+        self,
+        expr: str,
+        *,
+        parser: Literal["pandas", "python"] = ...,
+        engine: Literal["python", "numexpr"] | None = ...,
+        local_dict: dict[str, Any] | None = ...,
+        global_dict: dict[str, Any] | None = ...,
+        resolvers: list[Mapping] | None = ...,
+        level: int = ...,
+        inplace: Literal[False] = ...,
     ) -> DataFrame: ...
 
     @overload
-    def query(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...
+    def query(
+        self,
+        expr: str,
+        *,
+        parser: Literal["pandas", "python"] = ...,
+        engine: Literal["python", "numexpr"] | None = ...,
+        local_dict: dict[str, Any] | None = ...,
+        global_dict: dict[str, Any] | None = ...,
+        resolvers: list[Mapping] | None = ...,
+        level: int = ...,
+        inplace: Literal[True],
+    ) -> None: ...
 
     @overload
     def query(
-        self, expr: str, *, inplace: bool = ..., **kwargs
+        self,
+        expr: str,
+        *,
+        parser: Literal["pandas", "python"] = ...,
+        engine: Literal["python", "numexpr"] | None = ...,
+        local_dict: dict[str, Any] | None = ...,
+        global_dict: dict[str, Any] | None = ...,
+        resolvers: list[Mapping] | None = ...,
+        level: int = ...,
+        inplace: bool = ...,
     ) -> DataFrame | None: ...
 
-    def query(self, expr: str, *, inplace: bool = False, **kwargs) -> DataFrame | None:
+    def query(
+        self,
+        expr: str,
+        *,
+        parser: Literal["pandas", "python"] = "pandas",
+        engine: Literal["python", "numexpr"] | None = None,
+        local_dict: dict[str, Any] | None = None,
+        global_dict: dict[str, Any] | None = None,
+        resolvers: list[Mapping] | None = None,
+        level: int = 0,
+        inplace: bool = False,
+    ) -> DataFrame | None:
         """
         Query the columns of a DataFrame with a boolean expression.
 
@@ -4511,11 +4551,41 @@ def query(self, expr: str, *, inplace: bool = False, **kwargs) -> DataFrame | No
 
             See the documentation for :meth:`DataFrame.eval` for details on
             referring to column names and variables in the query string.
+        parser : {'pandas', 'python'}, default 'pandas'
+            The parser to use to construct the syntax tree from the expression. The
+            default of ``'pandas'`` parses code slightly differently from standard
+            Python. Alternatively, you can parse an expression using the
+            ``'python'`` parser to retain strict Python semantics.  See the
+            :ref:`enhancing performance <enhancingperf.eval>` documentation for
+            more details.
+        engine : {'python', 'numexpr'} or None, default None
+
+            The engine used to evaluate the expression. Supported engines are
+
+            - None : tries to use ``numexpr``, falls back to ``python``
+            - ``'numexpr'`` : This default engine evaluates pandas objects using
+              numexpr for large speed ups in complex expressions with large frames.
+            - ``'python'`` : Performs operations as if you had ``eval``'d in top
+              level python. This engine is generally not that useful.
+
+            More backends may be available in the future.
+        local_dict : dict or None, optional
+            A dictionary of local variables, taken from locals() by default.
+        global_dict : dict or None, optional
+            A dictionary of global variables, taken from globals() by default.
+        resolvers : list of dict-like or None, optional
+            A list of objects implementing the ``__getitem__`` special method that
+            you can use to inject an additional collection of namespaces to use for
+            variable lookup. For example, this is used in the
+            :meth:`~DataFrame.query` method to inject the
+            ``DataFrame.index`` and ``DataFrame.columns``
+            variables that refer to their respective :class:`~pandas.DataFrame`
+            instance attributes.
+        level : int, optional
+            The number of prior stack frames to traverse and add to the current
+            scope. Most users will **not** need to change this parameter.
         inplace : bool
             Whether to modify the DataFrame rather than creating a new one.
-        **kwargs
-            See the documentation for :func:`eval` for complete details
-            on the keyword arguments accepted by :meth:`DataFrame.query`.
 
         Returns
         -------
@@ -4628,10 +4698,17 @@ def query(self, expr: str, *, inplace: bool = False, **kwargs) -> DataFrame | No
         if not isinstance(expr, str):
             msg = f"expr must be a string to be evaluated, {type(expr)} given"
             raise ValueError(msg)
-        kwargs["level"] = kwargs.pop("level", 0) + 1
-        kwargs["target"] = None
 
-        res = self.eval(expr, **kwargs)
+        res = self.eval(
+            expr,
+            level=level + 1,
+            parser=parser,
+            target=None,
+            engine=engine,
+            local_dict=local_dict,
+            global_dict=global_dict,
+            resolvers=resolvers or (),
+        )
 
         try:
             result = self.loc[res]
@@ -9181,11 +9258,11 @@ def groupby(
 
         Parameters
         ----------%s
-        columns : str or object or a list of str
+        columns : Hashable or a sequence of the previous
             Column to use to make new frame's columns.
-        index : str or object or a list of str, optional
+        index : Hashable or a sequence of the previous, optional
             Column to use to make new frame's index. If not given, uses existing index.
-        values : str, object or a list of the previous, optional
+        values : Hashable or a sequence of the previous, optional
             Column(s) to use for populating new frame's values. If not
             specified, all remaining columns will be used and the result will
             have hierarchically indexed columns.
@@ -9324,12 +9401,12 @@ def pivot(
         ----------%s
         values : list-like or scalar, optional
             Column or columns to aggregate.
-        index : column, Grouper, array, or list of the previous
+        index : column, Grouper, array, or sequence of the previous
             Keys to group by on the pivot table index. If a list is passed,
             it can contain any of the other types (except list). If an array is
             passed, it must be the same length as the data and will be used in
             the same manner as column values.
-        columns : column, Grouper, array, or list of the previous
+        columns : column, Grouper, array, or sequence of the previous
             Keys to group by on the pivot table column. If a list is passed,
             it can contain any of the other types (except list). If an array is
             passed, it must be the same length as the data and will be used in
diff --git a/pandas/core/indexes/base.py b/pandas/core/indexes/base.py
@@ -1731,10 +1731,16 @@ def name(self) -> Hashable:
         """
         Return Index or MultiIndex name.
 
+        Returns
+        -------
+        label (hashable object)
+            The name of the Index.
+
         See Also
         --------
         Index.set_names: Able to set new names partially and by level.
         Index.rename: Able to set new names partially and by level.
+        Series.name: Corresponding Series property.
 
         Examples
         --------
diff --git a/pandas/core/reshape/pivot.py b/pandas/core/reshape/pivot.py
@@ -76,12 +76,12 @@ def pivot_table(
         Input pandas DataFrame object.
     values : list-like or scalar, optional
         Column or columns to aggregate.
-    index : column, Grouper, array, or list of the previous
+    index : column, Grouper, array, or sequence of the previous
         Keys to group by on the pivot table index. If a list is passed,
         it can contain any of the other types (except list). If an array is
         passed, it must be the same length as the data and will be used in
         the same manner as column values.
-    columns : column, Grouper, array, or list of the previous
+    columns : column, Grouper, array, or sequence of the previous
         Keys to group by on the pivot table column. If a list is passed,
         it can contain any of the other types (except list). If an array is
         passed, it must be the same length as the data and will be used in
@@ -708,11 +708,11 @@ def pivot(
     ----------
     data : DataFrame
         Input pandas DataFrame object.
-    columns : str or object or a list of str
+    columns : Hashable or a sequence of the previous
         Column to use to make new frame's columns.
-    index : str or object or a list of str, optional
+    index : Hashable or a sequence of the previous, optional
         Column to use to make new frame's index. If not given, uses existing index.
-    values : str, object or a list of the previous, optional
+    values : Hashable or a sequence of the previous, optional
         Column(s) to use for populating new frame's values. If not
         specified, all remaining columns will be used and the result will
         have hierarchically indexed columns.
diff --git a/pandas/core/series.py b/pandas/core/series.py
@@ -2514,6 +2514,8 @@ def round(self, decimals: int = 0, *args, **kwargs) -> Series:
         dtype: float64
         """
         nv.validate_round(args, kwargs)
+        if self.dtype == "object":
+            raise TypeError("Expected numeric dtype, got object instead.")
         new_mgr = self._mgr.round(decimals=decimals)
         return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(
             self, method="round"
diff --git a/pandas/tests/io/pytables/test_store.py b/pandas/tests/io/pytables/test_store.py
@@ -23,6 +23,9 @@
     timedelta_range,
 )
 import pandas._testing as tm
+from pandas.api.types import (
+    CategoricalDtype,
+)
 from pandas.tests.io.pytables.common import (
     _maybe_remove,
     ensure_clean_store,
@@ -1107,3 +1110,23 @@ def test_store_bool_index(tmp_path, setup_path):
     df.to_hdf(path, key="a")
     result = read_hdf(path, "a")
     tm.assert_frame_equal(expected, result)
+
+
+@pytest.mark.parametrize("model", ["name", "longname", "verylongname"])
+def test_select_categorical_string_columns(tmp_path, model):
+    # Corresponding to BUG: 57608
+
+    path = tmp_path / "test.h5"
+
+    models = CategoricalDtype(categories=["name", "longname", "verylongname"])
+    df = DataFrame(
+        {"modelId": ["name", "longname", "longname"], "value": [1, 2, 3]}
+    ).astype({"modelId": models, "value": int})
+
+    with HDFStore(path, "w") as store:
+        store.append("df", df, data_columns=["modelId"])
+
+    with HDFStore(path, "r") as store:
+        result = store.select("df", "modelId == model")
+        expected = df[df["modelId"] == model]
+        tm.assert_frame_equal(result, expected)
diff --git a/pandas/tests/series/methods/test_round.py b/pandas/tests/series/methods/test_round.py
@@ -72,3 +72,10 @@ def test_round_ea_boolean(self):
         tm.assert_series_equal(result, expected)
         result.iloc[0] = False
         tm.assert_series_equal(ser, expected)
+
+    def test_round_dtype_object(self):
+        # GH#61206
+        ser = Series([0.2], dtype="object")
+        msg = "Expected numeric dtype, got object instead."
+        with pytest.raises(TypeError, match=msg):
+            ser.round()
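
Note on the `pandas/core/computation/pytables.py` hunk: the change swaps a binary search for an exact-match lookup. The sketch below is only an illustration of why that matters and is not the pandas implementation. It assumes a plain NumPy string array standing in for the category metadata, and `code_by_search` and `code_by_match` are hypothetical helper names. `searchsorted` presumes sorted input, so with categories in the unsorted order used by the new test it can report a position that does not hold the value.

```python
import numpy as np

# Category order mirrors the new test; it is deliberately not sorted.
categories = np.array(["name", "longname", "verylongname"])


def code_by_search(value):
    # Binary search: only meaningful when `categories` is sorted.
    return int(categories.searchsorted(value, side="left"))


def code_by_match(value):
    # Position of the first exact match, independent of ordering.
    return int(np.flatnonzero(categories == value)[0])


for value in categories:
    # Print the value next to both candidate codes for comparison.
    print(value, code_by_search(value), code_by_match(value))
```

The exact-match lookup scans the whole array, so it trades the logarithmic cost of a binary search for correctness on unsorted categories.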

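
Note on the `DataFrame.query` signature change: the keywords that were previously forwarded through `**kwargs` are now spelled out. A minimal usage sketch follows, assuming a small made-up frame. The column names and the `threshold_scope` dictionary are illustrative only, and `engine="python"` is chosen so the example does not depend on `numexpr` being installed.

```python
import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# parser and engine passed as explicit keywords.
filtered = df.query("a > 1", parser="pandas", engine="python")

# local_dict supplies the names that `@` references resolve against.
threshold_scope = {"threshold": 15}
scoped = df.query("b > @threshold", engine="python", local_dict=threshold_scope)

print(filtered)
print(scoped)
```

These calls were already accepted through `**kwargs`. The practical difference of the explicit signature is that type checkers and editors can now see and validate the keyword names.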