ENH: add consortium standard entrypoint #54383

Merged on Aug 7, 2023 (12 commits)
2 changes: 1 addition & 1 deletion .github/workflows/package-checks.yml
@@ -24,7 +24,7 @@ jobs:
runs-on: ubuntu-22.04
strategy:
matrix:
extra: ["test", "performance", "computation", "fss", "aws", "gcp", "excel", "parquet", "feather", "hdf5", "spss", "postgresql", "mysql", "sql-other", "html", "xml", "plot", "output_formatting", "clipboard", "compression", "all"]
extra: ["test", "performance", "computation", "fss", "aws", "gcp", "excel", "parquet", "feather", "hdf5", "spss", "postgresql", "mysql", "sql-other", "html", "xml", "plot", "output_formatting", "clipboard", "compression", "consortium-standard", "all"]
fail-fast: false
name: Install Extras - ${{ matrix.extra }}
concurrency:
1 change: 1 addition & 0 deletions ci/deps/actions-311-downstream_compat.yaml
@@ -73,5 +73,6 @@ dependencies:
- pyyaml
- py
- pip:
- dataframe-api-compat>=0.1.7
- pyqt5>=5.15.6
- tzdata>=2022.1
1 change: 1 addition & 0 deletions ci/deps/actions-39-minimum_versions.yaml
@@ -61,5 +61,6 @@ dependencies:
- zstandard=0.17.0

- pip:
- dataframe-api-compat==0.1.7
- pyqt5==5.15.6
- tzdata==2022.1
11 changes: 11 additions & 0 deletions doc/source/getting_started/install.rst
@@ -415,3 +415,14 @@ brotli 0.7.0 compression Brotli compression
python-snappy             0.6.1              compression     Snappy compression
Zstandard                 0.17.0             compression     Zstandard compression
========================= ================== =============== =============================================================

Consortium Standard
^^^^^^^^^^^^^^^^^^^

Installable with ``pip install "pandas[consortium-standard]"``

========================= ================== =================== =============================================================
Dependency                Minimum Version    pip extra           Notes
========================= ================== =================== =============================================================
dataframe-api-compat      0.1.7              consortium-standard Consortium Standard-compatible implementation based on pandas
========================= ================== =================== =============================================================
121 changes: 62 additions & 59 deletions doc/source/whatsnew/v2.1.0.rst
@@ -218,6 +218,7 @@ Other enhancements
- Many read/to_* functions, such as :meth:`DataFrame.to_pickle` and :func:`read_csv`, support forwarding compression arguments to lzma.LZMAFile (:issue:`52979`)
- Reductions :meth:`Series.argmax`, :meth:`Series.argmin`, :meth:`Series.idxmax`, :meth:`Series.idxmin`, :meth:`Index.argmax`, :meth:`Index.argmin`, :meth:`DataFrame.idxmax`, :meth:`DataFrame.idxmin` are now supported for object-dtype objects (:issue:`4279`, :issue:`18021`, :issue:`40685`, :issue:`43697`)
- :meth:`DataFrame.to_parquet` and :func:`read_parquet` will now write and read ``attrs`` respectively (:issue:`54346`)
- Added support for the DataFrame Consortium Standard (:issue:`54383`)
- Performance improvement in :meth:`GroupBy.quantile` (:issue:`51722`)

.. ---------------------------------------------------------------------------
@@ -256,65 +257,67 @@ Increased minimum versions for dependencies
Some minimum supported versions of dependencies were updated.
If installed, we now require:

+-----------------+-----------------+----------+---------+
| Package | Minimum Version | Required | Changed |
+=================+=================+==========+=========+
| numpy | 1.22.4 | X | X |
+-----------------+-----------------+----------+---------+
| mypy (dev) | 1.4.1 | | X |
+-----------------+-----------------+----------+---------+
| beautifulsoup4 | 4.11.1 | | X |
+-----------------+-----------------+----------+---------+
| bottleneck | 1.3.4 | | X |
+-----------------+-----------------+----------+---------+
| fastparquet | 0.8.1 | | X |
+-----------------+-----------------+----------+---------+
| fsspec | 2022.05.0 | | X |
+-----------------+-----------------+----------+---------+
| hypothesis | 6.46.1 | | X |
+-----------------+-----------------+----------+---------+
| gcsfs | 2022.05.0 | | X |
+-----------------+-----------------+----------+---------+
| jinja2 | 3.1.2 | | X |
+-----------------+-----------------+----------+---------+
| lxml | 4.8.0 | | X |
+-----------------+-----------------+----------+---------+
| numba | 0.55.2 | | X |
+-----------------+-----------------+----------+---------+
| numexpr | 2.8.0 | | X |
+-----------------+-----------------+----------+---------+
| openpyxl | 3.0.10 | | X |
+-----------------+-----------------+----------+---------+
| pandas-gbq | 0.17.5 | | X |
+-----------------+-----------------+----------+---------+
| psycopg2 | 2.9.3 | | X |
+-----------------+-----------------+----------+---------+
| pyreadstat | 1.1.5 | | X |
+-----------------+-----------------+----------+---------+
| pyqt5 | 5.15.6 | | X |
+-----------------+-----------------+----------+---------+
| pytables | 3.7.0 | | X |
+-----------------+-----------------+----------+---------+
| pytest | 7.3.2 | | X |
+-----------------+-----------------+----------+---------+
| python-snappy | 0.6.1 | | X |
+-----------------+-----------------+----------+---------+
| pyxlsb | 1.0.9 | | X |
+-----------------+-----------------+----------+---------+
| s3fs | 2022.05.0 | | X |
+-----------------+-----------------+----------+---------+
| scipy | 1.8.1 | | X |
+-----------------+-----------------+----------+---------+
| sqlalchemy | 1.4.36 | | X |
+-----------------+-----------------+----------+---------+
| tabulate | 0.8.10 | | X |
+-----------------+-----------------+----------+---------+
| xarray | 2022.03.0 | | X |
+-----------------+-----------------+----------+---------+
| xlsxwriter | 3.0.3 | | X |
+-----------------+-----------------+----------+---------+
| zstandard | 0.17.0 | | X |
+-----------------+-----------------+----------+---------+
+----------------------+-----------------+----------+---------+
| Package | Minimum Version | Required | Changed |
+======================+=================+==========+=========+
| numpy | 1.22.4 | X | X |
+----------------------+-----------------+----------+---------+
| mypy (dev) | 1.4.1 | | X |
+----------------------+-----------------+----------+---------+
| beautifulsoup4 | 4.11.1 | | X |
+----------------------+-----------------+----------+---------+
| bottleneck | 1.3.4 | | X |
+----------------------+-----------------+----------+---------+
| dataframe-api-compat | 0.1.7 | | X |
+----------------------+-----------------+----------+---------+
| fastparquet | 0.8.1 | | X |
+----------------------+-----------------+----------+---------+
| fsspec | 2022.05.0 | | X |
+----------------------+-----------------+----------+---------+
| hypothesis | 6.46.1 | | X |
+----------------------+-----------------+----------+---------+
| gcsfs | 2022.05.0 | | X |
+----------------------+-----------------+----------+---------+
| jinja2 | 3.1.2 | | X |
+----------------------+-----------------+----------+---------+
| lxml | 4.8.0 | | X |
+----------------------+-----------------+----------+---------+
| numba | 0.55.2 | | X |
+----------------------+-----------------+----------+---------+
| numexpr | 2.8.0 | | X |
+----------------------+-----------------+----------+---------+
| openpyxl | 3.0.10 | | X |
+----------------------+-----------------+----------+---------+
| pandas-gbq | 0.17.5 | | X |
+----------------------+-----------------+----------+---------+
| psycopg2 | 2.9.3 | | X |
+----------------------+-----------------+----------+---------+
| pyreadstat | 1.1.5 | | X |
+----------------------+-----------------+----------+---------+
| pyqt5 | 5.15.6 | | X |
+----------------------+-----------------+----------+---------+
| pytables | 3.7.0 | | X |
+----------------------+-----------------+----------+---------+
| pytest | 7.3.2 | | X |
+----------------------+-----------------+----------+---------+
| python-snappy | 0.6.1 | | X |
+----------------------+-----------------+----------+---------+
| pyxlsb | 1.0.9 | | X |
+----------------------+-----------------+----------+---------+
| s3fs | 2022.05.0 | | X |
+----------------------+-----------------+----------+---------+
| scipy | 1.8.1 | | X |
+----------------------+-----------------+----------+---------+
| sqlalchemy | 1.4.36 | | X |
+----------------------+-----------------+----------+---------+
| tabulate | 0.8.10 | | X |
+----------------------+-----------------+----------+---------+
| xarray | 2022.03.0 | | X |
+----------------------+-----------------+----------+---------+
| xlsxwriter | 3.0.3 | | X |
+----------------------+-----------------+----------+---------+
| zstandard | 0.17.0 | | X |
+----------------------+-----------------+----------+---------+

For `optional libraries <https://pandas.pydata.org/docs/getting_started/install.html>`_ the general recommendation is to use the latest version.
The following table lists the lowest version per library that is currently being tested throughout the development of pandas.
1 change: 1 addition & 0 deletions environment.yml
@@ -115,6 +115,7 @@ dependencies:
- pygments # Code highlighting

- pip:
- dataframe-api-compat>=0.1.7
- sphinx-toggleprompt # conda-forge version has stricter pins on jinja2
- typing_extensions; python_version<"3.11"
- tzdata>=2022.1
1 change: 1 addition & 0 deletions pandas/compat/_optional.py
@@ -19,6 +19,7 @@
"blosc": "1.21.0",
"bottleneck": "1.3.4",
"brotli": "0.7.0",
"dataframe-api-compat": "0.1.7",
"fastparquet": "0.8.1",
"fsspec": "2022.05.0",
"html5lib": "1.1",
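
The table above records a minimum version for each optional dependency, and the new entry points load their backend lazily through `import_optional_dependency`. The general pattern can be sketched as below. This is a simplified illustration, not pandas' actual implementation: `MIN_VERSIONS`, `_as_tuple`, `import_optional`, and the module name `fake_compat` are all hypothetical, and the version parser only handles plain dotted integers.

```python
import importlib
from types import ModuleType

# Hypothetical minimum-version table, keyed by import name for simplicity.
MIN_VERSIONS = {"fake_compat": "0.1.7"}


def _as_tuple(version: str) -> tuple[int, ...]:
    # Naive parser: handles plain dotted integers such as "0.1.7" only.
    return tuple(int(part) for part in version.split("."))


def import_optional(name: str) -> ModuleType:
    """Import ``name`` on demand; raise ImportError if missing or too old."""
    try:
        module = importlib.import_module(name)
    except ImportError as err:
        raise ImportError(f"Missing optional dependency '{name}'.") from err

    minimum = MIN_VERSIONS.get(name)
    installed = getattr(module, "__version__", None)
    # Only compare when both a minimum and an installed version are known.
    if minimum and installed and _as_tuple(installed) < _as_tuple(minimum):
        raise ImportError(
            f"'{name}' {installed} is installed, but {minimum} or newer is required."
        )
    return module
```

Deferring the import to call time means users who never touch the entry point do not need the extra package installed.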
15 changes: 15 additions & 0 deletions pandas/core/frame.py
@@ -932,6 +932,21 @@ def __dataframe__(

        return PandasDataFrameXchg(self, nan_as_null, allow_copy)

    def __dataframe_consortium_standard__(
        self, *, api_version: str | None = None
    ) -> Any:
        """
        Provide entry point to the Consortium DataFrame Standard API.

        This is developed and maintained outside of pandas.
        Please report any issues to https://github.com/data-apis/dataframe-api-compat.
        """
        dataframe_api_compat = import_optional_dependency("dataframe_api_compat")
        convert_to_standard_compliant_dataframe = (
            dataframe_api_compat.pandas_standard.convert_to_standard_compliant_dataframe
        )
        return convert_to_standard_compliant_dataframe(self, api_version=api_version)

    # ----------------------------------------------------------------------

    @property
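
A library that wants to stay dataframe-agnostic would typically look for this entry point rather than importing pandas directly. The sketch below illustrates that duck-typed dispatch. It assumes nothing beyond the method name added in this hunk and the `get_column_names` call exercised in this PR's test. `StubFrame`, `StubStandardFrame`, and `column_names` are hypothetical stand-ins, so the example runs without pandas or dataframe-api-compat installed.

```python
from typing import Any


class StubStandardFrame:
    """Hypothetical stand-in for a Standard-compliant dataframe."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def get_column_names(self) -> list:
        return list(self._data)


class StubFrame:
    """Hypothetical stand-in for pandas.DataFrame; only the entry point is modeled."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def __dataframe_consortium_standard__(self, *, api_version=None) -> Any:
        # pandas delegates to dataframe-api-compat here; the stub wraps directly.
        return StubStandardFrame(self._data)


def column_names(df: Any) -> list:
    """Return the column names of any object exposing the entry point."""
    entry_point = getattr(df, "__dataframe_consortium_standard__", None)
    if entry_point is None:
        raise TypeError(
            f"{type(df).__name__} does not expose the Standard entry point"
        )
    return entry_point(api_version=None).get_column_names()
```

With pandas and the `consortium-standard` extra installed, passing a real `DataFrame` to `column_names` should follow the same path, since the function relies only on the dunder method.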
17 changes: 17 additions & 0 deletions pandas/core/series.py
@@ -39,6 +39,7 @@
from pandas._libs.lib import is_range_indexer
from pandas.compat import PYPY
from pandas.compat._constants import REF_COUNT
from pandas.compat._optional import import_optional_dependency
from pandas.compat.numpy import function as nv
from pandas.errors import (
    ChainedAssignmentError,
@@ -955,6 +956,22 @@ def __array__(self, dtype: npt.DTypeLike | None = None) -> np.ndarray:
            arr.flags.writeable = False
        return arr

    # ----------------------------------------------------------------------

    def __column_consortium_standard__(self, *, api_version: str | None = None) -> Any:
        """
        Provide entry point to the Consortium DataFrame Standard API.

        This is developed and maintained outside of pandas.
        Please report any issues to https://github.com/data-apis/dataframe-api-compat.
        """
        dataframe_api_compat = import_optional_dependency("dataframe_api_compat")
        return (
            dataframe_api_compat.pandas_standard.convert_to_standard_compliant_column(
                self, api_version=api_version
            )
        )

    # ----------------------------------------------------------------------
    # Unary Methods

21 changes: 21 additions & 0 deletions pandas/tests/test_downstream.py
@@ -334,6 +334,27 @@ def test_from_obscure_array(dtype, array_likes):
    tm.assert_index_equal(result, expected)


def test_dataframe_consortium() -> None:
    """
    Test some basic methods of the dataframe consortium standard.

    Full testing is done at https://github.com/data-apis/dataframe-api-compat,
    this is just to check that the entry point works as expected.
    """
    pytest.importorskip("dataframe_api_compat")
    df_pd = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    df = df_pd.__dataframe_consortium_standard__()
    result_1 = df.get_column_names()
    expected_1 = ["a", "b"]
    assert result_1 == expected_1

    ser = Series([1, 2, 3])
    col = ser.__column_consortium_standard__()
    result_2 = col.get_value(1)
    expected_2 = 2
    assert result_2 == expected_2


def test_xarray_coerce_unit():
    # GH44053
    xr = pytest.importorskip("xarray")
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -82,11 +82,13 @@ plot = ['matplotlib>=3.6.1']
output_formatting = ['jinja2>=3.1.2', 'tabulate>=0.8.10']
clipboard = ['PyQt5>=5.15.6', 'qtpy>=2.2.0']
compression = ['brotlipy>=0.7.0', 'python-snappy>=0.6.1', 'zstandard>=0.17.0']
consortium-standard = ['dataframe-api-compat>=0.1.7']
all = ['beautifulsoup4>=4.11.1',
# blosc only available on conda (https://github.com/Blosc/python-blosc/issues/297)
#'blosc>=1.21.0',
'bottleneck>=1.3.4',
'brotlipy>=0.7.0',
'dataframe-api-compat>=0.1.7',
'fastparquet>=0.8.1',
'fsspec>=2022.05.0',
'gcsfs>=2022.05.0',
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -84,6 +84,7 @@ feedparser
pyyaml
requests
pygments
dataframe-api-compat>=0.1.7
sphinx-toggleprompt
typing_extensions; python_version<"3.11"
tzdata>=2022.1