
Commit 77fafd0

Merge pull request #161 from pandas-dev/master
Sync Fork from Upstream Repo
2 parents: 93d69e9 + 1367cac


68 files changed (+971, −538 lines)

.pre-commit-config.yaml
Lines changed: 1 addition & 1 deletion

@@ -36,7 +36,7 @@ repos:
     rev: 3.9.0
     hooks:
     - id: flake8
-      additional_dependencies: [flake8-comprehensions>=3.1.0]
+      additional_dependencies: [flake8-comprehensions>=3.1.0, flake8-bugbear>=21.3.2]
     - id: flake8
       name: flake8 (cython)
       types: [cython]
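For context, flake8-bugbear flags likely bugs that plain flake8 misses. A classic one is B006, the mutable default argument; a minimal illustration (hypothetical functions, not from this commit):

```python
# B006: a mutable default is created once at definition time and then
# shared across every call -- flake8-bugbear would flag this signature.
def append_bad(item, bucket=[]):
    bucket.append(item)
    return bucket

# The conventional fix: default to None and allocate a fresh list per call.
def append_good(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling `append_bad` twice without a `bucket` argument silently accumulates state in the shared default list, which is exactly the class of latent bug the new hook is meant to catch.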

doc/source/ecosystem.rst
Lines changed: 12 additions & 11 deletions

@@ -475,7 +475,7 @@ arrays can be stored inside pandas' Series and DataFrame.

 `Pandas-Genomics`_
 ~~~~~~~~~~~~~~~~~~

-Pandas-Genomics provides extension types and extension arrays for working with genomics data
+Pandas-Genomics provides extension types, extension arrays, and extension accessors for working with genomics data

 `Pint-Pandas`_
 ~~~~~~~~~~~~~~

@@ -502,16 +502,17 @@ A directory of projects providing
 :ref:`extension accessors <extending.register-accessors>`. This is for users to
 discover new accessors and for library authors to coordinate on the namespace.

-=============== ============ ==================================== ===============================================================
-Library         Accessor     Classes                              Description
-=============== ============ ==================================== ===============================================================
-`cyberpandas`_  ``ip``       ``Series``                           Provides common operations for working with IP addresses.
-`pdvega`_       ``vgplot``   ``Series``, ``DataFrame``            Provides plotting functions from the Altair_ library.
-`pandas_path`_  ``path``     ``Index``, ``Series``                Provides `pathlib.Path`_ functions for Series.
-`pint-pandas`_  ``pint``     ``Series``, ``DataFrame``            Provides units support for numeric Series and DataFrames.
-`composeml`_    ``slice``    ``DataFrame``                        Provides a generator for enhanced data slicing.
-`datatest`_     ``validate`` ``Series``, ``DataFrame``, ``Index`` Provides validation, differences, and acceptance managers.
-=============== ============ ==================================== ===============================================================
+================== ============ ==================================== ===============================================================================
+Library            Accessor     Classes                              Description
+================== ============ ==================================== ===============================================================================
+`cyberpandas`_     ``ip``       ``Series``                           Provides common operations for working with IP addresses.
+`pdvega`_          ``vgplot``   ``Series``, ``DataFrame``            Provides plotting functions from the Altair_ library.
+`pandas-genomics`_ ``genomics`` ``Series``, ``DataFrame``            Provides common operations for quality control and analysis of genomics data
+`pandas_path`_     ``path``     ``Index``, ``Series``                Provides `pathlib.Path`_ functions for Series.
+`pint-pandas`_     ``pint``     ``Series``, ``DataFrame``            Provides units support for numeric Series and DataFrames.
+`composeml`_       ``slice``    ``DataFrame``                        Provides a generator for enhanced data slicing.
+`datatest`_        ``validate`` ``Series``, ``DataFrame``, ``Index`` Provides validation, differences, and acceptance managers.
+================== ============ ==================================== ===============================================================================

 .. _cyberpandas: https://cyberpandas.readthedocs.io/en/latest
 .. _pdvega: https://altair-viz.github.io/pdvega/
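The accessors in this table are registered through pandas' extension-accessor API. The pattern can be sketched in plain Python as a decorator that records a namespace class in a registry (a simplified stand-in for `pandas.api.extensions.register_series_accessor`, with hypothetical names):

```python
def register_accessor(name, registry):
    """Decorator: record an accessor class under ``name`` (sketch only)."""
    def decorator(cls):
        registry[name] = cls
        return cls
    return decorator

# Registry standing in for the namespace a Series would expose.
SERIES_ACCESSORS = {}

@register_accessor("path", SERIES_ACCESSORS)
class PathAccessor:
    """Toy accessor: pathlib-flavoured helpers over string data,
    loosely in the spirit of the pandas_path entry above."""
    def __init__(self, data):
        self._data = data

    def suffixes(self):
        # Last dot-separated component of each element
        return [s.rsplit(".", 1)[-1] for s in self._data]
```

With the real API, `@register_series_accessor("path")` would make the namespace available as `series.path`; here the registry lookup `SERIES_ACCESSORS["path"](data)` plays that role.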

doc/source/whatsnew/v1.2.4.rst
Lines changed: 1 addition & 0 deletions

@@ -17,6 +17,7 @@ Fixed regressions

 - Fixed regression in :meth:`DataFrame.sum` when ``min_count`` greater than the :class:`DataFrame` shape was passed resulted in a ``ValueError`` (:issue:`39738`)
 - Fixed regression in :meth:`DataFrame.to_json` raising ``AttributeError`` when run on PyPy (:issue:`39837`)
+- Fixed regression in (in)equality comparison of ``pd.NaT`` with a non-datetimelike numpy array returning a scalar instead of an array (:issue:`40722`)
 - Fixed regression in :meth:`DataFrame.where` not returning a copy in the case of an all True condition (:issue:`39595`)
 - Fixed regression in :meth:`DataFrame.replace` raising ``IndexError`` when ``regex`` was a multi-key dictionary (:issue:`39338`)
 -
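The ``pd.NaT`` entry above concerns comparison broadcasting: (in)equality against an array should yield an elementwise boolean result, not a single scalar. A pure-Python sketch of the intended behaviour (hypothetical class, not pandas' actual ``NaTType``):

```python
class NaTSketch:
    """Missing-timestamp singleton: never equal to any scalar, and
    comparisons against a sequence broadcast elementwise instead of
    collapsing to one scalar (the regression fixed above)."""
    def __eq__(self, other):
        if isinstance(other, (list, tuple)):
            return [self == item for item in other]  # elementwise
        return False  # NaT never equals a non-NaT scalar
    def __ne__(self, other):
        eq = self.__eq__(other)
        return [not v for v in eq] if isinstance(eq, list) else True

NaT = NaTSketch()
```

In pandas the broadcast target is a numpy array rather than a list, but the shape of the fix is the same: the comparison must return one boolean per element.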

doc/source/whatsnew/v1.3.0.rst
Lines changed: 1 addition & 0 deletions

@@ -561,6 +561,7 @@ Numeric

 - Bug in :func:`select_dtypes` different behavior between Windows and Linux with ``include="int"`` (:issue:`36569`)
 - Bug in :meth:`DataFrame.apply` and :meth:`DataFrame.agg` when passed argument ``func="size"`` would operate on the entire ``DataFrame`` instead of rows or columns (:issue:`39934`)
 - Bug in :meth:`DataFrame.transform` would raise ``SpecificationError`` when passed a dictionary and columns were missing; will now raise a ``KeyError`` instead (:issue:`40004`)
+- Bug in :meth:`DataFrameGroupBy.rank` giving incorrect results with ``pct=True`` and equal values between consecutive groups (:issue:`40518`)
 -

 Conversion
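The :meth:`DataFrameGroupBy.rank` fix above is about the percentile denominator being taken strictly per group, even when a value run straddles a group boundary. A pure-Python sketch of average-method percentile ranking within groups (illustrative only, not the Cython implementation):

```python
from collections import defaultdict

def group_pct_rank(values, labels):
    """Average rank of each value within its own group, divided by that
    group's size -- never by a neighbouring group's size."""
    groups = defaultdict(list)
    for idx, (val, lab) in enumerate(zip(values, labels)):
        groups[lab].append((val, idx))
    out = [0.0] * len(values)
    for members in groups.values():
        ordered = sorted(v for v, _ in members)
        n = len(ordered)
        for val, idx in members:
            first = ordered.index(val) + 1     # 1-based first position
            ties = ordered.count(val)
            avg_rank = first + (ties - 1) / 2  # average over the tie run
            out[idx] = avg_rank / n            # denominator is per-group
    return out
```

For example, `group_pct_rank([1, 1, 2, 2], [0, 0, 1, 1])` gives 0.75 for every element: each pair ties at average rank 1.5 out of a group of 2, and the equal runs in adjacent groups do not bleed into each other.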

environment.yml
Lines changed: 1 addition & 0 deletions

@@ -21,6 +21,7 @@ dependencies:
   - black=20.8b1
   - cpplint
   - flake8
+  - flake8-bugbear>=21.3.2  # used by flake8, find likely bugs
   - flake8-comprehensions>=3.1.0  # used by flake8, linting of unnecessary comprehensions
   - isort>=5.2.1  # check that imports are in the right order
   - mypy=0.812

pandas/_libs/algos.pyx
Lines changed: 47 additions & 27 deletions

@@ -947,12 +947,14 @@ def rank_1d(
         TiebreakEnumType tiebreak
         Py_ssize_t i, j, N, grp_start=0, dups=0, sum_ranks=0
         Py_ssize_t grp_vals_seen=1, grp_na_count=0
-        ndarray[int64_t, ndim=1] lexsort_indexer
-        ndarray[float64_t, ndim=1] grp_sizes, out
+        ndarray[int64_t, ndim=1] grp_sizes
+        ndarray[intp_t, ndim=1] lexsort_indexer
+        ndarray[float64_t, ndim=1] out
         ndarray[rank_t, ndim=1] masked_vals
         ndarray[uint8_t, ndim=1] mask
         bint keep_na, at_end, next_val_diff, check_labels, group_changed
         rank_t nan_fill_val
+        int64_t grp_size

     tiebreak = tiebreakers[ties_method]
     if tiebreak == TIEBREAK_FIRST:

@@ -965,7 +967,7 @@ def rank_1d(
     # TODO Cython 3.0: cast won't be necessary (#2992)
     assert <Py_ssize_t>len(labels) == N
     out = np.empty(N)
-    grp_sizes = np.ones(N)
+    grp_sizes = np.ones(N, dtype=np.int64)

     # If all 0 labels, can short-circuit later label
     # comparisons

@@ -1022,7 +1024,7 @@ def rank_1d(
     # each label corresponds to a different group value,
     # the mask helps you differentiate missing values before
     # performing sort on the actual values
-    lexsort_indexer = np.lexsort(order).astype(np.int64, copy=False)
+    lexsort_indexer = np.lexsort(order).astype(np.intp, copy=False)

     if not ascending:
         lexsort_indexer = lexsort_indexer[::-1]

@@ -1093,13 +1095,15 @@ def rank_1d(
                 for j in range(i - dups + 1, i + 1):
                     out[lexsort_indexer[j]] = grp_vals_seen

-                # Look forward to the next value (using the sorting in lexsort_indexer)
-                # if the value does not equal the current value then we need to
-                # reset the dups and sum_ranks, knowing that a new value is
-                # coming up. The conditional also needs to handle nan equality
-                # and the end of iteration
-                if next_val_diff or (mask[lexsort_indexer[i]]
-                                     ^ mask[lexsort_indexer[i+1]]):
+                # Look forward to the next value (using the sorting in
+                # lexsort_indexer). If the value does not equal the current
+                # value then we need to reset the dups and sum_ranks, knowing
+                # that a new value is coming up. The conditional also needs
+                # to handle nan equality and the end of iteration. If group
+                # changes we do not record seeing a new value in the group
+                if not group_changed and (next_val_diff or
+                                          (mask[lexsort_indexer[i]]
+                                           ^ mask[lexsort_indexer[i+1]])):
                     dups = sum_ranks = 0
                     grp_vals_seen += 1

@@ -1110,14 +1114,21 @@ def rank_1d(
                 # group encountered (used by pct calculations later). Also be
                 # sure to reset any of the items helping to calculate dups
                 if group_changed:
+
+                    # If not dense tiebreak, group size used to compute
+                    # percentile will be # of non-null elements in group
                     if tiebreak != TIEBREAK_DENSE:
-                        for j in range(grp_start, i + 1):
-                            grp_sizes[lexsort_indexer[j]] = \
-                                (i - grp_start + 1 - grp_na_count)
+                        grp_size = i - grp_start + 1 - grp_na_count
+
+                    # Otherwise, it will be the number of distinct values
+                    # in the group, subtracting 1 if NaNs are present
+                    # since that is a distinct value we shouldn't count
                     else:
-                        for j in range(grp_start, i + 1):
-                            grp_sizes[lexsort_indexer[j]] = \
-                                (grp_vals_seen - 1 - (grp_na_count > 0))
+                        grp_size = grp_vals_seen - (grp_na_count > 0)
+
+                    for j in range(grp_start, i + 1):
+                        grp_sizes[lexsort_indexer[j]] = grp_size

                     dups = sum_ranks = 0
                     grp_na_count = 0
                     grp_start = i + 1

@@ -1184,12 +1195,14 @@ def rank_1d(
                 out[lexsort_indexer[j]] = grp_vals_seen

                 # Look forward to the next value (using the sorting in
-                # lexsort_indexer) if the value does not equal the current
+                # lexsort_indexer). If the value does not equal the current
                 # value then we need to reset the dups and sum_ranks, knowing
                 # that a new value is coming up. The conditional also needs
-                # to handle nan equality and the end of iteration
-                if next_val_diff or (mask[lexsort_indexer[i]]
-                                     ^ mask[lexsort_indexer[i+1]]):
+                # to handle nan equality and the end of iteration. If group
+                # changes we do not record seeing a new value in the group
+                if not group_changed and (next_val_diff or
+                                          (mask[lexsort_indexer[i]]
+                                           ^ mask[lexsort_indexer[i+1]])):
                     dups = sum_ranks = 0
                     grp_vals_seen += 1

@@ -1200,14 +1213,21 @@ def rank_1d(
                 # group encountered (used by pct calculations later). Also be
                 # sure to reset any of the items helping to calculate dups
                 if group_changed:
+
+                    # If not dense tiebreak, group size used to compute
+                    # percentile will be # of non-null elements in group
                     if tiebreak != TIEBREAK_DENSE:
-                        for j in range(grp_start, i + 1):
-                            grp_sizes[lexsort_indexer[j]] = \
-                                (i - grp_start + 1 - grp_na_count)
+                        grp_size = i - grp_start + 1 - grp_na_count
+
+                    # Otherwise, it will be the number of distinct values
+                    # in the group, subtracting 1 if NaNs are present
+                    # since that is a distinct value we shouldn't count
                     else:
-                        for j in range(grp_start, i + 1):
-                            grp_sizes[lexsort_indexer[j]] = \
-                                (grp_vals_seen - 1 - (grp_na_count > 0))
+                        grp_size = grp_vals_seen - (grp_na_count > 0)
+
+                    for j in range(grp_start, i + 1):
+                        grp_sizes[lexsort_indexer[j]] = grp_size

                     dups = sum_ranks = 0
                     grp_na_count = 0
                     grp_start = i + 1
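The two branches above reduce to a single rule for the percentile denominator. A pure-Python restatement of that rule (sketch only, with descriptive names standing in for the Cython variables):

```python
def pct_denominator(n_in_group, n_na, n_distinct, dense_tiebreak):
    """Mirror of grp_size above: non-dense tiebreaks divide by the number
    of non-null elements in the group; dense divides by the number of
    distinct values, minus one when NaN is present, since NaN counts as
    a distinct value that should not be included."""
    if dense_tiebreak:
        return n_distinct - (1 if n_na > 0 else 0)
    return n_in_group - n_na
```

The refactor in the diff also hoists this computation out of the inner loop: the size is computed once per group and then written to every member, instead of being recomputed inside the assignment loop.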

pandas/_libs/groupby.pyi
Lines changed: 168 additions & 0 deletions (new file)

from typing import Literal

import numpy as np

def group_median_float64(
    out: np.ndarray,  # ndarray[float64_t, ndim=2]
    counts: np.ndarray,  # ndarray[int64_t]
    values: np.ndarray,  # ndarray[float64_t, ndim=2]
    labels: np.ndarray,  # ndarray[int64_t]
    min_count: int = ...,  # Py_ssize_t
) -> None: ...

def group_cumprod_float64(
    out: np.ndarray,  # float64_t[:, ::1]
    values: np.ndarray,  # const float64_t[:, :]
    labels: np.ndarray,  # const int64_t[:]
    ngroups: int,
    is_datetimelike: bool,
    skipna: bool = ...,
) -> None: ...

def group_cumsum(
    out: np.ndarray,  # numeric[:, ::1]
    values: np.ndarray,  # ndarray[numeric, ndim=2]
    labels: np.ndarray,  # const int64_t[:]
    ngroups: int,
    is_datetimelike: bool,
    skipna: bool = ...,
) -> None: ...

def group_shift_indexer(
    out: np.ndarray,  # int64_t[::1]
    labels: np.ndarray,  # const int64_t[:]
    ngroups: int,
    periods: int,
) -> None: ...

def group_fillna_indexer(
    out: np.ndarray,  # ndarray[int64_t]
    labels: np.ndarray,  # ndarray[int64_t]
    mask: np.ndarray,  # ndarray[uint8_t]
    direction: Literal["ffill", "bfill"],
    limit: int,  # int64_t
    dropna: bool,
) -> None: ...

def group_any_all(
    out: np.ndarray,  # uint8_t[::1]
    values: np.ndarray,  # const uint8_t[::1]
    labels: np.ndarray,  # const int64_t[:]
    mask: np.ndarray,  # const uint8_t[::1]
    val_test: Literal["any", "all"],
    skipna: bool,
) -> None: ...

def group_add(
    out: np.ndarray,  # complexfloating_t[:, ::1]
    counts: np.ndarray,  # int64_t[::1]
    values: np.ndarray,  # ndarray[complexfloating_t, ndim=2]
    labels: np.ndarray,  # const intp_t[:]
    min_count: int = ...,
) -> None: ...

def group_prod(
    out: np.ndarray,  # floating[:, ::1]
    counts: np.ndarray,  # int64_t[::1]
    values: np.ndarray,  # ndarray[floating, ndim=2]
    labels: np.ndarray,  # const intp_t[:]
    min_count: int = ...,
) -> None: ...

def group_var(
    out: np.ndarray,  # floating[:, ::1]
    counts: np.ndarray,  # int64_t[::1]
    values: np.ndarray,  # ndarray[floating, ndim=2]
    labels: np.ndarray,  # const intp_t[:]
    min_count: int = ...,  # Py_ssize_t
    ddof: int = ...,  # int64_t
) -> None: ...

def group_mean(
    out: np.ndarray,  # floating[:, ::1]
    counts: np.ndarray,  # int64_t[::1]
    values: np.ndarray,  # ndarray[floating, ndim=2]
    labels: np.ndarray,  # const intp_t[:]
    min_count: int = ...,
) -> None: ...

def group_ohlc(
    out: np.ndarray,  # floating[:, ::1]
    counts: np.ndarray,  # int64_t[::1]
    values: np.ndarray,  # ndarray[floating, ndim=2]
    labels: np.ndarray,  # const intp_t[:]
    min_count: int = ...,
) -> None: ...

def group_quantile(
    out: np.ndarray,  # ndarray[float64_t]
    values: np.ndarray,  # ndarray[numeric, ndim=1]
    labels: np.ndarray,  # ndarray[int64_t]
    mask: np.ndarray,  # ndarray[uint8_t]
    q: float,  # float64_t
    interpolation: Literal["linear", "lower", "higher", "nearest", "midpoint"],
) -> None: ...

def group_last(
    out: np.ndarray,  # rank_t[:, ::1]
    counts: np.ndarray,  # int64_t[::1]
    values: np.ndarray,  # ndarray[rank_t, ndim=2]
    labels: np.ndarray,  # const int64_t[:]
    min_count: int = ...,  # Py_ssize_t
) -> None: ...

def group_nth(
    out: np.ndarray,  # rank_t[:, ::1]
    counts: np.ndarray,  # int64_t[::1]
    values: np.ndarray,  # ndarray[rank_t, ndim=2]
    labels: np.ndarray,  # const int64_t[:]
    min_count: int = ...,  # int64_t
    rank: int = ...,  # int64_t
) -> None: ...

def group_rank(
    out: np.ndarray,  # float64_t[:, ::1]
    values: np.ndarray,  # ndarray[rank_t, ndim=2]
    labels: np.ndarray,  # const int64_t[:]
    ngroups: int,
    is_datetimelike: bool,
    ties_method: Literal["average", "min", "max", "first", "dense"] = ...,
    ascending: bool = ...,
    pct: bool = ...,
    na_option: Literal["keep", "top", "bottom"] = ...,
) -> None: ...

def group_max(
    out: np.ndarray,  # groupby_t[:, ::1]
    counts: np.ndarray,  # int64_t[::1]
    values: np.ndarray,  # ndarray[groupby_t, ndim=2]
    labels: np.ndarray,  # const int64_t[:]
    min_count: int = ...,
) -> None: ...

def group_min(
    out: np.ndarray,  # groupby_t[:, ::1]
    counts: np.ndarray,  # int64_t[::1]
    values: np.ndarray,  # ndarray[groupby_t, ndim=2]
    labels: np.ndarray,  # const int64_t[:]
    min_count: int = ...,
) -> None: ...

def group_cummin(
    out: np.ndarray,  # groupby_t[:, ::1]
    values: np.ndarray,  # ndarray[groupby_t, ndim=2]
    labels: np.ndarray,  # const int64_t[:]
    ngroups: int,
    is_datetimelike: bool,
) -> None: ...

def group_cummax(
    out: np.ndarray,  # groupby_t[:, ::1]
    values: np.ndarray,  # ndarray[groupby_t, ndim=2]
    labels: np.ndarray,  # const int64_t[:]
    ngroups: int,
    is_datetimelike: bool,
) -> None: ...
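These stubs lean on `typing.Literal` to constrain string options at type-check time (e.g. the `direction` and `ties_method` parameters). A small runnable illustration of the same pattern (hypothetical function, not part of the stub file):

```python
from typing import Literal

Direction = Literal["ffill", "bfill"]

def scan_step(direction: Direction) -> int:
    """Map a fill direction to a scan step: forward fill walks the data
    with step +1, backward fill with step -1. A type checker rejects any
    argument other than the two listed literals."""
    if direction == "ffill":
        return 1
    return -1
```

At runtime `scan_step("pad")` would silently return -1, but mypy flags it before the code ever runs, which is the whole point of spelling the options out as `Literal` in the `.pyi` file.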
