jbrockmendel
diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst
Lines changed: 27 additions & 0 deletions
diff --git a/doc/source/whatsnew/v1.1.0.rst b/doc/source/whatsnew/v1.1.0.rst
Lines changed: 38 additions & 0 deletions
diff --git a/pandas/_libs/lib.pyx b/pandas/_libs/lib.pyx
Lines changed: 1 addition & 1 deletion
diff --git a/pandas/_libs/missing.pxd b/pandas/_libs/missing.pxd
Lines changed: 1 addition & 0 deletions
diff --git a/pandas/_libs/missing.pyx b/pandas/_libs/missing.pyx
Lines changed: 5 additions & 0 deletions
diff --git a/pandas/_libs/tslib.pyx b/pandas/_libs/tslib.pyx
Lines changed: 9 additions & 7 deletions
diff --git a/pandas/_libs/tslibs/conversion.pyx b/pandas/_libs/tslibs/conversion.pyx
Lines changed: 6 additions & 5 deletions
diff --git a/pandas/_libs/tslibs/frequencies.pyx b/pandas/_libs/tslibs/frequencies.pyx
Lines changed: 4 additions & 4 deletions
diff --git a/pandas/_libs/tslibs/nattype.pyx b/pandas/_libs/tslibs/nattype.pyx
Lines changed: 5 additions & 7 deletions
diff --git a/pandas/_libs/tslibs/offsets.pyx b/pandas/_libs/tslibs/offsets.pyx
Lines changed: 8 additions & 15 deletions
@@ -199,6 +199,33 @@ For example, the groups created by ``groupby()`` below are in the order they app
    df3.groupby(['X']).get_group('B')
 
 
+.. _groupby.dropna:
+
+.. versionadded:: 1.1.0
+
+GroupBy dropna
+^^^^^^^^^^^^^^
+
+By default, ``NA`` values are excluded from group keys during the ``groupby`` operation. To include
+``NA`` values in group keys, pass ``dropna=False``.
+
+.. ipython:: python
+
+    df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
+    df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])
+
+    df_dropna
+
+.. ipython:: python
+
+    # Default `dropna` is set to True, which will exclude NaNs in keys
+    df_dropna.groupby(by=["b"], dropna=True).sum()
+
+    # In order to allow NaN in keys, set `dropna` to False
+    df_dropna.groupby(by=["b"], dropna=False).sum()
+
+The default setting of the ``dropna`` argument is ``True``, which means ``NA`` values are not included in group keys.
+
 
 .. _groupby.attributes:
 
 
@@ -36,6 +36,37 @@ For example:
    ser["2014"]
    ser.loc["May 2015"]
 
+
+.. _whatsnew_110.groupby_key:
+
+Allow NA in groupby key
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+With :ref:`groupby <groupby.dropna>`, we've added a ``dropna`` keyword to :meth:`DataFrame.groupby` and :meth:`Series.groupby` in order to
+allow ``NA`` values in group keys. Users can set ``dropna`` to ``False`` if they want to include
+``NA`` values in groupby keys. The default for ``dropna`` is ``True`` to keep backwards
+compatibility (:issue:`3729`).
+
+.. ipython:: python
+
+    df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
+    df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])
+
+    df_dropna
+
+.. ipython:: python
+
+    # Default `dropna` is set to True, which will exclude NaNs in keys
+    df_dropna.groupby(by=["b"], dropna=True).sum()
+
+    # In order to allow NaN in keys, set `dropna` to False
+    df_dropna.groupby(by=["b"], dropna=False).sum()
+
+The default setting of the ``dropna`` argument is ``True``, which means ``NA`` values are not included in group keys.
+
+.. versionadded:: 1.1.0
+
+
 .. _whatsnew_110.key_sorting:
 
 Sorting with keys
@@ -563,6 +594,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.intersection` losing ``freq`` and timezone in some cases (:issue:`33604`)
 - Bug in :class:`DatetimeIndex` addition and subtraction with some types of :class:`DateOffset` objects incorrectly retaining an invalid ``freq`` attribute (:issue:`33779`)
 - Bug in :class:`DatetimeIndex` where setting the ``freq`` attribute on an index could silently change the ``freq`` attribute on another index viewing the same data (:issue:`33552`)
+- Bug in :meth:`DataFrame.min`/:meth:`DataFrame.max` not returning consistent results with :meth:`Series.min`/:meth:`Series.max` when called on objects initialized with empty :func:`pd.to_datetime`
 - Bug in :meth:`DatetimeIndex.intersection` and :meth:`TimedeltaIndex.intersection` with results not having the correct ``name`` attribute (:issue:`33904`)
 - Bug in :meth:`DatetimeArray.__setitem__`, :meth:`TimedeltaArray.__setitem__`, :meth:`PeriodArray.__setitem__` incorrectly allowing values with ``int64`` dtype to be silently cast (:issue:`33717`)
 
@@ -574,6 +606,9 @@ Timedelta
 - Timedeltas now understand ``µs`` as identifier for microsecond (:issue:`32899`)
 - :class:`Timedelta` string representation now includes nanoseconds, when nanoseconds are non-zero (:issue:`9309`)
 - Bug in comparing a :class:`Timedelta`` object against a ``np.ndarray`` with ``timedelta64`` dtype incorrectly viewing all entries as unequal (:issue:`33441`)
+- Bug in :func:`timedelta_range` that produced an extra point in an edge case (:issue:`30353`, :issue:`33498`)
+- Bug in :meth:`DataFrame.resample` that produced an extra point in an edge case (:issue:`30353`, :issue:`13022`, :issue:`33498`)
+- Bug in :meth:`DataFrame.resample` that ignored the ``loffset`` argument when dealing with timedelta (:issue:`7687`, :issue:`33498`)
 
 Timezones
 ^^^^^^^^^
@@ -717,6 +752,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrameGroupby.transform` produces incorrect result with transformation functions (:issue:`30918`)
 - Bug in :meth:`GroupBy.count` causes segmentation fault when grouped-by column contains NaNs (:issue:`32841`)
 - Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` produces inconsistent type when aggregating Boolean series (:issue:`32894`)
+- Bug in :meth:`DataFrameGroupBy.sum` and :meth:`SeriesGroupBy.sum` where a large negative number would be returned when the number of non-null values was below ``min_count`` for nullable integer dtypes (:issue:`32861`)
 - Bug in :meth:`SeriesGroupBy.quantile` raising on nullable integers (:issue:`33136`)
 - Bug in :meth:`SeriesGroupBy.first`, :meth:`SeriesGroupBy.last`, :meth:`SeriesGroupBy.min`, and :meth:`SeriesGroupBy.max` returning floats when applied to nullable Booleans (:issue:`33071`)
 - Bug in :meth:`DataFrameGroupBy.agg` with dictionary input losing ``ExtensionArray`` dtypes (:issue:`32194`)
@@ -747,6 +783,7 @@ Reshaping
 - Bug in :meth:`DataFrame.unstack` when MultiIndexed columns and MultiIndexed rows were used (:issue:`32624`, :issue:`24729` and :issue:`28306`)
 - Bug in :func:`concat` was not allowing for concatenation of ``DataFrame`` and ``Series`` with duplicate keys (:issue:`33654`)
 - Bug in :func:`cut` raised an error when non-unique labels (:issue:`33141`)
+- Bug in :meth:`DataFrame.replace` casting columns to ``object`` dtype if items in ``to_replace`` are not in values (:issue:`32988`)
 
 
 Sparse
@@ -763,6 +800,7 @@ ExtensionArray
 - Fixed bug that caused :meth:`Series.__repr__()` to crash for extension types whose elements are multidimensional arrays (:issue:`33770`).
 - Fixed bug where :meth:`Series.update` would raise a ``ValueError`` for ``ExtensionArray`` dtypes with missing values (:issue:`33980`)
 - Fixed bug where :meth:`StringArray.memory_usage` was not implemented (:issue:`33963`)
+- Fixed bug where ``DataFrame(columns=.., dtype='string')`` would fail (:issue:`27953`, :issue:`33623`)
 
 
 Other
 
@@ -75,7 +75,7 @@ from pandas._libs.tslibs.nattype cimport (
 from pandas._libs.tslibs.conversion cimport convert_to_tsobject
 from pandas._libs.tslibs.timedeltas cimport convert_to_timedelta64
 from pandas._libs.tslibs.timezones cimport get_timezone, tz_compare
-from pandas._libs.tslibs.period cimport is_period_object
+from pandas._libs.tslibs.base cimport is_period_object
 
 from pandas._libs.missing cimport (
     checknull,
 
@@ -6,6 +6,7 @@ cpdef ndarray[uint8_t] isnaobj(ndarray arr)
 
 cdef bint is_null_datetime64(v)
 cdef bint is_null_timedelta64(v)
+cdef bint checknull_with_nat_and_na(object obj)
 
 cdef class C_NAType:
     pass
 
@@ -279,6 +279,11 @@ cdef inline bint is_null_timedelta64(v):
     return False
 
 
+cdef bint checknull_with_nat_and_na(object obj):
+    # See GH#32214
+    return checknull_with_nat(obj) or obj is C_NA
+
+
 # -----------------------------------------------------------------------------
 # Implementation of NA singleton
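
Reviewer note: the hunk above, together with the later change to ``checknull_with_nat`` in ``nattype.pyx``, moves the ``NA`` check out of the tslibs helper and into a wrapper in ``missing.pyx``. A minimal pure-Python sketch of that split follows. The ``_Sentinel`` class and the ``NaT``/``NA`` objects here are hypothetical stand-ins rather than the pandas singletons, and the NaN test is simplified to plain floats.

```python
import math


class _Sentinel:
    """Hypothetical stand-in for a singleton such as NaT or NA."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


# Illustration-only singletons; these are NOT the pandas objects.
NaT = _Sentinel("NaT")
NA = _Sentinel("NA")


def checknull_with_nat(val):
    """Narrowed check after the change: None, float NaN, or NaT only."""
    return (
        val is None
        or (isinstance(val, float) and math.isnan(val))
        or val is NaT
    )


def checknull_with_nat_and_na(obj):
    """Wrapper that additionally treats the NA singleton as null."""
    return checknull_with_nat(obj) or obj is NA
```

Under this split, callers that need ``NA`` handled (the ``tslib.pyx`` call sites in this diff) use the wrapper, while the tslibs-level helper no longer needs to import ``C_NA``.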
 
 
@@ -51,8 +51,7 @@ from pandas._libs.tslibs.conversion cimport (
     get_datetime64_nanos)
 
 from pandas._libs.tslibs.nattype import nat_strings
-from pandas._libs.tslibs.nattype cimport (
-    checknull_with_nat, NPY_NAT, c_NaT as NaT)
+from pandas._libs.tslibs.nattype cimport NPY_NAT, c_NaT as NaT
 
 from pandas._libs.tslibs.offsets cimport to_offset
 
@@ -64,6 +63,9 @@ from pandas._libs.tslibs.tzconversion cimport (
     tz_convert_utc_to_tzlocal,
 )
 
+# Note: this is the only non-tslibs intra-pandas dependency here
+from pandas._libs.missing cimport checknull_with_nat_and_na
+
 
 cdef inline object create_datetime_from_ts(
     int64_t value,
@@ -438,7 +440,7 @@ def array_with_unit_to_datetime(
         for i in range(n):
             val = values[i]
 
-            if checknull_with_nat(val):
+            if checknull_with_nat_and_na(val):
                 iresult[i] = NPY_NAT
 
             elif is_integer_object(val) or is_float_object(val):
@@ -505,7 +507,7 @@ def array_with_unit_to_datetime(
     for i in range(n):
         val = values[i]
 
-        if checknull_with_nat(val):
+        if checknull_with_nat_and_na(val):
             oresult[i] = <object>NaT
         elif is_integer_object(val) or is_float_object(val):
 
@@ -602,7 +604,7 @@ cpdef array_to_datetime(
             val = values[i]
 
             try:
-                if checknull_with_nat(val):
+                if checknull_with_nat_and_na(val):
                     iresult[i] = NPY_NAT
 
                 elif PyDateTime_Check(val):
@@ -812,7 +814,7 @@ cdef ignore_errors_out_of_bounds_fallback(ndarray[object] values):
         val = values[i]
 
         # set as nan except if its a NaT
-        if checknull_with_nat(val):
+        if checknull_with_nat_and_na(val):
             if isinstance(val, float):
                 oresult[i] = np.nan
             else:
@@ -874,7 +876,7 @@ cdef array_to_datetime_object(
     # 2) datetime strings, which we return as datetime.datetime
     for i in range(n):
         val = values[i]
-        if checknull_with_nat(val) or PyDateTime_Check(val):
+        if checknull_with_nat_and_na(val) or PyDateTime_Check(val):
             # GH 25978. No need to parse NaT-like or datetime-like vals
             oresult[i] = val
         elif isinstance(val, str):
 
@@ -13,7 +13,7 @@ from cpython.datetime cimport (datetime, time, tzinfo,
                                PyDateTime_IMPORT)
 PyDateTime_IMPORT
 
-from pandas._libs.tslibs.base cimport ABCTimestamp
+from pandas._libs.tslibs.base cimport ABCTimestamp, is_period_object
 
 from pandas._libs.tslibs.np_datetime cimport (
     check_dts_bounds, npy_datetimestruct, pandas_datetime_to_datetimestruct,
@@ -37,10 +37,11 @@ from pandas._libs.tslibs.nattype import nat_strings
 from pandas._libs.tslibs.nattype cimport (
     NPY_NAT, checknull_with_nat, c_NaT as NaT)
 
-from pandas._libs.tslibs.tzconversion import (
-    tz_localize_to_utc, tz_convert_single)
+from pandas._libs.tslibs.tzconversion import tz_localize_to_utc
 from pandas._libs.tslibs.tzconversion cimport (
-    _tz_convert_tzlocal_utc, _tz_convert_tzlocal_fromutc)
+    _tz_convert_tzlocal_utc, _tz_convert_tzlocal_fromutc,
+    tz_convert_single
+)
 
 # ----------------------------------------------------------------------
 # Constants
@@ -286,7 +287,7 @@ cdef convert_to_tsobject(object ts, object tz, object unit,
         # Keep the converter same as PyDateTime's
         ts = datetime.combine(ts, time())
         return convert_datetime_to_tsobject(ts, tz)
-    elif getattr(ts, '_typ', None) == 'period':
+    elif is_period_object(ts):
         raise ValueError("Cannot convert Period to Timestamp "
                          "unambiguously. Use to_timestamp")
     else:
 
@@ -3,7 +3,7 @@ import re
 cimport numpy as cnp
 cnp.import_array()
 
-from pandas._libs.tslibs.util cimport is_integer_object
+from pandas._libs.tslibs.util cimport is_integer_object, is_offset_object
 
 from pandas._libs.tslibs.ccalendar import MONTH_NUMBERS
 
@@ -153,7 +153,7 @@ cpdef get_freq_code(freqstr):
     >>> get_freq_code(('D', 3))
     (6000, 3)
     """
-    if getattr(freqstr, '_typ', None) == 'dateoffset':
+    if is_offset_object(freqstr):
         freqstr = (freqstr.rule_code, freqstr.n)
 
     if isinstance(freqstr, tuple):
@@ -451,8 +451,8 @@ cdef str _maybe_coerce_freq(code):
     code : string
     """
     assert code is not None
-    if getattr(code, '_typ', None) == 'dateoffset':
-        # i.e. isinstance(code, ABCDateOffset):
+    if is_offset_object(code):
+        # i.e. isinstance(code, DateOffset):
         code = code.rule_code
     return code.upper()
 
 
@@ -15,11 +15,10 @@ from cpython.datetime cimport (
     datetime,
     timedelta,
 )
+PyDateTime_IMPORT
 
 from cpython.version cimport PY_MINOR_VERSION
 
-PyDateTime_IMPORT
-
 import numpy as np
 cimport numpy as cnp
 from numpy cimport int64_t
@@ -30,8 +29,7 @@ from pandas._libs.tslibs.np_datetime cimport (
     get_timedelta64_value,
 )
 cimport pandas._libs.tslibs.util as util
-
-from pandas._libs.missing cimport C_NA
+from pandas._libs.tslibs.base cimport is_period_object
 
 
 # ----------------------------------------------------------------------
@@ -150,7 +148,7 @@ cdef class _NaT(datetime):
         elif util.is_offset_object(other):
             return c_NaT
 
-        elif util.is_integer_object(other) or util.is_period_object(other):
+        elif util.is_integer_object(other) or is_period_object(other):
             # For Period compat
             # TODO: the integer behavior is deprecated, remove it
             return c_NaT
@@ -186,7 +184,7 @@ cdef class _NaT(datetime):
         elif util.is_offset_object(other):
             return c_NaT
 
-        elif util.is_integer_object(other) or util.is_period_object(other):
+        elif util.is_integer_object(other) or is_period_object(other):
             # For Period compat
             # TODO: the integer behavior is deprecated, remove it
             return c_NaT
@@ -809,7 +807,7 @@ cdef inline bint checknull_with_nat(object val):
     """
     Utility to check if a value is a nat or not.
     """
-    return val is None or util.is_nan(val) or val is c_NaT or val is C_NA
+    return val is None or util.is_nan(val) or val is c_NaT
 
 
 cpdef bint is_null_datetimelike(object val, bint inat_is_null=True):
 
@@ -128,7 +128,7 @@ def apply_index_wraps(func):
     # not play nicely with cython class methods
     def wrapper(self, other):
 
-        is_index = getattr(other, "_typ", "") == "datetimeindex"
+        is_index = not util.is_array(other._data)
 
         # operate on DatetimeArray
         arr = other._data if is_index else other
@@ -168,7 +168,8 @@ def apply_wraps(func):
         elif isinstance(other, (np.datetime64, datetime, date)):
             other = Timestamp(other)
         else:
-            raise TypeError(other)
+            # This will end up returning NotImplemented back in __add__
+            raise ApplyTypeError
 
         tz = other.tzinfo
         nano = other.nanosecond
@@ -474,11 +475,6 @@ class _BaseOffset:
         return type(self)(n=1, normalize=self.normalize, **self.kwds)
 
     def __add__(self, other):
-        if getattr(other, "_typ", None) in ["datetimeindex", "periodindex",
-                                            "datetimearray", "periodarray",
-                                            "series", "period", "dataframe"]:
-            # defer to the other class's implementation
-            return other + self
         try:
             return self.apply(other)
         except ApplyTypeError:
@@ -497,12 +493,12 @@ class _BaseOffset:
         return self.apply(other)
 
     def __mul__(self, other):
-        if hasattr(other, "_typ"):
-            return NotImplemented
         if util.is_array(other):
             return np.array([self * x for x in other])
-        return type(self)(n=other * self.n, normalize=self.normalize,
-                          **self.kwds)
+        elif is_integer_object(other):
+            return type(self)(n=other * self.n, normalize=self.normalize,
+                              **self.kwds)
+        return NotImplemented
 
     def __neg__(self):
         # Note: we are deferring directly to __mul__ instead of __rmul__, as
@@ -705,10 +701,7 @@ class BaseOffset(_BaseOffset):
         return self.__add__(other)
 
     def __rsub__(self, other):
-        if getattr(other, '_typ', None) in ['datetimeindex', 'series']:
-            # i.e. isinstance(other, (ABCDatetimeIndex, ABCSeries))
-            return other - self
-        return -self + other
+        return (-self).__add__(other)
 
 
 cdef class _Tick(ABCTick):
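
Reviewer note: the ``__mul__`` change in ``_BaseOffset`` replaces the ``_typ`` attribute check with an explicit integer check and returns ``NotImplemented`` for everything else, so Python's binary-operator protocol can try the other operand. A minimal sketch of that pattern, using a hypothetical ``ToyOffset`` class (not the pandas class, and without the array branch), might look like this:

```python
class ToyOffset:
    """Hypothetical toy offset used only to illustrate the operator pattern."""

    def __init__(self, n=1):
        self.n = n

    def __eq__(self, other):
        return isinstance(other, ToyOffset) and self.n == other.n

    def __hash__(self):
        return hash(self.n)

    def __repr__(self):
        return f"ToyOffset(n={self.n})"

    def __mul__(self, other):
        # Only plain integers are handled here; bool is excluded in this sketch.
        if isinstance(other, int) and not isinstance(other, bool):
            return type(self)(n=other * self.n)
        # Let Python try the reflected operation on the other operand.
        return NotImplemented

    __rmul__ = __mul__

    def __neg__(self):
        return self * -1


doubled = ToyOffset(2) * 3   # handled by __mul__
flipped = -ToyOffset(2)      # defers to __mul__ with -1
```

With this shape, an unsupported operand such as a float ends in a ``TypeError`` raised by Python itself, rather than the offset silently building an object with a non-integer ``n``.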