
Commit f0b1d97

Merge remote-tracking branch 'upstream/master' into 26814-optional-fu
2 parents 207bd5a + 634577e commit f0b1d97


4 files changed: +67 -47 lines changed


doc/source/user_guide/timeseries.rst

Lines changed: 45 additions & 37 deletions
```diff
@@ -761,34 +761,6 @@ regularity will result in a ``DatetimeIndex``, although frequency is lost:
 
    ts2[[0, 2, 6]].index
 
-.. _timeseries.iterating-label:
-
-Iterating through groups
--------------------------
-
-With the ``Resampler`` object in hand, iterating through the grouped data is very
-natural and functions similarly to :py:func:`itertools.groupby`:
-
-.. ipython:: python
-
-   small = pd.Series(
-       range(6),
-       index=pd.to_datetime(['2017-01-01T00:00:00',
-                             '2017-01-01T00:30:00',
-                             '2017-01-01T00:31:00',
-                             '2017-01-01T01:00:00',
-                             '2017-01-01T03:00:00',
-                             '2017-01-01T03:05:00'])
-   )
-   resampled = small.resample('H')
-
-   for name, group in resampled:
-       print("Group: ", name)
-       print("-" * 27)
-       print(group, end="\n\n")
-
-See :ref:`groupby.iterating-label` or :class:`Resampler.__iter__` for more.
-
 .. _timeseries.components:
 
 Time/Date Components
@@ -1628,24 +1600,32 @@ labels.
 
    ts.resample('5Min', label='left', loffset='1s').mean()
 
-.. note::
+.. warning::
 
-   The default values for ``label`` and ``closed`` is 'left' for all
+   The default values for ``label`` and ``closed`` is '**left**' for all
    frequency offsets except for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W'
    which all have a default of 'right'.
 
+   This might unintendedly lead to looking ahead, where the value for a later
+   time is pulled back to a previous time as in the following example with
+   the :class:`~pandas.tseries.offsets.BusinessDay` frequency:
+
    .. ipython:: python
 
-      rng2 = pd.date_range('1/1/2012', end='3/31/2012', freq='D')
-      ts2 = pd.Series(range(len(rng2)), index=rng2)
+      s = pd.date_range('2000-01-01', '2000-01-05').to_series()
+      s.iloc[2] = pd.NaT
+      s.dt.weekday_name
 
-      # default: label='right', closed='right'
-      ts2.resample('M').max()
+      # default: label='left', closed='left'
+      s.resample('B').last().dt.weekday_name
 
-      # default: label='left', closed='left'
-      ts2.resample('SM').max()
+   Notice how the value for Sunday got pulled back to the previous Friday.
+   To get the behavior where the value for Sunday is pushed to Monday, use
+   instead
 
-      ts2.resample('SM', label='right', closed='right').max()
+   .. ipython:: python
+
+      s.resample('B', label='right', closed='right').last().dt.weekday_name
 
 The ``axis`` parameter can be set to 0 or 1 and allows you to resample the
 specified axis for a ``DataFrame``.
@@ -1796,6 +1776,34 @@ level of ``MultiIndex``, its name or location can be passed to the
 
    df.resample('M', level='d').sum()
 
+.. _timeseries.iterating-label:
+
+Iterating through groups
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+With the ``Resampler`` object in hand, iterating through the grouped data is very
+natural and functions similarly to :py:func:`itertools.groupby`:
+
+.. ipython:: python
+
+   small = pd.Series(
+       range(6),
+       index=pd.to_datetime(['2017-01-01T00:00:00',
+                             '2017-01-01T00:30:00',
+                             '2017-01-01T00:31:00',
+                             '2017-01-01T01:00:00',
+                             '2017-01-01T03:00:00',
+                             '2017-01-01T03:05:00'])
+   )
+   resampled = small.resample('H')
+
+   for name, group in resampled:
+       print("Group: ", name)
+       print("-" * 27)
+       print(group, end="\n\n")
+
+See :ref:`groupby.iterating-label` or :class:`Resampler.__iter__` for more.
+
 
 .. _timeseries.periods:
 
```
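The look-ahead described in the new warning can be checked with a short script. This sketch targets current pandas rather than 0.25: `Series.dt.weekday_name`, used in the doc example, was removed in pandas 1.0, so it calls `Series.dt.day_name()` instead. The names `left` and `right` are illustrative only.

```python
import pandas as pd

# Daily timestamps from Saturday 2000-01-01 to Wednesday 2000-01-05; each value
# is its own timestamp, so the result shows which day ended up in which bin.
s = pd.date_range("2000-01-01", "2000-01-05").to_series()
s.iloc[2] = pd.NaT  # blank out Monday 2000-01-03, as in the doc example

# 'B' uses the defaults label='left', closed='left'.
left = s.resample("B").last()

# Right-closed bins labelled by their right edge.
right = s.resample("B", label="right", closed="right").last()

# day_name() replaces the removed .dt.weekday_name accessor.
print(left.dt.day_name())
print(right.dt.day_name())
```

With the default left-closed, left-labelled bins, the Sunday timestamp is reported under Friday 1999-12-31; with `label='right'` and `closed='right'` it is reported under Monday 2000-01-03.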
doc/source/whatsnew/v0.25.0.rst

Lines changed: 1 addition & 0 deletions
```diff
@@ -667,6 +667,7 @@ I/O
 - Bug in :func:`read_json` where date strings with ``Z`` were not converted to a UTC timezone (:issue:`26168`)
 - Added ``cache_dates=True`` parameter to :meth:`read_csv`, which allows to cache unique dates when they are parsed (:issue:`25990`)
 - :meth:`DataFrame.to_excel` now raises a ``ValueError`` when the caller's dimensions exceed the limitations of Excel (:issue:`26051`)
+- Fixed bug in :func:`pandas.read_csv` where a BOM would result in incorrect parsing using engine='python' (:issue:`26545`)
 - :func:`read_excel` now raises a ``ValueError`` when input is of type :class:`pandas.io.excel.ExcelFile` and ``engine`` param is passed since :class:`pandas.io.excel.ExcelFile` has an engine defined (:issue:`26566`)
 - Bug while selecting from :class:`HDFStore` with ``where=''`` specified (:issue:`26610`).
 
```
pandas/io/parsers.py

Lines changed: 11 additions & 10 deletions
```diff
@@ -2755,23 +2755,24 @@ def _check_for_bom(self, first_row):
         if first_elt != _BOM:
             return first_row
 
-        first_row = first_row[0]
+        first_row_bom = first_row[0]
 
-        if len(first_row) > 1 and first_row[1] == self.quotechar:
+        if len(first_row_bom) > 1 and first_row_bom[1] == self.quotechar:
             start = 2
-            quote = first_row[1]
-            end = first_row[2:].index(quote) + 2
+            quote = first_row_bom[1]
+            end = first_row_bom[2:].index(quote) + 2
 
             # Extract the data between the quotation marks
-            new_row = first_row[start:end]
+            new_row = first_row_bom[start:end]
 
             # Extract any remaining data after the second
             # quotation mark.
-            if len(first_row) > end + 1:
-                new_row += first_row[end + 1:]
-            return [new_row]
-        elif len(first_row) > 1:
-            return [first_row[1:]]
+            if len(first_row_bom) > end + 1:
+                new_row += first_row_bom[end + 1:]
+            return [new_row] + first_row[1:]
+
+        elif len(first_row_bom) > 1:
+            return [first_row_bom[1:]]
         else:
             # First row is just the BOM, so we
             # return an empty string.
```
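To see what the patched branch changes, the logic can be lifted out of the parser into plain functions. This is a standalone sketch, not pandas API: `strip_bom` and `strip_bom_before_fix` are hypothetical names, `quotechar` stands in for `self.quotechar`, and the guards for an empty row or a non-string first field are assumptions (the hunk only shows the `first_elt != _BOM` check). As a design choice, `strip_bom` also keeps the trailing fields in the unquoted branch, whereas the `elif` branch in the hunk returns only the first field.

```python
from typing import List

BOM = "\ufeff"  # the byte order mark as a decoded str (pandas names it _BOM)


def _starts_with_bom(first_row: List[str]) -> bool:
    # Assumed guards: non-empty row whose first field is a non-empty str.
    return (
        bool(first_row)
        and isinstance(first_row[0], str)
        and bool(first_row[0])
        and first_row[0][0] == BOM
    )


def strip_bom_before_fix(first_row: List[str], quotechar: str = '"') -> List[str]:
    """Mirrors the removed lines: first_row is rebound to its first field."""
    if not _starts_with_bom(first_row):
        return first_row
    first_row = first_row[0]  # the other fields are no longer reachable
    if len(first_row) > 1 and first_row[1] == quotechar:
        end = first_row[2:].index(first_row[1]) + 2  # ValueError if unclosed
        new_row = first_row[2:end]
        if len(first_row) > end + 1:
            new_row += first_row[end + 1:]
        return [new_row]  # bug: fields after the first are dropped
    elif len(first_row) > 1:
        return [first_row[1:]]
    return [""]


def strip_bom(first_row: List[str], quotechar: str = '"') -> List[str]:
    """Sketch of the patched logic, keeping the remaining fields."""
    if not _starts_with_bom(first_row):
        return first_row
    first_row_bom = first_row[0]
    if len(first_row_bom) > 1 and first_row_bom[1] == quotechar:
        end = first_row_bom[2:].index(first_row_bom[1]) + 2  # ValueError if unclosed
        # Text between the quotes, plus anything after the closing quote.
        new_row = first_row_bom[2:end]
        if len(first_row_bom) > end + 1:
            new_row += first_row_bom[end + 1:]
        return [new_row] + first_row[1:]
    elif len(first_row_bom) > 1:
        # Unquoted first field: drop the BOM and keep the other fields too.
        return [first_row_bom[1:]] + first_row[1:]
    # The first field is only the BOM.
    return [""] + first_row[1:]


row = ['\ufeff"Head1"', "Head2", "Head3"]
print(strip_bom_before_fix(row))
print(strip_bom(row))
```

The only behavioral difference on the quoted path is the `+ first_row[1:]`, which is why renaming the rebound variable to `first_row_bom` was needed: the original list must still be available when the result is assembled.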

pandas/tests/io/parser/test_common.py

Lines changed: 10 additions & 0 deletions
```diff
@@ -1927,3 +1927,13 @@ def test_read_table_deprecated(all_parsers):
                                     check_stacklevel=False):
         result = parser.read_table(StringIO(data))
         tm.assert_frame_equal(result, expected)
+
+
+def test_first_row_bom(all_parsers):
+    # see gh-26545
+    parser = all_parsers
+    data = '''\ufeff"Head1"	"Head2"	"Head3"'''
+
+    result = parser.read_csv(StringIO(data), delimiter='\t')
+    expected = DataFrame(columns=["Head1", "Head2", "Head3"])
+    tm.assert_frame_equal(result, expected)
```
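The same input can be run through the public `read_csv` function outside the test fixtures. This sketch assumes a pandas version that includes this fix (0.25 or later) and compares the C and Python engines; the `results` dict is only for illustration.

```python
import io

import pandas as pd

# One header line: a BOM, then three quoted, tab-separated column names.
data = '\ufeff"Head1"\t"Head2"\t"Head3"'

# Parse the same text with both engines that pandas' all_parsers fixture covers.
results = {
    engine: pd.read_csv(io.StringIO(data), delimiter="\t", engine=engine)
    for engine in ("c", "python")
}

for engine, df in results.items():
    # Each frame should have the three clean column names and no data rows.
    print(engine, list(df.columns), df.shape)
```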
