Commit 3616f29

Merge branch 'master' of https://github.com/pandas-dev/pandas into tslibs-conversion11

2 parents: 4b94f3d + 8dac633

File tree

11 files changed: +84 additions, -28 deletions

ci/install_travis.sh

Lines changed: 2 additions & 2 deletions

@@ -34,9 +34,9 @@ fi

 # install miniconda
 if [ "${TRAVIS_OS_NAME}" == "osx" ]; then
-    time wget http://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh || exit 1
+    time wget http://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -q -O miniconda.sh || exit 1
 else
-    time wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh || exit 1
+    time wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -q -O miniconda.sh || exit 1
 fi
 time bash miniconda.sh -b -p "$MINICONDA_DIR" || exit 1

ci/requirements-3.6_NUMPY_DEV.build.sh

Lines changed: 4 additions & 1 deletion

@@ -12,7 +12,10 @@ PRE_WHEELS="https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf
 pip install --pre --upgrade --timeout=60 -f $PRE_WHEELS numpy scipy

 # install dateutil from master
-pip install -U git+git://github.com/dateutil/dateutil.git
+
+# TODO(jreback), temp disable dateutil master has changed
+# pip install -U git+git://github.com/dateutil/dateutil.git
+pip install python-dateutil

 # cython via pip
 pip install cython

doc/source/advanced.rst

Lines changed: 8 additions & 8 deletions

@@ -174,14 +174,14 @@ on a deeper level.
 Defined Levels
 ~~~~~~~~~~~~~~

-The repr of a ``MultiIndex`` shows ALL the defined levels of an index, even
+The repr of a ``MultiIndex`` shows all the defined levels of an index, even
 if the they are not actually used. When slicing an index, you may notice this.
 For example:

 .. ipython:: python

-   # original multi-index
-   df.columns
+   # original MultiIndex
+   df.columns

    # sliced
    df[['foo','qux']].columns
@@ -264,7 +264,7 @@ Passing a list of labels or tuples works similar to reindexing:
 Using slicers
 ~~~~~~~~~~~~~

-You can slice a multi-index by providing multiple indexers.
+You can slice a ``MultiIndex`` by providing multiple indexers.

 You can provide any of the selectors as if you are indexing by label, see :ref:`Selection by Label <indexing.label>`,
 including slices, lists of labels, labels, and boolean indexers.
@@ -278,16 +278,16 @@ As usual, **both sides** of the slicers are included as this is label indexing.

 You should specify all axes in the ``.loc`` specifier, meaning the indexer for the **index** and
 for the **columns**. There are some ambiguous cases where the passed indexer could be mis-interpreted
-as indexing *both* axes, rather than into say the MuliIndex for the rows.
+as indexing *both* axes, rather than into say the ``MultiIndex`` for the rows.

 You should do this:

 .. code-block:: python

    df.loc[(slice('A1','A3'),.....), :]

-rather than this:
-
+rather than this:
+
 .. code-block:: python

    df.loc[(slice('A1','A3'),.....)]
@@ -494,7 +494,7 @@ are named.
    s.sort_index(level='L2')

 On higher dimensional objects, you can sort any of the other axes by level if
-they have a MultiIndex:
+they have a ``MultiIndex``:

 .. ipython:: python
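The slicer rule this doc change describes can be sketched with a small runnable example (toy data, not taken from the pandas docs):

```python
import pandas as pd

# Minimal sketch of the slicer rule described above: in .loc, pass an
# indexer for *both* axes, otherwise a tuple of slicers can be misread
# as indexing both axes instead of just the row MultiIndex.
idx = pd.MultiIndex.from_product([['A1', 'A2', 'A3'], ['x', 'y']])
df = pd.DataFrame({'val': range(6)}, index=idx)

# recommended form: explicit column indexer (":" selects all columns)
sub = df.loc[(slice('A1', 'A2'), slice(None)), :]
print(len(sub))  # 4 rows: A1/x, A1/y, A2/x, A2/y
```

As the doc says, both endpoints of the slice are included because this is label indexing.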

doc/source/io.rst

Lines changed: 10 additions & 0 deletions

@@ -4538,6 +4538,16 @@ Read from a parquet file.

    result.dtypes

+Read only certain columns of a parquet file.
+
+.. ipython:: python
+
+   result = pd.read_parquet('example_pa.parquet', engine='pyarrow', columns=['a', 'b'])
+   result = pd.read_parquet('example_fp.parquet', engine='fastparquet', columns=['a', 'b'])
+
+   result.dtypes
+
+
 .. ipython:: python
    :suppress:
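The new ``columns`` argument restricts which columns are materialized when reading. As a runnable stand-in that needs no parquet engine installed, the same projection idea is shown here with ``read_csv``'s ``usecols``:

```python
import io
import pandas as pd

# Same column-projection idea as read_parquet(columns=...), demonstrated
# with read_csv's usecols so no pyarrow/fastparquet install is required.
buf = io.StringIO("a,b,c\n1,4,7\n2,5,8\n3,6,9\n")
result = pd.read_csv(buf, usecols=['a', 'b'])
print(list(result.columns))  # ['a', 'b'] -- column 'c' was never loaded
```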

doc/source/whatsnew/v0.21.1.txt

Lines changed: 2 additions & 0 deletions

@@ -60,6 +60,7 @@ Bug Fixes
 - Bug in :class:`TimedeltaIndex` subtraction could incorrectly overflow when ``NaT`` is present (:issue:`17791`)
 - Bug in :class:`DatetimeIndex` subtracting datetimelike from DatetimeIndex could fail to overflow (:issue:`18020`)
 - Bug in ``pd.Series.rolling.skew()`` and ``rolling.kurt()`` with all equal values has floating issue (:issue:`18044`)
+- Bug in ``pd.DataFrameGroupBy.count()`` when counting over a datetimelike column (:issue:`13393`)

 Conversion
 ^^^^^^^^^^
@@ -82,6 +83,7 @@ I/O
 - Bug in :func:`read_csv` when reading a compressed UTF-16 encoded file (:issue:`18071`)
 - Bug in :func:`read_csv` for handling null values in index columns when specifying ``na_filter=False`` (:issue:`5239`)
 - Bug in :meth:`DataFrame.to_csv` when the table had ``MultiIndex`` columns, and a list of strings was passed in for ``header`` (:issue:`5539`)
+- :func:`read_parquet` now allows to specify the columns to read from a parquet file (:issue:`18154`)

 Plotting
 ^^^^^^^^

pandas/compat/__init__.py

Lines changed: 8 additions & 5 deletions

@@ -381,17 +381,20 @@ def raise_with_traceback(exc, traceback=Ellipsis):
 # http://stackoverflow.com/questions/4126348
 # Thanks to @martineau at SO

-from dateutil import parser as _date_parser
 import dateutil
+
+if PY2 and LooseVersion(dateutil.__version__) == '2.0':
+    # dateutil brokenness
+    raise Exception('dateutil 2.0 incompatible with Python 2.x, you must '
+                    'install version 1.5 or 2.1+!')
+
+from dateutil import parser as _date_parser
 if LooseVersion(dateutil.__version__) < '2.0':
+
     @functools.wraps(_date_parser.parse)
     def parse_date(timestr, *args, **kwargs):
         timestr = bytes(timestr)
         return _date_parser.parse(timestr, *args, **kwargs)
-elif PY2 and LooseVersion(dateutil.__version__) == '2.0':
-    # dateutil brokenness
-    raise Exception('dateutil 2.0 incompatible with Python 2.x, you must '
-                    'install version 1.5 or 2.1+!')
 else:
     parse_date = _date_parser.parse
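The reordering above moves the broken-version check ahead of the parser import, so the error is raised up front. A pure-stdlib sketch of that gate (``parse_version`` and ``check_dateutil`` are simplified stand-ins, not pandas code):

```python
# Toy stand-in for the LooseVersion gate above; parse_version only
# handles dotted numeric versions, unlike the real LooseVersion.
def parse_version(v):
    return tuple(int(p) for p in v.split('.'))

def check_dateutil(version, is_py2):
    # raise early for the known-broken combination, as the diff now does
    if is_py2 and parse_version(version) == (2, 0):
        raise Exception('dateutil 2.0 incompatible with Python 2.x, you must '
                        'install version 1.5 or 2.1+!')
    # True means the pre-2.0 bytes() workaround for parse_date is needed
    return parse_version(version) < (2, 0)
```

For example, ``check_dateutil('2.6.1', False)`` returns ``False`` (no workaround needed), while ``('2.0', True)`` raises immediately.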

pandas/core/groupby.py

Lines changed: 2 additions & 1 deletion

@@ -4365,7 +4365,8 @@ def count(self):
         ids, _, ngroups = self.grouper.group_info
         mask = ids != -1

-        val = ((mask & ~isna(blk.get_values())) for blk in data.blocks)
+        val = ((mask & ~isna(np.atleast_2d(blk.get_values())))
+               for blk in data.blocks)
         loc = (blk.mgr_locs for blk in data.blocks)

         counter = partial(count_level_2d, labels=ids, max_bin=ngroups, axis=1)
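Why the fix wraps the block values in ``np.atleast_2d`` can be illustrated with toy arrays (a simplified sketch, not pandas internals):

```python
import numpy as np
import pandas as pd

# Datetime-like blocks can hand back a 1-D array where the counting
# code expects 2-D; np.atleast_2d normalizes the shape so the group
# mask combines with the non-null mask the same way for every block.
vals = pd.to_datetime(['2016-05-01', pd.NaT, '2016-05-03']).values  # 1-D
mask = np.array([True, True, True])  # rows that belong to some group

counts = (mask & ~pd.isna(np.atleast_2d(vals))).sum(axis=1)
print(counts)  # [2] -- the NaT is excluded from the count
```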

pandas/io/parquet.py

Lines changed: 10 additions & 6 deletions

@@ -76,9 +76,9 @@ def write(self, df, path, compression='snappy',
             table, path, compression=compression,
             coerce_timestamps=coerce_timestamps, **kwargs)

-    def read(self, path):
+    def read(self, path, columns=None):
         path, _, _ = get_filepath_or_buffer(path)
-        return self.api.parquet.read_table(path).to_pandas()
+        return self.api.parquet.read_table(path, columns=columns).to_pandas()


 class FastParquetImpl(object):
@@ -115,9 +115,9 @@ def write(self, df, path, compression='snappy', **kwargs):
         self.api.write(path, df,
                        compression=compression, **kwargs)

-    def read(self, path):
+    def read(self, path, columns=None):
         path, _, _ = get_filepath_or_buffer(path)
-        return self.api.ParquetFile(path).to_pandas()
+        return self.api.ParquetFile(path).to_pandas(columns=columns)


 def to_parquet(df, path, engine='auto', compression='snappy', **kwargs):
@@ -178,7 +178,7 @@ def to_parquet(df, path, engine='auto', compression='snappy', **kwargs):
     return impl.write(df, path, compression=compression)


-def read_parquet(path, engine='auto', **kwargs):
+def read_parquet(path, engine='auto', columns=None, **kwargs):
     """
     Load a parquet object from the file path, returning a DataFrame.

@@ -188,6 +188,10 @@ def read_parquet(path, engine='auto', **kwargs):
     ----------
     path : string
         File path
+    columns: list, default=None
+        If not None, only these columns will be read from the file.
+
+        .. versionadded 0.21.1
     engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
         Parquet reader library to use. If 'auto', then the option
         'io.parquet.engine' is used. If 'auto', then the first
@@ -201,4 +205,4 @@ def read_parquet(path, engine='auto', **kwargs):
     """

     impl = get_engine(engine)
-    return impl.read(path)
+    return impl.read(path, columns=columns)
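The pattern above -- a top-level reader forwarding an optional column projection to whichever engine implementation is selected -- can be sketched without pyarrow or fastparquet installed (``CsvImpl`` and ``read_projected`` are hypothetical names for this sketch):

```python
import io
import pandas as pd

class CsvImpl(object):
    """Hypothetical stand-in engine mirroring the read(path, columns=None)
    shape of PyArrowImpl/FastParquetImpl, backed by CSV so it runs
    without any parquet library."""
    def read(self, path_or_buf, columns=None):
        df = pd.read_csv(path_or_buf)
        return df if columns is None else df[columns]

def read_projected(path_or_buf, impl, columns=None):
    # like read_parquet: forward the projection down to the engine's read()
    return impl.read(path_or_buf, columns=columns)

buf = io.StringIO("string,int\na,1\nb,2\nc,3\n")
result = read_projected(buf, CsvImpl(), columns=['string'])
print(list(result.columns))  # ['string']
```

Pushing the projection into the engine (rather than slicing afterwards) is what lets a real parquet reader skip deserializing the unwanted columns entirely.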

pandas/tests/frame/test_dtypes.py

Lines changed: 8 additions & 1 deletion

@@ -10,6 +10,8 @@
 from pandas import (DataFrame, Series, date_range, Timedelta, Timestamp,
                     compat, concat, option_context)
 from pandas.compat import u
+from pandas import _np_version_under1p14
+
 from pandas.core.dtypes.dtypes import DatetimeTZDtype
 from pandas.tests.frame.common import TestData
 from pandas.util.testing import (assert_series_equal,
@@ -531,7 +533,12 @@ def test_astype_str(self):
             assert_frame_equal(result, expected)

             result = DataFrame([1.12345678901234567890]).astype(tt)
-            expected = DataFrame(['1.12345678901'])
+            if _np_version_under1p14:
+                # < 1.14 truncates
+                expected = DataFrame(['1.12345678901'])
+            else:
+                # >= 1.14 preserves the full repr
+                expected = DataFrame(['1.1234567890123457'])
             assert_frame_equal(result, expected)

     @pytest.mark.parametrize("dtype_class", [dict, Series])
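The branch added above tracks NumPy 1.14's switch to shortest round-trip float formatting, which matches plain Python ``repr`` and is where the new expected string comes from:

```python
# The literal below cannot be stored exactly as a double, so the
# shortest round-trip repr settles on 17 significant digits -- the
# same string NumPy >= 1.14 produces when stringifying the value.
value = 1.12345678901234567890
print(repr(value))  # 1.1234567890123457
```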

pandas/tests/groupby/test_counting.py

Lines changed: 19 additions & 2 deletions

@@ -2,9 +2,11 @@
 from __future__ import print_function

 import numpy as np
+import pytest

-from pandas import (DataFrame, Series, MultiIndex)
-from pandas.util.testing import assert_series_equal
+from pandas import (DataFrame, Series, MultiIndex, Timestamp, Timedelta,
+                    Period)
+from pandas.util.testing import (assert_series_equal, assert_frame_equal)
 from pandas.compat import (range, product as cart_product)


@@ -195,3 +197,18 @@ def test_ngroup_respects_groupby_order(self):
                             g.ngroup())
         assert_series_equal(Series(df['group_index'].values),
                             g.cumcount())
+
+    @pytest.mark.parametrize('datetimelike', [
+        [Timestamp('2016-05-%02d 20:09:25+00:00' % i) for i in range(1, 4)],
+        [Timestamp('2016-05-%02d 20:09:25' % i) for i in range(1, 4)],
+        [Timedelta(x, unit="h") for x in range(1, 4)],
+        [Period(freq="2W", year=2017, month=x) for x in range(1, 4)]])
+    def test_count_with_datetimelike(self, datetimelike):
+        # test for #13393, where DataframeGroupBy.count() fails
+        # when counting a datetimelike column.
+
+        df = DataFrame({'x': ['a', 'a', 'b'], 'y': datetimelike})
+        res = df.groupby('x').count()
+        expected = DataFrame({'y': [2, 1]}, index=['a', 'b'])
+        expected.index.name = "x"
+        assert_frame_equal(expected, res)
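With the ``groupby.py`` fix in place, the behavior the new test pins down can be checked directly; this is a minimal, single-case version of the parametrized cases above:

```python
import pandas as pd

# count() over a datetime column must group correctly instead of
# raising (the GH 13393 regression the test above guards against).
df = pd.DataFrame({'x': ['a', 'a', 'b'],
                   'y': pd.to_datetime(['2016-05-01', '2016-05-02',
                                        '2016-05-03'])})
res = df.groupby('x').count()
print(res['y'].tolist())  # [2, 1]
```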

pandas/tests/io/test_parquet.py

Lines changed: 11 additions & 2 deletions

@@ -192,15 +192,15 @@ def check_round_trip(self, df, engine, expected=None, **kwargs):

         with tm.ensure_clean() as path:
             df.to_parquet(path, engine, **kwargs)
-            result = read_parquet(path, engine)
+            result = read_parquet(path, engine, **kwargs)

             if expected is None:
                 expected = df
             tm.assert_frame_equal(result, expected)

             # repeat
             to_parquet(df, path, engine, **kwargs)
-            result = pd.read_parquet(path, engine)
+            result = pd.read_parquet(path, engine, **kwargs)

             if expected is None:
                 expected = df
@@ -282,6 +282,15 @@ def test_compression(self, engine, compression):
         df = pd.DataFrame({'A': [1, 2, 3]})
         self.check_round_trip(df, engine, compression=compression)

+    def test_read_columns(self, engine):
+        # GH18154
+        df = pd.DataFrame({'string': list('abc'),
+                           'int': list(range(1, 4))})
+
+        expected = pd.DataFrame({'string': list('abc')})
+        self.check_round_trip(df, engine, expected=expected,
+                              compression=None, columns=["string"])
+

 class TestParquetPyArrow(Base):
