Commit e7b865c (merge)
2 parents: 95c012a + d150f17

418 files changed: +14423 additions, -10622 deletions


.github/CONTRIBUTING.md (1 addition, 1 deletion)

@@ -2,7 +2,7 @@
 
 Whether you are a novice or experienced software developer, all contributions and suggestions are welcome!
 
-Our main contributing guide can be found [in this repo](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst) or [on the website](https://pandas-docs.github.io/pandas-docs-travis/contributing.html). If you do not want to read it in its entirety, we will summarize the main ways in which you can contribute and point to relevant sections of that document for further information.
+Our main contributing guide can be found [in this repo](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst) or [on the website](https://pandas-docs.github.io/pandas-docs-travis/development/contributing.html). If you do not want to read it in its entirety, we will summarize the main ways in which you can contribute and point to relevant sections of that document for further information.
 
 ## Getting Started

.github/FUNDING.yml (1 addition, 0 deletions)

@@ -0,0 +1 @@
+custom: https://pandas.pydata.org/donate.html

.travis.yml (9 additions, 14 deletions)

@@ -48,17 +48,10 @@ matrix:
       env:
         - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow"
 
-    # In allow_failures
-    - dist: trusty
-      env:
-        - JOB="3.6, doc" ENV_FILE="ci/deps/travis-36-doc.yaml" DOC=true
   allow_failures:
     - dist: trusty
       env:
         - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow"
-    - dist: trusty
-      env:
-        - JOB="3.6, doc" ENV_FILE="ci/deps/travis-36-doc.yaml" DOC=true
 
 before_install:
   - echo "before_install"
@@ -86,19 +79,21 @@ install:
   - ci/submit_cython_cache.sh
   - echo "install done"
 
+before_script:
+  # display server (for clipboard functionality) needs to be started here,
+  # does not work if done in install:setup_env.sh (GH-26103)
+  - export DISPLAY=":99.0"
+  - echo "sh -e /etc/init.d/xvfb start"
+  - sh -e /etc/init.d/xvfb start
+  - sleep 3
+
 script:
   - echo "script start"
   - source activate pandas-dev
-  - ci/build_docs.sh
   - ci/run_tests.sh
 
 after_script:
   - echo "after_script start"
   - source activate pandas-dev && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd
-  - if [ -e test-data-single.xml ]; then
-      ci/print_skipped.py test-data-single.xml;
-    fi
-  - if [ -e test-data-multiple.xml ]; then
-      ci/print_skipped.py test-data-multiple.xml;
-    fi
+  - ci/print_skipped.py
   - echo "after_script done"

LICENSES/HAVEN_LICENSE (2 additions, 0 deletions)

@@ -0,0 +1,2 @@
+YEAR: 2013-2016
+COPYRIGHT HOLDER: Hadley Wickham; RStudio; and Evan Miller

LICENSES/HAVEN_MIT (32 additions, 0 deletions)

@@ -0,0 +1,32 @@
+Based on http://opensource.org/licenses/MIT
+
+This is a template. Complete and ship as file LICENSE the following 2
+lines (only)
+
+YEAR:
+COPYRIGHT HOLDER:
+
+and specify as
+
+License: MIT + file LICENSE
+
+Copyright (c) <YEAR>, <COPYRIGHT HOLDER>
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md (1 addition, 1 deletion)

@@ -224,7 +224,7 @@ Most development discussion is taking place on github in this repo. Further, the
 
 All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.
 
-A detailed overview on how to contribute can be found in the **[contributing guide](https://pandas-docs.github.io/pandas-docs-travis/contributing.html)**. There is also an [overview](.github/CONTRIBUTING.md) on GitHub.
+A detailed overview on how to contribute can be found in the **[contributing guide](https://dev.pandas.io/contributing.html)**. There is also an [overview](.github/CONTRIBUTING.md) on GitHub.
 
 If you are simply looking to start working with the pandas codebase, navigate to the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [good first issue](https://github.com/pandas-dev/pandas/issues?labels=good+first+issue&sort=updated&state=open) where you could start out.

asv_bench/asv.conf.json (1 addition, 1 deletion)

@@ -107,7 +107,7 @@
     // `asv` will cache wheels of the recent builds in each
     // environment, making them faster to install next time. This is
     // number of builds to keep, per environment.
-    "wheel_cache_size": 8,
+    "build_cache_size": 8,
 
     // The commits after which the regression search in `asv publish`
     // should start looking for regressions. Dictionary whose keys are

asv_bench/benchmarks/frame_methods.py (2 additions, 0 deletions)

@@ -96,6 +96,8 @@ def time_dict_rename_both_axes(self):
 
 
 class Iteration:
+    # mem_itertuples_* benchmarks are slow
+    timeout = 120
 
     def setup(self):
         N = 1000
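The `timeout = 120` attribute added above uses asv's benchmark conventions: a plain class with a `setup` method and `time_*` methods that asv discovers and times, plus class attributes such as `timeout` (in seconds). A minimal runnable sketch of the same shape (the class body here is illustrative, not the pandas benchmark itself):

```python
# Minimal sketch of an asv-style benchmark class: asv discovers classes
# with time_* methods, calls setup() before timing, and honors class
# attributes like `timeout`. Here we just exercise one call manually.
class IterationSketch:
    timeout = 120  # per-benchmark time limit in seconds, as in the diff

    def setup(self):
        # asv calls this before timing each benchmark method
        self.data = list(range(1000))

    def time_sum(self):
        # asv would time repeated calls of this method
        return sum(self.data)


bench = IterationSketch()
bench.setup()
print(bench.time_sum())  # 499500
```

asv never needs these methods to return anything; returning the value here just makes the sketch easy to check by hand.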

asv_bench/benchmarks/groupby.py (1 addition, 6 deletions)

@@ -1,12 +1,11 @@
 from functools import partial
 from itertools import product
 from string import ascii_letters
-import warnings
 
 import numpy as np
 
 from pandas import (
-    Categorical, DataFrame, MultiIndex, Series, TimeGrouper, Timestamp,
+    Categorical, DataFrame, MultiIndex, Series, Timestamp,
     date_range, period_range)
 import pandas.util.testing as tm
 
@@ -301,10 +300,6 @@ def setup(self):
     def time_multi_size(self):
         self.df.groupby(['key1', 'key2']).size()
 
-    def time_dt_timegrouper_size(self):
-        with warnings.catch_warnings(record=True):
-            self.df.groupby(TimeGrouper(key='dates', freq='M')).size()
-
     def time_category_size(self):
         self.draws.groupby(self.cats).size()
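The `time_multi_size` benchmark kept above measures `df.groupby(['key1', 'key2']).size()`, the count of rows per two-key group. A rough stdlib analogue (not pandas itself, and with made-up data) of that group-size count:

```python
from collections import Counter

# Stdlib sketch of a two-key group-size count, analogous in spirit to
# df.groupby(['key1', 'key2']).size(); rows here are invented examples.
rows = [('a', 1), ('a', 1), ('b', 2), ('a', 2)]
sizes = Counter(rows)  # maps each (key1, key2) pair to its group size
print(sizes[('a', 1)])  # 2
```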

asv_bench/benchmarks/index_object.py (24 additions, 1 deletion)

@@ -52,7 +52,6 @@ def time_is_dates_only(self):
 
 class Ops:
 
-    sample_time = 0.2
     params = ['float', 'int']
     param_names = ['dtype']
 
@@ -95,6 +94,12 @@ def time_min(self):
     def time_min_trivial(self):
         self.idx_inc.min()
 
+    def time_get_loc_inc(self):
+        self.idx_inc.get_loc(900000)
+
+    def time_get_loc_dec(self):
+        self.idx_dec.get_loc(100000)
+
 
 class IndexAppend:
 
@@ -191,8 +196,26 @@ def setup(self, N):
         self.intv = IntervalIndex.from_arrays(left, right)
         self.intv._engine
 
+        self.intv2 = IntervalIndex.from_arrays(left + 1, right + 1)
+        self.intv2._engine
+
+        self.left = IntervalIndex.from_breaks(np.arange(N))
+        self.right = IntervalIndex.from_breaks(np.arange(N - 3, 2 * N - 3))
+
     def time_monotonic_inc(self, N):
         self.intv.is_monotonic_increasing
 
+    def time_is_unique(self, N):
+        self.intv.is_unique
+
+    def time_intersection(self, N):
+        self.left.intersection(self.right)
+
+    def time_intersection_one_duplicate(self, N):
+        self.intv.intersection(self.right)
+
+    def time_intersection_both_duplicate(self, N):
+        self.intv.intersection(self.intv2)
+
 
 from .pandas_vb_common import setup  # noqa: F401
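The new `time_get_loc_inc`/`time_get_loc_dec` benchmarks exercise position lookup on monotonic indexes. On a sorted sequence that lookup reduces to binary search; a stdlib sketch of that cheap path (pandas' `Index.get_loc` handles many more cases, including non-monotonic and non-unique indexes):

```python
import bisect

# Binary-search position lookup on a monotonically increasing sequence,
# the fast path available to get_loc when the index is sorted.
idx = list(range(1_000_000))
pos = bisect.bisect_left(idx, 900_000)
print(pos)  # 900000
```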

asv_bench/benchmarks/io/csv.py (56 additions, 2 deletions)

@@ -3,7 +3,7 @@
 
 import numpy as np
 import pandas.util.testing as tm
-from pandas import DataFrame, Categorical, date_range, read_csv
+from pandas import DataFrame, Categorical, date_range, read_csv, to_datetime
 from pandas.io.parsers import _parser_defaults
 from io import StringIO
 
@@ -96,6 +96,35 @@ def time_read_csv(self, infer_datetime_format, format):
                  infer_datetime_format=infer_datetime_format)
 
 
+class ReadCSVConcatDatetime(StringIORewind):
+
+    iso8601 = '%Y-%m-%d %H:%M:%S'
+
+    def setup(self):
+        rng = date_range('1/1/2000', periods=50000, freq='S')
+        self.StringIO_input = StringIO('\n'.join(
+            rng.strftime(self.iso8601).tolist()))
+
+    def time_read_csv(self):
+        read_csv(self.data(self.StringIO_input),
+                 header=None, names=['foo'], parse_dates=['foo'],
+                 infer_datetime_format=False)
+
+
+class ReadCSVConcatDatetimeBadDateValue(StringIORewind):
+
+    params = (['nan', '0', ''],)
+    param_names = ['bad_date_value']
+
+    def setup(self, bad_date_value):
+        self.StringIO_input = StringIO(('%s,\n' % bad_date_value) * 50000)
+
+    def time_read_csv(self, bad_date_value):
+        read_csv(self.data(self.StringIO_input),
+                 header=None, names=['foo', 'bar'], parse_dates=['foo'],
+                 infer_datetime_format=False)
+
+
 class ReadCSVSkipRows(BaseIO):
 
     fname = '__test__.csv'
@@ -273,7 +302,7 @@ def mem_parser_chunks(self):
 
 class ReadCSVParseSpecialDate(StringIORewind):
     params = (['mY', 'mdY', 'hm'],)
-    params_name = ['value']
+    param_names = ['value']
     objects = {
         'mY': '01-2019\n10-2019\n02/2000\n',
         'mdY': '12/02/2010\n',
@@ -290,4 +319,29 @@ def time_read_special_date(self, value):
                  names=['Date'], parse_dates=['Date'])
 
 
+class ParseDateComparison(StringIORewind):
+    params = ([False, True],)
+    param_names = ['cache_dates']
+
+    def setup(self, cache_dates):
+        count_elem = 10000
+        data = '12-02-2010\n' * count_elem
+        self.StringIO_input = StringIO(data)
+
+    def time_read_csv_dayfirst(self, cache_dates):
+        read_csv(self.data(self.StringIO_input), sep=',', header=None,
+                 names=['Date'], parse_dates=['Date'], cache_dates=cache_dates,
+                 dayfirst=True)
+
+    def time_to_datetime_dayfirst(self, cache_dates):
+        df = read_csv(self.data(self.StringIO_input),
+                      dtype={'date': str}, names=['date'])
+        to_datetime(df['date'], cache=cache_dates, dayfirst=True)
+
+    def time_to_datetime_format_DD_MM_YYYY(self, cache_dates):
+        df = read_csv(self.data(self.StringIO_input),
+                      dtype={'date': str}, names=['date'])
+        to_datetime(df['date'], cache=cache_dates, format='%d-%m-%Y')
+
+
 from ..pandas_vb_common import setup  # noqa: F401
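The `ParseDateComparison` class above times parsing of the ambiguous string `'12-02-2010'` with `dayfirst=True` versus an explicit `format='%d-%m-%Y'`. The explicit-format reading can be sketched with the stdlib alone (no pandas required for the sketch):

```python
from datetime import datetime

# Explicit-format parse of the ambiguous date string used in the
# ParseDateComparison benchmark: %d-%m-%Y reads it as 12 Feb 2010,
# the same interpretation dayfirst=True is meant to produce.
parsed = datetime.strptime('12-02-2010', '%d-%m-%Y')
print(parsed.date())  # 2010-02-12
```

Supplying an explicit format removes the per-string guessing that `dayfirst` heuristics require, which is the performance difference the benchmark measures.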

asv_bench/benchmarks/io/parsers.py (38 additions, 0 deletions)

@@ -0,0 +1,38 @@
+import numpy as np
+
+try:
+    from pandas._libs.tslibs.parsing import (
+        _concat_date_cols, _does_string_look_like_datetime)
+except ImportError:
+    # Avoid whole benchmark suite import failure on asv (currently 0.4)
+    pass
+
+
+class DoesStringLookLikeDatetime(object):
+
+    params = (['2Q2005', '0.0', '10000'],)
+    param_names = ['value']
+
+    def setup(self, value):
+        self.objects = [value] * 1000000
+
+    def time_check_datetimes(self, value):
+        for obj in self.objects:
+            _does_string_look_like_datetime(obj)
+
+
+class ConcatDateCols(object):
+
+    params = ([1234567890, 'AAAA'], [1, 2])
+    param_names = ['value', 'dim']
+
+    def setup(self, value, dim):
+        count_elem = 10000
+        if dim == 1:
+            self.object = (np.array([value] * count_elem),)
+        if dim == 2:
+            self.object = (np.array([value] * count_elem),
+                           np.array([value] * count_elem))
+
+    def time_check_concat(self, value, dim):
+        _concat_date_cols(self.object)
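The `try`/`except ImportError` at the top of the new file keeps the whole benchmark suite importable even when the private pandas internals it targets are missing. A self-contained sketch of the same guard pattern (the module and function names below are hypothetical, chosen only for illustration):

```python
# Guarded import with a fallback: if the optional fast path is missing,
# degrade gracefully instead of failing at import time.
try:
    from some_private_module import fast_parse  # hypothetical module
except ImportError:
    fast_parse = None  # fall back to a plain implementation


def parse(value):
    if fast_parse is not None:
        return fast_parse(value)
    return value.strip()  # simple fallback path


print(parse('  2Q2005  '))  # 2Q2005
```

In the benchmark file the `except` branch just does `pass`, so the benchmarks themselves fail at call time rather than poisoning the suite import on older asv versions.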

asv_bench/benchmarks/multiindex_object.py (15 additions, 1 deletion)

@@ -2,7 +2,7 @@
 
 import numpy as np
 import pandas.util.testing as tm
-from pandas import date_range, MultiIndex
+from pandas import date_range, MultiIndex, DataFrame
 
 
 class GetLoc:
@@ -126,4 +126,18 @@ def time_datetime_level_values_sliced(self, mi):
         mi[:10].values
 
 
+class CategoricalLevel:
+
+    def setup(self):
+
+        self.df = DataFrame({
+            'a': np.arange(1_000_000, dtype=np.int32),
+            'b': np.arange(1_000_000, dtype=np.int64),
+            'c': np.arange(1_000_000, dtype=float),
+        }).astype({'a': 'category', 'b': 'category'})
+
+    def time_categorical_level(self):
+        self.df.set_index(['a', 'b'])
+
+
 from .pandas_vb_common import setup  # noqa: F401
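`CategoricalLevel` above benchmarks `set_index` on category-dtype columns. A categorical column stores each distinct value once plus a small integer code per row; a stdlib sketch of that encoding (a deliberate simplification of pandas' internal representation):

```python
# Dictionary-encode a column: unique values stored once, plus an
# integer code per row pointing into them. Example data is invented.
values = ['a', 'b', 'a', 'c', 'b', 'a']
categories = sorted(set(values))               # ['a', 'b', 'c']
codes = [categories.index(v) for v in values]  # [0, 1, 0, 2, 1, 0]
print(categories, codes)
```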

asv_bench/benchmarks/rolling.py (0 additions, 6 deletions)

@@ -4,7 +4,6 @@
 
 class Methods:
 
-    sample_time = 0.2
     params = (['DataFrame', 'Series'],
               [10, 1000],
               ['int', 'float'],
@@ -23,7 +22,6 @@ def time_rolling(self, constructor, window, dtype, method):
 
 class ExpandingMethods:
 
-    sample_time = 0.2
     params = (['DataFrame', 'Series'],
               ['int', 'float'],
               ['median', 'mean', 'max', 'min', 'std', 'count', 'skew', 'kurt',
@@ -41,7 +39,6 @@ def time_expanding(self, constructor, dtype, method):
 
 class EWMMethods:
 
-    sample_time = 0.2
     params = (['DataFrame', 'Series'],
               [10, 1000],
               ['int', 'float'],
@@ -58,7 +55,6 @@ def time_ewm(self, constructor, window, dtype, method):
 
 
 class VariableWindowMethods(Methods):
-    sample_time = 0.2
     params = (['DataFrame', 'Series'],
               ['50s', '1h', '1d'],
               ['int', 'float'],
@@ -75,7 +71,6 @@ def setup(self, constructor, window, dtype, method):
 
 class Pairwise:
 
-    sample_time = 0.2
     params = ([10, 1000, None],
               ['corr', 'cov'],
               [True, False])
@@ -95,7 +90,6 @@ def time_pairwise(self, window, method, pairwise):
 
 
 class Quantile:
-    sample_time = 0.2
     params = (['DataFrame', 'Series'],
               [10, 1000],
               ['int', 'float'],

0 commit comments