
Commit 2cc6e48

timmie authored and jreback committed
inserted all tested changes again after reset to master
the merge conflict chaos was too hard ;-(
1 parent dc3ead3 commit 2cc6e48

15 files changed

+443
-20
lines changed

doc/source/basics.rst

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -465,6 +465,9 @@ take an optional ``axis`` argument:
465465
df.apply(lambda x: x.max() - x.min())
466466
df.apply(np.cumsum)
467467
df.apply(np.exp)
468+
469+
Note that by default the function is applied along the DataFrame's index
470+
(``axis=0``); to apply it along the columns, ``axis=1`` must be passed explicitly (see also :ref:`api.dataframe` and :ref:`api.series`).
468471

469472
Depending on the return type of the function passed to ``apply``, the result
470473
will either be of lower dimension or the same dimension.
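
The note on the default axis added in this hunk can be illustrated with a minimal sketch. The frame and its values below are made up for illustration and are not part of the commit.

```python
import pandas as pd

# Small illustrative frame (values are arbitrary).
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 40.0]})

# Default (axis=0): the function receives each column as a Series,
# so the result is indexed by the column labels.
col_range = df.apply(lambda x: x.max() - x.min())

# axis=1: the function receives each row as a Series,
# so the result is indexed by the row labels.
row_range = df.apply(lambda x: x.max() - x.min(), axis=1)

print(col_range)
print(row_range)
```

With the default, the result has one entry per column; with ``axis=1`` it has one entry per row.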

doc/source/dsintro.rst

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -431,6 +431,9 @@ available to insert at a particular location in the columns:
431431
432432
Indexing / Selection
433433
~~~~~~~~~~~~~~~~~~~~
434+
435+
.. _dsintro.basics-of-indexing:
436+
434437
The basics of indexing are as follows:
435438

436439
.. csv-table::
@@ -450,6 +453,20 @@ DataFrame:
450453
451454
df.loc['b']
452455
df.iloc[2]
456+
457+
There is also support for purely integer-based indexing provided by the following methods:
458+
459+
.. _dsintro.integer-indexing:
460+
461+
.. csv-table::
462+
:header: "Method","Description"
463+
:widths: 40,60
464+
465+
``Series.iget_value(i)``, Retrieve value stored at location ``i``
466+
``Series.iget(i)``, Alias for ``iget_value``
467+
``DataFrame.irow(i)``, Retrieve the ``i``-th row
468+
``DataFrame.icol(j)``, Retrieve the ``j``-th column
469+
"``DataFrame.iget_value(i, j)``", Retrieve the value at row ``i`` and column ``j``
453470

454471
For a more exhaustive treatment of more sophisticated label-based indexing and
455472
slicing, see the :ref:`section on indexing <indexing>`. We will address the
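
As a rough sketch of the integer-based access described in the table above, the same lookups can be written with ``.iloc``, which already appears in this hunk (``df.iloc[2]``). The data below is hypothetical, and the mapping to the ``i*`` methods in the comments is an assumption based on the table's descriptions, not something this commit states.

```python
import pandas as pd

# Hypothetical data with non-integer labels, so position and label differ.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])

value = s.iloc[1]      # value stored at location 1 (cf. Series.iget_value(1))
row = df.iloc[2]       # the row at position 2 (cf. DataFrame.irow(2))
col = df.iloc[:, 1]    # the column at position 1 (cf. DataFrame.icol(1))
cell = df.iloc[0, 1]   # value at row 0, column 1 (cf. DataFrame.iget_value(0, 1))

print(value, cell)
print(row)
print(col)
```

All four lookups go by position only, regardless of the labels in the index.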

doc/source/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -133,4 +133,5 @@ See the package overview for more detail about what's in the library.
133133
related
134134
comparison_with_r
135135
api
136+
shortcuts
136137
release

doc/source/shortcuts.rst

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
.. _shortcuts:
2+
3+
.. currentmodule:: pandas
4+
5+
****************************************
6+
Shortcuts within Pandas Documentation
7+
****************************************
8+
9+
This page gives quick access to sections in the documentation on commonly used parameters.
10+
It is not a cheatsheet; it only provides references, so that the reader does not need to bookmark reference pages or search through the documentation.
11+
12+
.. internal comment: these tables could also be marked by a special directive (overview)
13+
14+
Basics
15+
~~~~~~~~~~~~~~~~~~~~
16+
17+
* :ref:`The basics of indexing <dsintro.basics-of-indexing>` summarises the key methods for selection and indexing.
18+
* :ref:`Integer Indexing <dsintro.integer-indexing>` supplements with additional methods.
19+
20+
Time Series
21+
~~~~~~~~~~~~~~~~~~~~
22+
23+
* :ref:`DateOffset objects <timeseries.dateoffset>` lists the implemented frequencies.
24+
* :ref:`Common Aliases <timeseries.offset-aliases>` ease up referencing these frequencies.
25+
26+
27+
Helpers
28+
~~~~~~~~~~~~~~~~~~~~
29+
30+
* Quickly generate test data frames as shown :ref:`in this example <reshaping.reshapeing-pivoting>`.

doc/source/timeseries.rst

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -387,6 +387,8 @@ regularity will result in a ``DatetimeIndex`` (but frequency is lost):
387387
DateOffset objects
388388
------------------
389389

390+
.. _timeseries.dateoffset:
391+
390392
In the preceding examples, we created DatetimeIndex objects at various
391393
frequencies by passing in frequency strings like 'M', 'W', and 'BM' to the
392394
``freq`` keyword. Under the hood, these frequency strings are being translated
@@ -547,6 +549,8 @@ calendars which account for local holidays and local weekend conventions.
547549
Offset Aliases
548550
~~~~~~~~~~~~~~
549551

552+
.. _timeseries.offset-aliases:
553+
550554
A number of string aliases are given to useful common time series
551555
frequencies. We will refer to these aliases as *offset aliases*
552556
(referred to as *time rules* prior to v0.8.0).
@@ -1246,3 +1250,51 @@ The following are equivalent statements in the two versions of numpy.
12461250
y / np.timedelta64(1,'D')
12471251
y / np.timedelta64(1,'s')
12481252
1253+
Working with datetime-based indices
1254+
-------------------------------------------------------
1255+
1256+
:func:`pd.datetime` can be combined with ``DataFrame.apply`` to operate row by row on datetime information, such as that stored in a :ref:`timeseries.datetimeindex`.
1257+
1258+
Use cases are:
1259+
1260+
* calculation of sunrise, sunset, day length
1261+
* boolean test of working hours
1262+
1263+
An example contributed by a savvy user at `Stackoverflow <http://stackoverflow.com/a/15839530>`_:
1264+
1265+
.. ipython:: python
1266+
1267+
import pandas as pd
1268+
1269+
###1) create a date column from individual year, month, day columns
1270+
df = pd.DataFrame({"year": [1992, 2003, 2014], "month": [2,3,4], "day": [10,20,30]})
1271+
df
1272+
1273+
df["Date"] = df.apply(lambda x: pd.datetime(x['year'], x['month'], x['day']), axis=1)
1274+
df
1275+
1276+
###2) alternatively, use the equivalent to datetime.datetime.combine
1277+
import numpy as np
1278+
1279+
#create a hourly timeseries
1280+
data_randints = np.random.randint(1, 10, 4000)
1281+
data_randints = data_randints.reshape(1000, 4)
1282+
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000, freq='H'))
1283+
df = pd.DataFrame(data_randints, index=ts.index, columns=['A', 'B', 'C', 'D'])
1284+
df.head()
1285+
1286+
#only for exemplary purposes: get the date & time from the df.index
1287+
# in the real world, these would be read in or generated from different columns
1288+
df['date'] = df.index.date
1289+
df.head()
1290+
1291+
df['time'] = df.index.time
1292+
df.head()
1293+
1294+
#combine both:
1295+
df['datetime'] = df.apply((lambda x: pd.datetime.combine(x['date'], x['time'])), axis=1)
1296+
df.head()
1297+
1298+
#the index could be set to the created column
1299+
df = df.set_index(['datetime'])
1300+
df.head()
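
The two steps in the example above can also be sketched without the ``pd.datetime`` alias, for pandas versions where that alias is not available. This is an assumption-laden alternative, not the method used in the commit: it relies on ``pd.to_datetime`` accepting a frame with ``year``, ``month`` and ``day`` columns and on the standard library's ``datetime.combine``. The small frames are illustrative only.

```python
from datetime import datetime

import pandas as pd

# 1) Build a date column from individual year, month, day columns.
df = pd.DataFrame({"year": [1992, 2003, 2014], "month": [2, 3, 4], "day": [10, 20, 30]})
df["Date"] = pd.to_datetime(df[["year", "month", "day"]])

# 2) Split an hourly index into date and time parts, then recombine them.
idx = pd.date_range("2000-01-01", periods=4, freq=pd.Timedelta(hours=1))
frame = pd.DataFrame({"A": range(4)}, index=idx)
frame["date"] = frame.index.date
frame["time"] = frame.index.time

# datetime.combine works on one (date, time) pair at a time.
frame["datetime"] = [
    datetime.combine(d, t) for d, t in zip(frame["date"], frame["time"])
]

# The recombined column can serve as the index again.
frame = frame.set_index("datetime")
print(df)
print(frame.head())
```

The list comprehension plays the role of the row-wise ``apply`` call in the original example.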

doc/source/v0.13.0.txt

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -270,6 +270,10 @@ Bug Fixes
270270

271271
- Suppressed DeprecationWarning associated with internal calls issued by repr() (:issue:`4391`)
272272

273+
- read_excel (:issue:`4332`) supports a date_parser. This enables reading in hours like 01:00-24:00 in both `Excel datemodes <http://support.microsoft.com/kb/214330/en-us>`_
274+
275+
- (Excel) parser (:issue:`4340`) allows skipping an arbitrary number of lines between header and first row.
276+
273277
See the :ref:`full release notes
274278
<release>` or issue tracker
275279
on GitHub for a complete list.
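
The release note above about skipping lines between the header and the first data row concerns the Excel parser. As a loosely related sketch, the same idea can be shown with ``pd.read_csv``, using its ``header`` and ``skiprows`` parameters on in-memory text. The sample data is invented, and this does not exercise the Excel code path changed in this commit.

```python
import io

import pandas as pd

# Header on line 0, two non-data lines, then the actual rows.
raw = "a,b\n# units row\n# comment row\n1,2\n3,4\n"

# skiprows takes 0-based line numbers; lines 1 and 2 are dropped
# before parsing, so the header is followed directly by the data.
df = pd.read_csv(io.StringIO(raw), header=0, skiprows=[1, 2])
print(df)
```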

doc/source/v0.7.0.txt

Lines changed: 2 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -150,18 +150,8 @@ This change also has the same impact on DataFrame:
150150
In [5]: df.ix[3]
151151
KeyError: 3
152152

153-
In order to support purely integer-based indexing, the following methods have
154-
been added:
155-
156-
.. csv-table::
157-
:header: "Method","Description"
158-
:widths: 40,60
159-
160-
``Series.iget_value(i)``, Retrieve value stored at location ``i``
161-
``Series.iget(i)``, Alias for ``iget_value``
162-
``DataFrame.irow(i)``, Retrieve the ``i``-th row
163-
``DataFrame.icol(j)``, Retrieve the ``j``-th column
164-
"``DataFrame.iget_value(i, j)``", Retrieve the value at row ``i`` and column ``j``
153+
In order to support purely :ref:`integer-based indexing <dsintro.integer-indexing>`, corresponding methods have been added.
154+
165155

166156
API tweaks regarding label-based slicing
167157
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

pandas/io/date_converters.py

Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,6 @@
11
"""This module is designed for community supported date conversion functions"""
2+
from datetime import datetime, timedelta, time
3+
24
from pandas.compat import range
35
import numpy as np
46
import pandas.lib as lib
@@ -56,3 +58,76 @@ def _check_columns(cols):
5658
raise AssertionError()
5759

5860
return N
61+
62+
63+
## Datetime Conversion for date_parsers
64+
## see also: create a community supported set of typical converters
65+
## https://github.com/pydata/pandas/issues/1180
66+
67+
def offset_datetime(dt_in, days=0, hours=0, minutes=0,
68+
seconds=0, microseconds=0):
69+
'''apply corrective time offset using datetime.timedelta
70+
71+
input
72+
-----
73+
dt_in : datetime.time or datetime.datetime object
74+
days : integer value (positive or negative) for days component of offset
75+
hours : integer value (positive or negative) for hours component of offset
76+
minutes : integer value (positive or negative) for
77+
minutes component of offset
78+
seconds : integer value (positive or negative) for
79+
seconds component of offset
80+
microseconds : integer value (positive or negative) for
81+
microseconds component of offset
82+
83+
output
84+
------
85+
ti_corr : datetime.time or datetime.datetime object
86+
87+
88+
'''
89+
# if an Excel time reads like '23.07.2013 24:00', they actually mean
90+
# in Python '23.07.2013 23:59', must be converted
91+
# offset = -10 # minutes
92+
delta = timedelta(days=days, hours=hours, minutes=minutes,
93+
seconds=seconds, microseconds=microseconds)
94+
95+
# check whether the offset is to be applied to a datetime or a time
96+
if type(dt_in) is time:
97+
# create pseudo datetime
98+
dt_now = datetime.now()
99+
dt_base = datetime.combine(dt_now, dt_in)
100+
else:
101+
dt_base = dt_in
102+
103+
dt_corr = (dt_base) + delta
104+
105+
#if input is time, we return it.
106+
if type(dt_in) is time:
107+
dt_corr = dt_corr.time()
108+
109+
return dt_corr
110+
111+
112+
def dt2ti(dt_in):
113+
'''converts wrong datetime.datetime to datetime.time
114+
115+
input
116+
-----
117+
dt_in : datetime.time or datetime.datetime object
118+
119+
output
120+
-------
121+
ti_corr : datetime.time object
122+
'''
123+
# so we correct those which are not of type :mod:datetime.time
124+
# important hint:
125+
# http://stackoverflow.com/a/12906456
126+
if type(dt_in) is not time:
127+
dt_in = dt_in.time()
128+
elif type(dt_in) is datetime:
129+
dt_in = dt_in.time()
130+
else:
131+
pass
132+
133+
return dt_in
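
The offset logic added in this file can be sketched in a self-contained form using only the standard library. ``shift`` is a hypothetical helper written for illustration, not a function from this commit. Unlike ``offset_datetime``, it anchors bare times to a fixed date rather than to ``datetime.now()`` and uses ``isinstance`` rather than an exact type check; both are assumptions made here.

```python
from datetime import date, datetime, time, timedelta


def shift(value, **offset):
    """Hypothetical helper: add a timedelta to a datetime or a bare time."""
    delta = timedelta(**offset)
    if isinstance(value, datetime):
        return value + delta
    # datetime.time does not support arithmetic, so attach it to an
    # arbitrary fixed date, add the offset, and keep only the time part.
    anchored = datetime.combine(date(2000, 1, 1), value)
    return (anchored + delta).time()


# A bare time stays a time; a datetime stays a datetime.
print(shift(time(23, 59), minutes=-10))
print(shift(datetime(2013, 7, 23, 12, 0), hours=1))
```

Using a fixed anchor date keeps the result independent of when the code runs. Note that a time shifted past midnight wraps around, because the date part is discarded.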

pandas/io/excel.py

Lines changed: 27 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -127,15 +127,18 @@ def parse(self, sheetname, header=0, skiprows=None, skip_footer=0,
127127
skipfooter = kwds.pop('skipfooter', None)
128128
if skipfooter is not None:
129129
skip_footer = skipfooter
130-
131-
return self._parse_excel(sheetname, header=header, skiprows=skiprows,
130+
131+
# this now gives back a df
132+
res = self._parse_excel(sheetname, header=header, skiprows=skiprows,
132133
index_col=index_col,
133134
has_index_names=has_index_names,
134135
parse_cols=parse_cols,
135136
parse_dates=parse_dates,
136137
date_parser=date_parser, na_values=na_values,
137138
thousands=thousands, chunksize=chunksize,
138139
skip_footer=skip_footer, **kwds)
140+
141+
return res
139142

140143
def _should_parse(self, i, parse_cols):
141144

@@ -195,11 +198,24 @@ def _parse_excel(self, sheetname, header=0, skiprows=None, skip_footer=0,
195198
if parse_cols is None or should_parse[j]:
196199
if typ == XL_CELL_DATE:
197200
dt = xldate_as_tuple(value, datemode)
201+
198202
# how to produce this first case?
203+
# if the year is ZERO then values are time/hours
199204
if dt[0] < datetime.MINYEAR: # pragma: no cover
200-
value = datetime.time(*dt[3:])
205+
datemode = 1
206+
dt = xldate_as_tuple(value, datemode)
207+
208+
value = datetime.time(*dt[3:])
209+
210+
211+
#or insert a full date
201212
else:
202213
value = datetime.datetime(*dt)
214+
215+
#apply eventual date_parser correction
216+
if date_parser:
217+
value = date_parser(value)
218+
203219
elif typ == XL_CELL_ERROR:
204220
value = np.nan
205221
elif typ == XL_CELL_BOOLEAN:
@@ -221,8 +237,15 @@ def _parse_excel(self, sheetname, header=0, skiprows=None, skip_footer=0,
221237
skip_footer=skip_footer,
222238
chunksize=chunksize,
223239
**kwds)
240+
res = parser.read()
241+
242+
if header is not None:
243+
244+
if len(data[header]) == len(res.columns.tolist()):
245+
res.columns = data[header]
246+
224247

225-
return parser.read()
248+
return res
226249

227250
@property
228251
def sheet_names(self):

pandas/io/parsers.py

Lines changed: 30 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1150,7 +1150,11 @@ def TextParser(*args, **kwds):
11501150
returns Series if only one column
11511151
"""
11521152
kwds['engine'] = 'python'
1153-
return TextFileReader(*args, **kwds)
1153+
1154+
res = TextFileReader(*args, **kwds)
1155+
1156+
1157+
return res
11541158

11551159
# delimiter=None, dialect=None, names=None, header=0,
11561160
# index_col=None,
@@ -1385,6 +1389,7 @@ def _convert_data(self, data):
13851389
clean_conv)
13861390

13871391
def _infer_columns(self):
1392+
#TODO: this full part is too complex and somewhat strange!!!
13881393
names = self.names
13891394

13901395
if self.header is not None:
@@ -1396,13 +1401,20 @@ def _infer_columns(self):
13961401
header = list(header) + [header[-1]+1]
13971402
else:
13981403
have_mi_columns = False
1404+
#TODO: explain why header (in this case 1 number) needs to be a list???
13991405
header = [ header ]
14001406

14011407
columns = []
14021408
for level, hr in enumerate(header):
1403-
1409+
#TODO: explain why self.buf is needed.
1410+
# the header is correctly retrieved in excel.py by
1411+
# data[header] = _trim_excel_header(data[header])
14041412
if len(self.buf) > 0:
14051413
line = self.buf[0]
1414+
1415+
elif (header[0] == hr) and (level == 0) and (header[0] > 0):
1416+
line = self._get_header()
1417+
14061418
else:
14071419
line = self._next_line()
14081420

@@ -1456,8 +1468,24 @@ def _infer_columns(self):
14561468
columns = [ names ]
14571469

14581470
return columns
1471+
1472+
def _get_header(self):
1473+
''' reads header if e.g. header
1474+
FIXME: this should be turned into something much less complicated
1475+
FIXME: all due to the header assuming that there is never a row between
1476+
data and header
1477+
'''
1478+
if isinstance(self.data, list):
1479+
line = self.data[self.header]
1480+
self.pos = self.header +1
1481+
else:
1482+
line = self._next_line()
1483+
1484+
return line
14591485

14601486
def _next_line(self):
1487+
#FIXME: why is self.data at times a list and sometimes a _csv.reader??
1488+
# reduce complexity here!!!
14611489
if isinstance(self.data, list):
14621490
while self.pos in self.skiprows:
14631491
self.pos += 1
Binary file not shown.
Binary file not shown.
