inserted all tested changes again after reset to master

timmie · jreback · commit 2cc6e48a9bbc · 2013-08-21T21:38:17.000-04:00
the merge conflict chaos was too hard ;-(
diff --git a/doc/source/basics.rst b/doc/source/basics.rst
@@ -465,6 +465,9 @@ take an optional ``axis`` argument:
    df.apply(lambda x: x.max() - x.min())
    df.apply(np.cumsum)
    df.apply(np.exp)
+   
+Note that by default the function is applied along the DataFrame's index
+(``axis=0``), i.e. to each column. To apply it along the columns (to each row), pass ``axis=1`` explicitly (see also :ref:`api.dataframe` and :ref:`api.series`).
 
 Depending on the return type of the function passed to ``apply``, the result
 will either be of lower dimension or the same dimension.
diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
@@ -431,6 +431,9 @@ available to insert at a particular location in the columns:
 
 Indexing / Selection
 ~~~~~~~~~~~~~~~~~~~~
+
+.. _dsintro.basics-of-indexing:
+
 The basics of indexing are as follows:
 
 .. csv-table::
@@ -450,6 +453,20 @@ DataFrame:
 
    df.loc['b']
    df.iloc[2]
+   
+There is also support for purely integer-based indexing provided by the following methods:
+
+.. _dsintro.integer-indexing:
+
+.. csv-table::
+    :header: "Method","Description"
+    :widths: 40,60
+
+	``Series.iget_value(i)``, Retrieve value stored at location ``i``
+	``Series.iget(i)``, Alias for ``iget_value``
+	``DataFrame.irow(i)``, Retrieve the ``i``-th row
+	``DataFrame.icol(j)``, Retrieve the ``j``-th column
+	"``DataFrame.iget_value(i, j)``", Retrieve the value at row ``i`` and column ``j``
 
 For a more exhaustive treatment of more sophisticated label-based indexing and
 slicing, see the :ref:`section on indexing <indexing>`. We will address the
diff --git a/doc/source/index.rst b/doc/source/index.rst
@@ -133,4 +133,5 @@ See the package overview for more detail about what's in the library.
     related
     comparison_with_r
     api
+    shortcuts
     release
diff --git a/doc/source/shortcuts.rst b/doc/source/shortcuts.rst
@@ -0,0 +1,30 @@
+.. _shortcuts:
+
+.. currentmodule:: pandas
+
+****************************************
+Shortcuts within the pandas Documentation
+****************************************
+
+This page gives quick access to sections of the documentation that cover commonly used parameters.
+It is not a cheat sheet: it only provides references, so that the reader does not need to bookmark reference pages or search through the documentation.
+
+.. internal comment: these tables could also be marked by a special directive (overview)
+
+Basics
+~~~~~~~~~~~~~~~~~~~~
+
+* :ref:`The basics of indexing <dsintro.basics-of-indexing>` summarises the key methods for selection and indexing.
+* :ref:`Integer Indexing <dsintro.integer-indexing>` supplements these with additional methods.
+
+Time Series
+~~~~~~~~~~~~~~~~~~~~
+
+* :ref:`DateOffset objects <timeseries.dateoffset>` lists the implemented frequencies.
+* :ref:`Common Aliases <timeseries.offset-aliases>` make it easier to refer to these frequencies.
+
+
+Helpers
+~~~~~~~~~~~~~~~~~~~~
+
+* Quickly generate test data frames as shown :ref:`in this example <reshaping.reshapeing-pivoting>`.
diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst
@@ -387,6 +387,8 @@ regularity will result in a ``DatetimeIndex`` (but frequency is lost):
 DateOffset objects
 ------------------
 
+.. _timeseries.dateoffset:
+
 In the preceding examples, we created DatetimeIndex objects at various
 frequencies by passing in frequency strings like 'M', 'W', and 'BM to the
 ``freq`` keyword. Under the hood, these frequency strings are being translated
@@ -547,6 +549,8 @@ calendars which account for local holidays and local weekend conventions.
 Offset Aliases
 ~~~~~~~~~~~~~~
 
+.. _timeseries.offset-aliases:
+
 A number of string aliases are given to useful common time series
 frequencies. We will refer to these aliases as *offset aliases*
 (referred to as *time rules* prior to v0.8.0).
@@ -1246,3 +1250,51 @@ The following are equivalent statements in the two versions of numpy.
        y / np.timedelta64(1,'D')
        y / np.timedelta64(1,'s')
 
+Working with datetime-based indices
+-------------------------------------------------------
+
+``pd.datetime`` (the standard :class:`datetime.datetime` class as exposed by pandas) can be applied row by row to build datetime values from information stored in separate columns or in a :ref:`timeseries.datetimeindex`.
+
+Use cases are:
+
+* calculation of sunrise, sunset, day length
+* boolean test of working hours
+
+An example contributed by a savvy user on `Stack Overflow <http://stackoverflow.com/a/15839530>`_:
+
+.. ipython:: python
+
+   import pandas as pd
+
+   ###1) create a date column from individual year, month, day columns
+   df = pd.DataFrame({"year": [1992, 2003, 2014], "month": [2,3,4], "day": [10,20,30]})
+   df
+
+   df["Date"] = df.apply(lambda x: pd.datetime(x['year'], x['month'], x['day']), axis=1)
+   df
+   
+   ###2) alternatively, use the equivalent to datetime.datetime.combine
+   import numpy as np
+   
+   #create an hourly timeseries
+   data_randints = np.random.randint(1, 10, 4000)
+   data_randints = data_randints.reshape(1000, 4)
+   ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000, freq='H'))
+   df = pd.DataFrame(data_randints, index=ts.index, columns=['A', 'B', 'C', 'D'])
+   df.head()
+      
+   #for illustration only: get the date & time from the df.index
+   #                       in the real world, these would be read in or generated from different columns
+   df['date'] = df.index.date
+   df.head()
+   
+   df['time'] = df.index.time
+   df.head()
+   
+   #combine both:
+   df['datetime'] = df.apply((lambda x: pd.datetime.combine(x['date'], x['time'])), axis=1)
+   df.head()
+   
+   #the index could be set to the created column
+   df = df.set_index(['datetime'])
+   df.head()
diff --git a/doc/source/v0.13.0.txt b/doc/source/v0.13.0.txt
@@ -270,6 +270,10 @@ Bug Fixes
 
   - Suppressed DeprecationWarning associated with internal calls issued by repr() (:issue:`4391`)
 
+  - read_excel (:issue:`4332`) supports a date_parser. This makes it possible to read in hours like 01:00-24:00 in both `Excel datemodes <http://support.microsoft.com/kb/214330/en-us>`_
+
+  - (Excel) parser (:issue:`4340`) allows skipping an arbitrary number of lines between header and first row. 
+
 See the :ref:`full release notes
 <release>` or issue tracker
 on GitHub for a complete list.
diff --git a/doc/source/v0.7.0.txt b/doc/source/v0.7.0.txt
@@ -150,18 +150,8 @@ This change also has the same impact on DataFrame:
    In [5]: df.ix[3]
    KeyError: 3
 
-In order to support purely integer-based indexing, the following methods have
-been added:
-
-.. csv-table::
-    :header: "Method","Description"
-    :widths: 40,60
-
-	``Series.iget_value(i)``, Retrieve value stored at location ``i``
-	``Series.iget(i)``, Alias for ``iget_value``
-	``DataFrame.irow(i)``, Retrieve the ``i``-th row
-	``DataFrame.icol(j)``, Retrieve the ``j``-th column
-	"``DataFrame.iget_value(i, j)``", Retrieve the value at row ``i`` and column ``j``
+In order to support purely integer-based indexing, :ref:`corresponding methods <dsintro.integer-indexing>` have been added.
+
 
 API tweaks regarding label-based slicing
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/pandas/io/date_converters.py b/pandas/io/date_converters.py
@@ -1,4 +1,6 @@
 """This module is designed for community supported date conversion functions"""
+from datetime import datetime, timedelta, time
+
 from pandas.compat import range
 import numpy as np
 import pandas.lib as lib
@@ -56,3 +58,76 @@ def _check_columns(cols):
             raise AssertionError()
 
     return N
+
+
+## Datetime Conversion for date_parsers
+## see also: create a community supported set of typical converters
+##           https://github.com/pydata/pandas/issues/1180
+
+def offset_datetime(dt_in, days=0, hours=0, minutes=0,
+                    seconds=0, microseconds=0):
+    '''Apply a corrective time offset using datetime.timedelta
+
+    input
+    -----
+    dt_in : datetime.time or datetime.datetime object
+    days : integer value (positive or negative) for days component of offset
+    hours : integer value (positive or negative) for hours component of offset
+    minutes : integer value (positive or negative) for
+              minutes component of offset
+    seconds : integer value (positive or negative) for
+              seconds component of offset
+    microseconds : integer value (positive or negative) for
+                   microseconds component of offset
+
+    output
+    ------
+    dt_corr : datetime.time or datetime.datetime object
+
+
+    '''
+    # An Excel time like '23.07.2013 24:00' actually means
+    # '23.07.2013 23:59' in Python and must be converted
+    # offset = -10  # minutes
+    delta = timedelta(days=days, hours=hours, minutes=minutes,
+                      seconds=seconds, microseconds=microseconds)
+
+    # check whether the offset is to be applied to a datetime or a time
+    if isinstance(dt_in, time):
+        # create a pseudo datetime, since a time cannot be added to a timedelta
+        dt_now = datetime.now()
+        dt_base = datetime.combine(dt_now, dt_in)
+    else:
+        dt_base = dt_in
+
+    dt_corr = dt_base + delta
+
+    # if the input was a time, return only the time component
+    if isinstance(dt_in, time):
+        dt_corr = dt_corr.time()
+
+    return dt_corr
+
+
+def dt2ti(dt_in):
+    '''Convert a datetime.datetime that should be a time to datetime.time
+
+    input
+    -----
+    dt_in : datetime.time or datetime.datetime object
+
+    output
+    ------
+    ti_corr : datetime.time object
+    '''
+    # Correct values that are not of type datetime.time.
+    # Important hint:
+    # http://stackoverflow.com/a/12906456
+    if not isinstance(dt_in, time):
+        # Anything that is not already a datetime.time
+        # (typically a datetime.datetime, which is not a
+        # subclass of datetime.time) is reduced to its
+        # time-of-day component.
+        dt_in = dt_in.time()
+
+    return dt_in
diff --git a/pandas/io/excel.py b/pandas/io/excel.py
@@ -127,15 +127,18 @@ def parse(self, sheetname, header=0, skiprows=None, skip_footer=0,
         skipfooter = kwds.pop('skipfooter', None)
         if skipfooter is not None:
             skip_footer = skipfooter
-
-        return self._parse_excel(sheetname, header=header, skiprows=skiprows,
+        
+        # this now gives back a df
+        res = self._parse_excel(sheetname, header=header, skiprows=skiprows,
                                  index_col=index_col,
                                  has_index_names=has_index_names,
                                  parse_cols=parse_cols,
                                  parse_dates=parse_dates,
                                  date_parser=date_parser, na_values=na_values,
                                  thousands=thousands, chunksize=chunksize,
                                  skip_footer=skip_footer, **kwds)
+                    
+        return res
 
     def _should_parse(self, i, parse_cols):
 
@@ -195,11 +198,24 @@ def _parse_excel(self, sheetname, header=0, skiprows=None, skip_footer=0,
                 if parse_cols is None or should_parse[j]:
                     if typ == XL_CELL_DATE:
                         dt = xldate_as_tuple(value, datemode)
+                        
                         # how to produce this first case?
+                        # if the year is ZERO then values are time/hours
                         if dt[0] < datetime.MINYEAR:  # pragma: no cover
-                            value = datetime.time(*dt[3:])
+                            # 1904 date system, only for this time cell
+                            dt = xldate_as_tuple(value, 1)
+                            
+                            value = datetime.time(*dt[3:])  
+                                     
+
+                        # otherwise insert a full datetime
                         else:
                             value = datetime.datetime(*dt)
+                        
+                        # apply the date_parser correction, if any
+                        if date_parser:
+                            value = date_parser(value)
+                            
                     elif typ == XL_CELL_ERROR:
                         value = np.nan
                     elif typ == XL_CELL_BOOLEAN:
@@ -221,8 +237,15 @@ def _parse_excel(self, sheetname, header=0, skiprows=None, skip_footer=0,
                             skip_footer=skip_footer,
                             chunksize=chunksize,
                             **kwds)
+        res = parser.read()
+
+        # use the header row as column names when the
+        # lengths match
+        if header is not None:
+            if len(data[header]) == len(res.columns):
+                res.columns = data[header]
 
-        return parser.read()
+        return res
 
     @property
     def sheet_names(self):
diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py
@@ -1150,7 +1150,11 @@ def TextParser(*args, **kwds):
         returns Series if only one column
     """
     kwds['engine'] = 'python'
-    return TextFileReader(*args, **kwds)
+    
+    res = TextFileReader(*args, **kwds)
+    
+    
+    return res
 
 # delimiter=None, dialect=None, names=None, header=0,
 # index_col=None,
@@ -1385,6 +1389,7 @@ def _convert_data(self, data):
                                          clean_conv)
 
     def _infer_columns(self):
+        # TODO: this whole part is too complex and somewhat strange
         names = self.names
 
         if self.header is not None:
@@ -1396,13 +1401,20 @@ def _infer_columns(self):
                 header = list(header) + [header[-1]+1]
             else:
                 have_mi_columns = False
+                # TODO: explain why header (in this case a single number) needs to be a list
                 header = [ header ]
 
             columns = []
             for level, hr in enumerate(header):
-
+                #TODO: explain why self.buf is needed.
+                #      the header is correctly retrieved in excel.py by
+                #      data[header] = _trim_excel_header(data[header])
                 if len(self.buf) > 0:
                     line = self.buf[0]
+
+                elif (header[0] == hr) and (level == 0) and (header[0] > 0):
+                    line = self._get_header()
+                    
                 else:
                     line = self._next_line()
 
@@ -1456,8 +1468,24 @@ def _infer_columns(self):
                 columns = [ names ]
 
         return columns
+        
+    def _get_header(self):
+        ''' Read the header line.
+        FIXME: this should be turned into something much less complicated
+        FIXME: all due to the header assuming that there is never a row between
+               data and header
+        '''
+        if isinstance(self.data, list):
+            line = self.data[self.header]
+            self.pos = self.header + 1
+        else:
+            line = self._next_line()
+        
+        return line
 
     def _next_line(self):
+        # FIXME: why is self.data sometimes a list and sometimes a _csv.reader?
+        #        reduce complexity here
         if isinstance(self.data, list):
             while self.pos in self.skiprows:
                 self.pos += 1
diff --git a/pandas/io/tests/data/example_file_2013-07-25.xlsx b/pandas/io/tests/data/example_file_2013-07-25.xlsx
diff --git a/pandas/io/tests/data/example_file_2013-07-25_1904-dates.xlsx b/pandas/io/tests/data/example_file_2013-07-25_1904-dates.xlsx
diff --git a/pandas/io/tests/test_date_converters.py b/pandas/io/tests/test_date_converters.py
diff --git a/pandas/io/tests/test_excel.py b/pandas/io/tests/test_excel.py
diff --git a/pandas/io/tests/test_parsers.py b/pandas/io/tests/test_parsers.py
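
Usage sketch for the new converter (not part of the commit). The excel.py change above calls `date_parser(value)` once per date cell, so a corrective parser can be a small callable built on `offset_datetime`. The block below is a standalone approximation: `offset_datetime` is re-implemented here so the snippet runs without this branch installed, and `shift_back_one_minute` is a hypothetical name chosen only for illustration.

```python
from datetime import datetime, time, timedelta


def offset_datetime(dt_in, days=0, hours=0, minutes=0,
                    seconds=0, microseconds=0):
    """Standalone copy of the helper from the diff, for illustration only."""
    delta = timedelta(days=days, hours=hours, minutes=minutes,
                      seconds=seconds, microseconds=microseconds)
    if isinstance(dt_in, time):
        # a bare time cannot be added to a timedelta, so borrow today's date
        shifted = datetime.combine(datetime.now(), dt_in) + delta
        return shifted.time()
    return dt_in + delta


def shift_back_one_minute(value):
    # hypothetical date_parser callable: receives one cell value at a time
    return offset_datetime(value, minutes=-1)


# a full datetime keeps its date part and rolls back over midnight
print(shift_back_one_minute(datetime(2013, 7, 24, 0, 0)))
# a bare time wraps around to the end of the previous day
print(shift_back_one_minute(time(0, 0)))
```

With the patched reader, such a callable would presumably be passed as `date_parser=shift_back_one_minute`; that call is not exercised here because it depends on the Excel test files added by this commit.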