Add input_df init argument to pass df/series to transformers #85

Merged (1 commit) on Apr 17, 2017

@dukebody (Collaborator) commented on Apr 9, 2017

Work towards #60.

Passing Series/DataFrames to the transformers

By default, the transformers are passed a numpy array of the selected columns
as input. This is because sklearn transformers were historically designed to
work with numpy arrays, not with pandas DataFrames, even though their basic
indexing interfaces are similar.
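The following minimal sketch makes the difference concrete. It models the column-selection step. `select_columns` is a hypothetical helper written for illustration and is not sklearn-pandas's internal code. It assumes only that the mapper selects columns with ordinary pandas indexing and, unless `input_df=True`, converts the result to a numpy array:

```python
import numpy as np
import pandas as pd


def select_columns(df, cols, input_df=False):
    """Hypothetical helper: return the data a transformer would receive.

    With input_df=False the pandas selection is converted to a plain
    numpy array; with input_df=True the Series/DataFrame is passed through.
    """
    selected = df[cols]  # a single name gives a Series, a list gives a DataFrame
    return selected if input_df else selected.to_numpy()


df = pd.DataFrame({
    'dates': pd.date_range('2015-10-30', '2015-11-02'),
    'n': [1, 2, 3, 4],
})

as_array = select_columns(df, 'dates')                          # default behaviour
as_series = select_columns(df, 'dates', input_df=True)          # single column
as_frame = select_columns(df, ['dates', 'n'], input_df=True)    # several columns

print(isinstance(as_array, np.ndarray))    # True: no pandas accessors available
print(isinstance(as_series, pd.Series))    # True: the .dt accessor is available
print(isinstance(as_frame, pd.DataFrame))  # True: column names are preserved
```

Passing the pandas object through unchanged lets a transformer use pandas-only features such as `.dt` or column names. The numpy conversion keeps compatibility with transformers that expect arrays.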

However, we can pass a DataFrame/Series to the transformers to handle custom
cases by initializing the DataFrame mapper with ``input_df=True``::

    >>> import pandas as pd
    >>> from sklearn.base import TransformerMixin
    >>> from sklearn_pandas import DataFrameMapper
    >>> class DateEncoder(TransformerMixin):
    ...     def fit(self, X, y=None):
    ...         return self
    ...
    ...     def transform(self, X):
    ...         # X is a pandas Series here, so the .dt accessor is available
    ...         dt = X.dt
    ...         return pd.concat([dt.year, dt.month, dt.day], axis=1)
    >>> dates_df = pd.DataFrame(
    ...     {'dates': pd.date_range('2015-10-30', '2015-11-02')})
    >>> mapper_dates = DataFrameMapper([
    ...     ('dates', DateEncoder())
    ... ], input_df=True)
    >>> mapper_dates.fit_transform(dates_df)
    array([[2015,   10,   30],
           [2015,   10,   31],
           [2015,   11,    1],
           [2015,   11,    2]])
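`input_df=True` matters for `DateEncoder` because the `.dt` accessor exists only on pandas Series, and a plain numpy array has no such attribute. The sketch below depends only on pandas. It omits `TransformerMixin` and `DataFrameMapper` so that it runs without scikit-learn or sklearn-pandas, and it calls the encoder's logic directly on both kinds of input:

```python
import pandas as pd


class DateEncoder:
    """Same logic as the DateEncoder above, without the sklearn mixin."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        dt = X.dt  # only exists when X is a pandas Series of datetimes
        return pd.concat([dt.year, dt.month, dt.day], axis=1)


dates = pd.Series(pd.date_range('2015-10-30', '2015-11-02'), name='dates')
encoder = DateEncoder().fit(dates)

# Series input (what the transformer receives with input_df=True) works.
encoded = encoder.transform(dates)
print(encoded.to_numpy().tolist())
# [[2015, 10, 30], [2015, 10, 31], [2015, 11, 1], [2015, 11, 2]]

# numpy input (the default) fails, because ndarray has no .dt accessor.
try:
    encoder.transform(dates.to_numpy())
    numpy_ok = True
except AttributeError:
    numpy_ok = False
print('numpy input accepted:', numpy_ok)  # False
```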

@dukebody (Collaborator, Author) commented on Apr 9, 2017

@jph00, @hshteingart, can you check this and confirm that it works correctly for your use cases?

@jph00 left a comment

Thanks! :) I don't have time to study it in depth right now, but it looks pretty good to me.

@dukebody merged commit c50565c into master on Apr 17, 2017
@dukebody deleted the transformers-input-df branch on April 17, 2017 09:53