Weird behavior with SimpleFunctionTransformer #6

Open
bingcao opened this issue Apr 30, 2018 · 1 comment
Labels
bug Something isn't working

Comments

@bingcao
Contributor

bingcao commented Apr 30, 2018

The following code does work:

transformer = [
    LagImputer(groupby_kwargs={'level': 'city'}),
    Imputer(),
    StandardScaler(),
    SimpleFunctionTransformer(
        lambda df: np.mean(df, axis=1)
    ),
]

but this doesn't:

transformer = [
    LagImputer(groupby_kwargs={'level': 'city'}),
    Imputer(),
    SimpleFunctionTransformer(
        lambda df: np.mean(df, axis=1)
    ),
    StandardScaler()
]

and it gives this stack trace:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-103-10686064a0e4> in <module>()
----> 1 mapper.fit(X_df, y_df)

~/dengue_prediction/dengue_prediction_env/lib/python3.6/site-packages/sklearn_pandas/dataframe_mapper.py in fit(self, X, y)
    212                 with add_column_names_to_exception(columns):
    213                     Xt = self._get_col_subset(X, columns, input_df)
--> 214                     _call_fit(transformers.fit, Xt, y)
    215 
    216         # handle features not explicitly selected

~/dengue_prediction/dengue_prediction_env/lib/python3.6/site-packages/sklearn_pandas/pipeline.py in _call_fit(fit_method, X, y, **kwargs)
     22     """
     23     try:
---> 24         return fit_method(X, y, **kwargs)
     25     except TypeError:
     26         # fit takes only one argument

~/dengue_prediction/dengue_prediction_env/lib/python3.6/site-packages/fhub_core/feature.py in wrapped(X, y, **kwargs)
     46                     "Converting using approach '{}'".format(convert.__name__))
     47                 if y is not None:
---> 48                     return func(convert(X), y=convert(y), **kwargs)
     49                 else:
     50                     return func(convert(X), **kwargs)

~/dengue_prediction/dengue_prediction_env/lib/python3.6/site-packages/sklearn_pandas/pipeline.py in fit(self, X, y, **fit_params)
     74 
     75     def fit(self, X, y=None, **fit_params):
---> 76         Xt, fit_params = self._pre_transform(X, y, **fit_params)
     77         _call_fit(self.steps[-1][-1].fit, Xt, y, **fit_params)
     78         return self

~/dengue_prediction/dengue_prediction_env/lib/python3.6/site-packages/sklearn_pandas/pipeline.py in _pre_transform(self, X, y, **fit_params)
     67             if hasattr(transform, "fit_transform"):
     68                 Xt = _call_fit(transform.fit_transform,
---> 69                                Xt, y, **fit_params_steps[name])
     70             else:
     71                 Xt = _call_fit(transform.fit,

~/dengue_prediction/dengue_prediction_env/lib/python3.6/site-packages/sklearn_pandas/pipeline.py in _call_fit(fit_method, X, y, **kwargs)
     22     """
     23     try:
---> 24         return fit_method(X, y, **kwargs)
     25     except TypeError:
     26         # fit takes only one argument

~/dengue_prediction/dengue_prediction_env/lib/python3.6/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    518         else:
    519             # fit method of arity 2 (supervised transformation)
--> 520             return self.fit(X, y, **fit_params).transform(X)
    521 
    522 

~/dengue_prediction/dengue_prediction_env/lib/python3.6/site-packages/fhub_transformers/base.py in transform(self, X, **transform_kwargs)
     36     def transform(self, X, **transform_kwargs):
     37         if self.groupby_kwargs:
---> 38             call = X.sort_index().groupby(**self.groupby_kwargs).apply
     39         else:
     40             call = X.sort_index().pipe

AttributeError: ['ndvi_se', 'ndvi_sw', 'ndvi_ne', 'ndvi_nw']: 'numpy.ndarray' object has no attribute 'sort_index'

@micahjsmith
Owner

micahjsmith commented Apr 30, 2018

Simple issue, inscrutable debugging, complicated fix.

The output of np.mean has shape (1456,), which is 1-D, and sklearn then complains about 1-D arrays. This reproduces it more clearly:

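import numpy as np
from sklearn.preprocessing import StandardScaler

df = X_df[input]
tmp = np.mean(df, axis=1)  # row-wise mean, 1-D result of shape (1456,)
StandardScaler().fit_transform(tmp)  # fails: the scaler expects 2-D input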
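
For anyone without the original data, here is a minimal self-contained sketch of the same failure. The random array is only a stand-in for X_df[input] (the 1456 x 4 shape is an assumption chosen to match the numbers in this thread), and the exact wording of the error depends on the sklearn version.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real features: 1456 rows, 4 columns (assumed shape).
rng = np.random.default_rng(0)
X = rng.normal(size=(1456, 4))

# A row-wise mean collapses the column axis and leaves a 1-D array.
tmp = np.mean(X, axis=1)
shape_1d = tmp.shape

# StandardScaler expects 2-D input of shape (n_samples, n_features).
try:
    StandardScaler().fit_transform(tmp)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(shape_1d)
print(error_message)
```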

The fix, which requires you to understand the intricacies of sklearn, is to reshape the result into a single column:

transformer = [
    # ...
    SimpleFunctionTransformer(
        lambda df: np.mean(df, axis=1).reshape(-1, 1)
    ),
    # ...
]
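
As a rough check that the reshape is enough, the sketch below runs the same idea end to end using only plain sklearn. FunctionTransformer is used here as a stand-in for SimpleFunctionTransformer, which is an assumption (the two may behave differently, for example around DataFrame inputs), and the random array again replaces the real data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Synthetic stand-in for the real features (assumed shape 1456 x 4).
X = np.random.default_rng(0).normal(size=(1456, 4))

# Row-wise mean reshaped to a single column, so the next step sees 2-D input.
row_mean = FunctionTransformer(
    lambda arr: np.mean(arr, axis=1).reshape(-1, 1),
    validate=False,
)

pipeline = make_pipeline(row_mean, StandardScaler())
scaled = pipeline.fit_transform(X)

print(scaled.shape)
```

For NumPy arrays, np.mean(arr, axis=1, keepdims=True) produces the same (n, 1) shape without the explicit reshape.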

Though having to understand this is what fhub_core is trying to avoid.

@micahjsmith micahjsmith added the bug Something isn't working label Apr 30, 2018