[Question] My Question? #1735

Estefano13 · 2024-08-22T12:05:33Z

I'm trying to produce learning curves for a couple of models. The models were initially trained and saved using pickle. After loading them, refitting on the whole training set works without issues. However, when I refit the models on subsets of the training data, the refit fails with: ValueError: SelectClassificationRates removed all features.

My data is an array of non-negative floats. Because I am trying to get scores at different training set sizes, the training subsets sometimes have more features than samples. Could this be responsible for the problem? The model was originally fitted on the whole training set using GroupKFold as the CV strategy.
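A minimal sketch of how a rate-based univariate selector can end up with zero features on a small subset. It assumes SelectClassificationRates behaves like scikit-learn's GenericUnivariateSelect with a p-value threshold; that is an assumption about the component, not something confirmed here. The data is synthetic, and the `alpha` value is deliberately strict so the effect is unambiguous. `n_features_kept` is a hypothetical helper.

```python
import numpy as np
from sklearn.feature_selection import GenericUnivariateSelect, f_classif


def n_features_kept(X, y, alpha=1e-6, mode="fwe"):
    """Fit a p-value based univariate selector and count the surviving features."""
    selector = GenericUnivariateSelect(score_func=f_classif, mode=mode, param=alpha)
    selector.fit(X, y)
    return int(selector.get_support().sum())


# Synthetic stand-in for the real data (assumption: 20 features, one weakly informative).
rng = np.random.default_rng(0)
n_samples, n_features = 2000, 20
y = np.arange(n_samples) % 2            # alternating labels: every prefix has both classes
X = rng.normal(size=(n_samples, n_features))
X[:, 0] += 0.5 * y                      # only feature 0 carries signal

kept_full = n_features_kept(X, y)              # many samples: the signal is detectable
kept_small = n_features_kept(X[:10], y[:10])   # 10 samples, 20 features: too little evidence

print("features kept on full data:   ", kept_full)
print("features kept on small subset:", kept_small)
```

With few samples the per-feature p-values rarely fall below the threshold, so the selector may keep nothing. That would be consistent with failures clustering at the smaller training sizes.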

The failure depends on which subset is used for training, and it occurs more often on smaller subsets. Here is the error output:

ValueError Traceback (most recent call last)
in <cell line: 1>()
----> 1 learning_curve_AutoML(my_model_, X_train.to_numpy(), y_train, groups = X_amplitude_train["Sample ID"], cv=5, scoring = "balanced_accuracy", train_sizes=np.linspace(.3, 1.0, 5))

11 frames
in learning_curve_AutoML(estimator_, X, y, groups, train_sizes, scoring, cv, ret_data)
36
37 print(X_train_0.shape)
---> 38 estimator_.refit(X_train_0, y_train_0)
39
40 train_scores = score(estimator, X_train_0, y_train_0, scorer)

/usr/local/lib/python3.10/dist-packages/autosklearn/estimators.py in refit(self, X, y)
792
793 """
--> 794 self.automl_.refit(X, y)
795 return self
796

/usr/local/lib/python3.10/dist-packages/autosklearn/automl.py in refit(self, X, y, max_reshuffles)
1214
1215 if i == (max_reshuffles - 1):
-> 1216 raise e
1217
1218 self._can_predict = True

/usr/local/lib/python3.10/dist-packages/autosklearn/automl.py in refit(self, X, y, max_reshuffles)
1194 try:
1195 if self._budget_type is None:
-> 1196 _fit_and_suppress_warnings(self._logger, model, X, y)
1197 else:
1198 _fit_with_budget(

/usr/local/lib/python3.10/dist-packages/autosklearn/evaluation/abstract_evaluator.py in _fit_and_suppress_warnings(logger, model, X, y)
186 with warnings.catch_warnings():
187 warnings.showwarning = send_warnings_to_log
--> 188 model.fit(X, y)
189
190 return model

/usr/local/lib/python3.10/dist-packages/autosklearn/pipeline/base.py in fit(self, X, y, **fit_params)
122 a classification algorithm first.
123 """
--> 124 X, fit_params = self.fit_transformer(X, y, **fit_params)
125 self.fit_estimator(X, y, **fit_params)
126 return self

/usr/local/lib/python3.10/dist-packages/autosklearn/pipeline/classification.py in fit_transformer(self, X, y, fit_params)
121 fit_params.update(_fit_params)
122
--> 123 X, fit_params = super().fit_transformer(X, y, fit_params=fit_params)
124
125 return X, fit_params

/usr/local/lib/python3.10/dist-packages/autosklearn/pipeline/base.py in fit_transformer(self, X, y, fit_params)
134 }
135 fit_params_steps = self._check_fit_params(**fit_params)
--> 136 Xt = self._fit(X, y, **fit_params_steps)
137 return Xt, fit_params_steps[self.steps[-1][0]]
138

/usr/local/lib/python3.10/dist-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params_steps)
301 cloned_transformer = clone(transformer)
302 # Fit or load from cache the current transformer
--> 303 X, fitted_transformer = fit_transform_one_cached(
304 cloned_transformer, X, y, None,
305 message_clsname='Pipeline',

/usr/local/lib/python3.10/dist-packages/joblib/memory.py in __call__(self, *args, **kwargs)
310
311 def __call__(self, *args, **kwargs):
--> 312 return self.func(*args, **kwargs)
313
314 def call_and_shelve(self, *args, **kwargs):

/usr/local/lib/python3.10/dist-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
754 res = transformer.fit_transform(X, y, **fit_params)
755 else:
--> 756 res = transformer.fit(X, y, **fit_params).transform(X)
757
758 if weight is None:

/usr/local/lib/python3.10/dist-packages/autosklearn/pipeline/components/feature_preprocessing/select_rates_classification.py in transform(self, X)
94
95 if Xt.shape[1] == 0:
---> 96 raise ValueError("%s removed all features." % self.__class__.__name__)
97 return Xt
98

ValueError: SelectClassificationRates removed all features.
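One possible way to keep the learning-curve loop running is to guard the `refit` call and skip any subset on which it raises. This is only a sketch: `refit_or_skip` and `_FailsOnSmallData` are hypothetical names, and the stand-in class merely imitates the error above so the example runs without auto-sklearn.

```python
import numpy as np


def refit_or_skip(estimator, X_subset, y_subset):
    """Try to refit; return True on success, False if the refit raised ValueError."""
    try:
        estimator.refit(X_subset, y_subset)
    except ValueError as err:
        print(f"Skipping subset with shape {X_subset.shape}: {err}")
        return False
    return True


class _FailsOnSmallData:
    """Hypothetical stand-in for the loaded model: refit fails below 5 samples."""

    def refit(self, X, y):
        if X.shape[0] < 5:
            raise ValueError("SelectClassificationRates removed all features.")
        return self


X = np.ones((8, 3))
y = np.zeros(8)
model = _FailsOnSmallData()

# Map each training size to whether the refit succeeded.
results = {n: refit_or_skip(model, X[:n], y[:n]) for n in (3, 8)}
print(results)
```

Catching every ValueError can hide unrelated problems, so it may be safer to check the message for "removed all features" before skipping. Skipped sizes would then show up as gaps in the learning curve.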

System Details

I am running this on Google Colab

Python version: 3.10.12
auto-sklearn version: 0.15.0
scikit-learn version: 0.24.2

Is this working as intended? Any suggestions as to how to avoid this problem in the future? Should I just exclude the problematic feature preprocessing step?
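For reference, excluding the step might look like the sketch below. It assumes that the `exclude` argument of `AutoSklearnClassifier` accepts a dict keyed by pipeline step, and that the component name matches the module name in the traceback (`select_rates_classification`). Both should be checked against the installed version. The time budget is a placeholder.

```python
# Assumption: `exclude` maps a pipeline step name to the component names to leave out.
exclude = {"feature_preprocessor": ["select_rates_classification"]}

try:
    from autosklearn.classification import AutoSklearnClassifier
except ImportError:
    AutoSklearnClassifier = None  # auto-sklearn is not installed in this environment

if AutoSklearnClassifier is not None:
    # Placeholder budget; a new search is needed for the exclusion to take effect.
    automl = AutoSklearnClassifier(time_left_for_this_task=600, exclude=exclude)
```

This would only affect a new search. A model that was already fitted and pickled with this preprocessor in its pipelines would presumably still contain it when refit.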
