I would like to get the pre-processed data that was used to train a model.
How did this question come about?
The preprocessed data could be used to, for example, calculate its summary statistics and then compare with the un-transformed data or with the data preprocessed with different methods.
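To make the intended comparison concrete, here is a minimal sketch using only the Python standard library. The `standardize` function is a stand-in for whatever preprocessing auto-sklearn actually applied, and the data and both helper names are made up for illustration.

```python
import statistics


def column_summary(rows):
    """Return a (mean, sample stdev) pair for each column of a list of rows."""
    columns = list(zip(*rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def standardize(rows):
    """Stand-in preprocessing step: zero mean and unit sample stdev per column."""
    stats = column_summary(rows)
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


# Illustrative data only; in practice this would be X_train.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
processed = standardize(raw)

raw_stats = column_summary(raw)
processed_stats = column_summary(processed)

# Print the before/after statistics side by side, one line per column.
for i, ((rm, rs), (pm, ps)) in enumerate(zip(raw_stats, processed_stats)):
    print(f"col {i}: raw mean={rm:.3f} std={rs:.3f} | "
          f"processed mean={pm:.3f} std={ps:.3f}")
```

The same side-by-side comparison could then be repeated for data preprocessed with different methods.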
Would a small code snippet help?
This question concerns a standard use of AutoSklearnClassifier, based on the example given in the docs.
Here's a snippet anyways:
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import autosklearn.classification

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder="/tmp/autosklearn_classification_example_tmp",
)
automl.fit(X_train, y_train, dataset_name="breast_cancer")

# Get the configuration for a model/run
run_key = list(automl.automl_.runhistory_.data.keys())[0]
run_value = automl.automl_.runhistory_.data[run_key]
config = automl.automl_.runhistory_.ids_config[run_key.config_id]
print(config)
What have you already looked at?
I looked into the smac3-output and .auto-sklearn folders inside tmp_folder, but could not find any files containing the preprocessed data, or any information I could use to reconstruct it.
I tried filtering the configuration and passing it to AutoSklearnPreprocessingAlgorithm, which repeatedly raised "Not implemented" errors.
I tried building a custom sklearn Pipeline from auto-sklearn's pre-processing components (e.g. rescaling from the data_preprocessing module), but I found that this approach was not directly compatible with the configuration requirements of auto-sklearn.
Suggestion
A couple of functions could be implemented to (1) filter the configuration for a fitted model to keep only the keys related to the pre-processing steps, and then (2) run the corresponding steps to get the preprocessed data. For example, the code could look like this:
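A minimal sketch of step (1) only, not the snippet originally proposed here. It assumes the configuration has been converted to a plain dict and that pre-processing hyperparameters use the `data_preprocessor:` and `feature_preprocessor:` key prefixes; the helper name `filter_preprocessing_config` and the example values are hypothetical, not part of auto-sklearn's API.

```python
# Assumed key prefixes for pre-processing hyperparameters (not confirmed
# against the auto-sklearn source; adjust to match the printed config).
PREPROCESSING_PREFIXES = ("data_preprocessor:", "feature_preprocessor:")


def filter_preprocessing_config(config_dict, prefixes=PREPROCESSING_PREFIXES):
    """Hypothetical helper: keep only entries whose key starts with a prefix."""
    return {
        key: value
        for key, value in config_dict.items()
        if key.startswith(prefixes)
    }


# Illustrative configuration; a real one would come from the run history.
example_config = {
    "balancing:strategy": "none",
    "classifier:__choice__": "random_forest",
    "data_preprocessor:__choice__": "feature_type",
    "data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__": "standardize",
    "feature_preprocessor:__choice__": "no_preprocessing",
}

preprocessing_only = filter_preprocessing_config(example_config)
for key, value in sorted(preprocessing_only.items()):
    print(f"{key} = {value}")
```

With a real run, the `config` object from the snippet above would first need converting, e.g. `dict(config)`, assuming it behaves like a mapping. Step (2), actually running the selected steps, depends on auto-sklearn internals and is not sketched here.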
This is just a suggestion. If there is any other way of obtaining the pre-processed data, please let me know.
System Details (if relevant)
auto-sklearn: 0.15.0