fix bug feature_standardizer not applied

CDDLeiden · Jan 18, 2024 · 1b1aea9
1 parent fde0f3b
commit 1b1aea9
Showing 2 changed files with 7 additions and 39 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,49 +1,17 @@
 # Change Log
 
-From v2.0.1 to v2.1.0.a2
+From v2.1.0 to v2.1.1
 
 ## Fixes
 
-- fixed error with serialization of the `DataFrameDescriptorSet` (#63)
-- Papyrus descriptors are not fetched by default anymore from the `Papyrus`  adapter, which caused fetching of unnecessary data.
-- A [potential bug in new version of pandas](https://github.com/pandas-dev/pandas/issues/55009)  broke scaffold generation so a workaround was implemented.
+- ⚠️ Important! ⚠️ Fixed a bug in `predictMols` where the `feature_standardizer` was
+  not being applied to the calculated features. This bug was introduced in v2.1.0.
+  Models trained with v2.1.0 are compatible with v2.1.1. Make sure to update
+  QSPRpred to v2.1.1 to ensure that the `feature_standardizer` is applied when
+  predicting on new molecules.
 
 ## Changes
-- `QSPRModel.evaluate` moved to a separate class `EvaluationMethod` in `qsprpred.models.interfaces`, with subclasses for cross-validation and making predictions on a test set in `qsprpred.models.evaluation_methods` (`CrossValidation` and `EvaluateTestSetPerformance` respectively).
-- `QSPRModel` attribute `scoreFunc` is removed.
-- 'qspr/models' is no longer added to the output path of `QSPRModel.save`, allowing for complete control over the output path.
-- `SKlearnMetrics.supportsTask` now uses a dictionary like dict[ModelTasks, list[str]] to map tasks to supported metric names. (#53)
-- `GBMTRandomSplit` and `ScaffoldSplit` now use the `GBMTDataSplit` to create balanced splits. `RandomSplit` still functions the same way as a completely random test split.
-- `PCMSplit` replaces `StratifiedPerTarget` and is compatible with `RandomSplit`, `ScaffoldSplit` and `ClusterSplit`.
-- `DuplicatesFilter` refactored to`RepeatsFilter`, as it also captures scenarios where triplicates/quadruplicates are found in the dataset. These scenarios are now also covered by the respective UnitTest.
-- The versioning scheme of development snapshots has changed from `devX` to `alphaX`/`betaX`, where `X` is an integer that increments with each release.
-- The following model class have been renamed and moved:
-    - `models.models.QSPRsklearn` > `models.sklearn.SklearnModel`
-    - `deep.models.QSPRDNN` > `extra.gpu.models.dnn.DNNModel`
-    - `extra.models.pcm.ModelPCM` > `extra.models.pcm.PCMModel`
-    - `extra.models.pcm.QSPRsklearnPCM` > `extra.models.pcm.SklearnPCMModel`
-- The command line interface modules now use input and output file paths instead
-  of automatically placing all files in a subfolder `qspr`, allowing for more
-  control over the output and input paths.
 
 ## New Features
-- `GBMTDataSplit` - parent class to create globally balanced splits with the [gbmt-split](https://github.com/sohviluukkonen/gbmt-splits) package.
-- `ClusterSplit` - splits data based clustering of molecular fingerprints (uses `GBMTDataSplit`).
-- Raise error if search space for optuna optimization is missing search space type annotation or if type not in list.
-- When installing package with pip, the commit hash and date of the installation is saved into `qsprpred._version`
-- `HyperParameterOptimization` classes now accept a `evaluation_method` argument, which is an instance of `EvaluationMethod` (see above). This allows for hyperparameter optimization to be performed on a test set, or on a cross-validation set. (#11)
-- `HyperParameterOptimization` now accepts `score_aggregation` argument, which is a function that takes a list of scores and returns a single score. This allows for the use of different aggregation functions, such as `np.mean` or `np.median` to combine scores from different folds. (#45)
-- A new tutorial `adding_new_components.ipynb` has been added to the `tutorials` folder, which demonstrates how to add new model to QSPRpred.
-- A new function `Metrics.checkMetricCompatibility` has been added, which checks if a metric is compatible with a given task and a given prediction methods (i.e. `predict` or `predictProba`)
-- In `EvaluationMethod` (see above), an attribute `use_proba` has been added, which determines whether the `predict` or `predictProba` method is used to make predictions (#56).
-- Add new descriptorset `SmilesDesc` to use the smiles strings as a descriptor.
-- New module `early_stopping` with classes `EarlyStopping` and `EarlyStoppingMode` has been added. This module allows for more control over early stopping in models that support it.
-- Add new descriptorset `SmilesDesc` to use the smiles strings as a descriptor.
-- Refactoring of the test suite under `qsprpred.data` and improvement of temporary file handling (!114).
-- `PyBoostModel` - QSPRpred wrapper for py-boost models. Requires optional `pyboost` dependencies.
-- `ChempropModel` - QSPRpred wrapper for Chemprop models. Requires optional `deep` dependencies.
-- The `data_CLI` argument `--log_transform` (`-lt`) has been changed to `--transform_data` (`-t`), which now accepts a number of transformations to apply to the target data. Available transformations are `log`, `log10`, `log2`, `sqrt`, `cbrt`, `exp`, `exp2`, `exp10`, `square`, `cube`, `reciprocal`.
-- New `data_CLI`, `model_CLI` and `predict_CLI` argument `--skip_backup` (`-sb`) to skip the backup of the output files. WARNING: This will overwrite existing files.
 
 ## Removed Features
-- `StratifiedPerTarget` is replaced by `PCMSplit`.
diff --git a/qsprpred/models/interfaces.py b/qsprpred/models/interfaces.py
@@ -593,7 +593,7 @@ def convertToNumpy(
                 data matrix and/or target matrix in np.ndarray format
         """
         if isinstance(X, QSPRDataset):
-            X = X.getFeatures(raw=True, concat=True)
+            X = X.getFeatures(concat=True)
         if isinstance(X, pd.DataFrame):
             X = X.values
         if y is not None:
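
Why the one-line change matters can be illustrated with a small standalone sketch. It does not use QSPRpred: `ToyStandardizer` and `toy_model` are hypothetical stand-ins, and the assumption (based on the changelog entry above) is that `raw=True` returned features without the fitted standardizer applied. A model fitted on standardized features gives a very different answer when fed unscaled values:

```python
from statistics import mean, pstdev


class ToyStandardizer:
    """Hypothetical stand-in for a fitted feature standardizer (z-scoring)."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [mean(c) for c in columns]
        # Fall back to 1.0 for constant columns to avoid division by zero.
        self.stds = [pstdev(c) or 1.0 for c in columns]
        return self

    def transform(self, rows):
        return [
            [(v - m) / s for v, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]


def toy_model(row):
    """Hypothetical linear model whose weights assume standardized inputs."""
    weights = [0.5, -0.25]
    return sum(w * v for w, v in zip(weights, row))


train = [[100.0, 1.0], [200.0, 2.0], [300.0, 3.0]]
scaler = ToyStandardizer().fit(train)

# A new sample that sits exactly at the training mean of both features.
new_sample = [[200.0, 2.0]]

# Analogous to the buggy path: unscaled features go straight into the model.
raw_prediction = toy_model(new_sample[0])

# Analogous to the fixed path: features are standardized before prediction.
standardized_prediction = toy_model(scaler.transform(new_sample)[0])

print(raw_prediction, standardized_prediction)
```

In this toy setup the sample at the training mean should map to a prediction of zero once standardized, while the unscaled input is dominated by the magnitude of the first feature. Predictions made with v2.1.0 on new molecules may be affected in a comparable way whenever a `feature_standardizer` was configured.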