diff --git a/CHANGELOG.md b/CHANGELOG.md index c7fed59c..99d37936 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,49 +1,17 @@ # Change Log -From v2.0.1 to v2.1.0.a2 +From v2.1.0 to v2.1.1 ## Fixes -- fixed error with serialization of the `DataFrameDescriptorSet` (#63) -- Papyrus descriptors are not fetched by default anymore from the `Papyrus` adapter, which caused fetching of unnecessary data. -- A [potential bug in new version of pandas](https://github.com/pandas-dev/pandas/issues/55009) broke scaffold generation so a workaround was implemented. +- ⚠️ Important! ⚠️ Fixed bug in `predictMols` where the `feature_standardizer` was + not being applied to the calculated features. This bug was introduced in v2.1.0. + Models trained with v2.1.0 are compatible with v2.1.1, make sure to update + QSPRpred to v2.1.1 to ensure that the `feature_standardizer` is applied when + predicting on new molecules. ## Changes -- `QSPRModel.evaluate` moved to a separate class `EvaluationMethod` in `qsprpred.models.interfaces`, with subclasses for cross-validation and making predictions on a test set in `qsprpred.models.evaluation_methods` (`CrossValidation` and `EvaluateTestSetPerformance` respectively). -- `QSPRModel` attribute `scoreFunc` is removed. -- 'qspr/models' is no longer added to the output path of `QSPRModel.save`, allowing for complete control over the output path. -- `SKlearnMetrics.supportsTask` now uses a dictionary like dict[ModelTasks, list[str]] to map tasks to supported metric names. (#53) -- `GBMTRandomSplit` and `ScaffoldSplit` now use the `GBMTDataSplit` to create balanced splits. `RandomSplit` still functions the same way as a completely random test split. -- `PCMSplit` replaces `StratifiedPerTarget` and is compatible with `RandomSplit`, `ScaffoldSplit` and `ClusterSplit`. -- `DuplicatesFilter` refactored to`RepeatsFilter`, as it also captures scenarios where triplicates/quadruplicates are found in the dataset. These scenarios are now also covered by the respective UnitTest. -- The versioning scheme of development snapshots has changed from `devX` to `alphaX`/`betaX`, where `X` is an integer that increments with each release. -- The following model class have been renamed and moved: - - `models.models.QSPRsklearn` > `models.sklearn.SklearnModel` - - `deep.models.QSPRDNN` > `extra.gpu.models.dnn.DNNModel` - - `extra.models.pcm.ModelPCM` > `extra.models.pcm.PCMModel` - - `extra.models.pcm.QSPRsklearnPCM` > `extra.models.pcm.SklearnPCMModel` -- The command line interface modules now use input and output file paths instead - of automatically placing all files in a subfolder `qspr`, allowing for more - control over the output and input paths. ## New Features -- `GBMTDataSplit` - parent class to create globally balanced splits with the [gbmt-split](https://github.com/sohviluukkonen/gbmt-splits) package. -- `ClusterSplit` - splits data based clustering of molecular fingerprints (uses `GBMTDataSplit`). -- Raise error if search space for optuna optimization is missing search space type annotation or if type not in list. -- When installing package with pip, the commit hash and date of the installation is saved into `qsprpred._version` -- `HyperParameterOptimization` classes now accept a `evaluation_method` argument, which is an instance of `EvaluationMethod` (see above). This allows for hyperparameter optimization to be performed on a test set, or on a cross-validation set. (#11) -- `HyperParameterOptimization` now accepts `score_aggregation` argument, which is a function that takes a list of scores and returns a single score. This allows for the use of different aggregation functions, such as `np.mean` or `np.median` to combine scores from different folds. (#45) -- A new tutorial `adding_new_components.ipynb` has been added to the `tutorials` folder, which demonstrates how to add new model to QSPRpred. -- A new function `Metrics.checkMetricCompatibility` has been added, which checks if a metric is compatible with a given task and a given prediction methods (i.e. `predict` or `predictProba`) -- In `EvaluationMethod` (see above), an attribute `use_proba` has been added, which determines whether the `predict` or `predictProba` method is used to make predictions (#56). -- Add new descriptorset `SmilesDesc` to use the smiles strings as a descriptor. -- New module `early_stopping` with classes `EarlyStopping` and `EarlyStoppingMode` has been added. This module allows for more control over early stopping in models that support it. -- Add new descriptorset `SmilesDesc` to use the smiles strings as a descriptor. -- Refactoring of the test suite under `qsprpred.data` and improvement of temporary file handling (!114). -- `PyBoostModel` - QSPRpred wrapper for py-boost models. Requires optional `pyboost` dependencies. -- `ChempropModel` - QSPRpred wrapper for Chemprop models. Requires optional `deep` dependencies. -- The `data_CLI` argument `--log_transform` (`-lt`) has been changed to `--transform_data` (`-t`), which now accepts a number of transformations to apply to the target data. Available transformations are `log`, `log10`, `log2`, `sqrt`, `cbrt`, `exp`, `exp2`, `exp10`, `square`, `cube`, `reciprocal`. -- New `data_CLI`, `model_CLI` and `predict_CLI` argument `--skip_backup` (`-sb`) to skip the backup of the output files. WARNING: This will overwrite existing files. ## Removed Features -- `StratifiedPerTarget` is replaced by `PCMSplit`. diff --git a/qsprpred/models/interfaces.py b/qsprpred/models/interfaces.py index 3f23550b..0fb6288a 100644 --- a/qsprpred/models/interfaces.py +++ b/qsprpred/models/interfaces.py @@ -593,7 +593,7 @@ def convertToNumpy( data matrix and/or target matrix in np.ndarray format """ if isinstance(X, QSPRDataset): - X = X.getFeatures(raw=True, concat=True) + X = X.getFeatures(concat=True) if isinstance(X, pd.DataFrame): X = X.values if y is not None: