Performance and model evaluation of CALLC. The evaluation can be replicated by taking the following steps:
In order to evaluate CALLC we need to generate some predictions. In total there are 5 prediction sets that can be distinguished;
- Learning curves with duplicate analyte structures across datasets
- Learning curves without duplicate analyte structures across datasets
- Cross-validation without duplicate analyte structures across datasets
- Cross-validation with duplicate analyte structures across datasets
- Comparison with Aicheler et al
However, we need models in Layer 1, so rerun the following code:
CALLC/new_train_l1.py
The freshly trained models will be located in:
CALLC/mods_l1/
With these models we can generate the prediction sets mentioned above. Once the code below from 1.x is run the predictions can be found in:
CALLC/test_preds/
Place these predictions in the appropiate folder located in:
data/predictions/
In order to generate predictions for the learning curve with duplicate analytes structures across datasets, run the following:
CALLC/initial_train.py
Make sure the main function gets the value 'train/retmetfeatures.csv' for the parameter infilen.
In order to generate predictions for the learning curve without duplicate analytes structures across datasets, run the following:
CALLC/initial_train.py
Make sure the main function gets the value 'train/retmetfeatures_nodup.csv' for the parameter infilen.
In order to generate predictions for the CV with duplicate analytes structures across datasets, run the following:
CALLC/initial_train_CV.py
Make sure the main function gets the value 'train/retmetfeatures.csv' for the parameter infilen.
In order to generate predictions for the CV without duplicate analytes structures across datasets, run the following:
CALLC/initial_train_CV.py
Make sure the main function gets the value 'train/retmetfeatures_nodup.csv' for the parameter infilen.
In order to generate predictions for the comparison with the aicheler model run the following:
CALLC/initial_train_aicheler.py
Make sure the main function gets the value 'train/retmetfeatures.csv' for the parameter infilen.
After you have placed all predictions in:
data/predictions/
You can run the following script:
./parse_predictions.py
The predictions will be parsed and for each specific metric a result file will be placed in the appropiate folders here:
data/parsed/
Run the following code to replicate the figures:
./manuscript_figs.R
Make sure all the parsed predictions are located here:
data/parsed/
Then the figures can be found here:
figures/