CALLC_evaluation

Performance and model evaluation of CALLC. The evaluation can be replicated by taking the following steps:

1. Generating predictions

To evaluate CALLC we first need to generate predictions. Five prediction sets can be distinguished:

Learning curves with duplicate analyte structures across datasets
Learning curves without duplicate analyte structures across datasets
Cross-validation without duplicate analyte structures across datasets
Cross-validation with duplicate analyte structures across datasets
Comparison with Aicheler et al.

These predictions require trained Layer 1 models, so first run the following script:

CALLC/new_train_l1.py

The freshly trained models will be located in:

CALLC/mods_l1/

With these models we can generate the prediction sets listed above. After running a script from sections 1.1 to 1.3, the predictions can be found in:

CALLC/test_preds/

Place these predictions in the appropriate folder located in:

data/predictions/
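The copy step above can be scripted. The following is a minimal sketch, assuming the predictions are plain files directly inside CALLC/test_preds/. The function name `collect_predictions` and the subfolder name in the example call are hypothetical; the actual folder layout under data/predictions/ is not described here, so adjust the destination to match your checkout.

```python
import shutil
from pathlib import Path


def collect_predictions(src_dir, dest_dir):
    """Copy every regular file in src_dir into dest_dir and return the new paths."""
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file():  # skip subdirectories
            copied.append(Path(shutil.copy2(path, dest / path.name)))
    return copied


# Example call; "learning_curve_dup" is a hypothetical subfolder name.
# collect_predictions("CALLC/test_preds", "data/predictions/learning_curve_dup")
```

Copying rather than moving keeps the original output in CALLC/test_preds/ until the next run, which may overwrite it; copy each prediction set before generating the next one.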

1.1 Learning Curve

To generate predictions for the learning curve with duplicate analyte structures across datasets, run the following:

CALLC/initial_train.py

Make sure the main function gets the value 'train/retmetfeatures.csv' for the parameter infilen.

To generate predictions for the learning curve without duplicate analyte structures across datasets, run the following:

CALLC/initial_train.py

Make sure the main function gets the value 'train/retmetfeatures_nodup.csv' for the parameter infilen.

1.2 Cross-Validation

To generate predictions for the cross-validation (CV) with duplicate analyte structures across datasets, run the following:

CALLC/initial_train_CV.py

Make sure the main function gets the value 'train/retmetfeatures.csv' for the parameter infilen.

To generate predictions for the CV without duplicate analyte structures across datasets, run the following:

CALLC/initial_train_CV.py

Make sure the main function gets the value 'train/retmetfeatures_nodup.csv' for the parameter infilen.

1.3 Aicheler comparison

To generate predictions for the comparison with the Aicheler model, run the following:

CALLC/initial_train_aicheler.py

Make sure the main function gets the value 'train/retmetfeatures.csv' for the parameter infilen.
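The script and infilen combinations from sections 1.1 to 1.3 can be collected into a small checklist. The sketch below only restates the pairings given above; the names `PredictionSet`, `PREDICTION_SETS` and `format_checklist` are illustrative and not part of CALLC. How each script's main function actually receives infilen is not shown here, so this does not attempt to call the scripts.

```python
from collections import namedtuple

# One record per prediction set: a label, the script to run, and the infilen value.
PredictionSet = namedtuple("PredictionSet", ["name", "script", "infilen"])

PREDICTION_SETS = [
    PredictionSet("Learning curve, with duplicates",
                  "CALLC/initial_train.py", "train/retmetfeatures.csv"),
    PredictionSet("Learning curve, without duplicates",
                  "CALLC/initial_train.py", "train/retmetfeatures_nodup.csv"),
    PredictionSet("Cross-validation, with duplicates",
                  "CALLC/initial_train_CV.py", "train/retmetfeatures.csv"),
    PredictionSet("Cross-validation, without duplicates",
                  "CALLC/initial_train_CV.py", "train/retmetfeatures_nodup.csv"),
    PredictionSet("Aicheler comparison",
                  "CALLC/initial_train_aicheler.py", "train/retmetfeatures.csv"),
]


def format_checklist(sets):
    """Return one human-readable line per prediction set."""
    return [f"{s.name}: run {s.script} with infilen='{s.infilen}'" for s in sets]


for line in format_checklist(PREDICTION_SETS):
    print(line)
```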

2. Parsing predictions

After you have placed all predictions in:

data/predictions/

You can run the following script:

./parse_predictions.py

The predictions will be parsed and, for each metric, a result file will be placed in the appropriate folder in:

data/parsed/

3. Generating figures

Run the following code to replicate the figures:

./manuscript_figs.R

Make sure all the parsed predictions are located here:

data/parsed/

Then the figures can be found here:

figures/
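Before generating the figures, it can help to confirm that data/parsed/ actually contains results. The following is a minimal sketch, assuming the script is started from the repository root and that `Rscript` is available on the PATH. The helper names `list_parsed_files` and `run_figures` are hypothetical, and invoking the script through `Rscript` is an assumption about how manuscript_figs.R is meant to be run.

```python
import subprocess
from pathlib import Path


def list_parsed_files(parsed_dir="data/parsed"):
    """Return all files below parsed_dir, sorted; an empty list if the folder is missing."""
    root = Path(parsed_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


def run_figures(script="manuscript_figs.R", parsed_dir="data/parsed"):
    """Run the R figure script, but only if parsed results are present."""
    if not list_parsed_files(parsed_dir):
        raise FileNotFoundError(f"No parsed results found in {parsed_dir}")
    # Assumption: Rscript is on PATH and the working directory is the repository root.
    subprocess.run(["Rscript", script], check=True)
```

Calling `run_figures()` with an empty or missing data/parsed/ raises an error immediately, before R is started.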

RobbinBouwmeester/CALLC_evaluation
