This repository contains scripts that help you create benchmark data sets for protein modeling or protein design.
The scripts require python3. You can use the dependencies/install_dependencies.py script to install Biopython and docopt.
It can also download DSSP for you, but you still need to install DSSP, MSMS and PyMOL yourself (make sure that dssp, msms
and pymol are in your PATH).
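To confirm that the external applications are reachable before running anything, a small check like the following may help. It is only a sketch and not part of this repository: the function name find_missing_executables is made up for illustration, and the executable names are the ones mentioned above.

```python
import shutil

# Executable names taken from the text above; adjust them if your
# installation uses different names.
REQUIRED_EXECUTABLES = ("dssp", "msms", "pymol")


def find_missing_executables(names=REQUIRED_EXECUTABLES):
    """Return the names that cannot be found on the current PATH.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_executables()
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All external applications were found in PATH.")
```

If the script reports a missing name, add the directory that holds the executable to your PATH and run the check again.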
Before diving into how these scripts work, let's try an example first. Make sure that the dependency packages and applications are installed. Then run:
./run_benchmark_constructor.py my_set job_scripts/multiple_loop.py -a inputs/kic/
This will take a couple of minutes to finish. Afterwards you will find a new data/0_my_set/
directory, which contains the benchmark set constructed by the scripts.
Constructing a benchmark data set usually takes three steps:
- Collect a set of candidate structures.
- Filter out structures that don't meet certain criteria.
- Write the structural information into a specific file format.
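The three steps can be pictured as a small pipeline. The sketch below is a simplified illustration only: every class and function name in it (Structure, ListCollector, ResolutionFilter, NameListWriter, build_benchmark) is hypothetical and does not reflect the actual classes or interfaces of this repository, and the resolution cutoff is just one example of a filtering criterion.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    """Hypothetical stand-in for a candidate structure."""
    name: str
    resolution: float  # in angstroms


class ListCollector:
    """Hypothetical collector: returns candidates from an in-memory list."""

    def __init__(self, structures):
        self.structures = list(structures)

    def collect(self):
        return list(self.structures)


class ResolutionFilter:
    """Hypothetical filter: keeps structures at or below a resolution cutoff."""

    def __init__(self, max_resolution):
        self.max_resolution = max_resolution

    def apply(self, structures):
        return [s for s in structures if s.resolution <= self.max_resolution]


class NameListWriter:
    """Hypothetical writer: formats the surviving structures as text lines."""

    def write(self, structures):
        return "\n".join(f"{s.name}\t{s.resolution:.2f}" for s in structures)


def build_benchmark(collector, filters, writer):
    """Run the three steps in order and return the formatted result."""
    structures = collector.collect()        # step 1: collect candidates
    for structure_filter in filters:        # step 2: apply each filter in turn
        structures = structure_filter.apply(structures)
    return writer.write(structures)         # step 3: write in the chosen format


if __name__ == "__main__":
    candidates = [Structure("example_A", 1.5), Structure("example_B", 3.2)]
    print(build_benchmark(ListCollector(candidates), [ResolutionFilter(2.0)], NameListWriter()))
```

Keeping each step behind its own small class means a collector, filter or writer can be swapped without touching the others.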
The benchmark_constructor module provides some structure_collectors, filters and file_normalizers
to do the jobs listed above. You can also write your own customized classes. Once you have all the
classes you need, assemble them into a Python script (see job_scripts/multiple_loop.py for an example)
and run the script with run_benchmark_constructor.py. You can run your job either sequentially or in
parallel. See benchmark_constructor/README.rst for how to write new classes.