This repository contains a Common Workflow Language (CWL) implementation of the 1000Genomes Workflow, initially implemented for the Pegasus workflow management system.
Note that all the metadata reported in the *.cwl files, particularly the s:author and s:license fields, concern only the CWL descriptions themselves, not the related scripts, whose license is reported in the original repository. If you want to give credit to the original 1000Genomes Workflow, please cite the following article:
Rafael Ferreira da Silva, Rosa Filgueira, Ewa Deelman, Erola Pairo-Castineira, Ian M. Overton, Malcolm P. Atkinson, Using simple PID-inspired controllers for online resilient resource management of distributed scientific workflows, Future Generation Computer Systems, 95, 615-628, 2019, ISSN 0167-739X, https://doi.org/10.1016/j.future.2019.01.015.
Running this workflow requires a CWL runner. For example, the CWL reference implementation, called cwltool, can be installed as follows:
python3 -m venv venv
source venv/bin/activate
pip install cwlref-runner
Two workflow steps require a Python>=3.6 interpreter and the packages listed in the requirements.txt file. These packages can be installed as follows:
pip install -r requirements.txt
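Before installing the packages, it can help to confirm that the active interpreter satisfies the Python>=3.6 requirement. The following is a minimal sketch, not part of the repository: the helper name meets_python_requirement is hypothetical, and it assumes the interpreter is available as python3 and reports a version string of the form major.minor[.patch].

```shell
# Hypothetical helper: succeeds if version string "$1" (major.minor[.patch])
# denotes Python 3.6 or newer. Only major and minor are compared.
meets_python_requirement() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text before the next dot, if any
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 6 ]; }
}

# Check the interpreter on PATH, if there is one (assumed to be named python3).
if command -v python3 >/dev/null 2>&1; then
    version=$(python3 -c 'import platform; print(platform.python_version())')
    if meets_python_requirement "$version"; then
        echo "python3 $version satisfies Python>=3.6"
    else
        echo "python3 $version is older than 3.6" >&2
    fi
fi
```

If the check reports an older interpreter, create the virtual environment with a newer Python before running pip install.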
Workflow input data are stored online in the 1000Genomes workflow repository and on the 1000Genomes FTP server. The download_data.sh script creates the data directory structure and downloads all the required data to the proper locations.
Once all software and data dependencies are installed, the workflow can be launched using the following command:
cwl-runner main.cwl config.yml
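As an optional variation on the command above, the workflow description can be validated before it is executed, and the outputs collected in a dedicated directory. This is a sketch under assumptions: it presumes that cwl-runner resolves to cwltool (which provides the --validate and --outdir options), the function name run_workflow is hypothetical, and the directory name results is only illustrative.

```shell
# Hypothetical wrapper: validate main.cwl first, and run the workflow only
# if validation succeeds. Assumes cwl-runner is cwltool.
run_workflow() {
    # Check the CWL document without executing any step.
    cwl-runner --validate main.cwl || return 1
    # Execute the workflow and write outputs under ./results (illustrative name).
    cwl-runner --outdir results main.cwl config.yml
}
```

After defining the function, call run_workflow from the repository root, where main.cwl and config.yml are located.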