This repository is a fork of hclimente/gwas-tools. For now it contains only the GWAS analysis part, in preparation for its adaptation to another project from Chloé-Agathe Azencott's team at the CBIO.
gwas-tools contains pipelines for common use-cases when dealing with GWAS datasets, from data preprocessing to biomarker discovery.
Clone the repository, and add the bin folder to your path:
git clone git@github.com:kumquatum/gwas-tools.git
export PATH=$PATH:$PWD/gwas-tools/bin
- Nextflow
- Docker (optional)
Install all the tools listed below, or build your own Docker image from the provided Dockerfile (some tools are under copyright, which prevents us from distributing a prebuilt image):
# From within the gwas-tools folder
docker build -t <name_of_your_image> .
The Docker image can then be used in Nextflow by adding the parameter -with-docker <name_of_your_image>.
Tool | License |
---|---|
BEDOPS | GPLv2 |
HotNet2 | Copyright |
IMPUTE | Copyright |
PLINK 1.90 | GPLv3 |
VEGAS2v02 | GPLv3 |
R::biglasso | GPLv3 |
R::bigmemory | LGPLv3 |
R::CASMAP | GPLv2 |
R::BioNet | GPLv2 |
R::dmGWASv3 | GPLv2 |
R::igraph | GPLv3 |
R::LEANR | GPLv3 |
R::martini | MIT |
R::ranger | GPLv3 |
R::SigModv2 | ? |
R::SKAT | GPLv3 |
R::snpStats | GPLv3 |
A minimal, partial set of files is available in test/data to demonstrate the use of gwas-tools. For the SConES tool to work, the PPI file needs to be downloaded and prepared as in bin/templates/dbs/biogrid.sh.
- With PLINK
vegas2.nf \
--bfile test/data/example \
--gencode 31 \
--genome 37 \
--buffer 50000 \
--vegas_params '-top 10' \
-with-docker <name_of_your_image>
- With regenie
# Extract the phenotype from the .fam file before running regenie
format_conversion.nf \
--file_to_convert test/data/example.fam \
--conversion_type "fam2phenotype" \
-with-docker <name_of_your_image>
vegas2_regenie.nf \
--bfile test/data/example \
--phenotype example.tsv \
--regenie_params_s1 "\-\-cc12 \-\-exclude test/data/snplist_rm.txt" \
--regenie_params_s2 "\-\-cc12 \-\-exclude test/data/snplist_rm.txt" \
--gencode 31 --genome 37 \
--buffer 50000 \
--vegas_params '-top 10' \
-with-docker <name_of_your_image>
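The phenotype extraction step above can be pictured with a small shell sketch. It is not the code behind the fam2phenotype conversion, whose exact output format is not documented here. It only assumes the standard six-column PLINK .fam layout (FID, IID, father, mother, sex, phenotype) and a regenie-style table with a header line. The function name fam_to_phenotype and the column name Y1 are hypothetical.

```shell
# Hypothetical helper, not part of gwas-tools: print a regenie-style
# phenotype table (FID, IID, one phenotype column) from a PLINK .fam file.
# Assumes whitespace-delimited .fam input with the phenotype in column 6.
fam_to_phenotype() {
    awk 'BEGIN { OFS = "\t"; print "FID", "IID", "Y1" }
         { print $1, $2, $6 }' "$1"
}
```

For example, fam_to_phenotype test/data/example.fam > example.tsv would write the table to a file. Missing-phenotype codes are passed through unchanged in this sketch, so they may need recoding before running regenie.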
Several reference systems exist for gene ids: Ensembl, HGNC (also known as gene symbol), and Entrez. Depending on the interaction file you provide (a protein-protein interaction network or another kind), you may have to convert your ids from one system to another. This command generates a table of equivalences based on GENCODE and HGNC from the SNPs in a bim file.
snp2gene.nf \
--bim test/data/example.bim \
--genome GRCh38 \
-with-docker <name_of_your_image>
The table can then be used to convert ids from one reference system to another, depending on the one used by your interaction file. The example below converts the VEGAS pipeline output to HGNC (conversion to Ensembl is also possible with vegas2ensembl):
format_conversion.nf \
--file_to_convert scored_genes.vegas.txt \
--conversion_type vegas2hgnc \
--additional_file snp2hgnc.tsv
Note: if you provide a reference file other than the one produced by the snp2gene.nf pipeline, its header must contain the 3 columns named snp, ensembl_gene_id and hgnc_gene_id.
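Before passing a custom reference file to the conversion, its header can be checked against the three expected column names. The sketch below is a hypothetical helper, not part of gwas-tools. It assumes the file is tab-separated, as the .tsv extension of snp2hgnc.tsv suggests.

```shell
# Hypothetical helper, not part of gwas-tools: succeed only if the header
# of a tab-separated file contains the three expected column names.
check_reference_header() {
    # Read the first line and drop a possible Windows carriage return.
    header=$(head -n 1 "$1" | tr -d '\r')
    for col in snp ensembl_gene_id hgnc_gene_id; do
        # One column name per line, then look for an exact, literal match.
        printf '%s\n' "$header" | tr '\t' '\n' | grep -qxF "$col" || {
            echo "missing column: $col" >&2
            return 1
        }
    done
}
```

Calling check_reference_header snp2hgnc.tsv returns a non-zero status and names the first missing column on stderr when the header is incomplete.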
Multiple algorithms were adapted and benchmarked for the detection of SNPs associated with a phenotype. If you use any of the following algorithms, please cite this article:
Climente-González H, Lonjou C, Lesueur F, GENESIS study group, Stoppa-Lyonnet D, et al. (2021) Boosting GWAS using biological networks: A study on susceptibility to familial breast cancer. PLOS Computational Biology 17(3): e1008819. https://doi.org/10.1371/journal.pcbi.1008819
- dmGWAS:
dmgwas.nf \
--vegas test/data/scored_genes.vegas.txt \
--tab2 test/data/tab2 \
-with-docker <name_of_your_image>
- heinz:
heinz.nf \
--vegas test/data/scored_genes.vegas.txt \
--tab2 test/data/tab2 \
--fdr 0.5 \
-with-docker <name_of_your_image>
- HotNet2:
hotnet2.nf \
--scores test/data/scored_genes.vegas.txt \
--tab2 test/data/tab2 \
--hotnet2_path hotnet2 \
--lfdr_cutoff 0.125 \
-with-docker <name_of_your_image>
- LEAN:
lean.nf \
--vegas test/data/scored_genes.vegas.txt \
--tab2 test/data/tab2 \
-with-docker <name_of_your_image>
- Sigmod:
# With docker
sigmod.nf \
--vegas test/data/scored_genes.vegas.txt \
--tab2 test/data/tab2 \
-with-docker <name_of_your_image>
# Without docker
sigmod.nf \
--sigmod <path_to_your_SigMod_v2_folder> \
--vegas test/data/scored_genes.vegas.txt \
--tab2 test/data/tab2
- SConES:
old_scones.nf \
--bfile test/data/example \
--network gi \
--snp2gene test/data/snp2gene.tsv \
--tab2 test/data/tab2 \
-with-docker <name_of_your_image>
Common mistakes:
- Using the wrong number of dashes for a parameter: a single dash (-) is for Nextflow parameters, a double dash (--) is for pipeline parameters.
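The dash rule can be checked mechanically before launching a pipeline. The sketch below is a hypothetical lint, not part of gwas-tools or Nextflow. It flags a small, non-exhaustive sample of Nextflow run options (such as -with-docker and -resume) when they are written with two dashes, in which case Nextflow would treat them as pipeline parameters instead.

```shell
# Hypothetical lint, not part of gwas-tools: print every argument that looks
# like a Nextflow run option mistakenly written with two dashes.
# The option list is only a sample; extend it as needed.
check_dashes() {
    for arg in "$@"; do
        case "$arg" in
            --with-docker|--with-singularity|--with-report|--with-trace|--resume|--profile)
                echo "use a single dash: $arg" ;;
        esac
    done
}
```

For instance, check_dashes --bfile test/data/example --with-docker my_image reports the --with-docker argument and leaves --bfile alone, since --bfile is a pipeline parameter.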