Commit

Merge pull request #19 from VIB-PSB/dev
Dev
hdbeukel authored May 13, 2024
2 parents 24eef94 + 7d191b4 commit da1b4dd
Showing 8 changed files with 91 additions and 216 deletions.
File renamed without changes.
File renamed without changes.
File renamed without changes.
6 changes: 3 additions & 3 deletions docs/configuration_pipeline.md
@@ -162,7 +162,7 @@ There are mainly two cases in which the user might want to alter the internal MI

### Modification of the motif mapping file for the locus-based mode of maize

By default, the maize MINI-AC locus-based mode (for both genome versions) runs on the "medium" non-coding genomic space, which corresponds, for each locus in the genome, to the 5kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns. However, we generated two additional motif mapping files for the locus-based mode of maize, that cover "large" (15kb upstream of the translation start site, the 2.5kb downstream of the translation end site, and the introns), and "small" (1kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns) non-coding genomic spaces. For Arabidopsis only the "medium" non-coding genomic space motif mapping file was generated because it already covers 73.5% of the whole non-coding genomic space (see publication). To use these files, first they need to be downloaded, and then, the corresponding parameters of the motif mapping file (```MotMapsFile_lb```) and the non-coding genomic space coordinates file (```Promoter_file```) should be modified either on the command line or in the configuration file.
By default, the maize MINI-AC locus-based mode (for both genome versions) runs on the "medium" non-coding genomic space, which corresponds, for each locus in the genome, to the 5kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns. However, we generated two additional motif mapping files for the locus-based mode of maize, that cover "large" (15kb upstream of the translation start site, the 2.5kb downstream of the translation end site, and the introns), and "small" (1kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns) non-coding genomic spaces. For Arabidopsis only the "medium" non-coding genomic space motif mapping file was generated because it already covers 73.5% of the whole non-coding genomic space (see publication). To use these files, first they need to be downloaded, and then, the corresponding parameters of the motif mapping file (```MotMapsFile```) and the non-coding genomic space coordinates file (```Promoter_file```) should be modified either on the command line or in the configuration file.

To download the maize "large" motif mapping file and coordinates of the "large" non-coding genomic space:

@@ -192,14 +192,14 @@ wget https://zenodo.org/record/8386283/files/zma_v5_promoter_1kbup_1kbdown_sorte
Then (using the "small" definition as example), change the parameters on the command line:

```
nextflow -C mini_ac.config run mini_ac.nf --mode locus_based --species maize_v4 --MotMapsFile_lb data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed --Promoter_file data/zma_v4/zma_v4_promoter_1kbup_1kbdown_sorted.bed
nextflow -C mini_ac.config run mini_ac.nf --mode locus_based --species maize_v4 --MotMapsFile data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed --Promoter_file data/zma_v4/zma_v4_promoter_1kbup_1kbdown_sorted.bed
```
or add them to the configuration file, along with the other parameters:

```nextflow
params {
/// [Other parameters...]
MotMapsFile_lb = "$projectDir/data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed"
MotMapsFile = "$projectDir/data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed"
Promoter_file = "$projectDir/data/zma_v4/zma_v4_promoter_1kbup_1kbdown_sorted.bed"
/// [Other parameters...]
}
142 changes: 33 additions & 109 deletions mini_ac.nf
@@ -10,129 +10,53 @@ workflow MINIAC {
params.Shuffle_seed = -1
params.Csv_output = false

if (params.mode == "genome_wide" && params.species == "maize_v4") {

params.MotMapsFile_gw = "$projectDir/data/zma_v4/zma_v4_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/zma_v4/zma_v4_noncod_merged.bed"
params.Faix_file = "$projectDir/data/zma_v4/zma_v4.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v4/zma_v4_motif_TF_file.txt"
params.Genes_coords = "$projectDir/data/zma_v4/zma_v4_genes_coords_sorted.bed"
params.Feature_file = "$projectDir/data/zma_v4/zma_v4_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v4/zma_v4_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v4/maize_v4_gene_metadata_file.txt"
params.P_val = 0.1

genome_wide_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.Second_gene_annot, params.Second_gene_dist, params.MotMapsFile_gw,
params.Non_cod_genome, params.Faix_file, params.Motif_tf_file, params.Genes_coords, params.Feature_file,
params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)
// define species id used for data subfolder and data file prefix
def species
switch(params.species) {
case "arabidopsis":
species = "ath"
break
case "maize_v4":
species = "zma_v4"
break
case "maize_v5":
species = "zma_v5"
break
default:
exit 1, "MINI-AC can only be run for the species 'arabidopsis', 'maize_v4' and 'maize_v5'. Instead it got '${params.species}'."
}

else if (params.mode == "genome_wide" && params.species == "maize_v5") {
// set input data parameters shared between genome-wide and locus-based modes
params.Faix_file = "$projectDir/data/${species}/${species}.fasta.fai"
params.Motif_tf_file = "$projectDir/data/${species}/${species}_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/${species}/${species}_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/${species}/${species}_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/${species}/${species}_gene_metadata_file.txt"

params.MotMapsFile_gw = "$projectDir/data/zma_v5/zma_v5_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/zma_v5/zma_v5_noncod_merged.bed"
params.Faix_file = "$projectDir/data/zma_v5/zma_v5.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v5/zma_v5_motif_TF_file.txt"
params.Genes_coords = "$projectDir/data/zma_v5/zma_v5_genes_coords_sorted.bed"
params.Feature_file = "$projectDir/data/zma_v5/zma_v5_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v5/zma_v5_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v5/maize_v5_gene_metadata_file.txt"
params.P_val = 0.1
if (params.mode == "genome_wide") {

genome_wide_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.Second_gene_annot, params.Second_gene_dist, params.MotMapsFile_gw,
params.Non_cod_genome, params.Faix_file, params.Motif_tf_file, params.Genes_coords, params.Feature_file,
params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)
}

else if (params.mode == "genome_wide" && params.species == "arabidopsis") {
params.MotMapsFile = "$projectDir/data/${species}/${species}_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/${species}/${species}_noncod_merged.bed"
params.Genes_coords = "$projectDir/data/${species}/${species}_genes_coords_sorted.bed"

params.MotMapsFile_gw = "$projectDir/data/ath/ath_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/ath/ath_noncod_merged.bed"
params.Faix_file = "$projectDir/data/ath/ath.fasta.fai"
params.Motif_tf_file = "$projectDir/data/ath/ath_motif_TF_file.txt"
params.Genes_coords = "$projectDir/data/ath/ath_genes_coords_sorted.bed"
params.Feature_file = "$projectDir/data/ath/ath_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/ath/ath_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/ath/arabidopsis_gene_metadata_file.txt"
params.P_val = 0.1

genome_wide_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.Second_gene_annot, params.Second_gene_dist, params.MotMapsFile_gw,
params.Non_cod_genome, params.Faix_file, params.Motif_tf_file, params.Genes_coords, params.Feature_file,
params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)

}

else if (params.mode == "locus_based" && params.species == "maize_v4") {

params.MotMapsFile_lb = "$projectDir/data/zma_v4/zma_v4_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/zma_v4/zma_v4_promoter_5kbup_1kbdown_sorted.bed"
params.Faix_file = "$projectDir/data/zma_v4/zma_v4.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v4/zma_v4_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/zma_v4/zma_v4_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v4/zma_v4_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v4/maize_v4_gene_metadata_file.txt"
params.P_val = 0.01

locus_based_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.MotMapsFile_lb, params.Promoter_file, params.Faix_file, params.Motif_tf_file,
params.Feature_file, params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)

}
genome_wide_miniac(params)

} else if (params.mode == "locus_based") {

else if (params.mode == "locus_based" && params.species == "maize_v5") {
params.MotMapsFile = "$projectDir/data/${species}/${species}_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/${species}/${species}_promoter_5kbup_1kbdown_sorted.bed"

params.MotMapsFile_lb = "$projectDir/data/zma_v5/zma_v5_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/zma_v5/zma_v5_promoter_5kbup_1kbdown_sorted.bed"
params.Faix_file = "$projectDir/data/zma_v5/zma_v5.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v5/zma_v5_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/zma_v5/zma_v5_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v5/zma_v5_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v5/maize_v5_gene_metadata_file.txt"
params.P_val = 0.01

locus_based_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.MotMapsFile_lb, params.Promoter_file, params.Faix_file, params.Motif_tf_file,
params.Feature_file, params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)

locus_based_miniac(params)

} else {
exit 1, "MINI-AC can only be run using the modes 'genome_wide' or 'locus_based'. Instead it got '${params.mode}'."
}

else if (params.mode == "locus_based" && params.species == "arabidopsis") {

params.MotMapsFile_lb = "$projectDir/data/ath/ath_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/ath/ath_promoter_5kbup_1kbdown_sorted.bed"
params.Faix_file = "$projectDir/data/ath/ath.fasta.fai"
params.Motif_tf_file = "$projectDir/data/ath/ath_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/ath/ath_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/ath/ath_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/ath/arabidopsis_gene_metadata_file.txt"
params.P_val = 0.01

locus_based_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.MotMapsFile_lb, params.Promoter_file, params.Faix_file, params.Motif_tf_file,
params.Feature_file, params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)
}

else {
exit 1, "MINI-AC can only be run using the modes 'genome_wide' and 'locus_based', and with the species 'arabidopsis', 'maize_v4' and 'maize_v5'. Instead it got '${params.species}' and '${params.mode}' "
}
}


workflow {
MINIAC()
}
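
Note on the refactor above: the new `switch` maps `params.species` to a short id (`ath`, `zma_v4`, `zma_v5`) that is used both as the data subfolder and as the file prefix, which is why the three gene metadata files were renamed. A minimal shell sketch of that naming convention follows. `species_id` and `shared_data_files` are hypothetical helper names that are not part of MINI-AC, and the first argument of `shared_data_files` is an assumed stand-in for `$projectDir`.

```shell
# Hypothetical helper (not part of MINI-AC): mirrors the species switch in mini_ac.nf.
species_id() {
    case "$1" in
        arabidopsis) echo "ath" ;;
        maize_v4)    echo "zma_v4" ;;
        maize_v5)    echo "zma_v5" ;;
        *)
            echo "unsupported species '$1'" >&2
            return 1
            ;;
    esac
}

# Hypothetical helper: prints the default data files shared by both modes,
# following the data/<id>/<id><suffix> pattern introduced in this commit.
shared_data_files() {
    project_dir="$1"
    id="$(species_id "$2")" || return 1
    for suffix in .fasta.fai _motif_TF_file.txt _go_gene_file.txt \
                  _TF_family_file.txt _gene_metadata_file.txt; do
        echo "${project_dir}/data/${id}/${id}${suffix}"
    done
}
```

Something like `shared_data_files . maize_v4` could then be piped into a loop that tests each path with `[ -e "$f" ]`, as a quick check that a local checkout already uses the renamed files.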
16 changes: 8 additions & 8 deletions tests/mini_ac.nf.test
@@ -22,14 +22,14 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
MotMapsFile_gw = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_genome_wide_motif_mappings_chr1.bed"
MotMapsFile = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_genome_wide_motif_mappings_chr1.bed"
Non_cod_genome = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_noncod_merged_chr1.bed"
Faix_file = "${baseDir}/data/zma_v4/zma_v4.fasta.fai"
Motif_tf_file = "${baseDir}/data/zma_v4/zma_v4_motif_TF_file.txt"
Genes_coords = "${baseDir}/data/zma_v4/zma_v4_genes_coords_sorted.bed"
Feature_file = "${baseDir}/data/zma_v4/zma_v4_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/zma_v4/zma_v4_TF_family_file.txt"
Genes_metadata = "${baseDir}/data/zma_v4/maize_v4_gene_metadata_file.txt"
Genes_metadata = "${baseDir}/data/zma_v4/zma_v4_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder
@@ -91,13 +91,13 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
MotMapsFile_lb = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_locus_based_motif_mappings_5kbup_1kbdown_chr1.bed"
MotMapsFile = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_locus_based_motif_mappings_5kbup_1kbdown_chr1.bed"
Promoter_file = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_promoter_5kbup_1kbdown_sorted_chr1.bed"
Faix_file = "${baseDir}/data/zma_v4/zma_v4.fasta.fai"
Motif_tf_file = "${baseDir}/data/zma_v4/zma_v4_motif_TF_file.txt"
Feature_file = "${baseDir}/data/zma_v4/zma_v4_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/zma_v4/zma_v4_TF_family_file.txt"
Genes_metadata = "${baseDir}/data/zma_v4/maize_v4_gene_metadata_file.txt"
Genes_metadata = "${baseDir}/data/zma_v4/zma_v4_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder
@@ -153,14 +153,14 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
MotMapsFile_gw = "${baseDir}/data/ath/ath_genome_wide_motif_mappings.bed"
MotMapsFile = "${baseDir}/data/ath/ath_genome_wide_motif_mappings.bed"
Non_cod_genome = "${baseDir}/data/ath/ath_noncod_merged.bed"
Faix_file = "${baseDir}/data/ath/ath.fasta.fai"
Motif_tf_file = "${baseDir}/data/ath/ath_motif_TF_file.txt"
Genes_coords = "${baseDir}/data/ath/ath_genes_coords_sorted.bed"
Feature_file = "${baseDir}/data/ath/ath_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/ath/ath_TF_family_file.txt"
Genes_metadata = "${baseDir}/data/ath/arabidopsis_gene_metadata_file.txt"
Genes_metadata = "${baseDir}/data/ath/ath_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder
@@ -220,13 +220,13 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
MotMapsFile_lb = "${baseDir}/data/ath/ath_locus_based_motif_mappings_5kbup_1kbdown.bed"
MotMapsFile = "${baseDir}/data/ath/ath_locus_based_motif_mappings_5kbup_1kbdown.bed"
Promoter_file = "${baseDir}/data/ath/ath_promoter_5kbup_1kbdown_sorted.bed"
Faix_file = "${baseDir}/data/ath/ath.fasta.fai"
Motif_tf_file = "${baseDir}/data/ath/ath_motif_TF_file.txt"
Feature_file = "${baseDir}/data/ath/ath_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/ath/ath_TF_family_file.txt"
Genes_metadata = "${baseDir}/data/ath/arabidopsis_gene_metadata_file.txt"
Genes_metadata = "${baseDir}/data/ath/ath_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder
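
Migration note: this commit merges `MotMapsFile_gw` and `MotMapsFile_lb` into a single `MotMapsFile` parameter and renames the gene metadata files to use the species id prefix, so existing user configuration files or launch scripts that mention the old names would presumably need the same update as the tests above. One possible way to apply it is sketched below. `migrate_config` is a hypothetical helper, not part of MINI-AC, that reads text on stdin and writes the rewritten text to stdout.

```shell
# Hypothetical migration helper (not part of MINI-AC).
# Rewrites the old mode-specific parameter names and the old gene metadata
# file names to the names used after this commit.
migrate_config() {
    sed -E \
        -e 's/MotMapsFile_(gw|lb)/MotMapsFile/g' \
        -e 's/maize_(v[45])_gene_metadata_file/zma_\1_gene_metadata_file/g' \
        -e 's/arabidopsis_gene_metadata_file/ath_gene_metadata_file/g'
}
```

A call such as `migrate_config < old.config > new.config` (both file names are placeholders) leaves the original untouched, so the result can be compared with `diff` before it is used.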