-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metagenomics Remove Human Reads #93
base: DEV_Metagenomics_rmHR_NF_conversion
Are you sure you want to change the base?
Metagenomics Remove Human Reads #93
Conversation
would appreciate input on this!
…ow_Documentation/NF_MGEstHostReads-B/Estimate_Host_Reads.config to Metagenomics/Estimate_host_reads_in_raw_data/Workflow_Documentation/NF_MGEstHostReads-B/workflow_code/Estimate_Host_Reads.config
…ation/NF_MGEstHostReads-B/Estimate_Host_Reads.nf to Metagenomics/Estimate_host_reads_in_raw_data/Workflow_Documentation/NF_MGEstHostReads-B/workflow_code/Estimate_Host_Reads.nf
…ow_Documentation/NF_MGEstHostReads-B/workflow_code/Estimate_Host_Reads.config to Metagenomics/Estimate_host_reads_in_raw_data/Workflow_Documentation/NF_MGEstHostReads-B/workflow_code/config/Estimate_Host_Reads.config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general comments: the code blocks in the README need to be executable as-is. Ideally, we want users to be able to run the workflow example by copy/paste of the provided code blocks. As currently written, the README attempts to combine the genelab-utils based instructions from the snakemake workflows with the nextflow instructions from the nextflow workflows. Unfortunately, that is redundant/confusing. Pick one or the other. Further comments are in-line on the README, nextflow, and config files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should be moved one directory up so that the instructions are easier to find. This is also how all of the other workflows are structured (README with instruction in the main workflow folder, not in "workflow_code")
|
||
Nextflow can be installed either through [Anaconda](https://anaconda.org/bioconda/nextflow) or as documented on the [Nextflow documentation page](https://www.nextflow.io/docs/latest/getstarted.html). | ||
|
||
> Note: If you want to install Anaconda, we recommend installing a Miniconda, Python3 version appropriate for your system, as instructed by [Happy Belly Bioinformatics](https://astrobiomike.github.io/unix/conda-intro#getting-and-installing-conda). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems redundant with the instructions in the previous step, which already have the user installing conda/mamba
|
||
### 2. Install Nextflow and Singularity | ||
|
||
#### 2a. Install Nextflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The genelab-utils conda package already includes Nextflow (version 22.10.6). Do we want to tell people to use that version (in which case we can update this text to say that Nextflow is included) or do we want to have separate instructions that make it clear that Nextflow installation is only needed if they want to use a different environment for running the workflow and not genelab-utils.
|
||
|
||
## General workflow info | ||
The current pipeline for how GeneLab identifies and removes human DNA in Illumina metagenomics sequencing data (MGRemoveHumanReads), [GL-DPPD-7105-A.md](../../Pipeline_GL-DPPD-7105_Versions/GL-DPPD-7105-A.md), is implemented as a [NextFlow](https://www.nextflow.io/docs/stable/index.html) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) run all tools in containers. This workflow (NF_MGRemoveHumanReads-A) is run using the command line interface (CLI) of any unix-based system. The workflow can be used even if you are unfamiliar with NextFlow and Singularity, but if you want to learn more about those, [this NextFlow tutorial](https://training.nextflow.io/basic_training/) within [NextFlow's documentation](https://www.nextflow.io/docs/stable/index.html) is a good place to start for that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current pipeline for how GeneLab identifies and removes human DNA in Illumina metagenomics sequencing data (MGRemoveHumanReads), [GL-DPPD-7105-A.md](../../Pipeline_GL-DPPD-7105_Versions/GL-DPPD-7105-A.md), is implemented as a [NextFlow](https://www.nextflow.io/docs/stable/index.html) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) run all tools in containers. This workflow (NF_MGRemoveHumanReads-A) is run using the command line interface (CLI) of any unix-based system. The workflow can be used even if you are unfamiliar with NextFlow and Singularity, but if you want to learn more about those, [this NextFlow tutorial](https://training.nextflow.io/basic_training/) within [NextFlow's documentation](https://www.nextflow.io/docs/stable/index.html) is a good place to start for that. | |
The current pipeline for how GeneLab identifies and removes human DNA in Illumina metagenomics sequencing data (MGRemoveHumanReads), [GL-DPPD-7105-A.md](../../Pipeline_GL-DPPD-7105_Versions/GL-DPPD-7105-A.md), is implemented as a [NextFlow](https://www.nextflow.io/docs/stable/index.html) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in containers. This workflow (NF_MGRemoveHumanReads-A) is run using the command line interface (CLI) of any unix-based system. While knowledge of creating or modifying Nextflow workflows is not required to run the workflow as-is, the [Nextflow documentation](https://www.nextflow.io/docs/stable/index.html) is a useful resource for users who wish to modify and/or extend the workflow. |
...n_reads_from_raw_data/Workflow_Documentation/NF_MGRemoveHumanReads-A/workflow_code/README.md
Outdated
Show resolved
Hide resolved
.splitText() | ||
.map { it.trim() } | ||
.map { sample_id -> | ||
def files = file("${params.reads_dir}${sample_id}${params.PE_reads_suffix}").toList().sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def files = file("${params.reads_dir}${sample_id}${params.PE_reads_suffix}").toList().sort() | |
def files = file("${params.reads_dir}/${sample_id}${params.PE_reads_suffix}").toList().sort() | |
} | ||
} | ||
else { | ||
reads_ch = Channel.fromFilePairs(params.reads_dir + "*" + params.PE_reads_suffix, checkIfExists: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reads_ch = Channel.fromFilePairs(params.reads_dir + "*" + params.PE_reads_suffix, checkIfExists: true) | |
reads_ch = Channel.fromFilePairs(params.reads_dir + "/*" + params.PE_reads_suffix, checkIfExists: true) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
README refers to the file with uppercase "Reads", so it's confusing for it to be lowercase "reads". Also, Nextflow doesn't automatically pickup this file. Either add a nextflow.config that points to this one or rename this to "nextflow.config"
|
||
params.sample_id_list = "/workspace/GeneLab_Data_Processing/rmv/unique_sample_ids.txt" //list of sample IDs to proccess if specify_reads is true | ||
|
||
params.reads_dir = "$projectDir/example-reads_PE/" //directory to find sample reads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "$projectDir" refer to? It's OK to rely on the user setting a variable, but then you have to tell them to do so in the README.
params.num_threads = 2 | ||
params.kraken_output_dir = "$projectDir/kraken2-outputs" //location to output files, relative to wd or full path | ||
params.human_db_name = 'kraken2-human-db' // | ||
params.human_db_path = "$projectDir/${params.human_db_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, the $projectDir isn't discussed in the README. In this case, it is also a good idea to make it clear to the user that the db path doesn't necessarily have to be the same as the read path and using "projectDir" for both is confusing, unless they're downloading the DB (which isn't the default behavior).
Download DB: ${params.DL_kraken} | ||
Single end reads: ${params.single_end} | ||
Use SampleID file: ${params.specify_reads} | ||
Outputs: ${params.human_db_path} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be the kraken_output_dir and not the human_db_path?
Co-authored-by: Barbara Novak <[email protected]>
…low_Documentation/NF_MGRemoveHumanReads-A/workflow_code/README.md to Metagenomics/Remove_human_reads_from_raw_data/Workflow_Documentation/NF_MGRemoveHumanReads-A/README.md corrected readme location and uploaded changes
changed WF name, fixed some documentation, and removed the Estimate Host Reads WF from PR @bnovak32 @asaravia-butler