Software installation instructions #3

Open
tavareshugo opened this issue Mar 4, 2024 · 13 comments

Comments

@tavareshugo
Contributor

@lkalmar can you list here on this issue all the software that is used in the course?
So we can keep track of things for future iterations of the course.

@lkalmar
Contributor

lkalmar commented Mar 4, 2024

R packages:
dada2, phyloseq, Biostrings, ggplot2, reshape2, readxl, tidyverse

Command line applications:
fastqc, multiqc, cutadapt, trimmomatic, bowtie2, samtools, metaphlan, mash, SPAdes, clumpify.sh (part of the bbmap package), flash (for merging reads), maxbin2, checkm (database!), gtdbtk (database!), prokka, abricate (database!)

@tavareshugo
Contributor Author

Can I check:

  1. It's checkm (version 1), not checkm2?
  2. Do you have commands/links to download all those databases?

@lkalmar
Contributor

lkalmar commented Mar 4, 2024

> Can I check:
>
>   1. It's checkm (version 1), not checkm2?
>   2. Do you have commands/links to download all those databases?

  1. It is the original checkm, which is still the gold standard, but I will look into checkm2.
  2. I think all of these are conda-installable, but to be sure, I will collect all the installation commands here.

@tavareshugo
Contributor Author

For point 2, I just meant the databases, not the software itself.
For example, with CheckM2 they have a command checkm2 database --download --path <output>.

I don't think CheckM (version 1) has a command, but there is this: https://data.ace.uq.edu.au/public/CheckM_databases/
Is that the correct database to download?

For the other programs, I don't know if the databases come with the software or if they need to be installed separately.

@tavareshugo
Contributor Author

gtdbtk database can be obtained with the download-db.sh command

@tavareshugo
Contributor Author

abricate just seems to have the databases as part of the installation.

@lkalmar I've updated the data and setup page, would you mind revising before I close this issue?

@lkalmar
Contributor

lkalmar commented Mar 4, 2024

Yes, checkm has a database and you need to set the database path. Abricate comes with the database, but there is a way to update those here
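
A minimal sketch of the database steps discussed above, assuming a hypothetical target folder and an assumed tarball name under the CheckM_databases URL quoted earlier (check the directory listing before relying on it). The `run` helper and the two function names are illustrative, not part of either tool. By default the commands are only printed so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch only: the target folder and the tarball name are assumptions.

# Print each command by default (DRY_RUN=1); set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Download and unpack the CheckM (v1) reference data, then register its location.
setup_checkm_db() {
  local db_dir="$1"                                  # hypothetical target folder
  local base_url="https://data.ace.uq.edu.au/public/CheckM_databases"
  local tarball="checkm_data_2015_01_16.tar.gz"      # assumed name, verify in the listing
  run mkdir -p "$db_dir/checkm"
  run wget -P "$db_dir/checkm" "$base_url/$tarball"
  run tar -xzf "$db_dir/checkm/$tarball" -C "$db_dir/checkm"
  run checkm data setRoot "$db_dir/checkm"
}

# Refresh one of the databases bundled with abricate, then re-index and list them.
update_abricate_db() {
  local db_name="${1:-ncbi}"
  run abricate-get_db --db "$db_name" --force
  run abricate --setupdb
  run abricate --list
}
```

For example, `setup_checkm_db "$HOME/databases"` prints the four CheckM commands, and `DRY_RUN=0 update_abricate_db ncbi` would actually run the abricate update inside an environment where abricate is installed.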

@lkalmar
Contributor

lkalmar commented Mar 4, 2024

Otherwise, the update is perfect, please close the issue

@tavareshugo
Contributor Author

Reopening as we are missing the MetaPhlAn database.

On this page they recommend, for conda installations:

metaphlan --install --bowtie2db <database folder>

Does this look right @lkalmar?

In the future we would then need to adjust the materials to point to a --bowtie2db folder that we decide to save the database into.

tavareshugo reopened this Mar 6, 2024

@lkalmar
Contributor

lkalmar commented Mar 6, 2024

You can either install the DB inside your conda / miniconda / mamba / micromamba folder with the simple command metaphlan --install (not recommended on the HPC, but simpler on your own computer or an in-house server).

Or you can choose the database location with the above-mentioned metaphlan --install --bowtie2db <database folder>, but in that case you have to pass the same database path again every time you run metaphlan.
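
The second option can be sketched as follows, assuming a hypothetical database folder and hypothetical input and output file names. The `run` helper and function names are illustrative. The `--bowtie2db` flag is the one quoted in this thread; it is worth confirming against `metaphlan --help` for the installed version, since option names may differ between releases.

```shell
#!/usr/bin/env bash
# Sketch only: the folder and file names passed to these functions are hypothetical.

# Print each command by default (DRY_RUN=1); set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Install the MetaPhlAn database into a folder of our choice.
install_metaphlan_db() {
  local db_dir="$1"
  run mkdir -p "$db_dir"
  run metaphlan --install --bowtie2db "$db_dir"
}

# Profile one FASTQ file, pointing metaphlan at the same database folder.
profile_sample() {
  local db_dir="$1" reads="$2" profile="$3"
  run metaphlan "$reads" --input_type fastq --bowtie2db "$db_dir" -o "$profile"
}
```

Calling `install_metaphlan_db /data/metaphlan_db` followed by `profile_sample /data/metaphlan_db sample.fastq sample_profile.txt` prints both steps. The point is that the folder given at install time is the one that has to be repeated at run time.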

@tavareshugo
Contributor Author

tavareshugo commented Mar 6, 2024

Installation scripts from MetaPhlAn and GTDB-Tk are not reliable.

  • MetaPhlAn: follow the instructions here to find the FTP link to their databases. We will need to dig into their install scripts to figure out exactly which files we need.
  • GTDB-Tk: follow the instructions to download manually.
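
For the GTDB-Tk case, the steps after a manual download might look like the sketch below. It assumes the reference data tarball has already been downloaded to a hypothetical local path, that GTDB-Tk lives in a conda environment whose name is passed in, and that GNU tar is available. The `run` helper and function name are illustrative. No download URL is given here because the thread does not state one.

```shell
#!/usr/bin/env bash
# Sketch only: tarball path, target folder and environment name are hypothetical.

# Print each command by default (DRY_RUN=1); set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Unpack a manually downloaded GTDB-Tk data tarball and tell GTDB-Tk where it is.
setup_gtdbtk_data() {
  local tarball="$1" data_dir="$2" env_name="$3"
  run mkdir -p "$data_dir"
  # Drop the top-level folder of the archive so files land directly in data_dir.
  run tar -xzf "$tarball" -C "$data_dir" --strip-components 1
  # GTDB-Tk reads the data location from the GTDBTK_DATA_PATH variable.
  run conda env config vars set -n "$env_name" "GTDBTK_DATA_PATH=$data_dir"
  # Ask GTDB-Tk to verify that it can find the reference data.
  run conda run -n "$env_name" gtdbtk check_install
}
```

For example, `setup_gtdbtk_data ~/Downloads/gtdbtk_data.tar.gz /data/gtdbtk gtdbtk` prints the four steps for review.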

@tavareshugo
Contributor Author

It's a bad idea to have everything in the same environment, due to dependency conflicts (e.g. an old version of maxbin).

Update the instructions to have each tool in a separate environment.

@lkalmar
Contributor

lkalmar commented Mar 7, 2024

Maybe not each, then we end up with a huge number of envs...
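
One possible middle ground is sketched below: a handful of environments grouped by task, with the tools most likely to clash kept on their own. The grouping and environment names are hypothetical and were not agreed in this thread. Package names are assumed to match the bioconda recipes (for example `checkm-genome` for checkm and `bbmap` for clumpify.sh). The `run` helper and function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: environment names and the grouping of tools are hypothetical.

SOLVER="${SOLVER:-mamba}"   # could also be conda or micromamba

# Print each command by default (DRY_RUN=1); set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create one named environment containing the listed packages.
create_env() {
  local name="$1"
  shift
  run "$SOLVER" create -y -n "$name" -c conda-forge -c bioconda "$@"
}

# Create a small set of task-based environments for the course tools.
create_course_envs() {
  create_env qc fastqc multiqc cutadapt trimmomatic bbmap flash
  create_env mapping bowtie2 samtools
  create_env assembly spades mash
  create_env annotation prokka abricate
  # Tools with heavy or conflicting dependencies each get their own environment.
  create_env metaphlan metaphlan
  create_env maxbin2 maxbin2
  create_env checkm checkm-genome
  create_env gtdbtk gtdbtk
}
```

Running `create_course_envs` prints eight create commands instead of one per tool, which keeps the number of environments manageable while still isolating maxbin2.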
