Missing plants in PlusPFP #14

Open
bioreactordan opened this issue Jan 12, 2023 · 2 comments

@bioreactordan

Hi,

I'm using PlusPFP to identify algae species (plants). However, the algae species of interest (Chlorella sorokiniana) is not represented in the index even though it is listed in RefSeq with a full genome as of October 2022. Did you filter RefSeq for plants in any way that would remove certain genomes? I cross-checked RefSeq plant with the .txt file and some of the 2,663 organisms listed under the plant category in RefSeq are not in the index.

Thanks

@mclaugsf

We noticed the same thing: Humulus lupulus (hops) is missing, along with everything else from the genus Humulus. I'm curious how it got skipped while Cannabis sativa is present.

@BenLangmead
Owner

I can unravel at least one level of the mystery: we use this file to determine what to download: https://ftp.ncbi.nlm.nih.gov/genomes/refseq/plant/assembly_summary.txt

The genome mentioned does not seem to be in that file.
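
For anyone who wants to repeat this check for their own organism, here is a minimal Python sketch. It assumes the summary file is tab-separated, that its comment lines start with `#`, and that one of those lines is a header containing `assembly_accession` and `organism_name` columns. The function names `find_organisms` and `fetch_summary_lines` are made up for illustration and are not part of any index-building script.

```python
import csv
import urllib.request

# URL taken from the comment above.
SUMMARY_URL = "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/plant/assembly_summary.txt"


def find_organisms(lines, query):
    """Return (assembly_accession, organism_name) pairs whose organism
    name contains `query`, case-insensitively.

    Assumes tab-separated input whose header is a '#'-prefixed line that
    includes an 'organism_name' column (an assumption about the format).
    """
    header = None
    hits = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue
        if row[0].startswith("#"):
            # Comment line; treat it as the header if it names the column we need.
            cols = [c.lstrip("# ").strip() for c in row]
            if "organism_name" in cols:
                header = cols
            continue
        if header is None:
            continue  # data before any header: cannot interpret it
        record = dict(zip(header, row))
        if query.lower() in record.get("organism_name", "").lower():
            hits.append((record.get("assembly_accession", ""), record["organism_name"]))
    return hits


def fetch_summary_lines(url=SUMMARY_URL):
    """Download the summary file and return it as a list of text lines."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8").splitlines()
```

Calling `find_organisms(fetch_summary_lines(), "Humulus")` and getting an empty list back would suggest that the organism is absent from the download list itself, rather than being dropped at a later step.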
