URL update needed for the latest MEROPs database file #1054

Open
calizilla opened this issue Jul 11, 2024 · 5 comments

@calizilla

Are you using the latest release?
v 1.8.17

Describe the bug
Funannotate pulls down this MEROPs db file which has 5009 genes and was last updated in 2019:
https://ftp.ebi.ac.uk/pub/databases/merops/current_release/merops_scan.lib

The latest file (updated 2023) contains 5098 genes and has URL:
https://ftp.ebi.ac.uk/pub/databases/merops/current_release/meropsscan.lib

Simple change to line 141 of script funannotate/setupDB.py from:

fasta = os.path.join(FUNDB, 'merops_scan.lib')

to:

fasta = os.path.join(FUNDB, 'meropsscan.lib')

and change line 199 of funannotate/resources.py from:

"merops": "https://ftp.ebi.ac.uk/pub/databases/merops/current_release/merops_scan.lib",

to:

"merops": "https://ftp.ebi.ac.uk/pub/databases/merops/current_release/meropsscan.lib",

will resolve the issue.
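
A minimal sketch of one way to make the download less sensitive to this kind of rename, assuming the stale file may still be present under the old name: try the candidate filenames in order, newest first, and use the first one that responds. The helper names `url_exists` and `first_available` are hypothetical and are not part of funannotate; only the base URL and the two filenames come from this issue.

```python
import urllib.error
import urllib.request

BASE = "https://ftp.ebi.ac.uk/pub/databases/merops/current_release/"
# Newest name first; the older name is kept only as a fallback.
CANDIDATES = ["meropsscan.lib", "merops_scan.lib"]


def url_exists(url, timeout=10.0):
    """Return True if a HEAD request to `url` succeeds with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


def first_available(names, exists=url_exists, base=BASE):
    """Return the first URL built from `names` that `exists` accepts, else None.

    `exists` is injectable so the ordering logic can be checked offline.
    """
    for name in names:
        url = base + name
        if exists(url):
            return url
    return None
```

Calling `first_available(CANDIDATES)` would then return the URL to download, or `None` if neither name is reachable. Because the old file is still served, the order of `CANDIDATES` is what selects the newer release.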

@hyphaltip
Collaborator

I'll make this change, but ultimately it looks like a problem with the MEROPS release that the latest file does not use the same filename as before. Did you also inform them of this issue? It seems like this will bite a lot of people who assume the filename structure stays the same between releases.

@calizilla
Author

@hyphaltip thanks for the fix; and fair point - I just emailed [email protected] to advise of the issue and suggested they maintain copies at both filenames

hyphaltip added a commit that referenced this issue Aug 12, 2024
update MEROPS release filename is now 'meropsscan.lib' per issue #1054
@hyphaltip
Collaborator

I pushed the new version as the default. It required a manual change because the version is hardcoded in the code (@nextgenusfs ?). We can fix this in funannotate2, though I wish EBI would provide the version number in a parseable form in their repository.

@calizilla
Author

calizilla commented Aug 14, 2024

@hyphaltip thanks. I still have not heard back from EBI regarding the issue on their end.

Just wondering why funannotate uses the meropscan.lib database rather than pepunit.lib? I have now re-annotated my genomes (one fungus and one plant) using pepunit.lib and obtained far more MEROPs hits against pepunit (see the table below). The numbers are unique MEROPs-annotated genes, not the total number of hits to the respective database.

| genome | meropscan.lib | pepunit.lib |
| ------ | ------------- | ----------- |
| plant  | 1180          | 2676        |
| fungus | 492           | 1804        |
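
To make the distinction between unique annotated genes and total hits concrete, here is a small sketch. The `(gene_id, hit_id)` pairs are made-up placeholders, not real search output, and `count_hits` is a hypothetical helper. The per-genome counts are the ones reported in the table above.

```python
def count_hits(hits):
    """Return (total_hits, unique_genes) for an iterable of (gene_id, hit_id) pairs."""
    hits = list(hits)
    unique_genes = {gene for gene, _ in hits}
    return len(hits), len(unique_genes)


# Placeholder data: gene1 matches two database entries, gene2 matches one.
example = [("gene1", "hitA"), ("gene1", "hitB"), ("gene2", "hitC")]
total, unique = count_hits(example)
print(total, unique)  # 3 2

# Unique annotated genes per genome, as reported in this comment.
counts = {
    "plant": {"meropscan.lib": 1180, "pepunit.lib": 2676},
    "fungus": {"meropscan.lib": 492, "pepunit.lib": 1804},
}
# Ratio of pepunit.lib to meropscan.lib annotated genes.
fold = {
    genome: round(c["pepunit.lib"] / c["meropscan.lib"], 2)
    for genome, c in counts.items()
}
print(fold)  # {'plant': 2.27, 'fungus': 3.67}
```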

@hyphaltip
Collaborator

hyphaltip commented Aug 19, 2024

That's a question for Jon (@nextgenusfs); he implemented this.

In my own work, when I am doing comparative genomics, I end up running my own suite of protein domain profiling on the predicted proteins rather than relying on the annotation that is part of the final GenBank record, because I usually want to run against the most up-to-date versions of the DBs. So it's good that you can get your own results for the DB you want rather than depending on funannotate for that, as these are just added annotations in the GenBank files.

I'm not familiar with the nuances of these MEROPS files anyway, so if you can explain what each one provides, maybe there is a better choice for the general goals of the toolkit here.
