Add genome and annotations for Acacia crassicarpa #195
Comments
Little FYI on this one, @StevenCannon-USDA: it looks like most of the children of the gene features are given type "transcript", but some of the AHRD-related processing behaves badly if protein-coding transcripts are not given as mRNA. Let me know if you have any concerns about me making that change wholesale. I checked, and the number of genes is off by only one from the number of primary proteins, so I think it's fair to assume they are mRNA. The discrepancy seems to be caused by a gene with ID=acacr.Acra3RX.gnm1.ann1.nbis-gene-1, which also has the attributes gene_id=g12430;transcript_id=g12430.t2
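A quick way to sanity-check the gene and transcript counts mentioned above is to tally column 3 of the GFF3. This is only a minimal sketch: the function name `count_feature_types` and the example records are hypothetical, not taken from the Acra3RX file, and it assumes a tab-delimited, nine-column GFF3.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally the feature type (column 3) over GFF3 feature lines."""
    counts = Counter()
    for line in gff_lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only well-formed nine-column records are counted.
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


# Hypothetical records for illustration only.
example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=G1",
    "chr1\tsrc\ttranscript\t1\t100\t.\t+\t.\tID=G1.t1;Parent=G1",
    "chr1\tsrc\texon\t1\t100\t.\t+\t.\tID=G1.t1.exon1;Parent=G1.t1",
]

for feature_type, n in sorted(count_feature_types(example).items()):
    print(feature_type, n)
```

Comparing the `gene` count against the number of primary proteins (counted separately from the protein fasta) would surface a one-off discrepancy like the one described.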
Thank you - and I don't have concerns about s/transcript/mRNA/. (Why are there a million ways to munge a GFF?)
Maybe you should write a song about it! ;)
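For reference, the s/transcript/mRNA/ change can be restricted to the feature-type column, so that attribute values such as transcript_id are left alone. The sketch below is one possible way to do it, not necessarily how the change was actually made; `retype_transcripts` is a hypothetical helper and the records are invented.

```python
def retype_transcripts(gff_lines, old_type="transcript", new_type="mRNA"):
    """Yield GFF3 lines with column 3 changed from old_type to new_type.

    Only the feature-type column is compared and rewritten; the
    attributes column (e.g. transcript_id=...) is passed through as-is.
    """
    for line in gff_lines:
        text = line.rstrip("\n")
        fields = text.split("\t")
        is_feature = not text.startswith("#") and len(fields) == 9
        if is_feature and fields[2] == old_type:
            fields[2] = new_type
            yield "\t".join(fields)
        else:
            yield text


# Hypothetical records for illustration only.
example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=G1",
    "chr1\tsrc\ttranscript\t1\t100\t.\t+\t.\tID=G1.t1;Parent=G1;transcript_id=G1.t1",
    "chr1\tsrc\texon\t1\t100\t.\t+\t.\tID=G1.t1.exon1;Parent=G1.t1",
]

for out_line in retype_transcripts(example):
    print(out_line)
```

Working on the split columns rather than running a plain text substitution avoids accidentally rewriting the word "transcript" inside attribute keys or values.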
Oh, actually there's probably more to do on this file, but I'd like a second opinion. In addition to that one weird little gene, it looks like there are a bunch (~2000) of existing mRNA features that appear to essentially just be duplicates of what was originally represented as "transcript", but with an odd ID that nothing else references. For example:
Note that the transcript_id=g3.t1 part of the first one seems to suggest it really is a duplicate of the one below. I'm proposing to just delete these "extras", since they don't appear to provide any value and are arguably detrimental: they appear in the transcripts fasta without any splicing because they have no exon children (they don't appear in cds or protein since they don't have CDS children). Let me know if you see something about these that I'm overlooking that would argue for their being preserved.
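One way to locate such "extras" is to list mRNA features whose ID is never used as a Parent by any other record, which is what having no exon or CDS children amounts to. This is a rough sketch under that assumption; the helper names `parse_attributes` and `childless_mrna_ids` and all IDs in the example are hypothetical and do not come from the actual file.

```python
def parse_attributes(column9):
    """Parse a GFF3 attributes column into a dict of key -> value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def childless_mrna_ids(gff_lines):
    """Return IDs of mRNA features that no other feature names as Parent."""
    mrna_ids = []
    referenced = set()
    for line in gff_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        attrs = parse_attributes(fields[8])
        # A feature may list several comma-separated parents.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                referenced.add(parent)
        if fields[2] == "mRNA" and "ID" in attrs:
            mrna_ids.append(attrs["ID"])
    return [mrna_id for mrna_id in mrna_ids if mrna_id not in referenced]


# Hypothetical records for illustration only.
example = [
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=G1",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=extra-1;Parent=G1;transcript_id=G1.t1",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=G1.t1;Parent=G1",
    "chr1\tsrc\texon\t1\t100\t.\t+\t.\tID=G1.t1.exon1;Parent=G1.t1",
    "chr1\tsrc\tCDS\t1\t100\t.\t+\t0\tID=G1.t1.cds1;Parent=G1.t1",
]

print(childless_mrna_ids(example))
```

The returned IDs could then be reviewed by hand before any records are dropped, since a childless mRNA is only a candidate duplicate, not proof of one.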
I agree with you: OK to delete those transcript records that seem to be duplicates of the mRNA features. History of this gene file: I received it in The original structure of the noncompliant records is:
In the gff3, this becomes
My reading of this is that the
Thanks, this helps clarify. I think I've seen "nbis" appearing in other files occasionally too. In any case, I'll delete the "nbis" records and retain the others to keep the naming consistent (and switch the "transcript" -> "mRNA").
Main steps for adding new genome and annotation collections
Genus/species/collection names:
Acacia/crassicarpa/genomes/Acra3RX.gnm1.YX4L
Acacia/crassicarpa/annotations/Acra3RX.gnm1.ann1.6C0V
Add collection(s) to the Data Store, including commits to datastore-metadata (at annex as of 2024-02-13)
Validate the README(s)
Update about_this_collection.yml
Calculate AHRD functional annotations
Calculate gene family assignments (.gfa)
Add to pan-gene set
Load relevant mine
Add BLAST targets
Incorporate into GCV
Update the jekyll collections listing
Update browser configs
Run BUSCO
Update DSCensor
Add LINKOUTS to datastore, refresh linkout service