New genome and annotations for Chamaecrista fasciculata (two haplotypes) #208
This one is back in play, following our discussion about handling haplotype-resolved assemblies.
@StevenCannon-USDA I should have the AHRDs on these two completed soon and will then move them from the annex to the main datastore. My preference would be to move both haplotypes there, since it seems to make sense to include both in at least some (if not all) downstream systems. But I wanted to confirm with you, since I think you were originally planning to leave secondary haplotypes in the annex. Also, one very minor note: the procedure you're using for the upstream processing seems to be producing uncompressed GFF3 for the gene_models_main files, although they have the .gz suffix. Not really a problem, since we have to add the AHRD stuff in and redo compression/indexing anyway, but it is a bit confusing when gunzip complains...
Thanks for the alert about the uncompressed GFF3s. I suspect that was due to some additional manual work I did when the automated compression failed, due (I think) to an interrupted session.
OK, the data-content-related tasks (AHRD/BUSCO/gfa) should be complete and I've moved the folders into the main datastore; downstream steps will proceed as time permits, but if there are any you consider higher priority than others, let me know. Regarding the compression, it definitely was an issue on both haplotypes, and I feel like I've seen it before but am not 100% sure about that. In any case, if I see it again I'll let you know.
OK, thank you. I'll also investigate the compression issue -- at least next time I run the process.
Well, that looks pretty straightforward, but now that I think about it some more, I don't think an interrupted session would explain the observed behavior, which is as if the original file were simply renamed with a .gz suffix. Is it possible that there's something else that just names it with a gz extension (in which case the code above wouldn't even see it there)?
Helpful suggestion/clue. You are right.
I'll plan to add checks for this in ds_souschef.pl once I've finished some other tasks.
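For reference, here is a minimal sketch of the kind of check discussed above. It is not the ds_souschef.pl implementation (that script is Perl; this is Python for illustration only), and the function names `is_really_gzipped` and `find_misnamed_gz` are hypothetical. It relies on the fact that gzip (and bgzip) output always begins with the two magic bytes 0x1f 0x8b, so a plain-text GFF3 that was merely renamed with a .gz suffix will fail the test.

```python
import gzip
from pathlib import Path

# Every gzip stream (including bgzip output) starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_really_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def find_misnamed_gz(collection_dir):
    """Return a sorted list of *.gz files under `collection_dir` that are
    not actually gzip-compressed (e.g. plain text renamed with .gz)."""
    return sorted(
        p for p in Path(collection_dir).rglob("*.gz")
        if not is_really_gzipped(p)
    )
```

Run against a collection directory before it moves out of the annex, something like `for p in find_misnamed_gz("annotations/"): print("not gzipped:", p)` (the directory name here is only a placeholder) would flag files such as the gene_models_main GFF3s mentioned above. The check only inspects the header, so it will not detect a truncated but otherwise valid gzip file.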
Main steps for adding new genome and annotation collections
Genus/species/collection names:
Haplotype 1:
Haplotype 2:
Chamaecrista/fasciculata/genomes/ISC494698.gnm1_hap2.G6BY
Chamaecrista/fasciculata/annotations/ISC494698.gnm1_hap2.ann1.WXZF
Add collection(s) to the Data Store, including commits to datastore-metadata
Validate the README(s)
Update about_this_collection.yml
Calculate AHRD functional annotations
Calculate gene family assignments (.gfa)
[N/A] Add to pan-gene set
Load relevant mine
Add BLAST targets
Incorporate into GCV
Update the jekyll collections listing
Update browser configs
Run BUSCO
Update DSCensor
Add LINKOUTS to datastore, refresh linkout service