Generate an identification extension to track changes in taxonomic assignment #120

Open
CecSve opened this issue Aug 15, 2024 · 4 comments

CecSve commented Aug 15, 2024

Users supply a taxonomy file when the tool processes their data to generate a Darwin Core Archive (DwC-A). Ideally, the scientificName is a BIN, SH, or similar identifier, and Linnean ranks can be included to give further taxonomic identification.

Would it make sense to add a verbatimIdentification field, and perhaps an identificationRemarks field, to the generated archive, where the original identification (maybe already used in scientific publications) can be recorded? More fields might be relevant and could be packaged as an extension file, although the fields mentioned could also just be added to the occurrence core file.

It could allow data users to track the changes in taxonomic identification.
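As a rough illustration of what tracking could look like for a data user, the sketch below lists records whose original identification differs from the current scientificName. It assumes both values sit on the same occurrence record; the function name `changed_identifications` and the sample records (including the BIN-style ID) are made up for illustration and are not part of the tool.

```python
def changed_identifications(records):
    """Return (occurrenceID, original, current) for records whose identification changed."""
    changes = []
    for rec in records:
        original = (rec.get("verbatimIdentification") or "").strip()
        current = (rec.get("scientificName") or "").strip()
        # Only report records that carry an original identification that differs.
        if original and original != current:
            changes.append((rec.get("occurrenceID"), original, current))
    return changes


# Made-up sample records, for illustration only.
records = [
    {"occurrenceID": "occ-1", "verbatimIdentification": "Sargassum sp.", "scientificName": "Sargassum"},
    {"occurrenceID": "occ-2", "verbatimIdentification": "BOLD:AAA0001", "scientificName": "BOLD:AAA0001"},
]
changes = changed_identifications(records)
print(changes)
```

A plain string comparison is the simplest possible check; a real comparison would probably need to normalise names first.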

Collaborator

thomasstjerne commented Aug 15, 2024

Actually, verbatimIdentification and identificationRemarks are both in the default list of fields in the taxonomy mapping. People use the fields in slightly different ways, but verbatimIdentification is often used for the full taxonomy string retrieved from BLAST or whatever assignment tool you use, e.g. k__Stramenopila;p__Ochrophyta;c__Phaeophyceae;o__Fucales;f__Sargassaceae;g__Sargassum;s__Sargassum_sp
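For readers who want to work with a lineage string like the one above, here is a minimal sketch that splits it into ranks. It assumes the single-letter `k__`/`p__`/... prefix convention seen in the example; the `RANK_PREFIXES` mapping and the `parse_lineage` helper are hypothetical and not part of the tool.

```python
# Assumed mapping from single-letter prefixes to rank names (hypothetical).
RANK_PREFIXES = {
    "k": "kingdom",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_lineage(verbatim: str) -> dict:
    """Split a 'k__X;p__Y;...' string into a {rank: name} dictionary."""
    ranks = {}
    for part in verbatim.split(";"):
        prefix, sep, name = part.strip().partition("__")
        if not sep or not name:
            continue  # skip empty segments or segments without a prefix
        rank = RANK_PREFIXES.get(prefix)
        if rank:
            ranks[rank] = name
    return ranks


lineage = (
    "k__Stramenopila;p__Ochrophyta;c__Phaeophyceae;o__Fucales;"
    "f__Sargassaceae;g__Sargassum;s__Sargassum_sp"
)
parsed = parse_lineage(lineage)
print(parsed["genus"])
```

Segments with an unknown prefix are silently dropped here; whether that is the right behaviour depends on the assignment tool's output format.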

Also, any field in Occurrence Core or DNA Derived data can be added by a user even if it is not in the default list.

[Image: Screenshot at Aug 15 10-42-24]

Author

CecSve commented Aug 15, 2024

Oh great - that makes sense. I was just wondering if the tool should automatically fill the verbatimIdentification field based on the input from the publisher? It could be used as the original identification to track changes.
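One way such auto-filling might work is sketched below. It assumes the publisher's supplied scientificName counts as the original identification, and that a value the publisher already mapped to verbatimIdentification should never be overwritten; `fill_verbatim_identification` is a hypothetical name, not an existing function in the tool.

```python
def fill_verbatim_identification(row: dict) -> dict:
    """Copy scientificName into verbatimIdentification when the latter is empty."""
    filled = dict(row)  # leave the caller's row untouched
    existing = (filled.get("verbatimIdentification") or "").strip()
    if not existing:
        # Assumption: the publisher-supplied name is the original identification.
        filled["verbatimIdentification"] = filled.get("scientificName", "")
    return filled


row_without = {"scientificName": "Sargassum"}
row_with = {"scientificName": "Sargassum", "verbatimIdentification": "Sargassum sp."}

filled_without = fill_verbatim_identification(row_without)
filled_with = fill_verbatim_identification(row_with)
print(filled_without["verbatimIdentification"])
print(filled_with["verbatimIdentification"])
```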

Author

CecSve commented Aug 15, 2024

And the identificationRemarks could include information about the values and refDB if users opt to use the seqID tool to assign taxonomy, for example:

bitScore: 111 | expectValue: 4.03e-24 | queryCoverage: 100 | matchType: BLAST_EXACT_MATCH | queried against a 99% clustered version of the BOLD Public Database v2024-01-06 public data (COI-5P sequences)
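A remarks string in this shape could be assembled roughly as follows. This is only a sketch: the input dictionary layout and the `format_identification_remarks` helper are assumptions, and the actual seqID tool output may be structured differently.

```python
def format_identification_remarks(match: dict, ref_db: str) -> str:
    """Join match values and the reference database description with ' | '."""
    parts = [f"{key}: {value}" for key, value in match.items()]
    parts.append(f"queried against {ref_db}")
    return " | ".join(parts)


# Values taken from the example above; expectValue is kept as a string
# so its formatting is preserved exactly.
match = {
    "bitScore": 111,
    "expectValue": "4.03e-24",
    "queryCoverage": 100,
    "matchType": "BLAST_EXACT_MATCH",
}
ref_db = (
    "a 99% clustered version of the BOLD Public Database "
    "v2024-01-06 public data (COI-5P sequences)"
)
remarks = format_identification_remarks(match, ref_db)
print(remarks)
```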

@thomasstjerne
Collaborator

> And the identificationRemarks could include information about the values and refDB if users opt to use the seqID tool to assign taxonomy, for example:
>
> bitScore: 111 | expectValue: 4.03e-24 | queryCoverage: 100 | matchType: BLAST_EXACT_MATCH | queried against a 99% clustered version of the BOLD Public Database v2024-01-06 public data (COI-5P sequences)

Yes - exactly
