RFO: "gene_functions" collection #40
Comments
Let's make DOI required, since it is in the other READMEs and I use DOI to fill out the Publication object. PMID must be optional, of course. There are some older papers that don't have DOIs, and I say let's not cite them. The reason to require it is that folks forget to put the DOI in, and if it's optional, the omission doesn't fail validation.
The journal I come across frequently that lacks PMID is Crop Science. But I'm fine with requiring DOI and making PMID optional.
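A minimal sketch of the validation rule discussed above (DOI required, PMID optional), assuming each record has already been parsed into a Python dict and that `references` is a list of mappings with `doi` and `pmid` keys. The function name `check_reference_ids` and the placeholder citation and identifier values are hypothetical, not part of the spec.

```python
def check_reference_ids(record):
    """Return a list of problems with the identifiers in a record's references.

    Assumed layout (not confirmed by the spec): record["references"] is a
    list of dicts, each of which may carry "citation", "doi" and "pmid".
    """
    problems = []
    for i, ref in enumerate(record.get("references", [])):
        # DOI is required: a missing or empty value is reported as a failure.
        if not ref.get("doi"):
            problems.append(f"references[{i}]: missing doi")
        # PMID is optional, so its absence is deliberately not reported.
    return problems


# Placeholder values only; these are not real citations or identifiers.
with_doi = {"references": [{"citation": "Placeholder et al., 2020",
                            "doi": "10.0000/placeholder"}]}
pmid_only = {"references": [{"citation": "Placeholder et al., 1990",
                             "pmid": "00000000"}]}

print(check_reference_ids(with_doi))
print(check_reference_ids(pmid_only))
```

With a check like this, a reference that carries only a PMID fails, which is the point of making DOI required: a forgotten DOI gets caught instead of passing silently.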
I'd like to add an optional key, "phenotype_description", to hold a free-text brief description of the phenotype described by the gene_function record. Examples:
So those are in addition to, but not linked to in any way, the ontology terms. I'd argue that any specific "phenotype description" should be associated with an ontology term, such as:
Otherwise, they're just orphaned text attributes that don't link to anything higher up. (And, reminder, the spec needs to be updated to put relations with the entities that they refer to. Order doesn't have meaning in YAML.)
A single "phenotype_description" key-value pair, to hold the human-readable gestalt description. These may sometimes be fairly complex, whereas the ontology terms are "pointillistic" and often difficult to select appropriately. The phenotype_description would, indeed, be orphaned relative to the atomic ontology terms. Here are some examples from some work-in-progress:
Ahh, OK, so a single YAML has a single phenotype_description which is therefore associated with all the listed traits. Gotcha. Kinda like a description or summary.
@sammyjava - right. So maybe "phenotype_summary" conveys the idea better.
Well, sometimes we have a summary, "Doesn't make nodules; infection thread aborts", and a longer description that describes the measurement, e.g. "Nodule formation was inspected using a confocal microscope; if fewer than 10 nodules are present on a full root strand then the phenotype is defined as Doesn't make nodules." (I'm sure I got that wrong, but you get the idea.) Something to consider since you're adding in bespoke trait attributes.
Brevity is a virtue.
Sorry: for continuity with other READMEs, let's make it "phenotype_synopsis" rather than "...description" or "...summary". I'll make it so.
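To make the agreed shape concrete, here is a rough sketch of one record with a single optional top-level `phenotype_synopsis` that applies to all listed traits, plus a small check for it. It assumes records are already parsed into Python dicts; the function name `check_phenotype_synopsis`, the gene ID and the ontology accessions are placeholders, and the synopsis text is borrowed from the example earlier in this thread.

```python
def check_phenotype_synopsis(record):
    """Return a list of problems with the optional phenotype_synopsis key."""
    synopsis = record.get("phenotype_synopsis")
    if synopsis is None:
        # The key is optional, so leaving it out is fine.
        return []
    if not isinstance(synopsis, str) or not synopsis.strip():
        # One free-text value per record, not a list and not blank.
        return ["phenotype_synopsis must be a single non-empty string"]
    return []


# Placeholder record: the gene ID and accessions below are not real.
record = {
    "gene_model_full_id": "placeholder.gene.id",
    "phenotype_synopsis": "Doesn't make nodules; infection thread aborts",
    "traits": [{"entity": "XX:0000001"}, {"entity": "XX:0000002"}],
}

print(check_phenotype_synopsis(record))
```

The synopsis sits beside `traits` rather than inside any one trait, so it stays unlinked to individual ontology terms, as discussed.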
Would it make sense to associate the phenotype in this sense with the reference that described it? Just thinking that the specifics of the phenotype in this sense will depend on the type of mutation of the gene (induced knockout/overexpression/natural variation) in which deviation from wild-type is observed. In any case, presumably such a description is derived from a specific reference, but if it would be a synthesis across several that we don't plan to tie to specific alleles, then top-level as you have suggested is appropriate. Just something to consider.
It would - but at the cost of more "method and protocol". We would end up doing it wrong or inconsistently. Overall, my preference is to try to keep things simple where possible. Somewhat relatedly: one of my take-aways from the pain of this paper ... Oellrich et al., 2015(url) ... is that ontologies are cumbersome and difficult to apply well, difficult to compose into meaningful "sentences," etc. So, I'll encourage focusing on the entities (anatomy or trait terms) and discourage use of relation and quality terms. I am revising the README now, and will write a protocols document.
Yeah, FWIW we only have regular terms associated with stuff in the mines, not quality or relation terms. The ontologies themselves have their hierarchy, of course, but I just find a term that goes with a trait and if it's up- or down- or whatever I don't add that. Every term is standalone, they are not linked.
I propose formats and methods for collecting and storing information about genes experimentally associated with phenotypes. See the description in the README and examples of the three file types in this datastore-specifications directory.
You can also see a few more examples, and the two associated scripts, in this repository (which will go away once the RFO is settled).
A few comments about my objectives and philosophy behind the specification:
gene_model_full_id, confidence, traits: entity, references: citation, references: [doi or pmid]. In total, there are nine top-level keys, and essentially five second-level keys.
---) as there are genes-with-described-functions. Each document (kind of a "function card") is unnamed, but a primary key could be composed from two required fields: gene_model_full_id and the first ontology accession, e.g. glyma.Wm82.gnm2.ann1.Glyma.10G221500 and TO:0002616 (for flowering time).
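The primary-key idea could be sketched roughly as follows, assuming each "function card" has already been parsed into a Python dict and that `traits` is a list of mappings whose `entity` value is the ontology accession. The function name `function_card_key` and the exact nesting are assumptions; the gene ID and accession are the ones quoted above.

```python
def function_card_key(record):
    """Compose a key from gene_model_full_id and the first ontology accession.

    Assumed layout (not confirmed by the spec): record["traits"] is a
    non-empty list of dicts, each with an "entity" accession.
    """
    gene_id = record["gene_model_full_id"]
    first_accession = record["traits"][0]["entity"]
    return (gene_id, first_accession)


card = {
    "gene_model_full_id": "glyma.Wm82.gnm2.ann1.Glyma.10G221500",
    "traits": [{"entity": "TO:0002616"}],  # flowering time
}

# Index a collection of cards by their composed key.
cards_by_key = {function_card_key(c): c for c in [card]}
print(function_card_key(card))
```

A tuple is used here rather than a joined string so that no separator character has to be chosen; that is only one possible design.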