Add support for concept "nullizygosity" #62

Closed
ppavlidis opened this issue Aug 10, 2023 · 5 comments

Comments

@ppavlidis

This is a concept supported by NCIT (http://purl.obolibrary.org/obo/NCIT_C148063) but not GENO, while both ontologies have (unrelated) classes for homozygous etc.

My use case is annotating data.

While I'm at it, I'm looking for general insight into the relationship among GENO, SO, VariO and NCIT. There is duplication of concepts, but also gaps, with no obvious rhyme or reason. Your readme says that GENO is "orthogonal" to these other ontologies, but I'm having trouble understanding what you mean. For example, VariO, SO and GENO all have a separate class "allele".

To give another example, to express the idea of a constitutively activating mutation, there is nothing I can find that captures this in a compact way. PATO has the more generic concept 'constitutively active', while SO has gain_of_function_variant and VariO has 'gain of function', but GENO has no such term. (I don't consider 'gain of function' and 'constitutively active' to be synonymous.)

Is this by design? What is the best practice for data annotators? Should we be using all of these ontologies? If so, how do we choose which "allele" or "homozygous" concept to use?

Thanks!

@mbrush
Member

mbrush commented Aug 16, 2023

Thanks for the questions @ppavlidis. First, a bit of background. GENO was created initially for the Monarch Initiative and its model organism focus, to support annotating, representing, and computing over knowledge about the types and levels of genetic variation and the relationships between them. Working with ClinGen, we began to generalize/expand to cover more human-centric concepts, under the same high-level conceptual framework as model organism concepts . . . the first attempt to ontologically unite these two worlds in such a way AFAIK. More on the specific use cases driving GENO development can be found here.

The statement about GENO's 'orthogonality' to other ontologies covering genetic concepts reflects the fact that the scope of GENO overlaps with some of them but ultimately has a different focus and perspective.

  • GENO focuses on describing forms and levels of genetic variation, and attributes characterizing this variation.
  • SO focuses more on canonical types of features in a genome (gene, exon, promoter, etc.). It delves a bit into variation-related concepts, but this is not its primary remit, and IMO its characterization of them is limited and inconsistent. That said, where possible GENO builds on foundational/canonical concepts in SO that are the subjects/targets of variation.
  • VARIO to my knowledge is more a catalog of the ways variations can affect the structure and function of DNA and proteins. Its focus is on the molecular and biological consequences that result, not on the form and architecture of the variation itself.
  • NCIT as you know is very broad scope. It covers some concepts related to genetic variation, but in a less systematic and ontologically rigorous way - in this subdomain, it is not as principled or internally consistent (not surprising given its extremely broad scope and use case).

We do acknowledge that the overlap of GENO with these and other ontologies can raise questions for annotators like yourself, especially when it comes to terms like 'allele' that appear in all of them with subtly different definitions. ('allele' is just one of those terms that is used differently in different communities of practice: GENO tries to define it broadly to encompass all usages, and then defines more specific terms below it to cover more precise meanings.)

For your annotation use case, if you are ok using terms from multiple ontologies, we try to make logical connections between GENO and others through our re-use and mappings (you'll note that many classes in GENO are imported from SO). If you are looking for a single ontology to use as a source of annotations, and GENO seems to fit your scope, we are happy to add missing terms that are a fit for GENO.

This brings us to your request for a 'nullizygous' term. This concept speaks to the functionality of alleles in a single locus complement - specifically one where both copies of a gene at the locus are non-functional. It strays a bit from the more foundational terms currently in GENO's zygosity hierarchy, but I think it would be ok to add 'nullizygous' as a direct child of 'disomic zygosity'. (I realize that we are on the hook to provide definitions for several other existing zygosity terms, including 'disomic zygosity' - I opened a separate ticket for this here).

Hope this helps, but I expect you will have more questions/feedback. Happy to continue the conversation here.

@ppavlidis
Author

Thanks Matthew.

Could you comment on the "activating mutation" question? Is this also in the scope of GENO and if not, where?

For our infrastructure and curation workflows, we want to minimize how many ontologies we deal with, for many reasons. So we don't use NCIT (partly for the reasons you mention) or VARIO, but we do use SO.

My overarching question is whether we should use GENO at all. It only makes sense if GENO is sufficiently comprehensive/adds value (for us) and is expected to have some longevity. In the last 20+ years, the number of times we've had to switch from one ontology to another, deal with sudden bulk deprecation or even deletion of terms, etc. has been a real and costly issue. It sometimes makes free text seem like the way to go.

@mbrush
Member

mbrush commented Sep 2, 2023

In the last 20+ years the number of times we've had to switch from one ontology to another, deal with sudden bulk deprecation or even deletion of terms, etc. has been a real and costly issue.

You've hit on one of the major challenges for annotators using ontologies. If the terms you need for your use case fit within the scope of the Sequence Ontology, so that SO could add missing terms to fill the gaps you identify, it is probably the safer bet, given how long it has been around and its large number of users.

If you feel that GENO is a better fit w.r.t. the perspective it takes on the domain, and its focus on variation related concepts, we would be happy to add terms you require that we feel are in scope.

I don't know the full scope of terms you need or anticipate needing; I could advise better if I did.

Re: using multiple ontologies - in theory, OBO ontologies are supposed to be built to work together with each other - as many projects need to annotate or model data that spans the scope of >1 ontology. In practice, this is easier said than done.

What projects often do is construct an 'application ontology' by combining terms from one or more 'domain ontologies' like SO, GENO, and VariO, building something suited specifically to their use case. There is a maintenance cost here, but it is often the only way for a project to get everything it needs. Even if there is drift between the source ontologies and your application ontology, a base level of interoperability will persist. And using an ontology in general provides a structured terminology to draw terms from, which is usually a better way to go than resorting to free text!

Finally, re: 'activating mutation' - this is the same type of concept as 'gain of function mutation' (in that they are based on the impact of the mutation on gene function). The SO has a 'functional_effect_variant' branch dedicated to these types of concepts - and I am surprised there is no term here for 'activating mutation'. Seems like it might fit nicely under 'functionally_abnormal'? I would think this a better home for such a concept than GENO. If you submit a request to their issue tracker here, that includes a suggested name, definition, and placement in the current SO hierarchy, I suspect they can add the term you need.

@ppavlidis
Copy link
Author

We curate genomics data sets. As genotypes are frequently manipulated/studied in such work, we need to describe a range of situations. SO does cover quite a bit of what we need. GENO has 'heterozygous' and 'homozygous' but not 'nullizygous', and I hesitate to import GENO just for a few terms. But if that's the best practice in the field, we can do it.

We do have a tiny ontology with some terms we want (especially ones we only use internally), but this was mostly just a placeholder, with the expectation that more established ontologies would eventually support them, as the terms in question are not obscure. Since our data are exposed publicly, I feel it is best if we use terms from more widely used ontologies, rather than generating our own IRIs. But I also think using related concepts represented in different, unconnected ontologies has less value than something that is better integrated.

And we don't have the resources or expertise to be ontology developers, at least not in any substantial way. Here I'm testing the waters to see how difficult it is to get gaps addressed by the experts. When I look at term request trackers, I see unfulfilled issues that are years old - including in SO. Having to shop terms among different ontology groups is another pain point. (Yes, I know everybody is strapped for resources etc.)

For what it's worth, I filed The-Sequence-Ontology/SO-Ontologies#625

This discussion is quickly getting to be of broader scope than my original question, sorry about that, happy to take it to some other venue if you prefer. But just to recap the key points, I think you are going to add 'nullizygous' to GENO, and we'll wait to see what SO says about the other one?

@mbrush
Member

mbrush commented Oct 3, 2023

Closing this issue with the creation of #63 to address the primary concern of the original issue here. Note there that 'nullizygous' has been added to GENO, along with definitions for other zygosity terms.

mbrush closed this as completed Oct 3, 2023