Build an EFO term "precision" classification pipeline #2
Comments
I also prefer the proposed disease precision scale of low/medium/high to the former scale of area/root/subtype, since it avoids any confusion with the term "root". The proposed definitions are helpful and, as noted, some examples would enrich them, as would a review of some terms that do not fit cleanly within a single category. Since our use case at Related Sciences is ultimately drug development, these definitions are anchored to clinical characteristics and specificity. I imagine there might be additional non-clinical characteristics of each precision level that could enrich and solidify the definitions to support broader applications. Looping in @DnlRKorn @matentzn @zoependlington @nicolevasilevsky @d0choa in case you have feedback on whether classifying diseases based on precision would be useful and whether the low/medium/high scale and definitions make sense. This classification is something we're initially planning to perform on EFO but could extend to MONDO as well.
@dhimmel I think from our perspective, the most significant distinction is between "true diagnosable disease" and "disease grouping"! This is sort of related to the "precision" mentioned here, but maybe needs other kinds of evidence, like "mentions in PubMed" etc.
Thanks @matentzn for weighing in. I think a true diagnosable disease might be the union of the medium and high precision buckets, while disease grouping would be low. We could also create a 2-class outcome, predicted from the same feature set we create, which should include features such as publication mentions. Linking a related issue at monarch-initiative/mondo#685. Also I notice
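As a rough illustration of the mapping described in this comment (medium and high as "diagnosable disease", low as "disease grouping"), here is a minimal Python sketch. The class names, the function name, and the term identifiers are hypothetical placeholders and are not taken from EFO or Mondo.

```python
# Hypothetical collapse of the three precision levels into a two-class outcome.
# The label strings below are assumptions made for illustration only.
PRECISION_TO_BINARY = {
    "low": "disease_grouping",
    "medium": "diagnosable_disease",
    "high": "diagnosable_disease",
}


def to_binary_label(precision: str) -> str:
    """Map a low/medium/high precision label to the two-class outcome."""
    return PRECISION_TO_BINARY[precision.strip().lower()]


# Placeholder term identifiers, not real ontology terms.
precision_labels = {"TERM:1": "low", "TERM:2": "medium", "TERM:3": "high"}
binary_labels = {term: to_binary_label(p) for term, p in precision_labels.items()}
```

Whether this union is the right definition of a diagnosable disease is exactly what is being discussed here, so the mapping should be read as one candidate rather than a settled rule.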
Most of these come from Mondo and are a consequence of the metamodelling of ontologies aligned with Mondo. For example, there is a group of ORDO classes explicitly defined as groupings in ORDO, which make up that subset. For the more general disease_grouping subset, I think this was a fairly incomplete attempt to manually curate disease groupings. @nicolevasilevsky would know best!
Linking #5, which added the labeled data from EFO v3.43.0 |
Related to #13, I wanted to add two visuals that clearly communicate something about why we did this originally:
Code: random_efo_term_samples.ipynb
As a graph, EFO is very difficult to work with. Removing the most general terms (i.e. precision=
Note: "post-processing" in this case means creating a new graph with all nodes and edges for
Nice, very helpful. What do you think about recreating the classification example table, but with each row showing a paired low, medium, high triplet? The medium term would be a descendant of the low term, and the high precision term would be a descendant of the medium term. This would exclude some terms from being selected, for example medium terms with no low ancestors.
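The triplet selection suggested above could be sketched roughly as follows. This is a toy example under stated assumptions: the child-to-parents edges, the single-letter term names, and the precision labels are all invented for illustration, and the helper names `ancestors` and `precision_triplets` are hypothetical rather than part of any existing code in this repository.

```python
# Toy child -> parents edges; every identifier and label here is invented.
PARENTS = {
    "B": ["A"],
    "C": ["B"],
    "D": ["A"],
    "E": [],
}
PRECISION = {"A": "low", "B": "medium", "C": "high", "D": "medium", "E": "medium"}


def ancestors(term, parents):
    """Return the set of all transitive ancestors of `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def precision_triplets(parents, precision):
    """List (low, medium, high) triplets where each term descends from the previous one."""
    triplets = []
    for high in sorted(t for t, p in precision.items() if p == "high"):
        medium_terms = sorted(
            a for a in ancestors(high, parents) if precision.get(a) == "medium"
        )
        for medium in medium_terms:
            low_terms = sorted(
                a for a in ancestors(medium, parents) if precision.get(a) == "low"
            )
            for low in low_terms:
                triplets.append((low, medium, high))
    return triplets


triplets = precision_triplets(PARENTS, PRECISION)
```

In this toy graph, `E` is a medium term with no low ancestors and `D` is a medium term with no high descendants, so neither can appear in a triplet, which is the exclusion effect mentioned above.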
There are several EFO term classifications that could be useful. I propose we start by trying to assign a certain precision to terms based on the following definitions:

high: High precision terms have the greatest ontological specificity, sometimes (but not necessarily) correspond to small groups of relatively homogeneous patients, often have greater diagnostic certainty and typically represent the forefront of clinical practice, i.e. they're closest to precision/personalized medicine. Examples:

medium: Medium precision terms are the ontological ancestors of high precision terms (if any are known), often include indications in later stage clinical trials and generally reflect groups of patients assumed to be suffering from a condition with a shared, or at least similar, physiological or environmental origin. Examples:

low: Low precision terms are the ontological ancestors of both medium and high precision terms, group collections of diseases with some, but often not many, shared characteristics, may be named in early stage clinical trials and typically connote a relatively heterogeneous patient population. They are also often terms used within the ontology for organizational purposes or completeness. Examples:

note: More examples like this are given in #3.
I like this description of the task and these names/definitions more than the disease "subtype", "root" and "area" idea we had used before internally, and it better captures what I was initially trying to accomplish with that work anyhow. I'm certainly open to discussing it more though.
We can use some of the labels we already have to bootstrap this effort and I would say the next steps are:
I'll add some more details on those steps in related issues.
TODO: