Overhaul taxon subsets #3363
Use latest version (0.3.1) of the Uberon-specific ROBOT plugin, which provides a new command to facilitate the creation of taxon subsets.
The custom Makefile contains two sets of rules to create taxon subsets in two different ways:

* one set using OWLTools' `--make-species-subset` command (resulting in the `*-basic.owl` subsets);
* one set using the files in `src/ontology/contexts` to do basically the same thing as `--make-species-subset`, merely in a slightly different way (resulting in the `*-view.owl` subsets).

This commit replaces both sets of rules by a single rule that relies on the newly available `create-species-subset` command in the Uberon ROBOT plugin. In addition, a new rule is added to allow the creation of a component file that contains `oboInOwl:inSubset` annotations to tag all the classes that belong to a given subset. That rule is currently unused, but the expectation is that downstream applications could use it to facilitate the use of taxon-specific subsets.
There is no reason to have two different naming conventions for the taxon-specific subsets (`-view` and `-basic`). Let's settle on `-view`. This may require updating the PURL configuration for Uberon if there are people out there who are using the `euarchontoglires-basic.owl` and/or `amniote-basic.owl` artifacts (though GitHub download stats suggest nobody ever downloaded them).
I have reviewed the code changes and they look great. I would like to offer a single word of caution: renaming files, even subset files, may break existing pipelines somewhere on the deep web unless we also add a PURL redirect to the OBO PURL config.
```make
    --reasoner ELK \
    $(foreach root,$(TAXON_SUBSET_ROOTS),--root $(root)) \
    reason --reasoner ELK --equivalent-classes-allowed all \
           --exclude-tautologies structural \
    relax \
    remove --axioms equivalent \
    relax \
```
Can't imagine what this second `relax` does, but, well, since it's there.
I know. But precisely, this is to be handled at the PURL level, which is there exactly for this purpose. There is no reason for us to refrain from renaming files if it brings better consistency (in this case, by having all taxon subsets consistently named `*-view.owl`).
Hi @gouttegd - great to see this. I'm especially excited to see this:
I would really like the tags to be incorporated into the release files. We can use them straight away in our autosuggest pipelines to boost species-relevant terms.
I wasn’t sure whether this was your preferred option, so for now the idea was merely to produce the tag components. But it can certainly be done directly upstream if preferred. Do we want all release artefacts to include those tags (e.g. …)? Based on your comment I assume so (tags included in all release artefacts), in which case we’ll need a new intermediate file (upstream of …).
In addition to producing taxon subset files, we also want to include, directly into the release products, the oboInOwl:inSubset annotations that mark all terms that belong to the taxon subsets. There is a bit of a conundrum here as the taxon subsets are computed on the main uberon.owl product, but we want to include them in that very product. To solve this, we introduce a "near-final" intermediate product (tmp/uberon.owl, referenced by the new Make variable POSTPROCESS_SRC). The pipeline that produced the final uberon.owl now produces that intermediate tmp/uberon.owl, from which the taxon subsets are derived. The new final uberon.owl is produced simply by merging the intermediate tmp/uberon.owl with the taxon subset tag files. Some of the other parts of the Uberon pipeline that were using the final uberon.owl are now using the intermediate tmp/uberon.owl, because the taxon subset annotations are not needed for those steps. This notably concerns the bridge checks and the building of Composite Metazoan.
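The circular dependency described above, and the way the intermediate product breaks it, can be checked with a small dependency-graph sketch. This is a simplified model, not the actual Makefile: only the file names `tmp/uberon.owl`, `uberon.owl` and the `subsets/*-tags.ofn` files come from this PR, and each target is reduced to the prerequisites relevant to the argument. It uses Python's standard `graphlib` module, which raises `CycleError` when no valid build order exists.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid build order for a {target: prerequisites} mapping.

    Raises graphlib.CycleError if the graph has a circular dependency.
    """
    return list(TopologicalSorter(deps).static_order())


# Without an intermediate product: the tags are computed from uberon.owl,
# but uberon.owl must itself contain the tags.
WITHOUT_INTERMEDIATE = {
    "subsets/human-tags.ofn": {"uberon.owl"},
    "uberon.owl": {"subsets/human-tags.ofn"},
}

# With the "near-final" tmp/uberon.owl (POSTPROCESS_SRC): the tags are computed
# from the intermediate product, and the final uberon.owl merges everything.
WITH_INTERMEDIATE = {
    "subsets/human-tags.ofn": {"tmp/uberon.owl"},
    "subsets/mouse-tags.ofn": {"tmp/uberon.owl"},
    "uberon.owl": {
        "tmp/uberon.owl",
        "subsets/human-tags.ofn",
        "subsets/mouse-tags.ofn",
    },
}

if __name__ == "__main__":
    try:
        build_order(WITHOUT_INTERMEDIATE)
    except CycleError as err:
        # The second element of err.args is the list of nodes in the cycle.
        print("cycle detected:", err.args[1])
    print(build_order(WITH_INTERMEDIATE))
```

In this model, steps that do not need the subset annotations (such as the bridge checks and the building of Composite Metazoan) would simply depend on `tmp/uberon.owl` rather than on the final `uberon.owl`.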
The rule that produces the `uberon.json.gz` file does not belong in the main "BUILDING UBERON ITSELF" pipeline, and it is in fact doubtful that this rule is even useful, so we move it to the purgatory.
The unfortunate use of `$^` in the rule that produces `uberon.owl` leads to `uberon-full.owl` being forcefully injected into `uberon.owl`, because `uberon-full.owl` is declared (in the ODK-generated Makefile) as a dependency of `uberon.owl`. We must avoid using that variable and only use the dependencies that are explicitly listed in `uberon.Makefile`.
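To illustrate why `$^` is a problem here: in GNU Make, `$^` expands to all the prerequisites of the target, and when several rules list prerequisites for the same target (as the ODK-generated Makefile does for `uberon.owl`), those are merged into that single list. The Python sketch below mimics the two ways of assembling the final merge command. It assumes, as a simplification, that the final step is a plain `robot merge` with one `--input` per file; the values mirror `POSTPROCESS_SRC` and the tag files mentioned in this PR.

```python
# Values taken from this PR: the intermediate product and the tag files.
POSTPROCESS_SRC = "tmp/uberon.owl"
POSTPROCESS_ADDITIONS = ["subsets/human-tags.ofn", "subsets/mouse-tags.ofn"]

# Roughly what $^ would contain for uberon.owl: the prerequisites listed in
# uberon.Makefile plus uberon-full.owl, declared by the ODK-generated Makefile.
# (The real ordering of $^ is not modeled here.)
ALL_PREREQUISITES = [POSTPROCESS_SRC, *POSTPROCESS_ADDITIONS, "uberon-full.owl"]


def merge_command(inputs: list[str], output: str) -> list[str]:
    """Build a `robot merge` argument list with one --input per file."""
    cmd = ["robot", "merge"]
    for path in inputs:
        cmd += ["--input", path]
    return cmd + ["--output", output]


# Using $^: uberon-full.owl silently ends up merged into uberon.owl.
with_automatic_var = merge_command(ALL_PREREQUISITES, "uberon.owl")

# Listing the variables explicitly: only the intended inputs are merged.
with_explicit_vars = merge_command([POSTPROCESS_SRC, *POSTPROCESS_ADDITIONS], "uberon.owl")

if __name__ == "__main__":
    # robot merge --input tmp/uberon.owl --input subsets/human-tags.ofn
    #   --input subsets/mouse-tags.ofn --output uberon.owl
    print(" ".join(with_explicit_vars))
```

In the Makefile itself, the equivalent is to write the recipe in terms of `$(POSTPROCESS_SRC)` and `$(POSTPROCESS_ADDITIONS)` rather than `$^`, so that prerequisites declared elsewhere only affect when the target is rebuilt, not what goes into it.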
PR updated to include the subset tags in the release products. For now, we include only the tags for the human subset and the mouse subset. To add tags for another subset, it would simply be a matter of adding another tag file to the `POSTPROCESS_ADDITIONS` variable, for example:

```make
POSTPROCESS_ADDITIONS = subsets/human-tags.ofn \
                        subsets/mouse-tags.ofn \
                        subsets/drosophila-tags.ofn
```

Some considerations:
Assuming that `uberon.owl` was swapped out for `POSTPROCESS_SRC` in all the relevant places in the Makefile, this looks good to me!
Only two places, in fact:
All other custom pipelines that were dependent on …
Merging here. @dosumis: (1) If you want more taxon subset tags than just human and mouse to be included in the main product, just say so and we can add new subsets anytime; it’s one line to add in the custom Makefile. (2) I assume you want to see the same thing in CL?
This PR updates the way we are building “taxon subsets”.
As explained in #3362, we currently have, for reasons unknown (to me at least), two slightly different methods to create taxon subsets: one producing the `-view` subsets (`human-view`, `mouse-view`, `xenopus-view`) and one producing the `-basic` subsets (`amniote-basic`, `euarchontoglires-basic`). Both methods rely on OWLTools. This PR replaces both methods by a single one (so that all taxon subsets are produced in the same way) that relies on a new command in Uberon’s custom ROBOT plugin.
The PR does not change which taxon subsets are produced and released by default (the five aforementioned subsets: `human`, `mouse`, `xenopus`, `amniote`, and `euarchontoglires`). More subsets can be produced on demand: all that is needed is to define a `TAXON_ID_subsetname` Make variable pointing to the desired NCBITaxon ID. For example, to create a subset for, say, insects, one can do:
The PR also adds the possibility to create, not a subset directly, but a small component containing only `oboInOwl:inSubset` annotations to “tag” classes that belong to a taxon subset. For example, one can create a `human-tags.ofn` component containing, for all Uberon classes that belong to the human subset, `oboInOwl:inSubset <http://purl.obolibrary.org/obo/uberon/core#human_subset>` annotation assertion axioms. Such a component can then be merged with the main ontology for downstream use (e.g., extracting all the classes of the subset).

closes #3362
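For illustration, here is a minimal Python sketch of what such a tag component could look like in OWL functional syntax, together with the reverse operation of recovering the members of a subset from it. The exact layout of the real `human-tags.ofn` file (declarations, ordering, ontology IRI) is an assumption, the two Uberon classes are picked only as sample inputs, and the functions `tag_component` and `classes_in_subset` are hypothetical helpers: the actual component is produced by the ROBOT plugin, not by this code.

```python
import re

OBOINOWL = "http://www.geneontology.org/formats/oboInOwl#"
# Subset IRI pattern taken from the PR text (.../uberon/core#human_subset).
SUBSET_BASE = "http://purl.obolibrary.org/obo/uberon/core#"


def subset_iri(name: str) -> str:
    """Return the IRI of the named taxon subset, e.g. 'human' -> ...#human_subset."""
    return f"{SUBSET_BASE}{name}_subset"


def tag_component(name: str, class_iris: list[str]) -> str:
    """Render an OFN document tagging each class with oboInOwl:inSubset."""
    lines = [
        f"Prefix(oboInOwl:=<{OBOINOWL}>)",
        "",
        "Ontology(",
        "Declaration(AnnotationProperty(oboInOwl:inSubset))",
    ]
    # Deduplicate and sort so the output is stable between runs.
    for iri in sorted(set(class_iris)):
        lines.append(f"AnnotationAssertion(oboInOwl:inSubset <{iri}> <{subset_iri(name)}>)")
    lines.append(")")
    return "\n".join(lines) + "\n"


def classes_in_subset(ofn_text: str, name: str) -> set[str]:
    """Recover the tagged classes from text produced by tag_component().

    This regex only understands the one-axiom-per-line layout used above;
    a real consumer would load the merged ontology with an OWL library instead.
    """
    pattern = re.compile(
        r"AnnotationAssertion\(oboInOwl:inSubset <([^>]+)> <"
        + re.escape(subset_iri(name))
        + r">\)"
    )
    return set(pattern.findall(ofn_text))


if __name__ == "__main__":
    sample = [
        "http://purl.obolibrary.org/obo/UBERON_0000948",  # heart
        "http://purl.obolibrary.org/obo/UBERON_0000955",  # brain
    ]
    component = tag_component("human", sample)
    print(component)
    print(classes_in_subset(component, "human"))
```

Keeping the tags in a separate component, rather than in a separate subset file, is what lets them be merged into the main product while still allowing consumers to pull out a single subset by filtering on the subset IRI.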