
Add taxon subsets #2811

Merged
merged 5 commits into from
Nov 27, 2024

gouttegd
Collaborator

This PR adds taxon-specific subsets to CL.

A taxon-specific subset for a given taxon is automatically generated by using the taxon constraints declared in the ontology to exclude any class that, due to the constraints, is known not to exist in the taxon. This is the same approach used for similar subsets in Uberon (obophenotype/uberon#3363).
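
To make the exclusion idea concrete, here is a toy sketch in Python. It is not how the `create-species-subset` command works internally, since that relies on OWL reasoning over the actual constraint axioms. It assumes a simplified model where taxa and classes each form a tree, and each class may carry hypothetical `only_in` / `never_in` constraint sets that its subclasses inherit. All class and taxon names are made up.

```python
from typing import Dict, Optional, Set

# Child -> parent maps; None marks a root. All names are hypothetical.
TAXON_PARENT: Dict[str, Optional[str]] = {
    "mammal": None,
    "rodent": "mammal",
    "mouse": "rodent",
    "primate": "mammal",
    "human": "primate",
}
CLASS_PARENT: Dict[str, Optional[str]] = {
    "cell": None,
    "neuron": "cell",
    "cell_X": "cell",        # constrained: only in rodents
    "cell_X_sub": "cell_X",  # inherits the constraint of cell_X
    "cell_Y": "cell",        # constrained: never in rodents
}
ONLY_IN: Dict[str, Set[str]] = {"cell_X": {"rodent"}}
NEVER_IN: Dict[str, Set[str]] = {"cell_Y": {"rodent"}}


def lineage(node: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return `node` together with all of its ancestors."""
    result: Set[str] = set()
    current: Optional[str] = node
    while current is not None:
        result.add(current)
        current = parent[current]
    return result


def excluded_from(cls: str, taxon: str) -> bool:
    """True if a constraint on `cls` or one of its superclasses rules out `taxon`."""
    taxon_lineage = lineage(taxon, TAXON_PARENT)
    for ancestor in lineage(cls, CLASS_PARENT):
        # only-in X: the taxon must be X or a descendant of X.
        if any(x not in taxon_lineage for x in ONLY_IN.get(ancestor, set())):
            return True
        # never-in X: the taxon must not be X or a descendant of X.
        if any(x in taxon_lineage for x in NEVER_IN.get(ancestor, set())):
            return True
    return False


def taxon_subset(taxon: str) -> Set[str]:
    """Keep every class that is not known to be absent from `taxon`."""
    return {cls for cls in CLASS_PARENT if not excluded_from(cls, taxon)}


print(sorted(taxon_subset("human")))  # ['cell', 'cell_Y', 'neuron']
print(sorted(taxon_subset("mouse")))  # ['cell', 'cell_X', 'cell_X_sub', 'neuron']
```

A class with no constraint anywhere in its ancestry is always kept: the subset is built by exclusion, removing only classes known not to exist in the taxon.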

Two new release artefacts are added: human-view.owl (the human-specific subset) and mouse-view.owl (the mouse-specific subset). In addition, in the main release product (cl.owl), classes that were found to belong in the human (respectively mouse) subset are tagged with a http://purl.obolibrary.org/obo/cl#human_subset (respectively http://purl.obolibrary.org/obo/cl#mouse_subset) subset annotation (again, similar to what was done in Uberon).
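
A tag file can be pictured as a small ontology holding one `oboInOwl:inSubset` annotation per class of the subset. The sketch below writes such a file in OWL functional syntax. The exact layout of the real tag files is an assumption, and the `tag_file` helper is hypothetical.

```python
from typing import Iterable

IN_SUBSET = "http://www.geneontology.org/formats/oboInOwl#inSubset"
HUMAN_SUBSET = "http://purl.obolibrary.org/obo/cl#human_subset"


def tag_file(class_iris: Iterable[str], subset_iri: str) -> str:
    """Render OWL functional syntax tagging each class with `subset_iri`.

    Hypothetical helper: it uses full IRIs and no prefix declarations.
    """
    lines = ["Ontology(", f"Declaration(AnnotationProperty(<{IN_SUBSET}>))"]
    # Sorted and de-duplicated so that repeated runs produce identical files.
    for iri in sorted(set(class_iris)):
        lines.append(f"AnnotationAssertion(<{IN_SUBSET}> <{iri}> <{subset_iri}>)")
    lines.append(")")
    return "\n".join(lines) + "\n"


print(tag_file(["http://purl.obolibrary.org/obo/CL_0000540",
                "http://purl.obolibrary.org/obo/CL_0000000"], HUMAN_SUBSET))
```

Merging such a file into the full ontology adds annotation assertions only, without touching any logical axiom.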

Add two new subsets: human-view and mouse-view. They are automatically
generated by applying the taxon constraints declared in the ontology to
exclude any class that can be inferred not to exist in human and mouse,
respectively.

This is the same strategy as used to produce similar subsets in Uberon,
and reuses the `create-species-subset` custom ROBOT command
developed to that effect in the Uberon ROBOT plugin.

Now that we can generate tag files for the taxon subsets, we merge those
into the main cl.owl release artefact.

This requires that we no longer build the taxon subsets from the same
release artefact, to avoid an obvious chicken-and-egg problem. So we now
generate the subsets from cl-full.owl instead (which is, in effect,
almost the same thing as cl.owl, modulo the ontology annotation).
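
The chicken-and-egg problem can be checked mechanically by modelling the build as a dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The node names follow the artefacts named in this PR, but the edges are a simplified assumption rather than the actual Makefile rules; in particular, deriving each tag file from its view is assumed.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def build_order(subset_source: str) -> List[str]:
    """Return one valid build order, or raise CycleError if there is none."""
    deps: Dict[str, Set[str]] = {
        # target: prerequisites (simplified model of the release pipeline)
        "human-view.owl": {subset_source},
        "mouse-view.owl": {subset_source},
        "subsets/human-tags.ofn": {"human-view.owl"},
        "subsets/mouse-tags.ofn": {"mouse-view.owl"},
        "cl.owl": {"cl-full.owl", "subsets/human-tags.ofn",
                   "subsets/mouse-tags.ofn"},
    }
    return list(TopologicalSorter(deps).static_order())


# Building the subsets from cl-full.owl yields a valid order...
print(build_order("cl-full.owl"))

# ...whereas building them from cl.owl makes cl.owl depend on itself.
try:
    build_order("cl.owl")
except CycleError as err:
    print("cycle:", err.args[1])
```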

In the preprocess step, where we make use of the FlyBase ROBOT plugin,
make the 'all_robot_plugins' target an order-only prerequisite, so that
once the plugins have been installed, we do not always trigger a rebuild
of the preprocessed ontology, which in turn would trigger a rebuild of
everything else.
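
In GNU Make, prerequisites listed after a `|` in a rule (`target: normal-prereqs | order-only-prereqs`) are order-only: they are built first if needed, but their timestamps never make the target out of date. The sketch below models that decision in Python under simplifying assumptions: a missing normal prerequisite counts as out of date, which is roughly how a target such as `all_robot_plugins` behaves if it does not leave behind a file with a stable timestamp. The file names and timestamps are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Hypothetical modification times; None means "no such file".
MTIME: Dict[str, Optional[float]] = {
    "preprocessed.owl": 200.0,
    "edit.owl": 100.0,
    "all_robot_plugins": None,  # assumed not to exist as a file
}


def needs_rebuild(target: str, normal: Iterable[str],
                  order_only: Iterable[str] = ()) -> bool:
    """Simplified model of Make's out-of-date check."""
    target_time = MTIME.get(target)
    if target_time is None:
        return True  # a missing target is always built
    for prereq in normal:
        prereq_time = MTIME.get(prereq)
        # A missing (or phony) normal prerequisite is remade, which in
        # turn makes the target out of date; so does a newer one.
        if prereq_time is None or prereq_time > target_time:
            return True
    # Order-only prerequisites are built if missing but are never
    # compared with the target's timestamp, so they are ignored here.
    return False


# Normal prerequisite: every run rebuilds preprocessed.owl.
print(needs_rebuild("preprocessed.owl", ["edit.owl", "all_robot_plugins"]))  # True
# Order-only prerequisite: nothing to do while edit.owl is older.
print(needs_rebuild("preprocessed.owl", ["edit.owl"], ["all_robot_plugins"]))  # False
```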

The subset files are generated upon every release and are release
artefacts, so there is no need to commit them.

gouttegd self-assigned this on Nov 27, 2024
gouttegd requested review from dosumis and matentzn on November 27, 2024 15:15

matentzn (Contributor) left a comment

Looks great. One minor nitpick, but I will leave it to you to ignore or act on it.

# tags for the taxon subsets.
POSTPROCESS_ADDITIONS = subsets/human-tags.ofn \
                        subsets/mouse-tags.ofn
$(ONT).owl: $(ONT)-full.owl $(POSTPROCESS_ADDITIONS)

Contributor

There is some mild itch in my nose to overwrite x.owl rather than x-full.owl, just because it means that the primary release does not correspond directly to any of the known ontology types. Minor itch, though; if you don't share that sentiment, I am fine with this.

Contributor

I guess this way you avoid circularity.

Collaborator Author

gouttegd, Nov 27, 2024

"the primary release does not correspond directly to any of the known ontology types"

Never thought this could be a concern. o_O uberon.owl also does not correspond to any of the ODK-defined “ontology types”.

And in this case, the difference between cl-full.owl and cl.owl is quite minor, since it is only the addition of the oboInOwl:inSubset annotations, whereas in Uberon, uberon.owl uses a completely different pipeline than uberon-full.owl.

Contributor

Roger that!

Collaborator Author

A longer-term alternative could be to have some kind of POSTPROCESS step in the ODK (similar to the PREPROCESS that is already in there), which would be a no-op by default but that could be overridden by projects if they need to perform some last-minute changes at the very end of the build process.
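
The POSTPROCESS idea is only a proposal here, and nothing below is an existing ODK feature. As a rough sketch of the design, assuming a release is modelled as a plain list of axiom strings: the pipeline calls a final hook that does nothing by default, and a project such as CL would override it to add its subset tags. All function names are hypothetical.

```python
from typing import Callable, List

Axioms = List[str]
Hook = Callable[[Axioms], Axioms]


def no_op_postprocess(axioms: Axioms) -> Axioms:
    """Default hook: leave the release unchanged."""
    return axioms


def build_main_release(full: Axioms,
                       postprocess: Hook = no_op_postprocess) -> Axioms:
    """Derive the main release from the -full product, then run the hook."""
    return postprocess(list(full))  # copy so the -full product is untouched


def tagging_hook(tags: Axioms) -> Hook:
    """Project override: append subset tags that are not already present."""
    def hook(axioms: Axioms) -> Axioms:
        return axioms + [tag for tag in tags if tag not in axioms]
    return hook


full = ["axiom 1"]
print(build_main_release(full))
print(build_main_release(full, tagging_hook(["tag 1"])))
```

With the default hook, the main release has the same content as the -full product; only projects that opt in would see a difference.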

gouttegd merged commit f447456 into master on Nov 27, 2024
1 check passed
gouttegd deleted the add-taxon-subsets branch on November 27, 2024 17:18
2 participants