Failure of -obo-report.tsv induced by OBO serialisation quirks #1107

Closed
gouttegd opened this issue Oct 18, 2024 · 1 comment · Fixed by #1108

@gouttegd
Contributor

This problem was encountered by Uberon on this PR; see the CI failure report here.

The main change in that PR is the addition of a new term that is disjoint from two imported GO terms:

[Term]
id: UBERON:0000001 ! gross anatomical part
name: gross anatomical part
disjoint_from: GO:0005623 ! cell
disjoint_from: GO:0110165 ! cellular anatomical entity

(Other tags in that frame have been removed for clarity; they don’t matter here. What matters is the two disjoint_from tags.)

At some point in the standard pipeline, the -edit file is processed to generate what we call $(SRCMERGED). Basically, import statements are removed, and components and pattern-derived axioms are merged in. Importantly, $(SRCMERGED) is in the same format as the edit file, so if the ontology is maintained in the OBO format (as is the case for Uberon), then $(SRCMERGED) is also written in the OBO format. $(SRCMERGED) is then used as input to robot report to generate the reports/uberon-edit.obo-obo-report.tsv report, which is the step that failed in the above PR.

The problem is that, when $(SRCMERGED) is written, the disjointness axioms that belong to the UBERON:0000001 frame in the -edit file are instead written as if they belonged to the frames of the referenced GO terms. That is, the fragment above becomes:

[Term]
id: GO:0005623
disjoint_from: UBERON:0000001 ! gross anatomical part

[Term]
id: GO:0110165
disjoint_from: UBERON:0000001 ! gross anatomical part

[Term]
id: UBERON:0000001
name: gross anatomical part

(I have not looked at the logic used by the OWLAPI to decide in which frame a disjoint_from tag should go, but I suspect the decision is made by lexicographically sorting the IRIs involved in the axiom, and putting the tag in the frame corresponding to the first IRI. In this example, GO:0005623 would be sorted before UBERON:0000001, so the OWLAPI creates a frame for GO:0005623 instead of writing the disjoint_from tag in the frame for UBERON:0000001.)
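To make the suspected behaviour concrete, here is a minimal Python sketch of the hypothesis above. It is only an illustration of the guess, not the OWLAPI's actual logic (which has not been checked); the function name `frame_for_disjointness` is hypothetical.

```python
def frame_for_disjointness(class_a, class_b):
    """Return the ID of the frame that would receive the disjoint_from tag,
    ASSUMING the writer picks the lexicographically smallest identifier.
    This models the hypothesis only; it is not taken from the OWLAPI."""
    return min(class_a, class_b)


# The axioms as they are authored in the -edit file:
# UBERON:0000001 is declared disjoint from two GO classes.
axioms = [
    ("UBERON:0000001", "GO:0005623"),
    ("UBERON:0000001", "GO:0110165"),
]

# Under the hypothesis, both tags end up in GO frames, because
# "GO:..." sorts before "UBERON:...".
owners = [frame_for_disjointness(a, b) for a, b in axioms]
print(owners)
```

Under that assumption, neither tag stays in the UBERON:0000001 frame, which matches the serialised fragment shown above.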

Now, logically, this does not change anything. A DisjointWith: B is obviously the same as B DisjointWith: A. But the second form makes the missing_label test (part of the standard ROBOT report check) fail, because the GO:0005623 and GO:0110165 frames don’t have a label!
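The effect on a label check can be sketched in Python as follows. This is a simplified stand-in, not how ROBOT implements missing_label (ROBOT runs its report queries against the loaded ontology, not against OBO text); the helpers `parse_frames` and `frames_missing_name` are hypothetical and only handle the simple `tag: value ! comment` lines used in this issue.

```python
def parse_frames(obo_text):
    """Split OBO text into one {tag: [values]} dict per [Term] frame."""
    frames = []
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {}
            frames.append(current)
        elif line and current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Drop the trailing "! comment" part, if any.
            value = value.split(" ! ", 1)[0].strip()
            current.setdefault(tag, []).append(value)
    return frames


def frames_missing_name(frames):
    """Return the IDs of frames that have no name tag (i.e. no label)."""
    return [f["id"][0] for f in frames if "id" in f and "name" not in f]


# The fragment as it appears in $(SRCMERGED) when written in OBO format.
merged = """
[Term]
id: GO:0005623
disjoint_from: UBERON:0000001 ! gross anatomical part

[Term]
id: GO:0110165
disjoint_from: UBERON:0000001 ! gross anatomical part

[Term]
id: UBERON:0000001
name: gross anatomical part
"""

missing = frames_missing_name(parse_frames(merged))
print(missing)
```

The two GO frames exist only to carry the relocated disjoint_from tags, so they have an ID but no name, and a check that expects every term to have a label flags both of them.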

@gouttegd gouttegd added the bug label Oct 18, 2024
@gouttegd
Contributor Author

The easiest fix, I believe, would be to always write the $(SRCMERGED) file in OFN format (or any non-OBO format, really), independently of the format of the edit file.

gouttegd added a commit that referenced this issue Oct 18, 2024
Make sure the $(SRCMERGED) intermediate file is always written in OFN
format, rather than in the same format as the -edit file.

If $(SRCMERGED) is written in OBO format, this can cause some checks
made on that file to fail because of some peculiarities of the OBO
serialisation that do not accurately reflect the contents of the
original -edit file.

closes #1107
@gouttegd gouttegd self-assigned this Oct 18, 2024