Add validation against json-schema or protobuf #120

Open
cmungall opened this issue May 27, 2024 · 1 comment

Comments

@cmungall
Member

I was confused as to why monarch-initiative/phenopacket-store#97 was necessary.

As I understand it, the following Java repo does validation using JSON Schema:
https://github.com/phenopackets/phenopacket-tools

(I wasn't able to find the json-schema)

I had assumed that pyphetools used the same schema, but it looks like the only validation is procedural, not complete schema validation.

Shouldn't pyphetools do schema validation (using any of jsonschema, pydantic, linkml, ...)?

@ielis
Member

ielis commented May 28, 2024

The top-level JSON Schema document used by phenopacket-tools, phenopacket-schema.json, is in this folder, under the v2 subfolder. It is indeed a little hard to find.

I think validation of Phenopacket Schema elements such as phenopacket, family or cohort needs to have multiple tiers. The lowest tier can be implemented with JSON Schema to check types, absence of random fields, correct cardinalities, etc.
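
As a rough illustration of what the lowest tier covers, here is a minimal sketch using the `jsonschema` Python package. The schema is a hand-written toy, not the real phenopacket-schema.json, and the `schema_errors` helper is a hypothetical name; it only shows type checks, required fields, and rejection of unknown fields.

```python
from jsonschema import Draft7Validator

# Toy schema for illustration only. It is NOT the real phenopacket-schema.json;
# it just demonstrates the kinds of checks the lowest tier can cover.
TOY_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "phenotypicFeatures": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "type": {
                        "type": "object",
                        "properties": {
                            "id": {"type": "string"},
                            "label": {"type": "string"},
                        },
                        "required": ["id", "label"],
                        "additionalProperties": False,
                    }
                },
                "required": ["type"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["id"],                # cardinality: exactly one id
    "additionalProperties": False,     # no random top-level fields
}


def schema_errors(instance):
    """Return all schema violation messages for the instance (empty if valid)."""
    validator = Draft7Validator(TOY_SCHEMA)
    return sorted(error.message for error in validator.iter_errors(instance))
```

Collecting every error with `iter_errors`, rather than stopping at the first one, tends to be more useful when curating many phenopackets at once.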

The upper tiers should check, for example, that a Metadata | Resource element is present for every OntologyClass used, that the term IDs are current for the given ontology version (HPO would be the easiest for me to implement), or more exotic requirements, such as not using a term (e.g. Clonic seizure) together with its ancestor (e.g. Seizure) in a single phenopacket.
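
Two of these upper-tier checks could look roughly like the sketch below, written against the phenopacket JSON as a plain dict. All function names are hypothetical, the parent map is a hand-written toy rather than data loaded from an HPO release (the term IDs should be verified against the ontology), and the sketch only looks at `phenotypicFeatures`, ignoring the `excluded` flag and any other place an OntologyClass can appear.

```python
# Toy hierarchy for illustration: child term ID -> set of direct parent IDs.
# Assumed mapping: Clonic seizure -> Seizure. A real validator would load
# this from an HPO release instead of hard-coding it.
TOY_PARENTS = {"HP:0020221": {"HP:0001250"}}


def feature_ids(phenopacket):
    """Collect the term IDs used in phenotypicFeatures."""
    return {f["type"]["id"] for f in phenopacket.get("phenotypicFeatures", [])}


def missing_resources(phenopacket):
    """Return CURIE prefixes that are used but not declared in metaData.resources."""
    used = {term_id.split(":", 1)[0] for term_id in feature_ids(phenopacket)}
    resources = phenopacket.get("metaData", {}).get("resources", [])
    declared = {r.get("namespacePrefix") for r in resources}
    return sorted(used - declared)


def ancestors(term_id, parents):
    """Return all transitive ancestors of term_id in the parents map."""
    seen = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def redundant_ancestor_pairs(phenopacket, parents):
    """Return (term, ancestor) pairs where both appear in the same phenopacket."""
    ids = feature_ids(phenopacket)
    return sorted(
        (term, anc) for term in ids for anc in ancestors(term, parents) & ids
    )
```

Neither check is expressible as a pure structural constraint, which is why they sit above the JSON Schema tier.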

I think it may be hard to implement all of these using only JSON Schema, Pydantic, or LinkML. So, we should have an API somewhere and a bunch of off-the-shelf validators. I would favor writing these in Python.
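
One possible shape for such an API is sketched below, purely as an assumption about how it might look: a validator is any callable that takes the phenopacket dict and yields problem messages, and a small runner collects the results. `Issue`, `ValidationRunner`, and the two example validators are hypothetical names, not existing pyphetools or phenopacket-tools code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# A validator takes a phenopacket (as a dict) and yields problem messages.
Validator = Callable[[dict], Iterable[str]]


@dataclass(frozen=True)
class Issue:
    validator: str  # name of the validator that reported the problem
    message: str


class ValidationRunner:
    """Runs registered validators in order and collects their issues."""

    def __init__(self) -> None:
        self._validators: List[Tuple[str, Validator]] = []

    def register(self, name: str, validator: Validator) -> None:
        self._validators.append((name, validator))

    def run(self, phenopacket: dict) -> List[Issue]:
        issues = []
        for name, validator in self._validators:
            for message in validator(phenopacket):
                issues.append(Issue(name, message))
        return issues


# Two trivial example validators (hypothetical, for illustration only).
def require_id(phenopacket):
    if not phenopacket.get("id"):
        yield "phenopacket has no id"


def require_metadata(phenopacket):
    if "metaData" not in phenopacket:
        yield "phenopacket has no metaData"


runner = ValidationRunner()
runner.register("require_id", require_id)
runner.register("require_metadata", require_metadata)
```

With this shape, a JSON Schema check and the ontology-aware checks can be registered side by side, and new validators can be added without touching the runner.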


Nevertheless, right now we check validity mostly by hand in the code, and we manually run phenopacket-tools afterwards. It would probably be good to automate this.
