Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate JSON schema #63

Merged
merged 3 commits into from
May 10, 2024
Merged

Generate JSON schema #63

merged 3 commits into from
May 10, 2024

Conversation

mwoehlke
Copy link
Member

@mwoehlke mwoehlke commented May 3, 2024

Add machinery to generate a JSON schema from the documentation. The schema can be used to validate CPS JSON.

As a side benefit to changing how attributes are declared to be more machine-readable, it is also slightly easier to specify common properties of an attribute.

Supersedes #25. Unlike that proposal, this retains reST (with only modest changes aside from all descriptions gaining a level of indent) as the source of truth, so does not introduce new errors and retains superior editability. It should also be somewhat more robust, as there is less human-written textual duplication.

Fix some style issues in our Sphinx extension, as reported by flake8.
Add directives to our custom-domain extension to declare objects and
attributes, and use these to specify the schema. In addition to
providing a slightly more structured way to specify attribute
parameters, this also makes it possible to create a data model of the
schema which we will eventually use to generate a JSON schema that can
be used to partially validate CPS JSON.
@mwoehlke
Copy link
Member Author

mwoehlke commented May 3, 2024

Note: I think the generated schema is okay, and Python jsonschema seems happy enough with it, but feedback from anyone more familiar with JSON schemas and/or using them for validation would be appreciated.

@mwoehlke mwoehlke mentioned this pull request May 3, 2024
@autoantwort
Copy link
Contributor

Could you add the default version to the schema and ideally the example values?

@autoantwort
Copy link
Contributor

autoantwort commented May 4, 2024

Currently the description of the platform attribute is duplicated and as a result the longer description is ignored.

The format of the website should be "format": "uri"

version_schema should use enum.

Nitpick, but can you arrange the properties in a meaningful order?

Can you commit the schema?

Copy link
Collaborator

@bretbrownjr bretbrownjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections. I'm OK with merging this.

It would be handy to publish the schema somewhere... either in the docs somehow or maybe as a GitHub release.

The machinery of generating that schema is probably less interesting than the output itself.

Eventually, it would be nice to see how the schema differs between the base and head commits of a PR to confirm that we're matching roughly semver expectations with respect to the CPS version number, but let's not let perfect be the enemy of the good, etc.

@mwoehlke
Copy link
Member Author

mwoehlke commented May 9, 2024

Could you add the default version to the schema and ideally the example values?

@autoantwort, thanks for the feedback! (Also, sorry about the delay in replying.) I'm not following you here, though; are you talking about cps_version? I assume by "default value" you mean I should copy in the version from conf.py? I'm not very familiar with JSON schema, so it would help if you could express this as what you want to see changed in the resulting schema.

That said, I may punt on this (and other changes) in the interest of making progress. It will be beneficial to have something even if it's not perfect, but we can and should continue to make refinements.

Can you arrange the properties in a meaningful order?

What is "meaningful"? Right now they are in the order "core", "supplemental", with each of those in lexicographical order. It should be straight-forward to remove the core/supplemental split and have purely lexicographical order, but anything else is difficult to specify and maintain.

Anyway, the intent is that the schema is not the canonical source of truth, but a useful tool mostly targeting automated uses.

The format of the website should be "format": "uri"

Do you mean this?

--- a/cps.schema.json
+++ b/cps.schema.json
@@ -488,6 +488,7 @@
             },
             "website@0": {
                 "description": "Specifies the URI at which the package's website may be found.",
+                "format": "uri",
                 "$ref": "#/definitions/types/string"
             }
         }

(Note: I ran the generated schema through python -m json.tool to get the above. I'm on the fence whether the canonical version should be pretty-printed. It's a trivial code change, but also makes the file much larger for debatable benefit. It's easy enough to run the generated version through a pretty-printer.)

Can you commit the schema?

I'd rather not; checking in generated files is generally undesirable, and I don't see how it's especially helpful when, once this lands, the schema will be available via the published pages. I'd really prefer to avoid the added workflow pain of trying to keep a generated-but-checked-in file up to date.

If you just want to know what it looks like without having to generate it yourself, see https://github.com/cps-org/cps/files/15203794/cps.schema.json. Once/if this lands, it'll be accessible via the published Pages (see also reply to Bret, below).

It would be handy to publish the schema somewhere

Once this lands, it should be accessible from the same location as the rest of the published Pages. Adding a link should be done, but that can be a follow-up.

Eventually, it would be nice to see how the schema differs between the base and head commits of a PR to confirm that we're matching roughly semver expectations with respect to the CPS version number

That's... technically do-able. Generating a diff from GHA should be easy if we can use the published copy as the comparison target. (Otherwise we need a second build, which is somewhat annoying.) Presenting said diff in a useful way might be challenging. (I agree we shouldn't do that in this PR, especially as I'd prefer to use the published copy as the "original", which means generation has to land before comparison can be tested.)

@autoantwort
Copy link
Contributor

I assume by "default value" you mean I should copy in the version from conf.py?

Sorry that was unclear. I mean the default value of a field. For example here (link_languages) the default value is C, but this is not expressed in the schema.

What is "meaningful"? The intent is that the schema is not the canonical source of truth, but a useful tool mostly targeting automated uses.

I still want/prefer the generated documentation (in addition) to the current form. But I can add this in a following PR.

Do you mean this?

Yes

I'm on the fence whether the canonical version should be pretty-printed.

Imo yes. It is a benefit if you can directly read the file if you have to (Ideally you never have, but...).

the schema will be available via the published pages.

This is also ok. I only want a way so that I can provide the $schema url and add the url to https://github.com/SchemaStore/schemastore/blob/master/src/api/json/catalog.json. vcpkg for example has done this by committing the file: https://github.com/SchemaStore/schemastore/blob/master/src/api/json/catalog.json#L1260

Presenting said diff in a useful way might be challenging.

If you commit the schema you don't have the problem. And I guess you have to run the generator anyway to get the schema for the website. So adding a git diff --exit-code to the github action workflow is enough to let the check fail if someone changes the schema but not committed the schema.

Using the structured information extracted from the documentation (see
previous commit), build and write a JSON schema. This also introduces a
new utility library to convert build the schema description from the set
of objects and attributes.
@mwoehlke
Copy link
Member Author

Okay, since Bret already 👍'd this, I'm going to merge it as-is (with one tiny tweak; see below) so follow-ups are easier to review.

@autoantwort, I may not be able to get format working right away. I'm definitely not going to tackle publishing diffs to PRs immediately. Please be encouraged to open Issues for improvements you'd like to see; thanks!

I only want a way so that I can provide the $schema url [...]

Makes sense. The generated schema already has an $id... which is supposed to be correct already (except the file doesn't exist there, yet), but it looks like I set it up to put the right $id in and then copied from autoantwort's PR and forgot to change it to the right location. I've done that now, so it should be correct once this lands.

Presenting said diff in a useful way might be challenging.

If you commit the schema you don't have the problem.

Perhaps, but I still don't think it's worth making it more complicated to make changes.

I still want/prefer the generated documentation (in addition) to the current form. But I can add this in a following PR.

Sure; in fact, I had this in the back of my head as a follow-up already. However, please make it part of make html if possible. At minimum, it needs to be reproducible in local builds, not just when deployed via GHA.

@mwoehlke mwoehlke merged commit 28be72d into master May 10, 2024
3 checks passed
@mwoehlke mwoehlke deleted the generate-json-schema branch May 10, 2024 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants