Generate JSON schema #63

mwoehlke · 2024-05-03T18:34:54Z

Add machinery to generate a JSON schema from the documentation. The schema can be used to validate CPS JSON.

As a side benefit to changing how attributes are declared to be more machine-readable, it is also slightly easier to specify common properties of an attribute.

Supersedes #25. Unlike that proposal, this retains reST as the source of truth (with only modest changes, aside from all descriptions gaining a level of indent), so it does not introduce new errors and retains superior editability. It should also be somewhat more robust, as there is less human-written textual duplication.

Fix some style issues in our Sphinx extension, as reported by flake8.

Add directives to our custom-domain extension to declare objects and attributes, and use these to specify the schema. In addition to providing a slightly more structured way to specify attribute parameters, this also makes it possible to create a data model of the schema which we will eventually use to generate a JSON schema that can be used to partially validate CPS JSON.
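As a rough illustration of going from declared objects and attributes to a schema, here is a minimal sketch. The `Attribute` class, the `build_schema` function, and the example attributes are hypothetical and are not taken from the actual extension; the real data model is populated by the Sphinx directives.

```python
import json
from dataclasses import dataclass


@dataclass
class Attribute:
    """Hypothetical record of one documented attribute."""
    name: str
    json_type: str  # JSON Schema type name, e.g. "string"
    description: str
    required: bool = False


def build_schema(object_name, attributes):
    """Turn a list of Attribute records into a JSON Schema fragment."""
    properties = {
        attr.name: {"type": attr.json_type, "description": attr.description}
        for attr in attributes
    }
    required = sorted(attr.name for attr in attributes if attr.required)
    schema = {"title": object_name, "type": "object", "properties": properties}
    if required:
        schema["required"] = required
    return schema


# Illustrative attributes only; the real set comes from the documentation.
package_schema = build_schema("package", [
    Attribute("name", "string", "Name of the package.", required=True),
    Attribute("website", "string", "URI of the package's website."),
])
print(json.dumps(package_schema, indent=2))
```

Keeping the attribute records separate from the schema writer is one way to let the same data drive both the rendered documentation and the generated schema.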

mwoehlke · 2024-05-03T18:38:15Z

Note: I think the generated schema is okay, and Python jsonschema seems happy enough with it, but feedback from anyone more familiar with JSON schemas and/or using them for validation would be appreciated.

autoantwort · 2024-05-04T15:29:06Z

Could you add the default version to the schema and ideally the example values?

autoantwort · 2024-05-04T15:36:54Z

Currently the description of the platform attribute is duplicated and as a result the longer description is ignored.

The format of the website should be "format": "uri"

version_schema should use enum.

Nitpick, but can you arrange the properties in a meaningful order?

Can you commit the schema?
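For readers less familiar with JSON Schema, the `"format": "uri"` and `enum` suggestions above could look roughly like the fragments below. This is only a sketch: the enum values are placeholders rather than the real list of version schemas, and the `allowed` helper is hypothetical. Note also that many validators treat `format` as an annotation unless format checking is explicitly enabled.

```python
import json

# Hypothetical fragments; they mirror the suggestions, not the generated schema.
website = {
    "type": "string",
    "format": "uri",  # hints that the value should be a URI
    "description": "Specifies the URI at which the package's website may be found.",
}

version_schema = {
    "type": "string",
    # Placeholder values; the real list comes from the specification.
    "enum": ["scheme_a", "scheme_b"],
}


def allowed(value, fragment):
    """Check a value against the fragment's enum, if it has one."""
    return "enum" not in fragment or value in fragment["enum"]


print(json.dumps({"website": website, "version_schema": version_schema}, indent=2))
print(allowed("scheme_a", version_schema), allowed("other", version_schema))
```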

bretbrownjr

No objections. I'm OK with merging this.

It would be handy to publish the schema somewhere... either in the docs somehow or maybe as a GitHub release.

The machinery of generating that schema is probably less interesting than the output itself.

Eventually, it would be nice to see how the schema differs between the base and head commits of a PR to confirm that we're matching roughly semver expectations with respect to the CPS version number, but let's not let perfect be the enemy of the good, etc.

mwoehlke · 2024-05-09T20:26:49Z

> Could you add the default version to the schema and ideally the example values?

@autoantwort, thanks for the feedback! (Also, sorry about the delay in replying.) I'm not following you here, though; are you talking about cps_version? I assume by "default value" you mean I should copy in the version from conf.py? I'm not very familiar with JSON schema, so it would help if you could express this as what you want to see changed in the resulting schema.

That said, I may punt on this (and other changes) in the interest of making progress. It will be beneficial to have something even if it's not perfect, but we can and should continue to make refinements.

> Can you arrange the properties in a meaningful order?

What is "meaningful"? Right now they are in the order "core", "supplemental", with each of those in lexicographical order. It should be straightforward to remove the core/supplemental split and have purely lexicographical order, but anything else is difficult to specify and maintain.
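The two orderings described here can be sketched as follows. The function names and the example attribute names (and which group each belongs to) are assumptions for illustration only.

```python
def order_grouped(core, supplemental):
    """Core attributes first, then supplemental, each group sorted."""
    return sorted(core) + sorted(supplemental)


def order_flat(core, supplemental):
    """Purely lexicographical order, ignoring the core/supplemental split."""
    return sorted(set(core) | set(supplemental))


# Hypothetical attribute names and grouping.
core = ["name", "cps_version", "components"]
supplemental = ["website", "license"]

grouped = order_grouped(core, supplemental)
flat = order_flat(core, supplemental)
print(grouped)
print(flat)
```

Both orderings are fully determined by the attribute names, which is what makes them easy to maintain; any hand-curated order would need to be recorded somewhere in the documentation source.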

Anyway, the intent is that the schema is not the canonical source of truth, but a useful tool mostly targeting automated uses.

> The format of the website should be "format": "uri"

Do you mean this?

--- a/cps.schema.json
+++ b/cps.schema.json
@@ -488,6 +488,7 @@
             },
             "website@0": {
                 "description": "Specifies the URI at which the package's website may be found.",
+                "format": "uri",
                 "$ref": "#/definitions/types/string"
             }
         }

(Note: I ran the generated schema through python -m json.tool to get the above. I'm on the fence whether the canonical version should be pretty-printed. It's a trivial code change, but also makes the file much larger for debatable benefit. It's easy enough to run the generated version through a pretty-printer.)
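The trade-off between a compact and a pretty-printed file can be seen with the standard `json` module. The tiny `schema` dict below is a stand-in, not the generated schema; `indent=4` matches the default used by `python -m json.tool`.

```python
import json

# Tiny stand-in for the generated schema.
schema = {"definitions": {"types": {"string": {"type": "string"}}}}

compact = json.dumps(schema, separators=(",", ":"))  # no extra whitespace
pretty = json.dumps(schema, indent=4)                # human-readable layout

# Both forms parse to the same data; only the file size differs.
print(len(compact), len(pretty))
```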

> Can you commit the schema?

I'd rather not; checking in generated files is generally undesirable, and I don't see how it's especially helpful when, once this lands, the schema will be available via the published pages. I'd really prefer to avoid the added workflow pain of trying to keep a generated-but-checked-in file up to date.

If you just want to know what it looks like without having to generate it yourself, see https://github.com/cps-org/cps/files/15203794/cps.schema.json. Once/if this lands, it'll be accessible via the published Pages (see also reply to Bret, below).

> It would be handy to publish the schema somewhere

Once this lands, it should be accessible from the same location as the rest of the published Pages. Adding a link should be done, but that can be a follow-up.

> Eventually, it would be nice to see how the schema differs between the base and head commits of a PR to confirm that we're matching roughly semver expectations with respect to the CPS version number

That's... technically doable. Generating a diff from GHA should be easy if we can use the published copy as the comparison target. (Otherwise we need a second build, which is somewhat annoying.) Presenting said diff in a useful way might be challenging. (I agree we shouldn't do that in this PR, especially as I'd prefer to use the published copy as the "original", which means generation has to land before comparison can be tested.)
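One possible shape for such a comparison is a property-level summary rather than a textual diff. The sketch below is an assumption about how this could be done, not something proposed in the thread: `diff_properties` is a hypothetical helper, the two schemas are made up, and it only looks at top-level `properties`, whereas the real schema nests its definitions.

```python
def diff_properties(old_schema, new_schema):
    """Summarize property-level differences between two schema dicts."""
    old = old_schema.get("properties", {})
    new = new_schema.get("properties", {})
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Made-up base (published) and head (PR) schemas for illustration.
base = {"properties": {"name": {"type": "string"}, "website": {"type": "string"}}}
head = {"properties": {
    "name": {"type": "string"},
    "website": {"type": "string", "format": "uri"},
    "license": {"type": "string"},
}}

summary = diff_properties(base, head)
print(summary)
```

A summary like this might be easier to present in a PR comment than a raw diff, since removed or changed properties are the ones most relevant to version compatibility.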

autoantwort · 2024-05-10T17:30:28Z

> I assume by "default value" you mean I should copy in the version from conf.py?

Sorry that was unclear. I mean the default value of a field. For example here (link_languages) the default value is C, but this is not expressed in the schema.
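In JSON Schema terms, this would mean attaching a `default` keyword to the property. The fragment below is a sketch under assumptions: the array shape and the lowercase spelling of the default are guesses, and `apply_defaults` is a hypothetical helper. The `default` keyword is an annotation, so a consumer has to apply it explicitly; validators generally do not fill it in.

```python
import copy

# Hypothetical fragment; the shape and spelling of the default are assumptions.
properties = {
    "link_languages": {
        "type": "array",
        "items": {"type": "string"},
        "default": ["c"],
    },
}


def apply_defaults(data, properties):
    """Return a copy of data with missing keys filled from schema defaults."""
    result = dict(data)
    for name, fragment in properties.items():
        if name not in result and "default" in fragment:
            result[name] = copy.deepcopy(fragment["default"])
    return result


print(apply_defaults({}, properties))
print(apply_defaults({"link_languages": ["cpp"]}, properties))
```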

> What is "meaningful"? The intent is that the schema is not the canonical source of truth, but a useful tool mostly targeting automated uses.

I still want/prefer the generated documentation (in addition) to the current form. But I can add this in a following PR.

> Do you mean this?

Yes

> I'm on the fence whether the canonical version should be pretty-printed.

IMO yes. It is a benefit to be able to read the file directly if you have to (ideally you never have to, but...).

> the schema will be available via the published pages.

This is also ok. I only want a way so that I can provide the $schema url and add the url to https://github.com/SchemaStore/schemastore/blob/master/src/api/json/catalog.json. vcpkg for example has done this by committing the file: https://github.com/SchemaStore/schemastore/blob/master/src/api/json/catalog.json#L1260

> Presenting said diff in a useful way might be challenging.

If you commit the schema you don't have the problem. And I guess you have to run the generator anyway to get the schema for the website. So adding a git diff --exit-code to the GitHub Actions workflow is enough to make the check fail if someone changes the schema but has not committed the regenerated file.
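The staleness check suggested here can also be sketched in Python. This is an alternative to `git diff --exit-code`, not the same thing: it compares parsed JSON rather than text, so whitespace differences are ignored. The `schema_is_current` helper and the file contents are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def schema_is_current(committed_path, generated_schema):
    """Return True if the committed file matches the regenerated schema."""
    path = Path(committed_path)
    if not path.exists():
        return False
    committed = json.loads(path.read_text(encoding="utf-8"))
    return committed == generated_schema


# Demonstration with a temporary file standing in for the committed schema.
with tempfile.TemporaryDirectory() as tmp:
    committed_file = Path(tmp) / "cps.schema.json"
    committed_file.write_text(json.dumps({"title": "old"}), encoding="utf-8")
    up_to_date = schema_is_current(committed_file, {"title": "old"})
    stale = not schema_is_current(committed_file, {"title": "new"})

print(up_to_date, stale)
```

In a CI job, a false result would be turned into a non-zero exit status so that the check fails.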

Using the structured information extracted from the documentation (see previous commit), build and write a JSON schema. This also introduces a new utility library to build the schema description from the set of objects and attributes.

mwoehlke · 2024-05-10T17:56:42Z

Okay, since Bret already 👍'd this, I'm going to merge it as-is (with one tiny tweak; see below) so follow-ups are easier to review.

@autoantwort, I may not be able to get format working right away. I'm definitely not going to tackle publishing diffs to PRs immediately. Please be encouraged to open Issues for improvements you'd like to see; thanks!

> I only want a way so that I can provide the $schema url [...]

Makes sense. The generated schema already has an $id... which is supposed to be correct already (except the file doesn't exist there, yet), but it looks like I set it up to put the right $id in and then copied from autoantwort's PR and forgot to change it to the right location. I've done that now, so it should be correct once this lands.

> > Presenting said diff in a useful way might be challenging.

> If you commit the schema you don't have the problem.

Perhaps, but I still don't think it's worth making it more complicated to make changes.

> I still want/prefer the generated documentation (in addition) to the current form. But I can add this in a following PR.

Sure; in fact, I had this in the back of my head as a follow-up already. However, please make it part of make html if possible. At minimum, it needs to be reproducible in local builds, not just when deployed via GHA.

mwoehlke added 2 commits April 30, 2024 17:12

Fix flake8 warnings in cpy.py

d7f82d9

Fix some style issues in our Sphinx extension, as reported by flake8.

mwoehlke mentioned this pull request May 3, 2024

Add json schema #25

Closed

bretbrownjr approved these changes May 9, 2024

Generate JSON schema from documentation

b2e7671

Using the structured information extracted from the documentation (see previous commit), build and write a JSON schema. This also introduces a new utility library to build the schema description from the set of objects and attributes.

mwoehlke force-pushed the generate-json-schema branch from 586a175 to b2e7671 Compare May 10, 2024 17:50

mwoehlke merged commit 28be72d into master May 10, 2024
3 checks passed

mwoehlke deleted the generate-json-schema branch May 10, 2024 17:56

mwoehlke mentioned this pull request May 14, 2024

JSON schemas for supplemental files #65

Open
