Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect embedded "$id"/"$anchor"/"$dynamicAnchor" as valid "$ref"/"$dynamicRef" targets #58

Closed
handrews opened this issue Feb 26, 2023 · 5 comments

Comments

@handrews
Copy link
Contributor

This scenario (which is apparently not tested int he JSON Schema test suite) is not currently supported:

{
  "$id": "https://example.com/schemas/root"
  "$defs": {
    "foo": {
      "$id": "foo",
      "type": "string"
    }
  },
  "$ref": "foo"
}

The distinction here is that evaluation never passes through "#/$defs/foo" before it is referenced. The JSON Schema test suite only tests embedded "$id"s that are evaluated before they are referenced, apparently.

There are essentially two ways to do this: Walk all applicator and schema location ("$defs") keywords at load time and index the "$id"s, or trigger a scan of loaded schemas upon finding a "$ref" that cannot otherwise be resolved.

I am personally inclined towards scanning up front, which also makes some obscure cases involving embedded "$schema" (which I'll file separately) if jschon wants to support them. But there are pros and cons either way.

@handrews handrews changed the title Detect embedded "$id"s as valid "$ref" targets Detect embedded "$id"s and "$anchors" as valid "$ref" targets Feb 28, 2023
@handrews handrews changed the title Detect embedded "$id"s and "$anchors" as valid "$ref" targets Detect embedded "$id"s and "$anchor"s as valid "$ref" targets Feb 28, 2023
@handrews handrews changed the title Detect embedded "$id"s and "$anchor"s as valid "$ref" targets Detect embedded "$id"/"$anchor"/"$dynamicAnchor" as valid "$ref"/"$dynamicRef" targets Feb 28, 2023
@handrews
Copy link
Contributor Author

After digging through the code for "$anchor" and "$dynamicAnchor" I am fairly certain they suffer the same problem. For "$anchor", this is tested by a recently added test case (refRemote.json -> anchor within remote ref -> remote anchor valid) for both 2019-09 and 2020-12 (and presumably draft-next).

@marksparkza I'd propose handling this by:

  • Defining a SchemaLocationMixin from which both ApplicatorMixin and DefsKeyword would inherit
  • Document that all keywords that have subschemas, whether they are applied automatically (ApplicatorMixin) or not (DefsKeyword and similar constructs such as probably contentSchema) MUST inherit from SchemaLocationMixin
  • Require subclasses of SchemaLocationMixin to implement a generator method that returns each subschema
  • Implement a generator method on JSONSchema that chains all SchemaLocationMixin generator results
  • Adjust the standard keyword class inheritance accordingly
  • Use these generators to walk each schema document on loading and populate the catalog with the targets found

This approach could also be the basis for implementing #59, if there is a desire to support it.

@handrews
Copy link
Contributor Author

BTW I made that proposal up off the top of my head and am not attached to it, I just wanted to offer a starting point.

@handrews
Copy link
Contributor Author

Having dug around a bit more, it seems your ApplicatorMixin hierarchy overlaps a lot with my SchemaLocationMixin hierarchy idea, as you are using it for DefsKeyword and not for the by-reference applicators. I'll poke around more and make sure I understand this better.

I might suggest tweaking the class names to more closely align with the usage of "applicator" and "location" in the spec, unless that will be some sort of compatibility problem for the next release. I'm also happy to stick with your use of "Applicator" if you'd rather.

@handrews
Copy link
Contributor Author

handrews commented Mar 4, 2023

@marksparkza I've dug around a lot more now, and I think I have a better feel for what you're doing with cataloging schemas and resolving references on-demand. I'm playing around with some ideas and will post here or open a new issue if that feels more appropriate.

I think your approach can be extended to do a search of not-yet-instantiated subschemas when the catalog cannot find a schema, which could also be initiated up front (perhaps a command-line option so folks can choose their performance trade-off).

Or we could keep things simpler by just always scanning schemas when they are loaded.

There is a difficulty with load-on-demand (regardless of the source), which is that there's no way to detect an embedded schema in a resource that has not been loaded. This is one reason that I default to pre-loading in memory with other tools (see #53, although the InMemorySource proposal wouldn't solve this, only direct loading into the catalog).

This is one reason why the spec discourages load-on-demand and encourages pre-loading. I think it is sufficient to document that embedded resources in load-on-demand schemas will not be detected. That's just a tradeoff users make if they want load-on-demand.

BTW I just want to say that this package is a joy to work with! Especially as someone coming back to Python having not done anything major with it since the 2.7 days. It's been a great way to learn what has changed. I know I'm filing a bunch of stuff and proposing some fairly big changes here, but that's definitely not intended as criticism!

@handrews
Copy link
Contributor Author

handrews commented Mar 5, 2023

I'm going to close this because I confused myself several different ways. I think I understand things now, and after doing a bit more work to be sure, I will re-file the correct issue for what is actually going on with the failing test case from the recent additions to the JSON Schema Test Suite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant