Detect embedded "$id"/"$anchor"/"$dynamicAnchor" as valid "$ref"/"$dynamicRef" targets #58

handrews · 2023-02-26T06:45:08Z

This scenario (which is apparently not tested int he JSON Schema test suite) is not currently supported:

{
  "$id": "https://example.com/schemas/root"
  "$defs": {
    "foo": {
      "$id": "foo",
      "type": "string"
    }
  },
  "$ref": "foo"
}

The distinction here is that evaluation never passes through "#/$defs/foo" before it is referenced. The JSON Schema test suite only tests embedded "$id"s that are evaluated before they are referenced, apparently.

There are essentially two ways to do this: Walk all applicator and schema location ("$defs") keywords at load time and index the "$id"s, or trigger a scan of loaded schemas upon finding a "$ref" that cannot otherwise be resolved.

I am personally inclined towards scanning up front, which also makes some obscure cases involving embedded "$schema" (which I'll file separately) if jschon wants to support them. But there are pros and cons either way.

The text was updated successfully, but these errors were encountered:

handrews · 2023-02-28T00:14:05Z

After digging through the code for "$anchor" and "$dynamicAnchor" I am fairly certain they suffer the same problem. For "$anchor", this is tested by a recently added test case (refRemote.json -> anchor within remote ref -> remote anchor valid) for both 2019-09 and 2020-12 (and presumably draft-next).

@marksparkza I'd propose handling this by:

Defining a SchemaLocationMixin from which both ApplicatorMixin and DefsKeyword would inherit
Document that all keywords that have subschemas, whether they are applied automatically (ApplicatorMixin) or not (DefsKeyword and similar constructs such as probably contentSchema) MUST inherit from SchemaLocationMixin
Require subclasses of SchemaLocationMixin to implement a generator method that returns each subschema
Implement a generator method on JSONSchema that chains all SchemaLocationMixin generator results
Adjust the standard keyword class inheritance accordingly
Use these generators to walk each schema document on loading and populate the catalog with the targets found

This approach could also be the basis for implementing #59, if there is a desire to support it.

handrews · 2023-02-28T01:34:29Z

BTW I made that proposal up off the top of my head and am not attached to it, I just wanted to offer a starting point.

handrews · 2023-02-28T06:14:15Z

Having dug around a bit more, it seems your ApplicatorMixin hierarchy overlaps a lot with my SchemaLocationMixin hierarchy idea, as you are using it for DefsKeyword and not for the by-reference applicators. I'll poke around more and make sure I understand this better.

I might suggest tweaking the class names to more closely align with the usage of "applicator" and "location" in the spec, unless that will be some sort of compatibility problem for the next release. I'm also happy to stick with your use of "Applicator" if you'd rather.

handrews · 2023-03-04T01:37:37Z

@marksparkza I've dug around a lot more now, and I think I have a better feel for what you're doing with cataloging schemas and resolving references on-demand. I'm playing around with some ideas and will post here or open a new issue if that feels more appropriate.

I think your approach can be extended to do a search of not-yet-instantiated subschemas when the catalog cannot find a schema, which could also be initiated up front (perhaps a command-line option so folks can choose their performance trade-off).

Or we could keep things simpler by just always scanning schemas when they are loaded.

There is a difficulty with load-on-demand (regardless of the source), which is that there's no way to detect an embedded schema in a resource that has not been loaded. This is one reason that I default to pre-loading in memory with other tools (see #53, although the InMemorySource proposal wouldn't solve this, only direct loading into the catalog).

This is one reason why the spec discourages load-on-demand and encourages pre-loading. I think it is sufficient to document that embedded resources in load-on-demand schemas will not be detected. That's just a tradeoff users make if they want load-on-demand.

BTW I just want to say that this package is a joy to work with! Especially as someone coming back to Python having not done anything major with it since the 2.7 days. It's been a great way to learn what has changed. I know I'm filing a bunch of stuff and proposing some fairly big changes here, but that's definitely not intended as criticism!

handrews · 2023-03-05T00:29:51Z

I'm going to close this because I confused myself several different ways. I think I understand things now, and after doing a bit more work to be sure, I will re-file the correct issue for what is actually going on with the failing test case from the recent additions to the JSON Schema Test Suite.

handrews changed the title ~~Detect embedded "$id"s as valid "$ref" targets~~ Detect embedded "$id"s and "$anchors" as valid "$ref" targets Feb 28, 2023

handrews changed the title ~~Detect embedded "$id"s and "$anchors" as valid "$ref" targets~~ Detect embedded "$id"s and "$anchor"s as valid "$ref" targets Feb 28, 2023

handrews changed the title ~~Detect embedded "$id"s and "$anchor"s as valid "$ref" targets~~ Detect embedded "$id"/"$anchor"/"$dynamicAnchor" as valid "$ref"/"$dynamicRef" targets Feb 28, 2023

handrews mentioned this issue Mar 4, 2023

Add descendentclasses for multi-level control #63

Closed

handrews closed this as completed Mar 5, 2023

handrews mentioned this issue Mar 5, 2023

Resolve embedded schema URIs in new remote schemas #65

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detect embedded "$id"/"$anchor"/"$dynamicAnchor" as valid "$ref"/"$dynamicRef" targets #58

Detect embedded "$id"/"$anchor"/"$dynamicAnchor" as valid "$ref"/"$dynamicRef" targets #58

handrews commented Feb 26, 2023

handrews commented Feb 28, 2023

handrews commented Feb 28, 2023

handrews commented Feb 28, 2023

handrews commented Mar 4, 2023

handrews commented Mar 5, 2023

Detect embedded "$id"/"$anchor"/"$dynamicAnchor" as valid "$ref"/"$dynamicRef" targets #58

Detect embedded "$id"/"$anchor"/"$dynamicAnchor" as valid "$ref"/"$dynamicRef" targets #58

Comments

handrews commented Feb 26, 2023

handrews commented Feb 28, 2023

handrews commented Feb 28, 2023

handrews commented Feb 28, 2023

handrews commented Mar 4, 2023

handrews commented Mar 5, 2023