-
-
Notifications
You must be signed in to change notification settings - Fork 583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace $ref
& $dynamicRef
support / RefResolver
with the new referencing library
#1049
Conversation
Referencing happens to not support it at the minute. It's close to EOL, so rather than adding it, I suspect it's droppable.
Still will be tweaked as the referencing library's public API changes.
Internal uses of it will be removed, replaced with referencing's resolution APIs, though RefResolver will continue to function during its deprecation period.
This will make it easier to swap over to referencing's Resolver (while preserving backwards compat for anyone who passes a RefResolver to a Validator). At some point in the future this method may become public (which will make it easier for external dialects to resolve references) but let's keep it private for a bit until it's clear that the interface is stable -- a future draft might do crazy things like have adjacent properties to $ref affect the resolution behavior, which would make this method need to take more than just the $ref value.
It will imminently be replaced by referencing.Registry-based resolution.
Just avoids a deprecation warning when we switch over.
Makes way for our newer resolver to live in the shorter name. (This should have no effect on the public API, where we have Validator.resolver returning this attribute after emitting a deprecation warning.) Also bumps the minimum attrs version we depend on, as we need the alias functionality.
Passes all the remaining referencing tests across all drafts, hooray! Makes Validators take a referencing.Registry argument which users should use to customize preloaded schemas, or to configure remote reference retrieval. This fully obsoletes jsonschema.RefResolver, which has already been deprecated in a previous commit. Users should move to instead loading schemas into referencing.Registry objects. See the referencing documentation at https://referencing.rtfd.io/ for details (with more jsonschema-specific information to be added shortly). Note that the interface for resolving references on a Validator is not yet public (and hidden behind _resolver and _validate_reference attributes). One or both of these are likely to become public after some period of stabilization. Feedback is of course welcome!
<19.3 can have performance problems on 3.11 where there aren't wheels (and where I think it's falling back to using the pure- Python implementation even on CPython). Here it's ~2x slower.
Closes: #523
If you're willing to wait a couple of days, I'd like to take a crack at doing this tomorrow. The new doc is very promising, and it should slot in as a replacement pretty easily based on what I'm seeing. I'm not certain yet, but this may help with a few different use cases. I know one thing that people wanted was a way to disable network callouts, which was awkward to handle before. |
Definitely willing to wait! Thanks for the eyes / brain. |
Thanks! That all looks like great feedback. I'll probably try to take most of today (and maybe even tomorrow and Sunday) away from a computer to relax a bit, so it may be a few days before I read + respond, let alone address 'em with changes. But super appreciated. |
OK against my better judgement I actually did a tiny bit of work this afternoon :D -- will respond (or address with docs) the rest of your comments (both of y'all, thanks again), but few quick notes.
You actually don't need to do this at all! Most of the point of a But specifically, if I have a resource that I want to refer to via from referencing import Registry, Resource
resource = Resource.from_contents({"$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "foo://example.json"})
registry = Registry().with_resource(uri="bar://baz", resource=resource)
print(registry.get_or_retrieve("foo://example.json").value)
# -> Resource(contents={'$schema': 'https://json-schema.org/draft/2020-12/schema', '$id': 'foo://example.json'}, _specification=<Specification name='draft2020-12'>)
print(registry.get_or_retrieve("bar://baz").value)
# -> same (I'm showing it via the The "real question" to me was something like what @jpmckinney mentioned, where it's annoying to have to do
I'll address the basic idea later, I've waffled a bit over the last ~15 years on how much I like having common base classes for exceptions so perhaps! But ignoring that for a second, the idea is/was that
I think there's definitely room to add a helper (probably in referencing I guess, since having a whole other package seems clunky) which basically is like Of course if you have suggestions for how to make these clearer they're also welcome! And yeah will get to responding to the rest after the weekend. |
I've added a high-level porting table, some more initial discussion, and stressed that registries are immutable -- the doc preview should update -- will get to the rest in the next day or two! |
I'm not sure I'm fully convinced of the utility of adding such a thing yet, but I also pushed a branch with the helper I alluded to in:
Feedback is welcome on that too (as I say I'm not sure the complexity is worthwhile, but it's an option). |
I considered originally suggesting that On the flipside, a lot of people are going to want this behavior, so it seems silly not to provide it somehow... What about splitting the difference and offering the function as an example? (You could even keep the tests, and put it all together as I'll take a look at the new doc after my workday; thanks for doing another pass! |
Yeah spot on, this is exactly what I was trying to say initially for why I didn't include this -- it's a quite "straightforward" interface to define
Could work.
Great, thanks! And... after all that, I also couldn't resist being a bit over the top -- so there's a shortcut now for |
I just did another read and I like the results. 👍
I just rewrote my code to remove the unnecessary I don't think any changes are needed on It's hard to tell how this could be refined or improved on this part. Probably a significant portion of users will never use these APIs directly at all.
To start with, let me dispel any concern: I didn't see the other exceptions raised. I was just updating error handling and it wasn't clear what errors could be raise during resolution. Also, note that the examples include I think I was a bit confused about why the error classes don't inherit from a common base, more than I think that having a shared based would be a good thing. It wasn't 100% clear to me what the meanings of the errors is. Rather than worrying about any functional changes to A couple of other thoughts:
|
It's a bit tricky, I'm still working on this. Some microbenchmarks are faster (even significantly -- e.g. validator creation is ~2x faster in microbenchmarks). But others indeed are a bit slower. Even though my (jsonschema's) test suite is ~2x slower itself, more targeted microbenchmarks show things aren't that bad. But yeah still working on this. At least from my own profiling so far, it's actually (Of course the old implementation isn't fully compliant, so at some point even if things are slower obviously we'd still prefer the new one -- but yeah there's definitely still room to optimize, I'm fairly confident it's possible to make things faster even than the existing main branch... Of course if you do anything there let me know.) And yeah still not fully typing out a response to the rest, will get back to it -- the dialect thing is tricky. For general URIs, |
Just use virtue in CI.
We're not a general class, so we know what fields we need ahead of time. This seems to give ~15% speedup on Validator evolution, which happens often as part of walking up and down schemas.
I realize I didn't get back to leaving a response on the dialect URI-related comment (which I think is the last one I didn't address but let me know if I missed another one!) -- But two things:
Half of this was a bug in I think that leaves HTTP vs HTTPS -- I think for just that use case a helper probably isn't very useful since it's more about adding resources than retrieving them, but I'm not against some sort of helper at some point if there's more variants. Technically speaking these are different URIs, but if someone chooses to use the helper function I guess it's fine. But yeah maybe not for now, especially given the empty fragment case is handled? Let me know if you have any other feedback, otherwise I'll likely be merging this in the next day or two, especially given I've basically written enough of a first pass on my rust replacement for pyrsistent to demonstrate it's faster (still not as fast as it could be, but faster!) |
Going to merge this I think! But if anyone has further comments feel free to open an issue if needed. A (beta) release will come out shortly! |
Very exciting! I've been keeping one eye on your Rust work as I'm able. It's motivating me to get off my fanny and try learning Rust again. 😉 From my perspective, all is well with this change. I even found some schemas in SchemaStore which couldn't be loaded with
I think that was the only open topic in this thread. Everything else seems to be resolved as we close this out.
Makes sense; I wasn't aware of that detail of the spec. That may also explain why schemas are more often missing the trailing
Yeah, I am in agreement on this. I don't think the HTTP/HTTTPS URIs should be magically synonymized, since they are different. It's an entirely different matter if a frontend like
and I'd even be open to the idea of including such behaviors in The only related thing that I'm still wondering about is how a user should declare and use a custom metaschema moving forward. I don't have any evidence of anyone doing this, but I used to have a clearer notion of how one would define a validator class and register it into |
Hah nice. Well, I'll say the extent I've "learned" it is "I can write a bunch of
Nice! Yeah my guess is (thankfully? unsurprisingly?) it's probably a pretty compliant implementation -- I'm sure there's a bug or two hiding which will get uncovered and hopefully easily fixed, but clearly the primary concern at the minute is now getting the
👍🏽
Good catch :) I was waiting to see if someone would notice this. You're.. the second! See #1061. Needless to say though I thankfully hadn't forgotten about this, though we indeed don't have a test case that "reminds" me of it, but I remembered it anyhow somehow -- so this will happen before a real release (and there'll be another alpha or beta which contains support for it to ensure it works) -- until then there's a todo here reminding me and a ticket in referencing which is python-jsonschema/referencing#28 (and to a lesser extent python-jsonschema/referencing#24). The basic idea I think is that every time you call Thanks again for the comments, keep em coming! |
This PR represents a major overhaul, a month or two of work's worth, of the referencing behavior via the
$ref
and$dynamicRef
keywords.It makes referencing resolution fully compliant across all drafts (yay!) and hopefully makes the user-facing interface for configuring reference resolution a lot easier to understand.
All of the above is by introduction of a new (external) dependency, the referencing library.
referencing
is going to be in beta for at least a release or two of this library, and then will become 1.0.Users of referencing behavior are highly encouraged to provide feedback, I could definitely use it. As I say, these interfaces may change slightly (if anything on the
referencing
side, I think the interface here injsonschema
is pretty intentionally narrow).But some old use cases are now possible, and it's very likely that a whole lot of
$ref
-related bugs are now closable. I'll be going through all the bugs which are related and making notes / closing them with reference to this PR as part of the release (which will be 4.18.0, though likely after at least one beta release, 4.18.0b1).As you might notice from the version number,
RefResolver
, though now fully deprecated, should be free from any functional changes, as this PR is intended to be fully backwards compatible. Fingers crossed that that's the case, though I'm bracing to see if someone has mucked withRefResolver
's internals in a drastic enough way to make that not be the case.New documentation is also written (previewable below) on a
JSON Schema Referencing
page.Performance-wise, some initial benchmarks indicate that despite the increased level of compliance with the spec, performance should be on par, and in some cases better than before. The benchmark suite is anything but comprehensive though, so performance feedback is also welcome, we'll do what we can to address any issues uncovered there.
This PR will likely get merged shortly after a few final checks (mainly it was the documentation-writing, so now we're getting quite close). If you're someone with feedback, please feel free to leave it on the PR. Perhaps @sirosen will ping you specifically if you don't mind, as I have on my list to see how easy it is to plug
referencing
intocheck-jsonschema
as a good test of whether the new interface is easy enough to use -- I'm happy to try to do so myself or otherwise if you try it I'd love feedback on how easy you find doing so.📚 Documentation preview 📚: https://python-jsonschema--1049.org.readthedocs.build/en/1049/