Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Add support for a second message attribute on errors which is guaranteed not to include the failing instance #1218

Open
sirosen opened this issue Jan 26, 2024 · 5 comments
Labels
Enhancement Some new desired functionality Help Wanted An enhancement or bug for which a pull request is welcomed and which should have a clear definition.

Comments

@sirosen
Copy link
Member

sirosen commented Jan 26, 2024

This is a feature which I'd be very happy to work on, if it gets the ol' 👍 of approval as an idea!
(It's a case in which I could probably do FOSS work as part of my paid work allotment, rather than on personal time.)

Motivation

For context, we have a use-case at $WORK in which jsonschema is being used to validate data which is sourced from a mixture of locations. Some of the data is user-supplied, and some of it is sourced from config or application-owned secrets.
As a result, the user who is triggering this interaction is not guaranteed to have the rights/permission to see the full body of data being passed to jsonschema. The application therefore cannot show ValidationError.message data, since it may contain secrets.

I think the feature idea here has broader applicability -- and I don't think that the case I have at $WORK represents a great application design -- but there's my particular user story.

More generally, having some kind of message string which doesn't include the instance would allow applications to inspect the messages more reliably and potentially choose special behaviors for known errors. Today, this is only possible with some detailed knowledge of the error formats, and checks such as err.message.startswith(...).

Proposal: ValidationError.reason

I'm proposing a new string property on error classes, which is primarily intended for ValidationError but may be useful in other cases (needs some evaluation). The purpose of this property is to name the error case sans the input instance. In concert with the JSON Path to the failing instance, this is nearly equivalent to the current content in message, but it doesn't produce str(instance) as part of the error message.

reason is either provided explicitly or falls back to None. I considered having it default to message but that makes things messier for a client, which then has fewer guarantees about what might be in the reason field.

Where possible, reason may be used to add details which are not present in the message and would now be a breaking change to introduce.

reason would be documented as a concise description of the error cause which does not to contain details from the instance being evaluated.

Example Changes

additionalProperties keyword validation produces messages of the form "Additional properties are not allowed ({fields} {was|were} unexpected)".
The reason field on such an error would be "Additional properties are not allowed.".

maxLength validation produces messages of the forms "{instance} is expected to be empty" and "{instance} is too long".
The reason field on such an error would be "The value is too long. The maximum length is {maxLength}."

Note

I would be open to an alternative interpretation in which we make reason more closely aligned with describing the schema/spec.
That could lead to terse and technical reason values like "additionalProperties validation failed." or "maxLength={maxLength} exceeded.".

The above proposed changes are only where I'm leaning right now -- a slight re-encoding of message without the instance and potentially with other useful info.

Proposed Implementation

I would make this change by first updating the core _Error base class to support reason as a str | None field.
I then "just" need to walk all of the keywords and add explicit reason strings for each validator, plus a test case for each.

I don't think this is very complex as a change, although it is a bit tedious and laborious to add.

@Julian
Copy link
Member

Julian commented Jan 29, 2024

In general this sounds fine to me -- the main two things I think I'd want to be sure of are:

  • In whatever future where required, additionalProperties, propertyNames, dependentRequired and any other validators that reference property *names* need a way to indicate the erroring names in ValidationErrors #119 lands it will be by splitting ValidationError into n per-keyword exception classes, e.g. exceptions.failure.MinItems, exceptions.failure.Required, exceptions.failure.AllOf, etc. In that world, a bunch of the error properties are going to be deprecated (e.g. ValidationError.context which only applies to applicator keywords, so will only be an attribute on AllOf and not Required exceptions). And hopefully _Error will disappear entirely, but we'll see. I did some work to start on this but didn't finish it, mostly again because to do it right requires assembling a test suite for error messages that's better than the ad hoc one here (TestValidationErrorMessages). It's not an insurmountable amount of work, just work, and I intend(ed) to get back to it sometime this year now that referencing is maturing (though there's an important piece of work I want to do there too which is higher priority). Anyways -- all this is relevant only inasmuch as I don't want to make that work any harder -- but I think the attribute you're proposing will be applicable to all classes, so that should be fine.
  • I do though want to have a name that fits, especially since choosing the right one will mean it won't need changing again as part of the above. In particular, we already have context and cause -- extremely general sounding names which we use in sort of specific ways. So I want to avoid more of that, especially again to be sure the names stick when we do the Great Exception Split of 2024(hopefully). I think short_message is a bit clearer for that, but maybe there are even more ideas we can bikeshed if need be. We'll regardless have to document what you say (that it's guaranteed to not have the instance in it).

But other than the above, and a request to add tests of this property for all keywords, I'm good with it!

@Julian Julian added Enhancement Some new desired functionality Help Wanted An enhancement or bug for which a pull request is welcomed and which should have a clear definition. labels Jan 29, 2024
@sirosen
Copy link
Member Author

sirosen commented Jan 30, 2024

I have a mix of thoughts, not all well-organized, so I'll try to summarize what I see right now.

  1. I'm okay with an alternative name.

I've found some APIs somewhat-frustrating to use when they make names for related things hard to distinguish. e.g. reason, cause, details, and description all being different strings. It's hard to know which one is which. reason is vulnerable to this criticism.
I'm not sure that short_message is what I would choose though, since it's possible, at least in one of my examples for maxLength, for the message to be longer than a message.

I really wanted to call it cause, but I saw that's already in use. I'll need to think more about naming. Never easy! 😀

  1. I need to read your comment on the future refactoring direction more carefully.

This is going to take me some time to digest and make sure I understand. I'm not worried too much about it, but I haven't digested it enough to think through the implications.

  1. There are two things which I've conflated in my initial proposal, which are separately useful:
  • messages which are static (never interpolate any values)
  • info currently missing from message (like keyword values)

I think that both of these are features which would be nice to have.
If I were to separate those, maybe there are two new fields?

# pseudocode for the new attributes
class ValidationError:
    ...
    # a pair of (keyword, keyword-value), e.g. `("maxLength", 3)`
    keyword: tuple[str, Any]
    # a string unique to the keyword validation failure, e.g. "too long" for maxLength
    error_category_message: str

I'm just spitballing a little bit here.

@Julian
Copy link
Member

Julian commented Jan 30, 2024

info currently missing from message (like keyword values)

The only known examples of this should be #119, #992 and #993 and in general it's considered a bug, but yeah the solution I'd like is what I mentioned, of course happy to elaborate or brainstorm further on it

@sirosen
Copy link
Member Author

sirosen commented Feb 23, 2024

I just wanted to drop a small note, since it's been ~1 month and I haven't done anything:

I still want to do this, but I'm trying to negotiate the scheduling at work so that I can do this during working hours. (I've had really limited hobby development time lately.)

@Julian
Copy link
Member

Julian commented Feb 23, 2024

Cool, thanks for the update -- I have little doubt you won't completely disappear :D -- definitely understood on the hobby side!

amickan added a commit to comic/grand-challenge.org that referenced this issue Mar 25, 2024
Json validation errors used to include the instance (the submitted
json), the schema and the error. This changes the reporting to only
print the error message without the schema and instance. This is a
temporary fix until this is implemented:
python-jsonschema/jsonschema#1218

It also standardizes the message for `FILE_COPY_STATUS` notifications,
opting for the less verbose version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Some new desired functionality Help Wanted An enhancement or bug for which a pull request is welcomed and which should have a clear definition.
Projects
None yet
Development

No branches or pull requests

2 participants