Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected error in convert for unset values #758

Open
tijmenr opened this issue Oct 20, 2024 · 1 comment
Open

Unexpected error in convert for unset values #758

tijmenr opened this issue Oct 20, 2024 · 1 comment

Comments

@tijmenr
Copy link

tijmenr commented Oct 20, 2024

Description

The convert function does not handle UNSET values as I expected (or I am missing some detail in how optionality and unset work together). If a field has a union type that allows msgspec.UnsetType, it seems to ignore that during the conversion:

>>> import msgspec
>>> msgspec.__version__
'0.18.6'

>>> class A(msgspec.Struct,):
...     x: str
...     y: str | msgspec.UnsetType = msgspec.field(default=msgspec.UNSET)
...
>>> class B(msgspec.Struct):    # Basically same as A, but might differ in encoded field names
...     x: str
...     y: str | msgspec.UnsetType = msgspec.field(default=msgspec.UNSET)
...
>>> a = A(x='x')   # a=A(x='x', y=UNSET)

The value a fits the struct B (as it allows an unset y field), so I would expect conversion of a to an instance of B to succeed, with the value of the y field set to either UNSET (as it is in a) or the defined default in B (which in this case happens to be UNSET as well; this would need documentation). However, it fails on this unset value of the y field`:

>>> msgspec.convert(a, B, from_attributes=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Expected `str`, got `msgspec.UnsetType` - at `$.y`

(Trying to "convert" a into a new instance of struct A does not yield an error, probably because there is some shortcut logic at play).

The same error also occurs when trying to convert a dict with an explicit unset value:

>>> msgspec.convert({'x': 'x', 'y': msgspec.UNSET}, B, from_attributes=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Expected `str`, got `msgspec.UnsetType` - at `$.y`

I currently have a use case where I basically need to translate the field names of a multi-level json structure from kebab-case to PascalCase, and my idea was to create two roughly the same structs K and P (using the handy rename option so that each gets the desired encoded field names), decode the source document using struct K, convert it into struct P, and use that last one for output. However, this issue with unset fields hinders that approach. (I'm going to try and change the default value from UNSET to None as a workaround, but a side effect is not being able to distinguish between a field just not being present, or being present with a null value in the source document.)

On a side note (not sure if that would be a separate bug or a feature), having a dict as an intermediate step between K and P does not help, because msgspec.structs.asdict does not honor the omit_defaults option of the struct, so it does include the unset fields in the dict, and this means convert can not work on the dict either (unless I first rebuild the dict myself to remove all unset fields).

@tijmenr
Copy link
Author

tijmenr commented Oct 21, 2024

Another unexpected behavior for convert:

Given:

>>> class A(msgspec.Struct, rename='pascal'): a_x: int
...    
>>> class B(msgspec.Struct, rename='kebab'): a_x: int
... 
>>> class C(msgspec.Struct): AX: int
... 

The following conversions work, making it seem as though convert is flexible on whether it uses the "real" field name or the "encoded" one:

>>> msgspec.convert(A(1), B, from_attributes=True)    # Apparently uses a_x from A
B(a_x=1)
>>> msgspec.convert(B(1), A, from_attributes=True)    # Apparently uses a_x from B
A(a_x=1)
>>> msgspec.convert(C(1), A, from_attributes=True)    # Apparently uses AX from C
A(a_x=1)

But the flexibility on "real" or "encoded" is apparently only on the destination type, because the following fails:

>>> msgspec.convert(A(1), C, from_attributes=True)    # Cannot use AX from A
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Object missing required field `AX`

And then suddenly when converting not from a Struct, but from a dict object, it only works if the dict contains the "encoded" field names:

>>> msgspec.convert({'AX':1}, A)    # Using encoded field name works
A(a_x=1)

>>> msgspec.convert({'a_x':1}, A)    # Using real field name fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Object missing required field `AX`

>>> # So you cannot do a roundtrip...
>>> msgspec.convert(msgspec.structs.asdict(A(1)), A)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Object missing required field `AX`

Again, this behavior makes it difficult to use msgspec to translate similar json structures between different field name conventions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant