Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion about customizing __init__ or __post_init__ #326

Closed
tlambert03 opened this issue Feb 18, 2023 · 5 comments
Closed

Discussion about customizing __init__ or __post_init__ #326

tlambert03 opened this issue Feb 18, 2023 · 5 comments

Comments

@tlambert03
Copy link

tlambert03 commented Feb 18, 2023

Description

Hi @jcrist, following up here from https://github.com/davidbrochart/ypywidgets/issues/6#issuecomment-1435227915 ... slightly related to #70, on the general topic of customizing either __init__ or adding some __post_init__ concept.

background

As a bit of context, I'm generally interested in dataclass patterns of all flavors, and enjoy understanding their relative merits and goals; and msgspec seems to be an exciting addition when serialization and speed are top concerns. My interests originally stemmed from my work on napari, where I tried to develop classes & objects that "feel" to the user like relatively simple dataclass-style objects that use some observer pattern to stay in sync with some GUI/view. I extracted that work into psygnal, where I've more recently been adding more generic support to turn any dataclass pattern into an "evented" dataclass that could connect callbacks to changes. I support msgspec (in addition to dataclasses, pydantic, and attrs)

from psygnal import evented
import msgspec

@evented
class Person(msgspec.Struct):
    name: str
    age: int = 0

jim = Person('Jim')
jim.events.age.connect(lambda: print("happy birthday!"))
jim.age += 1  # prints "happy birthday!"

more recently, I've been helping out with anywidget, trying to leverage that pattern to allow widget developers make a python model that stays in sync with a frontend js model, and I'd like to be able to provide modern typing syntax and minimal boilerplate. Given its speed and focus on serialization, it seems that msgspec would be particularly well suited to that ...

customizing __init__

This issue is therefore part feature request & part discussion. I'd like to encourage developers of relatively simple models to give msgspec a try when developing evented models that they want to syncronize with some view – particularly if that view is in some other process that requires serialized communcation. I suspect however – perhaps prematurely – that many developers (i.e. would-be consumers of psygnal.evented or anywidget) would be flustered a bit to learn that they can't customize initialization of their objects after instance creation, either by overriding __init__, or by implementing some new __post_init__.

As an example, to support msgspec in the first place with psygnal.evented, I had to change my internal code to use a descriptor object to get around the fact that I couldn't use __init__ to initialize the event-tracking object for each new Struct instance. (It so happens I now like that pattern much better 😂 ... but that was how this got on my radar to begin with). A default_factory could conceivably work here too, but only if the factory function was passed the new Struct instance (I think).

question

My question, I suppose, is what would you say to a developer who wanted to use msgspec.Struct as their dataclass pattern, maybe even as a superclass base for multiple models in their library. From your experience, do you feel like there's almost always a good alternative to customizing __init__? Would you say that inasmuch as their classes remain purely data-driven and declarative, then msgspec is ideal, but as they become loaded with more business logic that maybe it's not the use case msgspec is targeting? More generally, if we want to do something/anything after the creation of the Struct instance that isn't just the initialization of a new field, is it possible?

as you can see here, I don't have an immediate/personal need to customize __init__, it's a bit more of an open "is this really never necessary?" question, when I think about what to suggest to downstream users.

thanks for all your work! 🙏

@jcrist
Copy link
Owner

jcrist commented Feb 24, 2023

Thanks for opening this! Apologies for the delayed response, I was on vacation all week.

First - pysygnal looks neat! I really like how you've abstracted out the event tracking logic, letting other libraries build off of that. Do you have an actual use case for using msgspec with this library, or is this more of an experimental "lets see what dataclass-like-libraries we can support"?


Getting to your question, the main issue with supporting overriding __init__ here is handling developer expectations of when that code is run. I generally agree with the attrs docs on this one. There are two main reasons to write your own __init__:

Deriving/generating a subset of parameters

Sometimes you might want to derive/generate a subset of parameters, avoiding forcing the user to pass them in manually. For example, say we had a right triangle type with lengths for the 3 fields:

class RightTriangle(Struct):
    """A right triangle with sides a, b, and c (the hypotenuse)"""
    a: float
    b: float
    c: float

You might want to modify the __init__ to avoid passing in the c parameter, and instead derive it from a and b. Currently msgspec (like dataclasses) only supports default factories that take no arguments, so there's no way c can be derived from a and b. We could extend default_factory to optionally take in self (like attrs does), but that has other problems (not compatible with typing.dataclass_transform, has issues if the factory depends on other optional fields).

In this situation, I'd recommend using a classmethod to provide an alternative constructor:

class RightTriangle(Struct):
    """A right triangle with sides a, b, and c (the hypotenuse)"""
    a: float
    b: float
    c: float

    @classmethod
    def from_legs(cls, a, b) -> RightTriangle:
        return cls(a, b, (a**2 + b**2)**0.5)

This keeps it clear with user's intuition that from_legs is an alternative constructor. It's not called on decode/pickle.loads/__copy__/... If we allowed overriding __init__ directly, I forsee a set of users who may expect decode to work transparently on messages only containing a and b, since that's what the constructor takes.

Setting up some extra state

Another reason to write your own __init__ may be to setup some extra state related to the object. Your pysygnal events tracker is one example of this (although I also like the descriptor approach better here). In this case the signature of the __init__ wouldn't change, you'd want to hook into something after (or before) the default __init__ ran.

dataclasses and attrs handle this with __post_init__ and __attrs_post_init__/__attrs_pre_init__ respectively. We could support something like this if needed, but for now I've deliberately left it out until a valid use case arises.


The design decision for me is matching user's intuition about what code runs when. I've too often seen python devs confused when their object has invalid state after a pickle.loads call, because they expected __init__ to run again on load.

If we allow code to run when a Struct instance is created in one occasion, I want to make sure we run it in all occasions. For reference, Struct instances can be created in the following ways:

  • Calling the default generated constructor (MyStruct(...))
  • Any of the builtin decoders (msgspec.json.decode, msgspec.msgpack.decode, ...)
  • msgspec.from_builtins
  • msgspec.structs.replace
  • copy.copy (hits Struct.__copy__) and copy.deepcopy
  • pickle.loads

We can't make that guarantee if we let users override __init__ itself, since we don't (and can't) call __init__ on decode. We could match that guarantee if we added something like __post_init__(self) (or __pre_init__(self)), but I'd only want to add that with a real motivating use case.


Would you say that inasmuch as their classes remain purely data-driven and declarative, then msgspec is ideal, but as they become loaded with more business logic that maybe it's not the use case msgspec is targeting?

I'd say that msgspec is primarily concerned with easy and performant (de)serialization of python objects. If your class is never serialized, then I don't think msgspec would be useful for you. If your class is serialized, then I'm interested in making sure msgspec is flexible enough to support users use cases, while also nudging them into what I consider good design patterns.

@tlambert03
Copy link
Author

tlambert03 commented Feb 24, 2023

Thanks a lot for the thoughts! I agree with them all!

If we allow code to run when a Struct instance is created in one occasion, I want to make sure we run it in all occasions.

This line particularly resonates with me, and I hadn't previously considered it. Thank you!

Do you have an actual use case for using msgspec with this library, or is this more of an experimental "lets see what dataclass-like-libraries we can support"?

Yes (though while an evented Struct is important here, i think you'll see that the need for custom __init__ is still rather vague, my apologies 😅 ).

The motivating use case for me was with https://github.com/manzt/anywidget. It's a library I've been helping out with that aims to make it easier to bring modern JS modules to target multiple notebook environments (notebooks, jupyter-lab, vscode, colab) without all the boilerplate. While it is currently subclassing ipywidgets (which uses traitlets both for dataclass and observer pattern), we've been working on a new pattern using psygnal and dataclasses on the python side. As you probably know, the main name of the game with jupyter widgets is synchronizing a python-side model with a front-end js model, so ser/des is happening constantly. As such, I thought a msgspec-based model would be an ideal update to the traitlets-based approach.

The bare minimum mechanics then becomes:

from msgspec import Struct
from anywidget._descriptor import MimeBundleDescriptor
from psygnal import SignalGroupDescriptor

class SomeWidget(Struct):
    # whatever fields the user needs to synchronize between front & back end
    foo: int = 0
    bar: str = 'baz'

    # this object implements the observer pattern, allowed callbacks to be connected
    # at, for example: SomeWidget().events.foo.connect(my_callback).  It also patches __setattr__ 😬
    events = SignalGroupDescriptor()

    # this object will create a Comm object to communicate with the JS side when a repr is
    # requested by a notebook.  It would call `msgspec.json.encode(self)`
    _repr_mimebundle_ = MimeBundleDescriptor()

I can imagine offering that as a base class, in which case an anywidget user would have this lovely
modern pattern, backed by msgspec serialization:

from anywidget import AnyWidget

class Switch(AnyWidget):
    state: bool = False
    _esm: ClassVar = 'some javascript module...'

... and that's where I (rather pre-emptively) imagined that AnyWidget users would say "huh??? I can't customize init at all?" ... even though I personally can't tell you why they would want to 😂. But again, it's all a bit premature.

Deriving/generating a subset of parameters: In this situation, I'd recommend using a classmethod to provide an alternative constructor:

yep, I'm all for the classmethod approach for offering a customized signature to derive dataclass parameters!

Setting up some extra state ... We could support something like this if needed, but for now I've deliberately left it out until a valid use case arises.

I absolutely respect that approach!

in summary:
I don't believe i yet have that super valid use case that should motivate you to add __post_init__. My interests are still partially as a "middle man" contemplating adopting/encouraging users of a library I work on to use msgspec, and imagining that it might have some pushback. But, I'm absolutely with you: I wouldn't want to muddy a clean pattern with pre-mature features until a really compelling use case arises. For my immediate needs, descriptors are enough.

I really appreciate you taking the time to write this out. (I hope you don't mind and I hope it will be a useful thread if the general topic comes up again in the future)

@jcrist
Copy link
Owner

jcrist commented Feb 24, 2023

Thanks for the thoughtful reply. anywidget also looks neat - it's been a while since I've heavily used ipywidgets, but when I did I definitely valued not having to know any web-dev stuff to make something simple "just work". It's nice to see the ecosystem of these tools improve over time.

If you ever have a concrete use case for adding __post_init__ (or any other issue), please let me know - I'm always happy to help! I'm going to close this for now.

@jcrist jcrist closed this as completed Feb 24, 2023
@dhirschfeld
Copy link

xref: Potentially resolved in #470

@tlambert03
Copy link
Author

Wow!

On a quick scan of the related issues, i couldn't quickly find the "killer app" that motivated adding the feature. Can you summarize in a sentence or two?

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants