Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: Support for accessors #230

Open
znichollscr opened this issue Jan 17, 2025 · 7 comments
Open

feature: Support for accessors #230

znichollscr opened this issue Jan 17, 2025 · 7 comments
Assignees
Labels
config Related to configuration options feature New feature or request fund Issue priority can be boosted griffe extension Can be solved with a Griffe extension insiders Candidate for Insiders

Comments

@znichollscr
Copy link

znichollscr commented Jan 17, 2025

Is your feature request related to a problem? Please describe.

If you document accessors, there isn't an obvious solution for how to show the namespace in which the accessor ends up in (as opposed to the original API). In the sphinx world, there is a package dedicated to handling this situation: https://github.com/xarray-contrib/sphinx-autosummary-accessors

Describe the solution you'd like

A similar thing for the mkdocs universe.

Describe alternatives you've considered

Additional context

I have written such an extension here: https://github.com/climate-resource/mkdocstrings-python-accessors. As you can see, it is super basic, but it seems to do what I want (see an example at the bottom of this page , with built docs here (note the namespace which is documented)).


This issue was labeled with the fund label. The following section is automatically added by Polar.

Boost priority in our backlog through Polar's "issue funding". Issues linked to monthly sponsorships of $50 or more (author, upvoters) are already prioritized, see how we manage our backlog.

Fund with Polar
@pawamoy
Copy link
Member

pawamoy commented Jan 17, 2025

Hi @znichollscr, thanks for the feature request!

So, if I understand correctly, pandas provides API that lets developers register "accessors", which are then available from pandas' API itself. Since users are expected to use the pandas API instead of the developer's accessor, developers want to document it as such in their API docs.

IMO it would better live as a Griffe extension rather than a mkdocstrings handler feature. Are you aware of Griffe's extension system / have you considered using it?

@pawamoy pawamoy added the griffe extension Can be solved with a Griffe extension label Jan 17, 2025
@znichollscr
Copy link
Author

So, if I understand correctly, pandas provides API that lets developers register "accessors", which are then available from pandas' API itself. Since users are expected to use the pandas API instead of the developer's accessor, developers want to document it as such in their API docs.

Exactly.

Are you aware of Griffe's extension system / have you considered using it?

I did have a read through it and prototyped an extension. I was nervous about using that, because I wasn't sure what would happen if I changed the names used in Griffe's tree given that there is no code under the pandas API (so if Griffe goes looking for the source, it won't find anything which might affect source previews, for example). If you think that's not an issue/have suggestions for how to avoid this, I'd be very happy to hear them.

Thanks for the fast response and great package by the way!

@pawamoy
Copy link
Member

pawamoy commented Jan 18, 2025

I see, thanks. Yes this is a valid concern. I never had this use-case before where one wants to render something under a different name, under a different package even, while not actually appearing in this package's tree. Well, the extension could actually add objects into the pandas tree, but the filepath (used to get source among others) is a property. I wonder if the extension could simply override the filepath property with a fixed attribute. Mypy would probably complain, but I think it might work? Also there's the issue that pandas must be loaded before the extension can add objects to its tree.

Maybe the simpler solution here is really to just override the identifier of the object while rendering. Something akin to mkdocstrings/mkdocstrings#725, but for the identifier itself instead of the heading title or ToC label.

::: path.to.my.registered.accessor
    options:
        identifier: pandas.dataframe.my_namespace

That could cause issues with autorefs though.

I'll have a look at your custom handler to see exactly how you render it, to see if there's a way to make this all work in a generic way 🙂

Thanks for the fast response and great package by the way!

You're welcome! Thank you for the kind words 💟

@pawamoy pawamoy added config Related to configuration options insiders Candidate for Insiders fund Issue priority can be boosted and removed griffe extension Can be solved with a Griffe extension labels Jan 18, 2025
@pawamoy
Copy link
Member

pawamoy commented Jan 18, 2025

Oh OK you simply overwrite the name of Griffe objects. I think that will give you incorrect paths (ids) and therefore incorrect inventory entries, as well as incorrect autorefs (making your objects un-reference'able).

@znichollscr
Copy link
Author

znichollscr commented Jan 18, 2025

Very true. Once you think about it, it's actually quite a complex problem. The solution we have definitely will have plenty of drawbacks.

Well, the extension could actually add objects into the pandas tree, but the filepath (used to get source among others) is a property. I wonder if the extension could simply override the filepath property with a fixed attribute.

This might actually be the only way to do this. I guess that would also better follow the accessor pattern in general, which does actually inject an API into another package's namespace (effectively). As you say though, you have to then somehow have a way to trace back to the original code because the API only exists at runtime, never statically (at least not without some clever stub work probably, maybe even not then, this does not bode well...).

@pawamoy pawamoy added the griffe extension Can be solved with a Griffe extension label Jan 18, 2025
@pawamoy
Copy link
Member

pawamoy commented Jan 18, 2025

Here's how I'd go about injecting the object in the pandas tree:

import ast
import copy

import griffe


class PandasAccessors(griffe.Extension):
    def on_class_instance(self, *, cls: griffe.Class, agent: griffe.Visitor | griffe.Inspector, **kwargs):
        try:
            decorator = next(dec for dec in cls.decorators if dec.callable_path == "pandas.register_accessor")
        except StopIteration:
            return
        namespace = ast.literal_eval(decorator.value.arguments[0])
        pd_obj = copy.deepcopy(cls)
        pd_obj.filepath = cls.filepath
        agent.modules_collection["pandas.DataFrame"].set_member(namespace, pd_obj)

...then I'd run that and iterate to fix all the things that go wrong 😄

EDIT: deepcopy might be a bit brutal here, as objects are heavily linked together. Could end up copying a lot more than just the class and its members. Maybe it'd be better to re-create a class instance instead.

znicholls added a commit to openscm/continuous-timeseries that referenced this issue Jan 18, 2025
@znichollscr
Copy link
Author

Very nice! I guess the other part of my issue is that I don't know the API well enough to think about I'd need to test to make sure everything works (I guess rendering, the ability to cross-reference, the correct objects end up in the inventory (which means other libraries could cross-reference this...)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
config Related to configuration options feature New feature or request fund Issue priority can be boosted griffe extension Can be solved with a Griffe extension insiders Candidate for Insiders
Projects
None yet
Development

No branches or pull requests

2 participants