Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polars extension #3356

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Conversation

kjgoodrick
Copy link
Contributor

📝 Summary

Fixes issue #3355 by adding LazyFrame extension that allows output of mermaid graphs for polars lazyframe query plans.

🔍 Description of Changes

In this pull request I have added a polars extension that adds a mo namespace to LazyFrames, this namespace includes one "public" method show_graph that accepts the same arguments as polars LazyFrame show_graph and returns an Html object containing the mermaid chart.

Issue #3355 shows side by side outputs.

Before merging it would be necessary to add documentation for this new polars namespace, but I wanted to request your feedback before I took that step.

📋 Checklist

  • I have read the contributor guidelines.
  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • I have added tests for the changes made.
  • I have run the code and verified that it works as expected.

📜 Reviewers

@akshayka

Copy link

vercel bot commented Jan 7, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
marimo-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 7, 2025 0:35am
marimo-storybook ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 7, 2025 0:35am

Copy link
Contributor

@akshayka akshayka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a new namespace and requiring users to learn about it, I believe we can register custom formatters for Polars, in this file:

https://github.com/marimo-team/marimo/blob/main/marimo/_output/formatters/df_formatters.py

For example, we could monkey-patch show_graph to do the right thing in marimo.

I agree that the ideal solution is to get Polars to support marimo natively. They can detect if they are running in a marimo notebook by checking mo.running_in_notebook(), and then integrate with marimo in one of various ways (https://docs.marimo.io/guides/integrating_with_marimo/displaying_objects/).

@MarcoGorelli , if you don't mind, I would appreciate hearing your thoughts on whether implementing this upstream in Polars would be possible, or if we should plan to monkey-patch Polars in marimo instead.

Comment on lines +119 to +121
from marimo._polars import (
lazyframe, # noqa: F401 # Import is required for lazyframe registration
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If registration happens on import of the polars module, then this can be removed.

https://github.com/marimo-team/marimo/blob/main/marimo/_output/formatters/df_formatters.py

As far as I can tell there's no need to add lazyframe to the marimo public API?

@MarcoGorelli
Copy link
Contributor

@MarcoGorelli , if you don't mind, I would appreciate hearing your thoughts on whether implementing this upstream in Polars would be possible, or if we should plan to monkey-patch Polars in marimo instead.

if the solution isn't too complex on the Polars side, then I think this would be welcome!

@mscolnick
Copy link
Contributor

I don't think we need to monkey-patch anything. showing mermaid should be the default (easier to navigate, adds no deps and not lossy) and instead we can add a formatted for LazyFrame in https://github.com/marimo-team/marimo/blob/main/marimo/_output/formatters/df_formatters.py

If anything could be upstreamed, we could add _repr_mermaid_ to polars, but _repr_mermaid_ is not an official repr from jupyter (however, we can support it)

@akshayka
Copy link
Contributor

akshayka commented Jan 7, 2025

I don't think we need to monkey-patch anything. showing mermaid should be the default (easier to navigate, adds no deps and not lossy) and instead we can add a formatted for LazyFrame in https://github.com/marimo-team/marimo/blob/main/marimo/_output/formatters/df_formatters.py

If anything could be upstreamed, we could add _repr_mermaid_ to polars, but _repr_mermaid_ is not an official repr from jupyter (however, we can support it)

Lazyframes already render fine in marimo according to #3355.

I believe the core issue is modifying Lazyframe.show_graph to show a mermaid diagram. This cannot be accomplished using a formatter in marimo without monkey-patching Lazyframe.show_graph, unless I am misunderstanding something. So it might be preferable to upstream marimo-aware rendering for this function if Polars team is okay with that.

@mscolnick
Copy link
Contributor

mscolnick commented Jan 7, 2025

I was looking at this:

image

Which is possible to move to mermaid since marimo can change the formatting of the last line. You would still run the issue of explicitly calling show_graph(optimized=True), which then would be nice to upstream show_graph(format="mermaid")

But yea @akshayka you are right, without anything upstreamed w.r.t mermaid, we would need to monkeypatch to cover all cases.

@mscolnick
Copy link
Contributor

Looks like there was some appetite for mermaid formatting in polars: pola-rs/polars#12075

That part seems non-controversial, but could go as far as making the default 'mermaid' instead of 'dot' in marimo

@kjgoodrick
Copy link
Contributor Author

Thank you all for the discussion on this. It sounds like there is already interest on the polars side for introducing support for mermaid, which is great to here. If polars does support mermaid and is ok with us inserting a simple snippet for detecting marimo then I think we could forgo the polars extension which would remove the need to add any imports to the marimo init file.

I think then that the best way forward would be to:

  1. Add Mermaid support to polars show_graph
  2. Add a dataframe formatter to Marimo that will change the behavior of leaving a LazyFrame on the last line
    • Use polars mermaid generation developed in 1
  3. Add support to polars show_graph for detecting marimo and calling the display functionality developed in 2.

If that sounds good to you all, I can close this PR and then make a new one to handle step 2 above.

@akshayka
Copy link
Contributor

akshayka commented Jan 7, 2025

That sounds good to me, thank you!

Would you like to work with Polars to lead 1 and 3 as well? If you opened an issue or PR with them, please feel free to tag me.

@kjgoodrick
Copy link
Contributor Author

Yes, I will open a PR in Polars for 1 to get started and tag you in it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants