Comments on lazy #465

Closed
maelle opened this issue Jan 16, 2025 · 6 comments · Fixed by #474

@maelle
Collaborator

maelle commented Jan 16, 2025

@krlmlr

  • It is confusing that read_parquet_duckdb() has an argument called lazy, whereas the similar argument of duckdb_tibble() is called .lazy.
  • The README states that automatic fallbacks are only available for an eager data.frame, but isn't this a design choice? Couldn't duckplyr call collect() before the fallback? I think this could be clearer in the docs.
@maelle
Collaborator Author

maelle commented Jan 16, 2025

Is what's in https://duckdb.org/2024/04/02/duckplyr.html#eager-vs-lazy-materialization still valid regarding laziness? Hasn't the default of lazy changed over time? (or are there two levels of laziness 🤪 )

Also note that https://duckplyr.tidyverse.org/dev/articles/developers.html#eager-and-lazy-modes has good content

@maelle
Collaborator Author

maelle commented Jan 16, 2025

A table of lazy vs eager might make sense.

| Mode | Eager | Lazy |
|---|---|---|
| Materialization/computation triggered | at creation | when requested by collect() |
| Direct access with $ | Works | Fails (?) |
| Automatic fallbacks to dplyr | Work | Fail |
| Default for | In-memory data (duckdb_tibble()) | Remote data (read_parquet_duckdb()) |
| Recommended for | small to medium-sized data | large data sets |

@maelle
Collaborator Author

maelle commented Jan 16, 2025

If a vignette is created it should include a full example of compute(lazy = FALSE) usage.

@maelle
Collaborator Author

maelle commented Jan 16, 2025

In the reference there's a single section https://duckplyr.tidyverse.org/dev/reference/index.html#connecting-copying-and-retrieving-data

Why not split it in two: one section for lazy-frame functions (collect, compute*), whose title would contain "lazy" and "materialization", and another for the remaining functions (pull and explain)?

@krlmlr
Member

krlmlr commented Jan 17, 2025

Thanks. I now think this should be a vignette. The creation of the vignette is tracked in #207, below are my clarifications.

  • lazy vs. .lazy: The leading dot seems mandatory for functions where the args in ... are user-supplied; I can't find a reference now. On the other hand, it feels superfluous for the other functions.
  • Automatic collect(): We protect the user against a huge result set. Implementation-wise, the only difference between "eager" and "lazy" is a flag that prohibits automatic materialization.
  • Perhaps we shouldn't use "eager" and "lazy" after all, see Rethink "eager" vs. "lazy" #473. The table is incorrect too.
  • https://duckplyr.tidyverse.org/dev/articles/developers.html#eager-and-lazy-modes can be picked up when rearranging vignettes
  • Agree that a vignette about automatic materialization should contain code that demonstrates the behavior in detail.

krlmlr added this to the 1.0.0 milestone Jan 17, 2025
@maelle
Collaborator Author

maelle commented Jan 23, 2025

> The table is incorrect too.

Could you help me fix it? I think a table would help.
