Reuse already materialized data frames #442

Open
krlmlr opened this issue Jan 4, 2025 · 1 comment
Labels
duckdb 🦆 Issues where work in the duckdb package is needed

Comments

Member

krlmlr commented Jan 4, 2025

We want to reuse df2 once it's materialized.

In duckdb, we need a new AltrepDataFrameRelation that wraps an ALTREP data frame and forwards either to the parent relation (while the data frame is not yet materialized) or to a relation that implements the data frame scan (once it is materialized).
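
The forwarding idea can be illustrated with a minimal sketch in plain R. This is not the duckdb implementation, which would presumably live in the package's C++ code. The names new_altrep_df_relation(), is_materialized, get_data and resolve() are hypothetical and exist in neither duckdb nor duckplyr, and relations are modeled as plain lists.

```r
# Hypothetical sketch: a wrapper that forwards to the parent relation until
# the data frame is materialized, and to a plain data frame scan afterwards.
# None of these names are part of duckdb or duckplyr.
new_altrep_df_relation <- function(parent, is_materialized, get_data) {
  force(parent)
  force(is_materialized)
  force(get_data)

  list(
    resolve = function() {
      if (is_materialized()) {
        # Materialized: scan the existing data, no recomputation needed
        list(type = "r_dataframe_scan", data = get_data())
      } else {
        # Not materialized yet: keep using the lazy relation tree
        parent
      }
    }
  )
}

# Stand-in for the ALTREP state of df2: data is NULL until "collected"
state <- new.env()
state$data <- NULL

# Stand-in for the relation tree of df2 (projection over a scan of df1)
projection <- list(
  type = "projection",
  child = list(type = "r_dataframe_scan")
)

rel <- new_altrep_df_relation(
  parent = projection,
  is_materialized = function() !is.null(state$data),
  get_data = function() state$data
)

before <- rel$resolve()$type

# Simulate collect(df2)
state$data <- data.frame(a = 1, b = 2)

after <- rel$resolve()$type
```

In this sketch, before is "projection" and after is "r_dataframe_scan". A relation built on top of rel, such as the filter for df3, would pick up the materialized data the same way.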

options(conflicts.policy = list(warn = FALSE))
library(duckplyr)
#> Loading required package: dplyr
#> ✔ Overwriting dplyr methods with duckplyr methods.
#> ℹ Turn off with `duckplyr::methods_restore()`.

df1 <- duck_tbl(a = 1)
df2 <- df1 |> mutate(b = 2)
df3 <- df2 |> filter(b == 2)

duckdb:::rel_from_altrep_df(df2)
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Projection [a as a, 2.0 as b]
#>   r_dataframe_scan(0x115848e88)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)
duckdb:::rel_from_altrep_df(df3)
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Filter [(b = 2.0)]
#>   Projection [a as a, 2.0 as b]
#>     r_dataframe_scan(0x115848e88)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)

collect(df2)
#> # A tibble: 1 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2

# After collect(), df2 is materialized and could be scanned directly,
# but both relations below still recompute the projection
duckdb:::rel_from_altrep_df(df2)
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Projection [a as a, 2.0 as b]
#>   r_dataframe_scan(0x115848e88)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)
duckdb:::rel_from_altrep_df(df3)
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Filter [(b = 2.0)]
#>   Projection [a as a, 2.0 as b]
#>     r_dataframe_scan(0x115848e88)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)

Created on 2025-01-04 with reprex v2.1.1

Member Author

krlmlr commented Jan 6, 2025

Desired result:

options(conflicts.policy = list(warn = FALSE))
library(duckplyr)
#> Loading required package: dplyr
#> ✔ Overwriting dplyr methods with duckplyr methods.
#> ℹ Turn off with `duckplyr::methods_restore()`.

df1 <- duck_tbl(a = 1)
df2 <- df1 |> mutate(b = 2)
df3 <- df2 |> filter(b == 2)

duckdb:::rel_from_altrep_df(df2)
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Projection [a as a, 2.0 as b]
#>   r_dataframe_scan(0x115848e88)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)
duckdb:::rel_from_altrep_df(df3)
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Filter [(b = 2.0)]
#>   Projection [a as a, 2.0 as b]
#>     r_dataframe_scan(0x115848e88)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)

collect(df2)
#> # A tibble: 1 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2

# Desired result: the materialized df2 is scanned directly
# (0xdeadbeef is a placeholder address)
duckdb:::rel_from_altrep_df(df2)
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> r_dataframe_scan(0xdeadbeef)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)
duckdb:::rel_from_altrep_df(df3)
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Filter [(b = 2.0)]
#>   r_dataframe_scan(0xdeadbeef)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)

Created on 2025-01-04 with reprex v2.1.1
