diff --git a/R/datatype.R b/R/datatype.R
index a9bf3baf3..ff777328a 100644
--- a/R/datatype.R
+++ b/R/datatype.R
@@ -207,7 +207,10 @@ DataType_Duration = function(time_unit = "us") {
 
 #' Create Struct DataType
 #'
-#' Struct DataType Constructor
+#' One can create a `Struct` data type with `pl$Struct()`. There are also
+#' multiple ways to create columns of data type `Struct` in a `DataFrame` or
+#' a `Series`, see the examples.
+#'
 #' @param ... RPolarsDataType objects
 #' @return a list DataType with an inner DataType
 #' @examples
@@ -215,15 +218,38 @@ DataType_Duration = function(time_unit = "us") {
 #' pl$Struct(pl$Boolean)
 #' pl$Struct(foo = pl$Int32, bar = pl$Float64)
 #'
-#' # Find any DataType via pl$dtypes
-#' print(pl$dtypes)
-#'
 #' # check if an element is any kind of Struct()
 #' test = pl$Struct(pl$UInt64)
 #' pl$same_outer_dt(test, pl$Struct())
 #'
 #' # `test` is a type of Struct, but it doesn't mean it is equal to an empty Struct
 #' test == pl$Struct()
+#'
+#' # The way to create a `Series` of type `Struct` is a bit convoluted as it involves
+#' # `data.frame()`, `list()`, and `I()`:
+#' as_polars_series(
+#'   data.frame(a = 1:2, b = I(list(c("x", "y"), "z")))
+#' )
+#'
+#' # A slightly simpler way would be via `tibble::tibble()` or
+#' # `data.table::data.table()`:
+#' if (requireNamespace("tibble", quietly = TRUE)) {
+#'   as_polars_series(
+#'     tibble::tibble(a = 1:2, b = list(c("x", "y"), "z"))
+#'   )
+#' }
+#'
+#' # Finally, one can use the method `$to_struct()` to convert existing columns
+#' # or `Series` to a `Struct`:
+#' x = pl$DataFrame(
+#'   a = 1:2,
+#'   b = list(c("x", "y"), "z")
+#' )
+#'
+#' out = x$select(pl$col("a", "b")$to_struct())
+#' out
+#'
+#' out$schema
 DataType_Struct = function(...) {
   result({
     largs = list2(...)
diff --git a/man/DataType_Struct.Rd b/man/DataType_Struct.Rd
index 71c0764c1..788fbdbe0 100644
--- a/man/DataType_Struct.Rd
+++ b/man/DataType_Struct.Rd
@@ -13,20 +13,45 @@ DataType_Struct(...)
 a list DataType with an inner DataType
 }
 \description{
-Struct DataType Constructor
+One can create a \code{Struct} data type with \code{pl$Struct()}. There are also
+multiple ways to create columns of data type \code{Struct} in a \code{DataFrame} or
+a \code{Series}, see the examples.
 }
 \examples{
 # create a Struct-DataType
 pl$Struct(pl$Boolean)
 pl$Struct(foo = pl$Int32, bar = pl$Float64)
 
-# Find any DataType via pl$dtypes
-print(pl$dtypes)
-
 # check if an element is any kind of Struct()
 test = pl$Struct(pl$UInt64)
 pl$same_outer_dt(test, pl$Struct())
 
 # `test` is a type of Struct, but it doesn't mean it is equal to an empty Struct
 test == pl$Struct()
+
+# The way to create a `Series` of type `Struct` is a bit convoluted as it involves
+# `data.frame()`, `list()`, and `I()`:
+as_polars_series(
+  data.frame(a = 1:2, b = I(list(c("x", "y"), "z")))
+)
+
+# A slightly simpler way would be via `tibble::tibble()` or
+# `data.table::data.table()`:
+if (requireNamespace("tibble", quietly = TRUE)) {
+  as_polars_series(
+    tibble::tibble(a = 1:2, b = list(c("x", "y"), "z"))
+  )
+}
+
+# Finally, one can use the method `$to_struct()` to convert existing columns
+# or `Series` to a `Struct`:
+x = pl$DataFrame(
+  a = 1:2,
+  b = list(c("x", "y"), "z")
+)
+
+out = x$select(pl$col("a", "b")$to_struct())
+out
+
+out$schema
 }
diff --git a/vignettes/differences-with-python.Rmd b/vignettes/differences-with-python.Rmd
new file mode 100644
index 000000000..3c46094bd
--- /dev/null
+++ b/vignettes/differences-with-python.Rmd
@@ -0,0 +1,106 @@
+---
+title: "Differences with Python Polars"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Differences with Python Polars}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+We try to mimic the Python Polars API as much as possible so that one can quickly
+switch and copy code between the two languages with as little adjustments to make
+as possible (most of the time switching `.` and `$` to chain methods).
+
+Still, there are a few places where the API diverges. This is often due to
+differences in the language itself. This vignette provides a list of those
+differences.
+
+
+## Converting data between Polars and R
+
+### From R to Polars
+
+The R package provides functions to create polars `DataFrame`, `LazyFrame`, and
+`Series`. Like most of the functions, those are designed to be close to their
+Python counterparts.
+
+Still, R users are more used to `as.*` or `as_*` functions to convert from or to
+other R objects. Therefore, in the documentation, we sometimes prefer using
+`as_polars_df()` rather than `pl$DataFrame()`.
+
+### From Polars to R
+
+While Python Polars has `to_pandas()`, we provide methods to convert Polars data
+to standard R objects, such as `$to_list()` or `$to_data_frame()`. However, the
+standard R user might find it more familiar to call `as.data.frame()`, `as.list()`
+or `as.vector()` on Polars structures.
+
+
+## 64-bit integers
+
+R doesn't natively support 64-bit integers (Int64) but this is a completely valid data
+type in Polars, which is based on the Arrow specification. This means that
+handling Int64 values in `polars` objects doesn't deviate from the Python
+setting. However, we need to implement some extra arguments when we want to pass
+data from Polars to R.
+
+In particular, all functions that convert some polars data to R (`as.data.frame()`
+and other methods such as `$to_list()`) have an argument `int64_conversion` which
+specifies how Int64 values should be handled. The default is to convert those
+Int64 to Float64, but it is also possible to convert them to character or to keep
+them as Int64 by using the package `bit64` under the hood.
+
+This option can be set globally using `options(polars.int64_conversion = "")`.
+See `?polars_options()` for more details.
+
+
+## The `Object` data type
+
+`Object` is a data type for wrapping arbitrary Python objects. Therefore, it
+doesn't have an equivalent in R.
+
+When the user passes R objects with unsupported class to `polars`, it will first
+try to convert them to a supported data type. For example, so far the class `hms`
+from the eponymous package is not supported, so we try to convert it to a numeric
+class:
+
+``` r
+hms::hms(56, 34, 12)
+#> 12:34:56
+
+pl$DataFrame(x = hms::hms(56, 34, 12))
+#> shape: (1, 1)
+#> ┌─────────┐
+#> │ x       │
+#> │ ---     │
+#> │ f64     │
+#> ╞═════════╡
+#> │ 45296.0 │
+#> └─────────┘
+```
+
+
+In some cases, there's no conversion possible. For example, one cannot convert
+a `geos` geometry to any supported data type. In this case, it will raise an
+error:
+
+``` r
+geos::as_geos_geometry("LINESTRING (0 1, 3 9)")
+#>
+#> [1]
+
+pl$DataFrame(x = geos::as_geos_geometry("LINESTRING (0 1, 3 9)"))
+#> Error: Execution halted with the following contexts
+#> 0: In R: in $DataFrame():
+#> 0: During function call [pl$DataFrame(x = geos::as_geos_geometry("LINESTRING (0 1, 3 9)"))]
+#> 1: When constructing polars literal from Robj
+#> 2: Encountered the following error in Rust-Polars:
+#> expected Series
+```
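
A quick check on the `hms` example in the vignette above: an `hms` value stores a time of day as a number of seconds since midnight, so the `f64` value `45296.0` is consistent with `12:34:56` being converted to its underlying numeric value. A minimal base-R sketch of that arithmetic follows; the variable names are illustrative and not part of the patch.

``` r
# hms::hms(56, 34, 12) takes seconds, minutes, hours, i.e. the time 12:34:56.
hours = 12
minutes = 34
seconds = 56

# Seconds elapsed since midnight, the numeric value behind the hms object.
hours * 3600 + minutes * 60 + seconds
#> [1] 45296
```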