Vignette to list some differences between R and Python API #1014

Merged (14 commits), Apr 11, 2024
160 changes: 160 additions & 0 deletions vignettes/differences-with-python.Rmd
@@ -0,0 +1,160 @@
---
title: "Differences with Python Polars"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Differences with Python Polars}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

We try to mimic the Python Polars API as much as possible so that one can quickly
switch and copy code between the two languages with as few adjustments as
possible (most of the time, replacing `.` with `$` to chain methods is enough).
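
As a small illustration of that switch, the Python chain
`df.filter(pl.col("a") > 1).select(pl.col("b"))` could be written in R as
follows. This is a sketch with made-up data; the column names are only for
illustration:

```r
library(polars)

# Made-up example data
df = pl$DataFrame(a = 1:3, b = c("x", "y", "z"))

# Python: df.filter(pl.col("a") > 1).select(pl.col("b"))
# R: same chain, with `$` instead of `.`
out = df$filter(pl$col("a") > 1)$select(pl$col("b"))
out
```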

Still, there are a few places where the API diverges. This is often due to
differences in the language itself. This vignette provides a list of those
differences.


## Converting data between Polars and R

### From R to Polars

The R package provides functions to create polars `DataFrame`, `LazyFrame`, and
`Series`. Like most functions in the package, these are designed to be close to
their Python counterparts.

Still, R users are more used to `as.*` or `as_*` functions to convert from or to
other R objects. Therefore, in the documentation, we sometimes prefer using
`as_polars_df(<data>)` rather than `pl$DataFrame(<data>)`.
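
A minimal sketch of the two styles, using made-up data (the column names `a`
and `b` are only for illustration):

```r
library(polars)

# Python-style constructor, mirroring pl.DataFrame() in Python
df1 = pl$DataFrame(a = 1:3, b = c("x", "y", "z"))

# R-style conversion of an existing data.frame
df2 = as_polars_df(data.frame(a = 1:3, b = c("x", "y", "z")))

# Both approaches give the same data
identical(df1$to_data_frame(), df2$to_data_frame())

# A LazyFrame can be obtained from a DataFrame, and a Series from a vector
lf = df2$lazy()
s = as_polars_series(c(1.5, 2.5))
```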

### From Polars to R

Where Python Polars has methods such as `to_pandas()`, we provide methods to
convert Polars data to standard R objects, such as `$to_list()` and
`$to_data_frame()`. However, a typical R user may find it more familiar to call
`as.data.frame()`, `as.list()`, or `as.vector()` on Polars objects.
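
The following sketch, again with made-up data, shows both the method-style
conversions and the base R generics mentioned above:

```r
library(polars)

df = pl$DataFrame(a = 1:3, b = c("x", "y", "z"))

# Method-style conversions
df$to_data_frame()
df$to_list()

# Base R generics applied to polars objects
as.data.frame(df)
as.list(df)
as.vector(as_polars_series(1:3))
```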


## 64-bit integers

R doesn't natively support 64-bit integers (Int64), but this is a completely
valid data type in Polars, which is based on the Arrow specification. This means
that handling Int64 values in `polars` objects doesn't deviate from the Python
setting. However, some extra arguments are needed when passing data from Polars
to R.

In particular, all functions that convert Polars data to R (`as.data.frame()`
and other methods such as `$to_list()`) have an argument `int64_conversion` that
specifies how Int64 values should be handled. The default is to convert them to
R doubles (Float64), which cannot represent every integer above 2^53 exactly,
but it is also possible to convert them to character or to keep them as 64-bit
integers by using the package `bit64` under the hood.

This option can be set globally using `options(polars.int64_conversion = "<value>")`.
See `?polars_options` for more details.
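
A sketch of the default, per-call, and global behaviours, assuming `"string"` is
one of the accepted values of `int64_conversion` (check `?polars_options` for the
values supported by your version). The value 2^53 + 1 is used because it is the
smallest positive integer that an R double cannot store exactly, so the default
conversion rounds it:

```r
library(polars)

# Build 2^53 + 1 from a string so that no precision is lost before it
# reaches polars, then cast it to Int64
df = pl$DataFrame(x = "9007199254740993")$with_columns(
  pl$col("x")$cast(pl$Int64)
)

# Default conversion: Int64 becomes an R double (precision may be lost)
as.data.frame(df)

# Per-call override: keep the exact value as a character string
as.data.frame(df, int64_conversion = "string")

# Session-wide override via the option, restored afterwards
old = options(polars.int64_conversion = "string")
df$to_list()
options(old)
```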


## Structs and objects

### Structs

A `Struct` is a data type that is composed of several `Field`s, which themselves
have a name and a data type.

In Python, a `Series` of type `Struct` can be created directly from a list of
dictionaries. In R, this is a bit more convoluted as it involves `data.frame()`,
`list()`, and `I()`:

```python
>>> s = pl.Series([{"a": 1, "b": ["x", "y"]}, {"a": 2, "b": ["z"]}])
>>> s
shape: (2,)
Series: '' [struct[2]]
[
{1,["x", "y"]}
{2,["z"]}
]
>>> s.dtype
Struct({'a': Int64, 'b': List(String)})
```

```{r}
library(polars)

as_polars_series(
  data.frame(a = 1:2, b = I(list(c("x", "y"), "z")))
)
```

A slightly simpler way would be via `tibble::tibble()` or `data.table::data.table()`:

```{r}
as_polars_series(
  tibble::tibble(a = 1:2, b = list(c("x", "y"), "z"))
)
```

Finally, one can use the method `$to_struct()` to convert existing columns
or `Series` to a `Struct`:

```{r}
x = pl$DataFrame(
  a = 1:2,
  b = list(c("x", "y"), "z")
)

out = x$select(pl$col("a", "b")$to_struct())
out

out$schema
```

### Objects

`Object` is a data type for wrapping arbitrary Python objects. Therefore, it
doesn't have an equivalent in R.

When the user passes R objects of an unsupported class to `polars`, it first
tries to convert them to a supported data type. For example, the class `hms`
from the eponymous package is not supported yet, so it is converted to a numeric
type:

```r
library(polars)

hms::hms(56, 34, 12)
#> 12:34:56

pl$DataFrame(x = hms::hms(56, 34, 12))
#> shape: (1, 1)
#> ┌─────────┐
#> │ x       │
#> │ ---     │
#> │ f64     │
#> ╞═════════╡
#> │ 45296.0 │
#> └─────────┘
```
Collaborator

Could this be in reprex style so that it can be copied and pasted and executed?

Collaborator Author

I added the reprex style, but those chunks are not evaluated every time (that would require us to put hms and geos in Suggests).

Collaborator

Of course I don't think this needs to be an executable code block for now, but I think hms will need to be included in Suggests at some point. (#919)
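
The `45296.0` in the output above is the number of seconds since midnight: `hms`
objects are stored as durations in seconds, so their numeric value is what ends
up in the Float64 column. A quick check, assuming the hms package is installed:

```r
# hms(seconds, minutes, hours): 12:34:56
x = hms::hms(56, 34, 12)

# Underlying numeric value, in seconds
as.numeric(x)

# Same value computed by hand: 12 h, 34 min, 56 s
12 * 3600 + 34 * 60 + 56
```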


In some cases, no conversion is possible. For example, a `geos` geometry cannot
be converted to any supported data type, so an error is raised:

```r
library(polars)

geos::as_geos_geometry("LINESTRING (0 1, 3 9)")
#> <geos_geometry[1]>
#> [1] <LINESTRING (0 1, 3 9)>

pl$DataFrame(x = geos::as_geos_geometry("LINESTRING (0 1, 3 9)"))
#> Error: Execution halted with the following contexts
#> 0: In R: in $DataFrame():
#> 0: During function call [pl$DataFrame(x = geos::as_geos_geometry("LINESTRING (0 1, 3 9)"))]
#> 1: When constructing polars literal from Robj
#> 2: Encountered the following error in Rust-Polars:
#> expected Series
```
Collaborator

Ditto. (Maybe with tryCatch)
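
Following the `tryCatch()` suggestion, the conversion error can be caught so
that the code runs end to end. A sketch, assuming the geos package is installed
and that `pl$DataFrame()` fails on this input as in the output above:

```r
library(polars)

# Assumes the geos package is installed
geom = geos::as_geos_geometry("LINESTRING (0 1, 3 9)")

# Catch the conversion error instead of halting, and report its message
result = tryCatch(
  pl$DataFrame(x = geom),
  error = function(e) {
    message("Could not convert to polars: ", conditionMessage(e))
    NULL
  }
)

# TRUE if the conversion failed and the error was caught
is.null(result)
```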

