Vignette to list some differences between R and Python API #1014
Merged
Changes from 13 commits
Commits
14 commits
e3faf03 init (etiennebacher)
ded4c77 some progress (etiennebacher)
70a3c87 minor [skip ci] (etiennebacher)
fc45613 more on objects and structs (etiennebacher)
5e570a4 Merge branch 'main' into vignette-diff-python (etiennebacher)
264606b trailing whitespace (etiennebacher)
bb6eccd Merge branch 'vignette-diff-python' of https://github.com/pola-rs/r-p… (etiennebacher)
9e06edf load polars (etiennebacher)
84195c7 Merge branch 'main' into vignette-diff-python (etiennebacher)
8d2ec98 update struct construction (etiennebacher)
dbea25d minor (etiennebacher)
97a9359 remove data.table code (etiennebacher)
a10e562 move Struct part to the docs of DataType_Struct (etiennebacher)
0b20837 reprex formatting (etiennebacher)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
--- | ||
title: "Differences with Python Polars" | ||
output: rmarkdown::html_vignette | ||
vignette: > | ||
%\VignetteIndexEntry{Differences with Python Polars} | ||
%\VignetteEngine{knitr::rmarkdown} | ||
%\VignetteEncoding{UTF-8} | ||
--- | ||
|
||
```{r, include = FALSE} | ||
knitr::opts_chunk$set( | ||
collapse = TRUE, | ||
comment = "#>" | ||
) | ||
``` | ||
|
||
We try to mimic the Python Polars API as closely as possible so that one can quickly | ||
switch and copy code between the two languages with as few adjustments as | ||
possible (most of the time, only replacing `.` with `$` to chain methods). | ||
|
||
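As a minimal sketch of that correspondence, the chunk below writes one small query in R and shows the Python equivalent as a comment. The data and the names `df` and `res` are made up for illustration.

```r
library(polars)

# Illustrative data, not taken from the vignette
df <- pl$DataFrame(x = 1:3, y = c("a", "b", "c"))

# Python Polars equivalent:
#   df.filter(pl.col("x") > 1).select(pl.col("y"))
# In R, the same methods are chained with `$` instead of `.`
res <- df$filter(pl$col("x") > 1)$select(pl$col("y"))
res
```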
Still, there are a few places where the API diverges. This is often due to | ||
differences in the language itself. This vignette provides a list of those | ||
differences. | ||
|
||
|
||
## Converting data between Polars and R | ||
|
||
### From R to Polars | ||
|
||
The R package provides functions to create polars `DataFrame`, `LazyFrame`, and | ||
`Series`. Like most of the functions, those are designed to be close to their | ||
Python counterparts. | ||
|
||
Still, R users are more used to `as.*` or `as_*` functions to convert from or to | ||
other R objects. Therefore, in the documentation, we sometimes prefer using | ||
`as_polars_df(<data>)` rather than `pl$DataFrame(<data>)`. | ||
|
||
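The two styles can be sketched side by side as follows. The data frame `r_df` is an illustrative assumption, and `as_polars_lf()` is included on the assumption that it is the `LazyFrame` counterpart of `as_polars_df()`.

```r
library(polars)

# Illustrative R data.frame
r_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Constructor style, close to Python's pl.DataFrame(...)
df_constructor <- pl$DataFrame(x = r_df$x, y = r_df$y)

# Conversion style, closer to the usual R `as_*()` idiom
df_converted <- as_polars_df(r_df)

# Assumed LazyFrame counterpart of as_polars_df()
lf_converted <- as_polars_lf(r_df)
```

Both approaches should give the same `DataFrame` here; which one to use is mostly a matter of taste.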
### From Polars to R | ||
|
||
While Python Polars has `to_pandas()`, we provide methods to convert Polars data | ||
to standard R objects, such as `$to_list()` or `$to_data_frame()`. However, the | ||
standard R user might find it more familiar to call `as.data.frame()`, `as.list()` | ||
or `as.vector()` on Polars structures. | ||
|
||
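A short sketch of the base R generics mentioned above, using illustrative data. The method forms (`$to_data_frame()`, `$to_list()`) are only named in a comment because their availability may depend on the package version.

```r
library(polars)

# Illustrative data
df <- pl$DataFrame(x = 1:3, y = c("a", "b", "c"))

# Base R generics applied to Polars structures
r_df <- as.data.frame(df)                # DataFrame -> data.frame
r_list <- as.list(df)                    # DataFrame -> named list
x_vec <- as.vector(df$get_column("x"))   # Series -> R vector

# Method forms named in the text above: df$to_data_frame(), df$to_list()
```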
|
||
## 64-bit integers | ||
|
||
R doesn't natively support 64-bit integers (Int64) but this is a completely valid data | ||
type in Polars, which is based on the Arrow specification. This means that | ||
handling Int64 values in `polars` objects doesn't deviate from the Python | ||
setting. However, passing data from Polars to R requires some extra | ||
arguments. | ||
|
||
In particular, all functions that convert some polars data to R (`as.data.frame()` | ||
and other methods such as `$to_list()`) have an argument `int64_conversion` which | ||
specifies how Int64 values should be handled. The default is to convert those | ||
Int64 to Float64, but it is also possible to convert them to character or to keep | ||
them as Int64 by using the package `bit64` under the hood. | ||
|
||
This option can be set globally using `options(polars.int64_conversion = "<value>")`. | ||
See `?polars_options()` for more details. | ||
|
||
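The sketch below illustrates the argument and the global option under the API described in this vignette. The value `"string"` is an assumption about how the character conversion is spelled, so check `?polars_options` for the accepted values. The `"bit64"` route is left out because it needs the `bit64` package.

```r
library(polars)

# Illustrative Int64 column
df <- pl$DataFrame(x = 1:3)$with_columns(pl$col("x")$cast(pl$Int64))

# Default behaviour described above: Int64 is converted to double
r_default <- as.data.frame(df)

# Per-call override; "string" is an assumed value name
r_chr <- as.data.frame(df, int64_conversion = "string")

# Same setting applied globally, then restored
old <- options(polars.int64_conversion = "string")
r_chr_global <- as.data.frame(df)
options(old)
```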
|
||
## The `Object` data type | ||
|
||
`Object` is a data type for wrapping arbitrary Python objects. Therefore, it | ||
doesn't have an equivalent in R. | ||
|
||
When the user passes R objects of an unsupported class to `polars`, it first | ||
tries to convert them to a supported data type. For example, the class `hms` | ||
from the eponymous package is not supported so far, so we try to convert it to a numeric | ||
class: | ||
|
||
```r | ||
> hms::hms(56, 34, 12) | ||
12:34:56 | ||
|
||
> pl$DataFrame(x = hms::hms(56, 34, 12)) | ||
shape: (1, 1) | ||
┌─────────┐ | ||
│ x │ | ||
│ --- │ | ||
│ f64 │ | ||
╞═════════╡ | ||
│ 45296.0 │ | ||
└─────────┘ | ||
``` | ||
|
||
In some cases, no conversion is possible. For example, one cannot convert | ||
a `geos` geometry to any supported data type. In this case, `polars` raises an | ||
error: | ||
|
||
```r | ||
> geos::as_geos_geometry("LINESTRING (0 1, 3 9)") | ||
<geos_geometry[1]> | ||
[1] <LINESTRING (0 1, 3 9)> | ||
|
||
> pl$DataFrame(x = geos::as_geos_geometry("LINESTRING (0 1, 3 9)")) | ||
Error: Execution halted with the following contexts | ||
0: In R: in $DataFrame(): | ||
0: During function call [pl$DataFrame(x = geos::as_geos_geometry("LINESTRING (0 1, 3 9)"))] | ||
1: When constructing polars literal from Robj | ||
2: Encountered the following error in Rust-Polars: | ||
expected Series | ||
``` | ||
Ditto. (Maybe with
||
|
Could this be in reprex style so that it can be copied and pasted and executed?
I added the reprex style but those chunks are not evaluated every time (that would require us to put `hms` and `geos` in Suggests)
Of course I don't think this needs to be an executable code block for now, but I think `hms` will need to be included in Suggests at some point. (#919)