filter.sf order of magnitude slower compared to filter #1889
Comments
PS: I looked for issues documenting this, but there do not seem to be any. I'm also not aware of any documentation on this.
How and where would you suggest this be documented?
I have not tested, but maybe #2059 resolves this?
Isn't this currently fixed by #1938? It seems to be much better now:
Indeed, whereas with #2059
require(sf)
# Loading required package: sf
# Linking to GEOS 3.11.1, GDAL 3.6.2, PROJ 9.1.1; sf_use_s2() is TRUE
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
suppressPackageStartupMessages(require(dplyr))
#n <- 100000
n <- 30000
d <- data.frame(
  rr = factor(sample(size = n, c(NA, "a", "b"), replace = TRUE, prob = c(.05, .45, .5))),
  xx = runif(n),
  yy = runif(n)
)
data <- d
b <- bench::mark(
  min_iterations = 5, check = FALSE,
  data |> filter(!is.na(rr)) |> st_as_sf(
    coords = c("xx", "yy"),
    crs = st_crs(4326L), na.fail = FALSE
  ),
  data |> st_as_sf(
    coords = c("xx", "yy"),
    crs = st_crs(4326L), na.fail = FALSE
  ) |> filter(!is.na(rr))
)
# Warning message:
# Some expressions had a GC in every iteration; so filtering is disabled.
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
b %>% select(median, mem_alloc, expression)
# # A tibble: 2 × 3
#     median mem_alloc
#   <bch:tm> <bch:byt>
# 1   4.16ms    5.27MB
# 2 368.49ms     7.7MB
# # … with 1 more variable: expression <bch:expr>
Something to chew on...
While investigating the performance of my code, I found that the order of filter and st_as_sf makes an order of magnitude difference in performance. It's not a bug in the sense that something does not work, but it seems unnecessarily slow, so I thought I would report it anyway. Most of the time seems to be spent in the function st_sfc, in a vapply call. In this example, changing the order is an easy fix, but that might not always be the case, and I'm sure not all users are aware of the dramatic difference.
Created on 2022-01-20 by the reprex package (v2.0.1)
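For cases where the filter cannot be moved ahead of st_as_sf, one possible workaround is sketched below: compute the row mask from the attribute column and subset the sf object with base `[` instead of dplyr's filter. This is only a sketch under assumptions. The names `pts`, `keep`, and `filtered` are illustrative and not from the report, and whether this is actually faster on a given sf version has not been measured here and would need benchmarking.

```r
library(sf)

set.seed(1)
n <- 1000
# Same shape of data as in the reprex: a factor with some NAs plus coordinates
d <- data.frame(
  rr = factor(sample(c(NA, "a", "b"), size = n, replace = TRUE, prob = c(.05, .45, .5))),
  xx = runif(n),
  yy = runif(n)
)

# Convert first (the order that was reported as slow when followed by filter)
pts <- st_as_sf(d, coords = c("xx", "yy"), crs = st_crs(4326L))

# Assumption: base row subsetting with a logical mask avoids the dplyr method
keep <- !is.na(pts$rr)
filtered <- pts[keep, ]
```

The result is still an sf object with the same CRS, so downstream spatial code should not need to change.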