Faster conversion from all codes (not country names) #342

Merged

etiennebacher
Contributor

Simple speedup: call toupper() only on the unique strings, then match the original vector against them. Not crucial, but it saves a couple of seconds every time I use countrycode() in this direction. It does use a bit more memory, though, so go with whichever you prefer.
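
For illustration, here is a minimal sketch of the idea. It is not the actual change in R/countrycode.R, and `upper_via_unique()` is a hypothetical helper name used only for this example:

```r
# Hypothetical helper illustrating the approach; not the package's code.
upper_via_unique <- function(x) {
  ux <- unique(x)      # distinct input strings
  up <- toupper(ux)    # uppercase each distinct string only once
  up[match(x, ux)]     # map the results back onto the original positions
}

x <- c("usa", "fra", "usa", "Deu", "fra")

# Should give the same result as calling toupper() on the full vector
identical(upper_via_unique(x), toupper(x))
```

The gain should be largest when the input has many repeated values, as country codes usually do, because toupper() then runs on a handful of distinct strings instead of the whole vector. The extra memory presumably comes from the intermediate vectors, such as the integer index returned by match().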

library(bench)

out <- cross::run(
  pkgs = c("vincentarelbundock/countrycode", "etiennebacher/countrycode@speedup-iso"),
  ~{
    library(countrycode)
    
    test <- data.frame(
      grp1 = sample(codelist$iso3c, 1e7, TRUE),
      grp2 = sample(codelist$cowc, 1e7, TRUE),
      grp3 = sample(codelist$eurostat, 1e7, TRUE)
    )
    
    bench::mark(
      countrycode(test$grp1, "iso3c", "country.name"),
      countrycode(test$grp2, "cowc", "country.name"),
      countrycode(test$grp3, "eurostat", "country.name"),
      iterations = 10,
      check = FALSE
    )
  }
)

# Flatten the per-package results and compare main vs. fork for each expression
tidyr::unnest(out, result) |>
  dplyr::select(pkg, expression, median, mem_alloc) |>
  dplyr::mutate(pkg = ifelse(grepl("vincent", pkg), "main", "fork")) |>
  dplyr::arrange(expression, dplyr::desc(pkg))
#> # A tibble: 6 × 4
#>   pkg   expression                                              median mem_alloc
#>   <chr> <bch:expr>                                            <bch:tm> <bch:byt>
#> 1 main  "countrycode(test$grp1, \"iso3c\", \"country.name\")"    3.34s  738.81MB
#> 2 fork  "countrycode(test$grp1, \"iso3c\", \"country.name\")" 991.98ms 1019.44MB
#> 3 main  "countrycode(test$grp2, \"cowc\", \"country.name\")"     3.03s  772.15MB
#> 4 fork  "countrycode(test$grp2, \"cowc\", \"country.name\")"  989.79ms    1.03GB
#> 5 main  "countrycode(test$grp3, \"eurostat\", \"country.name…    3.38s  739.88MB
#> 6 fork  "countrycode(test$grp3, \"eurostat\", \"country.name… 832.43ms 1020.47MB

@cjyetman cjyetman self-requested a review September 7, 2023 11:43
Review comment on R/countrycode.R (outdated, resolved)
Co-authored-by: CJ Yetman <[email protected]>
@vincentarelbundock vincentarelbundock merged commit 848e9ce into vincentarelbundock:main Sep 8, 2023
6 checks passed
@vincentarelbundock
Owner

this is really great, thanks!

@etiennebacher etiennebacher deleted the speedup-iso branch September 8, 2023 05:19