
allow parallel computation during bootstrapping #436

Open
IndrajeetPatil opened this issue Mar 8, 2021 · 11 comments
Labels
3 investigators ❔❓ Need to look further into this issue Enhancement 💥 Implemented features can be improved or revised Help us 👀 Help is needed to implement something Low priority 😴 This issue does not impact package functionality much

Comments

@IndrajeetPatil (Member)

This requires adding a new parallel argument to model_parameters and then passing the value to boot calls:

For example, here we can add parallel = parallel inside the call:

results <- boot::boot(data = data, statistic = boot_function, R = iterations, model = model)

We can also default to parallel = "multicore", so that multiple cores, if available, are used by default.
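A minimal sketch of what this could look like. The wrapper name `bootstrap_model` and its signature are assumptions for illustration; only `parallel` and `ncpus` mirror `boot::boot()`'s actual interface:

```r
# Hypothetical sketch: forwarding parallelization options down to
# boot::boot(), which already supports parallel = "no"/"multicore"/"snow".
# This is not the actual parameters implementation.
bootstrap_model <- function(model, data, boot_function, iterations = 1000,
                            parallel = c("no", "multicore", "snow"),
                            ncpus = 1L) {
  parallel <- match.arg(parallel)
  boot::boot(
    data = data,
    statistic = boot_function,
    R = iterations,
    model = model,          # passed through ... to the statistic
    parallel = parallel,    # forwarded to boot's built-in parallel support
    ncpus = ncpus
  )
}
```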

@IndrajeetPatil IndrajeetPatil added the Enhancement 💥 Implemented features can be improved or revised label Mar 8, 2021
@DominiqueMakowski (Member)

Wouldn't it be better to let that be passed through the ellipsis (`...`) to avoid cluttering the API? Or to retrieve it from the options (as Stan does)?

@IndrajeetPatil (Member, Author)

Can't create a reprex because parallel doesn't seem to work with it. But passing the dots works (PR: #439).

> set.seed(123)
> library(parameters)
> 
> mod <- lm(formula = wt ~ mpg, data = mtcars)
> 
> set.seed(123)
> system.time(model_parameters(mod, bootstrap = TRUE, iterations = 1000, parallel = "no")) 
   user  system elapsed 
  1.043   0.007   1.057 
> 
> set.seed(123)
> system.time(
+   model_parameters(
+     mod,
+     bootstrap = TRUE,
+     iterations = 1000,
+     parallel = "multicore",
+     ncpus = 4L
+   )
+ ) 
   user  system elapsed 
  0.078   0.056   0.613 

@strengejacke (Member)

"multicore" doesn't work on windows.
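Right: "multicore" relies on forking, which Windows doesn't support, whereas "snow" spawns separate worker processes and works on all platforms. A possible workaround (a sketch; the helper name is hypothetical) is to pick the backend by platform:

```r
# "multicore" uses fork(), unavailable on Windows;
# "snow" works everywhere but has higher startup overhead.
default_parallel <- function() {
  if (.Platform$OS.type == "windows") "snow" else "multicore"
}

# e.g.:
# boot::boot(data, statistic, R = 1000,
#            parallel = default_parallel(), ncpus = 2L)
```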

@strengejacke (Member)

Using regular R or Microsoft R Open doesn't seem to make a difference; increasing the number of CPUs used even slows things down:

library(parameters)
#> Warning: Paket 'parameters' wurde unter R Version 4.0.4 erstellt
model <- lm(mpg ~ wt + cyl, data = mtcars)

microbenchmark::microbenchmark(
  model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "snow", ncpus = 4),
  times = 5
)
#> Unit: seconds
#>                                                                                             expr
#>  model_parameters(model, bootstrap = TRUE, iterations = 1000,      parallel = "snow", ncpus = 4)
#>       min       lq    mean   median       uq      max neval
#>  2.146296 2.178574 2.18241 2.179772 2.200774 2.206634     5

microbenchmark::microbenchmark(
  model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "no", ncpus = 4),
  times = 5
)
#> Unit: seconds
#>                                                                                           expr
#>  model_parameters(model, bootstrap = TRUE, iterations = 1000,      parallel = "no", ncpus = 4)
#>       min      lq     mean   median       uq      max neval
#>  1.120941 1.12849 1.132289 1.128846 1.137772 1.145394     5

microbenchmark::microbenchmark(
  model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "multicore", ncpus = 4),
  times = 5
)
#> Unit: seconds
#>                                                                                                  expr
#>  model_parameters(model, bootstrap = TRUE, iterations = 1000,      parallel = "multicore", ncpus = 4)
#>       min      lq     mean   median      uq      max neval
#>  1.102907 1.10788 1.117547 1.114816 1.12571 1.136424     5

Created on 2021-03-09 by the reprex package (v1.0.0)

@IndrajeetPatil (Member, Author)

Yeah, I am seeing the same on my Mac: the computation time actually increases if I use parallel computing with ncpus set to some value > 1.

It's all a bit confusing. And this has nothing to do with parameters functions.

Here is an example from the boot package docs:

library(boot)
library(microbenchmark)

# usual bootstrap of the ratio of means using the city data
ratio <- function(d, w) sum(d$x * w) / sum(d$u * w)

set.seed(123)
microbenchmark::microbenchmark(
  boot(city, ratio, R = 4999, stype = "w"),
  times = 5
)
#> Unit: milliseconds
#>                                      expr      min       lq     mean   median
#>  boot(city, ratio, R = 4999, stype = "w") 30.76705 36.27656 39.59618 40.73334
#>        uq      max neval
#>  42.90163 47.30233     5

options(boot.parallel = "multicore")
set.seed(123)
microbenchmark::microbenchmark(
  boot(city, ratio, R = 4999, stype = "w", ncpus = 5),
  times = 5
)
#> Unit: milliseconds
#>                                                 expr      min       lq    mean
#>  boot(city, ratio, R = 4999, stype = "w", ncpus = 5) 44.64621 47.21875 51.9313
#>    median       uq     max neval
#>  48.56907 50.58117 68.6413     5

Created on 2021-03-10 by the reprex package (v1.0.0)

I think we should stay away from making any changes to parameters until we figure out how to successfully use boot's parallel computation functionality.

@strengejacke (Member)

Yes, sounds good.

@strengejacke strengejacke added 3 investigators ❔❓ Need to look further into this issue Help us 👀 Help is needed to implement something Low priority 😴 This issue does not impact package functionality much labels Mar 10, 2021
@IndrajeetPatil (Member, Author)

@bwiernik Do you have any ideas about how to get this to work?

@bwiernik (Contributor) commented Jul 4, 2021

Yeah, I can take a look

@vincentarelbundock (Contributor)

future is probably a better platform for cross-platform parallel computation: https://cran.r-project.org/web/packages/future/index.html

The examples in this thread are probably all too small (OLS with N = 32), so the parallel overhead outweighs the gains.

Perhaps one strategy would be for us to support extracting results from boot and other bootstrap objects. That way, users who want fancy features like parallel computation can use the existing support in the appropriate package, and we can extract and display the estimates.
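As a sketch of what the future-based route could look like (using the future.apply package on top of future; the statistic and setup here are illustrative, not parameters code):

```r
library(boot)          # only for the city example data
library(future.apply)  # parallel apply functions built on future

# multisession spawns background R sessions; portable, works on Windows too
plan(multisession, workers = 4)

ratio <- function(d, i) sum(d$x[i]) / sum(d$u[i])

# Draw bootstrap replicates in parallel; future_replicate() splits the
# work across workers, and future.seed = TRUE gives parallel-safe RNG.
boot_est <- future_replicate(
  4999,
  ratio(city, sample(nrow(city), replace = TRUE)),
  future.seed = TRUE
)

quantile(boot_est, c(0.025, 0.975))  # percentile bootstrap interval
```

Note the same caveat applies: for a statistic this cheap, worker startup and communication overhead will likely dominate any speedup.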

@bwiernik (Contributor)

One of the major benefits of parameters is that we provide a simple interface for bootstrapping, which is otherwise really difficult for new users (learning to use the boot package is a nightmare). I agree that we should use future for parallelization, but I do think we should support it.

@vincentarelbundock (Contributor)

You're right. boot is kind of a nightmare to learn.
