allow parallel computation during bootstrapping #436
Wouldn't it be better to let that be passed through the ellipsis (`...`) to avoid cluttering the API? Or to retrieve it from the options (as Stan does)?
Can't create a reprex because parallel doesn't seem to work with it. But passing the dots works (PR: #439).

```r
set.seed(123)
library(parameters)

mod <- lm(formula = wt ~ mpg, data = mtcars)

set.seed(123)
system.time(model_parameters(mod, bootstrap = TRUE, iterations = 1000, parallel = "no"))
#>    user  system elapsed
#>   1.043   0.007   1.057

set.seed(123)
system.time(
  model_parameters(
    mod,
    bootstrap = TRUE,
    iterations = 1000,
    parallel = "multicore",
    ncpus = 4L
  )
)
#>    user  system elapsed
#>   0.078   0.056   0.613
```
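To make the "passing the dots" approach concrete, here is a minimal, hypothetical sketch (not the package's actual internals) of how a wrapper could forward `parallel` and `ncpus` through `...` to `boot::boot()`, which already accepts both arguments:

```r
library(boot)

# Hypothetical wrapper: any extra arguments in `...` (e.g. parallel = "multicore",
# ncpus = 4L) are handed straight to boot::boot(), so the wrapper's own
# signature stays uncluttered.
bootstrap_model_sketch <- function(model, iterations = 1000, ...) {
  d <- model$model  # the model frame used to fit `model`
  stat <- function(dat, i) {
    # refit the model on a resampled data set and return its coefficients
    coef(stats::update(model, data = dat[i, , drop = FALSE]))
  }
  boot::boot(data = d, statistic = stat, R = iterations, ...)
}

mod <- lm(wt ~ mpg, data = mtcars)
# `parallel` and `ncpus` travel via the dots; "multicore" works on Unix-alikes only
res <- bootstrap_model_sketch(mod, iterations = 200, parallel = "multicore", ncpus = 2L)
```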
"multicore" doesn't work on Windows.
Using normal R or Microsoft R Open doesn't seem to make a difference; increasing the number of CPUs used even slows things down:

```r
library(parameters)
#> Warning: Package 'parameters' was built under R version 4.0.4

model <- lm(mpg ~ wt + cyl, data = mtcars)

microbenchmark::microbenchmark(
  model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "snow", ncpus = 4),
  times = 5
)
#> Unit: seconds
#>                                                                                        expr
#>  model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "snow", ncpus = 4)
#>       min       lq    mean   median       uq      max neval
#>  2.146296 2.178574 2.18241 2.179772 2.200774 2.206634     5

microbenchmark::microbenchmark(
  model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "no", ncpus = 4),
  times = 5
)
#> Unit: seconds
#>                                                                                      expr
#>  model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "no", ncpus = 4)
#>       min      lq     mean   median       uq      max neval
#>  1.120941 1.12849 1.132289 1.128846 1.137772 1.145394     5

microbenchmark::microbenchmark(
  model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "multicore", ncpus = 4),
  times = 5
)
#> Unit: seconds
#>                                                                                             expr
#>  model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "multicore", ncpus = 4)
#>       min      lq     mean   median      uq      max neval
#>  1.102907 1.10788 1.117547 1.114816 1.12571 1.136424     5
```

Created on 2021-03-09 by the reprex package (v1.0.0)
Yeah, I am also seeing the same on my Mac: computation time actually increases when I use parallel computing. It's all a bit confusing, and this doesn't seem specific to the parameters package. Here is an example using `boot` directly:

```r
library(boot)
library(microbenchmark)

# usual bootstrap of the ratio of means using the city data
ratio <- function(d, w) sum(d$x * w) / sum(d$u * w)

set.seed(123)
microbenchmark::microbenchmark(
  boot(city, ratio, R = 4999, stype = "w"),
  times = 5
)
#> Unit: milliseconds
#>                                      expr      min       lq     mean   median
#>  boot(city, ratio, R = 4999, stype = "w") 30.76705 36.27656 39.59618 40.73334
#>        uq      max neval
#>  42.90163 47.30233     5

options(boot.parallel = "multicore")
set.seed(123)
microbenchmark::microbenchmark(
  boot(city, ratio, R = 4999, stype = "w", ncpus = 5),
  times = 5
)
#> Unit: milliseconds
#>                                                 expr      min       lq    mean
#>  boot(city, ratio, R = 4999, stype = "w", ncpus = 5) 44.64621 47.21875 51.9313
#>    median       uq     max neval
#>  48.56907 50.58117 68.6413     5
```

Created on 2021-03-10 by the reprex package (v1.0.0)

I think we should stay away from making any changes to the current defaults.
Yes, sounds good.
@bwiernik Do you have any ideas about how to get this to work?
Yeah, I can take a look.
The examples in this thread are probably all too small (OLS with N = 32), so the parallel overhead outweighs the gains. Perhaps one strategy would be for us to support extracting results from existing `boot` objects.
One of the major benefits of parameters is that we provide a simple interface for bootstrapping that is otherwise really difficult for new users (learning to use the boot package is a nightmare). I agree that we should use future for parallelization, but I do think we should support it.
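As a rough illustration of the future-based direction mentioned above, here is a hedged sketch (function names are illustrative, not anything the package provides) of parallel bootstrapping via `future.apply`. The user picks a backend with `plan()` and the replicates are distributed automatically:

```r
# Sketch only: parallel bootstrap replicates via the future framework,
# instead of boot's own parallel/ncpus arguments.
library(future.apply)  # also attaches `future`, which provides plan()

future_boot_coefs <- function(model, iterations = 1000) {
  d <- model$model
  # future_sapply distributes the iterations across the workers chosen by plan();
  # future.seed = TRUE gives statistically sound parallel-safe RNG streams
  future_sapply(seq_len(iterations), function(i) {
    idx <- sample.int(nrow(d), replace = TRUE)
    coef(stats::update(model, data = d[idx, , drop = FALSE]))
  }, future.seed = TRUE)
}

plan(multisession, workers = 2)  # the user controls the backend, not the package
mod <- lm(mpg ~ wt + cyl, data = mtcars)
draws <- future_boot_coefs(mod, iterations = 500)  # one column per replicate
```

A nice property of this design is that the bootstrapping function itself stays backend-agnostic: sequential, multicore, multisession, or cluster execution is entirely the caller's choice via `plan()`.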
You're right.
This requires adding a new `parallel` argument to `model_parameters()` and then passing the value on to the `boot()` calls. For example, here we can add `parallel = parallel` inside the call in `parameters/R/bootstrap_model.R` (line 85 in d7fed24). We can also default to `parallel = "multicore"`, so multiple cores, if available, are used by default.