
describe_posterior() triggers model recompiling & sampling #997

Open
mattansb opened this issue Jan 27, 2025 · 13 comments

Comments

@mattansb
Member

Discussed in easystats/bayestestR#693

Originally posted by JWiley January 27, 2025
I am experiencing some behaviour that I think is unexpected, but perhaps I am missing something.
I fit some models in brms and save them. Later I read them back in and call describe_posterior(), mostly because I want to report 99% CIs, which are not computed by default in brms.

This is the code:

out$Summary <- bayestestR::describe_posterior(
  out$Results,  # a previously fitted and saved brmsfit object
  centrality = c("mean", "median"), dispersion = TRUE,
  ci = CI, ci_method = CIType,  # here CI = 0.99 and CIType = "HDI"
  test = c("p_direction"),
  diagnostic = c("Rhat", "ESS"),
  effects = "all", component = "all",
  parameters = "(^b_)|(^sd_)|(^cor_)|(^sds_)|(^bs_)")

The part that is unexpected to me is that this call seems to trigger recompiling and resampling the model, which really slows everything down.

The desired updates require recompiling the model
Compiling Stan program...
Start sampling
Running MCMC with 4 parallel chains...

Chain 1 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 2 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 3 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 4 Iteration:    1 / 3000 [  0%]  (Warmup) 
[snip]
Chain 1 Iteration: 2800 / 3000 [ 93%]  (Sampling) 
Chain 4 finished in 686.1 seconds.
Chain 3 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 3 finished in 704.9 seconds.
Chain 2 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 2 finished in 706.4 seconds.
Chain 1 Iteration: 2900 / 3000 [ 96%]  (Sampling) 
Chain 1 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 1 finished in 727.6 seconds.

All 4 chains finished successfully.
Mean chain execution time: 706.3 seconds.
Total execution time: 727.8 seconds.

It's fairly easy to sidestep this, since all I want beyond what's already saved in the model results is the 99% CIs and perhaps some of the probability-of-direction values; I just expected this call to describe_posterior() to be virtually "free" computationally.
As far as I know, I'm not trying to calculate Bayes factors or anything else that would need prior draws, which is the only thing I can think of that isn't already drawn and saved in the model.

Any thoughts or insight would be welcome.


I'm not entirely sure what in the model is needed to trigger this:

  • if the outcome distribution is gaussian, the recompile and resampling are not needed
  • if there is no random slope, even with a negative binomial distribution, the recompile and resampling are not needed

So my best guess is that it has something to do with a random slope combined with the negative binomial distribution; it may also have to do with correlations among the random effects.

With most models, this behaviour is not exhibited, which I think is why it seemed unexpected to me; then again, it may be expected and I simply don't understand what is happening, or what needs to happen, for the posterior summaries to be calculated.


Reproducible example:

## analysis related
library(brms)
library(cmdstanr)
library(bayestestR)

CIType <- "HDI"
CI <- 0.99

m <- brm(
  formula = bf(
    hp ~ mpg + (1 | am / cyl) + (1 + mpg | cyl),
    shape ~ 1 + (1 | cyl)
  ), data = mtcars,
  family = negbinomial(),
  save_pars = save_pars(group = TRUE, latent = FALSE, all = TRUE),
  init = "random",
  chains = 4, iter = 3000, warmup = 1000, thin = 1,
  cores = 4,
  normalize = TRUE, algorithm = "sampling",
  seed = 1234, backend = "cmdstanr"
)

bayestestR::describe_posterior(
  m,
  centrality = c("mean", "median"), dispersion = TRUE,
  ci = 0.99, ci_method = "HDI",
  test = c("p_direction"),
  diagnostic = c("Rhat", "ESS"),
  effects = "all", component = "all",
  parameters = "(^b_)|(^sd_)|(^cor_)|(^sds_)|(^bs_)"
)
@mattansb
Member Author

This seems to happen when describe_posterior() calls rope_range(), which calls insight::get_sigma() only when the model has a count distribution:

https://github.com/easystats/bayestestR/blob/fb746e32b652794df90554865374ef0935c5d2ff/R/rope_range.R#L166-L171
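
For context, a paraphrased sketch of that branch (not the verbatim source; faminfo stands in for the result of insight::model_info()):

faminfo <- insight::model_info(model)
if (faminfo$is_count) {
  # extracting sigma here is what ends up triggering the null-model refit
  sig <- insight::get_sigma(model, verbose = FALSE)
  rope_limits <- c(-0.1, 0.1) * sig
}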


Reprex:

library(brms)

m <- brm(
  formula = bf(
    hp ~ mpg + (1 | am / cyl) + (1 + mpg | cyl),
    shape ~ 1 + (1 | cyl)
  ), data = mtcars,
  family = negbinomial(),
  save_pars = save_pars(group = TRUE, latent = FALSE, all = TRUE),
  init = "random",
  chains = 2, iter = 1000, warmup = 500, thin = 1,
  cores = 2,
  normalize = TRUE, algorithm = "sampling",
  seed = 1234
)

insight::get_sigma(m)
#> The desired updates require recompiling the model
#> Compiling Stan program...

@mattansb
Member Author

This is happening in insight::get_variance_residual() / insight::get_variance(), because it refits a completely new model with insight::null_model() over here (line 86):

if (faminfo$is_linear && !faminfo$is_tweedie) {
  # we don't need these for linear models
  me_info_null <- NULL
} else {
  if (is.null(model_null)) {
    model_null <- .safe(null_model(model, verbose = FALSE))
  }
  me_info_null <- .get_variance_information(
    model_null,
    faminfo = faminfo,
    name_fun = name_fun,
    verbose = verbose
  )
}

Refitting a model, especially an MCMC one, can be very costly. This should almost never happen unless the user explicitly requests it.

mattansb transferred this issue from easystats/bayestestR Jan 27, 2025
@strengejacke
Member

What would you suggest, returning NULL?

@mattansb
Member Author

I don't know enough about what's going on here; I just tracked this down. What is this function trying to do that it (sometimes?) needs an empty model?

@JWiley

JWiley commented Jan 27, 2025

@mattansb why is rope_range() even getting called when test = c("p_direction")? I would have thought that would suppress anything to do with ROPEs. Alternatively, for the short term, do you know of any way I can suppress the rope_range() call?

@strengejacke
Member

I don't know enough about what's going on here; I just tracked this down. What is this function trying to do that it (sometimes?) needs an empty model?

We define the ROPE range as 0.1 * sigma for some distributions. If we can't extract sigma via stats::sigma() and we have a mixed model, we call insight::get_variance(), which compiles the null model.

If sigma is not available, we fall back to 1. We could do this without calling get_variance() first.
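
A minimal sketch of that shortcut (illustrative only, not the actual patch):

# fall back to 1 as soon as stats::sigma() fails, instead of going through
# insight::get_variance() and compiling a null model
s <- tryCatch(stats::sigma(model), error = function(e) NULL)
if (is.null(s) || length(s) == 0 || anyNA(s)) {
  s <- 1
}
rope_limits <- c(-0.1, 0.1) * s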

@strengejacke
Member

For that model:

sigma(m)
#> numeric(0)

@mattansb
Member Author

@JWiley it shouldn't. See potential fix in easystats/bayestestR#695

@mattansb
Member Author

@strengejacke why does insight::get_variance() compile the null-model?

@JWiley

JWiley commented Jan 27, 2025

I'm not particularly up-to-date in this space, but the last things I read about estimating variances for GLMMs, used for effect sizes, R2, and the like, defined R2 as something like:

$$ R^2 = \frac{\mathrm{var}_{fixed}}{\mathrm{var}_{fixed} + \mathrm{var}_{random} + \text{some term involving}~\lambda} $$

I don't recall the exact equation for the term with lambda; some ratios, I think. Anyway,

$$ E(\lambda) = \exp(\beta_0 + \mathrm{var}_{random}) $$

with $\beta_0$ coming from the null model. Otherwise, I think you can maybe do some things if all predictors are centered, etc., but I guess that gets harder to know and guarantee; I seem to recall there was some issue with ensuring the estimate changes based on the data rather than the model, so the recommendation was to base it on the intercept-only model.

There are probably some errors in what I've just said, but the gist I recall is that when trying to get effect sizes like $R^2$ for negative binomial and Poisson models, an intercept-only model was needed somewhere in the calculation.

My guess is that since $R^2$ relates to various variance estimates and ratios, the same applies when trying to get some variance estimate from a negative binomial model, and that's why insight is estimating a null model.

On the plus side, that explains why it only took 10 minutes, not 6 hours like my full model, when it did recompile and run. I was very confused because I thought it was running the same model, and as slow as it was, it was still way faster than my models.

If I'm on track with that, some thoughts from a user perspective that would be nice:

  1. in describe_posterior(), it should be possible to turn ROPE estimation off entirely, to avoid this whole path (a manual workaround along these lines is sketched below)
  2. in insight::get_variance(), if it is going to automatically compile and sample from a null model, print a short note, at least for MCMC models, something like: "getting the variance for this model requires estimating an intercept-only model"
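
In the meantime, a possible workaround that skips the ROPE path entirely is to call the individual building blocks instead of describe_posterior() (a sketch, untested against this exact model, but none of these functions go through rope_range()):

# assemble the summary pieces by hand instead of via describe_posterior()
bayestestR::point_estimate(m, centrality = c("mean", "median"), dispersion = TRUE)
bayestestR::hdi(m, ci = 0.99)
bayestestR::p_direction(m)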

@strengejacke
Member

The easy solution is to add no_recursion = TRUE to the get_sigma() call; this will not trigger calling get_variance(). That argument is not documented; it's only used internally for now.
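
For illustration, the patched call would look something like this (no_recursion is an internal, undocumented argument, so treat this as a sketch that may change):

# skips the get_variance() fallback, so no null model is compiled or sampled
insight::get_sigma(m, no_recursion = TRUE)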

And @JWiley is right with his explanation of why we need the null model. It's required in the sub-function .variance_distributional().

@strengejacke
Member

Can we close this, since it's fixed in bayestestR?

@mattansb
Member Author

Let's keep this open; we should think of a better option than refitting a null model, especially a Bayesian null model, since "recycling" the priors from the full model to the null model might not make any sense.
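
For reference, for a brmsfit the null-model refit amounts to something like the following (a rough sketch of the idea, not the exact insight::null_model() internals, which may retain the random-effects part):

# an intercept-only refit: this recompiles the Stan program and resamples,
# inheriting priors from the full model where they still apply, which may
# not be sensible for the reduced model
m0 <- update(m, formula. = hp ~ 1, recompile = TRUE)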
