Why is the Dirichlet distribution the prior for our betas within the WeightedSumFitter class? #326
Hi @jbordon619. Thanks for the question. I'll see what I can do to help and explain. If we consider the synthetic control model in general, we are basically modelling the synthetic control unit as a weighted sum of the control units. For this general model there is no necessity to add a constraint on the sum of the weights; we could just use an unconstrained prior (for example a Normal) on each weight. But constraining the weights to be non-negative and to sum to 1 means the synthetic control is an interpolation of the control units (a weighted average) rather than an extrapolation, and it makes each weight directly interpretable as the proportion that control unit contributes.
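To make that concrete, here is a minimal sketch of what such an unconstrained model could look like in PyMC. The data, shapes, and variable names are invented purely for illustration; this is not the CausalPy implementation.

import numpy as np
import pymc as pm

# Made-up example data: 3 control units observed over 50 pre-treatment time points,
# plus the treated unit's outcome over the same period.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([0.2, 0.5, 0.3]) + rng.normal(0, 0.1, size=50)

with pm.Model() as unconstrained_model:
    # One weight per control unit; nothing forces the weights to be positive
    # or to sum to 1.
    beta = pm.Normal("beta", mu=0, sigma=1, shape=3)
    sigma = pm.HalfNormal("sigma", sigma=1)
    mu = pm.Deterministic("mu", pm.math.dot(X, beta))
    pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample()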
Ok, so now I think we've covered some motivation for why the sum of the weights should be constrained to 1. The next question is: what kind of prior could we use in order to achieve this? The Dirichlet distribution is quite handy here because it's a multivariate distribution and any sample drawn from it will always sum to 1. You can run something like this to see:
>>> import pymc as pm
>>> d = pm.Dirichlet.dist([1., 1., 1., 1.])
>>> draws = pm.draw(d, draws=10)
>>> draws
array([[0.15981701, 0.10814745, 0.68524163, 0.04679391],
[0.38037676, 0.48804699, 0.06844366, 0.06313259],
[0.4849404 , 0.25824312, 0.16171177, 0.09510471],
[0.48553828, 0.25301148, 0.02597401, 0.23547624],
[0.2162953 , 0.46682933, 0.14262581, 0.17424957],
[0.06870385, 0.10662605, 0.36808074, 0.45658937],
[0.08015813, 0.25092438, 0.52085491, 0.14806258],
[0.69236722, 0.16837711, 0.13146731, 0.00778836],
[0.50637089, 0.31081917, 0.06371161, 0.11909833],
[0.52838634, 0.13247783, 0.22715454, 0.1119813 ]])
>>> draws.sum(axis=1)
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

The Dirichlet distribution is also quite nice because the hyperparameters (the concentration parameters) let you encode any prior knowledge you might have about the relative weights. Right now, the WeightedSumFitter uses a flat Dirichlet prior with all concentration parameters set to 1, which is uniform over all weight vectors that sum to 1.

Briefly on the conjugacy: MCMC methods don't require us to use conjugate distributions. However, in PyMC there is a lot of cool work happening with automatic graph re-writing, so in the future there may be automatic detection of conjugate distributions, which could allow graph re-writes that give significant computational speed-ups. But that specific point is probably best discussed on the PyMC discourse.
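Coming back to the prior itself, here is a rough sketch of a weighted-sum model with a Dirichlet prior on the weights. It mirrors the general structure described above rather than quoting the actual WeightedSumFitter code, and it reuses the same kind of made-up data as the earlier sketch.

import numpy as np
import pymc as pm

# Made-up example data: 3 control units observed over 50 pre-treatment time points.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([0.2, 0.5, 0.3]) + rng.normal(0, 0.1, size=50)

with pm.Model() as weighted_sum_model:
    # Dirichlet prior: the weights are non-negative and sum to 1.
    # a=np.ones(3) is flat over the simplex; something like a=[5, 1, 1] would
    # encode a prior belief that the first control unit deserves more weight.
    beta = pm.Dirichlet("beta", a=np.ones(3))
    sigma = pm.HalfNormal("sigma", sigma=1)
    mu = pm.Deterministic("mu", pm.math.dot(X, beta))
    pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample()

# Every posterior draw of beta sums to 1 by construction:
print(np.allclose(idata.posterior["beta"].values.sum(axis=-1), 1.0))  # True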
I've been diving into the code for causalpy to better understand what's going on under the hood, so I can maybe apply it elsewhere in the future. During my dive I found the Dirichlet distribution.
I didn't know about this distribution at all previously, but what I gathered is that it's similar to the beta distribution except it can handle more outcomes than just "successes" and "failures", and its samples are vectors that sum to one. From the outputs I've been getting from WeightedSumFitter, I see that the beta values seem to add to one. Is this correct?
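For example, a two-component Dirichlet behaves like a Beta distribution on its first component, and every draw sums to one. A quick check with PyMC (arbitrary parameter values, just for illustration):

import pymc as pm

# Draws from a 2-component Dirichlet(2, 5): each row is a pair (p, 1 - p).
pairs = pm.draw(pm.Dirichlet.dist([2.0, 5.0]), draws=100_000, random_seed=1)

# The first component should be distributed as Beta(2, 5).
betas = pm.draw(pm.Beta.dist(alpha=2.0, beta=5.0), draws=100_000, random_seed=2)

print(pairs.sum(axis=1)[:5])             # each pair sums to 1
print(pairs[:, 0].mean(), betas.mean())  # both means are close to 2 / (2 + 5)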
If it is correct, how do you feel about this constraint when we set up our equation using the "control" geos to predict the "test" geo?
(Sorry if this second part doesn't make sense; I'm still struggling to understand some core concepts.)
I also found that the Dirichlet distribution is good for Bayesian modeling at scale because of its conjugacy. I thought conjugacy only mattered for Bayesian modeling that isn't based on Markov chains? Does conjugacy help our chains converge faster?