Get model results data #418
Hi. Yes, this is definitely possible. You can access the raw MCMC samples in result.idata. It sounds like you'll want the posterior predictive samples. You can then calculate the mean, use arviz to calculate the HDIs, and export the result to a CSV, for example. Is that enough to go on, or are you requesting this as a utility function? |
Thank you for the answer. I'm sorry for asking basic questions, I'm not very experienced in programming. From what I understand, the fitted model data is in posterior_predictive.y_hat? But then I get all the MCMC samples? Is the one that is plotted by default the last one? What about the credible interval that is also plotted? How can I access it? Essentially, I'm trying to recreate the first plot that appears in the default result.plot method, which is the only one I need, but I need to change its style to a specific one, which is why I'm trying to get the data. |
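A minimal sketch of the summary step suggested above, assuming the posterior predictive samples come as an array of shape (chain, draw, observation). The synthetic array stands in for something like result.idata.posterior_predictive["y_hat"].values, the helper names hdi_1d and summarise_samples are hypothetical, and the hand-rolled interval is a simple stand-in for the arviz HDI calculation that only suits unimodal samples. Each row of the output summarises all samples for one observation rather than picking a single draw.

```python
import numpy as np
import pandas as pd


def hdi_1d(samples: np.ndarray, prob: float = 0.94) -> tuple:
    """Shortest interval containing `prob` of the samples (unimodal case)."""
    sorted_samples = np.sort(samples)
    n = sorted_samples.size
    window = int(np.floor(prob * n))  # how many samples the interval spans
    # Width of every candidate interval covering `window` consecutive samples.
    widths = sorted_samples[window:] - sorted_samples[: n - window]
    start = int(np.argmin(widths))
    return float(sorted_samples[start]), float(sorted_samples[start + window])


def summarise_samples(samples: np.ndarray, index: pd.Index, prob: float = 0.94) -> pd.DataFrame:
    """Collapse (chain, draw, obs) samples into one row per observation."""
    flat = samples.reshape(-1, samples.shape[-1])  # (chain * draw, obs)
    bounds = np.array([hdi_1d(flat[:, i], prob) for i in range(flat.shape[1])])
    return pd.DataFrame(
        {
            "mean": flat.mean(axis=0),
            "hdi_lower": bounds[:, 0],
            "hdi_upper": bounds[:, 1],
        },
        index=index,
    )


# Synthetic samples: 4 chains, 500 draws, 5 observations (an assumption, not real model output).
rng = np.random.default_rng(0)
true_line = np.linspace(0.0, 10.0, 5)
samples = true_line + rng.normal(scale=1.0, size=(4, 500, 5))

summary = summarise_samples(samples, pd.RangeIndex(5, name="obs_ind"))
print(summary)
```

Calling summary.to_csv with a file path would then give the CSV export mentioned above, and the three columns are enough to draw a line with a shaded band in any plotting style.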
No problem. I'll consider this a feature request and see if I (or someone else) can have a go at implementing it. |
Thank you very much. That would be greatly helpful |
Hello there! I have been lurking around here for some time now and I feel this could be my time to shine! I would very much like to contribute and, if I'm not mistaken, this could be an easy enough task to tackle as a first one. In any case, I'll try to think about the proper way to do so, but please let me know if you have any suggestions @drbenvincent. Thank you! |
Sounds great @lpoug. I think it would be really simple to do on an experiment-by-experiment basis. But it's worth thinking about whether we can come up with a generic method that would work across all experiments. If it's doable, we should try to do that. What do you think? Happy to discuss more, but I thought I'd reply quickly to keep the ball rolling. |
Hey @drbenvincent! I have given this some thought, and here is where I landed. I'm wondering whether such a functionality is useful for all experiments. I can clearly see it for the prepostfit experiments, as I have encountered the same issue when doing synthetic control (i.e. to reproduce the graphs in the Making Inference part of this chapter of Causal Inference for the Brave and True). I'm not so sure about the other ones and, to be honest, my knowledge of the other experiments is not as sharp as it is for SC. With that being said:
Hope this was clear enough, let me know what you think! In the meantime, I'll try to dive deeper into the other experiments. |
Thanks @lpoug And perhaps right now we could just create a utility function (that lives in
    from causalpy import export_plot_data
    export_plot_data(result, target_filepath)
I guess we can add some checks on the inputs being provided. Looks like all the data needed for reproducing those plots could be saved in a csv file. The only exception would be the intervention date. So it could be worth considering exporting a yaml file, which can hold nested data, lists, etc. But I'm totally happy to leave those kinds of details to you - that's just a thought I had. |
Or, rather than saving, the API could be intervention_date, plot_data = export_plot_data(result) Maybe that's a simpler place to start and addresses @Guilherme-dL 's original request. |
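To illustrate the trade-off discussed above (a flat table fits a CSV, the intervention date does not), here is a minimal sketch that keeps the two together on disk. It is an assumption-laden illustration rather than the proposed API: save_plot_data and load_plot_data are hypothetical names, the data is a toy table, and a JSON sidecar from the standard library is swapped in for the YAML file that was suggested.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def save_plot_data(plot_data: pd.DataFrame, intervention_date, target_dir: Path) -> None:
    """Hypothetical helper: table goes to CSV, metadata goes to a JSON sidecar."""
    target_dir.mkdir(parents=True, exist_ok=True)
    plot_data.to_csv(target_dir / "plot_data.csv")
    metadata = {"intervention_date": str(intervention_date)}
    (target_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))


def load_plot_data(target_dir: Path):
    """Hypothetical helper: read back the (intervention_date, plot_data) pair."""
    plot_data = pd.read_csv(target_dir / "plot_data.csv", index_col=0, parse_dates=True)
    metadata = json.loads((target_dir / "metadata.json").read_text())
    return pd.Timestamp(metadata["intervention_date"]), plot_data


# Toy table standing in for the exported plot data (not real model output).
index = pd.date_range("2024-01-01", periods=4, freq="D", name="date")
plot_data = pd.DataFrame(
    {"y": [1.0, 2.0, 3.0, 4.0], "prediction": [1.1, 1.9, 2.5, 2.6]},
    index=index,
)
intervention_date = pd.Timestamp("2024-01-03")

with tempfile.TemporaryDirectory() as tmp:
    save_plot_data(plot_data, intervention_date, Path(tmp))
    loaded_date, loaded_data = load_plot_data(Path(tmp))
```

Returning the pair directly, as in the second suggestion, avoids the file format question altogether and leaves saving to the caller.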
Thanks for the suggestions @drbenvincent! I have done all the steps in the CONTRIBUTING.md to set up my local environment without any trouble, which is already a success for me 😄 Some thoughts on your suggestions: With respect to the intervention date, I think that in most cases it already exists in a variable within the notebook or script (and if not, it is quite easily accessible in Finally, regarding the location of the function, tbh I'm happy to follow your suggestion at this point. That being said, I'll start working on it to see if my opinion evolves. |
Sounds good. All I'd say is that the function name shouldn't start with |
Hi @drbenvincent! You can find below what I have done so far. I do have a couple more questions before opening a PR if that's ok with you. In fact, I have tried implementing the function in
import arviz as az
import pandas as pd
from sklearn.base import RegressorMixin


def get_prepostfit_data(result) -> pd.DataFrame:
    """
    Utility function to recover the data of a PrePostFit experiment along with the prediction and causal impact information.

    :param result:
        The result of a PrePostFit experiment
    """
    from causalpy.experiments.prepostfit import PrePostFit
    from causalpy.pymc_models import PyMCModel

    if isinstance(result, PrePostFit):
        pre_data = result.datapre.copy()
        post_data = result.datapost.copy()
        if isinstance(result.model, PyMCModel):
            # Bayesian model: average the posterior predictive samples of "mu"
            pre_data["prediction"] = (
                az.extract(
                    result.pre_pred, group="posterior_predictive", var_names="mu"
                )
                .mean("sample")
                .values
            )
            post_data["prediction"] = (
                az.extract(
                    result.post_pred, group="posterior_predictive", var_names="mu"
                )
                .mean("sample")
                .values
            )
            # Average the impact over all chains and draws
            pre_data["impact"] = result.pre_impact.mean(dim=["chain", "draw"]).values
            post_data["impact"] = result.post_impact.mean(dim=["chain", "draw"]).values
        elif isinstance(result.model, RegressorMixin):
            # scikit-learn model: predictions and impacts are point estimates
            pre_data["prediction"] = result.pre_pred
            post_data["prediction"] = result.post_pred
            pre_data["impact"] = result.pre_impact
            post_data["impact"] = result.post_impact
        else:
            raise ValueError("Other model types are not supported")
        ppf_data = pd.concat([pre_data, post_data])
    else:
        raise ValueError("Other experiments are not supported")
    return ppf_data
def get_plot_data(self) -> pd.DataFrame:
    """Recover the data of a PrePostFit experiment along with the prediction and causal impact information.

    Internally, this function dispatches to either `get_plot_data_bayesian` or `get_plot_data_ols`
    depending on the model type.
    """
    if isinstance(self.model, PyMCModel):
        return self.get_plot_data_bayesian()
    elif isinstance(self.model, RegressorMixin):
        return self.get_plot_data_ols()
    else:
        raise ValueError("Unsupported model type")


def get_plot_data_bayesian(self) -> pd.DataFrame:
    """
    Recover the data of a PrePostFit experiment along with the prediction and causal impact information.
    """
    if isinstance(self.model, PyMCModel):
        pre_data = self.datapre.copy()
        post_data = self.datapost.copy()
        pre_data["prediction"] = (
            az.extract(
                self.pre_pred, group="posterior_predictive", var_names="mu"
            )
            .mean("sample")
            .values
        )
        post_data["prediction"] = (
            az.extract(
                self.post_pred, group="posterior_predictive", var_names="mu"
            )
            .mean("sample")
            .values
        )
        pre_data["impact"] = self.pre_impact.mean(dim=["chain", "draw"]).values
        post_data["impact"] = self.post_impact.mean(dim=["chain", "draw"]).values
        self.data_plot = pd.concat([pre_data, post_data])
        return self.data_plot
    else:
        raise ValueError("Unsupported model type")


def get_plot_data_ols(self) -> pd.DataFrame:
    """
    Recover the data of a PrePostFit experiment along with the prediction and causal impact information.
    """
    pre_data = self.datapre.copy()
    post_data = self.datapost.copy()
    pre_data["prediction"] = self.pre_pred
    post_data["prediction"] = self.post_pred
    pre_data["impact"] = self.pre_impact
    post_data["impact"] = self.post_impact
    self.data_plot = pd.concat([pre_data, post_data])
    return self.data_plot

I think there is a point to be made to have the function within the Let me know what you think when you have some time 😃 |
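For readers who want to see what the resulting table looks like without fitting a model, here is a small sketch of the point-estimate (OLS-style) assembly using a stand-in object. Everything in it is an assumption for illustration: the SimpleNamespace replaces a fitted experiment, the attribute names are copied from the code in this thread, the numbers are made up, and the impact is taken to be observed minus predicted.

```python
from types import SimpleNamespace

import pandas as pd

index = pd.date_range("2024-01-01", periods=6, freq="D")
observed = pd.DataFrame({"y": [1.0, 1.2, 0.9, 2.0, 2.3, 2.1]}, index=index)
intervention_date = pd.Timestamp("2024-01-04")

# Stand-in for a fitted experiment (hypothetical values, not real model output).
result = SimpleNamespace(
    datapre=observed[observed.index < intervention_date],
    datapost=observed[observed.index >= intervention_date],
    pre_pred=[1.0, 1.1, 1.0],
    post_pred=[1.0, 1.1, 1.0],
)
# Assumed definition of impact: observed outcome minus counterfactual prediction.
result.pre_impact = result.datapre["y"].values - result.pre_pred
result.post_impact = result.datapost["y"].values - result.post_pred


def collect_plot_data(result) -> pd.DataFrame:
    """Stack pre- and post-intervention rows with their prediction and impact."""
    pre_data = result.datapre.copy()
    post_data = result.datapost.copy()
    pre_data["prediction"] = result.pre_pred
    post_data["prediction"] = result.post_pred
    pre_data["impact"] = result.pre_impact
    post_data["impact"] = result.post_impact
    return pd.concat([pre_data, post_data])


plot_data = collect_plot_data(result)

# With the intervention date at hand, the post-period rows are one slice away.
post_rows = plot_data[plot_data.index >= intervention_date]
print(post_rows)
```

One row per time point with y, prediction and impact columns is all a custom plot needs, which is why keeping the intervention date outside the table is workable.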
Hi @lpoug I think having separate The I think I agree, having |
Hey all, I wanted to check back in on this - this feels like a critically important feature of this package that shouldn't be buried in the PyMCModel object. Mostly just wanted to +1 the development work here and look forward to this being implemented. |
Hey @merubhanot, I appreciate you taking the time to reply on this. The end of the year has been a bit heavy on my end professionally, so I had to put that work aside for a bit. That being said, I'll get back into it in the coming days! |
Is there a way to access the synthetic control fitted values as a pandas series or list? I need to create my own custom plots in a specific style, so I need access to the data for the fitted values and the confidence interval.