Check for observed variables in the trace #7641

zaxtax · 2025-01-12T16:38:13Z

Description

This introduces the enhancement discussed in #7225

This makes it easier to use sample_posterior_predictive in model factory workflows

Checklist

Checked that the pre-commit linting/style checks pass
Included tests that prove the fix is effective or that the new feature works
Added necessary documentation (docstrings and/or example notebooks)
If you are a pro: each commit corresponds to a relevant logical change

Type of change

📚 Documentation preview 📚: https://pymc--7641.org.readthedocs.build/en/7641/

codecov · 2025-01-12T20:51:40Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.78%. Comparing base (e6767ab) to head (81eef66).
Report is 5 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #7641   +/-   ##
=======================================
  Coverage   92.77%   92.78%           
=======================================
  Files         107      107           
  Lines       18178    18185    +7     
=======================================
+ Hits        16865    16873    +8     
+ Misses       1313     1312    -1

Files with missing lines	Coverage Δ
pymc/sampling/forward.py	`96.26% <100.00%> (+0.11%)`	⬆️

... and 1 file with indirect coverage changes

ricardoV94 · 2025-01-14T15:57:21Z

tests/sampling/test_forward.py

+        # test that trace is used in ppc
+        with pm.Model() as model_ppc:
+            mu = pm.Normal("mu", 0.0, 1.0)
+            a = pm.Normal("a", mu=mu, sigma=1)
+
+        ppc = pm.sample_posterior_predictive(
+            trace=trace, model=model_ppc, return_inferencedata=False
+        )
+        assert "a" in ppc


Can you put this in its own test?

ricardoV94 · 2025-01-14T15:59:57Z

pymc/sampling/forward.py

@@ -817,6 +819,8 @@ def sample_posterior_predictive(
        vars_ = [model[x] for x in var_names]
    else:
        vars_ = model.observed_RVs + observed_dependent_deterministics(model)
+        if observed_data is not None:


BTW, the observed_dependent_deterministics call above is not going to work if these variables are not observed in the model.

That happens with auto-imputation models, which I assume the as_model wrapper won't handle correctly either because the models are different depending on whether you pass data or not.

Just something to keep in mind

zaxtax · 2025-01-14T16:11:58Z

Can you give an example of an auto-imputation model?

lucianopaz

I second Ricardo with his comments. Just a few changes and I think this will be ok to merge.

lucianopaz · 2025-01-14T16:21:46Z

pymc/sampling/forward.py

@@ -817,6 +819,8 @@ def sample_posterior_predictive(
        vars_ = [model[x] for x in var_names]
    else:
        vars_ = model.observed_RVs + observed_dependent_deterministics(model)
+        if observed_data is not None:


I agree with this. You'll have to adapt observed_dependent_deterministics to also accept a list of extra variables that will depend on your observed_data

lucianopaz · 2025-01-14T16:22:53Z

pymc/sampling/forward.py

@@ -817,6 +819,8 @@ def sample_posterior_predictive(
        vars_ = [model[x] for x in var_names]
    else:
        vars_ = model.observed_RVs + observed_dependent_deterministics(model)
+        if observed_data is not None:
+            vars_ += [model[x] for x in observed_data if x in model]


Maybe also add "and x not in vars_" to the condition of the list comprehension.
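The deduplication suggested here can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in pymc/sampling/forward.py: `toy_model` is a hypothetical dict standing in for the model's name lookup, `extend_with_trace_observed` is a made-up helper name, and variables are represented by their names so that membership tests are unambiguous.

```python
# Hypothetical stand-ins: a "model" mapping names to variables (plain strings here)
# and the names recorded in the trace's observed_data group.
toy_model = {"mu": "mu", "a": "a", "b": "b"}
observed_in_model = ["a"]              # plays the role of model.observed_RVs
observed_data_names = ["a", "b", "c"]  # "c" is not in the model; "a" is already selected


def extend_with_trace_observed(vars_, observed_data, model):
    """Append variables named in observed_data, skipping unknown names and duplicates."""
    selected = list(vars_)
    for name in observed_data:
        if name in model and model[name] not in selected:
            selected.append(model[name])
    return selected


vars_ = extend_with_trace_observed(observed_in_model, observed_data_names, toy_model)
print(vars_)  # ['a', 'b']
```

Without the second condition, "a" would be appended a second time because it is both observed in the model and present in the trace.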

lucianopaz · 2025-01-14T16:25:27Z

tests/sampling/test_forward.py

+        # test that trace is used in ppc
+        with pm.Model() as model_ppc:
+            mu = pm.Normal("mu", 0.0, 1.0)
+            a = pm.Normal("a", mu=mu, sigma=1)
+
+        ppc = pm.sample_posterior_predictive(
+            trace=trace, model=model_ppc, return_inferencedata=False
+        )
+        assert "a" in ppc


I would also make the test more stringent. Test that only the variables that you want are actually included in the trace. Also add a case where one node is conditionally dependent on the trace.observed_data so that you see that the auto added variables include conditional nodes.
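The difference between the current assertion and the stricter one requested here can be shown with a small sketch. The `ppc` dict below is hypothetical data, shaped like the name-to-samples mapping the test above indexes into; it is not output from an actual run.

```python
# Hypothetical result: a dict keyed by variable name.
ppc = {"a": [0.1, -0.3], "mu": [0.0, 0.2]}  # "mu" slipped in unintentionally

loose_check = "a" in ppc          # passes even though "mu" was also sampled
strict_check = set(ppc) == {"a"}  # fails, exposing the extra variable

print(loose_check, strict_check)  # True False
```

Comparing the full set of keys catches variables that were sampled by accident, which a membership check cannot do.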

zaxtax · 2025-01-14T16:59:33Z

Thanks! Can you give an example model where a node is conditionally dependent?

ricardoV94 · 2025-01-14T17:24:02Z

That helper was created specifically for the deterministic created by automatic imputation that joins the observed and unobserved components, so I'm not too worried about it.

But if you want to test it, just add a deterministic like y + 1, where y is the variable that was observed during sampling.

zaxtax · 2025-01-19T17:00:01Z

It's not fully clear what the best way is to make sure the correct conditional nodes are added when the model that produced the trace isn't readily available. So in the meantime I have written a test that we expect to fail.

ricardoV94 · 2025-01-20T09:40:54Z

pymc/sampling/forward.py

@@ -821,6 +824,7 @@ def sample_posterior_predictive(
        vars_ = model.observed_RVs + observed_dependent_deterministics(model)
        if observed_data is not None:
            vars_ += [model[x] for x in observed_data if x in model and x not in vars_]
+            vars_ += observed_dependent_deterministics(model, vars_)


This is going to duplicate deterministics of an observed variable when there is a mix of observed model variables and implied observed variables from the idata. Shouldn't the "+ observed_dependent_deterministics(...)" be called only once, after the if branch?
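The duplication described here can be reproduced with a toy dependency graph. This is a sketch under assumptions: `dependent_deterministics` is a simplified stand-in written for this example, not PyMC's observed_dependent_deterministics, and all variables are plain strings.

```python
# Hypothetical toy graph: each deterministic maps to the variables it depends on.
deterministic_parents = {"det_y": ["y"], "det_z": ["z"]}


def dependent_deterministics(observed):
    """Toy stand-in: return deterministics that depend on any observed variable."""
    return [
        det
        for det, parents in deterministic_parents.items()
        if any(parent in observed for parent in parents)
    ]


model_observed = ["y"]  # observed in the model itself
trace_observed = ["z"]  # observed only according to the trace

# Calling the helper before and after extending the list repeats "det_y".
twice = model_observed + dependent_deterministics(model_observed)
twice += [v for v in trace_observed if v not in twice]
twice += dependent_deterministics(twice)

# Collecting all observed variables first and calling the helper once avoids that.
observed = model_observed + [v for v in trace_observed if v not in model_observed]
once = observed + dependent_deterministics(observed)

print(twice)  # ['y', 'det_y', 'z', 'det_y', 'det_z']
print(once)   # ['y', 'z', 'det_y', 'det_z']
```

Gathering the observed variables from both sources before resolving their deterministics keeps each deterministic in the list exactly once.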

zaxtax · 2025-01-20T11:27:02Z

@lucianopaz does this address your concerns?

ricardoV94 · 2025-01-20T15:26:40Z

tests/sampling/test_forward.py

@@ -540,6 +540,50 @@ def test_normal_scalar_idata(self):
            ppc = pm.sample_posterior_predictive(idata, return_inferencedata=False)
            assert ppc["a"].shape == (nchains, ndraws)

+    def test_external_trace(self):


Remove this test? The one with det is strictly more comprehensive.

zaxtax requested a review from ricardoV94 January 12, 2025 16:38

zaxtax force-pushed the pull_observes_from_idata_in_pps branch from ca7c8f8 to 567fb2a Compare January 12, 2025 16:47

zaxtax force-pushed the pull_observes_from_idata_in_pps branch from 567fb2a to 662595b Compare January 12, 2025 21:22

ricardoV94 reviewed Jan 14, 2025

ricardoV94 added maintenance samplers labels Jan 14, 2025

ricardoV94 changed the title ~~Check for observed variables in the trace as well as the model~~ Check for observed variables in the trace Jan 14, 2025

lucianopaz requested changes Jan 14, 2025

zaxtax added 2 commits January 19, 2025 17:47

Check for observed variables in the trace as well as the model

a592208

Bugfix

e33e517

zaxtax force-pushed the pull_observes_from_idata_in_pps branch from f24a55c to 8dc0945 Compare January 19, 2025 16:47

Update tests

e895a5c

zaxtax force-pushed the pull_observes_from_idata_in_pps branch from a217113 to e895a5c Compare January 19, 2025 16:52

Add logic to handle conditional nodes for observed variables

d813da1

ricardoV94 reviewed Jan 20, 2025

Remove redundant test

2fcf395

zaxtax force-pushed the pull_observes_from_idata_in_pps branch from 4a0aa08 to 2fcf395 Compare January 20, 2025 16:02

ricardoV94 approved these changes Jan 20, 2025

ricardoV94 merged commit 892c37a into pymc-devs:main Jan 20, 2025
23 checks passed
