BUG: return_inferencedata=True causes sample_smc to randomly fail #7192

Open
tmolteno opened this issue Mar 13, 2024 · 3 comments

@tmolteno

Describe the issue:

Repeated calls to pm.sample_smc() will sometimes generate log_marginal_likelihood structures with the wrong dimensions. This applies to both the MH and IMH kernels. Here are two outputs from the code below:

5.10.4
Initializing SMC sampler...
Sampling 6 chains in 4 jobs
     Log Marginal Likelihood: <xarray.DataArray 'log_marginal_likelihood' (chain: 1, draw: 6)> Size: 48B
array([[list([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -21.44663442876296]),
        list([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -21.26028996011558]),
        list([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -19.154901913607482]),
        list([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -21.01140467150669]),
        list([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -20.895884505503734]),
        list([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -20.10582198257819])]],
      dtype=object)
Coordinates:
  * chain    (chain) int64 8B 0
  * draw     (draw) int64 48B 0 1 2 3 4 5
Initializing SMC sampler...
Sampling 6 chains in 4 jobs
     Log Marginal Likelihood: <xarray.DataArray 'log_marginal_likelihood' (chain: 6, draw: 12)> Size: 576B
array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,        -19.997437506595553],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,        -21.54992021005074],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,        -19.7916970741079],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,        -20.961871906904662],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,        -20.841782883092847],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,        -19.667591157790675]], dtype=object)
Coordinates:
  * chain    (chain) int64 48B 0 1 2 3 4 5
  * draw     (draw) int64 96B 0 1 2 3 4 5 6 7 8 9 10 11
Evidence:  -20.468383456423727

The first log_marginal_likelihood result shows the problem. The xarray output is malformed: it is an array of Python lists rather than floats, the first list has one fewer element than the others, and the coordinates are wrong (it reports 1 chain and 6 draws, when the sampling used 6 chains). The second output is correct, with the expected chain/draw coordinates.
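
As a quick way to detect the malformed case, here is a minimal check (a hypothetical helper, not part of PyMC or the script below) that can be run on the returned InferenceData:

import numpy as np

def lml_is_well_formed(idata, n_chains):
    # A well-formed result is a float array with one row per chain; the broken
    # case shows up as an object array of lists with the wrong chain count.
    lml = idata.sample_stats["log_marginal_likelihood"]
    return lml.sizes["chain"] == n_chains and np.issubdtype(lml.dtype, np.floating)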

Reproducible code example:

import numpy as np
import pymc as pm
import pytensor.tensor as T


def flux(nu, a, nu0):
    """Log-polynomial spectral model, built as a pytensor graph."""
    w = nu/nu0
    logw = T.log(w)

    logS = a[0]
    for i in range(1, len(a)):
        logS += a[i]*logw**i

    logS = T.maximum(logS, -500.0)

    return T.exp(logS)


def get_model(name, freq, mu, sigma, order, nu0):

    mumax = np.max(mu)

    means = [0.0 for i in range(0, order)]
    sigmas = [0.5 for i in range(0, order)]
    means[0] = np.log(mumax)
    sigmas[0] = 1.5
    sigmas[1] = 1

    with pm.Model() as _model:
        _a = [pm.Normal(f"a[{i}]", mu=means[i], sigma=sigmas[i]) for i in range(order)]

        _x = pm.MutableData('frequencies', freq)
        _brightness = flux(_x, _a, nu0)

        _likelihood = pm.StudentT("likelihood", nu=5, mu=_brightness,
                                  sigma=np.array(sigma), observed=np.array(mu))
    return _model


if __name__=="__main__":

    print(pm.__version__)
    original_data = np.array(
        [[0.408,  6.24, 0.312 ],
        [0.843, 13.65, 0.6825],
        [1.38 , 14.96, 0.748 ],
        [1.413, 14.87, 0.7435],
        [1.612, 14.47, 0.7235],
        [1.66 , 14.06, 0.703 ],
        [1.665, 14.21, 0.7105],
        [2.295, 11.95, 0.5975],
        [2.378, 11.75, 0.5875],
        [4.8  ,  5.81, 0.2905],
        [4.8  ,  5.76, 0.288 ],
        [4.835,  5.72, 0.286 ],
        [4.85 ,  5.74, 0.287 ],
        [8.415,  2.99, 0.1495],
        [8.42 ,  2.97, 0.1485],
        [8.64 ,  2.81, 0.1405],
        [8.64 ,  2.81, 0.1405]])

    freq_ghz, mu, sigma = original_data.T
    freq = freq_ghz*1e9
    nu0=1.0e9

    rng = np.random.default_rng(123) # 123 breaks

    model = get_model("J1939-6342", freq, mu, sigma, order=5, nu0=nu0)

    for i in range(2):
        try:
            with model:
                idata = pm.sample_smc(draws=2000, kernel=pm.smc.kernels.IMH,
                                        chains=6, threshold=0.6,
                                        correlation_threshold=0.01,
                                        random_seed=rng)

                lml_nans = idata.sample_stats["log_marginal_likelihood"]
                print(f"Log Marginal Likelihood: {lml_nans}")

                # Average the final log-ML estimate from each chain.
                lml = [c_lml[-1] for c_lml in lml_nans]

                evidence = np.array(lml).mean()
            print(f"Evidence:  {evidence}")
        except Exception:
            # Skip runs that fail (e.g. when the malformed array breaks the evidence calculation).
            pass

Error message:

No response

PyMC version information:

pymc 5.10.4, running on Debian trixie and bookworm. PyMC was installed with pip into a venv.

Context for the issue:

It appears that the nan padding adds one too few nans, which makes the resulting xarray ragged (different rows have different lengths). Perhaps related to issue #5263.

Note that the xarray coordinates are completely incorrect when this bug occurs.

Changing the random seed changes the behaviour; the sample code above uses a seed chosen to trigger the bug on the first call. In my use case the failure rate is roughly 10 percent.
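
For illustration only (this is not PyMC's actual code), here is the kind of nan padding that seems to be intended, and how an off-by-one in the pad length would produce the ragged output shown above:

import numpy as np

def pad_and_stack(per_chain_lml):
    # Left-pad each chain's log-ML trace with nan up to the longest chain, then stack.
    max_len = max(len(x) for x in per_chain_lml)
    padded = [np.concatenate([np.full(max_len - len(x), np.nan), np.asarray(x, float)])
              for x in per_chain_lml]
    return np.stack(padded)  # (chains, max_len) float array

# If one row were padded one nan short, np.array(rows) could not form a rectangular
# float array and would fall back to a ragged object array of lists -- exactly what
# the first output above shows.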

@tmolteno tmolteno added the bug label Mar 13, 2024

welcome bot commented Mar 13, 2024

🎉 Welcome to PyMC! 🎉 We're really excited to have your input into the project! 💖

If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct.

@tmolteno
Author

The offending code appears to be in the generation of the InferenceData. If I modify the call to set return_inferencedata=False, i.e.,

                trace = pm.sample_smc(draws=2000, kernel=pm.smc.kernels.IMH,
                                        chains=6, threshold=0.6,
                                        correlation_threshold=0.01,
                                        random_seed=rng, return_inferencedata=False, progressbar=False)

                lml = trace.report.log_marginal_likelihood

Then the problem goes away.
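
For completeness, a minimal sketch of how the evidence could then be computed from the MultiTrace report (this assumes that each per-chain entry ends with the final log marginal likelihood estimate, as in the sample_stats version above):

                lml = trace.report.log_marginal_likelihood
                # Assumed layout: one entry per chain, last element = final log-ML estimate.
                evidence = np.mean([np.atleast_1d(c)[-1] for c in lml])
                print(f"Evidence: {evidence}")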

@tmolteno tmolteno changed the title BUG: sample_smc randomly generates log_marginal_likelihood with incorrect structure BUG: return_inferencedata=True causes sample_smc to randomly fail Mar 14, 2024
@deu439

deu439 commented Dec 31, 2024

This bug affects me as well. Debugging is quite tricky with return_inferencedata=False.
