Hi and congrats on the package!
I'm one of the reviewers for the JOSS paper you submitted, so here I'll list my questions and concerns about the documentation. This issue will be updated as my reading progresses, so maybe don't start answering right away.
Home page
You can also call the low-level inference code (e.g. hmm_smoother()) directly, without first having to construct the HMM object.
This sentence was unclear to me: I still have to pass some model parameters to the smoother, right?
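For instance, my mental model of that call is something like the sketch below (the import path and argument names are my guess from the sentence above, not taken from the actual API):

```python
import jax.numpy as jnp
from dynamax.hidden_markov_model import hmm_smoother  # guessing the import path

# Presumably the model parameters still have to be passed explicitly:
initial_probs = jnp.array([0.5, 0.5])
transition_matrix = jnp.array([[0.95, 0.05],
                               [0.10, 0.90]])
log_likelihoods = jnp.zeros((100, 2))  # per-timestep emission log-likelihoods, shape (T, num_states)

posterior = hmm_smoother(initial_probs, transition_matrix, log_likelihoods)
```

If that is roughly how it works, showing such a snippet on the home page would clear up the ambiguity.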
HMMs
Casino HMM: Inference
Overall really good and clear!
Now that we’ve initialize the model parameters, we can use them to draw samples from the HMM.
Typo (initializeD).
Casino HMM: Learning
First we construct an HMM and sample data from it, just as in the preceding notebook.
How would you handle data with several trajectories of varying lengths? Do you have to pad them into a 3D tensor, and then apply some kind of mask?
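To make the question concrete, the scheme I have in mind is something like this (purely illustrative, not dynamax code):

```python
import jax.numpy as jnp

# Illustrative only: pad variable-length trajectories to a common length
# and keep a boolean mask marking which timesteps are real.
trajectories = [jnp.ones((70, 2)), jnp.ones((100, 2)), jnp.ones((85, 2))]
max_len = max(t.shape[0] for t in trajectories)

padded = jnp.stack(
    [jnp.pad(t, ((0, max_len - t.shape[0]), (0, 0))) for t in trajectories]
)  # (num_seqs, max_len, emission_dim)
mask = jnp.stack(
    [jnp.arange(max_len) < t.shape[0] for t in trajectories]
)  # (num_seqs, max_len), True where the data is real
```

Is this the intended usage, or is there built-in support for batches of unequal length?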
Perhaps the simplest learning algorithm is to directly maximize the marginal probability with gradient ascent.
How do you perform gradient ascent on the stochastic matrix $A$? Some kind of projection step? It seems far from obvious.
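The standard workaround I'm aware of is to reparametrize the rows of $A$ with a softmax and run gradient ascent in the unconstrained space, but I have no idea whether that is what you do. A toy sketch of what I mean (the objective below is just a stand-in for the marginal log probability):

```python
import jax
import jax.numpy as jnp

# Illustrative reparametrization (not necessarily what dynamax does):
# optimize unconstrained logits and map them to a row-stochastic matrix,
# so no projection step is needed.
def transition_matrix(logits):
    return jax.nn.softmax(logits, axis=-1)

def neg_log_marginal(logits):
    A = transition_matrix(logits)
    return -jnp.sum(jnp.log(jnp.diag(A)))  # stand-in for the true objective

logits = jnp.zeros((2, 2))
grads = jax.grad(neg_log_marginal)(logits)
logits = logits - 1e-2 * grads              # one (stochastic) gradient step
```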
Also the example errors:
OverflowError: Python int too large to convert to C long
Finally, let’s compare the learning curve of EM to those of SGD.
The notion of "epoch" doesn't seem standard for EM, do you take it to mean one E step + one M step?
Not only does EM converge much faster on this example (here, in only a handful of iterations), it also converges to a better estimate of the parameters.
It would be interesting to explain why the asymptotic GD estimate is worse. Theoretically, the finite size of the training data should also affect the EM algorithm, since both are doing ERM on the log-likelihood.
print_params(em_params)
This example errors.
Gaussian HMM: Cross-validation and model selection
As in the preceding notebooks, we start by sampling data from the model. Here, we add a slight wrinkle: we will sample training and test data, where the latter is only used for model selection.
This example errors:
AttributeError: `np.issctype` was removed in the NumPy 2.0 release. Use `issubclass(rep, np.generic)` instead.
Also, in the "True HMM emission distribution" plot, it may not be obvious what the black lines stand for (transitions I assume).
Now fit a model to all the training data using the chosen number of states
It would be nice to add a comment explaining why the EM log prob can end up above that of the true model.
Also, is the emission covariance spherical, diagonal or generic?
AutoRegressive HMM demo
Very nice plots but not enough explanations.
Can you clarify what these plots represent? Adding titles / color legends would also help.
Sample emissions from the ARHMM
This example errors and the plot below (while a very pretty starfish) should also be explained.
The dotted lines represent the stationary point of the corresponding AR state while the solid lines are the actual observations sampled from the HMM.
Maybe just specify that the stationary point is the limit of the recursion $y_{t} = Ay_{t-1} + b$ (not necessarily trivial for every reader).
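For example, one could state explicitly that the stationary point is the fixed point $y_\star = (I - A)^{-1} b$, which the recursion converges to whenever the spectral radius of $A$ is less than one.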
Find the most likely states
Please annotate the plots.
Linear Gaussian SSMs
Tracking an object using the Kalman filter
params, _ = lgssm.initialize(jr.PRNGKey(0),...)
Why do we need an RNG to initialize the model even though all parameters are already fixed? This may also have been a relevant question for previous notebooks but I just noticed it now.
Sample some data from the model
This example errors.
Online linear regression using Kalman filtering
We perform sequential (recursive) Bayesian inference for the parameters of a linear regression model using the Kalman filter. (This algorithm is also known as recursive least squares.)
Conceptually, do we pay a performance penalty by doing this with an SSM-inspired formulation?
This is a LG-SSM, where...
It would be useful to remind the reader of the two equations defining LG-SSM in matrix form.
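Something like: the latent state evolves as $z_t = F z_{t-1} + b + q_t$ with $q_t \sim \mathcal{N}(0, Q)$, and the emissions are $y_t = H_t z_t + d + r_t$ with $r_t \sim \mathcal{N}(0, R)$ (my generic notation, not necessarily the notebook's). Here, if I understand correctly, the state is the static weight vector, so $F = I$, $Q = 0$ and $H_t = x_t^\top$.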
Online inference
This example errors.
Plot results
What is the difference between $w_0$ batch and $w_0$?
Parallel filtering and smoothing in an LG-SSM
This notebook shows how we can reduce the cost of inference from O(T) to O(log T) time, if we have a GPU device.
What is the difference between using your built-in parallel_smoother and just using jax.vmap on the normal smoother?
Does this option exist for other models too, like HMMs?
Can we also parallelize the EM algorithm, or some parts of it?
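To clarify what I mean in the first question: jax.vmap would only parallelize across independent sequences, whereas I assume the O(log T) speedup comes from an associative scan across time within a single sequence. A toy cumulative-sum analogy of the distinction I have in mind:

```python
import jax
import jax.numpy as jnp

xs = jnp.arange(8.0)

# Serial recursion: O(T) sequential steps, like the standard smoother.
def step(carry, x):
    new = carry + x
    return new, new
_, serial = jax.lax.scan(step, 0.0, xs)

# Associative scan: same result, O(log T) depth on parallel hardware,
# which I assume is the trick behind the parallel smoother.
parallel = jax.lax.associative_scan(jnp.add, xs)

assert jnp.allclose(serial, parallel)
```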
Test parallel inference on a single sequence
This example errors.
Also, are we supposed to see a difference between serial and parallel filtering on the plot? Obviously we expect both curves to be superposed but it is still a bit weird to have both in the legend and only see one.
MAP parameter estimation for an LG-SSM using EM and SGD
Data
This example errors.
Plot results
I don't understand what is going on in this plot. In particular, I don't see how you predict emissions from smoothing: when you predict, by definition you don't have access to observations beyond $t$. Shouldn't you predict from filtering instead?
Bayesian parameter estimation for an LG-SSM using HMC
Generate synthetic training data
This example errors.
Also, I still don't understand what is meant by "smoothed emissions" (same problem as with the previous notebook); to me, the only thing you can smooth is a state.
Implement HMC wrapper
An introduction to what HMC is and what you use to implement it (apparently blackjax) would be very useful here.
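Even a few lines like the sketch below would help orient the reader; this is just my rough understanding of the blackjax NUTS interface and may not match what the wrapper actually does:

```python
import jax
import jax.numpy as jnp
import blackjax

# Toy log-density standing in for the model's log posterior over parameters.
def logdensity(theta):
    return -0.5 * jnp.sum(theta ** 2)

# Build a NUTS kernel; in practice the step size and mass matrix would come
# from blackjax's window adaptation rather than being hard-coded.
nuts = blackjax.nuts(logdensity, step_size=0.1, inverse_mass_matrix=jnp.ones(2))
state = nuts.init(jnp.zeros(2))

# One transition; a real sampler would wrap this in jax.lax.scan.
state, info = nuts.step(jax.random.PRNGKey(0), state)
```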
Call HMC
X and Y labels are wrong in the plot. Also, are we supposed to observe that the log probability increases? If so, the blue curve is not very convincing.
Use HMC to infer posterior over a subset of the parameters
Same remarks about the blue curve.