Identifiability constraint and Basis output consistency #255

BalzaniEdoardo opened this issue Oct 23, 2024 · 0 comments
What are identifiability constraints and why apply them

Identifiability constraints are techniques for ensuring that a design matrix has maximal rank (i.e., its rank equals the minimum of the number of rows and columns) by removing redundant dimensions.

In statistical models such as GLMs, identifiability constraints ensure unique parameter solutions by removing redundant features. Without these constraints, multiple parameter vectors could produce identical outcomes, complicating model interpretation and optimization.

Mathematically, if $X\in \mathbb{R}^{n \times m}$ is a design matrix and $\text{rank}(X) < m \leq n$, there exists at least one $v\in\mathbb{R}^m$ such that $X \cdot v = 0$. This implies that if a weight vector $w \in \mathbb{R}^m$ maximizes the model likelihood, then $w + \alpha v$ also maximizes it for every $\alpha \in \mathbb{R}$:

$$ \mathcal{L}(y | X, w) = \log p(y | f(X \cdot w)) = \log p(y | f(X \cdot (w + \alpha v))) = \mathcal{L}(y | X, w + \alpha v) $$

This shows that any weight vector of the form $w + \alpha v$ yields the same likelihood, leading to non-unique solutions.
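
As a concrete illustration (a minimal NumPy sketch with toy numbers, not NeMoS code), a rank-deficient design admits a null-space vector $v$, and shifting the weights along $v$ leaves the linear predictor, and hence the likelihood, unchanged:

    import numpy as np

    # Toy rank-deficient design: the third column is the sum of the
    # first two, so rank(X) = 2 < 3 columns.
    X = np.array([[1., 2., 3.],
                  [2., 4., 6.],
                  [0., 1., 1.],
                  [1., 3., 4.]])
    assert np.linalg.matrix_rank(X) == 2

    # A null-space vector: X @ v == 0.
    v = np.array([1., 1., -1.])
    assert np.allclose(X @ v, 0.)

    # The linear predictor is identical for w and w + alpha * v,
    # so both weight vectors attain the same likelihood.
    w = np.array([0.5, 1.0, -2.0])
    alpha = 3.7
    assert np.allclose(X @ w, X @ (w + alpha * v))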

How are the constraints applied in NeMoS

The way we currently apply these constraints is by throwing away columns of the design matrix until full rank is obtained. This happens within Basis when a flag is set at initialization.
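
As a rough sketch of what throwing away columns means (an illustration of the strategy, not the actual NeMoS implementation), one can greedily keep only the columns that increase the rank:

    import numpy as np

    def drop_redundant_columns(X):
        # Keep a column only if it raises the rank of the columns kept
        # so far; linearly dependent columns are discarded.
        keep = []
        for j in range(X.shape[1]):
            if np.linalg.matrix_rank(X[:, keep + [j]]) == len(keep) + 1:
                keep.append(j)
        return X[:, keep]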

What's the problem with this approach

The resulting behavior of the basis transform may be confusing and cause errors when used in pipelines:

  1. Calling compute_features on different inputs may result in a different number of output features, since the rank of the output design matrix depends on the input (see the sketch after this list).
  2. Basis methods for splitting the feature axis of a composite basis into individual components (currently work in progress) rely on the number of output features being consistent and may fail.
  3. Regularized models are always full rank, but the basis is unaware of which model is applied after the features are constructed, and would still apply the constraint to the design matrix.
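
Point 1 can be made concrete with the sketch above: two inputs whose feature matrices happen to have different rank come out with a different number of columns.

    # Uses drop_redundant_columns from the sketch above.
    import numpy as np

    X_a = np.array([[1., 1., 0.],
                    [1., 0., 1.]])   # rank 2: one column gets dropped
    X_b = np.array([[1., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 1.]])   # rank 3: all columns survive

    print(drop_redundant_columns(X_a).shape[1])  # 2
    print(drop_redundant_columns(X_b).shape[1])  # 3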

Solution

It is still valuable to provide a function that enforces identifiability constraints, but it should probably be implemented as a dedicated utility function, to be called after basis.compute_features and before glm.fit.

This will keep the behavior of basis consistent, and will make it more explicit when the user is applying the constraint.
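
A sketch of the proposed workflow; apply_identifiability_constraints is a hypothetical utility name (it does not exist in NeMoS yet), and the basis and GLM calls follow the current API as I understand it:

    import numpy as np
    import nemos as nmo

    def apply_identifiability_constraints(X):
        # Hypothetical utility: drop redundant columns explicitly,
        # outside of Basis (same greedy rule as the sketch above).
        keep = []
        for j in range(X.shape[1]):
            if np.linalg.matrix_rank(X[:, keep + [j]]) == len(keep) + 1:
                keep.append(j)
        return X[:, keep]

    x = np.random.uniform(size=1000)       # example input
    y = np.random.poisson(1.0, size=1000)  # example spike counts

    basis = nmo.basis.BSplineBasis(n_basis_funcs=5)
    X = basis.compute_features(x)          # output width stays consistent
    X = apply_identifiability_constraints(np.asarray(X))
    model = nmo.glm.GLM().fit(X, y)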

Additional Note

The identifiability constraint changes how the parameters should be interpreted; we need to document that.
Imagine a one-hot encoded trial structure with 4 trials, the first two of type A and the last two of type B. The encoding matrix would be

      [[1., 0.],
       [1., 0.],
       [0., 1.],
       [0., 1.]]

Fitting a GLM with this design would return two weights: the first can be interpreted as the effect of trial type A on the rate, and the second as that of trial type B.

Adding an intercept would make the design low rank, since it is equivalent to this design

      [[1., 1., 0.],
       [1., 1., 0.],
       [1., 0., 1.],
       [1., 0., 1.]]

and the columns are linearly dependent.
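
The dependence is easy to verify:

    import numpy as np

    X = np.array([[1., 1., 0.],
                  [1., 1., 0.],
                  [1., 0., 1.],
                  [1., 0., 1.]])
    # The intercept column equals the sum of the two indicator columns.
    assert np.allclose(X[:, 0], X[:, 1] + X[:, 2])
    assert np.linalg.matrix_rank(X) == 2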

Applying an identifiability constraint by throwing away the last column results in the following

      [[1., 1.],
       [1., 1.],
       [1., 0.],
       [1., 0.]]

And now the interpretation of the weights changes: the first weight is the baseline rate, and the second weight is the offset of trial type A relative to the baseline.

The trial rates are still fully captured by the model: to predict the rate for trial type A, one must pass the array [[1, 1]] as input to predict; for trial type B, [[1, 0]]. What changes is how one interprets the weights.
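
A minimal numeric illustration of this interpretation (identity link for simplicity, weights chosen arbitrarily):

    import numpy as np

    # Weights for the constrained design:
    # [baseline rate, offset of trial type A relative to baseline].
    w = np.array([2.0, 0.5])

    rate_A = np.array([[1., 1.]]) @ w  # baseline + type-A offset -> 2.5
    rate_B = np.array([[1., 0.]]) @ w  # baseline alone           -> 2.0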

This discussion, which may be bread and butter to statisticians, may not be obvious to every neuroscientist, and should be highlighted somewhere in the docs.
