Seeking advice for fitting a reduced system. Is it always feasible? #576
dariosannino asked this question in Q&A (unanswered).
Hello,
I would like to ask the community for help with using pysindy properly. First, a bit of context:
Let's say a system is made up of 300 interacting signals (in our case, 300 time series), but we reduce it to 100 signals because of spatial recording limits, or because we consider those 100 to lie in the most critical regions. Among these 100 time series I can select around n of them (around 20), because I am mostly interested in those signals and because I can see that they account for most of the variance.
My first idea was to fit these time series to understand how they are interdependent, and I tried to do that with a polynomial library. But since that becomes cumbersome, I preferred to move to a reduced space of 3 state variables by applying PCA to the 20 time series.
So, let's say that in the end we have just 3 signals, and we try to leverage pysindy to understand how their dynamics are coupled.
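As a sketch of the reduction step I describe (the data here is synthetic and stands in for the 20 selected time series, stacked column-wise):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for the 20 selected time series: shape (n_samples, 20).
X = rng.standard_normal((1000, 20))

# Project onto 3 principal components to obtain the 3 reduced state variables.
pca = PCA(n_components=3)
Z = pca.fit_transform(X)  # shape (1000, 3): the reduced signals fed to pysindy
print(Z.shape, pca.explained_variance_ratio_.sum())
```

On real, correlated signals the cumulative `explained_variance_ratio_` tells you how much of the 20-series variance the 3 modes actually retain.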
The first thing I did was to smooth my signals a bit and compute their "ground truth" derivatives, so that pysindy could go through a more robust fitting process. I applied very strong smoothing here, hoping to make pysindy's life easier.
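The smoothing-plus-derivative step can be done in one pass with a Savitzky–Golay filter, for example (synthetic signal below; the window length and polynomial order are illustrative assumptions, not my actual settings):

```python
import numpy as np
from scipy.signal import savgol_filter

dt = 0.01
t = np.arange(0, 10, dt)
# Synthetic noisy signal standing in for one reduced state variable.
x = np.sin(t) + 0.05 * np.random.default_rng(1).standard_normal(t.size)

# Strong Savitzky-Golay smoothing...
x_smooth = savgol_filter(x, window_length=51, polyorder=3)
# ...and a smoothed first derivative from the same local polynomial fit.
x_dot = savgol_filter(x, window_length=51, polyorder=3, deriv=1, delta=dt)
```

Passing `x_dot` explicitly to `model.fit(x, t=dt, x_dot=x_dot)` is then what lets pysindy skip its own numerical differentiation.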
Then I prepared everything for the fitting process:
I plotted my Pareto curve to determine which value of `threshold` would be most appropriate for the `STLSQ` optimizer. As you may notice, I used a very low value for the `alpha` parameter, because I saw by hand that that range of values got close to a reasonable fit when calling the `simulate()` function. This means we are not actually penalizing large coefficients much, so we potentially risk overfitting; I hope this is not a big mistake.

Based on my Pareto curve, I decided to take `threshold=0.01` for my optimizer. Here is the main fitting code, which gives a score on derivatives of 0.6685255985963231. I have also tried ensemble methods, but I guess they had little effect because my signals were already smoothed.
Now we come to the extracted equations. These can be pretty long, suggesting that the threshold is very low and the polynomial degree perhaps too high, which may cause problems with interpretability or overfitting. However, both lowering the polynomial degree and increasing the threshold often gave me worse results.
For the current model I checked whether there is an inherent instability in its linear part, to understand whether I might need to enforce additional constraints on the model to improve stability.
Here the output suggesting stability:
Linear terms of the Coefficient Matrix:
[[ 0.04212236 -0.40978422 0.58158849]
[ 0.05307033 -0.09734764 0. ]
[-0.02492854 -0.01981616 -0.01571323]]
Eigenvalues of linear term matrix: [-0.00054895+0.18148227j -0.00054895-0.18148227j -0.0698406 +0.j ]
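For reference, this stability check only needs the linear block of the coefficient matrix and plain numpy (values copied from the output above):

```python
import numpy as np

# Linear part of the identified model (columns x0, x1, x2 of the
# coefficient matrix, values from the output above).
A = np.array([[ 0.04212236, -0.40978422,  0.58158849],
              [ 0.05307033, -0.09734764,  0.        ],
              [-0.02492854, -0.01981616, -0.01571323]])

eigvals = np.linalg.eigvals(A)
# The linearization is stable if every eigenvalue has negative real part.
print(eigvals, np.all(eigvals.real < 0))
```

Note that the first pair sits at real part ≈ -0.0005, i.e. only marginally stable, so the nonlinear terms can still easily push trajectories off.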
Let's now get to the last two points: checking the derivative fit and the simulation.

I checked my derivative fit as follows. Here are my results, which I guess are not really promising (compared to the examples I saw for the Lorenz system). That suggests to me that something goes wrong as errors accumulate:
Here is what I get when simulating my system starting from the first point of my time series. When I instead start from another initial condition, there seems to be an instability:

```
overflow encountered in reduce
ValueError: Input X contains infinity or a value too large for dtype('float64').
```

So it seems that errors accumulate much faster from such an initial condition, probably due to the high polynomial degree; still, my impression was that decreasing the degree sometimes gave the same type of error anyway.
What seem to be critical here are the `threshold`, `alpha`, and `degree` parameters.

P.S. I am aware that a custom library could be developed. So far I have tried concatenating and tensoring two libraries together (the polynomial and Fourier ones), but I did not obtain good results. In particular, the Fourier terms interacted badly and required scaling of my signals, as far as I remember. Therefore I decided to stick with polynomials, trying to mimic the idea of the Lorenz example, but I don't know whether my case is less trivial or whether I am making some mistakes.
I hope someone can help.