This repository has been archived by the owner on Dec 6, 2023. It is now read-only.
Hi,
I'm using py-earth with multicolumn input and multicolumn output, with the pruning pass enabled. I ran into a problem where the selected iteration sometimes doesn't appear to come out right. As I understand it, the selected iteration should be the one with the minimum GCV among all iterations in the pruning pass. Sometimes, however, the selected iteration comes out as 0, which results in a large model, even though that is not the iteration with the minimum GCV.
I ran some tests as shown below. It seems it might have to do with multicolumn output?
Thanks!
Charles
import numpy as np
from pyearth import Earth
# Generate data
np.random.seed(0)
X = np.random.rand(1000,4)
y = np.cos(np.exp(2 * X[:, [1, 0, 3, 2]] + X[:, [2, 3, 0, 1]]**2))
# This is a minimal example I've found where the problem occurs
mars = Earth(max_degree=1, max_terms=100, verbose=2)
mars.fit(X, y[:, [0, 1]])
# The selected iteration is also reflected in the attribute gcv_
print(mars.gcv_)
# and in summary().
# First, we see that no term was pruned.
# Secondly, we see that the coefficients are huge because a large model was selected
print(mars.summary())
0.38522523845256895
Earth Model
-------------------------------------------------------
Basis Function Pruned Coefficient 0 Coefficient 1
-------------------------------------------------------
(Intercept) No -5.74346e+13 -1.95383e+13
h(x1-0.330441) No 9.04795e+12 3.07796e+12
h(0.330441-x1) No -9.04795e+12 -3.07796e+12
h(x0-0.318403) No 7.67987e+12 2.61257e+12
h(0.318403-x0) No -7.67987e+12 -2.61257e+12
h(x1-0.922348) No -9.04795e+12 -3.07796e+12
h(0.922348-x1) No 9.04795e+12 3.07796e+12
h(x3-0.622968) No 0.173292 0.543856
h(0.622968-x3) No -0.0658372 0.415127
h(x0-0.838227) No -8.21228e+12 -2.79368e+12
h(0.838227-x0) No 8.21228e+12 2.79368e+12
h(x0-0.552192) No 5.32415e+11 1.81119e+11
h(0.552192-x0) No -5.32415e+11 -1.81119e+11
x2 No 2.74186e+13 9.32734e+12
h(x2-0.663457) No -6.57405e+12 -2.23638e+12
h(0.663457-x2) No 6.57405e+12 2.23638e+12
h(x2-0.343067) No 3.22098e+12 1.09572e+12
h(0.343067-x2) No -3.22098e+12 -1.09572e+12
h(x2-0.79084) No -1.04684e+13 -3.56118e+12
h(0.79084-x2) No 1.04684e+13 3.56118e+12
h(x2-0.0910697) No 1.09251e+13 3.71653e+12
h(0.0910697-x2) No -1.09251e+13 -3.71653e+12
h(x2-0.851198) No -1.23137e+13 -4.18892e+12
h(0.851198-x2) No 1.23137e+13 4.18892e+12
h(x2-0.757364) No -9.44498e+12 -3.21303e+12
h(0.757364-x2) No 9.44498e+12 3.21303e+12
h(x2-0.15454) No 8.98467e+12 3.05644e+12
h(0.15454-x2) No -8.98467e+12 -3.05644e+12
h(x2-0.0710361) No 1.15376e+13 3.92489e+12
h(0.0710361-x2) No -1.15376e+13 -3.92489e+12
h(x2-0.867167) No -1.28019e+13 -4.355e+12
h(0.867167-x2) No 1.28019e+13 4.355e+12
h(x2-0.671516) No -6.82041e+12 -2.32019e+12
h(0.671516-x2) No 6.82041e+12 2.32019e+12
h(x2-0.6502) No -6.16875e+12 -2.09851e+12
h(0.6502-x2) No 6.16875e+12 2.09851e+12
h(x2-0.175276) No 8.35073e+12 2.84078e+12
h(0.175276-x2) No -8.35073e+12 -2.84078e+12
h(x2-0.639622) No -5.84536e+12 -1.9885e+12
h(0.639622-x2) No 5.84536e+12 1.9885e+12
-------------------------------------------------------
MSE: 0.3131, GCV: 0.3852, RSQ: 0.2765, GRSQ: 0.1115
# Same with max_degree=2
mars = Earth(max_degree=2, max_terms=100, verbose=2)
mars.fit(X, y[:, [0, 1]])
# There appears to be no issue if the output only has one column (I checked the other columns too)
mars = Earth(max_degree=1, max_terms=100, verbose=2)
mars.fit(X, y[:, 0])
# Now, let's look at the effect of training size
# Somehow there's no problem with 490 examples
mars = Earth(max_degree=1, max_terms=100, verbose=2)
mars.fit(X[:490], y[:490, [0, 1]])
# Adding just one example (thus 491 examples), problem occurs...
mars = Earth(max_degree=1, max_terms=100, verbose=2)
mars.fit(X[:491], y[:491, [0, 1]])
Earth Model
-------------------------------------------------------
Basis Function Pruned Coefficient 0 Coefficient 1
-------------------------------------------------------
(Intercept) No 3.98638e+13 -2.37968e+13
h(x1-0.333965) No -3.66554e+12 2.18815e+12
h(0.333965-x1) No 3.66554e+12 -2.18815e+12
h(x0-0.317983) No -3.22491e+12 1.92512e+12
h(0.317983-x0) No 3.22491e+12 -1.92512e+12
h(x3-0.945302) No -3.31348 13.2036
h(0.945302-x3) No -0.163574 0.157959
h(x0-0.845365) No 3.38884e+12 -2.02298e+12
h(0.845365-x0) No -3.38884e+12 2.02298e+12
h(x0-0.562066) No -1.6393e+11 9.78586e+10
h(0.562066-x0) No 1.6393e+11 -9.78586e+10
h(x1-0.918546) No 3.66554e+12 -2.18815e+12
h(0.918546-x1) No -3.66554e+12 2.18815e+12
x2 No -1.28533e+13 7.6728e+12
h(x2-0.718626) No 2.58545e+12 -1.54339e+12
h(0.718626-x2) No -2.58545e+12 1.54339e+12
h(x2-0.730709) No 2.73698e+12 -1.63385e+12
h(0.730709-x2) No -2.73698e+12 1.63385e+12
h(x2-0.739884) No 2.85204e+12 -1.70253e+12
h(0.739884-x2) No -2.85204e+12 1.70253e+12
h(x2-0.208877) No -3.80718e+12 2.27271e+12
h(0.208877-x2) No 3.80718e+12 -2.27271e+12
h(x2-0.107301) No -5.08101e+12 3.03312e+12
h(0.107301-x2) No 5.08101e+12 -3.03312e+12
h(x2-0.221161) No -3.65312e+12 2.18074e+12
h(0.221161-x2) No 3.65312e+12 -2.18074e+12
h(x2-0.983854) No 5.91159e+12 -3.52894e+12
h(0.983854-x2) No -5.91159e+12 3.52894e+12
h(x2-0.951874) No 5.51055e+12 -3.28954e+12
h(0.951874-x2) No -5.51055e+12 3.28954e+12
h(x2-0.919507) No 5.10464e+12 -3.04723e+12
h(0.919507-x2) No -5.10464e+12 3.04723e+12
h(x2-0.930126) No 5.23781e+12 -3.12673e+12
h(0.930126-x2) No -5.23781e+12 3.12673e+12
h(x2-0.962395) No 5.64248e+12 -3.3683e+12
h(0.962395-x2) No -5.64248e+12 3.3683e+12
h(x2-0.90315) No 4.8995e+12 -2.92477e+12
h(0.90315-x2) No -4.8995e+12 2.92477e+12
h(x2-0.778157) No 3.332e+12 -1.98905e+12
h(0.778157-x2) No -3.332e+12 1.98905e+12
h(x2-0.0741245) No -5.49707e+12 3.28149e+12
h(0.0741245-x2) No 5.49707e+12 -3.28149e+12
h(x2-0.271551) No -3.0212e+12 1.80351e+12
h(0.271551-x2) No 3.0212e+12 -1.80351e+12
h(x2-0.16026) No -4.41687e+12 2.63666e+12
h(0.16026-x2) No 4.41687e+12 -2.63666e+12
h(x2-0.193236) No -4.00332e+12 2.38979e+12
h(0.193236-x2) No 4.00332e+12 -2.38979e+12
h(x2-0.140316) No -4.66698e+12 2.78597e+12
h(0.140316-x2) No 4.66698e+12 -2.78597e+12
h(x2-0.766591) No 3.18696e+12 -1.90247e+12
h(0.766591-x2) No -3.18696e+12 1.90247e+12
-------------------------------------------------------
MSE: 0.2940, GCV: 0.5393, RSQ: 0.3169, GRSQ: -0.2481
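For anyone sanity-checking the summaries above, here is a minimal sketch of the GCV computation and of the selection rule I expected the pruning pass to follow. It is not py-earth's actual code: the formula (the usual MARS GCV, with effective parameters `n_terms + penalty * (n_terms - 1) / 2`) and the penalty of 3.0 are assumptions on my part, and the function names are hypothetical. With the printed MSE values, 40 terms on 1000 examples and 52 terms on 491 examples, it agrees with the reported GCV values to roughly three decimal places, so the GCV of the selected model looks internally consistent; the question is only why that iteration was selected.

```python
def effective_parameters(n_terms, penalty=3.0):
    """Effective parameter count for a model with n_terms basis functions.

    Assumption: penalty=3.0 mirrors the default used in the fits above.
    """
    return n_terms + penalty * (n_terms - 1) / 2.0


def gcv(mse, n_terms, n_samples, penalty=3.0):
    """Generalized cross-validation score (assumed standard MARS form)."""
    c = effective_parameters(n_terms, penalty)
    return mse / (1.0 - c / n_samples) ** 2


def select_iteration(gcvs):
    """Index of the pruning iteration with the smallest GCV (expected rule)."""
    return min(range(len(gcvs)), key=gcvs.__getitem__)


# Values taken from the two summaries above (MSE as printed, so already rounded).
print(gcv(0.3131, n_terms=40, n_samples=1000))  # compare with GCV: 0.3852
print(gcv(0.2940, n_terms=52, n_samples=491))   # compare with GCV: 0.5393

# Hypothetical GCV sequence: iteration 0 should not win here.
print(select_iteration([0.54, 0.41, 0.37, 0.39]))
```

If the pruning trace printed with `verbose=2` shows a smaller GCV at a later iteration than at iteration 0, then `select_iteration` applied to that column would disagree with what `fit` selected, which is the behavior reported here.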
@charleszxiong Thanks for reporting this. It definitely looks like a bug. I'll try to figure out what's going on as soon as I can. In the meantime, if you have any additional thoughts or observations, please post them here.
To anyone reading this, additional reports are welcome, as are pull requests.