Skip to content
This repository has been archived by the owner on Dec 6, 2023. It is now read-only.

Forward pass adds same linear basis function #195

Open
CherylCB opened this issue Mar 13, 2019 · 5 comments
Open

Forward pass adds same linear basis function #195

CherylCB opened this issue Mar 13, 2019 · 5 comments
Labels

Comments

@CherylCB
Copy link

CherylCB commented Mar 13, 2019

I'm trying to run a pyearth model with enable_pruning = False and only linear features with a max_degree of 1. I noticed that the same feature is added twice. See results below:

-------------------------------------------------------------------------
iter  parent  var  knot  mse            terms  gcv         rsq    grsq   
-------------------------------------------------------------------------
0     -       -    -     124390.433112  1      125669.504  0.000  0.000  
1     0       11   -1    58269.498543   2      60407.652   0.532  0.519  
2     0       12   -1    46309.037751   3      49280.000   0.628  0.608  
3     0       9    -1    43489.795470   4      47522.247   0.650  0.622  
4     0       10   -1    40593.709662   5      45564.586   0.674  0.637  
5     0       8    -1    38274.623612   6      44146.607   0.692  0.649  
6     0       1    -1    36957.981690   7      43820.303   0.703  0.651  
7     0       7    -1    34190.204492   8      41688.582   0.725  0.668  
8     0       12   -1    33603.237796   9      42151.901   0.730  0.665  
-------------------------------------------------------------------------

I noticed issue #135 which seems to suggest the same bug, addressed by @jcrudy .
I'm installing from the latest commit git+https://github.com/scikit-learn-contrib/py-earth.git@b209d1916f051dbea5b142af25425df2de469c5a#egg=sklearn-contrib-py-earth

@jcrudy
Copy link
Collaborator

jcrudy commented Mar 14, 2019

@CherylCB Thanks for reporting this. I will look into it as soon as I can, which might be a while. It's annoying, but if this bug is causing problems for you the best workaround might be to add your own code to prune any duplicate basis functions. If you want to do that and need guidance on where to start, please comment here and I can elaborate.

It seems you're using py-earth essentially as a variable selection mechanism for linear regression. Is that right? If so, you could also just extract the set of selected variables and use them with sklearn.LinearRegression.

@jcrudy jcrudy added the bug label Mar 14, 2019
@CherylCB
Copy link
Author

Thanks for your answer @jcrudy. For now indeed I have added my own code to workaround the problem of adding duplicate basis functions.

@jcrudy
Copy link
Collaborator

jcrudy commented Mar 20, 2019

@CherylCB Glad you were able to work around this bug. Please feel free to post code for your workaround in this thread if it's shareable. It might help someone out later. I'll be leaving this issue open until it's fixed.

@CherylCB
Copy link
Author

@jcrudy I have some time coming days to work on this bug, do you have any suggestions on what would be the first place for me to look?

@jcrudy
Copy link
Collaborator

jcrudy commented Mar 22, 2019

@CherylCB That's great. I'll take all the help I can get. Going off of the current master, I'd start by looking here. As you'll see, I wrote some special code to try to prevent exactly what you are seeing. One of two possible things is probably happening:

  1. The has_linear method in that line is not correct. The has_linear method is defined as part of the BasisFunction abstract class, which you'll find in _basis.pyx. It calls linear_in , which has different implementations for the different subclasses. Possibly something in this logic is wrong.

  2. The forward pass is not using variable_can_be_linear correctly in some specific situations. If this is what's happening, you'll have to go over the logic of the forward pass and see where the linear basis function is being added in your example, then figure out how to prevent it from happening. If you find any clues here they could be very helpful for me, even if you're not sure how to fully solve the problem.

You're actually in a good position to figure this out, since you have an example data set that shows the problem. I'd suggest you try to debug what's happening in the forward pass using a script that fits a model to your data set. Unfortunately, it's hard to set up a debugger to work with cython. Perhaps you're more skilled than me in this area, but if not I suggest you just use print statements in the cython files.

The workflow is something like this:

  1. Clone the py-earth repo
  2. cd into the repo and build the code using:
    python setup.py build_ext --inplace --cythonize
  3. cd back out of the repo. Put your test script in the same directory as the repo (not inside the repo)
  4. Make sure your python environment doesn't have py-earth installed.
  5. Now when you run your script, it should import pyearth from the repo.
  6. Whenever you change a cython file (such as to add print statements, for example), you'll need to run the command from step 2 again.

If you have any problems, don't hesitate to get in touch. You can reply here or email me (my address is on my github profile). You're potentially saving me a lot of time by working on this, so of course I'm very happy to spend some time helping you succeed at it. Good luck!

verycherry pushed a commit to verycherry/py-earth that referenced this issue Mar 29, 2019
verycherry pushed a commit to verycherry/py-earth that referenced this issue Mar 29, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants