Adding the separate and surrogate regression method classes #379

Merged
merged 399 commits into from
Apr 17, 2024

Conversation

LHBO
Collaborator

@LHBO LHBO commented Feb 22, 2024

In this pull request, we add the option to use the separate and surrogate regression method classes discussed in Olsen et al. (2023). We use the packages in the tidymodels framework to specify the models, tune hyperparameters, and fit them.

Key points in this pull request are the following:

  1. A detailed vignette explains the method classes and all the points below. It also includes several comparisons with the Monte Carlo-based methods, in which the regression-based methods perform better (even without CV). This is not a proper simulation study, but it illustrates that the regression-based methods are very competitive in both computation time and accuracy as measured by the MSEv criterion.
  2. We introduce the separate regression method class to shapr.
  3. We introduce the surrogate regression method class to shapr.
  4. The regression models (both separate and surrogate) can be any regression model from the parsnip/tidymodels package.
  5. Cross-validation allows easy tuning of any parameter in the regression models. The possible hyperparameter values are specified by the user in a data frame or as the output of a user-specified function (in situations where the allowed values change based on the number of features included in the coalitions). The user gets feedback from the CV procedure when verbose = 2.
  6. The data can be pre-processed before being sent to the model using recipes. The user can provide a function that modifies the recipe object.
  7. Users can create/specify their own regression models for the regression method classes to use.
  8. Extend shapr to support explaining any tidymodels model fitted using the workflow procedure; see the vignette.
  9. The separate regression models can be trained in parallel using future as usual in shapr. The cross-validation procedure of the surrogate regression methods can be parallelized using the same package. The future package also parallelizes the prediction step for both the separate and surrogate regression methods.
  10. Made changes to Python code: see list below.
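
To make points 2 and 9 concrete, here is a toy sketch of the separate regression idea: one regression model is fitted per coalition S to estimate v(S) = E[f(x) | x_S], and each fit is independent of the others, which is why they can run in parallel. This is not shapr's implementation. The k-nearest-neighbour regressor stands in for a parsnip model, and `knn_predict` and `separate_regression_vS` are hypothetical names. As a simplifying assumption, the empty coalition uses the mean training prediction and the full coalition is also estimated by regression.

```python
from itertools import combinations


def knn_predict(train_rows, train_targets, query, k=2):
    """Toy regression model: mean target of the k nearest training rows."""
    order = sorted(
        range(len(train_rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], query)),
    )
    nearest = order[:k]
    return sum(train_targets[i] for i in nearest) / len(nearest)


def separate_regression_vS(x_train, f_train, x_explain, k=2):
    """Estimate v(S) for every coalition S with one regression fit per S.

    x_train   : list of feature rows
    f_train   : predictions f(x) of the model being explained on x_train
    x_explain : the single observation to explain
    """
    n_features = len(x_explain)
    v = {}
    for size in range(n_features + 1):
        for S in combinations(range(n_features), size):
            if not S:
                # Assumption: empty coalition = mean training prediction.
                v[S] = sum(f_train) / len(f_train)
                continue
            # Keep only the features in S, then regress f(x) on them.
            rows = [[row[j] for j in S] for row in x_train]
            query = [x_explain[j] for j in S]
            v[S] = knn_predict(rows, f_train, query, k)
    return v
```

The surrogate class differs in that a single model is fitted on an augmented data set covering all coalitions, instead of one model per coalition.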

We have also made several changes to the Python version of explain() to make the methods work in Python.

  1. Made regression_separate and regression_surrogate usable from Python.
  2. Fixed indentation.
  3. Specify where .utils is loaded from.
  4. We added **kwargs to explain() so that we can actually send approach-specific arguments. These must have the form approach_parameter_name in Python, but we added code to translate them to the R syntax approach.parameter_name.
  5. Added documentation and some comments.
  6. Refactored compute_vS() so that it branches on whether an MC-based or a regression-based method is used.
  7. As Python and rpy2 do not support tidymodels, we remove all objects related to regression in the internal list before converting them to Python objects and returning them.
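
The name translation in point 4 can be sketched as follows. This is a minimal illustration, not the code in the PR: `translate_kwargs` is a hypothetical helper, and it assumes that only the first underscore separates the approach prefix from the parameter name, so later underscores are kept.

```python
def translate_kwargs(kwargs):
    """Rename Python-style 'approach_parameter_name' keys to R-style
    'approach.parameter_name' keys.

    Assumption: only the FIRST underscore is replaced by a dot; keys
    without an underscore are passed through unchanged.
    """
    translated = {}
    for key, value in kwargs.items():
        prefix, sep, rest = key.partition("_")
        if sep and prefix and rest:
            translated[f"{prefix}.{rest}"] = value
        else:
            translated[key] = value
    return translated


# Example: a Python-style argument and its R-style counterpart.
r_args = translate_kwargs({"regression_tune_values": [1, 2, 3]})
print(r_args)
```

A real implementation would probably restrict the renaming to known approach prefixes, so that unrelated keyword arguments containing underscores are left alone.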

martinju and others added 30 commits February 15, 2024 14:31
for speed-up + such that pkgdown don't need the torch stuff
… This is to reduce the .rds save files size and to remove time dependent random path names
…to creatining fig/cache in separate sub folders
@LHBO LHBO changed the title Separate regression approaches Adding the separate and surrogate regression method classes Apr 14, 2024
Member

@martinju martinju left a comment

Looks good. See some comments. I will look at the vignettes (and recompile them) separately.

R/explain.R (outdated, resolved)
python/requirements.txt (outdated, resolved)
python/shaprpy/explain.py (outdated, resolved)
Comment on lines 98 to 101
kwargs: Further arguments passed to specific approaches. See R-documentation for more information about the
approach specific arguments (https://cran.r-project.org/web/packages/shapr/shapr.pdf). Note that the parameters
in R are called 'approach.parameter_name', but in Python the equivalent would be 'approach_parameter_name'.
# TODO: discuss with martin how to get the descriptions passed here. Because we cannot use @inheritDotParams.
Member

First, I think we should refer to the pkgdown website for documentation.

To refer more easily to the ... parameters in Python, we could create a new function in R, called tripledot_docs or something like that, which takes ... as input and uses @inheritDotParams explain in its documentation. Then we can link directly to that function from this documentation (still noting that dots must be replaced by underscores).

Collaborator Author

Added this function and updated the Python documentation.

Member

So you have moved from 2-space indentation to 4? I am OK with that change, but making it here makes it hard to tell actual code changes from formatting changes. I suggest reverting it to ease code review. I am sure there is a quick tool for that.

Collaborator Author

Four spaces is the PEP 8 standard. I am going back to the non-standard formatting for now; we need to decide what to do going forward in a separate PR.

vignettes/understanding_shapr.Rmd.orig (outdated, resolved)
Collaborator Author

LHBO commented Apr 16, 2024

I have made all the alterations requested in the previous review, and I have built the vignettes.

@martinju martinju merged commit 5b15935 into NorskRegnesentral:master Apr 17, 2024
7 checks passed