SHAP and orderedfeature minimizers #57
base: main
Conversation
```diff
@@ -13,4 +13,7 @@
 It is also possible to export the generalizations as feature ranges.
 """
-from apt.minimization.minimizer import GeneralizeToRepresentative
+from .minimizer import GeneralizeToRepresentative
```
Please use absolute imports instead of relative
```python
    def importance_ordered_features(self):
        return self._importance_ordered_features

    def _get_ordered_features(self, estimator, encoder, X_train, y_train, numerical_features, categorical_features,
```
Looks like the order is only computed when calling an internal method (begins with _). Please change this to be an external method (with documentation), and if it receives X and y data, it would be better to call it fit.
This method is a hook that is called from the fit method of the superclass (OrderedFeatureMinimizer). It is a protected method that should be overridden to define a custom order to prune by during fit.
As implemented, the order is accessible through a property only after fit() is called; otherwise the property returns None. The property has a different name in each class (here it is importance_ordered_features). I could add it to OrderedFeatureMinimizer as a generic property called features_order if you believe that is more suitable.
Ok, I understand. I suggest having a consistent name for the property. Also, at the moment the fit code does not set self._ordered_features, which I think it should (instead of setting a local variable).
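For reference, a minimal sketch of the pattern being discussed, using the generic features_order naming suggested above (the simplified signatures and bodies are placeholders, not the PR's actual code):

```python
class OrderedFeatureMinimizer:
    def __init__(self, estimator=None):
        self._estimator = estimator
        self._ordered_features = None

    def _get_ordered_features(self, estimator, X_train, y_train):
        # Hook: subclasses override this to define the pruning order.
        raise NotImplementedError

    def fit(self, X, y=None):
        # ... train per-feature decision trees, etc. ...
        # Store the order on self (rather than in a local variable) so it
        # is reachable through the property after fit().
        self._ordered_features = self._get_ordered_features(self._estimator, X, y)
        return self

    @property
    def features_order(self):
        # Returns None until fit() has been called.
        return self._ordered_features
```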
```diff
@@ -66,7 +66,7 @@ def __init__(self, estimator: Union[BaseEstimator, Model] = None, target_accurac
                  encoder: Optional[Union[OrdinalEncoder, OneHotEncoder]] = None,
                  features_to_minimize: Optional[Union[np.ndarray, list]] = None,
                  train_only_features_to_minimize: Optional[bool] = True,
-                 is_regression: Optional[bool] = False):
+                 is_regression: Optional[bool] = False, accuracy_score: Callable = None):
```
Please add this new parameter to the class docstring, stating clearly what is expected of this callable (inputs and outputs).
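For example, something like this (a sketch only; the exact contract is an assumption modeled on sklearn-style metrics):

```python
:param accuracy_score: a callable with the signature
    accuracy_score(y_true, y_pred) -> float, where y_true and y_pred are
    array-like of equal length and the returned float is the model quality
    (higher is better), e.g. sklearn.metrics.accuracy_score. When None,
    a default metric is used.
```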
```python
# TODO: use mixins correctly
class OrderedFeatureMinimizer:  # (BaseEstimator, MetaEstimatorMixin, TransformerMixin):
```
Please add a docstring for the class and all external (public) methods.
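As a starting point, something in this spirit (the wording is only a suggestion drawn from the PR description):

```python
class OrderedFeatureMinimizer:
    """
    Minimizes (generalizes) input data by greedily pruning per-feature
    decision trees in a given feature order, while keeping the estimator's
    accuracy above the target threshold.
    """
```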
```python
            X[:, indices] = cls._transform_categorical_feature(dts[feature].tree_, X[:, indices],
                                                               generalizations_arrays[feature], depths[feature], 0)

    @classmethod
```
I think we agreed there is no need for class methods. These can be changed to static methods or moved out of the class to some utils file.
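i.e., something like this (illustrative; the leaf base case is my addition for completeness, based on the recursive step shown below):

```python
class OrderedFeatureMinimizer:
    # A staticmethod instead of a classmethod: cls was only used for the
    # recursive calls, which can go through the class name instead.
    @staticmethod
    def _calculate_tree_depth(dt, node_id):
        tree = dt.tree_
        if tree.children_left[node_id] == -1:  # leaf node in sklearn trees
            return 0
        return 1 + max(
            OrderedFeatureMinimizer._calculate_tree_depth(dt, tree.children_left[node_id]),
            OrderedFeatureMinimizer._calculate_tree_depth(dt, tree.children_right[node_id]))
```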
```python
        return 1 + max(cls._calculate_tree_depth(dt, dt.tree_.children_left[node_id]),
                       cls._calculate_tree_depth(dt, dt.tree_.children_right[node_id]))

    def fit(self, X: pd.DataFrame, y=None):
```
Please add a docstring, and state the assumptions about the data (order of features etc.). Also, is it necessary that the input be a DataFrame? Could we use the generic data wrappers (e.g., ArrayDataset) instead?
Please add a type annotation for y.
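A possible shape for that docstring (a sketch; the pd.Series type for y and the column-order assumption are taken from the review comments, not from the code):

```python
    def fit(self, X: pd.DataFrame, y: pd.Series = None):
        """
        Learn the generalizations from training data.

        Assumes the columns of X match the feature names and order given
        at construction time.

        :param X: training samples.
        :param y: target values for X; needed to measure accuracy during
            pruning.
        """
```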
```python
from sklearn.model_selection import train_test_split


class ShapMinimizer(OrderedFeatureMinimizer):
```
Docstrings are missing here as well.
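For example (the wording is only a suggestion based on the PR description):

```python
class ShapMinimizer(OrderedFeatureMinimizer):
    """
    Data minimizer that prunes per-feature decision trees in order of
    global SHAP feature importance.
    """
```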
```python
import pandas as pd
import numpy as np
from typing import List, Dict, Union
from . import OrderedFeatureMinimizer
```
Please use absolute import
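e.g. (assuming OrderedFeatureMinimizer is exported from apt.minimization, as the relative import implies):

```python
from apt.minimization import OrderedFeatureMinimizer
```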
```python
                             background_size, feature_indices: Dict[str, List[int]], random_state, nsamples):
        """
        Calculates global shap per feature.
        :param nsamples:
```
Missing param descriptions
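For example (a sketch; the descriptions are inferred from the parameter names and the shap KernelExplainer API, so please verify them against the implementation):

```python
:param background_size: number of background samples (e.g., k-means centers)
    used by the SHAP explainer to estimate the expected model output.
:param feature_indices: mapping from feature name to the list of encoded
    column indices it occupies (a one-hot feature spans several columns).
:param random_state: seed used to make sampling reproducible.
:param nsamples: number of model evaluations per explained instance,
    passed through to the explainer.
```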
```diff
@@ -0,0 +1,105 @@
+import numpy as np
```
Please remove this file (which includes the Benchmark class). Instead, add some unit tests to the tests directory with appropriate asserts (see examples in test_minimizer.py).
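A minimal example of what such a test could look like (a sketch only; the constructor arguments, import path, and the 0.8 threshold are illustrative, not the project's actual fixtures):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

from apt.minimization import ShapMinimizer  # assumed export path


def test_shap_minimizer_keeps_target_accuracy():
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

    minimizer = ShapMinimizer(estimator=model, target_accuracy=0.8)
    minimizer.fit(pd.DataFrame(X_train), y_train)
    X_generalized = minimizer.transform(pd.DataFrame(X_test))

    # The generalized data should still let the model meet the target.
    assert accuracy_score(y_test, model.predict(X_generalized)) >= 0.8
```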
```python
from sklearn.tree._tree import Tree


# TODO: use mixins correctly
```
Please remove TODOs and unused params (unless you still plan to add this).
…ed assumptions. * Added general important documentation (not everything is documented). * Explainer is commented out. Instead there is a placeholder so we could continue implementation.
* Added implementations * Added splits in fit method
…ethodology, and a little bug in categorical feature transformation.
Implemented generalizations partially (some helpers) - not tested
Models now depend on random state to create reproducible results.
* Cleaned pruning code and moved to a class method. * Pruning now only calls transform once per iteration. It saves data from previous iteration to reverse a generalization. * transform() now depends on generalizations.
* Run experiment on np.arange(0, .8, 0.1). The original minimizer seems not to work past 0.8 target accuracy (or takes too long).
* Fixed ShapMinimizer to only take true label values into account * Added MLP to comparison.ipynb
* level can reach 0 in pruning * discard category is out of the stopping condition * discard category is given by split not by majority
* Implemented DT Importance minimizer * Minimizers now support different accuracy metrics (supplied at init) * SHAP minimizer now makes use of kmeans instead of random samples.
An implementation of the SHAP minimizer and the more general OrderedFeatureMinimizer class, which greedily prunes per-feature decision trees based on feature importance values in order to achieve generalizations.
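As a toy illustration of that greedy strategy (this sketch fully suppresses each feature instead of pruning its decision tree, so it is a simplification of the idea, not the PR's implementation):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

target_accuracy = 0.9
order = np.argsort(model.feature_importances_)  # least important first

X_general = X_val.copy()
for feature in order:
    candidate = X_general.copy()
    # Crudest possible generalization: collapse the feature to one value.
    candidate[:, feature] = np.median(X_train[:, feature])
    if model.score(candidate, y_val) >= target_accuracy:
        X_general = candidate  # keep the generalization
    # otherwise discard the candidate, i.e. reverse the generalization
```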