
MNT Sklearn1.6 compatibility #447

Draft · wants to merge 19 commits into main

Conversation

@TamaraAtanasoska commented Nov 4, 2024

Reference Issues/PRs

Fixes #443. Not fully working yet.

What does this implement/fix? Explain your changes.

A few of the compatibility errors, including the one explained in #443, are now fixed. There are some external failures (errors happening outside of skops) that I am not sure how to fix. @adrinjalali, any tips? The remaining failures are listed below:

FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[GraphicalLassoCV(cv=3,max_iter=5)] - FloatingPointError: Non SPD result: the system is too ill-conditioned for this solver. The system is too ill-conditioned for this ...
FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[PassiveAggressiveClassifier(max_iter=5)] - AssertionError
FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[SGDClassifier(max_iter=5)] - AssertionError
FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[SGDOneClassSVM(max_iter=5)] - AssertionError

Any other comments?

This is my first PR to the project, so the fixes might be off.
I understood #443 to mean that I can remove all of the SGD handling from the _sklearn.py file when working with sklearn 1.6+. The tests pass that way; I am just not sure whether the change is supposed to be that sweeping.

@TamaraAtanasoska changed the title from "Sklearn1.6 compatibility" to "MNT Sklearn1.6 compatibility" on Nov 4, 2024
Comment on lines +45 to +46
from sklearn.utils._tags import get_tags
from sklearn.utils._test_common.instance_generator import _construct_instances
Member:

This only works in the new sklearn, though. We'd need to have wrappers in fixes.py to support both.

Author:

I will try to make everything pass and then work on the separation at the end; that might make more sense since there are more incompatibilities in the same file. We can leave this unresolved as a reminder.
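
A minimal sketch of what such a wrapper in fixes.py could look like for the get_tags import (the fallback branch and names are assumptions, not the actual skops implementation; _construct_instances would need a similar shim):

    # Sketch of a version-compatibility shim for fixes.py (assumed layout,
    # not the actual skops code). It exposes the attribute-style tags that
    # the tests use on both old and new scikit-learn.
    try:
        # scikit-learn >= 1.6
        from sklearn.utils._tags import get_tags
    except ImportError:
        # Older scikit-learn only has the legacy dict-based estimator tags;
        # wrap them so callers can keep writing ``get_tags(est).requires_fit``.
        from types import SimpleNamespace

        def get_tags(estimator):
            old_tags = estimator._get_tags()  # plain dict on older versions
            return SimpleNamespace(requires_fit=old_tags.get("requires_fit", True))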

@TamaraAtanasoska commented Nov 11, 2024

Opened an issue in quantile-forest: zillow/quantile-forest#103

@TamaraAtanasoska commented Nov 11, 2024

A question about the last few errors in the utils tests: the tests fail for a few estimators pulled from sklearn.utils.discovery.all_estimators that contain a Hinge loss (output below). I added some print output so it is a bit clearer. Is including the losses somehow necessary after all? (The same happens for SGDClassifier and PassiveAggressiveClassifier.)
@adrinjalali

________________________________________________________________________________ test_can_persist_fitted[SGDOneClassSVM(max_iter=5)] ________________________________________________________________________________

estimator = SGDOneClassSVM(max_iter=5, random_state=0)

    @pytest.mark.parametrize(
        "estimator", _tested_estimators(), ids=_get_check_estimator_ids
    )
    def test_can_persist_fitted(estimator):
        """Check that fitted estimators can be persisted and return the right results."""
        set_random_state(estimator, random_state=0)

        X, y = get_input(estimator)
        tags = get_tags(estimator)
        if tags.requires_fit:
            with warnings.catch_warnings():
                warnings.filterwarnings("ignore", module="sklearn")
                if y is not None:
                    print(estimator.__class__.__name__)
                    estimator.fit(X, y)
                else:
                    print(estimator.__class__.__name__)
                    estimator.fit(X)

        # test that we can get a list of untrusted types. This is a smoke test
        # to make sure there are no errors running this method.
        # it is in this test to save time, as it requires a fitted estimator.
        dumped = dumps(estimator)
        untrusted_types = get_untrusted_types(data=dumped)

        loaded = loads(dumped, trusted=untrusted_types)
>       assert_params_equal(estimator.__dict__, loaded.__dict__)

skops/io/tests/test_persist.py:390:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
skops/io/tests/_utils.py:176: in assert_params_equal
    _assert_vals_equal(val1, val2)
skops/io/tests/_utils.py:132: in _assert_vals_equal
    _assert_generic_objects_equal(val1, val2)
skops/io/tests/_utils.py:65: in _assert_generic_objects_equal
    _assert_tuples_equal(val1.__reduce__(), val2.__reduce__())
skops/io/tests/_utils.py:71: in _assert_tuples_equal
    _assert_vals_equal(subval1, subval2)
skops/io/tests/_utils.py:116: in _assert_vals_equal
    _assert_tuples_equal(val1, val2)
skops/io/tests/_utils.py:71: in _assert_tuples_equal
    _assert_vals_equal(subval1, subval2)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

val1 = 1.0, val2 = 0.0

    def _assert_vals_equal(val1, val2):
        if isinstance(val1, type):  # e.g. could be np.int64
            assert val1 is val2
        elif hasattr(val1, "__getstate__") and (val1.__getstate__() is not None):
            # This includes BaseEstimator since they implement __getstate__ and
            # that returns the parameters as well.
            # Since Python 3.11, all objects have a __getstate__ but they return
            # None by default, in which case this check is not performed.
            # Some objects return a tuple of parameters, others a dict.
            state1 = val1.__getstate__()
            state2 = val2.__getstate__()
            assert type(state1) == type(state2)
            if isinstance(state1, tuple):
                _assert_tuples_equal(state1, state2)
            else:
                assert_params_equal(val1.__getstate__(), val2.__getstate__())
        elif sparse.issparse(val1):
            assert sparse.issparse(val2) and ((val1 - val2).nnz == 0)
        elif isinstance(val1, (np.ndarray, np.generic)):
            if len(val1.dtype) == 0:
                # for arrays with at least 2 dimensions, check that contiguity is
                # preserved
                if val1.squeeze().ndim > 1:
                    assert val1.flags["C_CONTIGUOUS"] is val2.flags["C_CONTIGUOUS"]
                    assert val1.flags["F_CONTIGUOUS"] is val2.flags["F_CONTIGUOUS"]
                if val1.dtype == object:
                    assert val2.dtype == object
                    assert val1.shape == val2.shape
                    for subval1, subval2 in zip(val1, val2):
                        _assert_generic_objects_equal(subval1, subval2)
                else:
                    # simple comparison of arrays with simple dtypes, almost all
                    # arrays are of this sort.
                    np.testing.assert_array_equal(val1, val2)
            elif len(val1.shape) == 1:
                # comparing arrays with structured dtypes, but they have to be 1D
                # arrays. This is what we get from the Tree's state.
                assert np.all([x == y for x, y in zip(val1, val2)])
            else:
                # we don't know what to do with these values, for now.
                assert False
        elif isinstance(val1, (tuple, list)):
            _assert_tuples_equal(val1, val2)
        elif isinstance(val1, float) and np.isnan(val1):
            assert np.isnan(val2)
        elif isinstance(val1, dict):
            # dictionaries are compared by comparing their values recursively.
            assert set(val1.keys()) == set(val2.keys())
            for key in val1:
                _assert_vals_equal(val1[key], val2[key])
        elif hasattr(val1, "__dict__") and hasattr(val2, "__dict__"):
            _assert_vals_equal(val1.__dict__, val2.__dict__)
        elif isinstance(val1, np.ufunc):
            assert val1 == val2
        elif val1.__class__.__module__ == "builtins":
            print(val1, val2)
>           assert val1 == val2
E           AssertionError

skops/io/tests/_utils.py:130: AssertionError
----------------------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------------------
SGDOneClassSVM
{'nu': 0.5, 'loss': 'hinge', 'penalty': 'l2', 'learning_rate': 'optimal', 'epsilon': 0.1, 'alpha': 0.0001, 'C': 1.0, 'l1_ratio': 0, 'fit_intercept': True, 'shuffle': True, 'random_state': 0, 'verbose': 0, 'eta0': 0.0, 'power_t': 0.5, 'early_stopping': False, 'validation_fraction': 0.1, 'n_iter_no_change': 5, 'warm_start': False, 'average': False, 'max_iter': 5, 'tol': 0.001, 'coef_': array([0.54291007, 0.4536724 , 0.52653133, 0.37803877, 0.63524992,
       0.27842522, 0.4320048 , 0.48570224, 0.27875082, 0.32212221,
       0.54981746, 0.41361283, 0.39087453, 0.52256572, 0.32870666,
       0.49484194, 0.54250836, 0.38164309, 0.58330433, 0.36013618]), 'offset_': array([6.16072957]), 't_': 251.0, 'n_features_in_': 20, '_loss_function_': <sklearn.linear_model._sgd_fast.Hinge object at 0x1689f97f0>, 'n_iter_': 5} {'nu': 0.5, 'loss': 'hinge', 'penalty': 'l2', 'learning_rate': 'optimal', 'epsilon': 0.1, 'alpha': 0.0001, 'C': 1.0, 'l1_ratio': 0, 'fit_intercept': True, 'shuffle': True, 'random_state': 0, 'verbose': 0, 'eta0': 0.0, 'power_t': 0.5, 'early_stopping': False, 'validation_fraction': 0.1, 'n_iter_no_change': 5, 'warm_start': False, 'average': False, 'max_iter': 5, 'tol': 0.001, 'coef_': array([0.54291007, 0.4536724 , 0.52653133, 0.37803877, 0.63524992,
       0.27842522, 0.4320048 , 0.48570224, 0.27875082, 0.32212221,
       0.54981746, 0.41361283, 0.39087453, 0.52256572, 0.32870666,
       0.49484194, 0.54250836, 0.38164309, 0.58330433, 0.36013618]), 'offset_': array([6.16072957]), 't_': 251.0, 'n_features_in_': 20, '_loss_function_': <sklearn.linear_model._sgd_fast.Hinge object at 0x1689f98f0>, 'n_iter_': 5}
0.5 0.5
hinge hinge
l2 l2
optimal optimal
0.1 0.1
0.0001 0.0001
1.0 1.0
0 0
True True
True True
0 0
0 0
0.0 0.0
0.5 0.5
False False
0.1 0.1
5 5
False False
False False
5 5
0.001 0.001
251.0 251.0
20 20
1.0 0.0

@adrinjalali (Member)

Can you get the output with pytest -l to see all the local variables on the stack trace?
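
For example, to rerun one of the failing IDs from above with locals shown (command assembled from the test IDs in this thread):

    pytest -l "skops/io/tests/test_persist.py::test_can_persist_fitted[SGDOneClassSVM(max_iter=5)]"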

@TamaraAtanasoska (Author)

> Can you get the output with pytest -l to see all the local variables on the stack trace?

The issue is about versioning again; separating the SGD handling by sklearn version would most likely solve these last errors. I will look into it today. It seems a bit more complex in this case than the rest, at least at first glance. I will ping with questions.
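
A rough sketch of that kind of version gate (the constant name and placement are assumptions, not the actual skops layout):

    # Gate the SGD-specific persistence handling by the installed
    # scikit-learn version (constant name is an assumption).
    import sklearn
    from packaging.version import parse as parse_version

    SKLEARN_GE_1_6 = parse_version(sklearn.__version__) >= parse_version("1.6")

    if not SKLEARN_GE_1_6:
        # Only register the legacy SGD loss handling on older versions;
        # on 1.6+ these code paths are skipped entirely.
        ...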

@TamaraAtanasoska marked this pull request as draft on November 18, 2024 at 14:26
Successfully merging this pull request may close these issues.

Make skops compatible with scikit-learn 1.6