
MNT Sklearn1.6 compatibility #447

Draft · wants to merge 19 commits into main

Conversation

@TamaraAtanasoska commented Nov 4, 2024

Reference Issues/PRs

Fixes #443. Not fully working yet.

What does this implement/fix? Explain your changes.

A few of the compatibility errors, including the one explained in #443, are now fixed. There are some external failures (errors happening outside of skops) that I am not sure how to fix. @adrinjalali, any tips? The remaining failures are listed below:

FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[GraphicalLassoCV(cv=3,max_iter=5)] - FloatingPointError: Non SPD result: the system is too ill-conditioned for this solver. The system is too ill-conditioned for this ...
FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[PassiveAggressiveClassifier(max_iter=5)] - AssertionError
FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[SGDClassifier(max_iter=5)] - AssertionError
FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[SGDOneClassSVM(max_iter=5)] - AssertionError

Any other comments?

This is my first PR to the project, so the fixes might be off.
I understood #443 to mean that I can remove all of the SGD handling from the _sklearn.py file when working with sklearn 1.6+. The tests pass that way; I am just not sure whether the change is supposed to be that sweeping.

@TamaraAtanasoska changed the title from "Sklearn1.6 compatibility" to "MNT Sklearn1.6 compatibility" on Nov 4, 2024
Comment on lines +45 to +46
from sklearn.utils._tags import get_tags
from sklearn.utils._test_common.instance_generator import _construct_instances
Member:

This only works in the new sklearn, though. We'd need to have wrappers in fixes.py to support both.

Author:

I will try to make everything pass and then work on the separation at the end; that might make more sense since there are more incompatibilities in the same file. We can leave this unresolved as a reminder.
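
A minimal sketch of what such a wrapper in fixes.py could look like for the get_tags import (the fallback branch and names are assumptions, not the actual skops implementation; _construct_instances would need a similar shim):

    # Sketch of a version-compatibility shim for fixes.py (assumed layout,
    # not the actual skops code). It exposes the attribute-style tags that
    # the tests use on both old and new scikit-learn.
    try:
        # scikit-learn >= 1.6
        from sklearn.utils._tags import get_tags
    except ImportError:
        # Older scikit-learn only has the legacy dict-based estimator tags;
        # wrap them so callers can keep writing ``get_tags(est).requires_fit``.
        from types import SimpleNamespace

        def get_tags(estimator):
            old_tags = estimator._get_tags()  # plain dict on older versions
            return SimpleNamespace(requires_fit=old_tags.get("requires_fit", True))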

@TamaraAtanasoska commented Nov 11, 2024

Opened an issue in quantile-forest: zillow/quantile-forest#103

@TamaraAtanasoska commented Nov 11, 2024

A question about the last few errors in the utils tests: the tests fail for a few estimators pulled from sklearn.utils.discovery.all_estimators that contain a Hinge loss (output below). I added some print output so it is a bit clearer. Is including the losses somehow necessary after all? (The same happens for SGDClassifier and PassiveAggressiveClassifier.)
@adrinjalali

________________________________________________________________________________ test_can_persist_fitted[SGDOneClassSVM(max_iter=5)] ________________________________________________________________________________

estimator = SGDOneClassSVM(max_iter=5, random_state=0)

    @pytest.mark.parametrize(
        "estimator", _tested_estimators(), ids=_get_check_estimator_ids
    )
    def test_can_persist_fitted(estimator):
        """Check that fitted estimators can be persisted and return the right results."""
        set_random_state(estimator, random_state=0)

        X, y = get_input(estimator)
        tags = get_tags(estimator)
        if tags.requires_fit:
            with warnings.catch_warnings():
                warnings.filterwarnings("ignore", module="sklearn")
                if y is not None:
                    print(estimator.__class__.__name__)
                    estimator.fit(X, y)
                else:
                    print(estimator.__class__.__name__)
                    estimator.fit(X)

        # test that we can get a list of untrusted types. This is a smoke test
        # to make sure there are no errors running this method.
        # it is in this test to save time, as it requires a fitted estimator.
        dumped = dumps(estimator)
        untrusted_types = get_untrusted_types(data=dumped)

        loaded = loads(dumped, trusted=untrusted_types)
>       assert_params_equal(estimator.__dict__, loaded.__dict__)

skops/io/tests/test_persist.py:390:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
skops/io/tests/_utils.py:176: in assert_params_equal
    _assert_vals_equal(val1, val2)
skops/io/tests/_utils.py:132: in _assert_vals_equal
    _assert_generic_objects_equal(val1, val2)
skops/io/tests/_utils.py:65: in _assert_generic_objects_equal
    _assert_tuples_equal(val1.__reduce__(), val2.__reduce__())
skops/io/tests/_utils.py:71: in _assert_tuples_equal
    _assert_vals_equal(subval1, subval2)
skops/io/tests/_utils.py:116: in _assert_vals_equal
    _assert_tuples_equal(val1, val2)
skops/io/tests/_utils.py:71: in _assert_tuples_equal
    _assert_vals_equal(subval1, subval2)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

val1 = 1.0, val2 = 0.0

    def _assert_vals_equal(val1, val2):
        if isinstance(val1, type):  # e.g. could be np.int64
            assert val1 is val2
        elif hasattr(val1, "__getstate__") and (val1.__getstate__() is not None):
            # This includes BaseEstimator since they implement __getstate__ and
            # that returns the parameters as well.
            # Since Python 3.11, all objects have a __getstate__ but they return
            # None by default, in which case this check is not performed.
            # Some objects return a tuple of parameters, others a dict.
            state1 = val1.__getstate__()
            state2 = val2.__getstate__()
            assert type(state1) == type(state2)
            if isinstance(state1, tuple):
                _assert_tuples_equal(state1, state2)
            else:
                assert_params_equal(val1.__getstate__(), val2.__getstate__())
        elif sparse.issparse(val1):
            assert sparse.issparse(val2) and ((val1 - val2).nnz == 0)
        elif isinstance(val1, (np.ndarray, np.generic)):
            if len(val1.dtype) == 0:
                # for arrays with at least 2 dimensions, check that contiguity is
                # preserved
                if val1.squeeze().ndim > 1:
                    assert val1.flags["C_CONTIGUOUS"] is val2.flags["C_CONTIGUOUS"]
                    assert val1.flags["F_CONTIGUOUS"] is val2.flags["F_CONTIGUOUS"]
                if val1.dtype == object:
                    assert val2.dtype == object
                    assert val1.shape == val2.shape
                    for subval1, subval2 in zip(val1, val2):
                        _assert_generic_objects_equal(subval1, subval2)
                else:
                    # simple comparison of arrays with simple dtypes, almost all
                    # arrays are of this sort.
                    np.testing.assert_array_equal(val1, val2)
            elif len(val1.shape) == 1:
                # comparing arrays with structured dtypes, but they have to be 1D
                # arrays. This is what we get from the Tree's state.
                assert np.all([x == y for x, y in zip(val1, val2)])
            else:
                # we don't know what to do with these values, for now.
                assert False
        elif isinstance(val1, (tuple, list)):
            _assert_tuples_equal(val1, val2)
        elif isinstance(val1, float) and np.isnan(val1):
            assert np.isnan(val2)
        elif isinstance(val1, dict):
            # dictionaries are compared by comparing their values recursively.
            assert set(val1.keys()) == set(val2.keys())
            for key in val1:
                _assert_vals_equal(val1[key], val2[key])
        elif hasattr(val1, "__dict__") and hasattr(val2, "__dict__"):
            _assert_vals_equal(val1.__dict__, val2.__dict__)
        elif isinstance(val1, np.ufunc):
            assert val1 == val2
        elif val1.__class__.__module__ == "builtins":
            print(val1, val2)
>           assert val1 == val2
E           AssertionError

skops/io/tests/_utils.py:130: AssertionError
----------------------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------------------
SGDOneClassSVM
{'nu': 0.5, 'loss': 'hinge', 'penalty': 'l2', 'learning_rate': 'optimal', 'epsilon': 0.1, 'alpha': 0.0001, 'C': 1.0, 'l1_ratio': 0, 'fit_intercept': True, 'shuffle': True, 'random_state': 0, 'verbose': 0, 'eta0': 0.0, 'power_t': 0.5, 'early_stopping': False, 'validation_fraction': 0.1, 'n_iter_no_change': 5, 'warm_start': False, 'average': False, 'max_iter': 5, 'tol': 0.001, 'coef_': array([0.54291007, 0.4536724 , 0.52653133, 0.37803877, 0.63524992,
       0.27842522, 0.4320048 , 0.48570224, 0.27875082, 0.32212221,
       0.54981746, 0.41361283, 0.39087453, 0.52256572, 0.32870666,
       0.49484194, 0.54250836, 0.38164309, 0.58330433, 0.36013618]), 'offset_': array([6.16072957]), 't_': 251.0, 'n_features_in_': 20, '_loss_function_': <sklearn.linear_model._sgd_fast.Hinge object at 0x1689f97f0>, 'n_iter_': 5} {'nu': 0.5, 'loss': 'hinge', 'penalty': 'l2', 'learning_rate': 'optimal', 'epsilon': 0.1, 'alpha': 0.0001, 'C': 1.0, 'l1_ratio': 0, 'fit_intercept': True, 'shuffle': True, 'random_state': 0, 'verbose': 0, 'eta0': 0.0, 'power_t': 0.5, 'early_stopping': False, 'validation_fraction': 0.1, 'n_iter_no_change': 5, 'warm_start': False, 'average': False, 'max_iter': 5, 'tol': 0.001, 'coef_': array([0.54291007, 0.4536724 , 0.52653133, 0.37803877, 0.63524992,
       0.27842522, 0.4320048 , 0.48570224, 0.27875082, 0.32212221,
       0.54981746, 0.41361283, 0.39087453, 0.52256572, 0.32870666,
       0.49484194, 0.54250836, 0.38164309, 0.58330433, 0.36013618]), 'offset_': array([6.16072957]), 't_': 251.0, 'n_features_in_': 20, '_loss_function_': <sklearn.linear_model._sgd_fast.Hinge object at 0x1689f98f0>, 'n_iter_': 5}
0.5 0.5
hinge hinge
l2 l2
optimal optimal
0.1 0.1
0.0001 0.0001
1.0 1.0
0 0
True True
True True
0 0
0 0
0.0 0.0
0.5 0.5
False False
0.1 0.1
5 5
False False
False False
5 5
0.001 0.001
251.0 251.0
20 20
1.0 0.0

@adrinjalali (Member)

Can you get the output with pytest -l to see all the local variables on the stack trace?
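
For example, to rerun one of the failing IDs from above with locals shown (command assembled from the test IDs in this thread):

    pytest -l "skops/io/tests/test_persist.py::test_can_persist_fitted[SGDOneClassSVM(max_iter=5)]"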

@TamaraAtanasoska (Author)

> Can you get the output with pytest -l to see all the local variables on the stack trace?

The issue is about versioning again; separating the SGD handling by sklearn version would most likely solve these last errors. I will look into it today. It seems a bit more complex in this case than the rest, at least at first glance. I will ping with questions.
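
A rough sketch of that kind of version gate (the constant name and placement are assumptions, not the actual skops layout):

    # Gate the SGD-specific persistence handling by the installed
    # scikit-learn version (constant name is an assumption).
    import sklearn
    from packaging.version import parse as parse_version

    SKLEARN_GE_1_6 = parse_version(sklearn.__version__) >= parse_version("1.6")

    if not SKLEARN_GE_1_6:
        # Only register the legacy SGD loss handling on older versions;
        # on 1.6+ these code paths are skipped entirely.
        ...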

@TamaraAtanasoska marked this pull request as draft on November 18, 2024 at 14:26
Successfully merging this pull request may close these issues.

Make skops compatible with scikit-learn 1.6