Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH - Gets SciKeras script working #394

Open
wants to merge 57 commits into
base: main
Choose a base branch
from

Conversation

lazarust
Copy link
Contributor

WIP still need to fix one test

Reference Issues/PRs

Fixes #388

What does this implement/fix? Explain your changes.

This fixes a recursion error that was happening when dumping a scikeras model.

Any other comments?

@lazarust lazarust marked this pull request as ready for review October 11, 2023 00:08
@lazarust lazarust changed the title Gets SciKeras script working ENH - Gets SciKeras script working Oct 13, 2023
@lazarust
Copy link
Contributor Author

It looks like all the pytest tests are failing with You have exceeded our daily quotas for action: createRepo. We invite you to retry later.

@BenjaminBossan
Copy link
Collaborator

Should be addressed by #398

@BenjaminBossan
Copy link
Collaborator

@lazarust Could you please solve the merge conflict? Regarding the uncovered new line: Would that be solved by adding tests for scikeras?

@lazarust
Copy link
Contributor Author

@BenjaminBossan I've fixed the merge conflict.

Yeah, I believe it would be but I was unsure if we wanted to have tests that included another library like that. Should I add one?

@BenjaminBossan
Copy link
Collaborator

Yeah, I believe it would be but I was unsure if we wanted to have tests that included another library like that. Should I add one?

Yes, it would be good, since that was the initial reason for the change. We have external library tests, see here:

https://github.com/skops-dev/skops/blob/main/skops/io/tests/test_external.py

For scikeras, it won't be possible to add a comprehensive coverage of all possible models, so a simple example should be enough. If it's possible to add a unit test independently of scikeras that explicitly tests weakrefs, that would also be good, not sure how easy it is to do.

Regarding the failing tests, at first glance, it appears to be a change in the model repr in the latest sklearn version, so not related to the changes here.

@lazarust
Copy link
Contributor Author

@BenjaminBossan Sorry this took me so long to get back to. I've added a test to hit that line.

Copy link
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the added scikeras tests. I just have a small request to improve them a little bit.


pipeline = Pipeline([("classifier", clf)])

dump(clf, "keras-test.skops")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of just dumping, could we please do a cycle of dumps and loads, similar to the other tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I now have it dumping the model, loading it back in and comparing the results.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of dump completely in favor of dumps? That way, we also don't need to care about cleaning up any files created during the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed! Sorry about that. I didn't realize the difference between dumps and dump 🤯

@BenjaminBossan
Copy link
Collaborator

@adrinjalali The tests for sklearn nightly are failing because the model repr was changed (not sure why). Normally, we could fix that by having the test check the sklearn verison, but this is a doctest. Any idea how it can be fixed, short of skipping it whole?


X, y = make_classification(1000, 20, n_informative=10, random_state=0)
clf.fit(X, y)
dumped = dumps(clf, "keras-test.skops")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dumped = dumps(clf, "keras-test.skops")
dumped = dumps(clf)

2nd argument to dumps is the compression level. Honestly, I'm surprised that this didn't raise an error.

@BenjaminBossan
Copy link
Collaborator

Ugh, the list of trusted modules is giant now :D I guess it's related to the tensorflow change. Could you please explain why that was necessary? Also, we now get this error on CI:

E AttributeError: module 'scikeras' has no attribute 'wrappers'

@lazarust
Copy link
Contributor Author

@BenjaminBossan Yeah, sorry I realized I had marked the test method as a @pytest.fixture 🤦🏽, so the test wasn't ever running (which is why it wasn't erroring when switching from dump to dumps). I'm hoping to get the test fixed and cleanup that list of trusted modules today.

@lazarust
Copy link
Contributor Author

@BenjaminBossan After staring at this all day, I could use some help lol. For some reason, there's some infinite recursion happening when constructing the tree that I can't figure out why.

Initially, I thought it was due to the CachedNodes in the tree, but the construct method isn't getting hit for those. I've gone through the tree and didn't see anything outrageous so if you could take a look that'd be great!

@adrinjalali
Copy link
Member

I'll have to check when I'm back. Still off till end of November. But I remember CacheNode caused some issues when I was working on it, partly due to small integers and other common objects having the same id in python.

information for CachedNodes

I'm wodering after looking at the types of a lot of the CachedNodes if there's something weird happening with `None`
@lazarust
Copy link
Contributor Author

@adrinjalali I created an issue in tensorflow to discuss this and make sure I wasn't missing anything tensorflow/tensorflow#68194.

It seems like the best way forward is to ignore the warnings for now until Tensorflow supports protobuf 5.0+. Does that sound good to you? Ignoring warnings doesn't sound like the best thing I'm just unsure of a better way forward.

@adrinjalali
Copy link
Member

Ignoring warnings in cases where we know why they're happening and with a comment as when the ignore statement should be removed is a normal practice indeed. Thanks for the follow up work on the TF side.

@lazarust
Copy link
Contributor Author

@adrinjalali This should be ready for you to look at again. The failing tests seem to just be a blip with codecov

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts looking at this in more details (other than the inline comments)

  • we are delegating the save / load to tensorflow? I'd like to see a test showing the user the right error when they try to load such a model without explicitly trusting including modules.
  • we don't really include the inner modules in our json tree here, which means we have no idea what's inside that keras model, which means we're prone to massive exploits, this seems it beats the purpose of this format.

@lazarust you've done great work so far, let me know if you need me to have a more detailed look. I haven't personally tried to solve this project for TF, but I could have a look.

skops/io/_scikeras.py Outdated Show resolved Hide resolved

with tempfile.TemporaryDirectory() as temp_dir:
file_name = os.path.join(temp_dir, "model.keras")
obj.model.save(file_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we only save the model attribute? This sounds odd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding of https://keras.io/guides/serialization_and_saving/, it seems that Keras is compressing all the pieces of the models into the .keras file. Should I change the name of the file to make it more clear?

@lazarust
Copy link
Contributor Author

lazarust commented Jul 2, 2024

@adrinjalali I think I've addressed most of your comments, and apologize it took me a bit to get back to this.

For

we don't really include the inner modules in our json tree here, which means we have no idea what's inside that keras model, which means we're prone to massive exploits, this seems it beats the purpose of this format.

I'm a little confused by what you mean by inner modules... could you elaborate on what you mean?

Sorry this PR has been taking so long to get ironed out!

@adrinjalali
Copy link
Member

@lazarust I went down the rabbit hole of reading the persistence code from keras, and I think it's easier if I push to this PR some changes. So I'll update this one, and you can review the work if that's okay.

@lazarust
Copy link
Contributor Author

@adrinjalali Sounds good to me, let me know if there's any thing I can do to help!

@adrinjalali
Copy link
Member

@lazarust , this is what I had in mind.

However, this has a major issue:

The user can use keras.src.saving.saving_lib.save_model with weights_format="npz", or manually create a zip file with npz weights. Then The issue is we have this in tensorflow:

https://github.com/keras-team/keras/blob/d4a51168bfedf69a9aae7ddff289277972dfd85d/keras/src/saving/saving_lib.py#L1027

Which allows loading pickles through numpy. So we can only merge / support this, if we have a way to make sure there are no pickle objects in that zip file.

@lazarust
Copy link
Contributor Author

@adrinjalali Ah the way you used tf directly makes sense to me.

As for the pickle file issue, I don't think there's a good way of doing this... Do you know if the way TF loads the files only looks for .npz files or does it try and load any file in the zip through numpy?

@adrinjalali
Copy link
Member

I don't think it matters which files TF finds to load, as long as it allows loading pickle files. What we can investigate to move this further, is if we can reliably monkey-patch and disable the pickle machinery while loading a skops file, so that in case there's a pickle load happening somewhere, we make sure we fail.

@lazarust
Copy link
Contributor Author

Yeah I agree, I guess I just thought if TF only tries to load .npz files, maybe skops could just ignore all .npz.

Is monkey-patching TF something we'd want to support long term? If TF changed how model saving/loading worked in a future version we'd have to make updates anyways...

@adrinjalali
Copy link
Member

It would be monkey patching pickle, not TF.

@lazarust
Copy link
Contributor Author

Ah, yeah I don't think that would be too bad... we'd just need to monkey-patch the load functionality

@lazarust
Copy link
Contributor Author

lazarust commented Oct 31, 2024

@adrinjalali Sorry I still haven't gotten to this. Have you made any progress on patching pickle?

@adrinjalali
Copy link
Member

No I need to get to it soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: dump fails on keras models with RecursionError: maximum recursion depth exceeded, after calling fit
3 participants