feat: Change the way NumPy arrays are persisted #1159

Open
thomass-dev opened this issue Jan 20, 2025 · 0 comments

thomass-dev (Collaborator) commented Jan 20, 2025

Originally planned in #1045 but postponed to another PR because it is not blocking:

> Is it planned to use `np.save` and the NumPy format to dump NumPy objects? It would be safer when reloading them.

Originally posted by @glemaitre in #1052 (comment)
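A minimal sketch of what the suggestion amounts to, assuming arrays are stored as raw bytes. The helper names `dump_array` and `load_array` are hypothetical and not the project's API. The "safer" part comes from `allow_pickle=False`: `np.save` then raises `ValueError` on object arrays instead of pickling them, and `np.load` refuses to unpickle anything embedded in the file.

```python
import io

import numpy as np


def dump_array(array: np.ndarray) -> bytes:
    """Hypothetical helper: serialize an array to bytes in the .npy format."""
    buffer = io.BytesIO()
    # allow_pickle=False makes np.save raise ValueError for object arrays
    # rather than silently falling back to pickle.
    np.save(buffer, array, allow_pickle=False)
    return buffer.getvalue()


def load_array(data: bytes) -> np.ndarray:
    """Hypothetical helper: deserialize .npy bytes without unpickling objects."""
    # allow_pickle=False is also np.load's default since NumPy 1.16.3.
    return np.load(io.BytesIO(data), allow_pickle=False)


original = np.arange(6, dtype=np.float64).reshape(2, 3)
restored = load_array(dump_array(original))
```

Because the `.npy` header records dtype, shape and memory order, `restored` has the same dtype and shape as `original`, and the file can be read back by any NumPy version without executing code from the payload.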

thomass-dev added a commit that referenced this issue Jan 20, 2025
… you get" principle (#1052)

Closes #1045
Closes #734

Refactor the user API to hide all notions of `Item`, `View`, and to
respect "what you put is what you get" from a user's point of view.
Among others:

- Move item classes into a sub-directory so they are less visible to users
- All `@cached_property` in items have been removed, because items are
no longer used directly by users
- Remove the ability to store anything other than strings in `MediaItem`
- Split `MediaItem` into new item classes
- Add a `PickleItem` class, which can persist any object that cannot be
persisted otherwise
- Add `display_as` parameter to `project.put` to control how a string is
displayed in the frontend
- Remove `project.put_item` so that users only need to use
`project.put`
- The `project.get` function always returns what the user has put
- The `project.get` and `project.get_item_versions` have been merged
- The `CrossValidationItem` has been replaced by a
`CrossValidationReporterItem` based on pickle
- To move fast, and because a report is composed of complex objects such
as the estimator, X and y, I chose to persist the report as a pickle.
That way, a report can be retrieved from persistence without effort. In a
future iteration, we should think about how to persist a report more
efficiently and environment-independently, so that it can be rebuilt
entirely from persistence.
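To illustrate the trade-off between pickle and the NumPy format, here is a hedged sketch of a dispatcher that uses `.npy` for plain NumPy arrays and falls back to pickle for everything else, in the spirit of a pickle-based item. The names `serialize` and `deserialize` and the `"numpy"`/`"pickle"` tags are hypothetical and do not describe how the project actually stores items.

```python
import io
import pickle
from typing import Any

import numpy as np


def serialize(obj: Any) -> tuple[str, bytes]:
    """Hypothetical dispatcher: .npy for plain arrays, pickle as a last resort."""
    # dtype.hasobject is True for object arrays and for structured dtypes
    # containing object fields; those cannot be saved without pickle.
    if isinstance(obj, np.ndarray) and not obj.dtype.hasobject:
        buffer = io.BytesIO()
        np.save(buffer, obj, allow_pickle=False)
        return "numpy", buffer.getvalue()
    # Fallback: works for almost any object, but loading requires a
    # compatible environment and trusted data.
    return "pickle", pickle.dumps(obj)


def deserialize(kind: str, data: bytes) -> Any:
    """Hypothetical inverse of serialize, selected by the stored tag."""
    if kind == "numpy":
        return np.load(io.BytesIO(data), allow_pickle=False)
    if kind == "pickle":
        return pickle.loads(data)  # only safe on trusted data
    raise ValueError(f"unknown serialization kind: {kind!r}")


kind, data = serialize(np.array([1, 2, 3]))
```

Here `kind` is `"numpy"`, while a dict or an object array such as `np.array([1, None], dtype=object)` would take the `"pickle"` path. Keeping the tag next to the bytes lets the reader pick the matching loader without guessing.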

---

- [ ] hide item API
    - [x] hide `put_item`
    - [x] hide `get_item`
    - [x] change `get_item_versions` to be item agnostic
- [ ] change the constructor of the `Project` to hide repositories -> postponed #1160
- [x] update each item class to return its original object
    - [x] cross validation reporter
    - [x] pillow image
    - [x] plotly figure
    - [x] altair figure
    - [x] matplotlib figure
    - [x] media item, to only accept str with `display_as`
    - [x] ~primitive~ -> already 🆗 
    - [x] ~pandas dataframe~ -> already 🆗 
    - [x] ~polars dataframe~ -> already 🆗 
    - [x] ~pandas series~ -> already 🆗 
    - [x] ~polars series~ -> already 🆗 
    - [x] ~numpy array~ -> already 🆗 
    - [x] ~scikit-learn estimator~ -> already 🆗 
- [x] set note in each factory
- [x] update `put` to allow the new parameter `display_as`
- [x] hide view API
- [ ] move `repr_html` to reporters -> #1161
- [ ] change the way NumPy arrays are serialized -> postponed #1159

---------

Co-authored-by: Auguste Baum