Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get data refactor #481

Merged
merged 37 commits into from
Sep 5, 2023
Merged

Get data refactor #481

merged 37 commits into from
Sep 5, 2023

Conversation

savente93
Copy link
Contributor

@savente93 savente93 commented Aug 18, 2023

Issue addressed

Fixes #335
Fixes #181
Fixes #462
Fixes #493

Explanation

Though the diff might look a decent size there's actually very little new code. Most of it was just shuffled around a bit. First the all the get_*() functions in the adapters were refactored. Here I tried to keep the structure and naming of the functions as similar as possible, but sadly it wasn't possible to keep them the same because the operations needed were too different.

Additionally the methods that were able to be used in the data catalog are now static methods meaning they can be used by anyone. This sis why I also added doc strings for them, I think this is quite a nice development as ti gives users more options to do things consistently.

I wasn't quite sure whether there was any documentation about this specifically that needed updating. I looked around for it, but couldn't find anything. If there turns out to be some that needs to be updated please let me know.

Checklist

  • Updated tests or added new tests
  • Branch is up to date with main
  • Tests & pre-commit hooks pass
  • Updated documentation if needed
  • Updated changelog.rst if needed

@savente93 savente93 marked this pull request as ready for review August 21, 2023 11:54
Copy link
Contributor

@DirkEilander DirkEilander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @savente93 ! I have two small comments and one point for discussion:

  • For the RasterDataset and GeoDataset also the slice_time_dimension needs to be applied to the xarray objects. /Can you also verify consistency of the name of this method between the different data types?
  • Can you add tests to show that the data test_data_catalog.py::test_get_data to test that the data objects indeed get sliced.
  • For discussion: the order of the loading, renaming, slicing, unit conversion is not the same for all data types. It would be good to consider if we can standardize this (for a separate issue).

@DirkEilander
Copy link
Contributor

DirkEilander commented Sep 1, 2023

TODO

  • check and make log messages consistent
  • add test that verify slicing of data objects when passed to DataCatalog.get_ methods

@DirkEilander
Copy link
Contributor

DirkEilander commented Sep 1, 2023

@savente93 Did some more work on this one including some other open issues. I've already tried to make the get_data methods of the different adapters consistent. The tests still pass without changes to them, so that seems to work. Can you review my changes and let me know what you think?

Copy link
Contributor Author

@savente93 savente93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @DirkEilander except for two small things. I'm not sure that slice_data should be a private method? it seems like something people might want to use from Python? though if you disagree we can leave it like that. What I think is a problem is that the _slice_data is static so it shouldn't be called like self._slice_data but (e.g.) RasterDataAdapter._slice_data. Aside from that it all looks good to me. Thanks for the update!

df = self._rename_vars(df)
df = self._set_nodata(df)
# slice data
df = self._slice_data(df, variables, time_tuple, logger=logger)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_slice_data is a static method so you should call it as DataframeAdapter._slice_data I'm rather surprised this works doesn't throw a syntax error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure this should be a private method? it seems like something that python users might want to use, or am I wrong in that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_slice_data is a static method so you should call it as DataframeAdapter._slice_data I'm rather surprised this works doesn't throw a syntax error.

I think it's ok to call the static method _slice_data with self._slice_data within the context of the the class. It will be based on the instance of that class, but since it does not use any properties of the class it returns the same as when calling DataFrameAdapter._slice_data. I only used self as I find it easier to read and it is shorter. See also, this SO question

I'm also not sure this should be a private method? it seems like something that python users might want to use, or am I wrong in that?

I do think it should be a private method, but we can discuss about it. In my opinion the public methods of the Adapters should only be the get_data, to_file and to_dict methods as the purpose is really to access data in a uniform way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I pushed a change before I saw you replied. I'm fine with it staying a private method. we can just leave that as is. As for using self I'm surprised this works. I prefer to call static methods by their class to make the static nature a little more obvious and it also makes copy pasting easier for things like testing. (the self code shouldn't work in the REPL for example according to your SO question). Regardless I don't think this is a blocking issue so we can merge now as far as I'm concerned (though you'll actually have to approve it :P )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

hydromt/data_adapter/geodataframe.py Outdated Show resolved Hide resolved
hydromt/data_adapter/geodataset.py Outdated Show resolved Hide resolved
hydromt/data_adapter/rasterdataset.py Outdated Show resolved Hide resolved
hydromt/data_adapter/geodataset.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants