-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get data refactor #481
Get data refactor #481
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @savente93 ! I have two small comments and one point for discussion:
- For the RasterDataset and GeoDataset also the
slice_time_dimension
needs to be applied to the xarray objects. /Can you also verify consistency of the name of this method between the different data types? - Can you add tests to show that the data
test_data_catalog.py::test_get_data
to test that the data objects indeed get sliced. - For discussion: the order of the loading, renaming, slicing, unit conversion is not the same for all data types. It would be good to consider if we can standardize this (for a separate issue).
…rning if wrong CRS #462
TODO
|
@savente93 Did some more work on this one including some other open issues. I've already tried to make the get_data methods of the different adapters consistent. The tests still pass without changes to them, so that seems to work. Can you review my changes and let me know what you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @DirkEilander except for two small things. I'm not sure that slice_data
should be a private method? it seems like something people might want to use from Python? though if you disagree we can leave it like that. What I think is a problem is that the _slice_data
is static so it shouldn't be called like self._slice_data
but (e.g.) RasterDataAdapter._slice_data
. Aside from that it all looks good to me. Thanks for the update!
hydromt/data_adapter/dataframe.py
Outdated
df = self._rename_vars(df) | ||
df = self._set_nodata(df) | ||
# slice data | ||
df = self._slice_data(df, variables, time_tuple, logger=logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_slice_data
is a static method so you should call it as DataframeAdapter._slice_data
I'm rather surprised this works doesn't throw a syntax error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not sure this should be a private method? it seems like something that python users might want to use, or am I wrong in that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_slice_data
is a static method so you should call it asDataframeAdapter._slice_data
I'm rather surprised this works doesn't throw a syntax error.
I think it's ok to call the static method _slice_data
with self._slice_data
within the context of the the class. It will be based on the instance of that class, but since it does not use any properties of the class it returns the same as when calling DataFrameAdapter._slice_data
. I only used self
as I find it easier to read and it is shorter. See also, this SO question
I'm also not sure this should be a private method? it seems like something that python users might want to use, or am I wrong in that?
I do think it should be a private method, but we can discuss about it. In my opinion the public methods of the Adapters should only be the get_data
, to_file
and to_dict
methods as the purpose is really to access data in a uniform way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I pushed a change before I saw you replied. I'm fine with it staying a private method. we can just leave that as is. As for using self
I'm surprised this works. I prefer to call static methods by their class to make the static nature a little more obvious and it also makes copy pasting easier for things like testing. (the self code shouldn't work in the REPL for example according to your SO question). Regardless I don't think this is a blocking issue so we can merge now as far as I'm concerned (though you'll actually have to approve it :P )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Issue addressed
Fixes #335
Fixes #181
Fixes #462
Fixes #493
Explanation
Though the diff might look a decent size there's actually very little new code. Most of it was just shuffled around a bit. First the all the
get_*()
functions in the adapters were refactored. Here I tried to keep the structure and naming of the functions as similar as possible, but sadly it wasn't possible to keep them the same because the operations needed were too different.Additionally the methods that were able to be used in the data catalog are now static methods meaning they can be used by anyone. This sis why I also added doc strings for them, I think this is quite a nice development as ti gives users more options to do things consistently.
I wasn't quite sure whether there was any documentation about this specifically that needed updating. I looked around for it, but couldn't find anything. If there turns out to be some that needs to be updated please let me know.
Checklist
main