`GeoDataFrameAdapter` slices data even if the source is empty for the geometry #584

Jaapel · 2023-10-13T14:51:13Z

HydroMT version checks

I have checked that this issue has not already been reported.
I have checked that this bug exists on the latest version of HydroMT.

Reproducible Example

import geopandas as gpd
from hydromt.data_adapter import GeoDataFrameAdapter
from shapely import box

fn = "s3://hydromt-data/hydrography/reservoirs/reservoir-db.fgb"
geom = (9.654999999999973, 0.3475000000002808, 9.861666666666451, 0.4866666666668209)
if __name__ == "__main__":
    da = GeoDataFrameAdapter(path=fn, driver="vector")
    gdf = da.get_data(bbox=None, geom=gpd.GeoSeries(box(*geom)).set_crs(epsg=4326))
    gdf

Current behaviour

Raises Exception because of no data in the current DataFrame

Desired behaviour

Should not raise exception

Additional context

No response

Jaapel · 2023-10-13T14:56:17Z

For P drive location: fn = "P:\\wflow_global\hydromt\...

DirkEilander · 2023-10-16T15:38:35Z

Do you have a suggestion for what the desired behavior should be? Should it return None, an empty GeoDataFrame, or something else? A potential issue is that these could result in unclear error messages downstream. Therefore, we decided that a consistent and clear error across all the Adapter.get_data methods would be best; where needed, it can be caught with try-except statements.

Jaapel · 2023-10-18T15:11:48Z

Somehow, this behaviour is not observed in the 0.8 conda version of hydroMT. Do you know where the difference could come from?

Jaapel · 2023-10-18T15:12:20Z

Otherwise, the hydroMT-wflow plugin should catch this Exception

DirkEilander · 2023-10-18T15:34:00Z

> Somehow, this behaviour is not observed in the 0.8 conda version of hydroMT. Do you know where the difference could come from?

This has likely changed as part of #481. I think catching the error in the plugins makes sense, but I'm happy to discuss alternatives. I think it's important that the behaviour is consistent across all Adapters (this was not the case before).
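As an illustration of catching the error on the plugin side, here is a minimal sketch. It does not use HydroMT itself: `NoDataException`, `get_data`, and `setup_reservoirs` are hypothetical stand-ins, and the actual exception type raised by the adapters is not stated in this thread.

```python
class NoDataException(Exception):
    """Hypothetical error raised when a spatial selection is empty."""


def get_data(records, bbox):
    """Stand-in for an Adapter.get_data call: keep (x, y) points inside bbox."""
    xmin, ymin, xmax, ymax = bbox
    selected = [(x, y) for x, y in records if xmin <= x <= xmax and ymin <= y <= ymax]
    if not selected:
        # Consistent, explicit error instead of returning an empty object.
        raise NoDataException(f"No data within bbox {bbox}")
    return selected


def setup_reservoirs(records, bbox):
    """Hypothetical plugin step that skips itself when the region holds no data."""
    try:
        return get_data(records, bbox)
    except NoDataException:
        # Only the specific no-data error is caught; other errors still propagate.
        return None
```

Catching one narrow exception type keeps unrelated failures (bad paths, CRS problems) visible to the user.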

savente93 · 2023-10-19T06:56:39Z

This would need to be coordinated with the plugins to be implemented correctly, but personally I quite like the pandas approach here as demonstrated in their to_datetime method:

errors{‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
 - If 'raise', then invalid parsing will raise an exception.
 - If 'coerce', then invalid parsing will be set as NaT
 - If 'ignore', then invalid parsing will return the input.

(source: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) This would give users a bit more flexibility in how these things are handled. Just a suggestion.

Jaapel · 2023-10-19T13:33:39Z

So in this case we would do some sort of a nodata check in the get_data method. Can you think of more strategies here @savente93 @DirkEilander ? Otherwise it can be a flag.

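import logging
from enum import Enum

logger = logging.getLogger(__name__)


class NoDataStrategy(Enum):
    _raise = "raise"  # raise an exception when the selection is empty
    ignore = "ignore"  # do not raise when the selection is empty


# Proposed signature for the adapter's get_data method
def get_data(
    self,
    bbox=None,
    geom=None,
    buffer=0,
    predicate="intersects",
    logger=logger,
    variables=None,
    handle_empty=NoDataStrategy._raise,
):
    ...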
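To make the proposal concrete, here is a minimal sketch of how a `handle_empty` flag could drive the no-data check. It is a toy stand-in rather than the adapter code: `select` and `NoDataException` are hypothetical names, and returning `None` for `ignore` is an assumption, since the thread leaves that choice open.

```python
from enum import Enum


class NoDataStrategy(Enum):
    _raise = "raise"
    ignore = "ignore"


class NoDataException(Exception):
    """Hypothetical error type for an empty selection."""


def select(records, bbox, handle_empty=NoDataStrategy._raise):
    """Toy stand-in for get_data: filter (x, y) points by a bounding box."""
    xmin, ymin, xmax, ymax = bbox
    selected = [(x, y) for x, y in records if xmin <= x <= xmax and ymin <= y <= ymax]
    if not selected:
        if handle_empty is NoDataStrategy._raise:
            raise NoDataException(f"No data within bbox {bbox}")
        # Assumption: "ignore" returns None instead of an empty object.
        return None
    return selected
```

With the default, an empty selection raises; passing `handle_empty=NoDataStrategy.ignore` lets the caller decide what to do with `None`.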

savente93 · 2023-10-19T14:03:07Z

No, I don't think a coerce option makes sense here. I can't really imagine when you'd want to get back just an empty dataset, so I think this is fine. Though perhaps if one of the plugins needs another option, we could consider it.

DirkEilander · 2023-10-20T09:26:45Z

Your suggestion looks good to me @Jaapel! Some questions / considerations:

* In case of ignore, what would be returned? `None`?
* If I'm not mistaken the _old_ behavior in some cases was to actually return empty objects. But I agree with Sam this has no added value here.
* What is the advantage of a `NoDataStrategy` Class here over a simple string?
* Should we make this option broader than just the empty data handling to deal with errors within this method in general with an `errors='raise'` option similar to the pandas method referred earlier?
* Note that this should also be implemented in the `DataCatalog.get_rasterdataset` methods and similar.

Jaapel · 2023-10-20T12:15:54Z

> Your suggestion looks good to me @Jaapel! Some questions / considerations:
>
> * In case of ignore, what would be returned? `None`?
> * If I'm not mistaken the _old_ behavior in some cases was to actually return empty objects. But I agree with Sam this has no added value here.

An empty DataFrame? Possibly some metadata in the data objects would be useful. Otherwise `None` is fine.

> * What is the advantage of a `NoDataStrategy` Class here over a simple string?

I like Enums here, as that is what they are made for. You can easily see which options are available and you get early errors, but we can deal with strings too if that is the style of the repository.

> * Should we make this option broader than just the empty data handling to deal with errors within this method in general with an `errors='raise'` option similar to the pandas method referred earlier?

For pandas, the errors option is about parsing. If we want to ignore errors in general, I would be more specific with the errors (like a NoDataException) and have users of the package catch these themselves.

> * Note that this should also be implemented in the `DataCatalog.get_rasterdataset` methods and similar.

Yes, I am thinking about adding a new @abstractmethod DataAdapter.empty(self, data_obj) so that we can have different definitions of empty per DataAdapter.
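The per-adapter definition of "empty" could be sketched as follows. This is only an illustration under stated assumptions: the `DataAdapter` class here is a minimal stand-in rather than HydroMT's actual base class, and `TableAdapter` and `GridAdapter` are hypothetical subclasses operating on plain Python objects.

```python
from abc import ABC, abstractmethod


class DataAdapter(ABC):
    """Minimal stand-in for the adapter base class; not HydroMT's actual class."""

    @abstractmethod
    def empty(self, data_obj) -> bool:
        """Return True when data_obj holds no usable data."""


class TableAdapter(DataAdapter):
    """Hypothetical adapter for row-based data: empty means zero rows."""

    def empty(self, data_obj) -> bool:
        return len(data_obj) == 0


class GridAdapter(DataAdapter):
    """Hypothetical adapter for gridded data: empty means a zero-sized dimension."""

    def empty(self, data_obj) -> bool:
        # data_obj is assumed to be a shape tuple such as (n_rows, n_cols).
        return any(size == 0 for size in data_obj)
```

Because `empty` is abstract, a subclass that forgets to define it cannot be instantiated, so every adapter has to state its own notion of "no data".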

savente93 · 2023-10-20T13:04:28Z

I'm actually really in favor of enums. Honestly, I think they could be used much more throughout the repository.
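The "early errors" argument for enums made in this thread can be shown with a small sketch. `parse_strategy` is a hypothetical helper, not part of HydroMT; it relies only on the standard behaviour of `Enum` value lookup.

```python
from enum import Enum


class NoDataStrategy(Enum):
    _raise = "raise"
    ignore = "ignore"


def parse_strategy(value):
    """Convert a user-supplied string to a NoDataStrategy, failing early on typos."""
    # Enum lookup by value raises ValueError for anything that is not a member.
    return NoDataStrategy(value)


# The available options can be listed directly from the class.
available = [strategy.value for strategy in NoDataStrategy]
```

A misspelled plain string such as `"ingore"` would otherwise pass silently until some later comparison fails; with the enum, the lookup itself raises.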

Jaapel added the Bug and Needs refinement labels Oct 13, 2023

savente93 assigned DirkEilander and Jaapel Oct 19, 2023

Jaapel mentioned this issue Oct 19, 2023

Compatibility with nodata requirements of core Deltares/hydromt_wflow#208

Closed

savente93 added this to the Q4 milestone Oct 20, 2023

Jaapel mentioned this issue Oct 27, 2023

Handle nodata data adapter #621

Merged

5 tasks

Jaapel closed this as completed in #621 Oct 30, 2023

savente93 removed the Needs refinement label Oct 31, 2023
