Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GeoDataFrameAdapter slices data even if the source is empty for the geometry #584

Closed
2 tasks done
Jaapel opened this issue Oct 13, 2023 · 11 comments · Fixed by #621
Closed
2 tasks done

GeoDataFrameAdapter slices data even if the source is empty for the geometry #584

Jaapel opened this issue Oct 13, 2023 · 11 comments · Fixed by #621
Assignees
Labels
Bug Something isn't working
Milestone

Comments

@Jaapel
Copy link
Contributor

Jaapel commented Oct 13, 2023

HydroMT version checks

  • I have checked that this issue has not already been reported.
  • I have checked that this bug exists on the latest version of HydroMT.

Reproducible Example

import geopandas as gpd
from hydromt.data_adapter import GeoDataFrameAdapter
from shapely import box

fn = "s3://hydromt-data/hydrography/reservoirs/reservoir-db.fgb"
geom = (9.654999999999973, 0.3475000000002808, 9.861666666666451, 0.4866666666668209)
if __name__ == "__main__":
    da = GeoDataFrameAdapter(path=fn, driver="vector")
    gdf = da.get_data(bbox=None, geom=gpd.GeoSeries(box(*geom)).set_crs(epsg=4326))
    gdf

Current behaviour

Raises Exception because of no data in the current DataFrame

Desired behaviour

Should not raise exception

Additional context

No response

@Jaapel Jaapel added Bug Something isn't working Needs refinement issue still needs refinement labels Oct 13, 2023
@Jaapel
Copy link
Contributor Author

Jaapel commented Oct 13, 2023

For P drive location: fn = "P:\\wflow_global\hydromt\...

@DirkEilander
Copy link
Contributor

Do you have a suggestion of what the desired behavior should be. Should it return None and empty GeoDataFrame, something else? A potential issue is that these could result in unclear error messages downstream. Therefore, we decided that a consistent and clear error across all the Adapter.get_data methods would be best and where applicable these can be captured with try-except statements where needed.

@Jaapel
Copy link
Contributor Author

Jaapel commented Oct 18, 2023

Somehow, this behaviour is not observed in the 0.8 conda version of hydroMT. Do you know where the difference could come from?

@Jaapel
Copy link
Contributor Author

Jaapel commented Oct 18, 2023

Otherwise, the hydroMT-wflow plugin should catch this Exception

@DirkEilander
Copy link
Contributor

DirkEilander commented Oct 18, 2023

Somehow, this behaviour is not observed in the 0.8 conda version of hydroMT. Do you know where the difference could come from?

This has likely changed as part of #481 I think catching the error in the plugins makes sense. But happy to discuss alternatives. I think it's important that the behaviour is consistent across all Adapters (this was not the case before).

@savente93
Copy link
Contributor

savente93 commented Oct 19, 2023

This would need to be coordinated with the plugins to be implemented correctly, but personally I quite like the pandas approach here as demonstrated in their to_datetime method:

errors{‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
 - If 'raise', then invalid parsing will raise an exception.
 - If 'coerce', then invalid parsing will be set as NaT
 - If 'ignore', then invalid parsing will return the input.

(source: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) this would give the users a bit more flexibility in how these things are handled. Just a suggestion

@Jaapel
Copy link
Contributor Author

Jaapel commented Oct 19, 2023

So in this case we would do some sort of a nodata check in the get_data method. Can you think of more strategies here @savente93 @DirkEilander ? Otherwise it can be a flag.

class NoDataStrategy(Enum):
    _raise = "raise"
    ignore = "ignore"

def get_data(
    self,
    bbox=None,
    geom=None,
    buffer=0,
    predicate="intersects",
    logger=logger,
    variables=None,
    handle_empty=NoDataStrategy._raise
):
    ...

@savente93
Copy link
Contributor

No, I don't think a coerce option makes sense here, I can't really imagine when you'd like to get back just an empty dataset, so I think this is fine. Though perhaps if one of the plugins has a need for another we could consider it

@DirkEilander
Copy link
Contributor

DirkEilander commented Oct 20, 2023

You suggestion looks good to me @Jaapel! Some questions / considerations:

  • In case of ignore, what would be returned? None?
  • If I'm not mistaken the old behavior in some cases was to actually return empty objects. But I agree with Sam this has no added value here.
  • What is the advantage of a NoDataStrategy Class here over a simple string?
  • Should we make this option broader than just the empty data handling to deal with errors within this method in general with an errors='raise' option similar to the pandas method referred earlier?
  • Note that this should also be implemented in the DataCatalog.get_rasterdataset methods and similar.

@savente93 savente93 added this to the Q4 milestone Oct 20, 2023
@Jaapel
Copy link
Contributor Author

Jaapel commented Oct 20, 2023

You suggestion looks good to me @Jaapel! Some questions / considerations:

* In case of ignore, what would be returned? `None`?
* If I'm not mistaken the _old_ behavior in some cases was to actually return empty objects. But I agree with Sam this has no added value here.

Empty DataFrame? Possibly, some metadata in the data objects may be useful? Otherwise None is fine

* What is the advantage of a `NoDataStrategy` Class here over a simple string?

I like Enums here, as that is what they are made for. You can easily see which options are available and you get early errors, but we can deal with strings too if that is the style of the repository.

* Should we make this option broader than just the empty data handling to deal with errors within this method in general with an `errors='raise'` option similar to the pandas method referred earlier?

For pandas the errors method is about parsing. If we want to ignore errors is general, I would be more specific with errors (like NoDataException and have usings of the package catch these themselves.

* Note that this should also be implemented in the `DataCatalog.get_rasterdataset` methods and similar.

Yes I am thinking about adding a @abstractmethod to a new DataAdapter.empty(self, data_obj) so that we can have different definitions of empty per DataAdapter

@savente93
Copy link
Contributor

I'm actually really in favor of enums, Honestly I think they could be use much more throughout the repository.

@Jaapel Jaapel mentioned this issue Oct 27, 2023
5 tasks
@savente93 savente93 removed the Needs refinement issue still needs refinement label Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants