Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DL4GAMAlps dataset #2508

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open

Conversation

dcodrut
Copy link

@dcodrut dcodrut commented Jan 9, 2025

This PR adds a a Multi-modal Dataset for Glacier Mapping (Segmentation) in the European Alps.

The dataset consists of Sentinel-2 images from 2015 (mainly), 2016 and 2017, and binary segmentation masks for glaciers, based on an inventory built by glaciology experts (Paul et al. 2020).
Given that glacier ice is not always visible in the images, due to seasonal snow, shadow/cloud cover and, most importantly, debris cover, the dataset also includes additional features that can help in the segmentation task.

Here's a sample extracted with the plot function implemented in the dataset:
image

A preprint is available here that describes in detail how the dataset was constructed.
For a shorter description check also: https://huggingface.co/datasets/dcodrut/dl4gam_alps.

@github-actions github-actions bot added documentation Improvements or additions to documentation datasets Geospatial or benchmark datasets testing Continuous integration testing labels Jan 9, 2025
Copy link
Collaborator

@nilsleh nilsleh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the contribution :) I hope this first round of review is able to resolve the import xarray error that is preventing the unit tests from running. Once they are working, will take another look for anything else!

torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
@nilsleh nilsleh added this to the 0.7.0 milestone Jan 9, 2025
@github-actions github-actions bot added the dependencies Packaging and dependencies label Jan 9, 2025
@dcodrut
Copy link
Author

dcodrut commented Jan 9, 2025

@microsoft-github-policy-service agree

@dcodrut
Copy link
Author

dcodrut commented Jan 9, 2025

Thanks a lot for the contribution :) I hope this first round of review is able to resolve the import xarray error that is preventing the unit tests from running. Once they are working, will take another look for anything else!

I hope it will be somehow useful. Happy to contribute and learn new things on the way :)
Thanks for the comments and please let me know if there are additional things I should change.

@adamjstewart
Copy link
Collaborator

I'm completely fine with adding xarray and netcdf4 as optional dependencies. We've been talking about adding xarray support for a long time. We're still not sure if it will be directly through xarray or something like rioxarray, but that doesn't matter for this PR.

@adamjstewart
Copy link
Collaborator

Can you resolve the merge conflicts so the tests run?

@adamjstewart
Copy link
Collaborator

adamjstewart commented Jan 15, 2025

Can you resolve the merge conflicts? These requirements files unfortunately get updated a lot, causing merge issues.

adamjstewart
adamjstewart previously approved these changes Jan 15, 2025
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks fantastic, just a few minor formatting comments. Thanks for taking the time to contribute your dataset to TorchGeo, let's hope it makes it easier for people to actually use and do cool science with!

requirements/datasets.txt Outdated Show resolved Hide resolved
requirements/datasets.txt Show resolved Hide resolved
torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
torchgeo/datasets/glacier_mapping_alps.py Outdated Show resolved Hide resolved
docs/api/datasets/non_geo_datasets.csv Outdated Show resolved Hide resolved
@dcodrut dcodrut requested a review from nilsleh January 20, 2025 14:30
nilsleh
nilsleh previously approved these changes Jan 20, 2025
torchgeo/datasets/dl4gam_alps.py Outdated Show resolved Hide resolved
@dcodrut dcodrut changed the title Add GlacierMappingAlps dataset Add DL4GAMAlps dataset Jan 20, 2025
@nilsleh nilsleh requested a review from adamjstewart January 21, 2025 20:31
@@ -21,6 +21,7 @@ Dataset,Task,Source,License,# Samples,# Classes,Size (px),Resolution (m),Bands
`Forest Damage`_,OD,Drone imagery,"CDLA-Permissive-1.0","1,543",4,"1,500x1,500",,RGB
`GeoNRW`_,S,Aerial,"CC-BY-4.0","7,783",11,"1,000x1,000",1,"RGB, DEM"
`GID-15`_,S,Gaofen-2,-,150,15,"6,800x7,200",3,RGB
`DL4GAM Alps`_,S,"Sentinel-2","CC-BY-4.0","2,251 or 11,440","2","256x256","10","MSI"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed the license

Comment on lines +90 to +91
# netcdf4 1.5.4+ required for xarray.open_dataset with engine="netcdf4"
"netcdf4>=1.5.4",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# netcdf4 1.5.4+ required for xarray.open_dataset with engine="netcdf4"
"netcdf4>=1.5.4",
# netcdf4 1.5.8+ required for Python 3.10 wheels
"netcdf4>=1.5.8",

Older versions might work, but are annoying to test in CI due to lack of wheels.

@@ -97,6 +99,8 @@ datasets = [
"scikit-image>=0.19",
# scipy 1.7.2+ required for Python 3.10 wheels
"scipy>=1.7.2",
# xarray 2023.9+ required for xarray.open_dataset
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure xarray.open_dataset existed before 2023, what error do you see for older versions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to update requirements/min-reqs.old with the minimum supported versions so we can test those.

from torchgeo.datasets import DatasetNotFoundError, DL4GAMAlps

pytest.importorskip('xarray', minversion='2023.9')
pytest.importorskip('netCDF4', minversion='1.5.4')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pytest.importorskip('netCDF4', minversion='1.5.4')
pytest.importorskip('netCDF4', minversion='1.5.8')

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

"""DL4GAMAlps Dataset."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any plans to release DL4GAM datasets for other locations? If so, we could rename most of these files to dl4gam.py and then have region-specific datasets grouped together.

'B11',
'B12',
)
rgb_nir_swir_bands = ('B4', 'B3', 'B2', 'B8', 'B11') # the subset used in our paper
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rgb_nir_swir_bands = ('B4', 'B3', 'B2', 'B8', 'B11') # the subset used in our paper
rgb_nir_swir_bands = ('B4', 'B3', 'B2', 'B8', 'B11') # the subset used in the paper

This dataset requires the following additional libraries to be installed:

* `xarray <https://pypi.org/project/xarray/>`_
* `netcdf4 <https://pypi.org/project/netCDF4/>`_
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically h5netcdf is also a valid engine

Comment on lines +276 to +278
# convert to torch tensors
for k, v in sample.items():
sample[k] = torch.from_numpy(v)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to avoid dynamic typing, could we convert to torch tensors earlier or have two different dicts?

a matplotlib Figure with the rendered sample
"""
# we expect the RGB bands to be present
assert {'B4', 'B3', 'B2'}.issubset(set(self.bands)), (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See RGBBandsMissingError. Also need to add Raises: to the docstring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets dependencies Packaging and dependencies documentation Improvements or additions to documentation testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants