ENH: Support for multiple data catalogs with same source names #329
Comments
also see #331 |
Discussed possibilities around this issue with @DirkEilander; here is a quick summary of the relevant bits. In essence there are three ways we could structure the
{
    'esa_worldcover_2020_v100':
    {
        'aws': RasterDataAdapter,
        'deltares': RasterDataAdapter,
        'last': 'aws',
    },
    ...:...
}
{
    'aws': {
        'esa_worldcover_2020_v100': RasterDataAdapter,
        ...:...
    },
    'deltares': {
        'esa_worldcover_2020_v100': RasterDataAdapter,
        ...:...
    }
}
or
{
    'aws:esa_worldcover_2020_v100': RasterDataAdapter,
    'deltares:esa_worldcover_2020_v100': RasterDataAdapter,
    'esa_worldcover_2020_v100': RasterDataAdapter,
    ...:...
}
After some brainstorming we decided to go for the first option. Firstly, this is more efficient both computationally and conceptually: users are much more likely to ask for a dataset that might have different providers than they are to ask for "all aws data". The last option would be the closest to the current implementation, but requires a lot of extra bookkeeping on the part of the user, whereas with a nested dictionary we can do pretty much all of that bookkeeping for them. It will also mesh better with an idea we have to introduce versions, and with how we want to change the way the alias is used, but I'll detail that part more in the relevant question. To avoid feature creep I'll just focus on this level, and after that we should probably discuss the implications of versions with the team down the line. (Dirk, feel free to comment if I've forgotten or misunderstood anything.) |
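To make the first option concrete, here is a minimal sketch of how registration and lookup could work with the nested layout. It assumes that plain strings stand in for the RasterDataAdapter objects and that the 'last' key names the provider that was added most recently; the helpers add_source and get_source are hypothetical and not part of the existing API.

```python
# Hypothetical sketch of the nested layout: source name -> provider -> adapter.
sources = {}


def add_source(name, provider, adapter):
    """Register an adapter under a source name and provider (hypothetical helper)."""
    entry = sources.setdefault(name, {})
    entry[provider] = adapter
    # Assumption: 'last' tracks the provider that was added most recently.
    entry["last"] = provider


def get_source(name, provider=None):
    """Return the adapter for `name`, falling back to the 'last' provider."""
    entry = sources[name]
    if provider is None:
        provider = entry["last"]
    return entry[provider]


# Strings stand in for RasterDataAdapter instances.
add_source("esa_worldcover_2020_v100", "aws", "adapter-from-aws")
add_source("esa_worldcover_2020_v100", "deltares", "adapter-from-deltares")

default_adapter = get_source("esa_worldcover_2020_v100")
aws_adapter = get_source("esa_worldcover_2020_v100", provider="aws")
```

One consequence of keeping 'last' in the same dictionary as the providers is that no provider could itself be called "last"; storing it separately would avoid that.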
In addition to the comment above: when users specify a source in the configuration yml file, they also need a way to specify which catalog it comes from. We should keep in mind that we might also want to specify the data source version in a similar manner going forward (see #148).
Option 1: in one string, e.g.
setup_lulcmaps:
  lulc: esa_worldcover_2020_v100:aws
Option 2: an explicit extra argument. This would mean that the DataCatalog.get_rasterdataset, DataCatalog.get_geodataset, etc. methods get a dictionary of arguments instead of a single string.
setup_lulcmaps:
  lulc:
    source: esa_worldcover_2020_v100
    catalog: aws |
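The two options differ mainly in how the value is parsed before it reaches the catalog. A rough sketch, assuming ":" as the separator for Option 1 and the keys source and catalog for Option 2 as in the examples above; parse_source is a hypothetical helper, not an existing function.

```python
def parse_source(spec, sep=":"):
    """Return (source, catalog) from a string or dict spec (hypothetical helper).

    `catalog` is None when the spec does not name one.
    """
    if isinstance(spec, dict):
        # Option 2: explicit keys.
        return spec["source"], spec.get("catalog")
    # Option 1: "<source><sep><catalog>" in a single string.
    source, found, catalog = spec.partition(sep)
    return source, (catalog if found else None)


option_1 = parse_source("esa_worldcover_2020_v100:aws")
option_2 = parse_source({"source": "esa_worldcover_2020_v100", "catalog": "aws"})
no_catalog = parse_source("esa_worldcover_2020_v100")
```

With Option 1 the separator has to be a character that never occurs in a source name; if a source can also be given as a file path, a colon would be ambiguous. Option 2 avoids that and leaves room for a version key later.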
After another discussion we decided to start deprecating the
esa_worldcover:
  base:
    crs: 4326
    data_type: RasterDataset
    driver: raster
    kwargs:
      chunks:
        x: 36000
        y: 36000
    meta:
      category: landuse
      source_license: CC BY 4.0
      source_url: https://doi.org/10.5281/zenodo.5571936
    path: landuse/esa_worldcover/esa-worldcover.vrt
  versions:
  - version: 2020
    kwargs:
      chunks:
        x: 36000
        y: 36000
    path: landuse/esa_worldcover/esa-worldcover-2020.vrt
  - version: 2021
    kwargs:
      chunks:
        x: 36000
        y: 36000
    path: landuse/esa_worldcover/esa-worldcover-2021.vrt
(note that here the |
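One possible reading of the base/versions layout is that each version entry overrides the keys it repeats and inherits the rest from base. The sketch below works under that assumption only; the thread does not spell out the merge rule, and resolve_version is a hypothetical helper. The entry is a trimmed copy of the yaml above, written as a Python dict.

```python
import copy

# Trimmed copy of the catalog entry above.
entry = {
    "base": {
        "crs": 4326,
        "data_type": "RasterDataset",
        "driver": "raster",
        "kwargs": {"chunks": {"x": 36000, "y": 36000}},
        "path": "landuse/esa_worldcover/esa-worldcover.vrt",
    },
    "versions": [
        {"version": 2020, "path": "landuse/esa_worldcover/esa-worldcover-2020.vrt"},
        {"version": 2021, "path": "landuse/esa_worldcover/esa-worldcover-2021.vrt"},
    ],
}


def resolve_version(entry, version=None):
    """Merge `base` with the matching version entry (hypothetical helper).

    Assumption: top-level keys in the version entry replace those in `base`.
    """
    resolved = copy.deepcopy(entry["base"])
    if version is None:
        return resolved
    for item in entry["versions"]:
        if item["version"] == version:
            overrides = {k: v for k, v in item.items() if k != "version"}
            resolved.update(copy.deepcopy(overrides))
            return resolved
    raise KeyError(f"version {version!r} not found")


source_2021 = resolve_version(entry, 2021)
source_base = resolve_version(entry)
```

The merge here is shallow: a version that sets kwargs replaces the whole kwargs block from base. Whether nested keys should be merged instead is a design choice that would need to be settled.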
Feature Type
Enhancement Description
When a DataCatalog is initialized from multiple data catalog (yaml) files with the same source names, only the source from the last data catalog is kept; earlier sources with the same name are overwritten. By storing the sources internally as e.g. <data_catalog>/<source_name>, names become unique, we avoid overwriting, and we can access the same data under the same name from different sources.
Feature Description
DataCatalog.update requires a new data_catalog argument, which should be used to set the combined name. DataCatalog.__getitem__ should still be able to obtain data based on just the source_name for backwards compatibility, using some logic to decide which data_catalog to select.
Additional Context
We need to decide on the "couple sign" (/ in the example above) used to combine the data_catalog and source_name.
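A small sketch of the behaviour described above, assuming "/" as the couple sign and "last data catalog added wins" as the fallback rule for a bare source_name (which matches the current overwrite behaviour). CatalogSketch is a hypothetical stand-in, not the real DataCatalog class, and its update signature is an assumption.

```python
class CatalogSketch:
    """Hypothetical stand-in that stores sources under combined keys."""

    SEP = "/"  # the "couple sign"; the actual choice is still open

    def __init__(self):
        self._sources = {}  # "<data_catalog>/<source_name>" -> source
        self._latest = {}   # source_name -> data_catalog that was added last

    def update(self, data_catalog, **sources):
        """Add sources under the combined name for `data_catalog`."""
        for source_name, source in sources.items():
            self._sources[f"{data_catalog}{self.SEP}{source_name}"] = source
            self._latest[source_name] = data_catalog

    def __getitem__(self, key):
        if key in self._sources:
            return self._sources[key]
        # Bare source_name: fall back to the data catalog that was added last.
        data_catalog = self._latest[key]
        return self._sources[f"{data_catalog}{self.SEP}{key}"]


cat = CatalogSketch()
cat.update("aws", esa_worldcover_2020_v100="source-from-aws")
cat.update("deltares", esa_worldcover_2020_v100="source-from-deltares")

explicit = cat["aws/esa_worldcover_2020_v100"]
fallback = cat["esa_worldcover_2020_v100"]
```

Existing code that indexes by source_name alone keeps getting the same source as before, while the earlier catalog's source stays reachable through its combined name.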