DOC: better document in the data catalogs if datasets were pre-processed #356

hboisgon · 2023-05-23T04:20:48Z

HydroMT version checks

I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

In the docs here: https://deltares.github.io/hydromt/latest/user_guide/data_existing_cat.html
Maybe also here (meta section) depending on implementation: https://deltares.github.io/hydromt/latest/user_guide/data_prepare_cat.html

Documentation problem

Some of the datasets in the pre-defined catalogs are not the original data but pre-processed versions (e.g. modis_lai, and some layers of merit_hydro).
Maybe we should find a standard way of letting the user know about this?
See also the related issue in hydromt-wflow: #157

Known issues:

merit_hydro
merit_hydro_patch
modis_lai
era5 other than hourly (daily, zarr)
chirps

Possibly related:

hydro_lakes
hydro_reservoirs

Suggested fix for documentation

I think so far we have tried to use source_url and notes in meta to say whether processing was done. For some data sources this is missing, but I also wonder whether this is clear to the user or whether we should do it differently.

For example only add source_url if no processing was done.
In case of processing, use new keywords processing_from_url, processing_from_doi, processing_steps?
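
To make the proposal concrete, here is a minimal Python sketch of how a user-facing provenance message could be derived from a meta entry under this convention. The function name `describe_provenance` and the message wording are hypothetical, and the keys processing_from_url, processing_from_doi and processing_steps are only the keywords suggested above, not existing HydroMT keywords.

```python
def describe_provenance(meta: dict) -> str:
    """Summarise where a catalog entry comes from (hypothetical helper).

    Assumes the convention proposed above: source_url is only set for
    unprocessed data, and the processing_* keys are set otherwise.
    """
    processing_keys = ("processing_from_url", "processing_from_doi", "processing_steps")
    # Keep only the processing keys that are present and non-empty.
    processing = {key: meta[key] for key in processing_keys if meta.get(key)}

    if processing:
        origin = (
            processing.get("processing_from_url")
            or processing.get("processing_from_doi")
            or "unknown origin"
        )
        steps = processing.get("processing_steps", "steps not documented")
        return f"pre-processed from {origin} ({steps})"
    if meta.get("source_url"):
        return f"original data from {meta['source_url']}"
    return "provenance not documented"


# Placeholder values, for illustration only.
print(describe_provenance({"source_url": "https://example.org/data"}))
print(describe_provenance({"processing_from_doi": "10.0000/example", "processing_steps": "resampled"}))
```

An entry that sets neither source_url nor any processing_* key is reported as undocumented, which is the case this issue is trying to eliminate.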


DirkEilander · 2023-06-28T14:56:51Z

Part of the solution is to update the meta section in deltares_data.yml and the documentation according to:

  meta:
    source_url: zenodo.org/my_dataset # should point to processed data OR original in combi with processing_notes/script
    source_license: CC-BY-3.0
    source_version: vX.X
    paper_ref: Author et al. (year)
    paper_doi: doi
    processing_notes:  <description of process in script OR simple processing steps (e.g. filter / gdalbuildvrt)>
    processing_script: <url to script>
    category: category

It should be checked case by case what is required for reproducibility. There are several options:

publish pre-processed data together with the script on Zenodo (e.g. MODIS_LAI/ MERIT Hydro basins map) and point to this data in source_url
point to scripts in processing_script to download and/or process (e.g. ERA5)
add processing_notes for simple processing to filter data (e.g. hydro_lakes) or create a vrt (merit)
documentation of required data (e.g. bounds is required in the hydrographic region argument unless the basin map and index are present)
check used data sources in examples (e.g. replace merit_hydro with merit_hydro_ihu).
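
As a rough aid for the case-by-case check, the sketch below lists, per source, which of the provenance keys from the meta template above are missing. It assumes the catalog has already been parsed from YAML into a plain dict that maps source names to entries with an optional meta mapping. The function name `missing_provenance` and the example entries are hypothetical. A missing key is not necessarily an error, since not every option above needs all three keys; the report only shows where to look.

```python
from typing import Dict, List

# Keys from the meta template above that document where the data came from.
PROVENANCE_KEYS = ("source_url", "processing_notes", "processing_script")


def missing_provenance(catalog: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each source name to the provenance keys absent from its meta section."""
    report = {}
    for name, entry in catalog.items():
        meta = {}
        if isinstance(entry, dict):
            meta = entry.get("meta") or {}
        missing = [key for key in PROVENANCE_KEYS if not meta.get(key)]
        if missing:
            report[name] = missing
    return report


# Hypothetical catalog content, for illustration only.
catalog = {
    "dataset_a": {
        "meta": {
            "source_url": "https://example.org/a",
            "processing_notes": "filtered by attribute",
            "processing_script": "https://example.org/script.py",
        }
    },
    "dataset_b": {"meta": {"source_url": "https://example.org/b"}},
}

for source, keys in missing_provenance(catalog).items():
    print(f"{source}: missing {', '.join(keys)}")
```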

DirkEilander · 2023-06-28T15:00:56Z

In this issue we add the processing_notes to the sources mentioned above. We will follow up on the next steps in separate issues (#537).

DirkEilander · 2023-10-18T11:25:03Z

FYI: This issue is split into #537 (to identify and make notes on datasets with pre-processing) and further issues (to be created).

hboisgon added the labels "Documentation" (Improvements or additions to documentation) and "DataCatalog & DataAdapters" (issues related to the DataCatalog and DataAdapters) May 23, 2023

hboisgon changed the title ~~DOC: better document in the data catalogs if datasets were pro-processed~~ DOC: better document in the data catalogs if datasets were pre-processed Jun 1, 2023

alimeshgi added this to the Q3 milestone Jun 21, 2023

hboisgon added the label "Datasets" (request to update or add new datasets) and removed the label "DataCatalog & DataAdapters" (issues related to the DataCatalog and DataAdapters) Jun 28, 2023

savente93 assigned savente93 and unassigned savente93 Jul 17, 2023

savente93 mentioned this issue Aug 24, 2023

to_stac for creating STAC catalog from HydroMT catalog #405

Closed

savente93 modified the milestones: Q3, Q4 Sep 21, 2023

DirkEilander mentioned this issue Sep 29, 2023

Improve reproducibility of "deltares_data" catalog sources #537

Closed

4 tasks

savente93 assigned hboisgon Oct 5, 2023

DirkEilander assigned Tjalling-dejong Oct 17, 2023

savente93 unassigned hboisgon Oct 20, 2023

Tjalling-dejong added a commit that referenced this issue Nov 10, 2023

Added processing scripts/notes tags where necessary #356

d93f9a5

Tjalling-dejong mentioned this issue Dec 1, 2023

Improve reproducibility dd #667

Merged

5 tasks

savente93 closed this as completed in #667 Jan 2, 2024
