This is a collection of Python applications and a helper Docker image used to index data into a Datacube using odc-tools.
The functionality is exposed in the form of various `<storage backend>`-to-dc utilities, which accept URI/GLOB parameters and product name(s) and index the discovered datasets into a default Datacube. These utilities include:
- bootstrap-odc.sh: a shell script that consumes URL-based metadata and product catalogs to bootstrap a Datacube.
- s3-to-dc: Index from S3 storage to a Datacube database.
- thredds-to-dc: Index from a Thredds server to a Datacube database.
- sqs-to-dc: Index from an SQS queue to a Datacube database.
- stac-to-dc: Index from a STAC API into a Datacube database.
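As an illustration, typical invocations might look like the following. The bucket path, glob, collection, and product name are placeholders, and the exact flags available may vary between releases:

```shell
# Index all dataset YAMLs found under a bucket prefix into a hypothetical
# "ls8_sr" product (bucket, glob, and product name are placeholders).
s3-to-dc --no-sign-request 's3://my-bucket/path/**/*.yaml' ls8_sr

# Index from a STAC API, restricting the search to a bounding box
# and a single collection.
stac-to-dc --bbox='145,-35,146,-34' --collections='sentinel-2-l2a'
```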
It has code to perform the following steps:
- Crawl S3 to find datasets using s3-find, yielding results as a generator.
- Crawl a Thredds server using the Thredds Crawler with NCI-specific defaults (overridable).
- Index the dataset YAMLs found into the Datacube using a generator/list equivalent of dc-index-from-tar, skipping the tar file.
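The generator-based indexing step above can be sketched as follows. `index_documents` is a hypothetical helper written for illustration; `Datacube` and `Doc2Dataset` (shown in the commented usage) are real datacube APIs, but real use requires a configured Datacube database:

```python
def index_documents(docs, resolver, index):
    """Index an iterable of (metadata_doc, uri) pairs into a Datacube.

    resolver(doc, uri) returns (dataset, error) — e.g. datacube's
    Doc2Dataset; index.datasets.add() persists the dataset.
    Returns (added, failed) counts.
    """
    added = failed = 0
    for doc, uri in docs:
        dataset, err = resolver(doc, uri)
        if dataset is None:
            failed += 1  # e.g. unknown product or malformed document
            continue
        index.datasets.add(dataset)
        added += 1
    return added, failed


# Real usage (requires a configured Datacube database), roughly:
#   from datacube import Datacube
#   from datacube.index.hl import Doc2Dataset
#   dc = Datacube()
#   index_documents(yaml_docs_from_s3_crawl, Doc2Dataset(dc.index), dc.index)
```

Passing a generator rather than a list keeps memory use flat when crawling large buckets.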
Production deployments of the Open Data Cube typically run follow-on steps after a new product, or new datasets for an existing product, are indexed. These steps are outlined below:
- Use the OWS update-ranges tool to update layer extents for products in OWS-managed tables, run in a separate container.
- Use Explorer summary generation to generate product summaries.
- The three containers are tied together by an Airflow DAG using the Kubernetes Executor.
- Utilities in the three parts of the Datacube applications/library ecosystem are tied together by custom Python scripts.
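A sketch of how the follow-on steps might be run from the OWS and Explorer containers; `my_product` is a placeholder, and the exact flags may differ between versions of datacube-ows and Explorer:

```shell
# Refresh OWS layer extents after new datasets are indexed (OWS container).
datacube-ows-update --views
datacube-ows-update my_product

# Regenerate Explorer summaries (Explorer container).
cubedash-gen my_product
```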