Catalogue Service for the Web (CSW) is a standardised protocol for querying remote catalogues.
This library can be used to fetch records from CSW servers.
Some relevant CSW catalogues are:
- Bonares data repository (csw)
- ejpsoil catalogue (csw)
- isric catalogue (csw)
- islandr project (csw)
The script uses the OWSLib library to fetch records and stores them in the PostgreSQL table harvest.items,
which has the following structure:
    CREATE TABLE IF NOT EXISTS harvest.items
    (
        identifier text COLLATE pg_catalog."default" NOT NULL,
        identifiertype character varying(50) COLLATE pg_catalog."default",
        itemtype character varying(50) COLLATE pg_catalog."default",
        resultobject text COLLATE pg_catalog."default" NOT NULL,
        resulttype character varying(50) COLLATE pg_catalog."default",
        uri text COLLATE pg_catalog."default" NOT NULL,
        insert_date timestamp without time zone,
        source text COLLATE pg_catalog."default",
        hash text COLLATE pg_catalog."default",
        turtle text COLLATE pg_catalog."default",
        date character varying(10) COLLATE pg_catalog."default",
        error text COLLATE pg_catalog."default",
        language character varying(9) COLLATE pg_catalog."default",
        project text COLLATE pg_catalog."default",
        CONSTRAINT item_hash UNIQUE (hash)
    );
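The fetch-and-store step described above could look roughly like the sketch below. It is an illustration, not the actual script: the names `fetch_records`, `make_row` and `store_record` are hypothetical, and the use of SHA-256 for the hash column, the choice of which columns to fill, the value passed as uri and the ON CONFLICT clause are all assumptions. `conn` can be any DB-API connection that uses `%s` placeholders, for example one created with psycopg2.

```python
import hashlib
from datetime import datetime

# Assumption: only these columns are filled; duplicates (same hash) are skipped.
INSERT_SQL = """
INSERT INTO harvest.items
    (identifier, resultobject, uri, insert_date, source, hash)
VALUES (%s, %s, %s, %s, %s, %s)
ON CONFLICT (hash) DO NOTHING
"""


def fetch_records(url, page_size=50):
    """Yield (identifier, xml_text) for the records of a CSW catalogue."""
    # Imported lazily so the storage helpers work without OWSLib installed.
    from owslib.csw import CatalogueServiceWeb

    csw = CatalogueServiceWeb(url)
    start = 1
    while True:
        csw.getrecords2(startposition=start, maxrecords=page_size, esn="full")
        for identifier, record in csw.records.items():
            xml = record.xml
            yield identifier, xml.decode("utf-8") if isinstance(xml, bytes) else xml
        results = csw.results
        next_record = results.get("nextrecord", 0)
        # Stop when the server reports no further page or returned nothing.
        if (not results.get("returned")
                or not next_record
                or next_record > results.get("matches", 0)):
            break
        start = next_record


def make_row(identifier, xml_text, uri, source):
    """Build the parameter tuple for INSERT_SQL (hash algorithm is an assumption)."""
    digest = hashlib.sha256(xml_text.encode("utf-8")).hexdigest()
    return (identifier, xml_text, uri, datetime.now(), source, digest)


def store_record(conn, row):
    """Insert one row using a DB-API connection and commit."""
    cur = conn.cursor()
    try:
        cur.execute(INSERT_SQL, row)
        conn.commit()
    finally:
        cur.close()
```

Hashing the record content and relying on the item_hash constraint means that re-running the harvester against an unchanged catalogue would not create duplicate rows, at least under the assumptions above.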
A harvester run is best configured as a CI/CD pipeline in Git.
The run is configured through the following environment variables, which can also be added to a .env file:
- POSTGRES_HOST
- POSTGRES_PORT
- POSTGRES_DB
- POSTGRES_USER
- POSTGRES_PASSWORD
- HARVEST_URL
- HARVEST_FILTER
HARVEST_FILTER is a JSON object of key-value pairs, for example:
export HARVEST_FILTER='{"keywords":"Soil","type":"dataset"}'
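As a minimal sketch of how these variables might be read in Python: the `HarvestConfig` class and the `load_config` function are hypothetical names, and the fallback to port 5432 (the PostgreSQL default) and to an empty filter are assumptions rather than documented behaviour of the script. Loading the .env file itself is left out here; a package such as python-dotenv could do that before this code runs.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class HarvestConfig:
    host: str
    port: int
    db: str
    user: str
    password: str
    url: str
    filters: dict = field(default_factory=dict)


def load_config(env=None):
    """Read the harvester settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env

    # HARVEST_FILTER is optional here (assumption); when set it must be a JSON object.
    raw_filter = env.get("HARVEST_FILTER", "").strip()
    filters = json.loads(raw_filter) if raw_filter else {}
    if not isinstance(filters, dict):
        raise ValueError("HARVEST_FILTER must be a JSON object of key-value pairs")

    return HarvestConfig(
        host=env["POSTGRES_HOST"],
        port=int(env.get("POSTGRES_PORT", "5432")),  # assumed default
        db=env["POSTGRES_DB"],
        user=env["POSTGRES_USER"],
        password=env["POSTGRES_PASSWORD"],
        url=env["HARVEST_URL"],
        filters=filters,
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a configuration before committing it to a pipeline.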