Import data from external data sources into the portal.
The docker-importer is a docker container which has functionalities for metadata import. Currently the following sources are supported:
- Wikidata
- zbMATH
- CRAN
- arXiv
- polyDB
- crossref
- Zenodo
- ORCID
The importer functionality is encapsulated within the python package mardi_importer. This package is installed in a docker environment to run cronjobs that schedule the import tasks.
The documentation for the package mardi_importer is available at:
mardi4nfdi.github.io/docker-importer
The importer interacts with the wikibase instance deployed at portal.mardi4nfdi.de using the mardiclient package.
- Imports should run at least once a day and import only new data
- The importer shall be extendable, i.e. imports from different sources should be possible
- Ideally, import operations should be switchable by configuration, i.e. without editing the program
- If an import doesn't succeeed, a rollback should be possible
Copy config/import_config.config.template to config/import_config.config and edit.
Create a local image:
docker build -t ghcr.io/mardi4nfdi/docker-importer:main .
Update portal-compose-dev.yml
or portal-compose.override.yml
in the portal-compose
folder with the appropriate volume route to link to the mardi_importer
folder, e.g.
importer:
restart: ${RESTART}
volumes:
- ../docker-importer/mardi_importer:/mardi_importer:ro
The documentation for the mardi_importer
package is updated for every push to main by running
make html
in docs/
and deploying to the gh-pages
branch.
The result is directly available at
mardi4nfdi.github.io/docker-importer
First install the requirements from requirements.txt
,
pip install -r requirements.txt
Then install the python package bundle ("mardi-importer") via
pip install -U -e .
-U
enforces reinstalling the package, with -e
modifications in
the source files are automatically taken into account.
Note: it is recommended (when not using docker) for local installations to use virtual environments.
To run the tests, do:
docker exec -ti mardi-importer /bin/bash /tests/run_tests.sh