Knowledge Graph quality checks for KWG data
This repository contains tests that are run against a Knowledge Graph, most likely the staging GraphDB deployment. Each test file contains SPARQL queries that address a set of competency questions about a dataset, domain of thought, or aspect of triplification.
The repository is built as a set of unit tests that can be expanded as the data changes.
The application config is set in the root config.json, where the endpoint and SPARQL prefixes can be set. New prefixes or additional project functionality can be injected into this file.
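As a rough illustration of how such a file might be consumed, the sketch below parses a config and renders its prefixes as a SPARQL PREFIX header. The key names `endpoint` and `prefixes`, the example URLs, and the helper `build_prefix_header` are assumptions made for this example; the real config.json may be structured differently.

```python
import json

# Hypothetical config contents; the real config.json may use different keys.
EXAMPLE_CONFIG = """
{
  "endpoint": "http://localhost:7200/repositories/example",
  "prefixes": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/ontology/"
  }
}
"""


def build_prefix_header(prefixes: dict[str, str]) -> str:
    """Render a prefix mapping as SPARQL PREFIX declarations, one per line."""
    return "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())


config = json.loads(EXAMPLE_CONFIG)
header = build_prefix_header(config["prefixes"])
print(header)
```

Prepending a header like this to every query keeps the prefix definitions in one place, so a new prefix only needs to be added to the config.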
The project is managed with poetry and requires at least Python 3.10.
To set up a virtual environment with Python 3.10, run the following:
pyenv install 3.10
pyenv local 3.10
To install dependencies and run the SPARQL queries against the database:
poetry install
poetry run pytest
To run an individual test file:
poetry install
poetry run pytest tests/<your_test>.py
For coverage statistics on the quality_checks/ code:
poetry run pytest --cov=quality_checks/
New queries should be added as tests to an existing file if they fit there; otherwise, add them to a new file in the tests/ directory.
There are two suggested ways of testing the graph for content.
Counting
This is the recommended approach: write a query that counts the number of results matching some filter. For example,
Count the number of labels that contain unsupported characters.
If the query returns a count greater than 0, there are unsanitized labels.
Another example is
Count the number of administrative region level 2's that kwg-ont:sfOverlap
The query should return a count of 0.
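A counting check of this kind could look roughly like the pytest-style sketch below. The query text, the character set treated as "unsupported", and all function names are hypothetical; only the response layout, which follows the standard SPARQL 1.1 JSON results format, is not an assumption. A stub stands in for the endpoint so the example runs without a database.

```python
from typing import Callable

# Hypothetical query: counts rdfs:label values containing any of < > { }.
# Which characters count as "unsupported" is an assumption for illustration.
LABEL_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT (COUNT(?label) AS ?count) WHERE {
  ?s rdfs:label ?label .
  FILTER(REGEX(STR(?label), "[<>{}]"))
}
"""


def extract_count(response: dict) -> int:
    """Read the single ?count binding from a SPARQL JSON results document."""
    binding = response["results"]["bindings"][0]
    return int(binding["count"]["value"])


def fake_run_query(query: str) -> dict:
    """Stub for the endpoint call; returns a canned response with count 0."""
    return {
        "head": {"vars": ["count"]},
        "results": {
            "bindings": [
                {
                    "count": {
                        "type": "literal",
                        "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                        "value": "0",
                    }
                }
            ]
        },
    }


def count_unsanitized_labels(run_query: Callable[[str], dict]) -> int:
    """Run the counting query and return the number of offending labels."""
    return extract_count(run_query(LABEL_QUERY))


def test_no_unsanitized_labels() -> None:
    # In a real test, run_query would send the query to the configured endpoint.
    assert count_unsanitized_labels(fake_run_query) == 0
```

Because the query aggregates on the server, only a single number crosses the network, which is why counting tends to be cheaper than fetching every matching node.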
Iterating
Fetching all nodes that match some pattern is supported, but may be slow. Because GraphDB limits the number of results, use the QueryPaginator class to paginate over all results.
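The interface of QueryPaginator is not described here, so the following is only a generic sketch of the underlying idea: request results in fixed-size pages with LIMIT and OFFSET until a short page comes back. The function `paginate`, the `fetch_page` callable, and the page size are hypothetical and do not reflect the actual class.

```python
from typing import Callable, Iterator


def paginate(
    base_query: str,
    fetch_page: Callable[[str], list[dict]],
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield every row by appending LIMIT/OFFSET to base_query page by page.

    base_query should contain an ORDER BY clause so that pages are stable.
    """
    offset = 0
    while True:
        rows = fetch_page(f"{base_query}\nLIMIT {page_size} OFFSET {offset}")
        yield from rows
        if len(rows) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            break
        offset += page_size


# Stand-in for the endpoint: serves slices of 25 fake rows.
ROWS = [{"s": f"node{i}"} for i in range(25)]


def fake_fetch(query: str) -> list[dict]:
    """Parse the trailing LIMIT/OFFSET and return the matching slice of ROWS."""
    tail = query.rsplit("LIMIT", 1)[1].split()
    limit, offset = int(tail[0]), int(tail[2])
    return ROWS[offset : offset + limit]


results = list(
    paginate("SELECT ?s WHERE { ?s ?p ?o } ORDER BY ?s", fake_fetch, page_size=10)
)
print(len(results))
```

A generator keeps memory use flat, since each page can be processed and discarded before the next one is requested.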
The project makes use of a number of tools for code maintenance. Before committing changes, run the following commands to process the files.
poetry run isort .
poetry run black .
poetry run mypy .
poetry run flake518
Submit pull requests to the main branch with a short description of the changes and any additional testing steps.