Open-source data integration platform
OpenHEXA is an open-source data integration platform developed by Bluesquare.
Its goal is to facilitate data integration and analysis workflows, in particular in the context of public health projects.
Please refer to the OpenHEXA wiki for more information.
This repository contains the code for what we call the app component, which mainly provides a GraphQL API and the infrastructure to run data pipelines.
OpenHEXA App is published as a Docker image on Docker Hub: blsq/openhexa-app. You can use docker run blsq/openhexa-app help to list the available commands.
The Installation instructions section of our wiki gives an overview of the local development setup required to run OpenHEXA locally.
To ease the setup of the environment and the management of dependencies, we use containerization, in particular Docker. As such, we provide a docker-compose.yaml file for local development.
When running the app component using docker compose, the code of this repository is mounted as a volume within the container, so that any change you make to your local copy of the codebase is directly reflected in the running container.
The following steps will get you up and running:
cp .env.dist .env # adapt the .env file with the required configuration values
# Set WORKSPACE_STORAGE_LOCATION to a local directory to use a local storage backend for workspaces
docker network create openhexa
docker compose build
docker compose run app fixtures
docker compose up
This will configure all the required environment variables, fill the database with some initial data and start the base db and app services. The app is then exposed on localhost:8000. Two main paths are available:
- http://localhost:8000/graphql for the GraphQL API
- http://localhost:8000/ready for the readiness endpoint
Anything else will be redirected to the frontend served at http://localhost:3000.
You can then log in with the following credentials: [email protected] / root
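Once the stack is up, you can exercise the GraphQL API directly. Below is a minimal sketch using only the Python standard library; it assumes the default local port, and the query is deliberately generic (`__typename` is valid against any GraphQL schema), so adapt it to the fields actually exposed at http://localhost:8000/graphql:

```python
import json
import urllib.request

# Generic introspection query — valid on any GraphQL schema.
payload = json.dumps({"query": "{ __typename }"}).encode()

request = urllib.request.Request(
    "http://localhost:8000/graphql",  # default local port from docker compose
    data=payload,
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(request, timeout=5) as response:
        print(json.loads(response.read()))
except OSError as exc:  # e.g. the stack is not running
    print(f"Could not reach the local instance: {exc}")
```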
Python requirements are handled with pip-tools, which you will need to install. When you want to add a requirement, simply update requirements.in and run pip-compile in the root directory. You can then rebuild the Docker image.
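For instance, to add a dependency you append it to requirements.in and let pip-compile resolve the fully pinned set into requirements.txt (the package name below is purely illustrative):

```
# requirements.in — top-level dependencies only; pip-compile pins the rest
httpx
```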
If you want to run the app in debugger mode, you can override the default command by adding a docker-compose.debug.yaml file in order to use your favorite debugger package and wait for a debugger to attach.
# docker-compose.debug.yaml
services:
  app:
    entrypoint: []
    command:
      - "sh"
      - "-c"
      # If you want to wait for the debugger client to be attached before running the server:
      # - |
      #   pip install debugpy \
      #   && python -m debugpy --listen 0.0.0.0:5678 --wait-for-client /code/manage.py runserver 0.0.0.0:8000
      - |
        pip install debugpy \
        && python -m debugpy --listen 0.0.0.0:5678 /code/manage.py runserver 0.0.0.0:8000
    ports:
      - "8000:8000"
      - "5678:5678"
You can then add a new configuration in VSCode to run the app in debugger mode:
# .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Attach OpenHEXA Debugger",
      "type": "debugpy",
      "request": "attach",
      "connect": {
        "host": "localhost",
        "port": 5678
      },
      "pathMappings": [
        {
          "localRoot": "${workspaceFolder}",
          "remoteRoot": "/code"
        }
      ],
      "django": true,
      "justMyCode": false
    }
  ]
}
Run the app with docker compose -f docker-compose.yaml -f docker-compose.debug.yaml up and start the debugger from VSCode.
# docker-compose.debug.yaml
services:
  app:
    entrypoint: []
    # Used when running in normal mode.
    command: ["/code/docker-entrypoint.sh", "manage", "runserver", "0.0.0.0:8000"]
    ports:
      - "8000:8000"
Create a new interpreter configuration in PyCharm with the following settings:
Create a new Django server run configuration with the following options:
- Python interpreter: the one you just created
- In the "Docker Compose" section, set Command and options to:
-f docker-compose.yaml -f docker-compose.debug.yaml up
Run the configuration in debug mode.
By default, we assume that the frontend is served outside of this project. If you want to run the frontend locally, you can use the frontend profile:
docker compose --profile frontend up
The frontend is then served on http://localhost:3000.
By default, the latest tag will be used, but you can set a PR number or use the main branch:
FRONTEND_VERSION=main docker compose --profile frontend up
FRONTEND_VERSION=pr-604 docker compose --profile frontend up
If you need the pipelines or want to work on them, there are two optional services to run: pipelines_runner and/or pipelines_scheduler. You can run them with the following command instead of docker compose up:
docker compose --profile pipelines up
The Writing OpenHEXA pipelines section of the wiki contains the instructions needed to build and deploy a data pipeline on OpenHEXA.
To deploy and run data pipelines locally, you will need to:
- Create a workspace on your local instance
- Configure the SDK to use your local instance as the backend
openhexa config set_url http://localhost:8000
You can now deploy your pipelines to your local OpenHEXA instance.
Please refer to the SDK documentation for more information.
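As a rough illustration, a pipeline file built with the SDK can look like the sketch below. The decorator names follow the SDK's published examples, but treat this as an assumption and check the SDK documentation for the current API:

```python
# pipeline.py — illustrative only; see the OpenHEXA SDK documentation
# for the authoritative decorator and CLI reference.
from openhexa.sdk import current_run, pipeline


@pipeline("hello-world", name="Hello world")
def hello_world():
    greet()


@hello_world.task
def greet():
    # Messages logged through current_run show up in the run's logs.
    current_run.log_info("Hello from a local pipeline run!")


if __name__ == "__main__":
    hello_world()
```

Once the SDK is configured against your local instance, the pipeline can be pushed to a workspace with the SDK CLI (see the SDK documentation for the exact command).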
Generation of file samples and metadata calculation are done in a separate worker. To run it locally, add the dataset_worker profile to the list of enabled profiles:
docker compose --profile dataset_worker up
If you need the optional dataworker service, you can run the following command instead of docker compose up:
docker compose --profile dataworker up
The app Docker image contains an entrypoint. You can use the following to list the available commands:
docker compose run app help
As an example, use the following command to run the migrations:
docker compose run app migrate
We use Mixpanel to track users and their actions. If you want to enable it, set the MIXPANEL_TOKEN environment variable to the token from your Mixpanel project and restart the application.
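For example, in your .env file (the value below is a placeholder):

```
MIXPANEL_TOKEN=<your-mixpanel-project-token>
```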
Running the tests is as simple as:
docker compose run app test --settings=config.settings.test
Some tests call external resources (such as the public DHIS2 API) and will slow down the suite. You can exclude them when running the test suite for unrelated parts of the codebase:
docker compose run app test --exclude-tag=external --settings=config.settings.test
You can run a specific test as follows:
docker compose run app test hexa.core.tests.CoreTest.test_ready_200 --settings=config.settings.test
There are many other options; if you want to find out more, look at the documentation of the Django testing framework, as it is what we are using.
You can extract the strings to translate with the following command:
docker compose run app manage makemessages -l fr # Where fr is the language code
You can then translate the strings in the hexa/locale
folder.
To compile the translations, run the following command:
docker compose run app manage compilemessages
Our Python code is linted using ruff, which also handles code formatting and import sorting. We currently target Python 3.9 syntax.
We use a pre-commit hook to lint the code before committing. Make sure that pre-commit is installed, and run pre-commit install the first time you check out the code. Linting will be checked again when submitting a pull request.
You can run the lint tools manually using pre-commit run --all.
This library follows Semantic Versioning. Tagging and release creation are managed by release-please, which will create and maintain a pull request with the next release based on the commit messages of the new commits.
Triggering a new release is done by merging the pull request created by release-please. The result is:
- the changelog.md is updated with the commit messages
- a GitHub release is created
- a Docker image is built for the new tag and pushed to the Docker registry
This project assumes you are using Conventional Commit messages.
The most important prefixes you should have in mind are:
- fix:, which represents bug fixes, and correlates to a SemVer patch.
- feat:, which represents a new feature, and correlates to a SemVer minor.
- feat!:, fix!:, refactor!:, etc., which represent a breaking change (indicated by the !) and will result in a SemVer major.
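As an illustration, commit messages following this convention could look like the following (the changes described are invented):

```
fix: return a 404 instead of a 500 for unknown workspaces
feat: expose pipeline run durations in the GraphQL API
feat!: drop support for legacy authentication tokens
```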