The endpoints where the "besluiten" (decisions) are harvested are monitored using the delta-consumer service. When running, the app first performs an initial sync of all the harvested data from the endpoints. The data added to Virtuoso can be filtered by defining the types of interest in the configuration.
Any of the following values can be used as the `DCR_SYNC_BASE_URL` (see the sketch after the list):
- https://lokaalbeslist-harvester-0.s.redhost.be/
- https://lokaalbeslist-harvester-1.s.redhost.be/
- https://lokaalbeslist-harvester-2.s.redhost.be/
- https://lokaalbeslist-harvester-3.s.redhost.be/
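For illustration, a minimal docker-compose sketch of how this variable could be wired into the consumer service; the service name, image, and everything except `DCR_SYNC_BASE_URL` are assumptions, not taken from this repository:

```yaml
services:
  delta-consumer:                  # assumed service name
    image: lblod/delta-consumer    # assumed image
    environment:
      # Pick one of the harvester endpoints listed above
      DCR_SYNC_BASE_URL: "https://lokaalbeslist-harvester-0.s.redhost.be/"
```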
Delta production consists of four stages.
The first stage is an initial sync of the publication graph. This stage takes the necessary data from the source graph and populates the publication graph. Afterward, it creates a dump file (as a `dcat:Dataset`) to make it available for consumers. The reasons for this stage are:
- Usually, there is already relevant data available in the database for consumers.
- Packaging it as a dump file speeds up the first sync for consumers compared to using small delta files.
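For intuition, the dump is announced as a `dcat:Dataset` pointing to the downloadable file. A minimal Turtle sketch, where the URIs and the exact properties are illustrative assumptions rather than the precise shape produced by the services:

```ttl
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical dataset and file URIs, for illustration only
<http://example.org/datasets/besluiten-dump>
  a dcat:Dataset ;
  dct:title "Dump of the besluiten publication graph" ;
  dcat:distribution <http://example.org/files/besluiten-dump.ttl.gz> .
```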
The second stage, after the initial sync, is the 'normal operation mode'. In this stage, internal deltas come in, and the publication graph maintainer decides whether the data needs to be published to the outside world.
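For reference, internal deltas follow the mu-semtech delta format: an array of changesets, each with `inserts` and `deletes` arrays of triples. A sketch of such a message (the subject URI below is a made-up example):

```json
[
  {
    "inserts": [
      {
        "subject":   { "type": "uri", "value": "http://example.org/id/besluiten/123" },
        "predicate": { "type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" },
        "object":    { "type": "uri", "value": "http://data.vlaanderen.be/ns/besluit#Besluit" }
      }
    ],
    "deletes": []
  }
]
```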
The third stage is 'healing mode', where a periodic job checks whether any internal deltas were missed and corrects this by updating the published information. Misses can occur due to migrations (which don't produce deltas), service crashes, premature shutdowns, etc.
The fourth stage involves creating a periodic dump file (or snapshot) of the published data. This allows new consumers to start from the latest snapshot instead of replaying all the small delta files from the beginning.
Note: All these steps can be turned off, but this is not the default setting.
To ensure that the app can share data, you need to set up the producers. First, ensure a significant dataset has been ingested by the consumers (see the query sketch below).
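A quick way to check is to count the triples in the relevant graph through the database's SPARQL endpoint. A sketch, where the graph URI is an assumption that depends on your setup:

```sparql
SELECT (COUNT(*) AS ?count)
WHERE {
  GRAPH <http://mu.semte.ch/graphs/public> {  # assumed graph URI
    ?s ?p ?o .
  }
}
```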
The delta-producer-background-jobs-initiator is responsible for initiating the initial sync job. To trigger this job, follow the steps below.
- ONLY in case you are flushing and restarting from scratch (i.e. `rm -rf data`), ensure in `./config/delta-producer/background-jobs-initiator/config.json`:

```
[
  {
    "name": "mandatees-decisions",
    # (...) other config
    "startInitialSync": false, # changed from 'true' to 'false'
    # (...) other config
  }
]
```

- Also ensure some data has been ingested before starting the initial sync.
- Make sure the app is up and running, and the migrations have run.
- In the `./config/delta-producer/background-jobs-initiator/config.json` file, make sure the following configuration is changed:

```
[
  {
    "name": "mandatees-decisions",
    # (...) other config
    "startInitialSync": true, # changed from 'false' to 'true'
    # (...) other config
  }
]
```
- Restart the service:

```
drc restart delta-producer-background-jobs-initiator
```
- You can follow the status of the job through the dashboard frontend.
- If the job succeeds, 'normal operation mode' takes over automatically.
Please note that the system expects this initial sync job to run only once. If it fails (or stays stuck on busy for an excessive amount of time), delete the job through the dashboard. Then, assuming the configuration is unchanged, simply run:

```
drc restart delta-producer-background-jobs-initiator
```

There are also other ways to trigger this job; please refer to the docs of delta-producer-background-jobs-initiator.
If something goes wrong, the first logs to check are those of the delta-producer-publication-graph-maintainer.
Once the initial sync has succeeded, normal operation mode should work automatically. Note that while a healing job is running, normal operation mode is temporarily disabled until the healing finishes.
If the initial sync is successful, the default configuration ensures healing kicks in periodically. The service managing these jobs is again delta-producer-background-jobs-initiator; check `./config/delta-producer/background-jobs-initiator/config.json`.
Please note that the system expects only one healing job to run at a time. If you want to restart it, first delete the previous healing job through the dashboard. To restart the healing job manually, please refer to the documentation of delta-producer-background-jobs-initiator.
Basically, it comes down to running:

```
docker-compose exec delta-producer-background-jobs-initiator curl -X POST http://localhost/mandatees-decisions/healing-jobs
```

Again, if something goes wrong, the first logs to check are those of the delta-producer-publication-graph-maintainer.
Dumps are used by consumers as a snapshot to start from; this is faster than consuming all deltas. They are generated by the delta-producer-dump-file-publisher, which is started by a task created by the delta-producer-background-jobs-initiator. The necessary config is already present in this repository, but you need to enable it by updating the config. It's recommended to generate dumps at a regular interval.
To enable dumps, edit `./config/delta-producer/background-jobs-initiator/config.json`: enable creation by setting `disableDumpFileCreation` to `false` and set the cron pattern you need:

```
{
  # (...) other config
  "dumpFileCreationJobOperation": "http://redpencil.data.gift/id/jobs/concept/JobOperation/deltas/deltaDumpFileCreation/besluiten",
  "initialPublicationGraphSyncJobOperation": "http://redpencil.data.gift/id/jobs/concept/JobOperation/deltas/initialPublicationGraphSyncing/besluiten",
  "healingJobOperation": "http://redpencil.data.gift/id/jobs/concept/JobOperation/deltas/healingOperation/besluiten",
  "cronPatternDumpJob": "0 10 0 * * 6",
  "cronPatternHealingJob": "0 0 2 * * *",
  "startInitialSync": false,
  "errorCreatorUri": "http://lblod.data.gift/services/delta-producer-background-jobs-initiator-besluiten",
  "disableDumpFileCreation": false
}
```

These cron patterns use six fields (second, minute, hour, day of month, month, day of week): "0 10 0 * * 6" runs the dump job at 00:10:00 every Saturday, and "0 0 2 * * *" runs the healing job daily at 02:00:00.
Make sure to restart the background-jobs-initiator service after changing the config:

```
docker compose restart delta-producer-background-jobs-initiator
```

Dumps will be generated in `data/files/delta-producer-dumps`.
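To verify that dumps are being produced, you can list the dump directory (the exact file names will differ per run):

```
ls -lh data/files/delta-producer-dumps
```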
Please note that the system expects only one dump job to run at a time. You can delete the respective job in the jobs-dashboard. To trigger it manually on the spot, refer to the delta-producer-background-jobs-initiator documentation. Also, if something goes wrong, the first logs to check are those of the delta-producer-dump-file-publisher.
The publication triple store was introduced for several reasons:
- The data it contains is operational data (published info) which is not the source data of your app. This makes it easier to manage code-wise, as you don't need to account for this data in your original triplestore (e.g., migrations remain the same, and the code doesn't need to consider that graph).
- Performance-wise, it is usually better for the source database since it doesn't need to manage a potential duplicate of your data.
- In some apps, this triple store is also used as the store for the landing zone of the consumer, serving as a safe space for messy (incomplete) data, which you can filter out when storing in the source database.
This is a bit of a workaround pending future features of mu-auth.