Merge pull request #462 from neicnordic/docs/restructure-service-desc…
…-pos

move service description & communication upper
blankdots authored Nov 28, 2023
2 parents 4364f9d + ec4cc44 commit 2dddc27
Showing 8 changed files with 197 additions and 195 deletions.
49 changes: 25 additions & 24 deletions sda/cmd/finalize/finalize.md
@@ -4,6 +4,31 @@ Handles the so-called _Accession ID (stable ID)_ to filename mappings from Centr
At the same time the service fulfills the replication requirement of having distinct backup copies.
For more information see [Federated EGA Node Operations v2](https://ega-archive.org/assets/files/EGA-Node-Operations-v2.pdf) document.

## Service Description

`Finalize` adds stable, shareable _Accession IDs_ to archive files.
If a backup location is configured it will perform backup of a file.
When running, `finalize` reads messages from the configured RabbitMQ queue (commonly: `accession`).
For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message):

1. The message is validated as valid JSON that matches the `ingestion-accession` schema. If the message can’t be validated it is discarded with an error message in the logs.
2. If the service is configured to perform backups, i.e. the `ARCHIVE_` and `BACKUP_` storage backends are set, archived files will be copied to the backup location.
    1. The file size on disk is requested from the storage system.
    2. The database file size is compared against the disk file size.
    3. A file reader is created for the archive storage file, and a file writer is created for the backup storage file.
3. The file data is copied from the archive file reader to the backup file writer.
4. If the type of the `DecryptedChecksums` field in the message is `sha256`, the value is stored.
5. A new RabbitMQ `complete` message is created and validated against the `ingestion-completion` schema. If the validation fails, an error message is written to the logs.
6. The file accession ID in the message is marked as *ready* in the database. On error the service sleeps for up to 5 minutes to allow for database recovery; after 5 minutes the message is Nacked and re-queued, and an error message is written to the logs.
7. The complete message is sent to RabbitMQ. On error, a message is written to the logs.
8. The original RabbitMQ message is Ack'ed.
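
The backup part of the steps above (size check, then a reader-to-writer copy) can be sketched roughly as follows. This is a minimal illustration in Python, not the service's actual code: the function name `backup_file` and its parameters are hypothetical, and treating a size mismatch as an error is an assumption.

```python
import io
import shutil


def backup_file(archive_reader, backup_writer, disk_size, db_size):
    """Copy an archived file to the backup writer after a size check."""
    # Assumption: a size mismatch is an error that stops the backup.
    if disk_size != db_size:
        raise ValueError(f"size mismatch: disk={disk_size}, database={db_size}")
    # Stream the data in chunks from the archive reader to the backup writer.
    shutil.copyfileobj(archive_reader, backup_writer)


# In-memory stand-ins for the archive and backup storage backends.
data = b"encrypted file body"
backup = io.BytesIO()
backup_file(io.BytesIO(data), backup, disk_size=len(data), db_size=len(data))
```

Streaming through a reader and a writer keeps memory use flat regardless of file size, which is presumably why the service works with reader/writer pairs rather than loading whole files.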

## Communication

- `Finalize` reads messages from one RabbitMQ queue (commonly: `accession`).
- `Finalize` publishes messages with one routing key (commonly: `completed`).
- `Finalize` assigns the accession ID to a file in the database using the `SetAccessionID` function.

## Configuration

There are a number of options that can be set for the `finalize` service.
@@ -98,27 +123,3 @@ and if `*_TYPE` is `POSIX`:

- `*_LOCATION`: POSIX path to use as storage root

## Service Description

`Finalize` adds stable, shareable _Accession IDs_ to archive files.
If a backup location is configured it will perform backup of a file.
When running, `finalize` reads messages from the configured RabbitMQ queue (commonly: `accession`).
For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message):

1. The message is validated as valid JSON that matches the `ingestion-accession` schema. If the message can’t be validated it is discarded with an error message in the logs.
2. If the service is configured to perform backups, i.e. the `ARCHIVE_` and `BACKUP_` storage backends are set, archived files will be copied to the backup location.
    1. The file size on disk is requested from the storage system.
    2. The database file size is compared against the disk file size.
    3. A file reader is created for the archive storage file, and a file writer is created for the backup storage file.
3. The file data is copied from the archive file reader to the backup file writer.
4. If the type of the `DecryptedChecksums` field in the message is `sha256`, the value is stored.
5. A new RabbitMQ `complete` message is created and validated against the `ingestion-completion` schema. If the validation fails, an error message is written to the logs.
6. The file accession ID in the message is marked as *ready* in the database. On error the service sleeps for up to 5 minutes to allow for database recovery; after 5 minutes the message is Nacked and re-queued, and an error message is written to the logs.
7. The complete message is sent to RabbitMQ. On error, a message is written to the logs.
8. The original RabbitMQ message is Ack'ed.

## Communication

- `Finalize` reads messages from one RabbitMQ queue (commonly: `accession`).
- `Finalize` publishes messages with one routing key (commonly: `completed`).
- `Finalize` assigns the accession ID to a file in the database using the `SetAccessionID` function.
78 changes: 39 additions & 39 deletions sda/cmd/ingest/ingest.md
@@ -3,6 +3,45 @@
Splits the Crypt4GH header and moves it to database. The remainder of the file
is sent to the storage backend (archive). No cryptographic tasks are done.

## Service Description

The `ingest` service copies files from the file inbox to the archive, and registers them in the database.

When running, `ingest` reads messages from the configured RabbitMQ queue (commonly: `ingest`).
For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message):

1. The message is validated as valid JSON that matches the `ingestion-trigger` schema.
If the message can’t be validated it is discarded with an error message in the logs.
2. If the message is of type `cancel`, the file will be marked as `disabled` and the next message in the queue will be read.
3. A file reader is created for the filepath in the message.
If the file reader can’t be created an error is written to the logs, the message is Nacked and forwarded to the error queue.
4. The file size is read from the file reader.
On error, the error is written to the logs, the message is Nacked and forwarded to the error queue.
5. A uuid is generated, and a file writer is created in the archive using the uuid as filename.
On error the error is written to the logs and the message is Nacked and then re-queued.
6. The filename is inserted into the database along with the user id of the uploading user. If the file already exists in the database, its status is updated.
Errors are written to the error log.
Errors writing the filename to the database do not halt ingestion progress.
7. The header is read from the file, and decrypted to ensure that it’s encrypted with the correct key.
If the decryption fails, an error is written to the error log, the message is Nacked, and the message is forwarded to the error queue.
8. The header is written to the database.
Errors are written to the error log.
9. The header is stripped from the file data, and the remaining file data is written to the archive.
Errors are written to the error log.
10. The size of the archived file is read.
Errors are written to the error log.
11. The database is updated with the file size, archive path, and archive checksum, and the file is set as *archived*.
Errors are written to the error log.
This error does not halt ingestion.
12. A message is sent back to the original RabbitMQ broker containing the upload user, upload file path, database file id, archive file path and checksum of the archived file.
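
The header/body split in steps 7 to 9 can be illustrated with a small sketch. It assumes the Crypt4GH container layout (an 8-byte magic `crypt4gh`, a 4-byte little-endian version, a 4-byte little-endian packet count, then packets that each start with a 4-byte length counting itself). The function name `split_header` is hypothetical, and no decryption is performed here, unlike in the service.

```python
import io
import struct

MAGIC = b"crypt4gh"


def split_header(stream):
    """Return the raw header bytes, leaving the stream positioned at the body."""
    preamble = stream.read(16)
    if len(preamble) != 16 or preamble[:8] != MAGIC:
        raise ValueError("not a Crypt4GH file")
    version, packet_count = struct.unpack("<II", preamble[8:])
    if version != 1:
        raise ValueError(f"unsupported version: {version}")
    header = bytearray(preamble)
    for _ in range(packet_count):
        raw_len = stream.read(4)
        if len(raw_len) != 4:
            raise ValueError("truncated header")
        (packet_len,) = struct.unpack("<I", raw_len)
        if packet_len < 4:
            raise ValueError("invalid packet length")
        # The packet length includes its own 4-byte length field.
        payload = stream.read(packet_len - 4)
        if len(payload) != packet_len - 4:
            raise ValueError("truncated header packet")
        header += raw_len + payload
    return bytes(header)


# A fake file with one 10-byte packet followed by the (still encrypted) body.
fake = MAGIC + struct.pack("<II", 1, 1) + struct.pack("<I", 10) + b"packet" + b"BODY"
stream = io.BytesIO(fake)
header = split_header(stream)  # would go to the database
body = stream.read()           # would go to the archive
```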

## Communication

- `Ingest` reads messages from one RabbitMQ queue (commonly: `ingest`).
- `Ingest` publishes messages to one RabbitMQ queue (commonly: `archived`).
- `Ingest` inserts file information in the database using three database functions, `InsertFile`, `StoreHeader`, and `SetArchived`.
- `Ingest` reads file data from inbox storage and writes data to archive storage.

## Configuration

There are a number of options that can be set for the `ingest` service.
@@ -99,42 +138,3 @@ and if `*_TYPE` is `POSIX`:
- `error`
- `fatal`
- `panic`

## Service Description

The `ingest` service copies files from the file inbox to the archive, and registers them in the database.

When running, `ingest` reads messages from the configured RabbitMQ queue (commonly: `ingest`).
For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message):

1. The message is validated as valid JSON that matches the `ingestion-trigger` schema.
If the message can’t be validated it is discarded with an error message in the logs.
2. If the message is of type `cancel`, the file will be marked as `disabled` and the next message in the queue will be read.
3. A file reader is created for the filepath in the message.
If the file reader can’t be created an error is written to the logs, the message is Nacked and forwarded to the error queue.
4. The file size is read from the file reader.
On error, the error is written to the logs, the message is Nacked and forwarded to the error queue.
5. A uuid is generated, and a file writer is created in the archive using the uuid as filename.
On error the error is written to the logs and the message is Nacked and then re-queued.
6. The filename is inserted into the database along with the user id of the uploading user. If the file already exists in the database, its status is updated.
Errors are written to the error log.
Errors writing the filename to the database do not halt ingestion progress.
7. The header is read from the file, and decrypted to ensure that it’s encrypted with the correct key.
If the decryption fails, an error is written to the error log, the message is Nacked, and the message is forwarded to the error queue.
8. The header is written to the database.
Errors are written to the error log.
9. The header is stripped from the file data, and the remaining file data is written to the archive.
Errors are written to the error log.
10. The size of the archived file is read.
Errors are written to the error log.
11. The database is updated with the file size, archive path, and archive checksum, and the file is set as *archived*.
Errors are written to the error log.
This error does not halt ingestion.
12. A message is sent back to the original RabbitMQ broker containing the upload user, upload file path, database file id, archive file path and checksum of the archived file.

## Communication

- `Ingest` reads messages from one RabbitMQ queue (commonly: `ingest`).
- `Ingest` publishes messages to one RabbitMQ queue (commonly: `archived`).
- `Ingest` inserts file information in the database using three database functions, `InsertFile`, `StoreHeader`, and `SetArchived`.
- `Ingest` reads file data from inbox storage and writes data to archive storage.
32 changes: 16 additions & 16 deletions sda/cmd/intercept/intercept.md
@@ -2,6 +2,22 @@

The `intercept` service relays messages between Central EGA and Federated EGA nodes.

## Service Description

When running, `intercept` reads messages from the configured RabbitMQ queue (commonly: `from_cega`).
For each message, these steps are taken:

1. The message type is read from the message `type` field.
    1. If the message `type` is not known, an error is logged and the message is Ack'ed.
2. The correct queue for the message is decided based on message type.
3. The message is sent to the queue. This has no error handling as the resend-mechanism hasn't been finished.
4. The message is Ack'ed.
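
The relay logic above amounts to a lookup from message type to queue. A minimal sketch follows, assuming hypothetical type names in `ROUTES` (only the three queue names come from this document) and hypothetical `publish`, `ack`, and `log_error` callbacks standing in for the broker client and logger.

```python
# Hypothetical mapping from message `type` to target queue; the type names
# are placeholders, only the queue names are taken from the documentation.
ROUTES = {
    "accession": "accession",
    "ingest": "ingest",
    "mapping": "mappings",
}


def route(message):
    """Return the target queue for a message, or None for an unknown type."""
    return ROUTES.get(message.get("type"))


def handle(message, publish, ack, log_error):
    """Relay one message; it is Ack'ed whether or not the type is known."""
    queue = route(message)
    if queue is None:
        log_error(f"unknown message type: {message.get('type')!r}")
    else:
        publish(queue, message)
    ack(message)
```

Keeping the routing table as plain data makes it easy to see at a glance which queues `intercept` can publish to.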

## Communication

- `Intercept` reads messages from one queue (commonly: `from_cega`).
- `Intercept` publishes messages to three queues, `accession`, `ingest`, and `mappings`.

## Configuration

There are a number of options that can be set for the `intercept` service.
@@ -43,19 +59,3 @@ These settings control how `intercept` connects to the RabbitMQ message broker.
- `error`
- `fatal`
- `panic`

## Service Description

When running, `intercept` reads messages from the configured RabbitMQ queue (commonly: `from_cega`).
For each message, these steps are taken:

1. The message type is read from the message `type` field.
    1. If the message `type` is not known, an error is logged and the message is Ack'ed.
2. The correct queue for the message is decided based on message type.
3. The message is sent to the queue. This has no error handling as the resend-mechanism hasn't been finished.
4. The message is Ack'ed.

## Communication

- `Intercept` reads messages from one queue (commonly: `from_cega`).
- `Intercept` publishes messages to three queues, `accession`, `ingest`, and `mappings`.
46 changes: 23 additions & 23 deletions sda/cmd/mapper/mapper.md
@@ -3,6 +3,29 @@
The mapper service registers mapping of accessionIDs (stable ids for files) to datasetIDs.
Once the file accession ID has been mapped to a dataset ID, the file is removed from the inbox.

## Service Description

The `mapper` service maps file `accessionIDs` to `datasetIDs`.

When running, `mapper` reads messages from the configured RabbitMQ queue (commonly: `mappings`).
For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message):

1. The message is validated as valid JSON that matches the `dataset-mapping` schema.
If the message can’t be validated it is discarded and an error message is logged.
2. AccessionIDs from the message are mapped to a datasetID (also in the message) in the database.
On error the service sleeps for up to 5 minutes to allow for database recovery; after 5 minutes the message is Nacked and re-queued, and an error message is written to the logs.
3. The uploaded files related to each AccessionID are removed from the inbox.
If this fails an error will be written to the logs.
4. The RabbitMQ message is Ack'ed.
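
The "sleep for up to 5 minutes, then Nack and re-queue" behaviour in step 2 can be sketched as a bounded retry loop. The doubling delay, the `ConnectionError` stand-in for a database failure, and the name `with_db_retry` are all assumptions made for illustration; the documentation only states the 5 minute upper bound.

```python
import time


def with_db_retry(operation, max_wait=300.0, first_delay=1.0, sleep=time.sleep):
    """Run operation, sleeping between failed attempts for at most max_wait seconds in total."""
    waited = 0.0
    delay = first_delay
    while True:
        try:
            return operation()
        except ConnectionError:
            if waited >= max_wait:
                # Give up; the caller would Nack and re-queue the message.
                raise
            pause = min(delay, max_wait - waited)
            sleep(pause)
            waited += pause
            delay *= 2  # assumed backoff schedule
```

A caller would wrap the database call, for example `with_db_retry(lambda: map_files(accession_ids, dataset_id))` with a hypothetical `map_files` helper, and on the final exception Nack the message so that the broker re-queues it.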

## Communication

- `Mapper` reads messages from one RabbitMQ queue (commonly: `mappings`).
- `Mapper` maps files to datasets in the database using the `MapFilesToDataset` function.
- `Mapper` retrieves the inbox filepath from the database for each file using the `GetInboxPath` function.
- `Mapper` sets the status of a dataset in the database using the `UpdateDatasetEvent` function.
- `Mapper` removes data from inbox storage.

## Configuration

There are a number of options that can be set for the `mapper` service.
@@ -93,26 +116,3 @@ and if `*_TYPE` is `POSIX`:
- `error`
- `fatal`
- `panic`

## Service Description

The `mapper` service maps file `accessionIDs` to `datasetIDs`.

When running, `mapper` reads messages from the configured RabbitMQ queue (commonly: `mappings`).
For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message):

1. The message is validated as valid JSON that matches the `dataset-mapping` schema.
If the message can’t be validated it is discarded and an error message is logged.
2. AccessionIDs from the message are mapped to a datasetID (also in the message) in the database.
On error the service sleeps for up to 5 minutes to allow for database recovery; after 5 minutes the message is Nacked and re-queued, and an error message is written to the logs.
3. The uploaded files related to each AccessionID are removed from the inbox.
If this fails an error will be written to the logs.
4. The RabbitMQ message is Ack'ed.

## Communication

- `Mapper` reads messages from one RabbitMQ queue (commonly: `mappings`).
- `Mapper` maps files to datasets in the database using the `MapFilesToDataset` function.
- `Mapper` retrieves the inbox filepath from the database for each file using the `GetInboxPath` function.
- `Mapper` sets the status of a dataset in the database using the `UpdateDatasetEvent` function.
- `Mapper` removes data from inbox storage.
30 changes: 15 additions & 15 deletions sda/cmd/s3inbox/s3inbox.md
@@ -2,6 +2,21 @@

The `s3inbox` proxies uploads to an S3 compatible storage backend. Users are authenticated with a JWT instead of the `access_key` and `secret_key` normally used for `S3`.

## Service Description

The `s3inbox` proxies uploads to an S3 compatible storage backend.

1. Parses and validates the JWT token (`access_token` in the S3 config file) against the public keys, either locally provisioned or from OIDC JWK endpoints.
2. If the token is valid, the file is passed on to the S3 backend.
3. The file is registered in the database.
4. The `inbox-upload` message is sent to the `inbox` queue, with the `sub` field from the token as the `user` in the message. If this fails an error will be written to the logs.
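
To show where the `user` value in step 4 comes from, the sketch below reads the `sub` claim from a JWT payload and builds a message from it. It deliberately skips signature verification, which the real service performs against the configured public keys, so it must not be used for authentication. The function names and the `operation` and `filepath` fields are hypothetical; only the `sub` to `user` mapping is taken from the text above.

```python
import base64
import json


def unverified_claims(token):
    """Decode the JWT payload WITHOUT verifying the signature (illustration only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    # base64url segments in a JWT omit padding; restore it before decoding.
    payload = parts[1] + "=" * (-len(parts[1]) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def inbox_upload_message(claims, filepath):
    """Build a message with the token's `sub` as `user` (other fields assumed)."""
    return {"operation": "upload", "user": claims["sub"], "filepath": filepath}
```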

## Communication

- `s3inbox` proxies uploads to inbox storage.
- `s3inbox` inserts file information in the database using the `RegisterFile` database function and marks it as uploaded in the `file_event_log`.
- `s3inbox` writes messages to one RabbitMQ queue (commonly: `inbox`).

## Configuration

There are a number of options that can be set for the `s3inbox` service.
@@ -91,18 +106,3 @@ These settings control how verify connects to the RabbitMQ message broker.
- `error`
- `fatal`
- `panic`

## Service Description

The `s3inbox` proxies uploads to an S3 compatible storage backend.

1. Parses and validates the JWT token (`access_token` in the S3 config file) against the public keys, either locally provisioned or from OIDC JWK endpoints.
2. If the token is valid, the file is passed on to the S3 backend.
3. The file is registered in the database.
4. The `inbox-upload` message is sent to the `inbox` queue, with the `sub` field from the token as the `user` in the message. If this fails an error will be written to the logs.

## Communication

- `s3inbox` proxies uploads to inbox storage.
- `s3inbox` inserts file information in the database using the `RegisterFile` database function and marks it as uploaded in the `file_event_log`.
- `s3inbox` writes messages to one RabbitMQ queue (commonly: `inbox`).