Startup health checks #619

cam-schultz · 2025-01-08T18:00:05Z

Context and scope
Currently, the relayer and signature aggregator return positive health checks as soon as the health API is initialized. Instead, we should wait to signal healthy until the respective application is ready to relay messages/aggregate signatures.

The sole requirement for this is that the relayer/aggregator is connected to sufficient stake for any L1 that it will be aggregating signatures for.

In the relayer case, all such L1s are provided in the config and are known at startup. Further, the relayer tracks the health of each source chain separately. We can mark a source chain as healthy as soon as we are connected to the quorum of stake necessary to construct a valid signature. (The required quorum is determined by the receiving chain, but using the default value of 67% for health check purposes seems reasonable.)
For the aggregator, an initial list of L1s may be provided via config, but it can also aggregate signatures from an L1 not provided via config. The health check criterion should be that we're connected to sufficient stake on the primary network and on any initial L1s.
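
As a rough illustration of the criterion above, here is a minimal Go sketch of a per-chain stake check. The names `hasSufficientStake`, `stakeSnapshot`, and `chainsBelowQuorum` are hypothetical and not taken from the relayer codebase, and the sketch assumes connected and total stake weights are already available as `uint64` values. Only the 67% default comes from the description above.

```go
package main

import (
	"math/big"
	"sort"
)

// defaultQuorumPercent is the 67% default suggested above for health check purposes.
const defaultQuorumPercent = 67

// stakeSnapshot is a hypothetical view of one L1's validator set:
// the stake weight we are currently connected to, and the total stake weight.
type stakeSnapshot struct {
	ConnectedWeight uint64
	TotalWeight     uint64
}

// hasSufficientStake reports whether connectedWeight is at least
// quorumPercent of totalWeight. big.Int is used so that the
// multiplications cannot overflow uint64.
func hasSufficientStake(connectedWeight, totalWeight, quorumPercent uint64) bool {
	if totalWeight == 0 {
		// No known validators: treat the chain as not ready.
		return false
	}
	lhs := new(big.Int).Mul(new(big.Int).SetUint64(connectedWeight), big.NewInt(100))
	rhs := new(big.Int).Mul(new(big.Int).SetUint64(totalWeight), new(big.Int).SetUint64(quorumPercent))
	return lhs.Cmp(rhs) >= 0
}

// chainsBelowQuorum returns the sorted IDs of all chains that are not yet
// connected to sufficient stake. An empty result means every chain is ready.
func chainsBelowQuorum(snapshots map[string]stakeSnapshot, quorumPercent uint64) []string {
	var below []string
	for id, s := range snapshots {
		if !hasSufficientStake(s.ConnectedWeight, s.TotalWeight, quorumPercent) {
			below = append(below, id)
		}
	}
	sort.Strings(below)
	return below
}
```

Under this sketch, the relayer could evaluate each configured source chain independently, while the aggregator would evaluate the primary network plus any initial L1s and report healthy once `chainsBelowQuorum` returns an empty slice.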

Open questions
This change would expand the semantics of the health check to include readiness on startup. Before, a failed health check was only possible after initialization, and always indicated an unrecoverable error. Here, a failed health check may also indicate that the application is not ready, but may become so. Should this functionality be split into a separate readiness check?
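
To make the trade-off concrete, here is one possible shape for a split between a health check and a readiness check, sketched in Go. This is an assumption-laden illustration, not the existing health API: the `appStatus` type, the `/health` and `/ready` paths, and the choice of HTTP 503 for "not healthy/ready" are all hypothetical.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// appStatus is a hypothetical holder for the two signals discussed above.
type appStatus struct {
	ready  atomic.Bool // set once the application is connected to sufficient stake
	failed atomic.Bool // set when an unrecoverable error occurs
}

// healthCode keeps the pre-existing semantics: it only fails after an
// unrecoverable error, never because startup is still in progress.
func (s *appStatus) healthCode() int {
	if s.failed.Load() {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// readyCode additionally fails while the application is still starting up,
// a state that may resolve on its own.
func (s *appStatus) readyCode() int {
	if s.failed.Load() || !s.ready.Load() {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// newMux exposes the two checks on separate (hypothetical) paths.
func newMux(s *appStatus) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(s.healthCode())
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(s.readyCode())
	})
	return mux
}
```

With this split, a failed `/health` response would continue to mean "unrecoverable", while a failed `/ready` response during startup would mean "not yet connected to sufficient stake". Folding both into a single endpoint is simpler but loses that distinction.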

cam-schultz added the enhancement label Jan 8, 2025

iansuvak mentioned this issue Jan 27, 2025

Add network connectivity to health-checks #644

Open
