Monitoring
The monitoring plugin has both gateway and agent modes.
- The gateway mode is responsible for creating/updating/deleting the MonitoringCluster custom resource, which the Opni Manager reconciles to deploy the various Cortex components in the upstream cluster.
- The agent mode is responsible for creating/updating/deleting a Prometheus custom resource, which the (currently external) Prometheus Operator reconciles to deploy a Prometheus agent-mode instance in the downstream cluster.
The gateway-side plugin (or "capability backend") and the agent-side plugin (or "capability node") form the logical metrics capability, which can be "installed" onto clusters by adding the capability name to a list in the cluster's metadata. Adding the capability to a cluster will change the desired state for that node, then re-sync the node (see below).
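As a rough illustration of the install step described above, the following Python sketch models a cluster record with a capability list and a backend that marks the node for re-sync when that list changes. The names Cluster, Backend, install, uninstall, and pending_sync are hypothetical and chosen only for this example; they are not Opni's actual types or APIs.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster record and its metadata."""
    id: str
    capabilities: List[str] = field(default_factory=list)


class Backend:
    """Hypothetical capability backend that tracks nodes needing a re-sync."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pending_sync: Set[str] = set()

    def install(self, cluster: Cluster) -> None:
        # Installing adds the capability name to the cluster's metadata.
        # Only an actual change alters the desired state, so only then
        # is the node marked for re-sync.
        if self.name not in cluster.capabilities:
            cluster.capabilities.append(self.name)
            self.pending_sync.add(cluster.id)

    def uninstall(self, cluster: Cluster) -> None:
        # Uninstalling removes the name and likewise marks the node.
        if self.name in cluster.capabilities:
            cluster.capabilities.remove(self.name)
            self.pending_sync.add(cluster.id)
```

In this sketch, installing a capability that is already present is a no-op, so repeated install requests do not trigger redundant re-syncs. Whether the real backend behaves this way is an assumption.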
The CortexAdmin API is a management API extension that allows programmatic access to a selection of useful Cortex APIs. These APIs are mostly used for obtaining status and diagnostics, debugging and troubleshooting, or performing uncommon administrative tasks.
The CortexOps API is a management API extension that is used by the CLI and admin dashboard to create, update, or delete the upstream Cortex cluster by reconciling a MonitoringCluster custom resource, which is controlled by the Opni Manager.
Note: the mechanism for deploying the gateway-side components is currently left as an implementation detail for each backend. However, a suitable generic set of APIs will likely be added to the management API in the future to control the upstream capability lifecycle.
Changing the upstream configuration using the CortexOps API or installing/uninstalling the metrics capability on a cluster using the management API changes the desired state of the associated nodes. When the metrics backend detects that a node (or nodes) no longer matches the current desired state, it will send a SyncNow message to the affected node(s). This triggers each node to send a Sync request to the backend, which then responds with a configuration describing the desired state for that node's resources. The node then reconciles its resources to match the desired state.
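The exchange described above can be sketched in a few lines of Python. This is a simplified in-process model, not Opni's implementation: the class and method names (MetricsBackend, Node, handle_sync, on_sync_now, set_desired) and the shape of the configuration dictionary are assumptions made for illustration, and the real messages travel over a stream rather than direct method calls.

```python
class MetricsBackend:
    """Hypothetical backend holding the desired state for each node."""

    def __init__(self):
        self.desired = {}    # node id -> desired configuration
        self.last_sent = {}  # node id -> configuration last returned to the node
        self.nodes = {}      # node id -> connected node

    def register(self, node):
        self.nodes[node.id] = node

    def handle_sync(self, node_id):
        # Respond to a Sync request with the node's desired state.
        config = self.desired.get(node_id, {"enabled": False})
        self.last_sent[node_id] = config
        return config

    def set_desired(self, node_id, config):
        # Changing the desired state may leave a node out of date.
        self.desired[node_id] = config
        node = self.nodes.get(node_id)
        if node is not None and self.last_sent.get(node_id) != config:
            node.on_sync_now()  # SyncNow: ask the node to sync


class Node:
    """Hypothetical agent-side node."""

    def __init__(self, node_id, backend):
        self.id = node_id
        self.backend = backend
        self.applied = None

    def on_sync_now(self):
        # SyncNow carries no configuration; it only prompts a Sync request.
        self.sync()

    def sync(self):
        desired = self.backend.handle_sync(self.id)
        if desired != self.applied:
            self.applied = dict(desired)  # reconcile local resources
```

Note that in this model the backend never pushes configuration directly. It only nudges the node, and the node always pulls the desired state itself, so the same Sync path serves every trigger.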
On startup, the node will send a Sync request to the backend to obtain the initial desired state as soon as it connects to the stream.
The backend may also send periodic sync requests at a low frequency (on the order of minutes) to help recover from issues such as accidental modification of managed resources.
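To show why a periodic sync helps with accidental modification, here is a minimal Python sketch of a reconcile step that restores drifted resources. The reconcile function and the resource names and fields in the usage example are hypothetical and do not reflect Opni's actual resource model.

```python
def reconcile(actual, desired):
    """Bring `actual` in line with `desired` and return the changes made.

    Both arguments are dictionaries mapping a resource name to its spec.
    """
    changes = []
    for name, spec in desired.items():
        if actual.get(name) != spec:
            actual[name] = spec  # create or restore the resource
            changes.append(("set", name))
    for name in list(actual):
        if name not in desired:
            del actual[name]  # remove resources that are no longer desired
            changes.append(("delete", name))
    return changes
```

For example, if a managed resource is edited by hand, the next sync restores it:

```python
desired = {"prometheus": {"mode": "agent"}}
actual = {"prometheus": {"mode": "agent"}}

actual["prometheus"] = {"mode": "server"}  # accidental modification
changes = reconcile(actual, desired)       # run on the next periodic sync
```

After the call, actual equals desired again and changes records that the prometheus entry was reset. When nothing has drifted, the function makes no changes, which keeps a low-frequency periodic sync inexpensive.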