diff --git a/docs/03-Metrics/01-metrics-intro.md b/docs/03-Metrics/01-metrics-intro.md
new file mode 100644
index 0000000000..70075df8b2
--- /dev/null
+++ b/docs/03-Metrics/01-metrics-intro.md
@@ -0,0 +1,12 @@
+# Metrics
+
+The Prometheus metrics available depend on which Retina control plane is deployed.
+
+## Control Planes
+
+There are two control planes used in the Retina project: Hubble and the legacy control plane. Both control planes expose metrics and traces generated by the eBPF data plane, which has a single implementation. Only one control plane should be deployed at a given time. Helm charts for the deployment are found under `deploy/hubble/manifests/controller/helm/retina` and `deploy/legacy/manifests/controller/helm/retina`.
+
+1. [Hubble metrics](./hubble_metrics.md)
+2. [Legacy metrics](./modes/modes.md)
+
+> Note: Hubble offers additional features and metrics that the legacy control plane does not support. The plan is to deprecate the legacy control plane in favor of Hubble. For further documentation on Hubble, see the [Cilium/Hubble repository](https://github.com/cilium/hubble/?tab=readme-ov-file#features) and the official [Hubble metrics documentation](https://docs.cilium.io/en/stable/observability/metrics/#hubble-metrics).
diff --git a/docs/03-Metrics/02-hubble_metrics.md b/docs/03-Metrics/02-hubble_metrics.md
new file mode 100644
index 0000000000..502ab5ae90
--- /dev/null
+++ b/docs/03-Metrics/02-hubble_metrics.md
@@ -0,0 +1,53 @@
+# Hubble Metrics
+
+When Retina is deployed with the Hubble control plane, it provides both node-level and pod-level metrics. Metrics are stored in Prometheus format and can be viewed in Grafana.
+
+## Metrics
+
+* Node-Level Metrics: These metrics provide insights into traffic volume, dropped packets, number of connections, etc. by node.
+* Hubble Metrics (DNS and Pod-Level Metrics): These metrics include source and destination pod information, making it possible to pinpoint network-related issues at a granular level. Metrics cover traffic volume, dropped packets, TCP resets, L4/L7 packet flows, etc. DNS metrics include DNS errors and DNS requests missing responses.
+
+### Node-Level Metrics
+
+The following metrics are aggregated per node. All metrics include labels:
+
+* `cluster`
+* `instance` (Node name)
+
+Retina provides metrics for both Linux and Windows operating systems.
+The table below outlines the different metrics generated.
+
+| Metric Name | Description | Extra Labels | Linux | Windows |
+|------------------------------------------------|-------------|--------------|-------|---------|
+| **networkobservability_forward_count** | Total forwarded packet count | `direction` | ✅ | ✅ |
+| **networkobservability_forward_bytes** | Total forwarded byte count | `direction` | ✅ | ✅ |
+| **networkobservability_drop_count** | Total dropped packet count | `direction`, `reason` | ✅ | ✅ |
+| **networkobservability_drop_bytes** | Total dropped byte count | `direction`, `reason` | ✅ | ✅ |
+| **networkobservability_tcp_state** | Count of currently active TCP sockets by TCP state | `state` | ✅ | ✅ |
+| **networkobservability_tcp_connection_remote** | Count of currently active TCP sockets by remote IP/port | `address` (IP), `port` | ✅ | ❌ |
+| **networkobservability_tcp_connection_stats** | TCP connection statistics (e.g. Delayed ACKs, TCPKeepAlive, TCPSackFailures) | `statistic` | ✅ | ✅ |
+| **networkobservability_tcp_flag_counters** | TCP packet count by flag | `flag` | ❌ | ✅ |
+| **networkobservability_ip_connection_stats** | IP connection statistics | `statistic` | ✅ | ❌ |
+| **networkobservability_udp_connection_stats** | UDP connection statistics | `statistic` | ✅ | ❌ |
+| **networkobservability_udp_active_sockets** | Count of currently active UDP sockets | | ✅ | ❌ |
+| **networkobservability_interface_stats** | Interface statistics | InterfaceName, `statistic` | ✅ | ✅ |
+
+### Pod-Level Metrics (Hubble Metrics)
+
+The following metrics are aggregated per pod (node information is preserved). All metrics include labels:
+
+* `cluster`
+* `instance` (Node name)
+* `source`
+* `destination`
+
+For *outgoing traffic*, there will be a `source` label with the source pod namespace/name.
+For *incoming traffic*, there will be a `destination` label with the destination pod namespace/name.
+
+| Metric Name | Description | Extra Labels | Linux | Windows |
+|----------------------------------|------------------------------|-----------------------|-------|---------|
+| **hubble_dns_queries_total** | Total DNS requests by query | `source` or `destination`, `query`, `qtypes` (query type) | ✅ | ❌ |
+| **hubble_dns_responses_total** | Total DNS responses by query/response | `source` or `destination`, `query`, `qtypes` (query type), `rcode` (return code), `ips_returned` (number of IPs) | ✅ | ❌ |
+| **hubble_drop_total** | Total dropped packet count | `source` or `destination`, `protocol`, `reason` | ✅ | ❌ |
+| **hubble_tcp_flags_total** | Total TCP packet count by flag | `source` or `destination`, `flag` | ✅ | ❌ |
+| **hubble_flows_processed_total** | Total network flows processed (L4/L7 traffic) | `source` or `destination`, `protocol`, `verdict`, `type`, `subtype` | ✅ | ❌ |
diff --git a/docs/03-Metrics/modes/modes.md b/docs/03-Metrics/modes/modes.md
index 11eb3ac720..a19dba64a8 100644
--- a/docs/03-Metrics/modes/modes.md
+++ b/docs/03-Metrics/modes/modes.md
@@ -1,7 +1,7 @@
 ---
-sidebar_position: 1
+sidebar_position: 2
 ---
 
-# Metric Modes
+# Legacy Metric Modes
 
 Retina provides **three modes** with their own metrics and scale capabilities. Each mode is **fully customizable** (only create the metrics/labels you need).