chore(docs): Add doc for Hubble control plane and metrics (microsoft#1194)

# Description

Initial PR to add documentation for Hubble control plane and metrics.

## Related Issue

microsoft#1055 
microsoft#1093

## Checklist

- [x] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

![image](https://github.com/user-attachments/assets/eafb42b9-795d-42d6-86cf-a9271eaaea4c)


![image](https://github.com/user-attachments/assets/f491ce0e-4456-4190-bb54-7dfa64f7b626)


![image](https://github.com/user-attachments/assets/eab86779-ed75-4d94-877d-e6f48213e865)


![image](https://github.com/user-attachments/assets/d82cc71d-3d74-427d-8e91-e7c1c4b495d9)




---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more
information on how to contribute to this project.
SRodi authored and kamilprz committed Jan 13, 2025
1 parent 05e18bd commit 4deb1bf
Showing 3 changed files with 67 additions and 2 deletions.
12 changes: 12 additions & 0 deletions docs/03-Metrics/01-metrics-intro.md
@@ -0,0 +1,12 @@
# Metrics

The Prometheus metrics available depend on which Retina control plane is deployed.

## Control Planes

The Retina project has two control planes: Hubble and the legacy control plane. Both expose metrics and traces generated by the same eBPF data plane, which has a single implementation. Deploy only one control plane at a time. The Helm charts are under `deploy/hubble/manifests/controller/helm/retina` and `deploy/legacy/manifests/controller/helm/retina`, respectively.

1. [Hubble metrics](./hubble_metrics.md)
2. [Legacy metrics](./modes/modes.md)

> Note: Hubble offers additional features and metrics that the legacy control plane does not support. The plan is to deprecate the legacy control plane in favor of Hubble. For further documentation on Hubble, see the [Cilium/Hubble repository](https://github.com/cilium/hubble/?tab=readme-ov-file#features) and the official [Hubble metrics documentation](https://docs.cilium.io/en/stable/observability/metrics/#hubble-metrics).
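
As a rough illustration of the "one control plane at a time" rule, the sketch below maps a control plane name to the chart directory listed above and builds a `helm upgrade --install` argument list. The release name `retina` and the namespace `kube-system` are assumptions made for the example, and a real install would likely need additional chart values that are not covered here.

```python
# Chart directories as listed in the text above.
CHART_DIRS = {
    "hubble": "deploy/hubble/manifests/controller/helm/retina",
    "legacy": "deploy/legacy/manifests/controller/helm/retina",
}


def helm_install_args(control_plane, release="retina", namespace="kube-system"):
    """Build a `helm upgrade --install` argument list for exactly one control plane.

    `release` and `namespace` are illustrative defaults (assumptions),
    not values taken from the Retina documentation.
    """
    if control_plane not in CHART_DIRS:
        raise ValueError(
            f"unknown control plane {control_plane!r}; expected one of {sorted(CHART_DIRS)}"
        )
    return [
        "helm", "upgrade", "--install", release, CHART_DIRS[control_plane],
        "--namespace", namespace,
    ]


# Print the command rather than running it, so the sketch has no side effects.
print(" ".join(helm_install_args("hubble")))
```

Taking a single `control_plane` argument, rather than two independent flags, makes it impossible to request both charts in one call.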
53 changes: 53 additions & 0 deletions docs/03-Metrics/02-hubble_metrics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Hubble Metrics

When Retina is deployed with the Hubble control plane, it provides both node-level and pod-level metrics. Metrics are exposed in Prometheus format and can be viewed in Grafana.

## Metrics

* Node-Level Metrics: These metrics provide per-node insights into traffic volume, dropped packets, number of connections, etc.
* Hubble Metrics (DNS and Pod-Level Metrics): These metrics include source and destination pod information, which makes it possible to pinpoint network-related issues at a granular level. They cover traffic volume, dropped packets, TCP resets, L4/L7 packet flows, etc. DNS metrics include DNS errors and DNS requests that are missing responses.
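
One way to read "DNS requests missing responses" is as the gap between the query and response counters (`hubble_dns_queries_total` and `hubble_dns_responses_total`, both listed in the pod-level table on this page). The sketch below computes that gap per query name from two plain dictionaries. The subtraction approach and the sample numbers are assumptions for illustration, not a description of how Retina or Hubble derive the value.

```python
def missing_responses(queries, responses):
    """Return {query name: number of queries without a recorded response}.

    Assumption: "missing" is approximated as queries minus responses.
    Query names with no gap are omitted from the result.
    """
    gaps = {}
    for name, asked in queries.items():
        gap = asked - responses.get(name, 0)
        if gap > 0:
            gaps[name] = gap
    return gaps


# Hypothetical counter snapshots keyed by the `query` label.
queries = {"example.com.": 10, "internal.svc.": 4}
responses = {"example.com.": 10, "internal.svc.": 1}
print(missing_responses(queries, responses))
```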

### Node-Level Metrics

The following metrics are aggregated per node. All metrics include labels:

* `cluster`
* `instance` (Node name)

Retina provides metrics for both Linux and Windows operating systems.
The table below outlines the different metrics generated.

| Metric Name | Description | Extra Labels | Linux | Windows |
|------------------------------------------------|-------------|--------------|-------|---------|
| **networkobservability_forward_count** | Total forwarded packet count | `direction` |||
| **networkobservability_forward_bytes** | Total forwarded byte count | `direction` |||
| **networkobservability_drop_count** | Total dropped packet count | `direction`, `reason` |||
| **networkobservability_drop_bytes** | Total dropped byte count | `direction`, `reason` |||
| **networkobservability_tcp_state** | Count of currently active TCP sockets, by TCP state. | `state` |||
| **networkobservability_tcp_connection_remote** | Count of currently active TCP sockets, by remote IP/port. | `address` (IP), `port` |||
| **networkobservability_tcp_connection_stats** | TCP connection statistics (e.g. delayed ACKs, TCPKeepAlive, TCPSackFailures). | `statistic` |||
| **networkobservability_tcp_flag_counters** | TCP packet count by flag. | `flag` |||
| **networkobservability_ip_connection_stats** | IP connection statistics. | `statistic` |||
| **networkobservability_udp_connection_stats** | UDP connection statistics. | `statistic` |||
| **networkobservability_udp_active_sockets** | UDP currently active socket count | |||
| **networkobservability_interface_stats** | Interface statistics. | InterfaceName, `statistic` |||
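
To show how the extra labels in the table can be used, here is a minimal sketch that parses a few lines in Prometheus text exposition format and sums `networkobservability_drop_count` by its `reason` label. The sample text, including the label values `ingress`, `egress`, `example_reason_a`, and `example_reason_b`, is made up for illustration, and the parser handles only simple `name{labels} value` lines, not the full format.

```python
import re
from collections import defaultdict

# Made-up scrape output; label values and numbers are illustrative only.
SAMPLE = """\
# sample scrape (comment lines are skipped)
networkobservability_drop_count{direction="ingress",reason="example_reason_a"} 12
networkobservability_drop_count{direction="egress",reason="example_reason_a"} 3
networkobservability_drop_count{direction="ingress",reason="example_reason_b"} 5
networkobservability_forward_count{direction="ingress"} 1000
"""

# metric name, optional {label set}, then the sample value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# key="value" pairs inside the braces (escaped characters are tolerated)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric name, labels dict, float value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def sum_by_label(text, metric, label):
    """Sum the values of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


print(sum_by_label(SAMPLE, "networkobservability_drop_count", "reason"))
```

In practice the same grouping is usually done in PromQL or a Grafana panel; the Python version only makes the label arithmetic explicit.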

### Pod-Level Metrics (Hubble Metrics)

The following metrics are aggregated per pod (node information is preserved). All metrics include labels:

* `cluster`
* `instance` (Node name)
* `source`
* `destination`

Each series carries one of the two pod labels, depending on traffic direction.
For *outgoing traffic*, the `source` label holds the source pod's namespace/name.
For *incoming traffic*, the `destination` label holds the destination pod's namespace/name.

| Metric Name | Description | Extra Labels | Linux | Windows |
|----------------------------------|------------------------------|-----------------------|-------|---------|
| **hubble_dns_queries_total** | Total DNS requests by query | `source` or `destination`, `query`, `qtypes` (query type) |||
| **hubble_dns_responses_total** | Total DNS responses by query/response | `source` or `destination`, `query`, `qtypes` (query type), `rcode` (return code), `ips_returned` (number of IPs) |||
| **hubble_drop_total** | Total dropped packet count | `source` or `destination`, `protocol`, `reason` |||
| **hubble_tcp_flags_total** | Total TCP packet count by flag. | `source` or `destination`, `flag` |||
| **hubble_flows_processed_total** | Total network flows processed (L4/L7 traffic) | `source` or `destination`, `protocol`, `verdict`, `type`, `subtype` |||
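
The pod-level counters above are typically queried as rates grouped by label. The helper below is a small sketch that builds such a PromQL expression as a string. The function name `rate_sum_query`, the `5m` window, and the pod name `default/web` are hypothetical choices for the example; only the metric and label names come from the table.

```python
def rate_sum_query(metric, by, window="5m", matchers=None):
    """Build `sum by (...) (rate(metric{...}[window]))` as a PromQL string.

    `matchers` is an optional dict of exact-match label filters.
    This only formats text; it does not validate names or contact Prometheus.
    """
    selector = ""
    if matchers:
        pairs = ",".join(f'{key}="{value}"' for key, value in sorted(matchers.items()))
        selector = "{" + pairs + "}"
    return f"sum by ({', '.join(by)}) (rate({metric}{selector}[{window}]))"


# Drop rate per source pod and reason.
print(rate_sum_query("hubble_drop_total", ["source", "reason"]))

# Drop reasons for one (hypothetical) source pod.
print(rate_sum_query("hubble_drop_total", ["reason"], matchers={"source": "default/web"}))
```

Because each series carries either `source` or `destination`, grouping by `source` would be expected to lump incoming-traffic series under an empty `source` value; filtering on the label of interest avoids that.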
4 changes: 2 additions & 2 deletions docs/03-Metrics/modes/modes.md
@@ -1,7 +1,7 @@
---
-sidebar_position: 1
+sidebar_position: 2
---
-# Metric Modes
+# Legacy Metric Modes

Retina provides **three modes** with their own metrics and scale capabilities.
Each mode is **fully customizable** (only create the metrics/labels you need).