docs: Update Kubernetes terminology in operations pages (grafana#12235)
Co-authored-by: J Stickler <[email protected]>
Co-authored-by: Jack Baldry <[email protected]>
3 people authored and rhnasc committed Apr 12, 2024
1 parent 5950b9b commit 98c1340
Showing 4 changed files with 18 additions and 16 deletions.
2 changes: 1 addition & 1 deletion docs/sources/operations/loki-canary/_index.md
@@ -289,7 +289,7 @@ The `-labelname` and `-labelvalue` flags should also be provided, as these are
used by Loki Canary to filter the log stream to only process logs for the
current instance of the canary. Ensure that the values provided to the flags are
unique to each instance of Loki Canary. Grafana Labs' Tanka config
- accomplishes this by passing in the pod name as the label value.
+ accomplishes this by passing in the Pod name as the label value.
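
As a rough, hypothetical sketch of how those flags fit together (not the actual Tanka config): only `-labelname`, `-labelvalue`, and the idea of using the Pod name come from the text above; the label name `pod`, the `POD_NAME` variable, and the `-addr` value are assumptions.

```shell
# Hypothetical invocation: give each canary instance a unique label so it only
# reads back its own log stream. POD_NAME is assumed to be injected via the
# Kubernetes downward API (fieldRef: metadata.name).
loki-canary \
  -labelname=pod \
  -labelvalue="${POD_NAME}" \
  -addr=loki-gateway.loki.svc.cluster.local:80
```
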
If Loki Canary reports a high number of `unexpected_entries`, Loki Canary may
not be waiting long enough and the value for the `-wait` flag should be
14 changes: 8 additions & 6 deletions docs/sources/operations/storage/wal.md
@@ -38,7 +38,7 @@ The WAL also includes a backpressure mechanism to allow a large WAL to be replay

## Changes to deployment

- 1. Since ingesters need to have the same persistent volume across restarts/rollout, all the ingesters should be run on [statefulset](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) with fixed volumes.
+ 1. Since ingesters need to have the same persistent volume across restarts/rollout, all the ingesters should be run on [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) with fixed volumes.

2. The following flags need to be set:
* `--ingester.wal-enabled` to `true` which enables writing to WAL during ingestion.
@@ -48,7 +48,7 @@ The WAL also includes a backpressure mechanism to allow a large WAL to be replay

## Changes in lifecycle when WAL is enabled

- 1. Flushing of data to chunk store during rollouts or scale down is disabled. This is because during a rollout of statefulset there are no ingesters that are simultaneously leaving and joining, rather the same ingester is shut down and brought back again with updated config. Hence flushing is skipped and the data is recovered from the WAL.
+ 1. Flushing of data to chunk store during rollouts or scale down is disabled. This is because during a rollout of StatefulSet there are no ingesters that are simultaneously leaving and joining, rather the same ingester is shut down and brought back again with updated config. Hence flushing is skipped and the data is recovered from the WAL.

## Disk space requirements

@@ -62,7 +62,7 @@ You should not target 100% disk utilisation.

## Migrating from stateless deployments

- The ingester _deployment without WAL_ and _statefulset with WAL_ should be scaled down and up respectively in sync without transfer of data between them to ensure that any ingestion after migration is reliable immediately.
+ The ingester _Deployment without WAL_ and _StatefulSet with WAL_ should be scaled down and up respectively in sync without transfer of data between them to ensure that any ingestion after migration is reliable immediately.

Let's take an example of 4 ingesters. The migration would look something like this:

@@ -83,7 +83,7 @@ Scaling up is same as what you would do without WAL or statefulsets. Nothing to

When scaling down, we must ensure existing data on the leaving ingesters is flushed to storage instead of just the WAL. This is because we won't be replaying the WAL on an ingester that will no longer exist and we need to make sure the data is not orphaned.

- Consider you have 4 ingesters `ingester-0 ingester-1 ingester-2 ingester-3` and you want to scale down to 2 ingesters, the ingesters which will be shutdown according to statefulset rules are `ingester-3` and then `ingester-2`.
+ Consider you have 4 ingesters `ingester-0 ingester-1 ingester-2 ingester-3` and you want to scale down to 2 ingesters, the ingesters which will be shut down according to StatefulSet rules are `ingester-3` and then `ingester-2`.

Hence before actually scaling down in Kubernetes, port-forward those ingesters and hit the [`/ingester/shutdown?flush=true`]({{< relref "../../reference/api#flush-in-memory-chunks-and-shut-down" >}}) endpoint. This will flush the chunks and remove the ingester from the ring, after which it will register as unready and may be deleted.
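
As a rough sketch, the sequence for `ingester-3` and `ingester-2` could look like the following; the namespace and HTTP port (3100) are assumptions, while the endpoint path comes from the paragraph above.

```shell
# Assumed namespace "loki" and ingester HTTP port 3100; adjust for your deployment.
for i in 3 2; do
  kubectl -n loki port-forward "ingester-$i" 3100:3100 &
  pf_pid=$!
  sleep 2
  # Flush in-memory chunks and remove this ingester from the ring.
  curl -s -X POST "http://localhost:3100/ingester/shutdown?flush=true"
  kill "$pf_pid"
done
```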

@@ -95,13 +95,15 @@ After hitting the endpoint for `ingester-2 ingester-3`, scale down the ingesters

Statefulsets are significantly more cumbersome to work with, upgrade, and so on. Much of this stems from immutable fields on the specification. For example, if one wants to start using the WAL with single store Loki and wants separate volume mounts for the WAL and the boltdb-shipper, you may see immutability errors when attempting to update the Kubernetes statefulsets.

- In this case, try `kubectl -n <namespace> delete sts ingester --cascade=false`. This will leave the pods alive but delete the statefulset. Then you may recreate the (updated) statefulset and one-by-one start deleting the `ingester-0` through `ingester-n` pods _in that order_, allowing the statefulset to spin up new pods to replace them.
+ In this case, try `kubectl -n <namespace> delete sts ingester --cascade=false`.
+ This will leave the Pods alive but delete the StatefulSet.
+ Then you may recreate the (updated) StatefulSet and one-by-one start deleting the `ingester-0` through `ingester-n` Pods _in that order_, allowing the StatefulSet to spin up new pods to replace them.
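
A sketch of that sequence, assuming a 3-replica StatefulSet named `ingester` in namespace `loki` and a local manifest file; only the `delete sts ... --cascade=false` command comes from the text above (newer kubectl spells it `--cascade=orphan`).

```shell
# Delete the StatefulSet object but leave its Pods running.
kubectl -n loki delete sts ingester --cascade=orphan   # older kubectl: --cascade=false

# Recreate the updated StatefulSet (manifest path is an assumption).
kubectl -n loki apply -f ingester-statefulset.yaml

# Delete the old Pods one by one, in order, letting the new StatefulSet
# create a replacement before moving on.
for i in 0 1 2; do
  kubectl -n loki delete pod "ingester-$i"
  sleep 10   # give the controller time to create the replacement Pod
  kubectl -n loki wait --for=condition=Ready "pod/ingester-$i" --timeout=15m
done
```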

#### Scaling Down Using `/flush_shutdown` Endpoint and Lifecycle Hook

1. **StatefulSets for Ordered Scaling Down**: Loki's ingesters should be scaled down one by one, which is efficiently handled by Kubernetes StatefulSets. This ensures an ordered and reliable scaling process, as described in the [Deployment and Scaling Guarantees](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#deployment-and-scaling-guarantees) documentation.

- 2. **Using PreStop Lifecycle Hook**: During the pod scaling down process, the PreStop [lifecycle hook](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) triggers the `/flush_shutdown` endpoint on the ingester. This action flushes the chunks and removes the ingester from the ring, allowing it to register as unready and become eligible for deletion.
+ 2. **Using PreStop Lifecycle Hook**: During the Pod scaling down process, the PreStop [lifecycle hook](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) triggers the `/flush_shutdown` endpoint on the ingester. This action flushes the chunks and removes the ingester from the ring, allowing it to register as unready and become eligible for deletion.

3. **Using terminationGracePeriodSeconds**: Provides time for the ingester to flush its data before being deleted. If flushing takes more than 30 minutes, you may need to increase it. A sketch of wiring these settings together follows this list.
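
A minimal sketch of adding both settings with a strategic merge patch; the namespace, container name, port, and grace period value are assumptions, and you should confirm the flush endpoint path for your Loki version.

```shell
# Sketch only: add a PreStop hook and a longer termination grace period to the
# ingester StatefulSet. Adjust namespace, container name, port, and timeout.
kubectl -n loki patch statefulset ingester --patch '
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 4800
      containers:
        - name: ingester
          lifecycle:
            preStop:
              httpGet:
                path: /flush_shutdown
                port: 3100
'
```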

16 changes: 8 additions & 8 deletions docs/sources/operations/troubleshooting.md
@@ -123,14 +123,14 @@ promtail -log.level=debug
The Promtail configuration contains a `__path__` entry to a directory that
Promtail cannot find.

- ## Connecting to a Promtail pod to troubleshoot
+ ## Connecting to a Promtail Pod to troubleshoot

First check [Troubleshooting targets](#troubleshooting-targets) section above.
- If that doesn't help answer your questions, you can connect to the Promtail pod
+ If that doesn't help answer your questions, you can connect to the Promtail Pod
to investigate further.

If you are running Promtail as a DaemonSet in your cluster, you will have a
- Promtail pod on each node, so figure out which Promtail you need to debug first:
+ Promtail Pod on each node, so figure out which Promtail you need to debug first:


```shell
kubectl get pods -o wide
...
promtail-bth9q   1/1   Running   0   3h   10.56.
```
That output is truncated to highlight just the two pods we are interested in; with the `-o wide` flag you can see the NODE on which they are running.

- You'll want to match the node for the pod you are interested in, in this example
+ You'll want to match the node for the Pod you are interested in, in this example
NGINX, to the Promtail running on the same node.
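
One way to narrow this down (a sketch; the node name is a placeholder taken from the `-o wide` output) is to list only the Pods scheduled on that node:

```shell
# Replace node-xyz with the NODE value shown for your application Pod.
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node-xyz | grep promtail
```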

- To debug you can connect to the Promtail pod:
+ To debug you can connect to the Promtail Pod:

```shell
kubectl exec -it promtail-bth9q -- /bin/sh
```
@@ -182,12 +182,12 @@ $ helm upgrade --install loki loki/loki --set "loki.tracing.enabled=true"

## Running Loki with Istio Sidecars

- An Istio sidecar runs alongside a pod. It intercepts all traffic to and from the pod.
- When a pod tries to communicate with another pod using a given protocol, Istio inspects the destination's service using [Protocol Selection](https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/).
+ An Istio sidecar runs alongside a Pod. It intercepts all traffic to and from the Pod.
+ When a Pod tries to communicate with another Pod using a given protocol, Istio inspects the destination's service using [Protocol Selection](https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/).
This mechanism uses a convention on the port name (for example, `http-my-port` or `grpc-my-port`)
to determine how to handle this outgoing traffic. Istio can then do operations such as authorization and smart routing.
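
To check whether your Loki Services follow that naming convention, you can list each Service port's name; this is a sketch that assumes a Service called `loki` in the `loki` namespace.

```shell
# Print "<port name> <port number>" for every port on the Service so you can
# verify the names carry a protocol prefix such as http- or grpc-.
kubectl -n loki get service loki \
  -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.port}{"\n"}{end}'
```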

- This works fine when one pod communicates with another pod using a hostname. But,
+ This works fine when one Pod communicates with another Pod using a hostname. But,
Istio does not allow pods to communicate with other pods using IP addresses,
unless the traffic type is `tcp`.

2 changes: 1 addition & 1 deletion docs/sources/operations/zone-ingesters.md
@@ -105,7 +105,7 @@ These instructions assume you are using the zone aware ingester jsonnet deployme

1. if you're using an automated reconciliation/deployment system like flux, disable it now (for example using flux ignore), if possible for just the default ingester StatefulSet

- 1. Shutdown flush the default ingesters, unregistering them from the ring, you can do this by port-forwarding each ingester pod and using the endpoint: `"http://url:PORT/ingester/shutdown?flush=true&delete_ring_tokens=true&terminate=false"`
+ 1. Shutdown flush the default ingesters, unregistering them from the ring, you can do this by port-forwarding each ingester Pod and using the endpoint: `"http://url:PORT/ingester/shutdown?flush=true&delete_ring_tokens=true&terminate=false"`

1. manually scale down the default ingester StatefulSet to 0 replicas; we do this via `tk apply`, but you could do it by modifying the YAML (see the sketch below)
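
Taken together, the previous two steps might look like this sketch for a 3-replica default ingester StatefulSet; the namespace, StatefulSet name, and port are assumptions, while the endpoint and its query parameters come from the step above.

```shell
# Flush-shutdown each default ingester through a port-forward, then scale to 0.
for i in 0 1 2; do
  kubectl -n loki port-forward "ingester-$i" 3100:3100 &
  pf_pid=$!
  sleep 2
  curl -s -X POST \
    "http://localhost:3100/ingester/shutdown?flush=true&delete_ring_tokens=true&terminate=false"
  kill "$pf_pid"
done

# Equivalent of the `tk apply` change if you are scaling down manually.
kubectl -n loki scale statefulset ingester --replicas=0
```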

