Introduce the NativeLink Kubernetes operator (TraceMachina#1088)
A single `kubectl apply -k` now deploys NativeLink in a
self-configuring, self-healing and self-updating fashion.

To achieve this we implement a two-stage deployment to asynchronously
reconcile various parts of NativeLink Kustomizations.

First, we deploy Flux Alerts that trigger Tekton Pipelines on
GitRepository updates to bring required images into the cluster.

Second, and technically at the same time, we start a Flux Kustomization
to deploy a NativeLink Kustomization.

This is similar to the previous 01_operations and 02_application scripts,
but now happens fully automatically in the cluster and no longer requires
a local Nix installation, as all tag evaluations have become
implementation details of the Tekton Pipelines.

This commit also changes the K8s resource layout to a "best-practice"
Kustomize directory layout. This further reduces code duplication and
gives third parties greater flexibility and more useful reference points
to build custom NativeLink setups.

Includes an overhaul of the Kubernetes documentation.
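
As a rough sketch of what the flow described above might look like from a
user's shell: the commands mirror the wait steps in the updated `lre.yaml`
workflow in this commit. It assumes `kubectl` and `flux` are on the PATH and
that the target cluster already runs Flux and Tekton. The `run` and
`deploy_nativelink` helpers and the `DRY_RUN` variable are illustrative names
only, not part of NativeLink.

```shell
#!/usr/bin/env bash
# Sketch only: deploy a NativeLink example and wait for it to become ready.
# Assumption: kubectl and flux are installed and point at a prepared cluster.
set -eu

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: apply one of the deploy/* kustomizations and wait.
deploy_nativelink() {
  overlay="${1:-deploy/kubernetes-example}"

  # A single apply starts both the image pipelines and the Flux Kustomization.
  run kubectl apply -k "$overlay"

  # Wait for the Tekton pipelines that bring the images into the cluster.
  run kubectl wait --for=condition=Succeeded --timeout=45m \
    pipelinerun -l tekton.dev/pipeline=rebuild-nativelink

  # Wait for the Flux Kustomizations to reconcile.
  run flux reconcile kustomization -n default --timeout=15m nativelink-configmaps
  run flux reconcile kustomization -n default --timeout=15m nativelink

  # Wait for the NativeLink deployments to roll out.
  for component in cas scheduler worker; do
    run kubectl rollout status "deploy/nativelink-$component"
  done
}
```

Calling `DRY_RUN=1 deploy_nativelink deploy/dev` prints the commands without
touching a cluster, which can be handy for checking the sequence first.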
aaronmondal authored Oct 30, 2024
1 parent 3e8dc29 commit b44383f
Showing 44 changed files with 751 additions and 584 deletions.
1 change: 0 additions & 1 deletion .github/styles/config/vocabularies/TraceMachina/accept.txt
@@ -5,7 +5,6 @@ Cloudflare
ELB
GPUs
Goma
Kustomization
[Hh]ermeticity
Kustomization
LLD
95 changes: 90 additions & 5 deletions .github/workflows/lre.yaml
@@ -83,19 +83,104 @@ jobs:
uses: >- # v4
DeterminateSystems/magic-nix-cache-action@fc6aaceb40b9845a02b91e059ec147e78d1b4e41
- name: Start Kubernetes cluster (Infra)
- name: Start Kubernetes cluster
run: >
nix run .#native up
- name: Start Kubernetes cluster (Operations)
- name: Start NativeLink operator
env:
PR_URL: ${{ github.event.pull_request.head.repo.clone_url }}
PR_BRANCH: ${{ github.event.pull_request.head.ref }}
PR_COMMIT: ${{ github.event.pull_request.head.sha }}
run: |
nix develop --impure --command bash -c 'cat > kustomization.yaml << EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
components:
- kubernetes/components/operator
patches:
- patch: |-
- op: replace
path: /spec/path
value: ./kubernetes/overlays/lre
target:
kind: Kustomization
name: nativelink
- patch: |-
- op: replace
path: /spec/url
value: ${PR_URL}
- op: replace
path: /spec/ref/branch
value: ${PR_BRANCH}
- op: replace
path: /spec/ref/commit
value: ${PR_COMMIT}
target:
kind: GitRepository
name: nativelink
- patch: |-
- op: replace
path: /spec/eventMetadata/flakeOutput
value: ./src_root#image
target:
kind: Alert
name: nativelink-image-alert
- patch: |-
- op: replace
path: /spec/eventMetadata/flakeOutput
value: ./src_root#nativelink-worker-init
target:
kind: Alert
name: nativelink-worker-init-alert
- patch: |-
- op: replace
path: /spec/eventMetadata/flakeOutput
value: ./src_root#nativelink-worker-lre-cc
target:
kind: Alert
name: nativelink-worker-alert
EOF
kubectl apply -k . &&
rm kustomization.yaml'
- name: Wait for Tekton pipelines
run: >
nix develop --impure --command
bash -c "kubectl wait \
--for=condition=Succeeded \
--timeout=45m \
pipelinerun \
-l tekton.dev/pipeline=rebuild-nativelink"
- name: Wait for Configmaps
run: >
nix develop --impure --command
bash -c "flux reconcile kustomization -n default \
--timeout=15m \
nativelink-configmaps"
- name: Wait for NativeLink Kustomization
run: >
nix develop --impure --command
bash -c "flux reconcile kustomization -n default \
--timeout=15m \
nativelink"
- name: Wait for CAS
run: >
nix develop --impure --command
bash -c "kubectl rollout status deploy/nativelink-cas"
- name: Wait for scheduler
run: >
nix develop --impure --command
bash -c "./deployment-examples/kubernetes/01_operations.sh"
bash -c "kubectl rollout status deploy/nativelink-scheduler"
- name: Start Kubernetes cluster (Application)
- name: Wait for worker
run: >
nix develop --impure --command
bash -c "./deployment-examples/kubernetes/02_application.sh"
bash -c "kubectl rollout status deploy/nativelink-worker"
- name: Get gateway IPs
id: gateway-ips
1 change: 1 addition & 0 deletions README.md
@@ -59,6 +59,7 @@ To start, you can deploy NativeLink as a Docker image (as shown below) or by usi

The setups below are **production-grade** installations. See the [contribution docs](https://nativelink.com/docs/contribute/nix/) for instructions on how to build from source with [Bazel](https://nativelink.com/docs/contribute/bazel/), [Cargo](https://nativelink.com/docs/contribute/cargo/), and [Nix](https://nativelink.com/docs/contribute/nix/).

You can find a few example deployments in the [Docs](https://nativelink.com/docs/deployment-examples/kubernetes).

### 📦 Prebuilt images

21 changes: 21 additions & 0 deletions deploy/chromium-example/kustomization.yaml
@@ -0,0 +1,21 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

components:
- ../../kubernetes/components/operator

patches:
- patch: |-
- op: replace
path: /spec/path
value: ./kubernetes/overlays/chromium
target:
kind: Kustomization
name: nativelink
- patch: |-
- op: replace
path: /spec/eventMetadata/flakeOutput
value: github:TraceMachina/nativelink#nativelink-worker-siso-chromium
target:
kind: Alert
name: nativelink-worker-alert
64 changes: 64 additions & 0 deletions deploy/dev/kustomization.yaml
@@ -0,0 +1,64 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

components:
- ../../kubernetes/components/operator

# Change this value to deploy custom overlays.
patches:
- patch: |-
- op: replace
path: /spec/path
value: ./kubernetes/overlays/lre
target:
kind: Kustomization
name: nativelink

# Modify this value to change the URL of the repository with deployment files.
#
# This is usually only necessary if you change deployment YAML files or
# NativeLink config files. If you only intend to change the Rust sources you can
# leave this as is, but you need to ensure that the Alerts below are patched to
# build your local sources.
- patch: |-
- op: replace
path: /spec/url
value: https://github.com/TraceMachina/nativelink
# Optionally, change the tracked branch.
# - op: replace
# path: /spec/ref/branch
# value: somecustombranch
target:
kind: GitRepository
name: nativelink

# Setting the flake outputs to `./src_root#xxx` causes the Tekton pipelines to
# build nativelink from your local sources.
#
# During development, the following formats might be useful as well:
#
# `github:user/repo#outname` to build an image from an arbitrary flake output.
#
# `github:TraceMachina/nativelink?ref=pull/<PR_NUMBER>/head#<OUT>` to deploy
# outputs from a pull request.
- patch: |-
- op: replace
path: /spec/eventMetadata/flakeOutput
value: ./src_root#image
target:
kind: Alert
name: nativelink-image-alert
- patch: |-
- op: replace
path: /spec/eventMetadata/flakeOutput
value: ./src_root#nativelink-worker-init
target:
kind: Alert
name: nativelink-worker-init-alert
- patch: |-
- op: replace
path: /spec/eventMetadata/flakeOutput
value: ./src_root#nativelink-worker-lre-cc
target:
kind: Alert
name: nativelink-worker-alert
21 changes: 21 additions & 0 deletions deploy/kubernetes-example/kustomization.yaml
@@ -0,0 +1,21 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

components:
- ../../kubernetes/components/operator

patches:
- patch: |-
- op: replace
path: /spec/path
value: ./kubernetes/overlays/lre
target:
kind: Kustomization
name: nativelink
- patch: |-
- op: replace
path: /spec/eventMetadata/flakeOutput
value: github:TraceMachina/nativelink#nativelink-worker-lre-cc
target:
kind: Alert
name: nativelink-worker-alert
2 changes: 0 additions & 2 deletions deployment-examples/chromium/.gitignore

This file was deleted.

39 changes: 0 additions & 39 deletions deployment-examples/chromium/01_operations.sh

This file was deleted.

30 changes: 0 additions & 30 deletions deployment-examples/chromium/02_application.sh

This file was deleted.

6 changes: 0 additions & 6 deletions deployment-examples/chromium/04_delete_application.sh

This file was deleted.
