feat: v1 migration and adaptation (#528)
* chore: update devcontainer go version

* chore: refresh toolchain

* chore: additional processing on verify
(and migration to kube-system)

* chore: bump dependencies

* chore: refresh Helm charts

* chore: update golangci config

* chore: remove feature gate for drift

* chore: update pre-commit tooling

* chore: update the shape of main

* chore: update the alt operator

* chore: update the API (move kubelet config to AKSNodeClass)

* chore: migrate cloud provider to v1 API

* chore: migrate operator to v1 API

* chore: migrate controllers to v1 API

* chore: add nodeclass status controller

* chore: migrate providers to v1 API

* chore: migrate test pkg to v1 API

* chore: update utils

* chore: update and migrate E2E tests to v1 API

* feat: refresh and relink CRDs

* fix: move code generation into subfolders to fix golangci-lint

(typecheck detecting multiple main.go)

* fix: enable most of govet in golangci

* fix(linting): exclude alt operator logger

* fix: add nodeclass termination controller

* fix(lint): restore linting on verify

* feat: add nodeclass hash controller

* fix: register additional nodeclass and status controllers

* fix(e2e): better selection of karpenter pod for logs

* fix(e2e): fix utilization suite

* chore(e2e): add events to dump-logs (and simplify)

* chore: rename v1 to corev1

* fix: remove extra $

* fix(e2e): add cilium label and taint

* fix(e2e): fix labels and disruption for daemonset test

* feat: update kubelet configuration

* fix: conflicting nodeclaim.garbagecollection controller name

* chore: restore webhooks in alt operator

* Clean up commented out webhook code

* fix(test): fix test for credential provider URL in custom data

* Make webhooks work in AKS CCP context (#537)

This requires quite a bit of hacking, mostly overriding certain things
in the ctx. The major items are:

 * Copy and modify knative/pkg/webhook/resourcesemantics/conversion to
   support CRD clientConfig.url in addition to clientConfig.service.
 * Copy and modify karpenter/pkg/webhooks/webhooks.go to support
   overriding the informer factory, so that we can point it at the
   CCP APIServer rather than overlay.
 * Override Start and supporting methods on the provider specific
   operator in pkg/operator/operator.go to allow invoking our modified
   version of karpenter/pkg/webhooks/webhooks.go.

* chore: remove failSwapOn from kubelet settings in AKSNodeClass

* fix: populate nodeClaim.Status.ImageID

* fix: record NodeClass hash and add drift on static fields

* chore: rename variables

* fix: remove outdated comment

* fix: typo

* chore: update CRDs
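
The webhook item above (#537) mentions supporting CRD clientConfig.url in addition to clientConfig.service. As a rough illustration of that distinction only, and not the code from this commit, the sketch below resolves a conversion endpoint from either form. The types are simplified, hypothetical stand-ins for the Kubernetes apiextensions/v1 types, and the helper name resolveEndpoint and the host ccp.example.internal are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// ServiceReference and WebhookClientConfig are simplified, hypothetical
// stand-ins for the apiextensions/v1 types; they are not imported from
// Kubernetes so that the sketch stays self-contained.
type ServiceReference struct {
	Namespace string
	Name      string
	Path      string
	Port      int32
}

type WebhookClientConfig struct {
	URL     *string
	Service *ServiceReference
}

// resolveEndpoint returns the HTTPS address a caller would use for the
// conversion webhook. Exactly one of URL or Service is expected to be set.
func resolveEndpoint(c WebhookClientConfig) (string, error) {
	switch {
	case c.URL != nil && c.Service != nil:
		return "", errors.New("only one of url or service may be set")
	case c.URL != nil:
		u, err := url.Parse(*c.URL)
		if err != nil {
			return "", fmt.Errorf("parsing url: %w", err)
		}
		if u.Scheme != "https" {
			return "", errors.New("url must use the https scheme")
		}
		return u.String(), nil
	case c.Service != nil:
		port := c.Service.Port
		if port == 0 {
			port = 443 // default port for service-based webhooks
		}
		return fmt.Sprintf("https://%s.%s.svc:%d%s",
			c.Service.Name, c.Service.Namespace, port, c.Service.Path), nil
	default:
		return "", errors.New("one of url or service is required")
	}
}

func main() {
	// Hypothetical URL-based config, e.g. when the webhook is not
	// reachable through an in-cluster Service.
	ccpURL := "https://ccp.example.internal:8443/conversion"
	byURL, err := resolveEndpoint(WebhookClientConfig{URL: &ccpURL})
	fmt.Println(byURL, err)

	// Service-based config, the form used for in-cluster webhooks.
	bySvc, err := resolveEndpoint(WebhookClientConfig{
		Service: &ServiceReference{Namespace: "kube-system", Name: "karpenter", Path: "/conversion"},
	})
	fmt.Println(bySvc, err)
}
```

The URL form matters when the webhook server cannot be addressed through a Service in the cluster the API server talks to, which is presumably the situation the CCP change addresses.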

---------

Co-authored-by: Matthew Christopher <[email protected]>
tallaxes and matthchr authored Oct 24, 2024
1 parent 2054da0 commit 46fc0d3
Showing 121 changed files with 8,735 additions and 1,832 deletions.
2 changes: 1 addition & 1 deletion .devcontainer/devcontainer.json
@@ -5,7 +5,7 @@
 "build": {
 "dockerfile": "Dockerfile",
 "args": {
-"VARIANT": "1.22-bullseye"
+"VARIANT": "1.23-bullseye"
 }
 },
 "runArgs": [ "--cap-add=SYS_PTRACE", "--security-opt", "seccomp=unconfined" ],
23 changes: 10 additions & 13 deletions .github/actions/e2e/dump-logs/action.yaml
@@ -31,26 +31,23 @@ runs:
 client-id: ${{ inputs.client-id }}
 tenant-id: ${{ inputs.tenant-id }}
 subscription-id: ${{ inputs.subscription-id }}
-- name: az set sub
+- name: update cluster context
 shell: bash
-run: az account set --subscription ${{ inputs.subscription-id }}
+run: |
+az aks get-credentials --name ${{ inputs.cluster_name }} --resource-group ${{ inputs.resource_group }}
 - name: controller-logs
 shell: bash
 run: |
-echo "step: controller-logs"
-AZURE_CLUSTER_NAME=${{ inputs.cluster_name }} AZURE_RESOURCE_GROUP=${{ inputs.resource_group }} make az-creds
-POD_NAME=$(kubectl get pods -n karpenter --no-headers -o custom-columns=":metadata.name" | tail -n 1)
-echo "logs from pod ${POD_NAME}"
-kubectl logs "${POD_NAME}" -n karpenter -c controller
+kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --all-containers --ignore-errors
 - name: describe-karpenter-pods
 shell: bash
 run: |
-echo "step: describe-karpenter-pods"
-AZURE_CLUSTER_NAME=${{ inputs.cluster_name }} AZURE_RESOURCE_GROUP=${{ inputs.resource_group }} make az-creds
-kubectl describe pods -n karpenter
+kubectl describe pods -n kube-system -l app.kubernetes.io/name=karpenter
 - name: describe-nodes
 shell: bash
 run: |
-echo "step: describe-nodes"
-AZURE_CLUSTER_NAME=${{ inputs.cluster_name }} AZURE_RESOURCE_GROUP=${{ inputs.resource_group }} make az-creds
-kubectl describe nodes
+kubectl describe nodes
+- name: get-karpenter-events
+shell: bash
+run: |
+kubectl get events -A --field-selector source=karpenter
11 changes: 8 additions & 3 deletions .golangci.yaml
@@ -10,7 +10,7 @@ linters:
 - bidichk
 - errorlint
 - errcheck
-- exportloopref
+- copyloopvar
 - gosec
 - revive
 - stylecheck
@@ -33,8 +33,9 @@ linters-settings:
 gocyclo:
 min-complexity: 11
 govet:
-enable:
-- shadow
+enable-all: true
+disable:
+- fieldalignment
 revive:
 rules:
 - name: dot-imports
@@ -79,3 +80,7 @@ issues:
 - hack
 - charts
 - designs
+- pkg/alt/knative # copy
+- pkg/alt/karpenter-core/pkg/webhooks # copy
+exclude-files:
+- pkg/alt/karpenter-core/pkg/operator/logger.go # copy
9 changes: 5 additions & 4 deletions .pre-commit-config.yaml
@@ -1,22 +1,23 @@
repos:
- repo: https://github.com/gitleaks/gitleaks
rev: v8.18.1
rev: v8.20.1
hooks:
- id: gitleaks
- repo: https://github.com/golangci/golangci-lint
rev: v1.55.2
rev: v1.61.0
hooks:
- id: golangci-lint
- repo: https://github.com/jumanjihouse/pre-commit-hooks
rev: 3.0.0
hooks:
- id: shellcheck
- repo: https://github.com/crate-ci/typos
rev: v1.17.2
rev: v1.26.0
hooks:
- id: typos
args: [--write-changes, --force-exclude, --exclude, go.mod]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
rev: v5.0.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
5 changes: 4 additions & 1 deletion Makefile
@@ -7,7 +7,7 @@ GOFLAGS ?= $(LDFLAGS)
 WITH_GOFLAGS = GOFLAGS="$(GOFLAGS)"

 # # CR for local builds of Karpenter
-KARPENTER_NAMESPACE ?= karpenter
+KARPENTER_NAMESPACE ?= kube-system

 # Common Directories
 # TODO: revisit testing tools (temporarily excluded here, for make verify)
@@ -80,9 +80,12 @@ verify: toolchain tidy download ## Verify code. Includes dependencies, linting,
cp $(KARPENTER_CORE_DIR)/pkg/apis/crds/* pkg/apis/crds
yq -i '(.spec.versions[0].additionalPrinterColumns[] | select (.name=="Zone")) .jsonPath=".metadata.labels.karpenter\.azure\.com/zone"' \
pkg/apis/crds/karpenter.sh_nodeclaims.yaml
hack/validation/kubelet.sh
hack/validation/labels.sh
hack/validation/requirements.sh
hack/validation/common.sh
cp pkg/apis/crds/* charts/karpenter-crd/templates
hack/mutation/conversion_webhooks_injection.sh
hack/github/dependabot.sh
$(foreach dir,$(MOD_DIRS),cd $(dir) && golangci-lint run $(newline))
@git diff --quiet ||\

This file was deleted.

250 changes: 250 additions & 0 deletions charts/karpenter-crd/templates/karpenter.azure.com_aksnodeclasses.yaml
@@ -0,0 +1,250 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.16.4
name: aksnodeclasses.karpenter.azure.com
spec:
group: karpenter.azure.com
names:
categories:
- karpenter
kind: AKSNodeClass
listKind: AKSNodeClassList
plural: aksnodeclasses
shortNames:
- aksnc
- aksncs
singular: aksnodeclass
scope: Cluster
versions:
- name: v1alpha2
schema:
openAPIV3Schema:
description: AKSNodeClass is the Schema for the AKSNodeClass API
properties:
apiVersion:
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
spec:
description: |-
AKSNodeClassSpec is the top level specification for the AKS Karpenter Provider.
This will contain configuration necessary to launch instances in AKS.
properties:
imageFamily:
default: Ubuntu2204
description: ImageFamily is the image family that instances use.
enum:
- Ubuntu2204
- AzureLinux
type: string
kubelet:
description: |-
Kubelet defines args to be used when configuring kubelet on provisioned nodes.
They are a subset of the upstream types, recognizing not all options may be supported.
Wherever possible, the types and names should reflect the upstream kubelet types.
properties:
allowedUnsafeSysctls:
description: |-
A comma separated whitelist of unsafe sysctls or sysctl patterns (ending in `*`).
Unsafe sysctl groups are `kernel.shm*`, `kernel.msg*`, `kernel.sem`, `fs.mqueue.*`,
and `net.*`. For example: "`kernel.msg*,net.ipv4.route.min_pmtu`"
Default: []
items:
type: string
type: array
containerLogMaxFiles:
default: 5
description: |-
containerLogMaxFiles specifies the maximum number of container log files that can be present for a container.
Default: 5
format: int32
minimum: 2
type: integer
containerLogMaxSize:
default: 50Mi
description: |-
containerLogMaxSize is a quantity defining the maximum size of the container log
file before it is rotated. For example: "5Mi" or "256Ki".
Default: "10Mi"
AKS CustomKubeletConfig has containerLogMaxSizeMB (with units), defaults to 50
pattern: ^\d+(E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki)$
type: string
cpuCFSQuota:
default: true
description: |-
CPUCFSQuota enables CPU CFS quota enforcement for containers that specify CPU limits.
Note: AKS CustomKubeletConfig uses cpuCfsQuota (camelCase)
type: boolean
cpuCFSQuotaPeriod:
default: 100ms
description: |-
cpuCfsQuotaPeriod sets the CPU CFS quota period value, `cpu.cfs_period_us`.
The value must be between 1 ms and 1 second, inclusive.
Default: "100ms"
type: string
cpuManagerPolicy:
default: none
description: cpuManagerPolicy is the name of the policy to use.
enum:
- none
- static
type: string
imageGCHighThresholdPercent:
description: |-
ImageGCHighThresholdPercent is the percent of disk usage after which image
garbage collection is always run. The percent is calculated by dividing this
field value by 100, so this field must be between 0 and 100, inclusive.
When specified, the value must be greater than ImageGCLowThresholdPercent.
Note: AKS CustomKubeletConfig does not have "Percent" in the field name
format: int32
maximum: 100
minimum: 0
type: integer
imageGCLowThresholdPercent:
description: |-
ImageGCLowThresholdPercent is the percent of disk usage before which image
garbage collection is never run. Lowest disk usage to garbage collect to.
The percent is calculated by dividing this field value by 100,
so the field value must be between 0 and 100, inclusive.
When specified, the value must be less than imageGCHighThresholdPercent
Note: AKS CustomKubeletConfig does not have "Percent" in the field name
format: int32
maximum: 100
minimum: 0
type: integer
podPidsLimit:
description: |-
podPidsLimit is the maximum number of PIDs in any pod.
AKS CustomKubeletConfig uses PodMaxPids, int32 (!)
Default: -1
format: int64
type: integer
topologyManagerPolicy:
default: none
description: |-
topologyManagerPolicy is the name of the topology manager policy to use.
Valid values include:
- `restricted`: kubelet only allows pods with optimal NUMA node alignment for requested resources;
- `best-effort`: kubelet will favor pods with NUMA alignment of CPU and device resources;
- `none`: kubelet has no knowledge of NUMA alignment of a pod's CPU and device resources.
- `single-numa-node`: kubelet only allows pods with a single NUMA alignment
of CPU and device resources.
enum:
- restricted
- best-effort
- none
- single-numa-node
type: string
type: object
x-kubernetes-validations:
- message: imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent
rule: 'has(self.imageGCHighThresholdPercent) && has(self.imageGCLowThresholdPercent)
? self.imageGCHighThresholdPercent > self.imageGCLowThresholdPercent :
true'
maxPods:
description: MaxPods is an override for the maximum number of pods
that can run on a worker node instance.
format: int32
minimum: 0
type: integer
osDiskSizeGB:
default: 128
description: osDiskSizeGB is the size of the OS disk in GB.
format: int32
minimum: 100
type: integer
tags:
additionalProperties:
type: string
description: Tags to be applied on Azure resources like instances.
type: object
vnetSubnetID:
description: |-
VNETSubnetID is the subnet used by nics provisioned with this nodeclass.
If not specified, we will use the default --vnet-subnet-id specified in karpenter's options config
pattern: (?i)^\/subscriptions\/[^\/]+\/resourceGroups\/[a-zA-Z0-9_\-().]{0,89}[a-zA-Z0-9_\-()]\/providers\/Microsoft\.Network\/virtualNetworks\/[^\/]+\/subnets\/[^\/]+$
type: string
type: object
status:
description: AKSNodeClassStatus contains the resolved state of the AKSNodeClass
properties:
conditions:
description: Conditions contains signals for health and readiness
items:
description: Condition aliases the upstream type and adds additional
helper methods
properties:
lastTransitionTime:
description: |-
lastTransitionTime is the last time the condition transitioned from one status to another.
This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
format: date-time
type: string
message:
description: |-
message is a human readable message indicating details about the transition.
This may be an empty string.
maxLength: 32768
type: string
observedGeneration:
description: |-
observedGeneration represents the .metadata.generation that the condition was set based upon.
For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
with respect to the current state of the instance.
format: int64
minimum: 0
type: integer
reason:
description: |-
reason contains a programmatic identifier indicating the reason for the condition's last transition.
Producers of specific condition types may define expected values and meanings for this field,
and whether the values are considered a guaranteed API.
The value should be a CamelCase string.
This field may not be empty.
maxLength: 1024
minLength: 1
pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
type: string
status:
description: status of the condition, one of True, False, Unknown.
enum:
- "True"
- "False"
- Unknown
type: string
type:
description: type of condition in CamelCase or in foo.example.com/CamelCase.
maxLength: 316
pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
type: string
required:
- lastTransitionTime
- message
- reason
- status
- type
type: object
type: array
type: object
type: object
served: true
storage: true
subresources:
status: {}
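
The kubelet section of the CRD above carries two validations that are easy to misread in flattened YAML: the CEL rule tying imageGCHighThresholdPercent to imageGCLowThresholdPercent, and the pattern on containerLogMaxSize. The following is a minimal Go sketch of what those two checks accept, assuming the rule and pattern exactly as they appear in the CRD. The function and variable names are hypothetical, and the API server evaluates the real rule with CEL, not with this code.

```go
package main

import (
	"fmt"
	"regexp"
)

// logSizePattern is copied from the containerLogMaxSize field of the CRD.
var logSizePattern = regexp.MustCompile(`^\d+(E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki)$`)

// gcThresholdsValid mirrors the CEL rule on the kubelet object: the
// comparison applies only when both thresholds are set; otherwise the
// object passes this rule.
func gcThresholdsValid(high, low *int32) bool {
	if high != nil && low != nil {
		return *high > *low
	}
	return true
}

// int32Ptr is a small helper for building optional values in the example.
func int32Ptr(v int32) *int32 { return &v }

func main() {
	fmt.Println(gcThresholdsValid(int32Ptr(85), int32Ptr(80))) // true: high exceeds low
	fmt.Println(gcThresholdsValid(int32Ptr(50), int32Ptr(80))) // false: high is below low
	fmt.Println(gcThresholdsValid(nil, int32Ptr(80)))          // true: only one threshold set

	fmt.Println(logSizePattern.MatchString("50Mi")) // true: binary suffix is allowed
	fmt.Println(logSizePattern.MatchString("50MB")) // false: "MB" is not an accepted suffix
}
```

Because the rule passes when only one threshold is set, a lone imageGCHighThresholdPercent is checked only against its own 0 to 100 bounds.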

This file was deleted.
