The AKS Karpenter Provider enables node autoprovisioning using Karpenter on your AKS cluster.
The API for AKS Karpenter Provider is currently alpha (v1alpha2).
The GitHub Codespaces development flow described below lets you test Karpenter functionality on your own cluster and supports rapid development of this project.
- Install VSCode: Go here to download VSCode for your platform. After installation, install the "GitHub Codespaces" extension in your VSCode app. See here for more information about this extension.
- Create Codespace (~2min): In the browser, click Code / "Create a codespace on main" (for a better experience, customize it to use 4 cores/8 GB) and wait for the codespace to be created. It is created with everything needed for development (Go, Azure CLI, kubectl, skaffold, useful plugins, etc.). Now you can open the Codespace in VSCode: click Codespaces in the lower left corner of the browser status bar region and choose "Open in VSCode Desktop". (Pretty much everything except for az login and some az role assignment commands works in the browser, but VSCode is a better experience anyway.) More information on GitHub Codespaces is here.
- Provision cluster, build and deploy Karpenter (~5min): Set AZURE_SUBSCRIPTION_ID to your subscription (and customize the region in Makefile-az.mk if desired). Then run make az-all at the VSCode command line. This logs into Azure (follow the prompts), provisions AKS and ACR (using resource group $CODESPACE_NAME, so everything is unique and scoped to the codespace), builds and deploys Karpenter, and deploys the sample default Provisioner and the inflate Deployment workload.
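The provisioning step can be sketched as a small shell helper. This is only an illustration: provision_all is a hypothetical function name, and the only pieces taken from this README are the AZURE_SUBSCRIPTION_ID variable and the make az-all target. The guard simply avoids starting the run when the subscription is not set.

```shell
# Sketch only: `provision_all` is a hypothetical helper, not part of the repo.
# It refuses to start `make az-all` unless AZURE_SUBSCRIPTION_ID is set.
provision_all() {
  if [ -z "${AZURE_SUBSCRIPTION_ID:-}" ]; then
    echo "AZURE_SUBSCRIPTION_ID is not set" >&2
    return 1
  fi
  # Make the variable visible to make and the tools it invokes.
  export AZURE_SUBSCRIPTION_ID
  # Documented target: logs into Azure, provisions AKS and ACR,
  # then builds and deploys Karpenter and the sample workload.
  make az-all
}
```

After setting AZURE_SUBSCRIPTION_ID to your own subscription ID, calling provision_all from the repository root would run the documented target.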
Manually scale the inflate Deployment workload, then watch the Karpenter controller log and the Nodes in the cluster. Explore further with make help (mostly az-* targets).
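One possible way to scale the workload and watch nodes appear uses plain kubectl. The helper names scale_inflate and watch_nodes are hypothetical, and the sketch assumes the inflate Deployment lives in the namespace your current kubectl context points at. The repository's az-* make targets may offer more convenient equivalents.

```shell
# Sketch only: hypothetical helpers around standard kubectl commands.
# Assumes the inflate Deployment is in the current kubectl namespace.
scale_inflate() {
  replicas="${1:-}"
  if [ -z "$replicas" ]; then
    echo "usage: scale_inflate <replicas>" >&2
    return 1
  fi
  kubectl scale deployment inflate --replicas="$replicas"
}

# Stream node changes so newly provisioned nodes show up as they join.
watch_nodes() {
  kubectl get nodes --watch
}
```

For example, scale_inflate 5 followed by watch_nodes should show new nodes joining if the extra pods do not fit on the existing ones.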
To debug Karpenter in-cluster, run make az-debug, wait for it to deploy, and attach from VSCode using Start Debugging (F5). After that you should be able to set breakpoints, examine variables, single-step, etc. (Behind the scenes, besides building and deploying Karpenter, skaffold debug automatically and transparently applies the necessary flags during build, instruments the deployment with Delve, adjusts health probe timeouts to allow for delays introduced by breakpoints, sets up port-forwarding, etc.; more on how this works is here.)
Once done, you can delete all infra with make az-rmrg (it deletes the resource group) and delete the codespace (though it will be automatically suspended when not in use, and deleted after 30 days).
- During step 1 you will observe Running postCreateCommand..., which takes ~10+ minutes. You don't have to wait for it to finish to proceed to step 2.
- The following errors can be ignored during step 2:
ERRO[0007] gcloud binary not found
...
ERRO[0003] gcloud binary not found
...
ERRO[0187] walk.go:74: found symbolic link in path: /workspaces/karpenter/charts/karpenter/crds resolves to /workspaces/karpenter/pkg/apis/crds. Contents of linked file included and used subtask=0 task=Render
- If you see a platform architecture error during skaffold debug, adjust (or comment out) the --platform argument.
- If you are not able to set/hit breakpoints, it could be an issue with source path mapping; see the comments in the debug launch configuration (launch.json).
Q: I was able to trigger Karpenter to execute scaling up nodes as expected, using my own customized deployment of pods. However, scaling down was not handled automatically when I removed the deployment. The two new nodes created by Karpenter were left around. What is going on?
A: Additional system workloads (such as metrics server) can get scheduled on the new nodes, preventing Karpenter from removing them. Note that you can always use kubectl delete node <node>, which will have Karpenter drain the node and terminate the instance from the cloud provider.
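To check whether system workloads are what keeps a node around, you can list the pods scheduled on it. The sketch below wraps a standard kubectl field selector. The helper name pods_on_node is hypothetical, and the node name you pass is whatever kubectl get nodes reports in your cluster.

```shell
# Sketch only: `pods_on_node` is a hypothetical helper.
# Lists pods from all namespaces that are bound to the given node.
pods_on_node() {
  node="${1:-}"
  if [ -z "$node" ]; then
    echo "usage: pods_on_node <node>" >&2
    return 1
  fi
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=$node" -o wide
}
```

If the output shows only system pods (for example metrics server), that is consistent with the explanation above, and kubectl delete node <node> remains available to remove the node explicitly.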
Q: When running some of the tests locally, the environment failed to start. How can I resolve this?
A: Oftentimes, especially for pre-existing tests, running make toolchain will fix this. This target ensures that you have the correct versions of binaries installed.
Karpenter is an open-source node provisioning project built for Kubernetes. Karpenter improves the efficiency and cost of running workloads on Kubernetes clusters by:
- Watching for pods that the Kubernetes scheduler has marked as unschedulable
- Evaluating scheduling constraints (resource requests, node selectors, affinities, tolerations, and topology spread constraints) requested by the pods
- Provisioning nodes that meet the requirements of the pods
- Removing the nodes when the nodes are no longer needed
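To see roughly which pods would trigger this loop on a live cluster, the kubectl queries below can help. This is an approximation, not how Karpenter itself detects work: a pod can be Pending for reasons other than being unschedulable (for example, while images are being pulled), and FailedScheduling is the event reason the default Kubernetes scheduler records when it cannot place a pod. The helper names are hypothetical.

```shell
# Sketch only: hypothetical helpers for inspecting scheduling pressure.

# Pods not yet running; a superset of the pods marked unschedulable.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector status.phase=Pending
}

# Scheduler events for pods that could not be placed on any node.
failed_scheduling_events() {
  kubectl get events --all-namespaces --field-selector reason=FailedScheduling
}
```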
Notice: Files in this source code originated from a fork of https://github.com/aws/karpenter which is under an Apache 2.0 license. Those files have been modified to reflect environmental requirements in AKS and Azure.
Many thanks to @ellistarn, @jonathan-innis, @tzneal, @bwagner5, @njtran, and many other developers active in the Karpenter community for laying the foundations of a Karpenter provider ecosystem!
Many thanks to @Bryce-Soghigian, @rakechill, @charliedmcb, @jackfrancis, @comtalyst, @aagusuab, @matthchr, @gandhipr, @dtzar for contributing to AKS Karpenter Provider!
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Come discuss Karpenter in the #karpenter channel in the Kubernetes Slack!
Check out the Docs to learn more.