feat(eks): add eks cluster builder #1259

Open
wants to merge 7 commits into
base: main


jijiechen
Member

@jijiechen jijiechen commented Jan 16, 2025

closes #285

Features

This PR provides the following features:

  • a cluster Builder for creating EKS clusters on AWS
  • a Cluster type that represents the created cluster and can clean up the cluster and all underlying resources created for it
  • a NewFromExisting helper function to retrieve kubeconfig from an existing cluster

The above features were tested manually, and a CI run can be found here:
https://github.com/kumahq/kuma-smoke/actions/runs/12785877388/job/35641958835

Implementation

Unlike eksctl, which creates the cluster using AWS CloudFormation, this PR creates the underlying AWS resources directly using the Go SDK provided by AWS.

  • The entrypoint for creating a new cluster is the function eks.aws_operations.CreateEKSClusterAll

  • The entrypoint for cleaning up a cluster is the function eks.aws_operations.DeleteEKSClusterAll

The userdata for bootstrapping the nodes is generated by reusing the logic from eksctl.

Limitations

  1. Only IPv4 networking is supported for now (this will be resolved by a new feature plan).
  2. Nodes are not "managed" nodes for now; they are backed by launch templates (this will be resolved by a new feature plan).
  3. No load balancer integration has been tested (this will be resolved by a new feature/test plan).
  4. Only tested with a federated user; IAM users with programmatic access have not been tested yet (technically, it should work out of the box since we use the official SDK to read credentials, and I will test as soon as I get a user with programmatic access).
  5. No retry mechanism is provided when calling AWS APIs; this could lead to unstable API calls and incomplete cluster creation/cleanup.
  6. Cluster cleanup takes more time than in the GKE implementation, because AWS does not provide a "fire-and-forget" API for cleaning up all the resources involved.
  7. Due to AWS platform characteristics, a cluster creation attempt takes ~15 minutes and a deletion takes ~10 minutes.
  8. Due to an AWS EKS limitation, the cluster client kubeconfig is valid for a maximum of 900s (15 minutes). For long-running tests, we need to export the client config periodically to refresh its validity.

Usage

	t.Log("configuring EKS cloud environment for tests")
	require.NotEmpty(t, EKSAccessKeyId, "%s not set", eks.EnvAccessKeyId)
	require.NotEmpty(t, EKSAccessKey, "%s not set", eks.EnvAccessKey)
	require.NotEmpty(t, EKSRegion, "%s not set", eks.EnvRegion)

	t.Logf("configuring the EKS cluster KEY_ID=(%s) REGION=(%s)", EKSAccessKeyId, EKSRegion)
	builder := eks.NewBuilder()
	builder.WithClusterVersion(fmt.Sprintf("%d.%d.0", EKSVersionMajor, EKSVersionMinor))

	t.Logf("building cluster %s (this can take some time)", builder.Name)
	cluster, err := builder.Build(ctx)
	require.NoError(t, err)

	t.Logf("setting up cleanup for cluster %s", cluster.Name())
	t.Cleanup(func() {
		t.Logf("running cluster cleanup for %s", cluster.Name())
		// don't use test ctx as it may be cancelled already
		assert.NoError(t, cluster.Cleanup(context.Background()))
	})

	t.Log("verifying that the cluster can be communicated with")
	version, err := cluster.Client().ServerVersion()
	require.NoError(t, err)
	t.Logf("server version found: %s", version)

I'll provide an integration test soon in a new PR.

@jijiechen jijiechen requested a review from a team as a code owner January 16, 2025 03:24
pkg/clusters/types/eks/aws-operations/eks.go Outdated
pkg/clusters/types/eks/aws-operations/eks.go Outdated
pkg/clusters/types/eks/aws-operations/eks.go Outdated
pkg/clusters/types/eks/aws-operations/eks.go
pkg/clusters/types/eks/aws-operations/eks.go Outdated
}

func waitForNodeGroupReady(ctx context.Context, eksClient *eks.Client, clusterName, nodeGroupName string) error {
childCtx, cancel := context.WithTimeout(ctx, 10*time.Minute)
Contributor

Ditto here for wait time and wait tick.

Member Author

Waiting for a node group to become ready can take about 5 minutes.
I'll change to 5 seconds here.


amiId, err := resolveAMI(ctx, ec2Client, cfg.Region, k8sMinorVersion, nodeMachineType, eksctlapi.DefaultNodeImageFamily)
if err != nil {
return errors.Wrap(err, "failed to resolve AMI")
Member

Can we print the Region or cluster name here?

Member Author

@jijiechen jijiechen Jan 21, 2025

Cluster name is the environment/cluster-builder name, so it's known by the caller.
The region is loaded from an environment variable as well, and I'm open to printing it. My concern is that this framework tends not to print anything while building a cluster. I'm not sure whether I should print it, or how I should log it, since it looks like KTF doesn't provide a logging mechanism.

Development

Successfully merging this pull request may close these issues.

AWS EKS cluster support
3 participants