
Slow deployment of Workbench when running in AWS EKS Fargate #636

Open
Cecilsingh opened this issue Jan 10, 2025 · 2 comments


@Cecilsingh
Contributor

Hi Team, we've had two clusters on EKS: the first used managed nodes and worked fine prior to its decommissioning; the second uses EKS Fargate. I've noticed that the Fargate nodes take substantially longer to pull an image, both for deployment of the Workbench pod and for session pods; typically this takes approximately 10 minutes. Images are not cached either, so a new image pull happens on every session launch, and each session takes ~10 minutes to start. See below:
[Image attached]

The values.yaml file is also very minimal:

```yaml
userCreate: true
userName: "rstudio"
userPassword: "rstudio"
license:
  key: **Omitted**
homeStorage:
  create: false
  name: "efs-claim"
  path: "/home"
  mount: true
  storageClassName: "efs-sc"
  accessModes:
    - ReadWriteMany
  requests:
    storage: "2Gi"
```

Is this a known issue with EKS Fargate, or are there additional components needed for Workbench to work with EKS Fargate?

@bschwedler
Contributor

bschwedler commented Jan 10, 2025

@Cecilsingh Using Fargate on EKS stands up a brand new Kubernetes node for each pod that is scheduled using a Fargate Profile. Since it is a new node, it starts with an empty image cache.
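For context, which pods land on Fargate (and therefore get a fresh VM-backed node) is controlled by the profile's selectors. A minimal sketch in eksctl-style config; the cluster and namespace names here are hypothetical:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster        # hypothetical cluster name
  region: us-east-1
fargateProfiles:
  - name: workbench-profile
    selectors:
      # Any pod created in this namespace is scheduled onto a new Fargate
      # node, which starts with an empty image cache.
      - namespace: rstudio
```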

https://docs.aws.amazon.com/eks/latest/userguide/fargate.html#fargate-considerations

> Amazon EKS Fargate adds defense-in-depth for Kubernetes applications by isolating each Pod within a Virtual Machine (VM). This VM boundary prevents access to host-based resources used by other Pods in the event of a container escape, which is a common method of attacking containerized applications and gaining access to resources outside of the container.

Unfortunately, the startup time is a combination of the time it takes for AWS to create a new compute instance, add it to the Kubernetes cluster, and pull the image.
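One way to see how that total breaks down is to diff the pod's event timestamps. A minimal sketch, assuming the events have been exported with something like `kubectl get events -o json`; the `reason` and `firstTimestamp` fields mirror the Kubernetes Event API, but the sample timestamps below are invented for illustration:

```python
from datetime import datetime

# Hypothetical pod events in the shape of `kubectl get events -o json` items.
events = [
    {"reason": "Scheduled", "firstTimestamp": "2025-01-10T12:00:00Z"},
    {"reason": "Pulling",   "firstTimestamp": "2025-01-10T12:00:45Z"},
    {"reason": "Pulled",    "firstTimestamp": "2025-01-10T12:10:30Z"},
    {"reason": "Started",   "firstTimestamp": "2025-01-10T12:10:35Z"},
]

def _parse(ts):
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

def phase_durations(events):
    """Seconds spent between consecutive lifecycle events."""
    return [
        (prev["reason"], curr["reason"],
         (_parse(curr["firstTimestamp"]) - _parse(prev["firstTimestamp"])).total_seconds())
        for prev, curr in zip(events, events[1:])
    ]

for frm, to, secs in phase_durations(events):
    print(f"{frm} -> {to}: {secs:.0f}s")
```

With real events this makes it easy to see whether node provisioning or the image pull dominates the startup time.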

@Cecilsingh
Contributor Author

Thank you for your help here!

I noticed that creating each "node" takes about 45 seconds regardless of which image is used. Using nginx as an example:

[Image attached]

The same holds for our Workbench images: allocating a new node takes ~45 seconds, yet the image pull for Workbench on Fargate takes almost 10 minutes, so most of the startup time is spent pulling the image. Is this expected behaviour for Fargate? It didn't seem to happen when using managed nodes on EKS!
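Part of the gap between nginx and Workbench may simply be image size. A back-of-envelope sketch; both the image sizes and the effective pull rate below are assumptions for illustration, not measured Fargate figures:

```python
def pull_seconds(image_size_gib, mib_per_s):
    """Estimated transfer time for an image of the given size at the given rate."""
    return image_size_gib * 1024 / mib_per_s

# Assumed sizes: nginx is tens of MiB, while a Workbench image can run to
# several GiB. The ~70 MiB/s effective rate is also an assumption, and layer
# decompression/extraction adds further time on top of raw transfer.
nginx_est = pull_seconds(0.05, 70)   # well under a second of transfer
workbench_est = pull_seconds(8, 70)  # roughly two minutes of transfer alone
```

Since a Fargate node starts with an empty cache, this full cost is paid on every pod launch rather than only the first one per node.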
