
[EKS] [request]: Better support for removing instance metadata endpoint access #1060

Open
Niksko opened this issue Sep 4, 2020 · 15 comments
Labels
EKS Managed Nodes · EKS (Amazon Elastic Kubernetes Service) · Proposed (Community submitted issue)

Comments


Niksko commented Sep 4, 2020

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request

Removing access to the instance metadata endpoint is documented as good security posture in your documentation: https://docs.aws.amazon.com/eks/latest/userguide/restrict-ec2-credential-access.html. However, there are a couple of improvements that could be made here:

  • (less important): This feels like it could be a checkbox somewhere in EKS. I'm not sure I should have to use a custom launch template with userdata to achieve this.
  • (more important): Performing the actions in the documentation linked above (with a custom launch template and custom userdata) stops the Amazon CloudWatch agent from working. Logs below:
2020/09/02 06:07:45 I! 2020/09/02 06:07:42 E! ec2metadata is not available
2020/09/02 06:07:42 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2020/09/02 06:07:43 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020/09/02 06:07:44 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020/09/02 06:07:45 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020/09/02 06:07:45 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
I! Detected the instance is OnPrem
2020/09/02 06:07:45 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2020/09/02 06:07:45 Reading json config file path: /etc/cwagentconfig/..2020_09_02_04_57_01.343707504/cwagentconfig.json ...
2020/09/02 06:07:45 Find symbolic link /etc/cwagentconfig/..data
2020/09/02 06:07:45 Find symbolic link /etc/cwagentconfig/cwagentconfig.json
2020/09/02 06:07:45 Reading json config file path: /etc/cwagentconfig/cwagentconfig.json ...
Valid Json input schema.
Got Home directory: /root
No csm configuration found.
No metric configuration found.
Configuration validation first phase succeeded

2020/09/02 06:07:45 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2020/09/02 06:07:45 I! AmazonCloudWatchAgent Version 1.245315.0.
2020-09-02T06:07:45Z I! will use file based credentials provider
2020-09-02T06:07:45Z I! Starting AmazonCloudWatchAgent (version 1.245315.0)
2020-09-02T06:07:45Z I! Loaded outputs: cloudwatchlogs
2020-09-02T06:07:45Z I! Loaded inputs: cadvisor k8sapiserver
2020-09-02T06:07:45Z I! Tags enabled:
2020-09-02T06:07:45Z I! Agent Config: Interval:1m0s, Quiet:false, Hostname:"ip-172-21-189-66.ap-southeast-2.compute.internal", Flush Interval:1s
2020-09-02T06:07:45Z I! k8sapiserver Switch New Leader: ip-172-21-188-238.ap-southeast-2.compute.internal
2020-09-02T06:08:06Z E! ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance

I suspect the CloudWatch agent needs access to the instance metadata service. However, it feels like it shouldn't: it should be able to collect the information it needs via IRSA permissions, and possibly by looking at the labels on the node (if all it really needs is the instance ID).

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I'm trying to run the CloudWatch agent using IRSA, in a cluster with a strong security posture where the instance metadata endpoint has been disabled for use by pods.

Are you currently working around this issue?
You would need to use something else for network policy enforcement, such as Calico. However, this comes with the extra overhead of managing another cluster service, and with deficiencies in how Calico resources are deployed (having to use calicoctl instead of kubectl for some resources).

Niksko added the Proposed label Sep 4, 2020
mikestef9 added the EKS label Sep 4, 2020
mikestef9 (Contributor) commented

As an update here, we have updated our docs with a simpler option to disable IMDS for pods by using IMDSv2 and the hop limit

https://docs.aws.amazon.com/eks/latest/userguide/best-practices-security.html

We do plan to add this as a checkbox option to managed groups in the future
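For anyone wiring this up today, the docs' approach maps directly onto launch template metadata options. A minimal Terraform sketch, assuming a custom `aws_launch_template` for the node group (resource and name values are illustrative, not from the docs):

```hcl
# Sketch: require IMDSv2 and keep the PUT response hop limit at 1, so pods
# (whose traffic crosses an extra hop through the container bridge) cannot
# fetch a session token, while the node itself can still reach IMDS.
resource "aws_launch_template" "eks_nodes" { # name is illustrative
  name_prefix = "eks-imdsv2-"

  metadata_options {
    http_endpoint               = "enabled"  # keep IMDS on for node components
    http_tokens                 = "required" # IMDSv2 only
    http_put_response_hop_limit = 1          # blocks token retrieval from pods
  }
}
```

Note that pods running with `hostNetwork: true` share the node's network namespace and can still reach IMDS regardless of the hop limit.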


sukrit007 commented Jan 3, 2021

Is there an example of how we can add this for managed node groups for now during bootstrapping (especially using eksctl)?

mikestef9 added the EKS Managed Nodes label Apr 27, 2021
avoidik commented May 23, 2021

For EKS managed node groups, you may want to set the eksctl CLI option --disable-pod-imds or the config option disablePodIMDS.
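As a concrete sketch, the config-file form looks roughly like this (cluster name, region, and nodegroup name are placeholders):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster        # placeholder
  region: ap-southeast-2  # placeholder
managedNodeGroups:
  - name: ng-1            # placeholder
    disablePodIMDS: true  # block pod access to IMDS on these nodes
```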

sharkztex commented

We're experiencing the same issue with the CloudWatch agent on EKS when enabling IMDSv2 with a token hop limit of 1 on the launch template, as specified in the best-practice documentation. We have set the token hop limit to 2 for now, which feels like it goes against what is recommended.


gfvirga commented Jun 23, 2021

@sharkztex Did you also configure the iptables rules? I couldn't get it to work when using them.

sharkztex commented

@gfvirga No, we didn't, as we don't have a requirement to access IMDSv1.


apanzerj commented Jul 6, 2021

Any update on how to run the CloudWatch agent with IMDS disabled? We also have it disabled but want to run the CloudWatch agent, and are getting the same issue.


bersanf commented Jun 30, 2022

Is there any news about this issue? I am unable to enable CloudWatch metrics in my EKS cluster.


bersanf commented Jun 30, 2022

I've finally been able to fix this issue; I found the right configuration.

Since I'm using Terraform to provision the EKS cluster and the managed node groups, I had to add the following configuration to the aws_launch_template resource:

  metadata_options {
    http_endpoint               = "enabled"
    # the following two settings fix https://github.com/aws/containers-roadmap/issues/1060
    # (note: http_tokens = "optional" re-enables IMDSv1)
    http_tokens                 = "optional"
    http_put_response_hop_limit = 2
  }

After that, in case you're using Bottlerocket, you have to set the right socket path on the CloudWatch pods.
Since I'm using the aws-cloudwatch-metrics Helm chart, I had to add the following configuration, according to aws/amazon-cloudwatch-agent#188:
"containerdSockPath" = "/run/dockershim.sock"

vsantoshaws commented

@bersanf did you also configure iptables?

orirawlings commented

> As an update here, we have updated our docs with a simpler option to disable IMDS for pods by using IMDSv2 and the hop limit
>
> https://docs.aws.amazon.com/eks/latest/userguide/best-practices-security.html
>
> We do plan to add this as a checkbox option to managed groups in the future

@mikestef9 are there still plans to expose IMDS customization/restriction in the managed node groups API? I'm looking for a means to restrict down to best practice (tokens required, hop limit 1) without needing to define a custom launch template.

joebowbeer (Contributor) commented May 16, 2024

Any updates?

We blocked access to IMDS as per best practice:

> Restrict access to the instance profile assigned to the worker node

But we bumped the hop count to 2 because apparently some application needs IMDS:

> When your application needs access to IMDS, use IMDSv2 and increase the hop limit on EC2 instances to 2

As a consequence, I assume, mkat complains that IMDSv2 is accessible:

> IMDSv2 is accessible: any pod can retrieve credentials for the AWS role my-cluster-node-group-role

How can this be improved?

    % mkat eks test-imds-access
    Connected to EKS cluster my-cluster
    Testing if IMDSv1 and IMDSv2 are accessible from pods by creating a pod that attempts to access it
    IMDSv2 is accessible: any pod can retrieve credentials for the AWS role my-cluster-node-group-role
    IMDSv1 is not accessible to pods in your cluster: able to establish a network connection to the IMDS, but no credentials were returned


Daemoen commented Jun 6, 2024

It is my understanding (and documented) that it is a best practice to disable IMDS unless it is being used. Unfortunately, there seems to be a dependency within the EKS control plane that necessitates the IMDS endpoint being available.

We had the unfortunate mishap of our entire cluster going haywire when we disabled IMDS entirely (endpoint disabled). The nodes stopped responding properly, pods terminated but also failed to start, et cetera.

This needs better and clearer documentation of real best practices for EKS specifically. Following best practices should never result in a service becoming unusable or unstable.

joebowbeer (Contributor) commented
@Daemoen These are the instructions for EKS:

https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node

This strong recommendation to block access to IMDS also includes this caution:

> Blocking access to instance metadata will prevent pods that do not use IRSA or EKS Pod Identities from inheriting the role assigned to the worker node.

and this caution:

> Do not disable instance metadata as this will prevent components like the node termination handler and other things that rely on instance metadata from working properly.

Perhaps an improvement regarding the first caution would be to include steps to identify pods that are NOT using IRSA or EKS Pod Identities?

Steps like the following, but specific to EKS?

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html
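One concrete way to find instances where something still calls IMDS without a token (i.e. uses IMDSv1) before tightening to tokens-required with hop limit 1 is the per-instance EC2 CloudWatch metric MetadataNoToken. A hedged CLI sketch (the instance ID is a placeholder):

```shell
# Sum of token-less (IMDSv1) metadata requests over the last 24 hours;
# a non-zero Sum means something on the instance still uses IMDSv1.
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name MetadataNoToken \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time   "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --statistics Sum
```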

By the way, I find the subtle distinction in the docs between disabling and blocking access confusing. The docs are also unclear about whether "access to instance metadata" refers to IMDSv1, IMDSv2, or either.

As I reported above, we fell into a third category requiring a hop limit of 2, and you may be in that boat, too.

There need to be better instructions for identifying this case, and for eventually decreasing hop limit to 1.


Daemoen commented Jun 7, 2024

@joebowbeer Hah! Thanks for that GitHub link. I think that's the piece of what I was getting at: this needs to be much clearer. I used the docs.aws guide re: IMDSv2, which does not mention those same cautions. I hadn't even thought to check the GitHub doc site until a buddy and I found this thread after we hit the issue and started discussing it. (I landed in both of the cautions, sadly.) TYVM!

Projects
Status: Researching