Karpenter & External residing CMK #2052

Closed
1 task done
devops-inthe-east opened this issue Dec 2, 2024 · 4 comments
@devops-inthe-east

devops-inthe-east commented Dec 2, 2024

  • ✋ I have searched the open/closed issues and my issue is not listed.

Please describe your question here

A fairly simple problem has been bugging me lately.

Karpenter is unable to provision node groups from an AMI whose EBS volume is encrypted with a CMK that resides in an external account.

The node gets created but is terminated almost instantly with the error: [Client.InvalidKMSKey.InvalidState]

I followed this AWS document to add permissions to the karpenter-worker-nodes role; however, I still get the same error.

The role file looks like this:

#IAM role and policy for worker node EC2 instances
resource "aws_iam_role" "eks_worker_node_role" {
  name = "${var.cluster_name}-workernode-role"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "eks_AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_CloudWatchAgentServerPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_AmazonEC2RoleforSSM" {
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_AmazonS3ReadOnlyAccess" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_AmazonSSMManagedInstanceCore" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_instance_profile" "worker_node_instances_profile" {
  name = "${var.cluster_name}-instance-profile"
  role = aws_iam_role.eks_worker_node_role.name
}


data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

locals {
  worker_node_role_arn = aws_iam_role.eks_worker_node_role.arn
}

data "aws_iam_policy_document" "karpenter_controller_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"
    condition {
      test     = "StringEquals"
      variable = "${replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")}:sub"
      values   = ["system:serviceaccount:karpenter:karpenter"]
    }  
    condition {
      test     = "StringEquals"
      variable = "${replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")}:aud"
      values   = ["sts.amazonaws.com"]
    }
    principals {
      identifiers = ["arn:aws:iam::${var.tenantaccount}:oidc-provider/${replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")}"]
      type        = "Federated"
    }
  }
}

resource "aws_iam_role" "karpenter_controller" {
  assume_role_policy = data.aws_iam_policy_document.karpenter_controller_assume_role_policy.json
  name               = "${var.cluster_name}-KarpenterRole"
}

resource "aws_iam_policy" "karpenter_controller" {
  name        = "${var.cluster_name}-KarpenterPolicy"
  description = "Karpenter controller policy for autoscaling"
  policy = <<EOF
{
    "Statement": [
        {
            "Action": [
                "ssm:GetParameter",
                "ec2:DescribeImages",
                "ec2:RunInstances",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeInstanceTypeOfferings",
                "ec2:DescribeAvailabilityZones",
                "ec2:DeleteLaunchTemplate",
                "ec2:CreateTags",
                "ec2:CreateLaunchTemplate",
                "ec2:CreateFleet",
                "ec2:DescribeSpotPriceHistory",
                "pricing:GetProducts",
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
                
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "Karpenter"
        },
        {
            "Action": "ec2:TerminateInstances",
            "Condition": {
                "StringLike": {
                    "ec2:ResourceTag/karpenter.sh/nodepool": "*"
                }
            },
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "ConditionalEC2Termination"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "${local.worker_node_role_arn}",
            "Sid": "PassNodeIAMRole"
        },
        {
            "Effect": "Allow",
            "Action": "eks:DescribeCluster",
            "Resource": "arn:aws:eks:${var.region}:${var.tenantaccount}:cluster/${var.cluster_name}",
            "Sid": "EKSClusterEndpointLookup"
        },
        {
            "Sid": "AllowScopedInstanceProfileCreationActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
            "iam:CreateInstanceProfile"
            ],
            "Condition": {
            "StringEquals": {
                "aws:RequestTag/kubernetes.io/cluster/${var.cluster_name}": "owned",
                "aws:RequestTag/topology.kubernetes.io/region": "${var.region}"
            },
            "StringLike": {
                "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
            }
            }
        },
        {
            "Sid": "AllowScopedInstanceProfileTagActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
            "iam:TagInstanceProfile"
            ],
            "Condition": {
            "StringEquals": {
                "aws:ResourceTag/kubernetes.io/cluster/${var.cluster_name}": "owned",
                "aws:ResourceTag/topology.kubernetes.io/region": "${var.region}",
                "aws:RequestTag/kubernetes.io/cluster/${var.cluster_name}": "owned",
                "aws:RequestTag/topology.kubernetes.io/region": "${var.region}"
            },
            "StringLike": {
                "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*",
                "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
            }
            }
        },
        {
            "Sid": "AllowScopedInstanceProfileActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
            "iam:AddRoleToInstanceProfile",
            "iam:RemoveRoleFromInstanceProfile",
            "iam:DeleteInstanceProfile",
            "iam:CreateServiceLinkedRole",
            "iam:ListRoles",
            "iam:ListInstanceProfiles"
            ],
            "Condition": {
            "StringEquals": {
                "aws:ResourceTag/kubernetes.io/cluster/${var.cluster_name}": "owned",
                "aws:ResourceTag/topology.kubernetes.io/region": "${var.region}"
            },
            "StringLike": {
                "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*"
            }
            }
        },
        {
            "Sid": "AllowInstanceProfileReadActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": "iam:GetInstanceProfile"
        }
    ],
    "Version": "2012-10-17"
  }
  EOF
}

resource "aws_iam_role_policy_attachment" "karpenter_controller_attach" {
  depends_on = [aws_iam_policy.karpenter_controller, aws_iam_role.karpenter_controller]
  role       = aws_iam_role.karpenter_controller.name
  policy_arn = aws_iam_policy.karpenter_controller.arn
}



A few questions.

I currently use the following sequence to provision my node group:

  1. The EKS control plane is provisioned by Terraform.
  2. Karpenter pods, running on a Fargate profile, provision the node group referencing the AMI.
  3. The node group is provisioned, with CoreDNS as the first pod to be placed.

A grant has been created in the key owner account, and cross-account-kms in the consumer account.

I was curious to know whether any other piece of infrastructure needs to be integrated so that Karpenter can create node groups from AMIs whose snapshots are encrypted with a CMK residing in an external account.
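
For readers comparing setups, the key-owner side of such an arrangement might look roughly like the sketch below. It is only an illustration, assuming the key policy follows the usual cross-account pattern of allowing the consumer account to use the key and to create grants on it. The variable `consumer_account_id` and the data source name are hypothetical and do not come from the thread, and the statements would still need to be merged into the key's existing policy, which is not shown here.

```hcl
# Sketch only: applied in the KEY OWNER account, not in the cluster account.
# "consumer_account_id" is a hypothetical variable name for the account that
# runs the EKS cluster and Karpenter.
variable "consumer_account_id" {
  description = "Account ID of the account consuming the CMK (assumed name)"
  type        = string
}

data "aws_iam_policy_document" "cmk_cross_account" {
  # Allow principals in the consumer account to use the key for EBS encryption.
  statement {
    sid    = "AllowExternalAccountUseOfTheKey"
    effect = "Allow"
    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:DescribeKey",
    ]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.consumer_account_id}:root"]
    }
  }

  # Allow the consumer account to create grants on the key, so that a grant
  # can later be issued to whichever role launches the instances.
  statement {
    sid       = "AllowExternalAccountToCreateGrants"
    effect    = "Allow"
    actions   = ["kms:CreateGrant"]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.consumer_account_id}:root"]
    }
  }
}
```

The key policy only delegates permission to the consumer account. A principal in that account still needs its own IAM permission on the key ARN before it can use the key or create a grant.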

@devops-inthe-east

There is one thing I missed: I was able to get it working by creating a 'new' grant, in the account that consumes the KMS key, for the autoscaling role.

Support engineers indicate that this should be a one-time activity, needed whenever a new autoscaling role gets created.

@devops-inthe-east

devops-inthe-east commented Dec 3, 2024

The updated iam.tf will contain the following permission statement, which adds the kms:CreateGrant action.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCreationOfGrantForTheKMSKeyinExternalAccount444455556666",
      "Effect": "Allow",
      "Action": "kms:CreateGrant",
      "Resource": "arn:aws:kms:us-west-2:444455556666:key/1a2b3c4d-5e6f-1a2b-3c4d-5e6f1a2b3c4d"
    }
  ]
}
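
Since the rest of the configuration is Terraform, the JSON policy above could be expressed roughly as follows. This is a sketch under assumptions: the thread does not say which role receives the policy, so `grant_creator_role_name` is a hypothetical variable, and the resource names and the policy name suffix are made up for illustration. The key ARN is the placeholder ARN from the JSON above.

```hcl
# Sketch only: the same kms:CreateGrant statement as a Terraform policy document.
data "aws_iam_policy_document" "cmk_create_grant" {
  statement {
    sid     = "AllowCreationOfGrantForTheKMSKeyinExternalAccount444455556666"
    effect  = "Allow"
    actions = ["kms:CreateGrant"]
    resources = [
      "arn:aws:kms:us-west-2:444455556666:key/1a2b3c4d-5e6f-1a2b-3c4d-5e6f1a2b3c4d",
    ]
  }
}

resource "aws_iam_policy" "cmk_create_grant" {
  name   = "${var.cluster_name}-CmkCreateGrant" # hypothetical name
  policy = data.aws_iam_policy_document.cmk_create_grant.json
}

# Hypothetical: the role that will actually call kms:CreateGrant.
# The thread does not name it, so it is left as an input here.
variable "grant_creator_role_name" {
  description = "Name of the IAM role allowed to create the grant (assumed)"
  type        = string
}

resource "aws_iam_role_policy_attachment" "cmk_create_grant" {
  role       = var.grant_creator_role_name
  policy_arn = aws_iam_policy.cmk_create_grant.arn
}
```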

The AWS CLI command:


aws kms create-grant \
  --region us-west-2 \
  --key-id arn:aws:kms:us-west-2:444455556666:key/1a2b3c4d-5e6f-1a2b-3c4d-5e6f1a2b3c4d \
  --grantee-principal arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling \
  --operations "Encrypt" "Decrypt" "ReEncryptFrom" "ReEncryptTo" "GenerateDataKey" "GenerateDataKeyWithoutPlaintext" "DescribeKey" "CreateGrant" 
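
If the grant should live alongside the rest of the Terraform code instead of being created by hand, the CLI call above could be sketched with the AWS provider's `aws_kms_grant` resource. This is a hedged sketch, not something tested in the thread: the resource label and grant name are hypothetical, the ARNs are the placeholder values from the command above, and it assumes the credentials Terraform runs with are allowed to call kms:CreateGrant on the external key.

```hcl
# Sketch only: Terraform equivalent of the `aws kms create-grant` call.
resource "aws_kms_grant" "autoscaling_service_role" {
  name = "autoscaling-cmk-grant" # hypothetical grant name

  # A key in another account has to be referenced by its full ARN.
  key_id = "arn:aws:kms:us-west-2:444455556666:key/1a2b3c4d-5e6f-1a2b-3c4d-5e6f1a2b3c4d"

  # The principal that receives the permissions (the grantee), which is not
  # the same as the principal that creates the grant.
  grantee_principal = "arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"

  operations = [
    "Encrypt",
    "Decrypt",
    "ReEncryptFrom",
    "ReEncryptTo",
    "GenerateDataKey",
    "GenerateDataKeyWithoutPlaintext",
    "DescribeKey",
    "CreateGrant",
  ]
}
```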



However, what bugs me is that the CloudTrail logs show my full-admin role as the 'username' making the 'CreateGrant' API call, when it should be the autoscaling group role triggering it.


github-actions bot commented Jan 3, 2025

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Jan 3, 2025

Issue closed due to inactivity.

@github-actions github-actions bot closed this as not planned Jan 13, 2025