Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update AWS Route53 documentation to better explain the authentication options #1555

Merged

Conversation

wallrj
Copy link
Member

@wallrj wallrj commented Sep 11, 2024

Preview: https://deploy-preview-1555--cert-manager.netlify.app/docs/configuration/acme/dns01/route53/

The aim of this PR is to help users understand that fundamentally cert-manager needs to get an access key somehow,
so that it can authenticate to Route53.

There are legacy mechanisms, where it can use an IAM User and a long-term access key from a mounted file, or from an environment variable in the cert-manager Pod, or by loading it from a Secret.

There are best-practice mechanisms, where it can use an IAM Role and get a temporary access key from STS, either by explicitly sending a Kubernetes ServiceAccount token to STS (AssumeRoleWithWebIdentity).
Or by fetching a temporary access token from a local server such as IMDS on an EC2 node or from a Pod Identity agent in a DaemonSet Pod on a Kubernetes Node.

There are ambient mechanisms, where credentials are global to the cert-manager controller.
And there are non-ambient mechanisms, where the credentials can be unique to each Issuer or ClusterIssuer.

I have also added documentation about how cert-manager chooses an AWS region and how that region is used.

Fixes: #56
Fixes: #753
Fixes: cert-manager/cert-manager#4738

@cert-manager-prow cert-manager-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 11, 2024
Copy link

netlify bot commented Sep 11, 2024

Deploy Preview for cert-manager ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 4d8ddd2
🔍 Latest deploy log https://app.netlify.com/sites/cert-manager/deploys/6707978b0916e20008ab20dc
😎 Deploy Preview https://deploy-preview-1555--cert-manager.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@wallrj wallrj force-pushed the aws-route53-authentication-summary branch from d8de949 to 251cd75 Compare September 11, 2024 12:07
@cert-manager-prow cert-manager-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 11, 2024
@wallrj wallrj force-pushed the aws-route53-authentication-summary branch 6 times, most recently from 57d106f to 6668546 Compare September 12, 2024 08:20
@wallrj wallrj force-pushed the aws-route53-authentication-summary branch 3 times, most recently from 10bf1cb to 2b85a62 Compare October 8, 2024 14:27
@wallrj wallrj force-pushed the aws-route53-authentication-summary branch from 2b85a62 to c989f4c Compare October 8, 2024 16:10
@cert-manager-prow cert-manager-prow bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 8, 2024
@wallrj wallrj marked this pull request as ready for review October 8, 2024 16:22
@wallrj wallrj changed the title WIP: Update AWS Route53 documentation to better explain the authentication options Update AWS Route53 documentation to better explain the authentication options Oct 8, 2024
@cert-manager-prow cert-manager-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 8, 2024
@wallrj wallrj requested review from SgtCoDFish and removed request for maelvls October 8, 2024 19:32
where cert-manager loads credentials from files (`~/.aws/config` and `~/.aws/credentials`) which are mounted into the cert-manager controller Pod.

The advantage of ambient credentials is that they are easier to set up, well
documented, and AWS provides ways to automate the configuration.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed those words. What I meant was that AWS provides helpers which inject the necessary environment variables into the cert-manager Pod, rather than the application team having to configure everything in the issuer.
But that's already covered by "easier to set up".

Signed-off-by: Richard Wall <[email protected]>
Comment on lines -15 to +16
You will configure cert-manager to use the [Let's Encrypt DNS-01 challenge protocol](https://letsencrypt.org/docs/challenge-types/#dns-01-challenge) with AWS Route53 DNS,
using IAM Roles for Service Accounts (IRSA) to authenticate to AWS.
You will configure cert-manager to use the [Let's Encrypt DNS-01 challenge protocol](https://letsencrypt.org/docs/challenge-types/#dns-01-challenge) with AWS Route53 DNS.
You will authenticate to Route53 using a [dedicated Kubernetes ServiceAccount token](../../configuration/acme/dns01/route53.md#referencing-your-own-serviceaccount-within-in-an-issuer-or-clusterissuer).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time I wrote the tutorial, I thought that IRSA referred to any auth mechanism that used a Kubernetes ServiceAccount token.
I now think that IRSA more of a marketing product name which comprises annotated Kubernetes ServiceAccount + mutating webhook (to inject a projected service account volume) + Kubernetes ServiceAccount token (signed JWT) loaded from /var/run/aws/token.

This tutorial does not use the webhook or projected service account token volume.

In future it might be worth updating the tutorial to use the much simpler "Pod Identity" mechanism,
because that will probably be more familiar to AWS EKS users.

Signed-off-by: Richard Wall <[email protected]>
@wallrj wallrj force-pushed the aws-route53-authentication-summary branch from e519b51 to ae9a9e0 Compare October 9, 2024 07:42
@@ -138,251 +490,39 @@ And the following trust relationship (Add AWS `Service`s as needed):
}
```

## Creating an Issuer (or `ClusterIssuer`)
## Region
Copy link
Member Author

@wallrj wallrj Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #56 @stephen-dexda and @TBBle showed that the region field should be optional. We made it optional in cert-manager/cert-manager#7287 and updated the API docs there.
Here I have added the same background information to the the website documentation.

@stephen-dexda and @TBBle: Please give feedback if you have time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since I looked at this stuff (so EKS Pod Identities is new to me (and very neat!)), so I'm not in a position to validate what's written here beyond the linked documentation.

With that qualification, the Region section looks fine to me.

> 📖 Read about [actions supported by Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/APIReference/API_Operations_Amazon_Route_53.html),
> in the [Amazon Route 53 API Reference](https://docs.aws.amazon.com/Route53/latest/APIReference/Welcome.html).
>
> 📖 Learn how [`eksctl` can automatically create the cert-manager IAM policy](https://eksctl.io/usage/iam-policies/#cert-manager-policy), if you use EKS.

## Credentials
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #753 @AlverezYari wrote:

Route53 - AWS IAM Account Setup is confusing

I've tried to summarize all the options in this revision of the documentation.

@AlverezYari Please give feedback if you have time:

Signed-off-by: Richard Wall <[email protected]>
Signed-off-by: Richard Wall <[email protected]>
Copy link
Member

@SgtCoDFish SgtCoDFish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spotted a couple of minor things, but this looks like a great improvement.

I've not run any code for this (don't have easy access to dev env), just a visual review.

Comment on lines 68 to 70
You have two options:
1. (Legacy) Use an [IAM User and a long-term access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html).
2. (Best Practice) Use an [IAM Role with temporary security credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#bp-workloads-use-roles).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: With a big caveat that I've not really done much AWSing in the last few years, I probably wouldn't say that using a User is "legacy" because it's required for a specific use case - if you're using Route53 but you're running outside of AWS.

If all my compute is hosted on, say, DigitalOcean, I think an IAM User with long lived credentials would be a fine way to go if my DNS was in AWS.

Maybe if we re-order the best practice IAM role one to the top (i.e. number 1 in this list), and then change legacy to "for use when running code outside of AWS" or something similar. Or we could mention the "outside of AWS" use case below? What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I've removed "legacy" and explained that it is appropriate to use that mechanism when cert-manager is outside AWS.

content/docs/configuration/acme/dns01/route53.md Outdated Show resolved Hide resolved
content/docs/configuration/acme/dns01/route53.md Outdated Show resolved Hide resolved

cert-manager supports multiple ways to get the access key and these can be categorized as either "ambient" or "non-ambient".

### Ambient Credentials
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cert-manager/cert-manager#4738 @twe-syde requested better "Documentation about ambient-credentials" and that:

The documentation should mentioned at least these two flags and why they are needed for IRSA.
--issuer-ambient-credentials=true|false
--clusterissuer-ambient-credentials=true|false

I've tried to address that here.

@twe-syde Please give feedback if you have time.

Comment on lines 231 to 248
3. **(optional) Update file system permissions**

You may also need to modify the cert-manager `Deployment` with the correct file system permissions, so the `ServiceAccount` token can be read.

```yaml
spec:
template:
spec:
securityContext:
fsGroup: 1001
```

The cert-manager Helm chart provides a variable for modifying cert-manager's `Deployment` like so:

```yaml
securityContext:
fsGroup: 1001
```
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have labelled this as optional, because I did not need this when I tested IRSA on an EKS cluster.
The reason may be found in cert-manager/cert-manager#2147 (comment) where @TBBle wrote:

Mentioned on the k8s 1.19 for EKS release notes, I believe the fsGroup setting should not be needed for k8s 1.19 onwards, due to the implementation of kubernetes/enhancements#1598.

I think it may be specifically required when using Fargate, based on @iainlane 's feedback in #697.
I will leave it to Fargate users to help us clarify this part of the documentation in future PRs.

Copy link
Contributor

@TBBle TBBle Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back at my own comment, it was based on a direct comment in the release notes but I don't recall testing it myself.

You're no longer required to provide a security context for non-root containers that need to access the web identity token file for use with IAM roles for service accounts. For more information, see IAM roles for service accounts andproposal for file permission handling in projected service account volume on GitHub.

Looking at the relevant KEP (subsequently subsumed into this KEP), the effect indeed should have been that if you don't have a runAsUser set or an fsGroup set, the token permissions were 0644, as of k8s 1.19. (A quick tour through the PR and the code in k8s repo HEAD shows this to still be the case, nothing jumped out as having changed since that PR.)

So I'm not sure what was up in #697, which was on k8s 1.21. Maybe fsGroup doesn't/didn't work properly on Fargate? I haven't found any direct references to this, except in the context of Fargate-hosted Pods with EFS mounts also requiring runAsUser and runAsGroup, to match the EFS configuration; I'm pretty sure that's an EFS-related issue though, and shouldn't have any bearing here. It might have been a bug in the Fargate implementation/packaging of kubelet?

Anyway, your testing says it's no longer needed, and that behaviour matches what passes for documentation in k8s, so I'm happy to see this demoted to "optional".

It might make sense to change the advice to use runAsUser: 1000 since the Docker container runs as USER 1000 by default. It would be relatively harmless to add that the Helm chart defaults too, since that's effectively just documenting something already defined in the container image. We already use runAsNonRoot, but sadly that does not interact with the projected volume mounting system as at that point, the actual in-pod user ID has not yet been extracted/verified. There's no harm in having both runAsNonRoot and runAsUser, and fsGroup has a (minimal in this case) runtime cost.

Copy link
Member Author

@wallrj wallrj Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note to encourage users to tell us once and for all whether this step is obsolete.

@SgtCoDFish I see that you also asked whether this step is necessary:

@wallrj wallrj requested a review from SgtCoDFish October 9, 2024 10:25
@wallrj wallrj force-pushed the aws-route53-authentication-summary branch 2 times, most recently from 8b96699 to dbdddfb Compare October 10, 2024 08:54
Along with a call for feedback from EKS Fargate users

Signed-off-by: Richard Wall <[email protected]>
@wallrj wallrj force-pushed the aws-route53-authentication-summary branch from dbdddfb to 4d8ddd2 Compare October 10, 2024 08:59
Copy link
Member

@SgtCoDFish SgtCoDFish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

I've not run any code or tested this, but the text looks like a clear improvement so it makes sense to merge. Thanks for all the work on this @wallrj and thanks everyone who commented and got involved 🚀

@cert-manager-prow cert-manager-prow bot added the lgtm Indicates that a PR is ready to be merged. label Oct 10, 2024
@cert-manager-prow
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SgtCoDFish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cert-manager-prow cert-manager-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2024
@cert-manager-prow cert-manager-prow bot merged commit 41b93f6 into cert-manager:master Oct 10, 2024
7 checks passed
@wallrj wallrj deleted the aws-route53-authentication-summary branch October 10, 2024 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
4 participants