
Use --profile instead of AWS_PROFILE for kubeconfig #1484

Merged: 7 commits into master on Nov 14, 2024

Conversation

@blampe (Contributor) commented Nov 13, 2024

This PR changes our kubeconfig logic to use a --profile arg instead of an AWS_PROFILE environment variable so it will always use the expected profile. It also parallelizes the relevant tests and simplifies workflows slightly.

As a user, if I generate a kubeconfig for a particular profile I would expect that configuration to always use the profile I specified. However, because we rely on AWS_PROFILE it is possible for our generated kubeconfig to be inadvertently overridden by the presence of AWS_ACCESS_KEY_ID.

Credentials from environment variables have precedence over credentials from the shared credentials and AWS CLI config file. Credentials specified in the shared credentials file have precedence over credentials in the AWS CLI config file. If AWS_PROFILE environment variable is set and the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are set, then the credentials provided by AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY will override the credentials located in the profile provided by AWS_PROFILE.

https://docs.aws.amazon.com/cli/latest/topic/config-vars.html#id1
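The precedence described above can be modeled with a short sketch (a simplified illustration of the documented resolution order, not the actual SDK resolver; the `profiles` dict and helper name are invented for this example):

```python
# Simplified model of the AWS CLI credential precedence quoted above.
# Static env credentials (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) win
# over whatever profile AWS_PROFILE points at -- which is exactly why a
# kubeconfig that only sets AWS_PROFILE can silently use the wrong identity.

def resolve_credentials(env: dict, profiles: dict) -> dict:
    # 1. Environment credentials take priority, even when AWS_PROFILE is set.
    if "AWS_ACCESS_KEY_ID" in env and "AWS_SECRET_ACCESS_KEY" in env:
        return {"source": "env", "access_key_id": env["AWS_ACCESS_KEY_ID"]}
    # 2. Otherwise fall back to the named (or default) profile.
    profile = env.get("AWS_PROFILE", "default")
    return {"source": f"profile:{profile}",
            "access_key_id": profiles[profile]["aws_access_key_id"]}

profiles = {
    "default": {"aws_access_key_id": "AKIA_DEFAULT"},
    "alt": {"aws_access_key_id": "AKIA_ALT"},
}

# AWS_PROFILE alone: the alt profile is used, as expected.
assert resolve_credentials({"AWS_PROFILE": "alt"}, profiles)["source"] == "profile:alt"

# Ambient keys present: AWS_PROFILE is silently ignored.
creds = resolve_credentials(
    {"AWS_PROFILE": "alt",
     "AWS_ACCESS_KEY_ID": "AKIA_AMBIENT",
     "AWS_SECRET_ACCESS_KEY": "secret"},
    profiles)
assert creds["source"] == "env"
```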

If I got this wrong and the overriding behavior is a "feature not a bug" please let me know!

I'll note that I discovered this as part of the prep work for pulumi/ci-mgmt#1142. In particular, our tests currently do a few things to exercise profile switching behavior:

  1. During CI setup, we set some fixed access keys for the default profile. (This is unnecessary.)
  2. During CI setup, we set some fixed access keys for an alt profile. This is the profile we expect to use in TestAccAwsProfile* tests.
  3. During TestAccAwsProfile* tests we unset AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID, and AWS_SESSION_TOKEN for our process.

Importantly, (3) is currently implemented such that (a) it prevents parallelization, and (b) subsequent queries to the k8s API server also lack ambient credentials.

After I refactored (3) to allow parallelization, the tests started failing. Eventually I realized this was because I was only unsetting credentials for the pulumi subprocess, so our test's own k8s client still had ambient credentials taking priority over the expected profile.
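Scoping the unset to the child process, rather than the whole test runner, looks roughly like this in Python (a sketch only; the real tests do this in Go via `ProgramTestOptions.Env`, shown further down):

```python
import os
import subprocess
import sys

# Simulate ambient credentials in the parent (test-runner) process.
os.environ["AWS_ACCESS_KEY_ID"] = "AKIA_PARENT_ONLY"

# Build a child environment with the AWS credential variables removed,
# without mutating os.environ for the current process any further.
UNSET = ("AWS_PROFILE", "AWS_ACCESS_KEY_ID",
         "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")
child_env = {k: v for k, v in os.environ.items() if k not in UNSET}

# The child process sees no AWS_ACCESS_KEY_ID, while the parent still does.
out = subprocess.run(
    [sys.executable, "-c", "import os; print('AWS_ACCESS_KEY_ID' in os.environ)"],
    env=child_env, capture_output=True, text=True)
print(out.stdout.strip())  # -> False
```

Because only `env=` for the one `subprocess.run` call is filtered, other tests running in parallel in the same process keep their credentials intact.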


Does the PR have any schema changes?

Looking good! No breaking changes found.
No new resources/functions.

Comment on lines +222 to +225
// Use --profile instead of AWS_PROFILE because the latter can be
// overridden by ambient credentials:
// https://docs.aws.amazon.com/cli/latest/topic/config-vars.html#id1
args = [...args, "--profile", opts.profileName];
@blampe (Contributor, Author)

The only potentially user-facing change.
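For context, the generated kubeconfig's user entry changes along these lines (all field values here are illustrative, not taken from the actual generated output):

```yaml
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: aws
      args: [eks, get-token, --cluster-name, my-cluster, --profile, my-profile]
      # Previously the profile was passed via the environment instead, which
      # ambient AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY could override:
      # env:
      # - name: AWS_PROFILE
      #   value: my-profile
```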

@flostadler (Contributor)

This change sounds good to me!

Tbh, I'm a bit surprised that even aws eks update-kubeconfig uses the env variable for specifying the profile.

I think it would make sense to extend the aws-profile test to actually deploy something to the cluster. That way we also cover the change with upgrade tests to ensure this doesn't cause any cascading replacements

@blampe (Contributor, Author) commented Nov 13, 2024

Yeah, based on aws/aws-cli#7794 and aws/aws-cli#4337 I'm even more convinced --profile is the correct behavior here.

I'll look into the upgrade test to make sure this doesn't trigger replacements. I don't think it will since the k8s provider primarily looks at the cluster/context (not user, which this impacts), but if it does we should be able to leverage the new clusterIdentifier config to pin the provider to the cluster.

@blampe (Contributor, Author)

I think it would make sense to extend the aws-profile test to actually deploy something to the cluster. That way we also cover the change with upgrade tests to ensure this doesn't cause any cascading replacements

@flostadler I re-enabled this upgrade test and recorded a baseline from master with a ConfigMap deployed into the cluster. It passes locally for me and I confirmed manually that no replacements were triggered.

@flostadler (Contributor)

Nice, that's awesome!

@@ -1,7 +1,6 @@
env:
ALT_AWS_ACCESS_KEY_ID: ${{ secrets.ALT_AWS_ACCESS_KEY_ID }}
ALT_AWS_SECRET_ACCESS_KEY: ${{ secrets.ALT_AWS_SECRET_ACCESS_KEY }}
ALT_AWS_PROFILE: ${{ secrets.ALT_AWS_PROFILE }}
@blampe (Contributor, Author)

This and "Configure AWS CLI" are now handled directly by the tests.
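A rough sketch of what per-test profile setup can look like (the real helper is `setProfileCredentials` in the Go tests; this Python version is illustrative and just follows the standard AWS shared-credentials file format):

```python
import configparser
import os
import tempfile

def set_profile_credentials(credentials_path: str, profile: str,
                            access_key_id: str, secret_access_key: str) -> None:
    """Write (or update) a named profile in an AWS shared-credentials file."""
    config = configparser.ConfigParser()
    config.read(credentials_path)  # no-op if the file doesn't exist yet
    config[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    with open(credentials_path, "w") as f:
        config.write(f)

# Demo against a throwaway file rather than ~/.aws/credentials.
path = os.path.join(tempfile.mkdtemp(), "credentials")
set_profile_credentials(path, "aws-profile-node", "AKIA_EXAMPLE", "secret")

check = configparser.ConfigParser()
check.read(path)
print(check["aws-profile-node"]["aws_access_key_id"])  # -> AKIA_EXAMPLE
```

Moving this into the tests means each test owns its profile fixture instead of relying on a shared CI setup step.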

Comment on lines 12 to 14
if not os.getenv("AWS_REGION"):
raise Exception("AWS_REGION must be set")

@blampe (Contributor, Author)

Mirroring the ts test.

@@ -25,6 +28,7 @@
# Create the cluster using the AWS provider and credential opts.
cluster = eks.Cluster(project_name,
provider_credential_opts=kubeconfig_opts,
# TODO(#1475): bootstrap_self_managed_addons=false, # To speed up the test.
@blampe (Contributor, Author)

Just a suggestion -- I noticed the coredns addon adds ~900s to the test, and it would be really great to disable!

@flostadler (Contributor) commented Nov 13, 2024

The culprit here isn't AWS adding the coredns addon; they install it as a self-managed addon (just a Helm chart), which doesn't wait for coredns to be healthy.

We now use the managed coredns addon by default, and it indeed takes a very long time to detect that the deployment has enough replicas (~900s, as you mentioned). If you want to disable it for this test, set corednsAddonOptions.enabled to false. I don't think we need it for this test because other tests already cover it.
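In the Python SDK that opt-out would look roughly like this (a sketch, not runnable outside a Pulumi program; the snake_case argument and args-class names are my assumed mapping of `corednsAddonOptions`):

```python
import pulumi_eks as eks

cluster = eks.Cluster(
    "aws-profile-py",
    # Skip the managed coredns addon, which otherwise waits a long time
    # (~900s) for the deployment to report enough ready replicas.
    coredns_addon_options=eks.CoreDnsAddonOptionsArgs(enabled=False),
)
```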

@blampe (Contributor, Author)

This was a huge time saver, thank you!

pulumi-eks>=3.0.0,<4.0.0
@blampe (Contributor, Author)

Makes local debugging easier.

Comment on lines 303 to 319
profile := "aws-profile-node"
setProfileCredentials(t, profile)

test := getJSBaseOptions(t).
With(integration.ProgramTestOptions{
Dir: path.Join(getCwd(t), "aws-profile"),
OrderedConfig: []integration.ConfigValue{
{Key: "pulumi:disable-default-providers[0]", Value: "aws", Path: true},
},
RetryFailedSteps: false,
Env: []string{
"ALT_AWS_PROFILE=" + profile,
"AWS_PROFILE=", // unset
"AWS_SECRET_ACCESS_KEY=", // unset
"AWS_ACCESS_KEY_ID=", // unset
"AWS_SESSION_TOKEN=", // unset
},
@blampe (Contributor, Author)

This unsets the credentials for the pulumi subprocess instead of globally.

@flostadler (Contributor)

LGTM, thanks for this improvement!

@rquitales (Member) left a comment

LGTM.

For context, we previously wrote the profile name into the generated kubeconfig. This had the unintended consequence of making the kubeconfig non-portable whenever another user had a different AWS profile name. The logic was switched to using ambient env vars to determine the right AWS profile so the kubeconfig stays portable. However, env vars are volatile, as you've surfaced in this PR, so the --profile flag is a more robust solution!

@blampe blampe merged commit 9801c10 into master Nov 14, 2024
36 checks passed
@blampe blampe deleted the blampe/profile branch November 14, 2024 00:01
@pulumi-bot (Contributor)

This PR has been shipped in release v3.1.0.
