Secrets Vault
Secrets are stored in LHDI's HashiCorp Vault, which resides in the VA network. Secrets include credentials, tokens, and certificates for all deployment environments. Scripts and Helm configurations have been created to formalize and reproduce deployment of secrets to all LHDI environments.
Secrets for all LHDI deployment environments are stored in a single vault at https://ldx-mapi.lighthouse.va.gov/vault, which requires VA network access.
Following the security principle of least privilege, only members of the VRO Admins GitHub Team can log in using their GitHub credentials. Log in to the web UI using these instructions, with `vro-admins` (which corresponds to the VRO Admins GitHub Team) as the "Role".
(Context: A separate VRO Admins GitHub Team was created to limit access to secrets. By default, LHDI allows all members of the VA-ABD-RRD GitHub Team to access a vault store, which is contrary to the principle of least privilege. There is a vault store for `va-abd-rrd`, but it is unused.)
In the Vault, secrets are organized under the `deploy/` folder. Subfolders for each environment are used as follows:
- `default`: provides default secrets for all environments; used for the LHDI `dev` environment
- `qa`, `sandbox`, `prod-test`, `prod`: used for the respective LHDI environment; overrides any default secrets
  - Only differences from the default secrets need to be present. As a result, there are few secrets in the `qa` environment and there is no `dev` subfolder.
Within each environment subfolder are other subfolders, which will be referred to as "groups". Each group contains key-value pairs. Typically the key is an environment variable name that is mapped verbatim for use by containers. Occasionally, Helm configurations map the secret to a different environment variable name as expected by a different container -- for an example, search for `DB_CLIENTUSER_NAME`.
In summary, the full Vault path to a group is `$TEAM_NAME/deploy/$ENV/$GROUP`. There are no subfolders deeper than the group level.
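As a concrete sketch of the path layout (the `TEAM_NAME` value below is a hypothetical placeholder -- substitute your team's actual Vault store name):

```shell
# Build the Vault path to a group: $TEAM_NAME/deploy/$ENV/$GROUP
# TEAM_NAME is a hypothetical placeholder, not necessarily the real store name.
TEAM_NAME="va-abd-rrd"
ENV="dev"
GROUP="db"
SECRET_PATH="$TEAM_NAME/deploy/$ENV/$GROUP"
echo "$SECRET_PATH"   # va-abd-rrd/deploy/dev/db

# With VA network access and a valid token, the group's key-value pairs
# could then be read with the Vault CLI, e.g.:
#   vault kv get "$SECRET_PATH"
```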
The groups are as follows:
- `db`: secrets for VRO's database; maps to the Kubernetes secret named `vro-db`
- `mq`: secrets for the message queue; maps to the Kubernetes secret named `vro-mq`
- `redis`: secrets for the Redis cache; maps to the Kubernetes secret named `vro-redis`
- `VRO_SECRETS_API`, `VRO_SECRETS_LH`, `VRO_SECRETS_MAS`, ...: secrets used by VRO components; these `VRO_SECRETS_*` groups map to Kubernetes secrets named `vro-secrets-...`
  - These `VRO_SECRETS_*` groups are treated differently than the above groups to allow new secrets to be added without having to update Helm configurations, thereby minimizing maintenance. Most new secrets will be added in these groups.
  - Unlike the other groups, each `VRO_SECRETS_*` group is passed as a single aggregate environment variable to the VRO containers that use it, as specified in Helm configurations. For example, the `VRO_SECRETS_API` group maps to the `VRO_SECRETS_API` environment variable for the `app` container. The aggregate environment variable contains multiple export commands like `export APIAUTH_KEY01=...`. Upon startup, the container runs `set-env-secrets.src` to execute the export commands in the aggregate environment variable, resulting in exported environment variables (such as `APIAUTH_KEY01`) being available to the application.
  - To handle multiline strings and special characters, secret values can be base64-encoded. These secrets use a key name that ends with `_BASE64` so that the `set-k8s-secrets.sh` script will decode the value properly and set an environment variable without the `_BASE64` suffix.
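The aggregate-variable mechanism can be sketched as follows (the variable contents and the `MY_CERT` key are made up for illustration, and `set-env-secrets.src` itself may do more than a plain `eval`):

```shell
# An aggregate VRO_SECRETS_* variable holds a series of export commands:
VRO_SECRETS_API='export APIAUTH_KEY01=first-value
export APIAUTH_KEY02=second-value'

# Executing those commands (roughly what set-env-secrets.src does) exposes
# each key as its own environment variable:
eval "$VRO_SECRETS_API"
echo "$APIAUTH_KEY01"   # first-value

# A *_BASE64 key carries an encoded value (useful for multiline strings);
# decoding it yields the variable without the _BASE64 suffix:
MY_CERT_BASE64=$(printf 'line1\nline2\n' | base64)
MY_CERT=$(printf '%s' "$MY_CERT_BASE64" | base64 -d)
echo "$MY_CERT"
```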
While key-value pairs are organized in separate subfolders, the key names (which are typically used as environment variable names) should be unique within each LHDI environment to avoid collisions when they are mapped to environment variables for Docker containers. For example, if there were a `MY_SECRET` key name in both the `redis` and `VRO_SECRETS_API` group subfolders AND a container used both groups, then the container would have only one environment variable rather than the desired two. Note that this collision can also occur between `MY_SECRET` and `MY_SECRET_BASE64` key names because the `_BASE64` suffix is elided from the container's environment variable name.
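A quick sketch of the collision, using the hypothetical `MY_SECRET` key: when two groups define the same key, whichever mapping is applied last silently wins:

```shell
# Suppose both the redis group and the VRO_SECRETS_API group define MY_SECRET
# (a hypothetical key). The container ends up with a single variable --
# the value applied last overwrites the other:
export MY_SECRET="value-from-redis"
export MY_SECRET="value-from-VRO_SECRETS_API"
echo "$MY_SECRET"   # value-from-VRO_SECRETS_API
```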
Ask a VRO Admin to add, remove, or update the secret in Vault. Securely provide the secret for each LHDI environment -- minimally, one secret value for `dev` and another for `prod`.
- If the secret is added to an existing `VRO_SECRETS_*` group, no Helm configuration changes are needed.
- If the secret is added to another group, Helm configurations should be updated to use the new secret.
Run the Deploy secrets from Vault workflow for each LHDI environment to update the Kubernetes secrets. The Docker containers will not use the updated secrets until they are redeployed. (Note: this action is currently broken.)
There are circumstances where Kubernetes logs `Error: couldn't find key VRO_SECRETS_... in Secret va-abd-rrd-.../vro-secrets` -- see the Slack thread for screenshots. This occurred because a single aggregate `vro-secrets` secret was used for all `VRO_SECRETS_*` groups, which introduced issues with the propagation of secret updates because containers still referenced the old aggregate secret:
- Symptom: Sometimes redeploying the pod works and sometimes it fails with this error.
- Current hypothesis for this inconsistent error: If other running pods reference the `vro-secrets` secret, then old versions of it may still be available and used by new pods. This article prompted the hypothesis.
- Workaround: Restart all old pods that reference the `vro-secrets` secret, then start everything back up. If a restart isn't sufficient, a complete shutdown of all pods may be necessary to remove all references to the old secret.
- Additionally, marking the secret immutable may be contributing to the use of old secrets: because immutable secrets aren't expected to change, any changes (including a destroy and re-create) are not propagated. As a result, the `vro-secrets` secret is marked mutable in `set-k8s-secrets.sh`.
Now the `vro-secrets-*` secrets are individual secrets, where each individual secret is used by one or a very small number of containers. This reduces the number of containers that need to be shut down simultaneously to release all references to an old secret. This improvement should make this problem less likely.
To set a non-secret environment variable for a container in an LHDI environment, add it to the relevant Helm chart(s) under `helm/`. If the variable's value is different for each environment, also add it to the `helm/values-for-*.yaml` files.
With that said, before adding an environment variable, please read the next section.
It is preferred to use a configuration file scoped to only the application/microservice/container.
An environment variable is needed when any of the following are true:
- it is a secret (username, password, token, private certificate, ...) -- use HashiCorp Vault (as described on this page)
- it is used by multiple containers -- set it in `helm/values*.yaml` files and reference it in Helm charts (under `helm/`)
- it needs to be manually changed in deployment environments -- let's discuss
We should minimize the number of unnecessary Helm configurations, which will reduce DevOps maintenance and overhead, and reduce the number of factors that can cause VRO deployments to fail.
A Vault token is needed to access the Vault. The automation (a self-hosted GitHub Runner) expects the Vault token to be a Kubernetes secret named `vro-vault` in the LHDI `dev` environment. The token expires monthly. Run `scripts/set-secret-vault-token.sh "$VAULT_TOKEN"` to set the token, where `$VAULT_TOKEN` equals the string copied from the Vault web UI (click "Copy token" in the upper-right corner drop-down menu).
Kubernetes access tokens for each cluster (i.e., non-prod and prod) are needed to deploy the secrets to the LHDI environments. The access tokens expire in 90 days. Run `scripts/set-secret-kube-config.sh` to set the `devops-kubeconfig` secret.
A GHCR secret in Kubernetes named `devops-ghcr` needs to be set for LHDI to pull images. Run `scripts/set-secret-ghcr.sh "$ENV" "$PAT"` for each LHDI environment, where `$ENV` is `dev`, `qa`, etc. The `$PAT` is a GitHub personal access token -- generate one using the `abd-vro-machine` account. This only needs to be run once (or every time the PAT expires).
It is a centralized, secure location in the VA's network designed to hold secrets. From the Vault, secrets can be quickly and consistently redeployed to various LHDI environments in case they need to be reset or rotated.
Our GitHub Action workflow starts a self-hosted runner within our LHDI `dev` environment to pull Vault secrets and set Kubernetes secrets, all within the VA's network. This is more secure than using HashiCorp's GitHub Action, which would pull Vault secrets outside the VA network and into the GitHub Action workflow environment.
The runner is a container in the `vro-set-secrets-...` Kubernetes pod and can deploy secrets to any LHDI environment when initiated by the Deploy secrets from Vault GitHub Action workflow. The pod is deployed to the `dev` LHDI environment (because that environment doesn't require SecRel-signed images) and can deploy secrets to other environments.
There have been unexplained occurrences where Kubernetes secrets changed and caused problems. Making them immutable aims to reduce (but does not entirely prevent) this problem.