Discussion: improve user journeys when provisioning new hosts #60

Open
nicdumz opened this issue Jan 3, 2025 · 2 comments

@nicdumz
Contributor

nicdumz commented Jan 3, 2025

Heya,

First off, thanks for this module, it helped me reason about secret management.

One user journey that seemed tricky to me was provisioning a new host. After reading some material on this, I wondered how the documentation could be improved.

One thing in particular that wasn't clear to me is that age.rekey.hostPubkey can actually be omitted, which results in a "valid" configuration, although rebuilds will emit a helpful warning:

$ nixos-rebuild build --flake .
evaluation warning: You have not yet specified rekey.hostPubkey for your host nixos.
                    All secrets for this host will be rekeyed with a dummy key, resulting in an activation failure.

                    This is intentional so you can initially deploy your system to read the actual pubkey.
                    Once you have the pubkey, set rekey.hostPubkey to the content or a file containing the pubkey.

Perhaps the first suggestion would be to make this possibility more obvious in docs and/or examples.
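As a minimal sketch of what such a docs example could look like (the identity path and the commented-out key value are placeholders, and the file is assumed to sit at the flake root), the first deploy simply leaves the host key unset:

```nix
{ config, ... }:
{
  age.rekey = {
    # Placeholder path; point this at your own master identity.
    masterIdentities = [ ./your-master-identity.pub ];
    storageMode = "local";
    # Assumes this file lives at the flake root.
    localStorageDir = ./. + "/secrets/rekeyed/${config.networking.hostName}";

    # First deploy: leave hostPubkey unset. The build succeeds with the
    # warning shown above, and secrets fail to decrypt at activation.
    #
    # Second deploy: once the host exists, paste its real public key here
    # (placeholder value below), rekey, and rebuild.
    # hostPubkey = "ssh-ed25519 AAAA...";
  };
}
```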

I ended up organizing my agenix-rekey usage in the following way:

agenix-rekey module:

{
  config,
  lib,
  self,
  inputs,
  ...
}:
let
  ageMasterIdentities = [
    # ... omitted
  ];
  # Relative to flake directory.
  publicKeyRelPath = "identities/host/${config.networking.hostName}.pub";
  publicKeyAbsPath = self.outPath + "/" + publicKeyRelPath;
in
{
  imports = [
    inputs.agenix.nixosModules.default
    inputs.agenix-rekey.nixosModules.default
  ];

  options = {
    me.foundPublicKey = lib.mkOption {
      type = lib.types.bool;
      default = builtins.pathExists publicKeyAbsPath;
    };
  };

  config.warnings = [
    (lib.mkIf (!config.me.foundPublicKey) ''
      [me]: no public key configured for target system.

      This means that some features (e.g. Tailscale, ... ) are not enabled.

      After initial target provisioning, fetch the target ssh identity:

        ssh-keyscan -qt ssh-ed25519 $target | cut -d' ' -f2,3 > ./${publicKeyRelPath}

      And rebuild NixOS.
    '')
  ];

  config.age.rekey =
    {
      masterIdentities = ageMasterIdentities;
      localStorageDir = self.outPath + "/secrets/rekeyed/${config.networking.hostName}";
      storageMode = "local";
    }
    # Only set the pubkey if we find it.
    // lib.optionalAttrs config.me.foundPublicKey {
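      # publicKeyAbsPath is the let-binding above; the option is spelled
      # hostPubkey, as in the evaluation warning.
      hostPubkey = builtins.readFile publicKeyAbsPath;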
    };
}
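An alternative way to express the same guard, sketched here under the assumption that `self` is passed to modules as in the snippet above, is to use `lib.mkIf` on the single option instead of merging attribute sets with `//`. The option name uses the `hostPubkey` spelling from the evaluation warning, and `masterIdentities` is omitted for brevity:

```nix
{ config, lib, self, ... }:
let
  # Same layout as above: one public key file per host, checked into the flake.
  publicKeyAbsPath = self.outPath + "/identities/host/${config.networking.hostName}.pub";
  foundPublicKey = builtins.pathExists publicKeyAbsPath;
in
{
  age.rekey = {
    storageMode = "local";
    localStorageDir = self.outPath + "/secrets/rekeyed/${config.networking.hostName}";
    # When the condition is false, mkIf drops this definition entirely, so
    # the option falls back to its default and the warning is still emitted.
    hostPubkey = lib.mkIf foundPublicKey (builtins.readFile publicKeyAbsPath);
  };
}
```

This keeps the conditional next to the one option it affects, which may read more clearly once more `age.rekey` settings are added.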

Usage:

  age = {
    # (mkIf is needed otherwise agenix-rekey will complain about missing rekeyed secrets)
    secrets = lib.mkIf config.me.foundPublicKey {
      tailscaleAuthKey.rekeyFile = ./secrets/tailscale.age;
      # ...
    };
  };

  services.tailscale = lib.optionalAttrs config.me.foundPublicKey {
    authKeyFile = config.age.secrets.tailscaleAuthKey.path;
    # ...
  };

And the intended flow would be a two-phase install:

  1. install new remote target
  2. fetch identity of remote target, check it into the flake
  3. rebuild / switch nixos on target in a second phase

If this matches the intended usage, do we think it would make sense to document this type of provisioning flow?

In my experience, this has been the trickiest point to figure out, and I would want to make it easier to understand for others.

Thanks again!

@nicdumz
Contributor Author

nicdumz commented Jan 3, 2025

Perhaps one aspect I don't find completely intuitive:

error: host nixos: Rekeyed secret for age.secrets.tailscaleAuthKey not found, please run `agenix rekey -a` again and make sure to add the results to git.

This message, to me, doesn't particularly make sense when I don't set a hostPubkey. Would it make sense to skip this check when the key is the dummy key?

Otherwise we would force a workflow such as

  1. edit config to add new host
  2. rekey (with dummy)
  3. install remote target
  4. fetch identity of remote target, check it into the flake, rekey
  5. rebuild / switch nixos on target in a second phase

I don't think (?) that (2) rekey is really necessary here.

@oddlama
Owner

oddlama commented Jan 3, 2025

One thing in particular that wasn't clear to me is that age.rekey.hostPubkey can actually be omitted, which results in a "valid" configuration [...]
Perhaps the first suggestion would be to make this possibility more obvious in docs and/or examples.

Sure, that'd be a good thing to mention.

  config.warnings = [
    (lib.mkIf (!config.me.foundPublicKey) ''
      [me]: no public key configured for target system.

      This means that some features (e.g. Tailscale, ... ) are not enabled.

      After initial target provisioning, fetch the target ssh identity:

        ssh-keyscan -qt ssh-ed25519 $target | cut -d' ' -f2,3 > ./${publicKeyRelPath}

      And rebuild NixOS.
    '')
  ];

A warning like this one would also be great to have generally, in case the host pubkey is unset.

  services.tailscale = lib.optionalAttrs config.me.foundPublicKey {
    authKeyFile = config.age.secrets.tailscaleAuthKey.path;
    # ...
  };

Depending on the service, it will not always be easily possible to guard this. Sometimes the secrets are just required to get a valid configuration and therefore you'd have to disable the whole service. Since you cannot get around deploying twice, I'd personally rather have a dummy secret and just let the service complain if it doesn't find the decrypted secret. This saves me the work of having to guard everything (which would be a lot).
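A sketch of the unguarded variant described here, reusing the secret and option names from the usage snippet earlier in the thread. Note that, per the error message quoted in this issue, the current behavior still requires running `agenix rekey -a` once (with the dummy key) before this evaluates:

```nix
{ config, ... }:
{
  # Declared unconditionally: no foundPublicKey guard anywhere.
  age.secrets.tailscaleAuthKey.rekeyFile = ./secrets/tailscale.age;

  services.tailscale = {
    enable = true;
    # On the first deploy the secret cannot be decrypted, so the service is
    # left to complain at runtime; the second deploy fixes it.
    authKeyFile = config.age.secrets.tailscaleAuthKey.path;
  };
}
```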

error: host nixos: Rekeyed secret for age.secrets.tailscaleAuthKey not found, please run `agenix rekey -a` again and make sure to add the results to git.

This message, to me, doesn't particularly make sense when I don't set a hostPubkey. Would it make sense to skip this check when the key is the dummy key?

I agree, this is a bit misleading. There are kind-of two concepts of dummy secrets at the moment. First, there is a true dummy secret, i.e. one that is decryptable by the host at runtime but contains a dummy/bogus value. You can create those while rekeying if you e.g. don't plug your yubikey in. The second one is a dummy secret in the sense of a placeholder, so something that cannot be decrypted by the host, e.g. because its pubkey wasn't known at build time. This is what we need for initial deploys.

I don't think (?) that (2) rekey is really necessary here.

Yes that's right. We should improve that. Maybe a better way to handle this would be to detect if the dummy pubkey is in use, and in that case just use a placeholder file for all secrets instead of the rekeyed file. We can add a small hint to that file in case someone looks at it at runtime.

If I'm not mistaken, this would just require:

  • Making this a no-op if the dummy key is in use, maybe just echo a small hint/warning
  • Prepending a new if else case here to check whether the dummy pubkey is in use, and return something similar to dummySecret but with another content.

Would you like to change the documentation or even implement this? Otherwise I can have a look at it in the coming days.
