NATS is a simple, secure and performant communications system for digital systems, services and devices.
NATS servers can be deployed easily using the nats-server executable:
Usage: nats-server [options]
Server Options:
-a, --addr, --net <host> Bind to host address (default: 0.0.0.0)
-p, --port <port> Use port for clients (default: 4222)
-n, --name
--server_name <server_name> Server name (default: auto)
-P, --pid <file> File to store PID
-m, --http_port <port> Use port for http monitoring
-ms,--https_port <port> Use port for https monitoring
-c, --config <file> Configuration file
-t Test configuration and exit
[Many options are omitted...]
Common Options:
-h, --help Show this message
-v, --version Show version
--help_tls TLS help
Note that configuration can be provided as a configuration file using the --config argument.
To offer secure communications, administrators must deploy NATS with TLS encryption.
TLS is optional and not enabled by default, but administrators MUST ensure that any production deployment uses TLS encryption.
TLS configuration is specified in the tls section of a configuration file, e.g.:
tls {
cert_file: "./certs/server-cert.pem"
key_file: "./certs/server-key.pem"
}
- When running in TLS mode, NATS still expects clients to connect using the raw TCP protocol, and then upgrade the TCP connection to a TLS connection.
  In other words, it's not possible to serve NATS behind a reverse proxy which terminates the TLS encryption.
  This is stated in the documentation: https://docs.nats.io/running-a-nats-service/configuration/securing_nats/tls#tls-terminating-reverse-proxies
Any nats-server deployed in production must have access to, and be configured to use, a valid TLS certificate.
This certificate must also be renewed before it expires.
Most of the time, certificates are issued for a period of 90 days, so any administrator planning to run NATS for more than 3 months will face the problem of certificate expiration and certificate renewal.
In order to use TLS encryption, it is necessary to:
- configure NATS to use existing TLS certificates
- reload NATS server on certificate renewal (we prefer reloading over restarting to avoid downtime)
Doing so is not easy to achieve, because in order to reload NATS server without downtime, a Unix signal (SIGHUP) must be sent to the nats-server process.
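As a rough illustration of the reload step, the sketch below reads a process ID from a file (for instance one written through the --pid option) and sends SIGHUP to that process. The function names and the way the path is passed are assumptions made for this example, and the code targets Unix-like systems only.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// parsePID converts the content of a PID file into a process ID.
func parsePID(content string) (int, error) {
	pid, err := strconv.Atoi(strings.TrimSpace(content))
	if err != nil {
		return 0, fmt.Errorf("invalid PID %q: %w", content, err)
	}
	if pid <= 0 {
		return 0, fmt.Errorf("invalid PID %d", pid)
	}
	return pid, nil
}

// reloadFromPIDFile sends SIGHUP to the process whose PID is stored in path.
func reloadFromPIDFile(path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	pid, err := parsePID(string(data))
	if err != nil {
		return err
	}
	return syscall.Kill(pid, syscall.SIGHUP)
}

func main() {
	// The PID file path is taken from the first argument (hypothetical usage).
	if len(os.Args) < 2 {
		fmt.Println("usage: reload <pid-file>")
		return
	}
	if err := reloadFromPIDFile(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "reload failed:", err)
	}
}
```

A certificate renewal hook could call such a helper once the new files are on disk.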
- Official NATS Helm charts (K8S) rely on cert-manager to automate certificate generation and renewal, but this solution is not adequate for non-Kubernetes deployment scenarios.
- When deploying NATS server as a systemd service, it's possible to automate certificate generation and renewal using the lego CLI, and execute a renew hook to reload NATS server when new certificates are received. This solution is not adequate for Docker deployment scenarios.
- When deploying NATS server using Docker, it's possible to rely on a solution similar to the systemd one. Use volumes to mount certificates into the container, and when new certificates are received, instead of sending a SIGHUP signal to NATS server directly, restart the Docker container, or exec into the container in order to send a SIGHUP signal.
There is no solution which "fits" all deployment scenarios.
Extend nats-server to include TLS certificates generation and renewal logic as part of the process.
- Create a custom nats-server binary using the library interface to integrate TLS certificate generation.
- Rely on the Lego project to generate TLS certificates. letsgo is an example of how to use Lego within a Go project.
- Rely on the Chrono project to run tasks periodically.
A proof-of-concept implementation is available in ./letsgo-nats.go.
Start-up order:
- Parse Let's encrypt configuration from environment variables
- Attempt to read existing certificates (according to config)
- If certificate exists:
  - Check the certificate expiration date
- If certificate is not valid or certificate will expire soon, generate certificates
- Parse command line arguments (--help / --version are parsed AFTER certificate generation)
- Parse NATS server configuration
- Initialize NATS server
- Start NATS server
- Wait until server is ready for connection
- Schedule certificates expiration check every 24 hours (first task is executed immediately)
- Wait for server shutdown
- ACME-related options can only be configured through environment variables. Only NATS-related command line arguments are supported.
Environment Variable | Optional | Default | Description |
---|---|---|---|
DNS_AUTH_TOKEN_VAULT | ✅ | | Name or URI of Azure Key Vault holding auth token |
DNS_AUTH_TOKEN_SECRET | ✅ | "do-auth-token" | Name of secret stored in Azure Key Vault |
DNS_AUTH_TOKEN_FILE | ✅ | | Path to file holding auth token |
DNS_AUTH_TOKEN | ✅ | | Auth token value |
💥 At least one of DNS_AUTH_TOKEN_VAULT, DNS_AUTH_TOKEN_FILE, or DNS_AUTH_TOKEN must be set to a non-null value.
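The "at least one of" rule can be illustrated with a small resolver. This is only a sketch: the precedence (direct value first, then file) is an assumption, the Key Vault lookup is intentionally left out, and resolveToken is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// resolveToken returns the DNS auth token from the environment.
// Assumed precedence: DNS_AUTH_TOKEN, then DNS_AUTH_TOKEN_FILE.
// DNS_AUTH_TOKEN_VAULT is recognised but not implemented in this sketch.
func resolveToken(getenv func(string) string) (string, error) {
	if value := getenv("DNS_AUTH_TOKEN"); value != "" {
		return value, nil
	}
	if path := getenv("DNS_AUTH_TOKEN_FILE"); path != "" {
		data, err := os.ReadFile(path)
		if err != nil {
			return "", fmt.Errorf("reading DNS_AUTH_TOKEN_FILE: %w", err)
		}
		return strings.TrimSpace(string(data)), nil
	}
	if getenv("DNS_AUTH_TOKEN_VAULT") != "" {
		return "", errors.New("key vault lookup is omitted from this sketch")
	}
	return "", errors.New("one of DNS_AUTH_TOKEN_VAULT, DNS_AUTH_TOKEN_FILE or DNS_AUTH_TOKEN must be set")
}

func main() {
	token, err := resolveToken(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	// Never print the token itself.
	fmt.Println("token loaded, length:", len(token))
}
```

Passing the lookup function as a parameter keeps the resolver testable without touching the real environment.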
Environment Variable | Optional | Default | Description |
---|---|---|---|
DOMAINS | 💥 | | Comma-separated list of domain names |
FILENAME | ✅ | | Name under which certificate files will be stored. Defaults to the first domain found within the DOMAINS environment variable, after replacing * with _ . This variable is not used when requesting the certificate, only when writing the certificate to file. |
OUTPUT_DIRECTORY | ✅ | | Directory under which certificate files will be stored. Defaults to the current working directory. If OUTPUT_DIRECTORY is configured and does not exist yet, it will be created with 511 permission. |
The DOMAINS environment variable must be set to a non-null value.
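The FILENAME default described in the table can be sketched as follows. Trimming whitespace around each domain and ignoring empty entries are assumptions of this example, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDomains splits a comma-separated DOMAINS value and drops empty entries.
func parseDomains(value string) []string {
	var domains []string
	for _, d := range strings.Split(value, ",") {
		if d = strings.TrimSpace(d); d != "" {
			domains = append(domains, d)
		}
	}
	return domains
}

// defaultFilename derives the certificate file name from the first domain,
// replacing "*" with "_" so that wildcard names are safe on disk.
func defaultFilename(domains []string) string {
	if len(domains) == 0 {
		return ""
	}
	return strings.ReplaceAll(domains[0], "*", "_")
}

func main() {
	domains := parseDomains("*.example.com,example.com")
	fmt.Println(defaultFilename(domains)) // prints "_.example.com"
}
```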
Environment Variable | Optional | Default | Description |
---|---|---|---|
ACCOUNT_EMAIL | 💥 | | Email of the Let's Encrypt account for which the certificate is issued |
ACCOUNT_KEY_FILE | ✅ | "./account.key" | Path to account key file. If the account key does not exist, it is generated and saved to this path. |
LE_TOS_AGREED | ✅ | true | Agree to Let's Encrypt terms of service |
The ACCOUNT_EMAIL environment variable must be set to a non-null value.
Environment Variable | Optional | Default | Description |
---|---|---|---|
CA_DIR | ✅ | "STAGING" | Name of CA directory environment or URL to CA directory. Allowed values are PRODUCTION, STAGING, TEST, or any http URL. |
LE_CRT_KEY_TYPE | ✅ | "RSA2048" | Certificate key type. Both Let's Encrypt staging and production environments use the RSA2048 key type. |
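Resolving CA_DIR to a directory URL might look like the sketch below. The two URLs are the public Let's Encrypt ACME v2 directory endpoints; the URL behind TEST is not documented here, so this example refuses to guess it. Case-sensitive matching and accepting both http and https URLs are assumptions, as is the function name.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const (
	leProduction = "https://acme-v02.api.letsencrypt.org/directory"
	leStaging    = "https://acme-staging-v02.api.letsencrypt.org/directory"
)

// resolveCADir maps a CA_DIR value to an ACME directory URL.
func resolveCADir(value string) (string, error) {
	switch value {
	case "", "STAGING":
		return leStaging, nil // documented default is "STAGING"
	case "PRODUCTION":
		return leProduction, nil
	case "TEST":
		return "", errors.New("TEST directory URL is not mapped in this sketch")
	}
	if strings.HasPrefix(value, "http://") || strings.HasPrefix(value, "https://") {
		return value, nil
	}
	return "", fmt.Errorf("unsupported CA_DIR value %q", value)
}

func main() {
	url, err := resolveCADir("PRODUCTION")
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println(url)
}
```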
Environment Variable | Optional | Default | Description |
---|---|---|---|
DNS_RESOLVERS | ✅ | | A comma-separated list of DNS resolvers used to verify the challenge, in host:port format |
DNS_TIMEOUT | ✅ | | Timeout in seconds for DNS challenge resolution |
DISABLE_CP | ✅ | true | Disable the complete propagation check, i.e., only a single resolver must verify the DNS challenge for it to succeed. When the check is enabled, all resolvers must verify the challenge. |
NATS TLS configuration blocks must be coherent with DOMAINS, FILENAME and OUTPUT_DIRECTORY when specified.
Aside from that, the letsgo-nats binary behaves just like NATS.
- If certificate renewal fails, it is not retried. Instead, the certificate will be requested on the next schedule, i.e., 24 hours later. If certificates are requested 21 days before they expire, it means that there can be up to 20 attempts before the certificate expires (Low priority).
- Let's Encrypt configuration is parsed from the environment only (Low priority).
- Only the DigitalOcean DNS provider is supported at the moment. This was done by design to reduce the size of the executable (Medium priority).
- If a certificate issued by a different CA than the target CA (possibly untrusted) exists and is valid, no certificate is generated and no warning/error is raised (Medium priority).
- NATS options are parsed AFTER TLS certificates are generated. It does not seem easy to bypass this limitation without writing much code (Medium priority).
- It's possible to misconfigure the application because configuration is redundant in some places (HIGH priority):
  - Certificates are generated according to Let's Encrypt config
  - Certificates are loaded by NATS according to NATS config
  - NATS fails to start if there is a configuration mismatch
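One way to surface such a mismatch early is to compare the paths NATS is configured to load with the paths the ACME side is expected to write. The sketch below is purely illustrative: the ".crt" and ".key" suffixes are an assumption (the PoC's actual file names are not stated here), both function names are hypothetical, and relative and absolute paths are compared as given.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// expectedPaths builds the paths where the certificate and key are assumed
// to be written. The ".crt" and ".key" suffixes are an assumption.
func expectedPaths(outputDir, filename string) (certFile, keyFile string) {
	base := filepath.Join(outputDir, filename)
	return base + ".crt", base + ".key"
}

// checkTLSPaths reports a mismatch between the NATS tls settings
// (cert_file, key_file) and the ACME output location.
func checkTLSPaths(natsCert, natsKey, outputDir, filename string) error {
	wantCert, wantKey := expectedPaths(outputDir, filename)
	if filepath.Clean(natsCert) != wantCert {
		return fmt.Errorf("cert_file is %q but certificates are written to %q", natsCert, wantCert)
	}
	if filepath.Clean(natsKey) != wantKey {
		return fmt.Errorf("key_file is %q but keys are written to %q", natsKey, wantKey)
	}
	return nil
}

func main() {
	// Values mirror the tls block shown earlier; they do not match the
	// assumed output location, so an error is reported.
	err := checkTLSPaths("./certs/server-cert.pem", "./certs/server-key.pem", "./certs", "example.com")
	fmt.Println("mismatch:", err)
}
```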
Even though this POC requires some configuration, and it's possible to have a configuration mismatch, it reduces a lot of complexity when deploying NATS servers on mixed environments.
For example, to deploy NATS server as an Azure Container Instance, we can allow the container instance to access a Key Vault and put the DNS provider secret into that Key Vault. When deploying, we only need to:
- Specify DNS_AUTH_TOKEN_VAULT and DNS_AUTH_TOKEN_SECRET properly.
- Mount a volume with fileshare backend holding NATS configuration OR use commands to specify options
- Mount a volume with fileshare backend to store certificates (security concerns to be discussed)
It's important to store certificates within a volume to avoid requesting new certificates on each startup. Volume for configuration is optional since configuration can be provided as command line arguments.
- Draft a specification for configuration and implement it
- Embed a file server to optionally host web applications