This repo is a practical tutorial for setting up NixOps deployments.
It will walk you through various examples, from simpler to more complicated, explaining concepts and generalising on the way.
I will try to keep it up-to-date with new versions of Nix, nixpkgs and NixOps.
The examples assume basic familiarity with the Nix language, and NixOS configuration options, but you can also try to read through the tutorial and look up everything that you don't understand on the fly.
NixOps was originally designed to store its state in your home directory and use your globally configured version of nixpkgs. To make things very reproducible, we will change some of its defaults. In particular we will:
- pin the version of `nixpkgs` using a `git submodule`
  - This means you must have cloned this repo with `git clone --recursive`, or run `git submodule update --init --recursive` after a normal clone.
  - It also lets us easily improve nixpkgs by making changes in this submodule (and then upstream them). This is very common when working with nixpkgs.
- pin the versions of `nix` and `nixops` to those that are in the submodule
- place all NixOps state in the current directory (in a file called `localstate.nixops`)
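If you are unsure whether the submodule was initialised, a small check like the following may help. It is only a sketch: the function name `submodules_initialized` is made up for this example, and it relies on `git submodule status` marking uninitialised submodules with a leading `-`.

```shell
# Reads `git submodule status` output on stdin.
# Succeeds only if no line starts with "-", which is how git marks
# a submodule that has not been initialised yet.
submodules_initialized() {
  ! grep -q '^-'
}

# Possible usage (run from the top-level directory of the clone):
#   git submodule status --recursive | submodules_initialized \
#     || git submodule update --init --recursive
```

The function only inspects text, so it does not change anything in your checkout by itself.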
All of the above are done with the small script `./mynix`; read its code to check out what it does.
You should run `./mynix` in front of all nix-related commands, e.g. use `./mynix nixops` instead of `nixops` or `./mynix nix-build` instead of `nix-build`.
I recommend that you do the same for any production use of NixOps.
The files in this repo relevant to this pinning are (in case you want to copy them into your projects):
- `mynix`
- `pinned-tools.nix`
- `nix-channel/nixpkgs`
- An Amazon AWS account
- AWS credentials set up in `~/.aws/credentials` (see here); should look like this:

      [nixops-example-user]
      aws_access_key_id = AAAAAAAAAAAAAAAAAAAA
      aws_secret_access_key = ssssssssssssssssssssssssssssssssssssssss

  The account must have EC2 permissions.
- The tutorial currently requires running the steps on Linux.
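Before deploying, you may want to verify that the credentials file actually contains the profile. The following is a minimal sketch; the helper name `has_aws_profile` is hypothetical, and it only checks that the section header exists, not that the keys are valid or have EC2 permissions.

```shell
# Succeeds if the credentials file contains a "[profile]" section header.
# $1: profile name, $2: credentials file (defaults to ~/.aws/credentials)
has_aws_profile() {
  profile="$1"
  file="${2:-$HOME/.aws/credentials}"
  # -F: fixed string, -x: match the whole line, -q: no output
  [ -r "$file" ] && grep -Fxq "[$profile]" "$file"
}

# Possible usage:
#   has_aws_profile nixops-example-user || echo "profile missing" >&2
```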
Read through `example-nginx-deployment.nix`, and check using the NixOps manual and the NixOS options search page what each of the options does.
./mynix nixops create example-nginx-deployment.nix -d example-nginx-deployment
./mynix nixops deploy -d example-nginx-deployment
Then run
./mynix nixops info -d example-nginx-deployment
copy the shown IP, and curl it from your machine using:
curl IP
You should get `404 Not Found` in the output, but also `nginx`, indicating that your nginx is running.
If it does not work or hangs, then your VPC/security group/firewall settings in AWS are probably off.
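The check above can also be scripted. This is a rough sketch, not part of the repo: `served_by_nginx` is a made-up helper that only looks for the word `nginx` in whatever response text it is given, and `--max-time` is used so that a blocked port fails quickly instead of hanging.

```shell
# Reads an HTTP response on stdin and succeeds if it mentions nginx
# (case-insensitive), as the default 404 page of nginx does.
served_by_nginx() {
  grep -qi 'nginx'
}

# Possible usage, with IP replaced by the address from `nixops info`:
#   curl -s --max-time 10 "http://IP/" | served_by_nginx && echo "nginx is up"
```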
You can SSH into the machine you have declared there using:
./mynix nixops ssh -d example-nginx-deployment machine1
In the SSH session, run the `htop` monitoring tool.
You can quit it with `q`, and disconnect the SSH session with `Ctrl+D`.
Now remove the entry `pkgs.htop` from `environment.systemPackages`, and run
./mynix nixops deploy -d example-nginx-deployment
again (let's abbreviate this step "deploy").
If you SSH into the machine again, you will see that `htop` is no longer available.
This is a big difference from many other configuration management tools, where adding a line to install a package will install it, but deleting that line will not uninstall it.
The property that after a deploy the machine will be exactly in the configured state (containing no more and no less) is called "congruent" system management.
Now let's give our nginx some content.
Change the `services.nginx` attrset from
services.nginx = {
  enable = true;
};
into (again, look up each option on the NixOS options search page)
services.nginx = {
  enable = true;
  virtualHosts."someDefaultHost" = {
    default = true; # makes this the default vhost if no other one matches
    locations."/" = {
      root = pkgs.writeTextDir "index.html" "Hello world!";
    };
  };
};
and deploy. You will see output like:
% ./mynix nixops deploy -d example-nginx-deployment
building all machine configurations...
these derivations will be built:
/nix/store/g4y1hxlcj5vzrar9a436h3qm6h7hlngs-nginx.conf.drv
/nix/store/9mylbbv0k2y812vaj257wg2nzarcwkqf-unit-script-nginx-pre-start.drv
/nix/store/71165073r4y7pbas7dwdi1963lbbrqgs-unit-nginx.service.drv
/nix/store/ajyk1ircw2f6k6cv0fqh5j4drjwjr6nv-system-units.drv
/nix/store/skm2d9yazfgrkcwxqlsc9sf4zvai773a-etc.drv
/nix/store/nd9hra6l0cv0lqqkhwky6qqx9shyrlhi-nixos-system-machine1-18.09.git.cd1b649.drv
/nix/store/x9kgn5wrhjg53sm87xr5h3id36dp6dsf-nixops-machines.drv
building '/nix/store/g4y1hxlcj5vzrar9a436h3qm6h7hlngs-nginx.conf.drv'...
building '/nix/store/9mylbbv0k2y812vaj257wg2nzarcwkqf-unit-script-nginx-pre-start.drv'...
building '/nix/store/71165073r4y7pbas7dwdi1963lbbrqgs-unit-nginx.service.drv'...
building '/nix/store/ajyk1ircw2f6k6cv0fqh5j4drjwjr6nv-system-units.drv'...
building '/nix/store/skm2d9yazfgrkcwxqlsc9sf4zvai773a-etc.drv'...
building '/nix/store/nd9hra6l0cv0lqqkhwky6qqx9shyrlhi-nixos-system-machine1-18.09.git.cd1b649.drv'...
building '/nix/store/x9kgn5wrhjg53sm87xr5h3id36dp6dsf-nixops-machines.drv'...
machine1...> copying closure...
machine1...> copying 6 paths...
machine1...> copying path '/nix/store/y2h2idchc86qmdzzvp2wvxww9bzqkhwb-nginx.conf' to 'ssh://[email protected]'...
machine1...> copying path '/nix/store/5cmkg7arw5cazafdgynkl6y5s96v1vrf-unit-script-nginx-pre-start' to 'ssh://[email protected]'...
machine1...> copying path '/nix/store/0san3qp2xl9dz894ailylfypx44i809p-unit-nginx.service' to 'ssh://[email protected]'...
machine1...> copying path '/nix/store/29ibinhgl3a77d0fv4ffvhqlffa69dx9-system-units' to 'ssh://[email protected]'...
machine1...> copying path '/nix/store/4a81l4nsjry32gc43y4jxylwhc4hqdij-etc' to 'ssh://[email protected]'...
machine1...> copying path '/nix/store/jv8z2mv6j2kmsdqr19lm8zyjsfjzv20r-nixos-system-machine1-18.09.git.cd1b649' to 'ssh://[email protected]'...
example-nginx-deployment> closures copied successfully
machine1...> updating GRUB 2 menu...
machine1...> activating the configuration...
machine1...> setting up /etc...
machine1...> reloading user units for root...
machine1...> setting up tmpfiles
machine1...> restarting the following units: nginx.service
machine1...> activation finished successfully
example-nginx-deployment> deployment finished successfully
What's happening here?
- First `nixops` calls `nix` to build our machine declarations into the files involved.
  - The `.drv` files are descriptions of what is to be built (you can `cat` them), and they are built into corresponding output files or dirs, like the `...-nginx.conf` (`cat` it!).
  - The top-level one for the machine we have declared is the `...-nixos-system-machine1...` one. `ls -l` it to see that it's the full root file system for that machine!
  - The `...-nixops-machines.drv` describes our entire network of machines (we only have 1 for now).
- Then `nixops` calls `nix-copy-closure`, copying each file involved and the recursive dependencies to each machine (but only those that aren't already there).
- Then `nixops` runs the NixOS `switch-to-configuration` script on each machine, which activates the new machine configuration.
Notice how it figured out that only the changed nginx service needed to be restarted (`restarting the following units: nginx.service`), without us having to tell it that explicitly!
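If you save the deploy output to a file, you can pull out which units were restarted. The snippet below is a sketch under assumptions: the helper name `restarted_units` is invented here, and it assumes that multiple units appear comma-separated on that log line (the output above only shows the single-unit case).

```shell
# Reads nixops deploy output on stdin and prints one restarted unit per line.
restarted_units() {
  # Keep only what follows the marker text, then split a
  # comma-separated list into separate lines.
  sed -n 's/.*restarting the following units: //p' | tr -d ' ' | tr ',' '\n'
}

# Possible usage:
#   ./mynix nixops deploy -d example-nginx-deployment 2>&1 | tee deploy.log
#   restarted_units < deploy.log
```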
Now you should be able to `curl IP` again and see the output `Hello world!`.
Don't forget to destroy the created machines with:
./mynix nixops destroy -d example-nginx-deployment
You can pass the `--confirm` option if you don't want it to ask interactive questions.
If you also want to delete all local information about past versions of the deployment, you can run:
./mynix nixops delete -d example-nginx-deployment
We've deployed a simple web server -- boring! Let's do something that's traditionally difficult.
If you have upgraded other Linux distributions before, you may remember it as an unpleasant process.
For example, Ubuntu's `do-release-upgrade` often involves long waits, interspersed with occasional questions that you need to answer, such as how to merge your own modified config files with newer versions provided by the OS upstream.
That means you cannot just step away and let an upgrade complete by itself.
Further, upgrades often fail, and many distributions provide only assisted upgrades, not downgrades. For example, there is no `do-release-downgrade` on Ubuntu.
With NixOps (and NixOS in general), these issues are addressed on a fundamental level.
- Because machines are configured declaratively, there are no interactive questions to be asked.
- Because NixOS configurations are immutable and stay on disk until you garbage-collect them, you can easily roll back to any previous configuration.
  - One caveat applies: Stateful software like `consul`, which writes its own mutable data into `/var` and auto-upgrades its schema when a new version is launched, may not be able to read a newer schema version with an older version of the software. You need to read the changelogs of the software you use to determine this.
Let's try to upgrade our running server from the version of `nixpkgs` (and thus, NixOS) that is pinned in this git repository's `nix-channel/nixpkgs` submodule to a newer version.
This will provide us with a newer kernel, newer nginx, newer everything.
Prerequisites:
- Deploy your server as in Tutorial 1, but do not shut it down at the end.
You can also SSH into the server and run `systemctl status nginx.service` (you can press `q` to quit the pager and get back to the shell if you aren't already).
It should show you a line like:
├─2868 nginx: master process /nix/store/j8kzb88g64bk2baxmz94r074kv84yl32-nginx-1.14.1/bin/nginx -c /nix/store/9g1affc46wvyahihk1d4gq52j8vqagjw-nginx.conf -p /var/spool/nginx
Because Nix's store paths include the versions of packages in the directory name, you can easily determine that you're running `nginx-1.14.1` here.
Also run `uname -a` to see that your Linux kernel version is e.g. `4.14.111`.
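To read the package name and version out of a store path without squinting, something like the following sketch can be used. The function name `store_path_name` is made up for this example; it assumes the usual layout of `/nix/store/`, a 32-character hash, a dash, and then the name.

```shell
# Prints the "name-version" part of a Nix store path, dropping the
# /nix/store/<32-char hash>- prefix and anything after the next slash.
# Input that does not look like a store path is printed unchanged.
store_path_name() {
  printf '%s\n' "$1" | sed -E 's|^/nix/store/[0-9a-z]{32}-([^/]+).*$|\1|'
}

# Possible usage:
#   store_path_name /nix/store/j8kzb88g64bk2baxmz94r074kv84yl32-nginx-1.14.1/bin/nginx
```

For the path shown above this prints `nginx-1.14.1`.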
Now execute the upgrade:
- Upgrade the `nix-channel/nixpkgs` submodule to a newer version:
  - `cd nix-channel/nixpkgs/`
  - `git fetch` to fetch the latest commits.
  - `git checkout f6c1d3b1`
    That is the latest commit on the `release-19.09` branch at the time of writing. You could `git checkout origin/release-19.09` here, but we use an explicit commit for full reproducibility of this tutorial.
  - `cd ../..` back into the top-level directory.
- Deploy with:

      ./mynix nixops deploy -d example-nginx-deployment
That's it. If you now SSH into the machine and run `systemctl status nginx.service` again, you should observe that you are now running the newer version `nginx-1.16.1`.
NixOps restarted all changed services for you, but running `uname -a` you can see that the kernel version is still the same as before.
That is because upgrading the kernel requires a reboot.
Deploy with reboot to ensure everything is upgraded:
./mynix nixops deploy -d example-nginx-deployment --force-reboot
Now `uname -a` should show the new kernel version.
In production you likely want to upgrade one machine after the other ("rolling") so as not to interrupt your users.
As of writing, NixOps does not have built-in functionality for that.
Instead, simply deploy individual machines sequentially:
./mynix nixops deploy -d example-nginx-deployment --force-reboot --include machine1
./mynix nixops deploy -d example-nginx-deployment --force-reboot --include machine2
# ...
It is recommended that you check that each machine is working fine before proceeding to the next, for minimal disruption.
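The sequential deploy with a check in between could be wrapped in a loop. This is only a sketch: `run` and `rolling_deploy` are hypothetical helpers, and the health check (asking systemd whether `nginx.service` is active) is an assumption about what "working fine" means for this example deployment. Set `DRY_RUN=1` to print the commands instead of running them.

```shell
# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: deployment name, remaining arguments: machines, in deploy order.
# Stops at the first machine whose deploy or health check fails.
rolling_deploy() {
  deployment="$1"
  shift
  for machine in "$@"; do
    run ./mynix nixops deploy -d "$deployment" --force-reboot --include "$machine" || return 1
    # Assumed health check: nginx must be active before moving on.
    run ./mynix nixops ssh -d "$deployment" "$machine" systemctl is-active --quiet nginx.service || return 1
  done
}

# Possible usage:
#   DRY_RUN=1 rolling_deploy example-nginx-deployment machine1 machine2
```

Stopping at the first failure keeps the remaining machines on the old, known-good configuration.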
There are 2 methods you can use to roll back:
- Using `nixops rollback`.
- Simply bringing our configuration files into the old state and deploying again.
The second option is usually better, because it is more declarative, and you can commit your rollback into version control, like any other change.
But `nixops rollback` can be useful because it is even faster, and it is useful to know how it works because it showcases NixOS's immutability.
- List the past deployment generations using: `./mynix nixops list-generations -d example-nginx-deployment`. Example output:

      1   2020-04-14 20:00:00
      2   2020-04-14 20:15:01   (current)

- Roll back to generation `1` using: `./mynix nixops rollback 1 -d example-nginx-deployment`

  You will see output like:

      switching from generation 2 to 1
      ...
      machine1..........................> activation finished successfully

  As before, you can append `--force-reboot` to reboot into the changed kernel.
  The rollback only takes 10 seconds for me, or 18 seconds including reboot.
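To find the active generation in a script, the `(current)` marker in the `list-generations` output can be matched. A minimal sketch, assuming the output has one generation per line with the number in the first column as in the example above; `current_generation` is a made-up name.

```shell
# Reads `nixops list-generations` output on stdin and prints the number
# of the generation marked "(current)".
current_generation() {
  awk '/\(current\)/ { print $1 }'
}

# Possible usage:
#   ./mynix nixops list-generations -d example-nginx-deployment | current_generation
```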
To roll back by restoring the old configuration instead:
- `(cd nix-channel/nixpkgs/ && git checkout -)`
  This is similar to what we did when upgrading, but written as a one-liner, using `(` subshell parentheses `)` to avoid having to `cd` back, and using `git checkout -` to check out whatever the previously checked out commit was (you could also give an explicit commit).
- Deploy:

      ./mynix nixops deploy -d example-nginx-deployment --force-reboot
And for the fun of it (as well as for Tutorial 3), let's switch again to the newer OS version:
(cd nix-channel/nixpkgs/ && git checkout f6c1d3b1)
./mynix nixops deploy -d example-nginx-deployment --force-reboot
By now you should have a feeling for how fast doing OS upgrades is with NixOps.
In the previous tutorials, we set up an HTTP server with nixops, and could open its IP address in our browser to see the returned content.
But modern sites should usually run on HTTPS!
Let's use Let's Encrypt's Automated Certificate Management Environment (ACME) to automatically get HTTPS certificates for our nginx web server.
Prerequisites:
- This requires that you have executed Tutorial 2 to upgrade to a newer NixOS, because current Let's Encrypt no longer accepts the older ACME protocol.
- You need to own a domain name to point at your server's IP. Ephemeral domains like AWS's `ec2-1-2-3-4.eu-central-1.compute.amazonaws.com` are intentionally rejected by Let's Encrypt. If you do not have a domain name, you must skip executing this tutorial; but still read it!
Change your deployment:
- Make a variable to contain your domain name:

      -  machine1 = { resources, nodes, ... }: {
      +  machine1 = { resources, nodes, ... }:
      +    let
      +      dnsName = "machine1.nixops-tutorial.aws.nh2.me";
      +    in
      +    {

  Replace `machine1.nixops-tutorial.aws.nh2.me` by whatever your domain is.
- Point your domain name to your server's public IP (from `./mynix nixops info -d example-nginx-deployment`) by creating a DNS `A` record to it with your domain registrar.

  If you use AWS's Route53 for your domains, like I do for my AWS Hosted Zone `aws.nh2.me`, then you can also let NixOps set it to your server's IP automatically, by adding next to the other `deployment.ec2` options:

      deployment.route53 = {
        accessKeyId = awsKeyId;
        hostName = dnsName;
        ttl = 1;
      };

- Open the HTTPS port 443 in the firewall:

       networking.firewall.allowedTCPPorts = [
         80 # HTTP
      +  443 # HTTPs
       ];

- Change your nginx config to reply to your `dnsName`, enable SSL and automatic ACME certificate fetching:

       # Enable nginx service
       services.nginx = {
         enable = true;
      -  virtualHosts."someDefaultHost" = {
      +  virtualHosts.${dnsName} = {
           default = true; # makes this the default vhost if no other one matches
           locations."/" = {
             root = pkgs.writeTextDir "index.html" "Hello world!";
           };
      +    addSSL = true;
      +    enableACME = true;
         };
       };
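Since certificate issuance depends on the domain resolving to your server, it can be worth checking the `A` record before deploying. The following is a sketch: `dns_points_to` is a hypothetical helper, and it assumes you have a resolver tool such as `dig` available whose short output lists one address per line.

```shell
# Reads resolved addresses on stdin (one per line) and succeeds if one
# of them is exactly the expected IP.
# $1: the public IP shown by `nixops info`
dns_points_to() {
  expected_ip="$1"
  grep -Fxq "$expected_ip"
}

# Possible usage (domain and IP are placeholders for your own values):
#   dig +short A machine1.nixops-tutorial.aws.nh2.me | dns_points_to 1.2.3.4 \
#     || echo "DNS does not point at the server yet" >&2
```

DNS changes can take a while to propagate, so a failing check shortly after creating the record does not necessarily mean the record is wrong.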
Now deploy.
You should now be able to visit your domain in your browser with the `https://` prefix.
If it does not work, there was probably an issue getting a certificate from Let's Encrypt. In that case, SSH into your server and run (replace the domain by yours accordingly):
journalctl -e -u acme-machine1.nixops-tutorial.aws.nh2.me.service
This will show you the last errors of the service that fetches the certificate, hopefully allowing you to diagnose the problem.