From 5de42be9c56729837142a743738269a5086d1e5f Mon Sep 17 00:00:00 2001
From: Pavel Raiskup
Date: Tue, 5 Nov 2024 12:19:35 +0100
Subject: [PATCH] docs: how to check we backup properly

Fixes: #3390
---
 doc/maintenance/backup_check.rst  | 104 ++++++++++++++++++++++++++++++
 doc/maintenance_documentation.rst |   1 +
 2 files changed, 105 insertions(+)
 create mode 100644 doc/maintenance/backup_check.rst

diff --git a/doc/maintenance/backup_check.rst b/doc/maintenance/backup_check.rst
new file mode 100644
index 000000000..faa398082
--- /dev/null
+++ b/doc/maintenance/backup_check.rst
@@ -0,0 +1,104 @@
+.. _backup_check:
+
+Check that Fedora Copr Backups are OK
+=====================================
+
+This document explains how Fedora Copr backups are performed, so we can
+periodically verify that everything is in place and functioning properly. For
+disaster recovery, refer to :ref:`backup_recovery`.
+
+Copr Backend
+------------
+
+The backend storage uses a complex RAID setup to provide redundancy directly on
+the server (in EC2). Backups are then
+`synchronized periodically `_
+to the storinator01 host as incremental backups via rsnapshot.
+
+To verify backups, follow these steps:
+
+1. Confirm the timestamp of the most recent backup start.
+2. Choose a random build that completed just before that time.
+3. Verify that this build was successfully backed up to storinator01.
+
+1) SSH into the ``copr-be`` machine and review the ``/var/log/cron`` file. You
+   may also want to check the output of ``crontab -l`` to confirm the
+   backup schedule (typically Fridays) and open an older compressed log file::
+
+       $ xz -d < /var/log/cron-20241101.xz | grep '(copr) CMD'
+       ...
+       Nov 1 03:00:02 copr-be CROND[3482216]: (copr) CMD (ionice --class=idle /usr/local/bin/rsnapshot_copr_backend >/dev/null)
+       ...
+
+   Note that the backup process might take several days. If there’s no
+   corresponding ``CMDEND`` entry in the cron log, the backup is still in
+   progress; wait for it to complete or check the previous backup.
+
+2) Find the build ID, for instance in the ``@copr/copr-pull-requests`` or
+   ``@copr/copr-dev`` projects. For example, `8185411
+   `_.
+
+3) SSH into the ``storinator01`` box and locate the latest incremental backup::
+
+    $ find /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473 | grep 8185411 | grep rpm$
+    /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-builder-1.1-1.git.3.8adcc0d.el8.x86_64.rpm
+    /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-rpmbuild-1.1-1.git.3.8adcc0d.el8.src.rpm
+    /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-rpmbuild-1.1-1.git.3.8adcc0d.el8.x86_64.rpm
+    /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-9-x86_64/08185411-copr-rpmbuild/copr-builder-1.1-1.git.3.8adcc0d.el9.x86_64.rpm
+    ...
+
+This confirms the backups are working correctly. While you’re there, ensure
+there is adequate free space on the filesystem by running
+``df -h /srv/nfs/copr-be``.
+
+
+Copr Frontend
+-------------
+
+For the Frontend, we only back up the PostgreSQL database (hourly). Check the
+``/etc/cron.d/cron-backup-database-coprdb`` cron config and the corresponding
+``/backups`` directory, which should contain a dump with a current timestamp::
+
+    [root@copr-fe ~][PROD]# ls -alh /backups/
+    total 662M
+    drwxr-xr-x. 1 postgres root       50 Nov  5 01:21 .
+    dr-xr-xr-x. 1 root     root      160 Nov 28  2023 ..
+    -rw-r--r--. 1 postgres postgres 662M Nov  5 01:21 coprdb-2024-11-05.dump.xz
+
+If we provide such an up-to-date dump, `rdiff-backup
+`_
+periodically pulls the backups off the box, as long as the box is in an
+appropriate `Ansible group
+`_
+and we `configure
+`_
+the backup dir.
+
+
+Copr Keygen
+-----------
+
+We don't do filesystem backups there. The important data (keypairs) are stored
+on a separate volume, ``/var/lib/copr-keygen``, which is periodically snapshotted
+in EC2. Check for `the volume `_.
+Volume snapshots may be filtered with ``OriginalVolume=vol-0108e05e229bf7eaf``.
+
+
+We don't perform filesystem backups for this system. Instead, crucial data,
+specifically keypairs, are stored on a dedicated volume at
+``/var/lib/copr-keygen``, which is regularly snapshotted within EC2. You can
+check the current snapshots for this volume in EC2:
+
+- **Volume ID**: `vol-0108e05e229bf7eaf `_
+- **Snapshot Filter**: Use ``OriginalVolume=vol-0108e05e229bf7eaf`` to list all related snapshots in the AWS console.
+
+
+Copr DistGit
+------------
+
+Due to Copr's design (see :ref:`architecture `), Copr DistGit data
+is extensive, measuring in terabytes, yet it’s not critical enough to require
+formal backups. It primarily serves as a temporary "proxy" between Copr and
+upstream repositories. The reliability of the EC2 volume is adequate for this
+purpose, and in the event of a complete failure, we would simply initialize a
+new, empty volume.
diff --git a/doc/maintenance_documentation.rst b/doc/maintenance_documentation.rst
index a62f1c5e4..8c88c9e9c 100644
--- a/doc/maintenance_documentation.rst
+++ b/doc/maintenance_documentation.rst
@@ -17,6 +17,7 @@ This section contains information about maintenance topics. You may also be inte
     Fedora Copr hypervisors
     Fedora Copr outage announcements
     Fedora Copr credentials
+    How to check we do backup
 
 .. toctree::