docs: how to check we backup properly #3506

104 changes: 104 additions & 0 deletions doc/maintenance/backup_check.rst
@@ -0,0 +1,104 @@
.. _backup_check:

Check that Fedora Copr Backups are OK
=====================================

This document explains how Fedora Copr backups are performed, so we can
periodically verify that everything is in place and functioning properly. For
disaster recovery, refer to :ref:`backup_recovery`.

Copr Backend
------------

The backend storage uses a complex RAID setup to provide redundancy directly on
the server (in EC2). Backups are then
`synchronized periodically <https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/roles/rsnapshot-push/tasks/main.yml#_67>`_
to the storinator01 host as incremental backups via rsnapshot.

To verify backups, follow these steps:

1. Confirm the timestamp of the most recent backup start.
2. Choose a random build that completed just before that time.
3. Verify that this build was successfully backed up to storinator01.


1) SSH into the ``copr-be`` machine and review the ``/var/log/cron`` file. You
may also want to check the output of ``crontab -l`` to confirm the backup
schedule (typically Fridays), and open an older compressed log file if the
latest run has already been rotated out::

    $ xz -d < /var/log/cron-20241101.xz | grep '(copr) CMD'
    ...
    Nov 1 03:00:02 copr-be CROND[3482216]: (copr) CMD (ionice --class=idle /usr/local/bin/rsnapshot_copr_backend >/dev/null)
    ...

Note that the backup process might take several days. If there is no
corresponding ``CMDEND`` entry in the cron log, the backup is still in
progress; wait for it to complete or check the previous backup.

2) Pick a build that finished shortly before the backup started and note its
ID, for instance from the ``@copr/copr-pull-requests`` or ``@copr/copr-dev``
projects. For example, `8185411
<https://copr.fedorainfracloud.org/coprs/g/copr/copr-pull-requests/build/8185411/>`_.

3) SSH into the ``storinator01`` box and look for the build in the latest incremental backup::

    $ find /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473 | grep 8185411 | grep rpm$
    /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-builder-1.1-1.git.3.8adcc0d.el8.x86_64.rpm
    /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-rpmbuild-1.1-1.git.3.8adcc0d.el8.src.rpm
    /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-rpmbuild-1.1-1.git.3.8adcc0d.el8.x86_64.rpm
    /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-9-x86_64/08185411-copr-rpmbuild/copr-builder-1.1-1.git.3.8adcc0d.el9.x86_64.rpm
    ...

This confirms the backups are working correctly. While you’re there, ensure
there is adequate free space on the filesystem by running ``df -h
/srv/nfs/copr-be``.
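
The manual checks in steps 1 and 3 could be wrapped in small shell helpers. The
following is only a sketch, not part of the Copr tooling: the function names
are hypothetical, the ``CMDEND`` line is assumed to mention the same script
name as the ``CMD`` line, and the eight-digit zero padding of the build
directory is inferred from the ``08185411-copr-rpmbuild`` paths in the listing
above.

```shell
# Read a cron log on stdin and report whether the most recent
# rsnapshot_copr_backend run has a CMDEND entry after its CMD entry.
# Exit status: 0 = finished, 1 = still running, 2 = no run found.
# Assumption: CMD and CMDEND lines both contain the script name.
last_backup_status() {
    awk '
        /\(copr\) CMD \(.*rsnapshot_copr_backend/    { start = $0; finished = 0 }
        /\(copr\) CMDEND \(.*rsnapshot_copr_backend/ { finished = 1 }
        END {
            if (start == "") { print "no backup run found"; exit 2 }
            print start
            if (finished) { print "status: finished" }
            else          { print "status: still running"; exit 1 }
        }
    '
}

# Print the directory prefix for a build ID, zero-padded to eight digits
# (pass the ID without leading zeros, e.g. 8185411 -> 08185411).
build_dir_prefix() {
    printf '%08d' "$1"
}

# List RPM files of one build below a project directory in the backup.
# $1: project results directory inside the backup, $2: build ID.
find_backed_up_rpms() {
    find "$1" -type f -path "*/$(build_dir_prefix "$2")-*" -name '*.rpm'
}

# Possible usage (paths and IDs taken from the examples above):
#   on copr-be:      xz -d < /var/log/cron-20241101.xz | last_backup_status
#   on storinator01: find_backed_up_rpms <project results dir in the backup> 8185411
```

An empty result from ``find_backed_up_rpms`` for a build that finished before
the backup started would be a reason to investigate further.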


Copr Frontend
-------------

For the Frontend, we only back up the PostgreSQL database (hourly). Check the
``/etc/cron.d/cron-backup-database-coprdb`` cron configuration and the
corresponding ``/backups`` directory. The dump there should have a recent
timestamp, for example::

    [root@copr-fe ~][PROD]# ls -alh /backups/
    total 662M
    drwxr-xr-x. 1 postgres root 50 Nov 5 01:21 .
    dr-xr-xr-x. 1 root root 160 Nov 28 2023 ..
    -rw-r--r--. 1 postgres postgres 662M Nov 5 01:21 coprdb-2024-11-05.dump.xz

As long as the dump in ``/backups`` is kept up to date, `rdiff-backup
<https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/rdiff-backup/>`_
periodically pulls it off the machine, provided the host is in the appropriate
`Ansible group
<https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/inventory/backups#_4>`_
and the backup directory is `configured
<https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/inventory/host_vars/copr-fe.aws.fedoraproject.org#_6>`_
in its host variables.
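
The timestamp check on ``/backups`` could also be scripted. This is a minimal
sketch that assumes GNU ``find`` and the ``coprdb-*.dump.xz`` naming seen in
the listing above; the function name and the age threshold are hypothetical.

```shell
# Succeed if the directory holds a coprdb dump modified within the limit.
# $1: backup directory, $2: maximum age in minutes (hypothetical threshold).
# Assumption: dumps are named coprdb-*.dump.xz, as in the listing above.
dump_is_fresh() {
    [ -n "$(find "$1" -maxdepth 1 -type f -name 'coprdb-*.dump.xz' -mmin "-$2" | head -n 1)" ]
}

# Possible usage on copr-fe; two hours leaves some slack for an hourly job:
#   dump_is_fresh /backups 120 || echo "coprdb dump is missing or stale"
```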


Copr Keygen
-----------

We don't perform filesystem backups for this system. Instead, the crucial data,
specifically the keypairs, are stored on a dedicated volume at
``/var/lib/copr-keygen``, which is regularly snapshotted within EC2. You can
check the current snapshots for this volume in EC2:

- **Volume ID**: `vol-0108e05e229bf7eaf <https://us-east-1.console.aws.amazon.com/ec2/home?region=us-east-1#VolumeDetails:volumeId=vol-0108e05e229bf7eaf>`_
- **Snapshot Filter**: Use ``OriginalVolume=vol-0108e05e229bf7eaf`` to list all related snapshots in the AWS console.
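
The same check might be done from a terminal instead of the web console. The
sketch below assumes the AWS CLI is installed and has credentials for the
account; it uses the ``volume-id`` filter of ``describe-snapshots`` rather than
the console filter named above, on the assumption that both select the same
snapshots. The function name is hypothetical, and the region is taken from the
console URL.

```shell
# Print ID, start time and state of the newest snapshot of an EBS volume.
# $1: volume ID. Requires the AWS CLI with credentials (assumption).
latest_volume_snapshot() {
    aws ec2 describe-snapshots \
        --region us-east-1 \
        --filters "Name=volume-id,Values=$1" \
        --query 'sort_by(Snapshots, &StartTime)[-1].[SnapshotId, StartTime, State]' \
        --output text
}

# Possible usage:
#   latest_volume_snapshot vol-0108e05e229bf7eaf
```

A start time that is much older than the expected snapshot interval would
suggest that the automated snapshot policy needs a closer look.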

Review comment (member, PR author):

    @xsuchy would you mind checking the policy for the automated creation of snapshots?

Review comment (member, PR author):

    @xsuchy gentle ping :)



Copr DistGit
------------

Due to Copr's design (see :ref:`architecture <architecture>`), Copr DistGit data
is extensive, measuring in terabytes, yet it’s not critical enough to require
formal backups. It primarily serves as a temporary "proxy" between Copr and
upstream repositories. The reliability of the EC2 volume is adequate for this
purpose, and in the event of a complete failure, we would simply initialize a
new, empty volume.
1 change: 1 addition & 0 deletions doc/maintenance_documentation.rst
@@ -17,6 +17,7 @@ This section contains information about maintenance topics. You may also be inte
Fedora Copr hypervisors <maintenance/hypervisors>
Fedora Copr outage announcements <maintenance/announce_outage>
Fedora Copr credentials <maintenance/credentials>
How to check we do backup <maintenance/backup_check>


.. toctree::