docs: how to check we backup properly #3506
Open: praiskup wants to merge 1 commit into fedora-copr:main from praiskup:praiskup-backup-checking

.. _backup_check:

Check that Fedora Copr Backups are OK
=====================================

This document explains how Fedora Copr backups are performed, so we can
periodically verify that everything is in place and functioning properly. For
disaster recovery, refer to :ref:`backup_recovery`.

Copr Backend
------------

The backend storage uses a complex RAID setup to provide redundancy directly on
the server (in EC2). Backups are then
`synchronized periodically <https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/roles/rsnapshot-push/tasks/main.yml#_67>`_
to the ``storinator01`` host as incremental backups via rsnapshot.

To verify the backups, follow these steps:

1. Confirm the timestamp of the most recent backup start.
2. Choose a random build that completed just before that time.
3. Verify that this build was successfully backed up to ``storinator01``.

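Step 2 can be partly scripted once you have a list of recent builds. The following is a minimal sketch, not an existing Copr tool: ``pick_build_before`` is a hypothetical helper, its input format (one ``BUILD_ID ENDED_ON`` pair per line, with ``ENDED_ON`` as a Unix timestamp) is an assumption, and so is the one-day default window that stands in for "just before".

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a random build that ended shortly before the
# backup started. Reads "BUILD_ID ENDED_ON" lines (Unix timestamps) on stdin.
#   $1  backup start time as a Unix timestamp
#   $2  how far back to look, in seconds (default: one day, an assumption)
pick_build_before() {
    local start="$1" window="${2:-86400}"
    # Keep builds that ended in [start - window, start), then pick one at random.
    awk -v start="$start" -v window="$window" \
        '$2 < start && $2 >= start - window { print $1 }' |
        shuf -n 1
}

# Example with made-up data: the backup in step 1 started at Nov 1 03:00:02
# (UTC is assumed here for simplicity).
start=$(date -u -d '2024-11-01 03:00:02' +%s)
printf '%s\n' \
    "8185411 $((start - 3600))" \
    "8185412 $((start + 60))" |
    pick_build_before "$start"    # prints 8185411
```

Restricting the choice to builds that ended shortly before the backup start makes the check meaningful: an older build may already have been copied by an earlier run, so finding it would not prove that the latest run worked.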

1) SSH into the ``copr-be`` machine and review the ``/var/log/cron`` file. You
   may also want to check the output of ``crontab -l`` to confirm the backup
   schedule (typically Fridays), and open an older, compressed log file::

       $ xz -d < /var/log/cron-20241101.xz | grep '(copr) CMD'
       ...
       Nov 1 03:00:02 copr-be CROND[3482216]: (copr) CMD (ionice --class=idle /usr/local/bin/rsnapshot_copr_backend >/dev/null)
       ...

   Note that the backup process might take several days. If there is no
   corresponding ``CMDEND`` entry in the cron log, the backup is still in
   progress; wait for it to complete, or check the previous backup instead.

2) Find the ID of such a build, for instance in the ``@copr/copr-pull-requests``
   or ``@copr/copr-dev`` projects. For example, `8185411
   <https://copr.fedorainfracloud.org/coprs/g/copr/copr-pull-requests/build/8185411/>`_.

3) SSH into the ``storinator01`` box and look for the build in the latest
   synchronized backup (the ``.sync`` directory)::

       $ find /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473 | grep 8185411 | grep rpm$
       /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-builder-1.1-1.git.3.8adcc0d.el8.x86_64.rpm
       /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-rpmbuild-1.1-1.git.3.8adcc0d.el8.src.rpm
       /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-rpmbuild-1.1-1.git.3.8adcc0d.el8.x86_64.rpm
       /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-9-x86_64/08185411-copr-rpmbuild/copr-builder-1.1-1.git.3.8adcc0d.el9.x86_64.rpm
       ...

   If the build's RPMs are listed, the backups are working correctly. While you
   are there, make sure the filesystem has adequate free space by running
   ``df -h /srv/nfs/copr-be``.

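The checks from steps 1 and 3, plus the free-space check, can be scripted as well. Below is a minimal sketch with three hypothetical helpers (``backup_run_status``, ``backed_up_rpms`` and ``fs_usage_pct`` are not part of Copr). The first reads cron log lines and reports whether the latest ``rsnapshot_copr_backend`` run has a matching ``CMDEND`` entry, assuming that entry carries the same ``CROND[PID]`` as the ``CMD`` entry. The second relies on the zero-padded build ID prefix visible in the ``find`` output of step 3.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the backend checks; none of them ship with Copr.

# Step 1: read cron log lines on stdin and report whether the latest
# rsnapshot_copr_backend run has finished. Assumes that the CMDEND entry
# carries the same CROND[PID] as the CMD entry that started the run.
backup_run_status() {
    awk '
        /\(copr\) CMD \(/ && /rsnapshot_copr_backend/ {
            match($0, /CROND\[[0-9]+\]/)
            pid = substr($0, RSTART + 6, RLENGTH - 7)  # digits inside CROND[...]
            start = $0
            done = 0
        }
        /\(copr\) CMDEND \(/ && pid != "" && index($0, "CROND[" pid "]") {
            done = 1
        }
        END {
            if (start == "") { print "no backup start found"; exit 1 }
            status = done ? "finished" : "in progress"
            print status ": " start
        }'
}

# Step 3: list the backed-up RPMs of one build below a project directory.
# Result directories start with the build ID zero-padded to 8 digits,
# as in 08185411-copr-rpmbuild.
backed_up_rpms() {
    local project_dir="$1" prefix
    prefix=$(printf '%08d' "$2")
    find "$project_dir" -type f -path "*/${prefix}-*/*.rpm"
}

# Used percentage of the filesystem holding the given path, e.g. "42".
fs_usage_pct() {
    df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}
```

For example, run ``xz -d < /var/log/cron-20241101.xz | backup_run_status`` on ``copr-be``, and ``backed_up_rpms /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473 8185411`` followed by ``fs_usage_pct /srv/nfs/copr-be`` on ``storinator01``.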

Copr Frontend
-------------

For the frontend, we only back up the PostgreSQL database (hourly). Check the
``/etc/cron.d/cron-backup-database-coprdb`` cron configuration and the
corresponding ``/backups`` directory; the dump there should have a current
timestamp, for example::

    [root@copr-fe ~][PROD]# ls -alh /backups/
    total 662M
    drwxr-xr-x. 1 postgres root 50 Nov 5 01:21 .
    dr-xr-xr-x. 1 root root 160 Nov 28 2023 ..
    -rw-r--r--. 1 postgres postgres 662M Nov 5 01:21 coprdb-2024-11-05.dump.xz

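The timestamp check can be automated. This is a minimal sketch: ``fresh_coprdb_dump`` is a hypothetical helper, and the ``coprdb-*.dump.xz`` name pattern is taken from the listing above. The 1500-minute (25-hour) default threshold is an assumption, chosen loosely so that it tolerates both the hourly schedule mentioned here and the per-day file name; tighten it to match the actual cron entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a coprdb dump in DIR was modified within
# the last MAX_AGE_MIN minutes. The 1500-minute default is an assumption.
fresh_coprdb_dump() {
    local dir="${1:-/backups}" max_age_min="${2:-1500}"
    local fresh
    # -mmin -N matches files modified less than N minutes ago.
    fresh=$(find "$dir" -maxdepth 1 -name 'coprdb-*.dump.xz' -mmin "-$max_age_min" -print -quit)
    if [ -n "$fresh" ]; then
        echo "OK: $fresh"
    else
        echo "STALE: no coprdb dump newer than $max_age_min minutes in $dir" >&2
        return 1
    fi
}
```

On ``copr-fe``, ``fresh_coprdb_dump /backups 120`` would then fail if no dump was written in the last two hours.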

As long as we keep producing such an up-to-date dump, `rdiff-backup
<https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/rdiff-backup/>`_
periodically pulls it "out" of the machine, provided that the box is in the
appropriate `Ansible group
<https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/inventory/backups#_4>`_
and the backup directory is `configured
<https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/inventory/host_vars/copr-fe.aws.fedoraproject.org#_6>`_.

Copr Keygen
-----------

We don't perform filesystem backups for this system. Instead, the crucial data
(the keypairs) is stored on a dedicated volume at ``/var/lib/copr-keygen``,
which is regularly snapshotted within EC2. You can check the current snapshots
of this volume in EC2:

- **Volume ID**: `vol-0108e05e229bf7eaf <https://us-east-1.console.aws.amazon.com/ec2/home?region=us-east-1#VolumeDetails:volumeId=vol-0108e05e229bf7eaf>`_
- **Snapshot Filter**: Use ``OriginalVolume=vol-0108e05e229bf7eaf`` to list all related snapshots in the AWS console.

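The same check can be done from the command line. This is a minimal sketch, assuming the AWS CLI is installed and has credentials with read access to EC2 in ``us-east-1``; it uses the ``volume-id`` filter of ``aws ec2 describe-snapshots``. The helper names and the 7-day default threshold are hypothetical, since the snapshot schedule is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assume AWS CLI credentials with EC2 read access.

# Print the StartTime of the newest snapshot of volume $1 ("None" if there is none).
newest_snapshot_time() {
    aws ec2 describe-snapshots \
        --region us-east-1 \
        --filters "Name=volume-id,Values=$1" \
        --query 'sort_by(Snapshots, &StartTime)[-1].StartTime' \
        --output text
}

# Succeed if the newest snapshot of volume $1 is at most $2 days old
# (default 7, an assumption).
check_snapshot_age() {
    local volume="$1" max_age_days="${2:-7}"
    local started age_days
    started=$(newest_snapshot_time "$volume") || return 1
    if [ -z "$started" ] || [ "$started" = "None" ]; then
        echo "no snapshots found for $volume" >&2
        return 1
    fi
    # GNU date parses the ISO 8601 timestamp returned by the CLI.
    age_days=$(( ($(date -u +%s) - $(date -u -d "$started" +%s)) / 86400 ))
    echo "newest snapshot of $volume started at $started ($age_days days ago)"
    [ "$age_days" -le "$max_age_days" ]
}
```

For example: ``check_snapshot_age vol-0108e05e229bf7eaf 7``.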

Copr DistGit
------------

Copr DistGit data is large, measuring in terabytes, but due to Copr's design
(see :ref:`architecture <architecture>`), it is not critical enough to require
formal backups: it primarily serves as a temporary "proxy" between Copr and
upstream repositories. The reliability of the EC2 volume is adequate for this
purpose, and in the event of a complete failure, we would simply initialize a
new, empty volume.

@xsuchy would you mind checking the policy for automated creating of snapshots?
@xsuchy gently ping :)