backup-cluster backing up SSTables that have not changed. #775
Comments
Hi @chrisjmiller1! This looks very similar to something we recently fixed in #754. I don't think that PR made it to any release, so the quickest way to "fix" this for you is to start using the `prefix` option. But if you're already using it and this is still happening, then I'm wrong and you're probably hitting some other issue.
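(For reference, the prefix is set in the `[storage]` section of `medusa.ini`; a minimal sketch with placeholder values, so the provider and names below are only examples:)

```ini
; medusa.ini -- illustrative sketch only; bucket name and prefix value are placeholders
[storage]
storage_provider = s3_compatible
bucket_name = my-medusa-bucket
prefix = my-cluster
```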
I had the same problem, and I can confirm that using a prefix fixed it.
Thanks @rzvoncek and @MarcLangsdorf - I will try the prefix option and will let you know how it goes.
Thanks folks, that worked.
Hi @rzvoncek and @MarcLangsdorf, I'm now trying to restore a backup using this configuration but getting the following errors.
Contents of medusa.log
Any ideas?
Hi @chrisjmiller1. This looks like you're doing the restore with the prefix present. Since the prefix is something that separates data on the storage backend, it's very likely it simply doesn't see the backup done with a different (none) prefix. This should get resolved automatically if you do a backup with the new prefix (although that backup will have to upload everything). A restore after doing this should be fine. We could, of course, go tinker with how files are organised in the storage, but that would be manual and error-prone.
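To illustrate the point (this is an assumed layout, not Medusa's actual path-building code): when a prefix is configured it is prepended to every object key, so backups taken without one live under entirely different keys and are invisible to a prefixed run.

```python
# Hypothetical illustration of how a prefix changes object keys; the real
# layout used by Medusa may differ in detail.
def object_key(prefix, fqdn, backup_name, relative_path):
    parts = [p for p in (prefix, fqdn, backup_name, "data", relative_path) if p]
    return "/".join(parts)

print(object_key("", "node1", "20240605-cluster-1", "ks1/t1/nb-1-big-Data.db"))
# node1/20240605-cluster-1/data/ks1/t1/nb-1-big-Data.db
print(object_key("my-cluster", "node1", "20240605-cluster-1", "ks1/t1/nb-1-big-Data.db"))
# my-cluster/node1/20240605-cluster-1/data/ks1/t1/nb-1-big-Data.db
```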
Hi @rzvoncek, yes, the restore was completed with the prefix present as set in medusa.ini. All the backups were also taken with the same prefix, so I would have expected things to have been resolved automatically. Is there anything else that could result in this behaviour? Note that I also cleared down everything in the bucket after changing the prefix parameter, to ensure the index was clean. Chris.
Ugh, this is starting to look ugly. But let's dig deeper. From the log snippet I see two executions happening. There are two lines:
They differ a bit in how their log lines look, but it's hard for me to reverse-engineer what precisely they were doing. Could you please tell me which two commands you ran to cause those logs? And what exact version (or commit) of Medusa are you running?
Hi @rzvoncek,
Okay, and does the list command see the backups you'd expect?
Hi @rzvoncek, yes, and I used the latest backup name in the restore command.
Btw, using 0.21.0.
Hi @chrisjmiller1, I'm having a tough time reproducing this. Aside from actually deleting an object from the storage, I cannot get this (a 404 error) to happen. Are you hitting this consistently? Could you come up with a sequence of steps to reproduce this deterministically? Another idea to narrow this down is to add a try-catch, with a log printing the key being accessed, into s3_base_storage.py.
Hi @rzvoncek, could you provide the complete code you would like me to update in s3_base_storage.py? Thanks.
The
Let's make it something like:
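(The exact snippet wasn't captured here, so the following is only a rough sketch of the idea, assuming a boto3-style client; the function name and arguments are placeholders, not Medusa's actual method.)

```python
import logging

from botocore.exceptions import ClientError


def download_object(s3_client, bucket_name, object_key, destination_path):
    """Hypothetical wrapper: log the exact key being fetched so a 404 reveals
    which object Medusa tried to read."""
    try:
        s3_client.download_file(bucket_name, object_key, destination_path)
    except ClientError as e:
        logging.error("Failed to download key '%s' from bucket '%s': %s",
                      object_key, bucket_name, e)
        raise
```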
Hi @rzvoncek,
And from medusa.log
Okay, so now we have a URI to check:
Does this actually exist?
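One way to answer that outside of Medusa is to issue a HEAD request against the object; a sketch assuming the bucket is reachable through an S3-compatible endpoint (the endpoint, bucket and key below are placeholders for the values from the log):

```python
import boto3
from botocore.exceptions import ClientError

# Placeholders: substitute the endpoint, bucket and key reported in medusa.log.
s3 = boto3.client(
    "s3",
    endpoint_url="https://<namespace>.compat.objectstorage.<region>.oraclecloud.com",
)
try:
    s3.head_object(Bucket="<bucket>", Key="<prefix>/<fqdn>/<backup-name>/<file>")
    print("Object exists")
except ClientError as e:
    if e.response["Error"]["Code"] == "404":
        print("Object not found")
    else:
        raise
```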
Hi @rzvoncek
Chris.
I wasn't able to work out how the extra prefix would get there. I've tried the following on both 0.21.0 and
(the config file had a prefix set in it as well). The behaviour I observed was that:
I'll try to debug this more, but I'll only get to it later.
Hi @rzvoncek - thanks for the update. FYI, I am only using the prefix in the config file, not on the command line.
Hi @rzvoncek, just checking in to see how this issue is progressing.
Hi,
I'm currently testing Medusa and have created a 3-node cluster with 160 GB of data on each instance, 7 keyspaces, and 1 table in each keyspace.
I took a differential backup using backup-cluster.
I then completed a compaction of one of the keyspaces on one instance, expecting that only the "new" SSTables would be uploaded, but all the SSTables were uploaded.
See the screenshot below from the OCI storage bucket, which shows the same SSTables have been re-uploaded.
If no keyspaces are compacted, then Medusa seems to recognise that files are already uploaded and doesn't re-upload them.
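For context on what that skipping means: a differential backup generally compares each local SSTable against what the previous backup already holds in the bucket and only uploads files that are new or changed. A rough sketch of that idea (illustrative only, not Medusa's actual implementation):

```python
def files_to_upload(local_files, previous_manifest):
    """local_files and previous_manifest map file paths to sizes; return the
    paths that are missing from, or differ in size against, the last backup.
    (A simplified stand-in for whatever comparison Medusa really performs.)"""
    return [
        path
        for path, size in local_files.items()
        if previous_manifest.get(path) != size
    ]


# After compacting ks1, only its new SSTable should be uploaded; ks2 is reused.
previous = {"ks1/t1/nb-1-big-Data.db": 1024, "ks2/t1/nb-1-big-Data.db": 2048}
current = {"ks1/t1/nb-2-big-Data.db": 3072, "ks2/t1/nb-1-big-Data.db": 2048}
print(files_to_upload(current, previous))  # ['ks1/t1/nb-2-big-Data.db']
```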
Here are the backup summaries from each instance:
Instance #1
[2024-06-05 10:45:07,441] INFO: Backup done
[2024-06-05 10:45:07,441] INFO: - Started: 2024-06-05 09:30:32
- Started extracting data: 2024-06-05 09:30:33
- Finished: 2024-06-05 10:45:07
[2024-06-05 10:45:07,441] INFO: - Real duration: 1:14:33.865099 (excludes time waiting for other nodes)
[2024-06-05 10:45:07,446] INFO: - 733 files, 159.15 GB
[2024-06-05 10:45:07,446] INFO: - 733 files copied from host (733 new, 0 reuploaded)
[2024-06-05 10:45:07,446] INFO: - 0 kept from previous backup (20240605-cluster-1)
Instance #2
[2024-06-05 09:31:41,530] INFO: Backup done
[2024-06-05 09:31:41,530] INFO: - Started: 2024-06-05 09:30:32
- Started extracting data: 2024-06-05 09:30:35
- Finished: 2024-06-05 09:31:41
[2024-06-05 09:31:41,530] INFO: - Real duration: 0:01:06.444491 (excludes time waiting for other nodes)
[2024-06-05 09:31:41,531] INFO: - 826 files, 162.80 GB
[2024-06-05 09:31:41,531] INFO: - 120 files copied from host
[2024-06-05 09:31:41,531] INFO: - 706 copied from previous backup (20240604-cluster-1)
Instance #3
[2024-06-05 09:31:36,955] INFO: Backup done
[2024-06-05 09:31:36,955] INFO: - Started: 2024-06-05 09:30:32
- Started extracting data: 2024-06-05 09:30:34
- Finished: 2024-06-05 09:31:36
[2024-06-05 09:31:36,955] INFO: - Real duration: 0:01:02.416448 (excludes time waiting for other nodes)
[2024-06-05 09:31:36,956] INFO: - 875 files, 162.80 GB
[2024-06-05 09:31:36,957] INFO: - 112 files copied from host
[2024-06-05 09:31:36,957] INFO: - 763 copied from previous backup (20240604-cluster-1)
Thanks in advance,
Chris.