Streaming / continuous backups? #1030

Open
ardigan6 opened this issue Oct 20, 2024 · 3 comments

@ardigan6

What we'd like to do is accept up to N minutes of data loss (i.e. much less than the retention window of our queues) and run XXL single-node ClickHouse instances, with sharding and merging managed externally. The reason is that ReplicatedMergeTree is much slower than large nodes running plain MergeTree, and according to the docs, replicated inserts take more than 10x as many operations.

However, to handle failover and node replacement, this requires being able to bring up new nodes quickly, without a long rebuild process. We can then easily replay inserts from the last row in the backup.

Right now backups seem too heavy to run every 60 s or every N committed rows. We self-host, but I notice this is a feature gap in ClickHouse Cloud too: backups there are only daily.

Any plans to make this more efficient?
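A minimal sketch of the recovery step described above (restore the latest backup, then replay inserts from the queue). It assumes queue messages carry contiguous integer offsets and that the backup records the highest offset it contains; Message, replayAfter, and the checkpoint value are hypothetical names, not part of ClickHouse or clickhouse-backup. It returns the rows to re-insert in offset order and fails if retention has already dropped the first row after the checkpoint, which is why the backup interval has to stay well inside the queue's retention window.

```go
package main

import (
	"fmt"
	"sort"
)

// Message is a hypothetical queue record: an offset plus the row to re-insert.
type Message struct {
	Offset int64
	Row    string
}

// replayAfter returns the messages that must be re-inserted on a node restored
// from a backup whose newest row came from queue offset checkpoint.
// Assumption: offsets are contiguous, so the first pending message must be
// checkpoint+1; anything else means retention already dropped rows.
func replayAfter(queue []Message, checkpoint int64) ([]Message, error) {
	pending := make([]Message, 0, len(queue))
	for _, m := range queue {
		if m.Offset > checkpoint {
			pending = append(pending, m)
		}
	}
	// Replay in offset order so rows are re-inserted in their original order.
	sort.Slice(pending, func(i, j int) bool { return pending[i].Offset < pending[j].Offset })
	if len(pending) > 0 && pending[0].Offset != checkpoint+1 {
		return nil, fmt.Errorf("gap: backup ends at offset %d but queue resumes at %d",
			checkpoint, pending[0].Offset)
	}
	return pending, nil
}

func main() {
	queue := []Message{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}}
	// Suppose the restored backup contains rows up to offset 2.
	pending, err := replayAfter(queue, 2)
	if err != nil {
		fmt.Println("cannot recover without loss:", err)
		return
	}
	for _, m := range pending {
		fmt.Println("re-insert", m.Offset, m.Row)
	}
}
```

With a backup checkpoint of 2, only offsets 3 and 4 are replayed; if the queue had already expired offset 3, the function reports the gap instead of silently losing rows.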

@Slach
Collaborator

Slach commented Oct 20, 2024

Unfortunately, clickhouse-server doesn't allow watching new data parts in real time from outside the clickhouse-server process, and it doesn't have a transaction log that would allow point-in-time recovery (PITR) backups the way MySQL/PostgreSQL do.

We considered system.part_log + LIVE VIEW (https://clickhouse.com/docs/en/sql-reference/statements/create/view#live-view-deprecated), but LIVE VIEW is deprecated now.

We also considered using https://github.com/fsnotify/fsnotify to watch for new data parts, create hard links to them in an upload folder, and upload the data parts to remote storage.
But remote storage is not just a "remote disk": object storage has many restrictions and can't hold the file layout "as is".
We have an old issue, #665, with no progress on this.

As a workaround, if you have only local disks, you can try the following approach:
clickhouse-backup watch --full-interval=24h --watch-interval=1m
together with remote_storage: custom in the config, rsync --link-dest as upload.sh to your XXL clickhouse-server node, and some cp -l as download.sh.
See the details here:
https://github.com/Altinity/clickhouse-backup/tree/master/test/integration/rsync
and
https://github.com/Altinity/clickhouse-backup/blob/master/test/integration/config-custom-rsync.yml

In any case, we are open to pull requests.

Slach added this to the 2.7.0 milestone Oct 20, 2024
@ardigan6
Author

ardigan6 commented Oct 20, 2024

OK, thanks. We hoped we were missing something more convenient.

We did consider trying to abuse WINDOW VIEW (https://clickhouse.com/docs/en/sql-reference/statements/create/view#window-view-experimental); have you looked at that?

@Slach
Collaborator

Slach commented Oct 20, 2024

We don't need aggregation (the goal of WINDOW VIEW); we need a proper way to watch for new data parts.

Try the workaround I suggested above.
