
restore-cluster (to different cluster) fails when pssh pool size is smaller than the cluster size #803

Open
serban21 opened this issue Sep 16, 2024 · 0 comments

Comments


serban21 commented Sep 16, 2024

Project board link

When --pssh-pool-size is smaller than the cluster size, the list of target hosts is split into multiple batches (see https://github.com/thelastpickle/cassandra-medusa/blob/master/medusa/orchestration.py#L57). The problem is that the list of source hosts passed to pssh as host_args (the old hosts taken from --host-list) is not split to match. So when a 12-node cluster is restored to a different cluster with the same number of nodes and a pool size of 3, the first 3 target nodes receive the correct data from the first 3 old nodes in the host list, but the next 3 targets receive the same data (token ranges and SSTables) from those same first 3 source nodes. The end result is quite strange: Cassandra 4 actually starts on all 12 nodes, with errors in the logs, and nodetool status reports only the first 3 nodes (even though Cassandra is running on all of them).

The solution is simple: split the host_args list into matching batches too. I'll open a PR without tests today, and then see if I can add tests as well.
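To illustrate the idea, here is a minimal sketch (not Medusa's actual code; `make_batches`, the host names, and the pair-building loops are hypothetical) showing why batching only the target list misaligns sources with targets, and how batching both lists in lockstep fixes it:

```python
# Hypothetical sketch of the batching bug and fix, assuming each target
# ("new") host must be paired with exactly one source ("old") host from
# the --host-list mapping.

def make_batches(items, pool_size):
    """Split a list into consecutive batches of at most pool_size items."""
    return [items[i:i + pool_size] for i in range(0, len(items), pool_size)]

new_hosts = [f"new-{i}" for i in range(12)]   # restore targets
old_hosts = [f"old-{i}" for i in range(12)]   # backup sources
pool_size = 3

# Buggy behaviour: only the target hosts are batched, so every batch is
# paired with the same first pool_size source hosts.
buggy_pairs = [
    (new, old_hosts[j])               # old_hosts is never split
    for batch in make_batches(new_hosts, pool_size)
    for j, new in enumerate(batch)
]

# Fix: batch both lists in lockstep so each target keeps its own source.
fixed_pairs = [
    (new, old)
    for new_batch, old_batch in zip(
        make_batches(new_hosts, pool_size),
        make_batches(old_hosts, pool_size),
    )
    for new, old in zip(new_batch, old_batch)
]
```

With the buggy pairing, the fourth target node (`new-3`) is fed data from `old-0` again instead of `old-3`, which matches the duplicated token ranges and SSTables described above.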

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: MED-96
