Unsafe recovery partially fills key range hole #6859
Comments
Maybe not relevant, just for reference: region 1965 received 1 vote from the dead store 1.
Members:
I can't find any clue in the log. I think the snapshot-related stuff was "ok" in this case; the key is to find out why PD decided to tombstone 1965 on store 216 (tikv3). This only happens when another, newer region covers the range of 1965, but I could not find such a region in the log. @overvenus I suggest we add an info log in PD that prints out any overlapping regions while building the range tree, and wait for this problem to occur again?
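For illustration only, a minimal Go sketch of what such an overlap log could look like. The `Region` and `rangeTree` types here are simplified stand-ins, not PD's actual data structures:

```go
// Hypothetical sketch: while rebuilding the range tree during unsafe recovery,
// log any existing region whose key range overlaps the region being inserted,
// since overlapped regions are the ones that would end up tombstoned.
package main

import "log"

// Region is a minimal stand-in for a PD region covering [StartKey, EndKey).
type Region struct {
	ID       uint64
	StartKey string
	EndKey   string // "" means unbounded (+inf)
}

// overlaps reports whether two half-open key ranges intersect.
func overlaps(a, b *Region) bool {
	bEndAfterAStart := b.EndKey == "" || b.EndKey > a.StartKey
	bStartBeforeAEnd := a.EndKey == "" || b.StartKey < a.EndKey
	return bEndAfterAStart && bStartBeforeAEnd
}

type rangeTree struct {
	regions []*Region // linear scan instead of a real tree keeps the sketch short
}

// insert adds a region and logs every existing region it overlaps.
func (t *rangeTree) insert(r *Region) {
	for _, old := range t.regions {
		if overlaps(old, r) {
			log.Printf("unsafe recovery: region %d [%q, %q) overlaps region %d [%q, %q)",
				r.ID, r.StartKey, r.EndKey, old.ID, old.StartKey, old.EndKey)
		}
	}
	t.regions = append(t.regions, r)
}

func main() {
	t := &rangeTree{}
	t.insert(&Region{ID: 1965, StartKey: "a", EndKey: "m"})
	t.insert(&Region{ID: 2991, StartKey: "a", EndKey: "m"}) // overlaps region 1965, so it gets logged
}
```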
Besides adding logs, can we check whether all regions have a quorum of replicas alive before exiting unsafe recovery?
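As a rough sketch of that check (using made-up types, not PD's real API): for each region, count the peers whose store survived and require a majority before reporting recovery as finished.

```go
// Hedged sketch: before declaring unsafe recovery finished, verify every
// region still has a quorum of its peers on stores that are alive.
package main

import "fmt"

type peer struct {
	StoreID uint64
}

type regionMeta struct {
	ID    uint64
	Peers []peer
}

// hasQuorum reports whether more than half of the region's peers
// live on stores that survived the failure.
func hasQuorum(r regionMeta, aliveStores map[uint64]bool) bool {
	alive := 0
	for _, p := range r.Peers {
		if aliveStores[p.StoreID] {
			alive++
		}
	}
	return alive*2 > len(r.Peers)
}

// regionsWithoutQuorum returns the IDs of regions that would be left without
// a quorum; unsafe recovery should not report success while this is non-empty.
func regionsWithoutQuorum(regions []regionMeta, aliveStores map[uint64]bool) []uint64 {
	var bad []uint64
	for _, r := range regions {
		if !hasQuorum(r, aliveStores) {
			bad = append(bad, r.ID)
		}
	}
	return bad
}

func main() {
	// Store IDs here are placeholders, not the real IDs from this cluster.
	alive := map[uint64]bool{3: true, 4: true}
	regions := []regionMeta{
		{ID: 1965, Peers: []peer{{StoreID: 1}, {StoreID: 2}, {StoreID: 3}}},
	}
	fmt.Println("regions without quorum:", regionsWithoutQuorum(regions, alive))
}
```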
…6959) ref #6859 Add log for overlapping regions in unsafe recovery. We were unable to find the root cause of #6859; adding this log may help us better identify the issue by printing out the regions that overlap with each other, which causes some of them to be marked as tombstone. Signed-off-by: Yang Zhang <[email protected]>
Bug Report
On a 4-node TiKV cluster, we stopped two nodes and then started unsafe recovery using pd-ctl. After unsafe recovery, we found a lot of PD server timeouts, and it turned out that a region had failed to be created.
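For reference, a hedged sketch of how online unsafe recovery is started via pd-ctl (addresses and store IDs are placeholders, not the exact command from this report):

```shell
# Mark the two failed stores as unrecoverable; <id-0>,<id-1> stand in for
# the store IDs of tikv-0 and tikv-1.
pd-ctl -u http://<pd-address>:2379 unsafe remove-failed-stores <id-0>,<id-1>

# Poll the progress until the recovery reports finished.
pd-ctl -u http://<pd-address>:2379 unsafe remove-failed-stores show
```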
Failed TiKV: tikv-0 and tikv-1
Alive TiKV: tikv-2 and tikv-3
Original region ID: 1965
New region ID: 2991
Timeline:
There are actually two questions:
A four-node cluster should not lose replica data completely.
Note: the issue was found on a multi-rocksdb cluster, but I think it may affect single-rocksdb clusters too.
Log:
What did you do?
See above.
What version of PD are you using (`pd-server -V`)?
v7.1.0