Draft: prevent follower panic on commit index regression #25
panic: tocommit(%d) is out of range [lastIndex(%d)]. Was the raft log corrupted, truncated, or lost?
The follower recovers from a commit index regression and sends a MsgHeartbeatResp with a rejection.
The leader receives the rejected MsgHeartbeatResp, reconciles, and decreases the follower's progress.
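The two steps above can be sketched in Go as follows. This is a minimal illustration, not the code in this PR: the `heartbeat`, `heartbeatResp`, `follower`, and `progress` types and their methods are hypothetical stand-ins for the real raft structures, and the rule "set match to the hint and next to hint+1" is an assumption inferred from the log lines below.

```go
package main

import "fmt"

// heartbeat and heartbeatResp are hypothetical, simplified stand-ins for
// the raft MsgHeartbeat and MsgHeartbeatResp messages.
type heartbeat struct {
	From   uint64
	Commit uint64
}

type heartbeatResp struct {
	From       uint64
	Reject     bool
	RejectHint uint64
}

// follower keeps only the log state this sketch needs.
type follower struct {
	id        uint64
	lastIndex uint64
	committed uint64
}

// handleHeartbeat rejects a heartbeat whose commit index lies beyond the
// local log instead of panicking, and reports lastIndex as the hint.
func (f *follower) handleHeartbeat(m heartbeat) heartbeatResp {
	if m.Commit > f.lastIndex {
		return heartbeatResp{From: f.id, Reject: true, RejectHint: f.lastIndex}
	}
	if m.Commit > f.committed {
		f.committed = m.Commit
	}
	return heartbeatResp{From: f.id}
}

// progress is the leader's (assumed) view of one follower.
type progress struct {
	Match uint64
	Next  uint64
}

// handleHeartbeatResp lowers the tracked progress when the follower
// reports a last index below what the leader believed was matched.
// It returns true if the progress was decreased.
func (p *progress) handleHeartbeatResp(r heartbeatResp) bool {
	if !r.Reject || r.RejectHint >= p.Match {
		return false
	}
	p.Match = r.RejectHint
	p.Next = r.RejectHint + 1
	return true
}

func main() {
	// Follower 2 lost its log and restarted with lastIndex 2.
	f := &follower{id: 2, lastIndex: 2, committed: 2}
	resp := f.handleHeartbeat(heartbeat{From: 1, Commit: 20})

	// Leader 1 still believes follower 2 matched index 20.
	p := &progress{Match: 20, Next: 21}
	p.handleHeartbeatResp(resp)

	fmt.Printf("reject=%v hint=%d match=%d next=%d\n",
		resp.Reject, resp.RejectHint, p.Match, p.Next)
	// prints "reject=true hint=2 match=2 next=3"
}
```

The values in `main` mirror the first set of logs: a heartbeat carrying commit index 20 reaches a follower whose last index is 2, and the leader's progress drops from match=20 next=21 to match=2 next=3.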
Logs:
Leader Node:
2023/02/11 16:45:27 INFO: 1 failed to send message to 2 because it is unreachable [StateProbe match=20 next=21]
2023/02/11 16:45:27 INFO: 1 failed to send message to 2 because it is unreachable [StateProbe match=20 next=21]
2023/02/11 16:45:27 INFO: 1 failed to send message to 2 because it is unreachable [StateProbe match=20 next=21]
2023/02/11 16:45:27 INFO: 1 failed to send message to 2 because it is unreachable [StateProbe match=20 next=21]
2023/02/11 16:45:28 INFO: 1 failed to send message to 2 because it is unreachable [StateProbe match=20 next=21]
2023/02/11 16:45:28 INFO: 1 received MsgHeartbeatResp(rejected, hint: (index 2)) from 2 for index 20
2023/02/11 16:45:28 INFO: 1 decreased progress of 2 to [StateProbe match=2 next=3]
Follower Node:
2023/02/11 16:45:28 INFO: 2 became follower at term 5
2023/02/11 16:45:28 PANIC: tocommit(20) is out of range [lastIndex(2)]. Was the raft log corrupted, truncated, or lost?
2023/02/11 16:45:28 INFO: raft.node: 2 elected leader 1 at term 5
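The leader also sends a snapshot when the follower's next index (3 in the logs below) precedes the leader's first index (8):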
Leader Node:
2023/02/11 18:41:51 INFO: 1 failed to send message to 2 because it is unreachable [StateProbe match=14 next=15]
2023/02/11 18:41:52 INFO: 1 received MsgHeartbeatResp(rejected, hint: (index 2)) from 2 for index 14
2023/02/11 18:41:52 INFO: 1 decreased progress of 2 to [StateProbe match=2 next=3]
2023/02/11 18:41:52 INFO: 1 [firstindex: 8, commit: 14] sent snapshot[index: 12, term: 6] to 2 [StateProbe match=2 next=3]
2023/02/11 18:41:52 INFO: 1 paused sending replication messages to 2 [StateSnapshot match=2 next=3 paused pendingSnap=12]
2023/02/11 18:41:52 INFO: 1 snapshot succeeded, resumed sending replication messages to 2 [StateProbe match=2 next=13]
Follower Node:
2023/02/11 18:41:52 INFO: log [committed=2, applied=2, unstable.offset=3, len(unstable.Entries)=0] starts to restore snapshot [index: 12, term: 6]
2023/02/11 18:41:52 INFO: 2 switched to configuration voters=(1 2)
2023/02/11 18:41:52 INFO: 2 [commit: 12, lastindex: 12, lastterm: 6] restored snapshot [index: 12, term: 6]
2023/02/11 18:41:52 INFO: 2 [commit: 12] restored snapshot [index: 12, term: 6]
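The snapshot path in the second set of logs can be sketched as below. This is an illustrative assumption about the decision rule, not the PR's code: `needsSnapshot`, `snapProgress`, and `snapshotSucceeded` are hypothetical names, and the rule for the next index after a successful snapshot is inferred from the `match=2 next=13` log line.

```go
package main

import "fmt"

// needsSnapshot reports whether the entries a follower needs have already
// been compacted away on the leader (assumed rule: next < firstIndex).
func needsSnapshot(next, firstIndex uint64) bool {
	return next < firstIndex
}

// snapProgress is a hypothetical, reduced view of a follower's progress
// while a snapshot is in flight.
type snapProgress struct {
	Match           uint64
	Next            uint64
	PendingSnapshot uint64
}

// snapshotSucceeded resumes replication right after the snapshot index,
// or after Match if that is already further ahead.
func (p *snapProgress) snapshotSucceeded() {
	next := p.Match + 1
	if p.PendingSnapshot+1 > next {
		next = p.PendingSnapshot + 1
	}
	p.Next = next
	p.PendingSnapshot = 0
}

func main() {
	// Values taken from the logs: leader firstindex 8, follower next 3.
	fmt.Println(needsSnapshot(3, 8)) // prints "true"

	p := &snapProgress{Match: 2, Next: 3, PendingSnapshot: 12}
	p.snapshotSucceeded()
	fmt.Printf("match=%d next=%d\n", p.Match, p.Next) // prints "match=2 next=13"
}
```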