cluster_3_racks_multi_shotover_with_2_shotover_down fix intermittent failures #1813
This PR fixes the intermittent failures in cluster_3_racks_multi_shotover_with_2_shotover_down.
Locally I could reproduce the issue in about 1 in 5 test runs. With this PR applied I ran the test 40 times and did not reproduce the issue.
My understanding of the problem is:
I believe this behavior of the driver is entirely reasonable: at any time the client could crash, and any records that were consumed but not committed will be reconsumed by whichever client takes the place of the crashed one. So to be sure that records are not duplicated, the consumer needs to commit them once any processing is done.
So the issue lies purely in our test, not in shotover's implementation.
The fix is simply to add a commit call to the test's consumer before we kill the shotover nodes.
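The redelivery behavior described above, and why committing before the nodes go down removes the duplicates, can be sketched with a tiny in-memory model. This is a hypothetical illustration only: `Partition`, `GroupState`, `Consumer` and `run` are made-up names, not shotover's test helpers or any Kafka driver's API. The only assumption it encodes is the standard Kafka rule that a consumer which loses its in-memory position resumes from the group's last committed offset.

```rust
// Hypothetical in-memory model of a single Kafka-style partition and consumer group.
// Not shotover's or any driver's API; it only illustrates at-least-once redelivery.

struct Partition {
    records: Vec<&'static str>,
}

struct GroupState {
    committed_offset: usize, // offset the group resumes from after a consumer is replaced
}

struct Consumer<'a> {
    partition: &'a Partition,
    position: usize, // next offset this consumer instance will fetch (in memory only)
}

impl<'a> Consumer<'a> {
    // A new consumer instance starts from the group's last committed offset.
    fn join(partition: &'a Partition, group: &GroupState) -> Self {
        Consumer { partition, position: group.committed_offset }
    }

    // Return the next record, advancing only the in-memory position.
    fn poll(&mut self) -> Option<&'static str> {
        let record = self.partition.records.get(self.position).copied();
        if record.is_some() {
            self.position += 1;
        }
        record
    }

    // Persist the in-memory position so a replacement consumer skips these records.
    fn commit(&self, group: &mut GroupState) {
        group.committed_offset = self.position;
    }
}

/// Consume `n` records, optionally commit, then simulate a crash and let a
/// replacement consumer drain the partition. Returns every delivered record.
fn run(commit_before_crash: bool, n: usize) -> Vec<&'static str> {
    let partition = Partition { records: vec!["a", "b", "c"] };
    let mut group = GroupState { committed_offset: 0 };
    let mut delivered = Vec::new();

    let mut first = Consumer::join(&partition, &group);
    for _ in 0..n {
        delivered.extend(first.poll());
    }
    if commit_before_crash {
        first.commit(&mut group);
    }
    drop(first); // simulated crash: uncommitted progress is lost

    let mut replacement = Consumer::join(&partition, &group);
    while let Some(record) = replacement.poll() {
        delivered.push(record);
    }
    delivered
}

fn main() {
    // Without a commit, "a" and "b" are delivered twice.
    println!("{:?}", run(false, 2)); // ["a", "b", "a", "b", "c"]
    // Committing before the crash means each record is delivered once.
    println!("{:?}", run(true, 2)); // ["a", "b", "c"]
}
```

In the test this corresponds to committing on the consumer before the shotover nodes are killed, so that any reconnect or rebalance afterwards resumes past the records the test has already asserted on.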