Add additional error handling for txn to increase stability #16

Open
wants to merge 1 commit into
base: main

ballard26

Errors such as OperationNotAttempted and InvalidTxnState can be pretty common while doing transactions on clusters whose nodes are being started and stopped randomly. This PR attempts to recover a producer from these errors.
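
As a rough illustration of the kind of recovery described above (not the code in this PR), the sketch below retries a transaction after aborting it whenever one of the two named errors is seen. The sentinel errors, `isRecoverable`, and `runTxnWithRecovery` are hypothetical names invented for this example; a real producer would map them onto whatever error values its Kafka client returns.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the broker errors named in the description.
// A real client library would expose its own error values.
var (
	ErrOperationNotAttempted = errors.New("OPERATION_NOT_ATTEMPTED")
	ErrInvalidTxnState       = errors.New("INVALID_TXN_STATE")
)

// isRecoverable reports whether err is one of the errors this sketch retries.
func isRecoverable(err error) bool {
	return errors.Is(err, ErrOperationNotAttempted) || errors.Is(err, ErrInvalidTxnState)
}

// runTxnWithRecovery runs txn up to maxAttempts times (maxAttempts >= 1).
// After a recoverable failure it calls abort to reset the producer before
// retrying. It returns the number of attempts made and the final error.
func runTxnWithRecovery(txn, abort func() error, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = txn()
		if err == nil {
			return attempt, nil
		}
		if !isRecoverable(err) {
			// Unknown errors are surfaced immediately instead of retried.
			return attempt, err
		}
		if abortErr := abort(); abortErr != nil {
			return attempt, fmt.Errorf("abort after %v failed: %w", err, abortErr)
		}
	}
	return maxAttempts, fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	// Simulated transaction that fails twice while a node restarts.
	txn := func() error {
		calls++
		if calls < 3 {
			return ErrInvalidTxnState
		}
		return nil
	}
	abort := func() error { return nil }

	attempts, err := runTxnWithRecovery(txn, abort, 5)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Bounding the number of attempts keeps a producer from spinning forever when the cluster never recovers; whether a bound is appropriate here is an assumption of the sketch.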

jcsp commented Dec 1, 2022

Can you add error counters in the producer status struct?

I suspect there will be some transaction tests that expect no such errors, and some that will expect errors.

As I understand it, the transaction code should cope gracefully with intentional node shutdowns (where leaderships are transferred away first), but is allowed to hit an error when nodes unexpectedly stop. So each test case should know which kind of restart it is doing, and be tolerant or intolerant of errors accordingly.
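
One possible shape for such counters, sketched under assumptions: the `ProducerStatus` struct, its `TxnErrors` field, and the `Check` method are hypothetical and not taken from the project. Each test would pass `tolerateErrors` according to the kind of restart it performs.

```go
package main

import "fmt"

// ProducerStatus is a hypothetical status struct; the field names are
// assumptions for illustration only.
type ProducerStatus struct {
	Sent      int64
	Acked     int64
	TxnErrors map[string]int64 // per-error-name counters
}

// RecordTxnError increments the counter for the named error.
// This sketch is not safe for concurrent use; a real producer would
// need a mutex or atomic counters.
func (s *ProducerStatus) RecordTxnError(name string) {
	if s.TxnErrors == nil {
		s.TxnErrors = make(map[string]int64)
	}
	s.TxnErrors[name]++
}

// TotalTxnErrors sums all recorded transaction errors.
func (s *ProducerStatus) TotalTxnErrors() int64 {
	var total int64
	for _, n := range s.TxnErrors {
		total += n
	}
	return total
}

// Check returns an error if any transaction errors were recorded and the
// calling test does not tolerate them.
func (s *ProducerStatus) Check(tolerateErrors bool) error {
	if total := s.TotalTxnErrors(); !tolerateErrors && total > 0 {
		return fmt.Errorf("saw %d transaction errors but none were expected", total)
	}
	return nil
}

func main() {
	var status ProducerStatus
	status.RecordTxnError("InvalidTxnState")
	status.RecordTxnError("OperationNotAttempted")

	// A test that kills nodes abruptly would tolerate errors.
	fmt.Println("unclean restarts:", status.Check(true))
	// A test that only does graceful shutdowns would not.
	fmt.Println("graceful restarts:", status.Check(false))
}
```

Keeping a counter per error name, rather than a single total, lets a test assert on specific errors if that turns out to be useful.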

ballard26

Yeah, I'll add an error counter. I'll talk to Bharath about whether this behavior is always expected. If not, I'll add a flag to enable the new behavior.
