Bug Fix #1135 BZ 2214288 | Remove Printing inside rpc_WS to Avoid Race Condition #1139

Merged
merged 1 commit into from
Jun 18, 2023

Contributor

@shirady shirady commented Jun 18, 2023

Explain the changes

  1. Remove the printing inside rpc_ws.go to avoid a race condition (there was already a plan to refactor and remove it in RPC WS refactor #811).

Some details about the investigation:

  • PendingRequests is a map inside RPCConnWS.
  • Two goroutines are relevant here: SendPings() and ReadMessages(), which calls HandleResponse().
  • HandleResponse() takes a lock around the code that deletes an element from the map, but a concurrent goroutine can send a ping at the same time and, as part of that, print RPCConnWS, which contains the map.
  • The result can be a concurrent map write (the delete inside HandleResponse()) and a map iteration (the print inside the ping reads the map).
  • To help with the investigation, we added a print of each property separately before the deep print (%+v of the RPCConnWS struct). On the next operator crash, the error message changed to fatal error: concurrent map iteration and map write.
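
The race described above can be sketched in a few lines of Go. This is a minimal illustration, not the operator's code: the `conn` type, its fields, and the `handleResponse`, `unsafeDescribe`, `pendingCount` and `run` names are all hypothetical stand-ins for RPCConnWS, PendingRequests and the two goroutines. The fix in this PR was simply to remove the print; the locked read shown here is only one alternative for code that still needs to inspect the map.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for RPCConnWS (names are assumptions).
type conn struct {
	mu      sync.Mutex
	pending map[uint64]string // stand-in for PendingRequests
}

// handleResponse deletes a finished request while holding the lock,
// mirroring the locked delete described for HandleResponse().
func (c *conn) handleResponse(id uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.pending, id)
}

// unsafeDescribe mirrors the kind of print that was removed: formatting the
// map with %+v iterates over it WITHOUT the lock, so calling it concurrently
// with handleResponse is a data race and can crash the process.
// It is intentionally never called in this sketch.
func (c *conn) unsafeDescribe() string {
	return fmt.Sprintf("%+v", c.pending)
}

// pendingCount reads the map under the same lock, so it is safe to call
// from another goroutine.
func (c *conn) pendingCount() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.pending)
}

// run fills the map with n requests, then lets one goroutine delete them
// while another reads concurrently. It returns the number left pending.
func run(n int) int {
	c := &conn{pending: make(map[uint64]string)}
	for i := 0; i < n; i++ {
		c.pending[uint64(i)] = "req"
	}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // plays the role of ReadMessages() -> HandleResponse()
		defer wg.Done()
		for i := 0; i < n; i++ {
			c.handleResponse(uint64(i))
		}
	}()
	go func() { // plays the role of SendPings(), using the locked read only
		defer wg.Done()
		for i := 0; i < n; i++ {
			_ = c.pendingCount()
		}
	}()
	wg.Wait()
	return c.pendingCount()
}

func main() {
	fmt.Println("pending after run:", run(1000))
}
```

Swapping `pendingCount` for `unsafeDescribe` in the second goroutine reproduces the unsynchronized read; running that variant with `go run -race` should flag it even on runs where the runtime does not abort.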

Issues: Fixed #1135 , BZ 2214288

  1. Both issues report restarts of the operator pod with the same stack trace.

Testing Instructions:

  1. Deploy Noobaa On Minikube (Instructions are in draft Doc | Deploy Noobaa On Minikube or Rancher Desktop (For Developers) #1097).
  2. See if there are any restarts in the operator pod when running kubectl get pods.
  • Doc added/updated
  • Tests added

@shirady shirady merged commit a4d35ac into noobaa:master Jun 18, 2023
@shirady shirady deleted the remove-printings branch June 19, 2023 05:56
@guymguym
Member

A classic Heisenbug !!!

Heisenbugs occur because common attempts to debug a program, such as inserting output statements or running it with a debugger, usually have the side-effect of altering the behavior of the program in subtle ways, such as changing the memory addresses of variables and the timing of its execution.

https://en.wikipedia.org/wiki/Heisenbug

@shirady @nimrod-becker @romayalon

@shirady shirady self-assigned this Jun 28, 2023
Development

Successfully merging this pull request may close these issues.

NooBaa operator pod is in panic sometimes :)
3 participants