-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
align with grpc base/balancer to trigger reconnect in Idle state (when we move to gRPC 1.41+) #335
Merged
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
a7a4fce
build(deps): bump github.com/prometheus/client_golang
dependabot[bot] 9a244c0
align with grpc base/balancer to trigger reconnect in Idle state
ddowker 428e6ab
Merge branch 'master' into broker-restart-reconnect
ddowker bb9682e
Merge remote-tracking branch 'upstream/dependabot/go_modules/github.c…
ddowker 048cd39
Merge remote-tracking branch 'origin/master' into broker-restart-reco…
ddowker File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it important to return here, where we don't return with the Connect() call below?
As far as I can tell, the purpose is to not call UpdateState below. Is that true, and why would it a problem to do so?
(I'm trying to figure out if this code block could be eliminated, and instead have the check below be the only place where Connect() is called).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the core problem (as I understand it) is that Idle sub-connections won't actually re-connect unless
sc.Connect()
is explicitly called, right?Was this a change in gRPC behavior? I'm confused how this isn't a problem we've experienced ourselves. For that matter, help me understand: how does a previously-active SubConn come to be Idle again (my very old recollection is they bounce between failure and
Connecting
)?In any case, since the check below will cause Connect to be called, is there a particular reason that we don't want to update
connState[sc]
if it'sstate.Idle
now?Put another way, would the root issue would be resolve with just the addition below of:
Almost certainly I'm missing something 🤷 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem. I am going to re-look into the changes that occurred in gRPC over time, as it has been updated in our master a couple of times, to see if it plays any part. Agree that it is strange it has not popped up in your deployment.
A lot of my investigation was the evolution (via blame) in https://github.com/grpc/grpc-go/blob/master/balancer/base/balancer.go#L180 as we appeared to be lined up pretty closely with the overall pattern of UpdateSubConnState() but they have been tweaking it over time.
I will get back with hopefully good answers to your questions and your simplification may be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looking but i think this gRPC PR grpc/grpc-go@03268c8#diff-f9772420643575b997f617c6d4c1934aaa26f057042c7fa71a521ef5bb2af253 is the one where gRPC have introduced the transition to the Idle state and adjusted their own example loadbalancer to work with that.
In their example balancer/base/balancer.go they were returning in the top check for Idle state and bypassing the call to UpdateState() so i followed that. I can dig deeper on that to see why that may or may not make a difference.
In looking deeper i think this change above may be in a later version of gRPC than the gazette repo is using (as gazette go.mod shows v1.40.0). Our application mono-repo uses gRPC in a lot of services and our go.mod is at v.1.52.0. It may explain why you do not see this issue and we do (in our consumers). Let me confirm the exact version that the commit above shows up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the gRPC change above was for the 1.41.0 release which the gazette repo does not yet use. They spell out the behavior changes here: https://github.com/grpc/grpc-go/releases/tag/v1.41.0 eventhough the commit i link to is not explicitly called out it follows the
balancer
#4613 PR they mention.So my PR may be premature for wider release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will talk to Michael when he comes back next week to see if we should close this PR to master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgraettinger i guess these(or similar) changes will be required when gRPC is upgraded to 1.41+. For now i changed the title to reflect this. I will leave open for now but will close if you think that is best.