align with grpc base/balancer to trigger reconnect in Idle state (when we move to gRPC 1.41+) #335

Merged
merged 5 commits into from
Nov 21, 2023

@ddowker (Contributor) commented Mar 16, 2023:

When a k8s rolling restart occurs (e.g. kubectl rollout restart deployment gazette), the brokers' IP addresses change and the consumers need to be restarted to re-establish their gRPC connections. Without a restart, the consumers' default service SubConns wait forever in the Idle connectivity state, with their open Read RPCs stalled.

This change adds Idle-state handling to the dispatcher's UpdateSubConnState method, aligning it with the gRPC base balancer's implementation (https://github.com/grpc/grpc-go/blob/master/balancer/base/balancer.go#L180), which calls Connect() on a SubConn that is in the Idle state.

With this change, connectivity from the consumers to the brokers is re-established after the brokers restart. However, as outlined below, the new connections often go via the default service SubConn and so do not honor the intra-zone routing preferences, because Pick is called without a Route. The change does not appear to affect how SubConns are otherwise set up, and it improves robustness under restarts (though it is not perfect from a zone perspective).
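A minimal, self-contained Go sketch of the Idle handling described above follows. It is a simplified model, not gazette's dispatcher and not grpc-go's types: State, SubConn, dispatcher, pickerUpdates and fakeSubConn are hypothetical stand-ins, and the pickerUpdates counter replaces the call to cc.UpdateState. The structure follows the pattern in grpc-go's balancer/base/balancer.go as we understand it: an Idle SubConn is always kicked with Connect(), but a SubConn already in TransientFailure that reports Connecting or Idle is neither recorded nor allowed to trigger a picker update, so the aggregate state does not keep flipping back to Connecting.

```go
package main

import (
	"fmt"
	"sync"
)

// State is a hypothetical stand-in for grpc-go's connectivity.State.
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
	Shutdown
)

// SubConn is a hypothetical stand-in for balancer.SubConn; only Connect is modeled.
type SubConn interface{ Connect() }

// dispatcher is a simplified, hypothetical model of the dispatcher's state.
type dispatcher struct {
	mu            sync.Mutex
	connState     map[SubConn]State
	pickerUpdates int // counts calls that would go to cc.UpdateState
}

// UpdateSubConnState mirrors the shape of the base balancer's method.
func (d *dispatcher) UpdateSubConnState(sc SubConn, s State) {
	d.mu.Lock()
	defer d.mu.Unlock()

	old, ok := d.connState[sc]
	if !ok {
		return // Unknown SubConn: ignore.
	}
	if old == TransientFailure && (s == Connecting || s == Idle) {
		// Stay "sticky" in TransientFailure, but still kick an Idle SubConn.
		if s == Idle {
			sc.Connect()
		}
		return // Neither record the new state nor update the picker.
	}
	d.connState[sc] = s
	switch s {
	case Idle:
		sc.Connect() // Without this, an Idle SubConn never reconnects.
	case Shutdown:
		delete(d.connState, sc)
	}
	d.pickerUpdates++ // Stand-in for regenerating the picker via cc.UpdateState.
}

// fakeSubConn counts Connect calls.
type fakeSubConn struct{ connects int }

func (f *fakeSubConn) Connect() { f.connects++ }

func main() {
	sc := &fakeSubConn{}
	d := &dispatcher{connState: map[SubConn]State{sc: Ready}}
	d.UpdateSubConnState(sc, Idle)             // connection lost after a broker restart
	d.UpdateSubConnState(sc, TransientFailure) // reconnect attempt fails
	d.UpdateSubConnState(sc, Idle)             // sticky branch: Connect, then early return
	fmt.Println(sc.connects, d.connState[sc] == TransientFailure, d.pickerUpdates)
}
```

Running main prints 2 true 2: both Idle reports trigger Connect(), but only the first one is recorded and updates the picker.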

Prior to the change:

After initial system bring-up, or after restarting the consumers, gRPC connections follow the expected intra-zone routing (in the case below, the consumer is in zone us-central1-b):

2023/03/16 04:25:08.256374	111.300601	/protocol.Journal/Read
04:25:08.256376	 .     2	... RPC: to 10.40.1.42:8080 deadline:none
04:25:08.256382	 .     6	... Pick(Route: members:<zone:"us-central1-a" suffix:"gazette-756b95d7d4-jbr4k" > members:<zone:"us-central1-b" suffix:"gazette-756b95d7d4-hbzqx" > members:<zone:"us-central1-c" suffix:"gazette-756b95d7d4-htr8b" > primary:2 endpoints:"http://10.40.2.46:8080" endpoints:"http://10.40.1.42:8080" endpoints:"http://10.40.0.34:8080" , ID: ) => zone:"us-central1-b" suffix:"gazette-756b95d7d4-hbzqx" (READY)
04:25:08.256405	 .    23	... sent: journal:"arize-edge/records/part=000" offset:6687 block:true do_not_proxy:true

After kubectl rollout restart deployment gazette, the open Read RPCs remain in the Idle state. Sometimes a few of the SubConns (~25% in my case) do make it through the transition. Those that do were found to have received a NOT_JOURNAL_BROKER response in the Read method of broker/client/reader.go, which caused them to pass a full Route to Pick (not shown below, since in this run all Read RPCs went Idle). This non-empty Route is often a mix of new and old broker information, similar to #215, though that may have been a transient situation at the moment Pick was called.

2023/03/16 04:30:36.264197	300.525850	/protocol.Journal/Read
04:30:36.264201	 .     4	... RPC: to <nil> deadline:none
04:30:36.264208	 .     7	... Pick(Route: primary:-1 , ID: ) => (IDLE)
2023/03/16 04:30:36.246371	300.543793	/protocol.Journal/Read
04:30:36.246375	 .     4	... RPC: to <nil> deadline:none
04:30:36.246383	 .     8	... Pick(Route: primary:-1 , ID: ) => (IDLE)
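The NOT_JOURNAL_BROKER path described above can be modeled roughly as follows. This is a hypothetical sketch, not the code in broker/client/reader.go: Status, Route, ReadResponse and nextRoute are illustrative names, and it assumes the response carries the broker's current route for the journal, which the reader then hands to the next Pick. A Read that never receives NOT_JOURNAL_BROKER keeps its empty route (primary:-1 in the traces above), so Pick has no members to choose from.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for the broker's response status.
type Status string

const (
	StatusOK               Status = "OK"
	StatusNotJournalBroker Status = "NOT_JOURNAL_BROKER"
)

// Route is a simplified, hypothetical route: member suffixes plus primary index.
type Route struct {
	Members []string
	Primary int // -1 when no primary is known
}

// ReadResponse is a hypothetical response carrying the broker's view of the route.
type ReadResponse struct {
	Status Status
	Route  Route
}

// nextRoute returns the route to pass to Pick on the next Read attempt.
// Only a NOT_JOURNAL_BROKER response carrying a non-empty route replaces it.
func nextRoute(current Route, resp ReadResponse) Route {
	if resp.Status == StatusNotJournalBroker && len(resp.Route.Members) > 0 {
		return resp.Route
	}
	return current
}

func main() {
	empty := Route{Primary: -1}
	full := Route{
		Members: []string{"gazette-756b95d7d4-jbr4k", "gazette-756b95d7d4-hbzqx", "gazette-756b95d7d4-htr8b"},
		Primary: 2,
	}
	fmt.Println(len(nextRoute(empty, ReadResponse{Status: StatusOK}).Members))
	fmt.Println(len(nextRoute(empty, ReadResponse{Status: StatusNotJournalBroker, Route: full}).Members))
}
```

Under these assumptions main prints 0 and then 3: only the redirected Read ends up with members for Pick to choose from.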

After the change:

(Ignore the extra DJD traces, which dump the dispatcher's structures.)

Again, after initial system bring-up or after restarting the consumers, gRPC connections follow the expected intra-zone routing (in the case below, the consumer is in zone us-central1-c):

2023/03/16 03:27:01.396172	517.010541	/protocol.Journal/Read
03:27:01.396176	 .     4	... RPC: to 10.40.0.20:8080 deadline:none
03:27:01.396181	 .     4	... DJD Pick idConn: map[{Zone: Suffix:}:{subConn:0xc00b9c9608 mark:3}], connID: map[0xc00b9c9608:{Zone: Suffix:}], connState: map[0xc00b9c9608:READY], zone: us-central1-c
03:27:01.396188	 .     8	... DJD1 id: {Zone:us-central1-c Suffix:gazette-756b95d7d4-4f2hk}
03:27:01.396191	 .     3	... Pick(Route: members:<zone:"us-central1-a" suffix:"gazette-756b95d7d4-mq8kf" > members:<zone:"us-central1-b" suffix:"gazette-756b95d7d4-lkvkk" > members:<zone:"us-central1-c" suffix:"gazette-756b95d7d4-4f2hk" > primary:2 endpoints:"http://10.40.2.22:8080" endpoints:"http://10.40.1.23:8080" endpoints:"http://10.40.0.20:8080" , ID: ) => zone:"us-central1-c" suffix:"gazette-756b95d7d4-4f2hk" (READY)
03:27:01.396192	 .      	... DJD11
03:27:01.396228	 .    37	... sent: journal:"arize-edge/records/part=000" offset:5201 block:true do_not_proxy:true
03:28:47.935838	106.539610	... recv: offset:4458 write_head:5201 fragment:<journal:"arize-edge/records/part=000" begin:4458 end:520
03:28:47.935888	 .    50	... recv: offset:4458 write_head:5201 fragment:<journal:"arize-edge/records/part=000" begin:4458 end:520

After kubectl rollout restart deployment gazette, the open Read RPCs now reattach to the brokers.
A few of them went through the NOT_JOURNAL_BROKER handling, were given a non-empty route, and reattached directly to a broker pod. Note that in the case below, the first route entry passed to Pick (gazette-756b95d7d4-mq8kf) is actually in the process of being replaced, but it is not chosen because of its zone. I think an RPC can cross zones if the new broker for the consumer's zone is not yet in the route when Pick is called.

2023/03/16 03:38:00.196490	6.145368	/protocol.Journal/Read
03:38:00.196494	 .     5	... RPC: to 10.40.0.23:8080 deadline:none
03:38:00.196501	 .     7	... DJD Pick idConn: map[{Zone: Suffix:}:{subConn:0xc00b9c9608 mark:25} {Zone:us-central1-c Suffix:gazette-574d6db9d6-mnwrs}:{subConn:0xc00b490648 mark:24}], connID: map[0xc00b490648:{Zone:us-central1-c Suffix:gazette-574d6db9d6-mnwrs} 0xc00b9c9608:{Zone: Suffix:}], connState: map[0xc00b490648:READY 0xc00b9c9608:READY], zone: us-central1-c
03:38:00.196509	 .     8	... DJD1 id: {Zone:us-central1-c Suffix:gazette-574d6db9d6-mnwrs}
03:38:00.196510	 .     1	... DJD2
03:38:00.196919	 .    44	... (7 events discarded)
03:38:00.197589	 .   670	... DJD Pick idConn: map[{Zone: Suffix:}:{subConn:0xc00b9c9608 mark:25} {Zone:us-central1-c Suffix:gazette-574d6db9d6-mnwrs}:{subConn:0xc00b490648 mark:24}], connID: map[0xc00b490648:{Zone:us-central1-c Suffix:gazette-574d6db9d6-mnwrs} 0xc00b9c9608:{Zone: Suffix:}], connState: map[0xc00b490648:READY 0xc00b9c9608:READY], zone: us-central1-c
03:38:00.197593	 .     4	... DJD1 id: {Zone:us-central1-c Suffix:gazette-574d6db9d6-mnwrs}
03:38:00.197594	 .     1	... Pick(Route: members:<zone:"us-central1-a" suffix:"gazette-756b95d7d4-mq8kf" > members:<zone:"us-central1-b" suffix:"gazette-574d6db9d6-8j575" > members:<zone:"us-central1-c" suffix:"gazette-574d6db9d6-mnwrs" > endpoints:"http://10.40.2.22:8080" endpoints:"http://10.40.1.25:8080" endpoints:"http://10.40.0.23:8080" , ID: ) => zone:"us-central1-c" suffix:"gazette-574d6db9d6-mnwrs" (READY)
03:38:00.197594	 .      	... DJD11
03:38:00.197645	 .    51	... sent: journal:"arize-edge/pre-production-records/part=000" block:true do_not_proxy:true
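The cross-zone case described above can be illustrated with a small member-selection function. This is a hypothetical sketch of one plausible preference order (a READY member in the consumer's zone, else the route's primary, else the first member), assumed for illustration; ProcessID and pickMember are illustrative names, and the real dispatcher may order its fallbacks differently.

```go
package main

import "fmt"

// ProcessID is a hypothetical stand-in for a route member's zone and suffix.
type ProcessID struct{ Zone, Suffix string }

// pickMember returns the index of the route member to dispatch to, or -1 when
// the route is empty (the caller then uses the default service SubConn).
// Assumed preference order: a READY member in localZone, then the primary,
// then the first member.
func pickMember(members []ProcessID, primary int, localZone string, ready map[ProcessID]bool) int {
	for i, m := range members {
		if m.Zone == localZone && ready[m] {
			return i
		}
	}
	if primary >= 0 && primary < len(members) {
		return primary
	}
	if len(members) > 0 {
		return 0
	}
	return -1
}

func main() {
	members := []ProcessID{
		{"us-central1-a", "gazette-756b95d7d4-mq8kf"},
		{"us-central1-b", "gazette-574d6db9d6-8j575"},
		{"us-central1-c", "gazette-574d6db9d6-mnwrs"},
	}
	ready := map[ProcessID]bool{members[2]: true}

	// Local-zone broker present and READY: the RPC stays in us-central1-c.
	fmt.Println(members[pickMember(members, 0, "us-central1-c", ready)].Zone)
	// Replacement us-central1-c broker not yet in the route: falls back to the primary's zone.
	fmt.Println(members[pickMember(members[:2], 1, "us-central1-c", ready)].Zone)
}
```

The second call prints us-central1-b, which is the kind of cross-zone dispatch that can happen while the local zone's new broker is missing from the route.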

Most of the open Read RPCs, however, call Pick with no route and connect via the default service SubConn (a 10.44.x.x address), so no intra-zone routing is applied.

2023/03/16 03:38:00.835925	5.504142	/protocol.Journal/Read
03:38:00.835928	 .     2	... RPC: to 10.44.12.190:8080 deadline:none
03:38:00.835931	 .     3	... DJD Pick idConn: map[{Zone: Suffix:}:{subConn:0xc00b9c9608 mark:25} {Zone:us-central1-c Suffix:gazette-574d6db9d6-mnwrs}:{subConn:0xc00b490648 mark:24}], connID: map[0xc00b490648:{Zone:us-central1-c Suffix:gazette-574d6db9d6-mnwrs} 0xc00b9c9608:{Zone: Suffix:}], connState: map[0xc00b490648:READY 0xc00b9c9608:READY], zone: us-central1-c
03:38:00.835933	 .     3	... DJD1 id: {Zone: Suffix:}
03:38:00.835935	 .     2	... Pick(Route: primary:-1 , ID: ) => (READY)
03:38:00.835936	 .      	... DJD11
03:38:00.835959	 .    23	... sent: journal:"arize-edge/pre-production-records/part=003" block:true do_not_proxy:true
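The DJD traces above print three dispatcher maps (idConn, connID, connState), with the zero ID {Zone: Suffix:} keying the default service SubConn. The sketch below models only the ID-to-SubConn lookup, to show why an empty route bypasses zone handling: with no members, the chosen ID is the zero value, which resolves straight to the default service SubConn. The types and the subConnFor helper are hypothetical simplifications of what the traces print, not gazette's code.

```go
package main

import "fmt"

// ProcessID is a hypothetical stand-in for the dispatcher's map key.
type ProcessID struct{ Zone, Suffix string }

// subConn is a hypothetical stand-in for balancer.SubConn.
type subConn struct{ addr string }

// dispatcher keeps simplified versions of the maps printed by the DJD traces.
type dispatcher struct {
	idConn    map[ProcessID]*subConn // zero ProcessID => default service SubConn
	connState map[*subConn]string
}

// subConnFor resolves the ID chosen by Pick to a READY SubConn, if any.
func (d *dispatcher) subConnFor(id ProcessID) (*subConn, bool) {
	sc, ok := d.idConn[id]
	if !ok || d.connState[sc] != "READY" {
		return nil, false
	}
	return sc, true
}

func main() {
	service := &subConn{addr: "10.44.12.190:8080"}
	broker := &subConn{addr: "10.40.0.23:8080"}
	local := ProcessID{Zone: "us-central1-c", Suffix: "gazette-574d6db9d6-mnwrs"}
	d := &dispatcher{
		idConn:    map[ProcessID]*subConn{{}: service, local: broker},
		connState: map[*subConn]string{service: "READY", broker: "READY"},
	}

	// Empty route: no member was chosen, so the zero ID is used.
	if sc, ok := d.subConnFor(ProcessID{}); ok {
		fmt.Println(sc.addr)
	}
	// Route with a local-zone member: resolves to that broker's SubConn.
	if sc, ok := d.subConnFor(local); ok {
		fmt.Println(sc.addr)
	}
}
```

Running main prints the service address 10.44.12.190:8080 and then the broker address 10.40.0.23:8080, matching the two kinds of "RPC: to" lines in the traces.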



dependabot[bot] and others added 2 commits on February 15, 2023 (02:41)
Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.11.0 to 1.11.1.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](prometheus/client_golang@v1.11.0...v1.11.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot]
@jgraettinger (Contributor) left a comment:

Some questions below. I knew this stuff pretty well at one point, but it's mostly left my head, and I need to be hand-held through it a bit more. Thanks! And sorry for the delay.

		sc.Connect()
	}
	d.mu.Unlock()
	return
@jgraettinger (Contributor):

Why is it important to return here, when we don't return after the Connect() call below?
As far as I can tell, the purpose is to avoid calling UpdateState below. Is that true, and why would it be a problem to do so?

(I'm trying to figure out if this code block could be eliminated, and instead have the check below be the only place where Connect() is called).

@jgraettinger (Contributor):

Also, the core problem (as I understand it) is that Idle sub-connections won't actually re-connect unless sc.Connect() is explicitly called, right?

Was this a change in gRPC behavior? I'm confused how this isn't a problem we've experienced ourselves. For that matter, help me understand: how does a previously-active SubConn come to be Idle again (my very old recollection is they bounce between failure and Connecting)?

In any case, since the check below will cause Connect to be called, is there a particular reason that we don't want to update connState[sc] if it's state.Idle now?

Put another way, would the root issue be resolved by just adding the following below:

	if state.ConnectivityState == connectivity.Idle {
		sc.Connect()
	}

Almost certainly I'm missing something 🤷 .

@ddowker (Contributor, Author):

No problem. I am going to look again at the changes in gRPC over time, since it has been updated in our master a couple of times, to see whether that plays any part. I agree it is strange that this has not come up in your deployment.

Much of my investigation followed the evolution (via blame) of https://github.com/grpc/grpc-go/blob/master/balancer/base/balancer.go#L180, since our UpdateSubConnState() appeared to follow its overall pattern closely, but they have been tweaking it over time.

I will get back to you with (hopefully) good answers to your questions; your simplification may well be valid.

@ddowker (Contributor, Author) commented Apr 17, 2023:

Still looking, but I think this gRPC change, grpc/grpc-go@03268c8#diff-f9772420643575b997f617c6d4c1934aaa26f057042c7fa71a521ef5bb2af253, is the one where gRPC introduced the transition to the Idle state and adjusted their own example load balancer to work with it.

In their example balancer/base/balancer.go, they return early from the top check on the Idle state, bypassing the call to UpdateState(), so I followed that. I can dig deeper to see why that may or may not make a difference.

Looking deeper, I think this change may be in a later version of gRPC than the gazette repo is using (gazette's go.mod shows v1.40.0). Our application mono-repo uses gRPC in many services, and our go.mod is at v1.52.0. That may explain why we see this issue (in our consumers) and you do not. Let me confirm the exact version in which the commit above first appears.

@ddowker (Contributor, Author) commented Apr 17, 2023:

It seems the gRPC change above shipped in the 1.41.0 release, which the gazette repo does not yet use. They spell out the behavior changes here: https://github.com/grpc/grpc-go/releases/tag/v1.41.0. Even though the commit I linked is not explicitly called out, it follows the balancer PR #4613 they mention.

So my PR may be premature for wider release.
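The behavior change referenced above can be summarized with a toy state model. This is a simplified, hypothetical sketch: afterConnectionLoss and balancerReacts are illustrative names, the pre-1.41 transition to CONNECTING follows the recollection earlier in this thread rather than a check of grpc-go's code, and the real transport state machine has more states. It shows why the traces taken before this change end in Pick => (IDLE): from 1.41, a SubConn whose connection drops stays IDLE until the balancer calls Connect().

```go
package main

import "fmt"

// State is a hypothetical stand-in for connectivity.State.
type State string

const (
	Idle       State = "IDLE"
	Connecting State = "CONNECTING"
)

// afterConnectionLoss models where a READY SubConn lands when its transport
// drops, given the grpc-go minor version (1.x). Simplified and assumed.
func afterConnectionLoss(grpcMinor int) State {
	if grpcMinor >= 41 {
		return Idle // waits for the balancer to call Connect()
	}
	return Connecting // assumed: older releases reconnected without the balancer
}

// balancerReacts applies the balancer's reaction: calling Connect() on an
// Idle SubConn starts a new connection attempt; otherwise nothing changes.
func balancerReacts(s State, connectsOnIdle bool) State {
	if s == Idle && connectsOnIdle {
		return Connecting
	}
	return s
}

func main() {
	for _, minor := range []int{40, 52} {
		s := afterConnectionLoss(minor)
		fmt.Printf("v1.%d: without Connect: %s, with Connect: %s\n",
			minor, balancerReacts(s, false), balancerReacts(s, true))
	}
}
```

With gazette's v1.40.0 both columns read CONNECTING, while at v1.52.0 the SubConn stays IDLE unless the balancer calls Connect(), which is what the Idle handling in this PR adds.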

@ddowker (Contributor, Author):

I will talk to Michael when he is back next week about whether we should close this PR against master.

@ddowker (Contributor, Author):

@jgraettinger I guess these (or similar) changes will be required when gRPC is upgraded to 1.41+. For now, I changed the title to reflect this. I will leave it open for now, but will close it if you think that is best.

@ddowker changed the title from "align with grpc base/balancer to trigger reconnect in Idle state" to "align with grpc base/balancer to trigger reconnect in Idle state (when we move to gRPC 1.41+)" on Apr 27, 2023
@ddowker (Contributor, Author) commented Nov 6, 2023:

Just an FYI: only the dispatcher.go changes relate to this PR. We polluted this PR somewhat by merging open PR #331 (go.mod, go.sum) into our fork, as we also needed it in our product. That should have been done on a separate branch of our fork.

@jgraettinger (Contributor):

Thanks, and sorry for the long delay here. We've been testing a further PR that builds on this one to update gRPC to 1.59. We're planning to land both this PR and that one on Monday.

@ddowker (Contributor, Author) commented Nov 17, 2023:

Thanks for the heads up.

@mdibaiee merged commit def1636 into gazette:master on Nov 21, 2023
3 participants