Draft: Add member replace e2e test to repro issue 17052 #17100
Hi @ZhouJianMS. Thanks for your PR. I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Signed-off-by: ZhouJianMS <[email protected]>
I reproduced two issues based on this PR,
When I use
time.Sleep(etcdserver.HealthInterval)

t.Logf("Removing member %s", memberName)
_, err = c.MemberRemove(ctx, memberID)
If the client selects the member that is about to be removed, the call might return an error because that server stops.
The client won't retry if the error is code = Unavailable.
I followed the guide at https://etcd.io/docs/v3.5/tutorials/how-to-deal-with-membership/ to remove the member. Which statement is correct?
I ran into this issue once in my local environment.
etcd/server/etcdserver/server.go
Line 1115 in dfdffe4
go s.stopWithDelay(10*100*time.Millisecond, fmt.Errorf("the member has been permanently removed from the cluster"))
1 second might be good enough to return the response. Never mind.
> If the client selects the member that is about to be removed, the call might return an error because that server stops.
Yes, actually there are two "issues" (one real issue, one improvement) in the test cases:
- [Issue] The client might connect to the member that will be removed, so the client might get an error response.
- [Improvement] We should use the client SDK instead of etcdctl to communicate with etcdserver directly. Otherwise, when the client fails, we can only see etcdctl's log instead of etcdserver's log.
Actually, I already updated item 1 above. Let me deliver a PR to fix it.
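One possible way to avoid the first issue, sketched here under assumptions and not necessarily how it was fixed, is to build the client's endpoint list from every member except the one about to be removed. The `member` type and `endpointsExcluding` helper below are hypothetical simplifications, not part of the etcd client API.

```go
package main

import "fmt"

// member is a hypothetical, simplified stand-in for a cluster member entry.
type member struct {
	ID         uint64
	Name       string
	ClientURLs []string
}

// endpointsExcluding returns the client URLs of all members except the one
// whose ID is removeID, so a client configured with them never sends the
// remove request to the member that is going away.
func endpointsExcluding(members []member, removeID uint64) []string {
	var eps []string
	for _, m := range members {
		if m.ID == removeID {
			continue // skip the member that will be removed
		}
		eps = append(eps, m.ClientURLs...)
	}
	return eps
}

func main() {
	members := []member{
		{ID: 1, Name: "m0", ClientURLs: []string{"http://localhost:2379"}},
		{ID: 2, Name: "m1", ClientURLs: []string{"http://localhost:22379"}},
		{ID: 3, Name: "m2", ClientURLs: []string{"http://localhost:32379"}},
	}
	// Member 2 is the one to remove; the client should only use the others.
	fmt.Println(endpointsExcluding(members, 2))
}
```

The resulting slice could then be passed as the endpoint list when constructing the client that issues the remove call.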
FYI, adding "EXPECT_DEBUG=true" gets you the detailed etcdserver log:
EXPECT_DEBUG=true go test -run TestMemberReplaceMultiple -v
Exactly, thanks for the quick investigation. Let me fix it together with the PR mentioned in #17100 (comment)
Resolved in #17119. Thanks @ZhouJianMS and @fuweid
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.