Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Add member replace e2e test to repro issue 17052 #17100

Closed
wants to merge 2 commits into from

Conversation

ZhouJianMS
Copy link
Contributor

@k8s-ci-robot
Copy link

Hi @ZhouJianMS. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Signed-off-by: ZhouJianMS <[email protected]>
Signed-off-by: ZhouJianMS <[email protected]>
@ahrtr
Copy link
Member

ahrtr commented Dec 13, 2023

I reproduced two issues based on this PR,

@fuweid
Copy link
Member

fuweid commented Dec 14, 2023

When I use taskset to use few cores during testing, the test run into the etcdserver: Peer URLs already exists. I think it might be related to txReadBuffer. That cache wasn't be updated. The batchTxBuffer only updates cache for Put. I think we can use tx.Commit during removing member.

time.Sleep(etcdserver.HealthInterval)

t.Logf("Removing member %s", memberName)
_, err = c.MemberRemove(ctx, memberID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the client selects that member which is going to be removed, it might return error because that server stops.

client won't retry if the error is code = Unavailable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I follow the guide in https://etcd.io/docs/v3.5/tutorials/how-to-deal-with-membership/ to remove member. Which saying is correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I run into this issue in my local one time.

go s.stopWithDelay(10*100*time.Millisecond, fmt.Errorf("the member has been permanently removed from the cluster"))

1 second might good enough to return response. never mind.

Copy link
Member

@ahrtr ahrtr Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the client selects that member which is going to be removed, it might return error because that server stops.

Yes, actually there are two "issues" (one real issue, one improvement) in the test cases:

  • [Issue] The client might connect to the member which will be removed, so the client might get an error response.
  • [Improvement] We should use client sdk instead of etcdctl to communicate with etcdserver directly. Otherwise when the client failed, it can only see etcdctl 's log instead of etcdserver's log

Actually I already updated the # 1 above. Let me deliver a PR to fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI. add "EXPECT_DEBUG=true" can get the detailed etcdserver's log,

EXPECT_DEBUG=true go test -run TestMemberReplaceMultiple -v

@ahrtr
Copy link
Member

ahrtr commented Dec 14, 2023

That cache wasn't be updated.

Exactly, thanks for the quick investigation. Let me fix it together with the PR mentioned in #17100 (comment)

@ahrtr
Copy link
Member

ahrtr commented Dec 14, 2023

Resolved in #17119.

Thanks @ZhouJianMS and @fuweid

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants