`/members` peer handler is not guarded with linearizable check #16687

chaochn47 · 2023-10-03T17:07:09Z

Bug report criteria

This bug report is not security related, security issues should be disclosed privately via [email protected].
This is not a support request, support requests should be raised in the etcd discussion forums.
You have read the etcd bug reporting guidelines.
Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.

What happened?

If you restart an etcd server right after a member reconfiguration and query the member list via the HTTP /members endpoint, the handler bypasses the linearizable check. The member list in the response can be stale during bootstrap (restart), when the members are restored from the v2 store and the WAL is still being replayed...

What did you expect to happen?

When a peer certificate is not used, /members should be protected by a linearizable check.
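To make the expectation concrete, here is a minimal sketch of an HTTP handler wrapped in a read barrier. It is not etcd's code: `readBarrier`, `guardLinearizable`, `membersHandler`, and the member names are hypothetical, and the barrier stands in for whatever the server would use to confirm it has caught up before serving a read.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// readBarrier is a hypothetical hook that returns nil once the server can
// serve a linearizable read, and an error otherwise.
type readBarrier func(ctx context.Context) error

// defaultTimeout bounds how long a request waits on the barrier (assumed value).
const defaultTimeout = time.Second

// guardLinearizable runs the barrier before delegating to next and answers
// 503 Service Unavailable when the barrier fails or times out.
func guardLinearizable(barrier readBarrier, timeout time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), timeout)
		defer cancel()
		if err := barrier(ctx); err != nil {
			http.Error(w, "member list not yet linearizable: "+err.Error(), http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// membersHandler serves a fixed member list as JSON (illustrative only).
func membersHandler(names []string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(names)
	})
}

// replayingBarrier models a server that is still replaying its WAL.
func replayingBarrier(context.Context) error { return errors.New("WAL replay in progress") }

// caughtUpBarrier models a server that has caught up.
func caughtUpBarrier(context.Context) error { return nil }

// statusFor issues one GET /members against h and reports the status code.
func statusFor(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/members", nil))
	return rec.Code
}

func main() {
	members := membersHandler([]string{"infra1", "infra2"})
	fmt.Println(statusFor(guardLinearizable(replayingBarrier, defaultTimeout, members))) // 503
	fmt.Println(statusFor(guardLinearizable(caughtUpBarrier, defaultTimeout, members)))  // 200
}
```

With a guard like this, a caller that hits a restarting member gets an explicit error it can retry instead of a stale list. The cost is that /members becomes unavailable whenever the barrier cannot complete.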

How can we reproduce it (as minimally and precisely as possible)?

Check out the code at https://github.com/chaochn47/etcd/tree/v3.4.20-eks.0-reproduce-member-name-mismatch
cd integration && go test -v -run TestReproduceMemberNameMismatch

Anything else we need to know?

Relevant to discussion

Restart etcd server right after member reconfiguration and query the member list via HTTP /members, its handler will bypass the linearizable check.

Actually, main also has this "issue", but it might not be a problem, for two reasons:
* `/members` is only supposed to be accessed during bootstrap by a new etcd member that has just been added to the cluster;

* `/members` is protected by the peer certificate; in other words, users shouldn't be able to access this endpoint at all.
EDIT: but I still think it may be better to guard /members with a linearizable check. Please raise a separate issue to track it, and we can discuss it separately.

#16666 (comment)

Etcd version (please run commands below)

$ etcd --version
# paste output here

$ etcdctl version
# paste output here

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response


chaochn47 · 2023-10-03T17:07:40Z

/cc @ahrtr

serathius · 2023-10-03T19:12:36Z

Quick question: why do we have a non-gRPC endpoint for listing members? Can someone do some archaeology? To me it looks like a legacy of the v2 API.

chaochn47 · 2023-10-04T02:13:03Z

From what I can tell, it is used in runtime reconfiguration (adding a new member).

etcd/server/etcdserver/bootstrap.go

Line 289 in 0f4d7a7

    
           existingCluster, gerr := GetClusterFromRemotePeers(cfg.Logger, getRemotePeerURLs(cl, cfg.Name), prt)

When the new member process starts, it needs to verify that the existing cluster ID matches the local --initial-cluster configuration, assign an ID to the new member (the ID comes from the remote peers), validate that the number of learners does not exceed the configured maximum, and so on.
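A rough sketch of those checks follows. It uses simplified, hypothetical types (`member`, `cluster`, `adoptRemoteIDs`) rather than etcd's own, and it assumes members are matched by peer URL:

```go
package main

import (
	"errors"
	"fmt"
)

// member and cluster are simplified, hypothetical types for this sketch.
type member struct {
	ID        uint64
	Name      string
	PeerURL   string
	IsLearner bool
}

type cluster struct {
	ID      uint64
	Members []member
}

var errMismatch = errors.New("local configuration does not match the existing cluster")

// adoptRemoteIDs checks the locally configured cluster against the one
// reported by remote peers, then copies the cluster ID and member IDs across.
func adoptRemoteIDs(local, remote cluster, maxLearners int) (cluster, error) {
	if len(local.Members) != len(remote.Members) {
		return cluster{}, fmt.Errorf("%w: %d local members, %d remote",
			errMismatch, len(local.Members), len(remote.Members))
	}
	byURL := make(map[string]member, len(remote.Members))
	learners := 0
	for _, m := range remote.Members {
		byURL[m.PeerURL] = m
		if m.IsLearner {
			learners++
		}
	}
	if learners > maxLearners {
		return cluster{}, fmt.Errorf("%d learners exceed the limit of %d", learners, maxLearners)
	}
	out := cluster{ID: remote.ID, Members: make([]member, 0, len(local.Members))}
	for _, m := range local.Members {
		r, ok := byURL[m.PeerURL]
		if !ok {
			return cluster{}, fmt.Errorf("%w: no remote member with peer URL %s", errMismatch, m.PeerURL)
		}
		// The IDs come from the remote view, not from local configuration.
		m.ID, m.IsLearner = r.ID, r.IsLearner
		out.Members = append(out.Members, m)
	}
	return out, nil
}

func main() {
	local := cluster{Members: []member{
		{Name: "infra1", PeerURL: "http://10.0.0.1:2380"},
		{Name: "infra2", PeerURL: "http://10.0.0.2:2380"},
	}}
	remote := cluster{ID: 42, Members: []member{
		{ID: 1, PeerURL: "http://10.0.0.1:2380"},
		{ID: 2, PeerURL: "http://10.0.0.2:2380"},
	}}
	got, err := adoptRemoteIDs(local, remote, 1)
	fmt.Println(got.ID, got.Members[1].ID, err)
}
```

In this simplified model, a remote member list that is missing the newly added member fails the length check, which is one way a stale /members response could surface during bootstrap.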

serathius · 2023-10-04T08:30:09Z

This looks like the code used to bootstrap a member joining an existing cluster, for example when increasing the cluster size. From a brief look, the call to /members is used to get the mapping between member names and IDs.

I would need to double-check how the fetching of those IDs is later synchronized with raft, but I suspect that getting the freshest data might not be necessary as long as raft is properly initialized and replayed.

As this is an internal endpoint, our only worry should be whether stale data here could impact member bootstrap. We should be very careful when adding a linearizability requirement here. It could bring more harm than benefit. First, I would want to see that we are able to find and reproduce an issue with etcd member bootstrap that could be caused by stale data returned by the /members endpoint.

chaochn47 · 2023-10-04T19:17:21Z

> It could bring more harm than benefit.

Agreed. Here is an earlier issue I opened, #14174, with a reproduction. The conclusion is in #14174 (comment).

As discussed in #14175, the feasible short-term solution is:

* Add a retry on the client side to ensure the membership reconfiguration has been applied to all members.

* Wait until the retry succeeds, then start the new member.
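The two steps above could look roughly like the sketch below. It is only an illustration: `listFunc` is a hypothetical hook for "ask this endpoint for its member list", and the endpoint and member names are made up.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// listFunc is a hypothetical hook returning the member names that a single
// endpoint currently reports.
type listFunc func(ctx context.Context, endpoint string) ([]string, error)

func contains(names []string, want string) bool {
	for _, n := range names {
		if n == want {
			return true
		}
	}
	return false
}

// waitForMember polls every endpoint until all of them report the named
// member, or until ctx is done.
func waitForMember(ctx context.Context, endpoints []string, name string, list listFunc, interval time.Duration) error {
	for {
		applied := true
		for _, ep := range endpoints {
			names, err := list(ctx, ep)
			if err != nil || !contains(names, name) {
				applied = false
				break
			}
		}
		if applied {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("member %q not visible on all endpoints: %w", name, ctx.Err())
		case <-time.After(interval):
		}
	}
}

// demo simulates endpoint "b" lagging behind for its first two polls and
// returns how many times "b" was polled.
func demo() (int, error) {
	polls := 0
	list := func(_ context.Context, ep string) ([]string, error) {
		if ep != "b" {
			return []string{"infra1", "infra2", "infra3"}, nil
		}
		polls++
		if polls < 3 {
			return []string{"infra1", "infra2"}, nil // stale view
		}
		return []string{"infra1", "infra2", "infra3"}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err := waitForMember(ctx, []string{"a", "b"}, "infra3", list, time.Millisecond)
	return polls, err
}

func main() {
	fmt.Println(demo())
}
```

Only after `waitForMember` returns nil would the new member process be started. The timeout on the context keeps the caller from waiting forever if one member never catches up.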

chaochn47 added the type/bug label Oct 3, 2023

chaochn47 changed the title ~~Guard /members peer handler with linearizable check~~ /members peer handler is not guarded with linearizable check Oct 3, 2023

chaochn47 self-assigned this Oct 3, 2023

serathius mentioned this issue Oct 4, 2023

--experimental-wait-cluster-ready-timeout causing stale response to linearizable read #16666

Closed

chaochn47 removed their assignment Oct 4, 2023

chaochn47 added the stage/tracked label Oct 4, 2023

This was referenced Dec 13, 2023

Draft: Add member replace e2e test to repro issue 17052 #17100

Closed

Commit transaction for each configuration change #17119

Closed
