Implement single node downgrades #13405
Conversation
Force-pushed 83c7227 to d3d264f
Force-pushed 0e958f3 to c2d1582
Force-pushed 7d59020 to 530ad01
	walsnap.Term = sn.Metadata.Term
	walsnap.ConfState = &sn.Metadata.ConfState
}
w, err := st.w.Reopen(st.lg, walsnap)
Why does this call need to reopen 'w' while the other calls keep working on the same WAL? It's counterintuitive that a getter-like method performs mutations.
Fixed the getter.
This is tricky, so let me know what would be the simplest way to implement it. Based on the documentation in comments, a WAL can be either in read or write mode: it starts in read mode, and once all entries are read it switches to write mode. The problem is that during etcd runtime the WAL is in write mode, but to verify the possibility of downgrades we need to switch it back to read mode.
What I did here is basically lock access to the WAL, reopen it from the last snapshot, and read all entries to make it writable again. Please let me know if there is a better way to re-read entries in the WAL (a sketch of the idea is below).
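A hedged sketch of that approach, with simplified stand-in types rather than the real server/storage/wal API: lock out writers, reopen from the last snapshot, and drain all entries so the WAL ends up writable again.

```go
// Hedged sketch only; the wal and storage types are illustrative
// stand-ins, not etcd's actual implementation.
package main

import (
	"fmt"
	"sync"
)

// wal models the two-mode behavior described above: it opens in read
// mode and only becomes appendable once every entry has been read.
type wal struct {
	entries  []string
	writable bool
}

// ReadAll drains all entries and switches the WAL into write mode.
func (w *wal) ReadAll() ([]string, error) {
	w.writable = true // all entries consumed: appendable again
	return w.entries, nil
}

type storage struct {
	mu sync.Mutex
	w  *wal
}

// reopen re-creates the WAL reader positioned at the last snapshot.
func (s *storage) reopen() *wal {
	return &wal{entries: s.w.entries}
}

// rereadEntries locks out concurrent WAL writers, reopens the WAL in
// read mode, and reads everything back so it is writable afterwards.
func (s *storage) rereadEntries() ([]string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	w := s.reopen()
	ents, err := w.ReadAll()
	if err != nil {
		return nil, err
	}
	s.w = w // swap in the reopened, now-writable WAL
	return ents, nil
}

func main() {
	s := &storage{w: &wal{entries: []string{"entry-1", "entry-2"}}}
	ents, _ := s.rereadEntries()
	fmt.Println(ents, s.w.writable) // [entry-1 entry-2] true
}
```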
This seems to be a similar problem to this method:
etcd/server/storage/wal/wal.go, line 621 in ef1f71a:
func Verify(lg *zap.Logger, walDir string, snap walpb.Snapshot) (*raftpb.HardState, error)
Maybe we can generalize it to take a 'Listener Interface' (visitor-like pattern) that either performs 'Verification' or computes the minimal version?
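A hedged sketch of what that generalization could look like; the types are simplified and hypothetical, not the real Verify code. One WAL walk accepts a visitor, and verification or minimal-version computation become two implementations of the same interface.

```go
// Illustrative sketch only, under the assumption of a single-pass walk.
package main

import "fmt"

// EntryVisitor is the "listener" suggested above: invoked once per
// decoded WAL entry during one read pass.
type EntryVisitor interface {
	VisitEntry(index uint64, data []byte) error
}

// walkWAL hands each entry to the visitor; CRC checking and decoding
// from disk (elided here) would happen inside this single loop.
func walkWAL(entries [][]byte, v EntryVisitor) error {
	for i, data := range entries {
		if err := v.VisitEntry(uint64(i), data); err != nil {
			return err
		}
	}
	return nil
}

// verifier plays the role of Verify; a version-collecting visitor would
// instead compute the minimal etcd version over the same walk.
type verifier struct{}

func (verifier) VisitEntry(index uint64, data []byte) error {
	if data == nil {
		return fmt.Errorf("corrupt entry at index %d", index)
	}
	return nil
}

func main() {
	entries := [][]byte{[]byte("e0"), []byte("e1")}
	fmt.Println(walkWAL(entries, verifier{})) // <nil>
}
```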
I agree with that, and I was already experimenting with it when working on static analysis of the WAL annotations. However, I would definitely want to keep this PR focused on downgrades and do this refactor as a separate PR.
It looks good to me. Thank you.
A few clarification questions in the comments.
We should stabilize tests (as the situation looks worse than usual) before submitting such logical changes.
Please also modify the PR description, as it's not only about the tests.
Force-pushed 87874b8 to ad31f7b
I found a deadlock in the current downgrade implementation and fixed it, so the tests should pass.
Force-pushed 307ff26 to 7a5e622
Could I please have the weekend to review this before it merges? It looks great in general; I just have not had the time to look through it completely. Thanks again for the hard work.
Force-pushed 7a5e622 to 530da33
I wonder whether flakes of the test are related. The test fails with:
@hexfusion Did you have time to take a look?
One question, otherwise lgtm.
The problem with the old code was that during a downgrade only members with the downgrade target version were allowed to join. This is unrealistic, as it doesn't handle members disconnecting and rejoining.
…her version. This is because etcd v3.5 will panic when it encounters a ClusterVersionSet entry with version >3.5.0. For downgrades to v3.5 to work, we need to make sure this entry is snapshotted.
By validating that the WAL doesn't include any incompatible entries, we can implement storage downgrades.
I ran the test alone 10 times without any failures. I don't think there is a correlation, but maybe it's also correlated with other test parameters (parallel execution with --cpu etcd).
Force-pushed 530da33 to 9d47a97
The gRPC failure looks like a flake:
tests $ go test go.etcd.io/etcd/tests/v3/integration/clientv3/lease --run TestLeaseWithRequireLeader -timeout=5m -tags cluster_proxy --race=true --cpu=4 --count 10
ok      go.etcd.io/etcd/tests/v3/integration/clientv3/lease    6.798s
tests $ go test go.etcd.io/etcd/tests/v3/integration/clientv3/lease -timeout=5m -tags cluster_proxy --race=true --cpu=4
ok      go.etcd.io/etcd/tests/v3/integration/clientv3/lease    118.937s
Thank you. Merging.
if target.LessThan(current) {
	minVersion := w.MinimalEtcdVersion()
	if minVersion != nil && target.LessThan(*minVersion) {
		return fmt.Errorf("cannot downgrade storage, WAL contains newer entries")
I see this error message in the downgrade test occasionally. See the example below:
{"level":"error","ts":"2024-12-18T09:53:14.334189Z","caller":"version/monitor.go:120","msg":"failed to update storage version","cluster-version":"3.5.0","error":"cannot downgrade storage, WAL contains newer entries","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver/version.(*Monitor).UpdateStorageVersionIfNeeded\n\tgo.etcd.io/etcd/server/v3/etcdserver/version/monitor.go:120\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).monitorStorageVersion\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2296\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).GoAttach.func1\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2477"}
The reason is that it reads the version "3.6.0" included in the ClusterVersionSet request:
2 9 norm header:<ID:7587883512398433796 > cluster_version_set:<ver:"3.6.0" >
See the code below:
etcd/server/storage/wal/version.go, lines 115 to 120 in 0966b4d:
if raftReq.ClusterVersionSet != nil {
	ver, err := semver.NewVersion(raftReq.ClusterVersionSet.Ver)
	if err != nil {
		return err
	}
	err = visitor(msg.Descriptor().FullName(), ver)
When etcd applies the ClusterVersionSet request, it forcibly creates a snapshot, but the two steps (setting the cluster version and creating the snapshot) are NOT atomic. If etcd happens to try to update the storage version in between, you will see the error message. It does no harm, but it may cause confusion to users.
etcd/server/etcdserver/apply/apply.go, lines 417 to 427 in 0966b4d:
a.cluster.SetVersion(newVersion, api.UpdateCapability, shouldApplyV3)
// Force snapshot after cluster version downgrade.
if prevVersion != nil && newVersion.LessThan(*prevVersion) {
	lg := a.lg
	if lg != nil {
		lg.Info("Cluster version downgrade detected, forcing snapshot",
			zap.String("prev-cluster-version", prevVersion.String()),
			zap.String("new-cluster-version", newVersion.String()),
		)
	}
	a.snapshotServer.ForceSnapshot()
Thanks for debugging this. You are correct that the downgrade is not atomic. Maybe we could make it a warning, since the procedure retries, and inform the user that it might happen and that we will try again.
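A minimal sketch of that suggestion, assuming a retry loop like the monitor's; the function and variable names here are hypothetical, not etcd's actual code:

```go
// Hedged sketch: treat the transient condition as a warning because the
// storage-version monitor re-checks periodically. Names are hypothetical.
package main

import (
	"fmt"

	"github.com/coreos/go-semver/semver"
)

// checkDowngradable returns true when the caller should retry later:
// the WAL still contains entries newer than the downgrade target, which
// can happen transiently between applying ClusterVersionSet and the
// forced snapshot that covers it.
func checkDowngradable(target semver.Version, walMin *semver.Version) (retry bool) {
	if walMin != nil && target.LessThan(*walMin) {
		// Warn instead of erroring out; the periodic monitor will
		// try again on its next tick.
		fmt.Printf("warning: WAL still requires etcd %s, retrying downgrade to %s later\n",
			walMin, target.String())
		return true
	}
	return false
}

func main() {
	target := semver.Version{Major: 3, Minor: 5}
	walMin := &semver.Version{Major: 3, Minor: 6}
	fmt.Println(checkDowngradable(target, walMin)) // true: retry later
}
```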
if raftReq.ClusterVersionSet != nil {
	ver, err = semver.NewVersion(raftReq.ClusterVersionSet.Ver)
	if err != nil {
		panic(err)
	}
}
@serathius You only check ClusterVersionSet here; I am curious why we don't check other message types?
If you look below at the return function, we not only check ClusterVersionSet; we also pick the max with etcdVersionFromMessage(msg). etcdVersionFromMessage checks the etcd_version proto tags on the WAL, which is meant to be the main mechanism for checking versions, based on non-empty proto fields. ClusterVersionSet is the only case where we also check the contents of the proto field.
The unique handling of ClusterVersionSet comes from the assumption, for online downgrades, that we shouldn't allow downgrading etcd if there was a period of a higher version in the WAL that was not covered by a snapshot. This is OK for online downgrades, as we snapshot as part of the process.
Thanks for the clarification.
"we shouldn't allow downgrading etcd if there was a period of higher version in WAL that was not covered by snapshot."
Please take a look at #19263, thanks.
My suggestion would be to remove this code and consider adding a separate field to the WAL for testing.
This PR implements single-node downgrades as proposed in https://docs.google.com/document/d/1yD0GDkxqWBPAax6jLZ97clwAz2Gp0Gux6xaTrtJ6wHE/edit?usp=sharing, with the goal of introducing e2e tests that can confirm that storage versioning properly validates WAL entries during downgrade.
This doesn't mean that with this PR etcd supports downgrades; there is still a lot of testing, and there are small problems we need to fix, before we can say that downgrades are safe. This is meant to allow us to expand testing of downgrades with different scenarios to confirm its reliability.
Problem detected during implementation that will need to be fixed: etcd v3.5 panics when it encounters a ClusterVersionSet entry with a version above 3.5.0, which requires running etcdutl migrate to drop this entry.
cc @ptabor @lilic