Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement single node downgrades #13405

Merged
merged 6 commits into from
Oct 29, 2021
Merged

Implement single node downgrades #13405

merged 6 commits into from
Oct 29, 2021

Conversation

serathius
Copy link
Member

This PR implement single downgrades as proposed in https://docs.google.com/document/d/1yD0GDkxqWBPAax6jLZ97clwAz2Gp0Gux6xaTrtJ6wHE/edit?usp=sharing with the goal of introducing e2e tests that can confirm that storage versioning properly validates WAL entries during downgrade.

This doesn't mean that with this PR etcd supports downgrades, there are still a lot of testing, small problems that we need to fix before we can say that downgrades are safe. This is meant to allow us to expand testing of downgrades with different scenarios to confirm it's reliability.

Problem detected during implementation that will need to be fixed:

  • As Etcd v3.5 imminently panics on SetClusterVersion with version set "3.6" entry in WAL I added it to MinEtcdVersion logic. We should consider adding logic to etcdutl migrate to drop this entry.
  • Sometimes setting ClusterVersion after upgrade timeouts, we should debug why.
  • I didn't implement snapshoting WAL after lowering cluster version (required to remove non-backward compatible entries). To work around I use very low snapshot count in tests.

cc @ptabor @lilic

@serathius serathius requested a review from ptabor October 8, 2021 12:31
@serathius serathius force-pushed the downgrade-b branch 5 times, most recently from 83c7227 to d3d264f Compare October 11, 2021 10:51
@serathius serathius mentioned this pull request Oct 11, 2021
13 tasks
@serathius serathius force-pushed the downgrade-b branch 3 times, most recently from 0e958f3 to c2d1582 Compare October 11, 2021 14:30
@serathius serathius changed the title Implement single node downgrades Implement single node downgrades tests Oct 11, 2021
@serathius serathius force-pushed the downgrade-b branch 2 times, most recently from 7d59020 to 530ad01 Compare October 14, 2021 12:14
server/etcdserver/api/membership/cluster.go Show resolved Hide resolved
tests/e2e/cluster_downgrade_test.go Outdated Show resolved Hide resolved
server/etcdserver/version/monitor.go Show resolved Hide resolved
walsnap.Term = sn.Metadata.Term
walsnap.ConfState = &sn.Metadata.ConfState
}
w, err := st.w.Reopen(st.lg, walsnap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this call needs to reopen 'w' while the other calls keep working on the same WAL ?

It's counter intuitive that a 'getter like method' is performing mutations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed getter.

This is tricky so let me know what would be simplest way to implement it. Based on documentation in comments, WAL can be either in read or write mode, it starts in read mode and when all entries are read it switches back to write mode. Problem is that during etcd runtime, WAL is in write mode, but to verify possibility of downgrades we need to switch it back to read mode.

What I did here is basically lock access to WAL, reopen it from last snapshot and read all entries to make it writable back again. Please let me know if there is a better way to reread entries in WAL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a simillar problem to this method:

func Verify(lg *zap.Logger, walDir string, snap walpb.Snapshot) (*raftpb.HardState, error) {
?

Maybe we can generalize it to let it take 'Listener Interface' (visitor like pattern) that
either performs 'Verification' or computes minimal version ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that and I was already experimenting with when working on static analysis of WAL annotation. However I would definitely want to keep this PR focused on downgrades and do this refactor as a separate PR.

Copy link
Contributor

@ptabor ptabor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. Thank you.
A few clarification questions in the comments.

We should stabilize tests (as the situation looks worse then usually) before submitting such logical changes.

Please also modify PR description as its not only about the tests.

@serathius serathius changed the title Implement single node downgrades tests Implement single node downgrades Oct 14, 2021
@serathius serathius force-pushed the downgrade-b branch 2 times, most recently from 87874b8 to ad31f7b Compare October 15, 2021 14:34
@serathius
Copy link
Member Author

I found a deadlock in current downgrade implementation, fixed it so the tests should pass.

@serathius serathius force-pushed the downgrade-b branch 8 times, most recently from 307ff26 to 7a5e622 Compare October 21, 2021 15:29
walsnap.Term = sn.Metadata.Term
walsnap.ConfState = &sn.Metadata.ConfState
}
w, err := st.w.Reopen(st.lg, walsnap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a simillar problem to this method:

func Verify(lg *zap.Logger, walDir string, snap walpb.Snapshot) (*raftpb.HardState, error) {
?

Maybe we can generalize it to let it take 'Listener Interface' (visitor like pattern) that
either performs 'Verification' or computes minimal version ?

server/etcdserver/version/monitor.go Show resolved Hide resolved
@hexfusion
Copy link
Contributor

Could I please have the weekend to review this before it merges? It looks great in general I just have not had the time to look through it completely. Thanks again for the hard work.

@ptabor
Copy link
Contributor

ptabor commented Oct 22, 2021

I wonder whether flakes of TestEndpointSwitchResolvesViolation are correlated with the change
https://github.com/etcd-io/etcd/runs/3966166854?check_suite_focus=true
or independent...

The test fails with:

...
    ordering_util_test.go:77: While speaking to partitioned leader, we should get ErrNoGreaterRev error
...

@serathius
Copy link
Member Author

@hexfusion Did you have time to take a look?

Copy link
Contributor

@hexfusion hexfusion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question otherwise lgtm

server/etcdserver/version/monitor.go Show resolved Hide resolved
Problem with old code was that during downgrade only members with
downgrade target version were allowed to join. This is unrealistic as
it doesn't handle any members to disconnect/rejoin.
…her version

This is because etcd v3.5 will panic when it encounters
ClusterVersionSet entry with version >3.5.0. For downgrades to v3.5 to
work we need to make sure this entry is snapshotted.
By validating if WAL doesn't include any incompatible entries we can
implement storage downgrades.
@serathius
Copy link
Member Author

I wonder whether flakes of TestEndpointSwitchResolvesViolation are correlated with the change https://github.com/etcd-io/etcd/runs/3966166854?check_suite_focus=true or independent...

Run test alone 10 times without any failures. Don't think there is correlation, but maybe its also correlated with other test parameters (parallel execution with --cpu etcd)

@serathius
Copy link
Member Author

Grpc failure looks like a flake

tests $ go test   go.etcd.io/etcd/tests/v3/integration/clientv3/lease --run TestLeaseWithRequireLeader -timeout=5m -tags cluster_proxy --race=true --cpu=4 --count 10
ok      go.etcd.io/etcd/tests/v3/integration/clientv3/lease     6.798s
tests $ go test   go.etcd.io/etcd/tests/v3/integration/clientv3/lease  -timeout=5m -tags cluster_proxy --race=true --cpu=4 
ok      go.etcd.io/etcd/tests/v3/integration/clientv3/lease     118.937s

@ptabor
Copy link
Contributor

ptabor commented Oct 29, 2021

Thank you. Merging.

@ptabor ptabor merged commit 6c2f5dc into etcd-io:main Oct 29, 2021
@serathius serathius deleted the downgrade-b branch June 15, 2023 20:39
if target.LessThan(current) {
minVersion := w.MinimalEtcdVersion()
if minVersion != nil && target.LessThan(*minVersion) {
return fmt.Errorf("cannot downgrade storage, WAL contains newer entries")
Copy link
Member

@ahrtr ahrtr Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this error message in downgrade test occasionally. See example below,

{"level":"error","ts":"2024-12-18T09:53:14.334189Z","caller":"version/monitor.go:120","msg":"failed to update storage version","cluster-version":"3.5.0","error":"cannot downgrade storage, WAL contains newer entries","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver/version.(*Monitor).UpdateStorageVersionIfNeeded\n\tgo.etcd.io/etcd/server/v3/etcdserver/version/monitor.go:120\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).monitorStorageVersion\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2296\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).GoAttach.func1\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2477"}

The reason is it read the version "3.6.0" included in ClusterVersionSet request,

2	         9	norm	header:<ID:7587883512398433796 > cluster_version_set:<ver:"3.6.0" >

See link below,

if raftReq.ClusterVersionSet != nil {
ver, err := semver.NewVersion(raftReq.ClusterVersionSet.Ver)
if err != nil {
return err
}
err = visitor(msg.Descriptor().FullName(), ver)

When etcd applies the ClusterVersionSet request, it will forcibly create a snapshot, but they (setting cluster version, and creating snapshot) are NOT an atomic operation. If etcd happens to try to update the storage version in between, then you will see the error message. It won't do any harm, but it may cause confusion to users.

a.cluster.SetVersion(newVersion, api.UpdateCapability, shouldApplyV3)
// Force snapshot after cluster version downgrade.
if prevVersion != nil && newVersion.LessThan(*prevVersion) {
lg := a.lg
if lg != nil {
lg.Info("Cluster version downgrade detected, forcing snapshot",
zap.String("prev-cluster-version", prevVersion.String()),
zap.String("new-cluster-version", newVersion.String()),
)
}
a.snapshotServer.ForceSnapshot()

Copy link
Member Author

@serathius serathius Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for debugging this. You are correct with the downgrade not being atomic, maybe we could make it a warning as the procedure is repeating and inform user that it might happen and we will try again.

Comment on lines +78 to +83
if raftReq.ClusterVersionSet != nil {
ver, err = semver.NewVersion(raftReq.ClusterVersionSet.Ver)
if err != nil {
panic(err)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serathius You only check ClusterVersionSet here, I am curious why we don't check other message type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look below at the return function we not only check ClusterVersionSet we pick max with etcdVersionFromMessage(msg). The etcdVersionFromMessage checks etcd_version proto tags on WAL which is meant to be the main mechanism to check version, based on non-empty proto fields. Check for ClusterVersionSet is the only case that we also check contents of the proto field.

Unique handling of ClusterVersionSet comes from assumption for online downgrade that we shouldn't allow downgrading etcd if there was a period of higher version in WAL that was not covered by snapshot. This is ok for online downgrade as we snapshot as part of the process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx for the clarification.

we shouldn't allow downgrading etcd if there was a period of higher version in WAL that was not covered by snapshot.

Please take a look at #19263, thx.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion would be to remove this code and consider adding a separate field to WAL for testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants