Implement single node downgrades #13405
Conversation
Force-pushed 83c7227 to d3d264f
Force-pushed 0e958f3 to c2d1582
Force-pushed 7d59020 to 530ad01
	walsnap.Term = sn.Metadata.Term
	walsnap.ConfState = &sn.Metadata.ConfState
}
w, err := st.w.Reopen(st.lg, walsnap)
Why does this call need to reopen 'w' while the other calls keep working on the same WAL? It's counterintuitive that a getter-like method performs mutations.
Fixed the getter.
This is tricky, so let me know what would be the simplest way to implement it. Based on the documentation in comments, a WAL can be either in read or write mode: it starts in read mode, and once all entries are read it switches to write mode. The problem is that during etcd runtime the WAL is in write mode, but to verify the possibility of downgrades we need to switch it back to read mode.
What I did here is basically lock access to the WAL, reopen it from the last snapshot, and read all entries to make it writable again. Please let me know if there is a better way to re-read entries in the WAL (a sketch of the idea is below).
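A hedged sketch of that approach, with simplified stand-in types rather than the real server/storage/wal API: lock out writers, reopen from the last snapshot, and drain all entries so the WAL ends up writable again.

```go
// Hedged sketch only; the wal and storage types are illustrative
// stand-ins, not etcd's actual implementation.
package main

import (
	"fmt"
	"sync"
)

// wal models the two-mode behavior described above: it opens in read
// mode and only becomes appendable once every entry has been read.
type wal struct {
	entries  []string
	writable bool
}

// ReadAll drains all entries and switches the WAL into write mode.
func (w *wal) ReadAll() ([]string, error) {
	w.writable = true // all entries consumed: appendable again
	return w.entries, nil
}

type storage struct {
	mu sync.Mutex
	w  *wal
}

// reopen re-creates the WAL reader positioned at the last snapshot.
func (s *storage) reopen() *wal {
	return &wal{entries: s.w.entries}
}

// rereadEntries locks out concurrent WAL writers, reopens the WAL in
// read mode, and reads everything back so it is writable afterwards.
func (s *storage) rereadEntries() ([]string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	w := s.reopen()
	ents, err := w.ReadAll()
	if err != nil {
		return nil, err
	}
	s.w = w // swap in the reopened, now-writable WAL
	return ents, nil
}

func main() {
	s := &storage{w: &wal{entries: []string{"entry-1", "entry-2"}}}
	ents, _ := s.rereadEntries()
	fmt.Println(ents, s.w.writable) // [entry-1 entry-2] true
}
```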
This seems to be a similar problem to this method:
etcd/server/storage/wal/wal.go, line 621 in ef1f71a:
func Verify(lg *zap.Logger, walDir string, snap walpb.Snapshot) (*raftpb.HardState, error)
Maybe we can generalize it to take a 'Listener Interface' (visitor-like pattern) that either performs 'Verification' or computes the minimal version?
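A hedged sketch of what that generalization could look like; the types are simplified and hypothetical, not the real Verify code. One WAL walk accepts a visitor, and verification or minimal-version computation become two implementations of the same interface.

```go
// Illustrative sketch only, under the assumption of a single-pass walk.
package main

import "fmt"

// EntryVisitor is the "listener" suggested above: invoked once per
// decoded WAL entry during one read pass.
type EntryVisitor interface {
	VisitEntry(index uint64, data []byte) error
}

// walkWAL hands each entry to the visitor; CRC checking and decoding
// from disk (elided here) would happen inside this single loop.
func walkWAL(entries [][]byte, v EntryVisitor) error {
	for i, data := range entries {
		if err := v.VisitEntry(uint64(i), data); err != nil {
			return err
		}
	}
	return nil
}

// verifier plays the role of Verify; a version-collecting visitor would
// instead compute the minimal etcd version over the same walk.
type verifier struct{}

func (verifier) VisitEntry(index uint64, data []byte) error {
	if data == nil {
		return fmt.Errorf("corrupt entry at index %d", index)
	}
	return nil
}

func main() {
	entries := [][]byte{[]byte("e0"), []byte("e1")}
	fmt.Println(walkWAL(entries, verifier{})) // <nil>
}
```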
I agree with that, and I was already experimenting with it when working on static analysis of the WAL annotations. However, I would definitely want to keep this PR focused on downgrades and do this refactor as a separate PR.
It looks good to me. Thank you.
A few clarification questions in the comments.
We should stabilize tests (as the situation looks worse than usual) before submitting such logical changes.
Please also modify the PR description, as it's not only about the tests.
Force-pushed 87874b8 to ad31f7b
I found a deadlock in the current downgrade implementation and fixed it, so the tests should pass.
Force-pushed 307ff26 to 7a5e622
Could I please have the weekend to review this before it merges? It looks great in general; I just have not had the time to look through it completely. Thanks again for the hard work.
Force-pushed 7a5e622 to 530da33
I wonder whether flakes of the test are related. The test fails with:
@hexfusion Did you have time to take a look?
One question, otherwise lgtm.
The problem with the old code was that during a downgrade only members with the downgrade target version were allowed to join. This is unrealistic, as it doesn't handle members disconnecting and rejoining.
…her version. This is because etcd v3.5 will panic when it encounters a ClusterVersionSet entry with version >3.5.0. For downgrades to v3.5 to work, we need to make sure this entry is snapshotted.
By validating that the WAL doesn't include any incompatible entries, we can implement storage downgrades.
I ran the test alone 10 times without any failures. I don't think there is a correlation, but maybe it's also correlated with other test parameters (parallel execution with --cpu etcd).
Force-pushed 530da33 to 9d47a97
The gRPC failure looks like a flake:
tests $ go test go.etcd.io/etcd/tests/v3/integration/clientv3/lease --run TestLeaseWithRequireLeader -timeout=5m -tags cluster_proxy --race=true --cpu=4 --count 10
ok      go.etcd.io/etcd/tests/v3/integration/clientv3/lease    6.798s
tests $ go test go.etcd.io/etcd/tests/v3/integration/clientv3/lease -timeout=5m -tags cluster_proxy --race=true --cpu=4
ok      go.etcd.io/etcd/tests/v3/integration/clientv3/lease    118.937s
Thank you. Merging.
if target.LessThan(current) {
	minVersion := w.MinimalEtcdVersion()
	if minVersion != nil && target.LessThan(*minVersion) {
		return fmt.Errorf("cannot downgrade storage, WAL contains newer entries")
I see this error message in the downgrade test occasionally. See the example below:
{"level":"error","ts":"2024-12-18T09:53:14.334189Z","caller":"version/monitor.go:120","msg":"failed to update storage version","cluster-version":"3.5.0","error":"cannot downgrade storage, WAL contains newer entries","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver/version.(*Monitor).UpdateStorageVersionIfNeeded\n\tgo.etcd.io/etcd/server/v3/etcdserver/version/monitor.go:120\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).monitorStorageVersion\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2296\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).GoAttach.func1\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2477"}
The reason is that it reads the version "3.6.0" included in the ClusterVersionSet request:
2 9 norm header:<ID:7587883512398433796 > cluster_version_set:<ver:"3.6.0" >
See the code below:
etcd/server/storage/wal/version.go, lines 115 to 120 in 0966b4d:
if raftReq.ClusterVersionSet != nil {
	ver, err := semver.NewVersion(raftReq.ClusterVersionSet.Ver)
	if err != nil {
		return err
	}
	err = visitor(msg.Descriptor().FullName(), ver)
When etcd applies the ClusterVersionSet request, it forcibly creates a snapshot, but the two steps (setting the cluster version and creating the snapshot) are NOT atomic. If etcd happens to try to update the storage version in between, you will see the error message. It does no harm, but it may cause confusion to users.
etcd/server/etcdserver/apply/apply.go, lines 417 to 427 in 0966b4d:
a.cluster.SetVersion(newVersion, api.UpdateCapability, shouldApplyV3)
// Force snapshot after cluster version downgrade.
if prevVersion != nil && newVersion.LessThan(*prevVersion) {
	lg := a.lg
	if lg != nil {
		lg.Info("Cluster version downgrade detected, forcing snapshot",
			zap.String("prev-cluster-version", prevVersion.String()),
			zap.String("new-cluster-version", newVersion.String()),
		)
	}
	a.snapshotServer.ForceSnapshot()
Thanks for debugging this. You are correct that the downgrade is not atomic. Maybe we could make it a warning, since the procedure retries, and inform the user that it might happen and that we will try again.
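A minimal sketch of that suggestion, assuming a retry loop like the monitor's; the function and variable names here are hypothetical, not etcd's actual code:

```go
// Hedged sketch: treat the transient condition as a warning because the
// storage-version monitor re-checks periodically. Names are hypothetical.
package main

import (
	"fmt"

	"github.com/coreos/go-semver/semver"
)

// checkDowngradable returns true when the caller should retry later:
// the WAL still contains entries newer than the downgrade target, which
// can happen transiently between applying ClusterVersionSet and the
// forced snapshot that covers it.
func checkDowngradable(target semver.Version, walMin *semver.Version) (retry bool) {
	if walMin != nil && target.LessThan(*walMin) {
		// Warn instead of erroring out; the periodic monitor will
		// try again on its next tick.
		fmt.Printf("warning: WAL still requires etcd %s, retrying downgrade to %s later\n",
			walMin, target.String())
		return true
	}
	return false
}

func main() {
	target := semver.Version{Major: 3, Minor: 5}
	walMin := &semver.Version{Major: 3, Minor: 6}
	fmt.Println(checkDowngradable(target, walMin)) // true: retry later
}
```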
if raftReq.ClusterVersionSet != nil {
	ver, err = semver.NewVersion(raftReq.ClusterVersionSet.Ver)
	if err != nil {
		panic(err)
	}
}
@serathius You only check ClusterVersionSet here; I am curious why we don't check other message types?
If you look below at the return function, we not only check ClusterVersionSet; we also pick the max with etcdVersionFromMessage(msg). etcdVersionFromMessage checks the etcd_version proto tags on the WAL, which is meant to be the main mechanism for checking versions, based on non-empty proto fields. ClusterVersionSet is the only case where we also check the contents of the proto field.
The unique handling of ClusterVersionSet comes from the assumption, for online downgrades, that we shouldn't allow downgrading etcd if there was a period of a higher version in the WAL that was not covered by a snapshot. This is OK for online downgrades, as we snapshot as part of the process.
Thanks for the clarification.
"we shouldn't allow downgrading etcd if there was a period of higher version in WAL that was not covered by snapshot."
Please take a look at #19263, thanks.
My suggestion would be to remove this code and consider adding a separate field to the WAL for testing.
This PR implements single-node downgrades as proposed in https://docs.google.com/document/d/1yD0GDkxqWBPAax6jLZ97clwAz2Gp0Gux6xaTrtJ6wHE/edit?usp=sharing, with the goal of introducing e2e tests that can confirm that storage versioning properly validates WAL entries during downgrade.
This doesn't mean that with this PR etcd supports downgrades; there is still a lot of testing, and there are small problems we need to fix, before we can say that downgrades are safe. This is meant to allow us to expand testing of downgrades with different scenarios to confirm its reliability.
Problem detected during implementation that will need to be fixed: etcd v3.5 panics when it encounters a ClusterVersionSet entry with a version above 3.5.0, which requires running etcdutl migrate to drop this entry.
cc @ptabor @lilic