
etcdserver: change the snapshot + compact into sync operation #18283

Conversation

@clement2026
Contributor

@serathius @ahrtr
Per the suggestion in #18235 (comment), I have changed the snapshot and compact operations to run synchronously, for simplification.

As a first step, I just removed s.GoAttach(func() {}).

I will add benchmark results once all tests pass.
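
Roughly, the shape of the change looks like this. It's a simplified, self-contained sketch; the type, function names, and bodies below are placeholders, not the actual etcdserver code.

// Simplified sketch of the change: instead of scheduling the snapshot+compact
// work on a tracked background goroutine via a GoAttach-style helper, call it
// directly on the apply path. All names and bodies here are placeholders.
package main

import (
    "fmt"
    "sync"
)

type server struct {
    wg sync.WaitGroup
}

// goAttach mimics EtcdServer.GoAttach: run f on a tracked goroutine.
func (s *server) goAttach(f func()) {
    s.wg.Add(1)
    go func() {
        defer s.wg.Done()
        f()
    }()
}

// snapshotAndCompact stands in for the real snapshot + raft log compaction.
func (s *server) snapshotAndCompact(appliedIndex uint64) {
    fmt.Println("snapshot + compact at index", appliedIndex)
}

// Before (main): the work is scheduled asynchronously.
func (s *server) triggerSnapshotAsync(appliedIndex uint64) {
    s.goAttach(func() { s.snapshotAndCompact(appliedIndex) })
}

// After (this PR): the work runs synchronously on the calling goroutine.
func (s *server) triggerSnapshotSync(appliedIndex uint64) {
    s.snapshotAndCompact(appliedIndex)
}

func main() {
    s := &server{}
    s.triggerSnapshotAsync(100)
    s.triggerSnapshotSync(200)
    s.wg.Wait()
}

The only behavioural difference in the sketch is that the snapshot+compact work now blocks the caller instead of running on a tracked background goroutine.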

@k8s-ci-robot

Hi @clement2026. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@clement2026 marked this pull request as ready for review July 4, 2024 17:54
@henrybear327
Contributor

/ok-to-test

@clement2026
Contributor Author

clement2026 commented Jul 5, 2024

Here are the benchmark results:

  • Reading performance change: -1.67% ~ 27.31%
  • Writing performance change: -1.12% ~ 28.64%

Performance drops when the value size reaches 2^13 bytes (8 KB). I will run CPU profiling for scenarios with large value sizes.

compare_read
main.csv
patch.csv

The benchmarks were conducted on a cloud VM with 8 vCPUs and 16 GB of memory using the following script:

export RATIO_LIST="4/1"
export REPEAT_COUNT=3
export RUN_COUNT=50000
echo RATIO_LIST=$RATIO_LIST
echo REPEAT_COUNT=$REPEAT_COUNT
echo RUN_COUNT=$RUN_COUNT
date; cd ~/etcd-sync/tools/rw-heatmaps && ./rw-benchmark.sh && cd ~/etcd/tools/rw-heatmaps &&  sleep 30 &&./rw-benchmark.sh; date

According to the log, the task started at Thu Jul 4, 2024, 07:29:48 PM UTC and finished at Thu Jul 4, 2024, 11:43:24 PM UTC, taking a total of 4 hours and 13 minutes.
nohup.out.txt

@serathius
Member

serathius commented Jul 5, 2024

That doesn't look very good; making the snapshot sync creates up to a 30% regression. cc @ahrtr

Never mind, I read it incorrectly.

@purr100

purr100 commented Jul 5, 2024

That doesn't look very good; making the snapshot sync creates up to a 30% regression. cc @ahrtr

Actually, my observations show a performance improvement of up to 30%. The patch.csv shows higher throughput. @serathius Could you please recheck?😬

@serathius
Member

serathius commented Jul 5, 2024

Actually, my observations show a performance improvement of up to 30%. The patch.csv shows higher throughput. @serathius Could you please recheck?😬

Oh yeah, sorry, I read it incorrectly. I've been looking at benchmark results that show average request duration rather than throughput, so a higher number made me think it was worse.

@purr100

purr100 commented Jul 5, 2024

Oh yeah, sorry, I read it incorrectly. I've been looking at benchmark results that show average request duration rather than throughput, so a higher number made me think it was worse.

lol. I got it. It happens.🤪

As the graph indicates a performance drop with larger value sizes, I am running the rw-benchmark.sh script with larger value sizes to verify this issue.

@clement2026
Contributor Author

Summary

Here are the results of the 4 benchmarks performed using the rw-benchmark.sh script.

Test     Value Size Range   Read Performance Change   Write Performance Change
Test 1   256 B ~ 16 KB      -1.67% ~ 27.31%           -1.12% ~ 28.64%
Test 2   256 B ~ 16 KB      -0.67% ~ 30.40%           -1.32% ~ 30.71%
Test 3   256 B ~ 32 KB       3.68% ~ 33.13%            3.00% ~ 34.37%
Test 4   8 KB ~ 32 KB        0.11% ~ 20.38%            0.97% ~ 21.13%

Details

Hardware

  • Test 1 was conducted on a cloud VM with 8 vCPUs and 16 GB RAM.
  • The remaining 3 tests were conducted on cloud VMs with 8 vCPUs and 32 GB RAM.

Script
All 4 tests use this script but differ in their VALUE_SIZE_POWER_RANGE variable.

export RATIO_LIST="4/1"
export REPEAT_COUNT=3
export RUN_COUNT=50000
./rw-benchmark.sh

Test 1

export VALUE_SIZE_POWER_RANGE="8 14"

compare_read

main.csv patch.csv

Test 2

export VALUE_SIZE_POWER_RANGE="8 14"

compare_read

main.csv patch.csv

Test 3

export VALUE_SIZE_POWER_RANGE="8 15"

compare_read

main.csv patch.csv

Test 4

export VALUE_SIZE_POWER_RANGE="13 15"

compare_read

main.csv patch.csv

@clement2026
Contributor Author

I ran multiple CPU profiles with different value sizes and connection counts. The results show that MVCC operations like mvcc.(*keyIndex).get, mvcc.(*keyIndex).isEmpty, and mvcc.(*keyIndex).findGeneration account for a significant share of total CPU time. Other functions worth noting are runtime.memmove, syscall.Syscall6, and cmpbody.

Since this patch tends to increase throughput, higher CPU usage wasn't surprising. These results didn't give me a clear conclusion. To better understand the issue, I should have collected and compared CPU profile data when the patch showed lower throughput; unfortunately, I didn't record the throughput during the CPU profiling runs.

Anyway, I’m sharing these results here and would love to know what you think before I dig deeper.

CPU Time Usage

Connection Count   Value Size   Main      Patch     Change in CPU Time Usage   Files
32                 16 KB        322.51s   331.32s    2.73%                     main.pb.gz patch.pb.gz
32                 32 KB        467.14s   460.28s   -1.47%                     main.pb.gz patch.pb.gz
32                 64 KB        596.94s   588.16s   -1.47%                     main.pb.gz patch.pb.gz
1024               16 KB        319.78s   332.02s    3.83%                     main.pb.gz patch.pb.gz
1024               32 KB        424.93s   435.31s    2.44%                     main.pb.gz patch.pb.gz
1024               64 KB        544.16s   547.28s    0.57%                     main.pb.gz patch.pb.gz

Script
run.sh.zip
All these tests use this script but with different VALUE_SIZE and CONN_CLI_COUNT values.
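
For reference, here is one way such profiles can be captured: a sketch that assumes the etcd server was started with --enable-pprof (so the standard net/http/pprof handlers are exposed on the client URL, assumed here to be 127.0.0.1:2379) and uses an arbitrary output file name. The saved file can then be inspected with go tool pprof.

// Sketch: fetch a 30-second CPU profile from a local etcd started with
// --enable-pprof, and save it for later analysis. The URL and file name
// are assumptions; adjust them for your setup.
package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
)

func main() {
    // /debug/pprof/profile is the standard net/http/pprof CPU profile endpoint;
    // the seconds parameter controls how long the profile runs.
    resp, err := http.Get("http://127.0.0.1:2379/debug/pprof/profile?seconds=30")
    if err != nil {
        fmt.Fprintln(os.Stderr, "fetch profile:", err)
        os.Exit(1)
    }
    defer resp.Body.Close()

    out, err := os.Create("cpu.pb.gz")
    if err != nil {
        fmt.Fprintln(os.Stderr, "create output file:", err)
        os.Exit(1)
    }
    defer out.Close()

    if _, err := io.Copy(out, resp.Body); err != nil {
        fmt.Fprintln(os.Stderr, "save profile:", err)
        os.Exit(1)
    }
    fmt.Println("wrote cpu.pb.gz")
}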

@serathius
Member

cc @ahrtr

@ahrtr
Member

ahrtr commented Jul 15, 2024

Thanks @clement2026 for the test report. The throughput increase of up to 30% is a little weird; theoretically, the performance should be very close.

From an implementation perspective, the only possible reason for the throughput increase I can think of is the code snippet below, which won't be executed anymore in this PR. Could you double-check this to help us get a better understanding? e.g. temporarily remove the code snippet on the main branch, and then compare with this PR again.

s.wgMu.RLock() // this blocks with ongoing close(s.stopping)
defer s.wgMu.RUnlock()
select {
case <-s.stopping:
    lg := s.Logger()
    lg.Warn("server has stopped; skipping GoAttach")
    return
default:
}

The CPU usage is very close, which looks fine.

@clement2026
Contributor Author

From an implementation perspective, the only possible reason for the throughput increase I can think of is the code snippet below, which won't be executed anymore in this PR. Could you double-check this to help us get a better understanding? e.g. temporarily remove the code snippet on the main branch, and then compare with this PR again.

@ahrtr The 30% increase is really puzzling to me too. Can't wait to do the comparison and see what we find out.

@clement2026
Contributor Author

Summary

Please disregard the earlier benchmark results. They were incorrect. Here are the reliable ones. Each branch was tested multiple times, with main 01 as the baseline.

Branch               Read Performance Change   Write Performance Change
main 01 (baseline)   -                         -
main 02              [-5.38%, 6.66%]           [-5.09%, 6.52%]
main 03              [-4.45%, 7.12%]           [-3.82%, 7.20%]
patch 01             [-3.49%, 5.95%]           [-4.78%, 6.40%]
patch 02             [-4.68%, 4.62%]           [-5.07%, 6.42%]
remove-rwlock 01     [-3.41%, 4.79%]           [-3.87%, 5.34%]   (based on #18283 (comment))
remove-rwlock 02     [-5.34%, 4.81%]           [-5.74%, 6.65%]

It seems this PR/patch doesn't show significant performance changes.

The benchmarks were conducted using the following script on a cloud VM with 8 vCPUs and 16 GB RAM.

export RATIO_LIST="4/1"
export REPEAT_COUNT=3
export RUN_COUNT=50000
date; cd ~/etcd/tools/rw-heatmaps && ./rw-benchmark.sh; date;

Details

@ahrtr You were right about the strange 30% increase. The 30% turns out to be wrong data from my faulty script:

date; cd ~/etcd-sync/tools/rw-heatmaps && ./rw-benchmark.sh && cd ~/etcd/tools/rw-heatmaps &&  sleep 30 &&./rw-benchmark.sh; date

When running this script to benchmark two branches, the second one always shows a roughly 30% drop in performance. I'm not sure if it's a machine issue, as I didn't see unusual I/O, swap, or CPU activity after each benchmark.

Anyway, I managed to get solid benchmark results by rebooting the machine after each run. The benchmark details are below.

Test 1

Benchmarked the main branch 3 times to ensure the results are reliable.

main-01-vs-main-02
main-01-vs-main-03
main-01.csv main-02.csv main-03.csv

Test 2

Benchmarked this PR/patch twice.

main-01-vs-patch
main-01-vs-patch-02
patch.csv patch-02.csv

Test 3

Benchmarked #18283 (comment) twice. Code is here.

main-01-vs-remove-rwlock
main-01-vs-remove-rwlock-02
remove-rwlock.csv remove-rwlock-02.csv

@ahrtr
Member

ahrtr left a comment

Thanks @clement2026 for the hard & nice work!

It seems this PR/patch doesn't show significant performance changes.

This seems reasonable, and it aligns with our understanding.

Almost all the heatmap diagrams have very similar color distributions, so it's clear that they reflect very similar performance data.

A separate but related topic... I still think it's worthwhile to implement the line charts (see #15060) as another visualisation method, which is clearer for comparison when there is a bigger performance difference. cc @ivanvc

@ahrtr
Member

ahrtr commented Jul 23, 2024

cc @ivanvc @jmhbnz @serathius

@ivanvc
Member

ivanvc commented Jul 23, 2024

A separate but related topic... I still think it's worthwhile to implement the line charts (see #15060) as another visualisation method, which is clearer for comparison when there is a bigger performance difference. cc @ivanvc

Hey @ahrtr, I actually have a branch with this change, which I was working on some months ago. However, because of other tasks, I haven't been able to revisit it. I'll try to get back to this soon.

@ivanvc
Member

ivanvc left a comment

LGTM. Thanks, @clement2026.

@serathius
Member

Thanks @clement2026 for the thorough investigation. Exemplary work!

When running this script to benchmark two branches, the second one always shows a roughly 30% drop in performance. I'm not sure if it's a machine issue, as I didn't see unusual I/O, swap, or CPU activity after each benchmark.

We should note this and keep it in mind when doing any future performance testing. It would be worth figuring out how we can protect against such cases.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, clement2026, ivanvc, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here


@serathius merged commit 9a6c9ae into etcd-io:main Jul 24, 2024
51 checks passed
@clement2026 deleted the change-snapshot-and-compact-into-sync-operation branch July 27, 2024 05:58