
The suggested measurement time is somewhat too long #351

Closed
Fullstop000 opened this issue Nov 1, 2019 · 5 comments
@Fullstop000

I tried to benchmark a system that has several threads communicating over channels. After I started iter_batched, Criterion gave me this info:

Benchmarking RawNode::cluster/1: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 5265.0s or reduce sample count to 10
Benchmarking RawNode::cluster/1: Collecting 100 samples in estimated 5265.0 s (5050 iterations)
@bheisler
Owner

bheisler commented Nov 3, 2019

Hey, thanks for trying Criterion.rs!

I'm not sure what the issue is here. Your benchmark takes too long (approximately 1s per iteration), which makes it impossible to perform the minimum of 5050 iterations in the default 10 seconds. Criterion.rs is recommending that you reduce the sample count from the default of 100 to the minimum of 10 (which would result in 55 iterations for a benchmark time of approximately 60s).
You could also reduce the amount of work done in the benchmark to make it more amenable to statistical benchmarking.
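For reference, both knobs mentioned above can be set through Criterion.rs's builder-style configuration; a minimal sketch (the function name `config` is just an illustration):

```rust
use std::time::Duration;
use criterion::Criterion;

// Reduce the sample count to the minimum of 10 and widen the
// measurement window to 60 seconds instead of the default 10,
// as the warning suggests for a slow benchmark.
fn config() -> Criterion {
    Criterion::default()
        .sample_size(10)
        .measurement_time(Duration::from_secs(60))
}
```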

Criterion.rs currently doesn't handle long-running benchmarks very well - see #320.

@Fullstop000
Author

How does Criterion.rs calculate the estimated benchmark time?

@bheisler
Owner

bheisler commented Nov 9, 2019

The estimate is calculated from the warmup period. Criterion.rs looks at the number of iterations that were completed during the warmup and uses that to estimate how long each iteration took. From there, it's a simple multiplication to estimate how long the benchmark will take.
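The arithmetic is simple enough to sketch (an illustration of the idea, not Criterion.rs's actual code; the numbers below roughly match the log above, where ~1.04 s per iteration times 5050 planned iterations gives about 5265 s):

```rust
use std::time::Duration;

// Estimate the total benchmark time from the warmup: assume each
// iteration takes warmup_elapsed / warmup_iters, then multiply by
// the number of iterations the measurement plan calls for.
fn estimate_total(warmup_elapsed: Duration, warmup_iters: u32, planned_iters: u32) -> Duration {
    let per_iter = warmup_elapsed / warmup_iters;
    per_iter * planned_iters
}

fn main() {
    // One warmup "iteration" that took ~1.042 s, 5050 planned iterations.
    let est = estimate_total(Duration::from_millis(1042), 1, 5050);
    println!("{}", est.as_secs()); // prints 5262
}
```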

@Fullstop000
Author

In tikv/raft-rs#315, I try to bench a channel-based cluster with Criterion.rs. Each iteration seems to cost a little over 20ms (with the smallest throughput), but the benchmarking process hangs after running for a while (about 10 minutes) and CPU usage climbs to 400% on my MacBook. Do you have any suggestions for this scenario? @bheisler

@bheisler
Owner

Hey, thanks for your patience. I've taken a look at your benchmark and I think I can answer your question now.

This comes down to a minor aspect of how Criterion.rs does the warmup phase. In the warm-up phase, the time to execute the outer closure is included in the timing, whereas it is not included in the actual measurements. Your outer closure includes a call to thread::sleep to sleep for one second, presumably waiting for something to get ready on another thread. When Criterion.rs performs the warmup with a target time of 500ms, it performs one "iteration" (including the 1-second sleep time) and sees that more than 500ms have elapsed, so the warmup stops. It then erroneously calculates that each iteration takes roughly a second.

I will adjust this in the next version to make it measure time correctly during the warm-up period. In the meantime, I would recommend a few changes to your benchmarks:

  • If at all possible, start and wait for the cluster just inside the for loop rather than inside the outer closure. The outer closure can be called many times during a benchmark, and if you have to wait for a full second each time your benchmarking process is still going to be painful.
  • Try not to use thread::sleep to synchronize things in your benchmarks. Using a Condvar would allow you to block just as long as necessary for the cluster to be started.

Note for self:

  • Have the Bencher track the elapsed time around the measurement loop, outside of the actual measurements. That will exclude the setup time.
  • This change could increase the time to execute a benchmark suite in ways that users don't expect. It might be a good idea to warn about that, by measuring the difference between the time it took to perform the warmup (which includes the setup time) and the time measured by the Bencher (which excludes the setup time). If the two are significantly different, the benchmark is probably doing a bunch of work in the setup. On the other hand, it's not clear that such a warning would be actionable, so maybe it's not worth it.

@bheisler bheisler added this to the Version 0.3.1 milestone Nov 30, 2019