
The suggested measurement time is somewhat too long #351

Closed
Fullstop000 opened this issue Nov 1, 2019 · 5 comments
@Fullstop000

I tried to benchmark a system that has several threads communicating over channels. After I started iter_batched, Criterion gave me this info:

Benchmarking RawNode::cluster/1: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 5265.0s or reduce sample count to 10
Benchmarking RawNode::cluster/1: Collecting 100 samples in estimated 5265.0 s (5050 iterations)
@bheisler
Owner

bheisler commented Nov 3, 2019

Hey, thanks for trying Criterion.rs!

I'm not sure what the issue is here. Your benchmark takes too long (approximately 1s per iteration), which makes it impossible to perform the minimum of 5050 iterations in the default 10 seconds. Criterion.rs is recommending that you reduce the sample count from the default of 100 to the minimum of 10 (which would result in 55 iterations for a benchmark time of approximately 60s).
You could also reduce the amount of work done in the benchmark to make it more amenable to statistical benchmarking.
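For reference, both knobs mentioned above can be set through Criterion.rs's builder-style configuration; a minimal sketch (the function name `config` is just an illustration):

```rust
use std::time::Duration;
use criterion::Criterion;

// Reduce the sample count to the minimum of 10 and widen the
// measurement window to 60 seconds instead of the default 10,
// as the warning suggests for a slow benchmark.
fn config() -> Criterion {
    Criterion::default()
        .sample_size(10)
        .measurement_time(Duration::from_secs(60))
}
```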

Criterion.rs currently doesn't handle long-running benchmarks very well - see #320.

@Fullstop000
Author

How does Criterion.rs calculate the estimated benchmark time?

@bheisler
Owner

bheisler commented Nov 9, 2019

The estimate is calculated from the warmup period. Criterion.rs looks at the number of iterations that were completed during the warmup and uses that to estimate how long each iteration took. From there, it's a simple multiplication to estimate how long the benchmark will take.
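The arithmetic is simple enough to sketch (an illustration of the idea, not Criterion.rs's actual code; the numbers below roughly match the log above, where ~1.04 s per iteration times 5050 planned iterations gives about 5265 s):

```rust
use std::time::Duration;

// Estimate the total benchmark time from the warmup: assume each
// iteration takes warmup_elapsed / warmup_iters, then multiply by
// the number of iterations the measurement plan calls for.
fn estimate_total(warmup_elapsed: Duration, warmup_iters: u32, planned_iters: u32) -> Duration {
    let per_iter = warmup_elapsed / warmup_iters;
    per_iter * planned_iters
}

fn main() {
    // One warmup "iteration" that took ~1.042 s, 5050 planned iterations.
    let est = estimate_total(Duration::from_millis(1042), 1, 5050);
    println!("{}", est.as_secs()); // prints 5262
}
```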

@Fullstop000
Author

In tikv/raft-rs#315, I try to bench a channel-based cluster with Criterion.rs. Each iteration seems to cost a little over 20ms (with the smallest throughput), but the benchmarking process hangs after running for a while (about 10 minutes) and CPU usage climbs to 400% on my MacBook. Do you have any suggestions for this scenario? @bheisler

@bheisler
Owner

Hey, thanks for your patience. I've taken a look at your benchmark and I think I can answer your question now.

This comes down to a minor aspect of how Criterion.rs does the warmup phase. In the warm-up phase, the time to execute the outer closure is included in the timing, whereas it is not included in the actual measurements. Your outer closure includes a call to thread::sleep to sleep for one second, presumably waiting for something to get ready on another thread. When Criterion.rs performs the warmup with a target time of 500ms, it performs one "iteration" (including the 1-second sleep time) and sees that more than 500ms have elapsed, so the warmup stops. It then erroneously calculates that each iteration takes roughly a second.

I will adjust this in the next version to make it measure time correctly during the warm-up period. In the meantime, I would recommend a few changes to your benchmarks:

  • If at all possible, start and wait for the cluster just inside the for loop rather than inside the outer closure. The outer closure can be called many times during a benchmark, and if you have to wait for a full second each time your benchmarking process is still going to be painful.
  • Try not to use thread::sleep to synchronize things in your benchmarks. Using a Condvar would allow you to block just as long as necessary for the cluster to be started.

Note for self:

  • Have the Bencher track the elapsed time around the measurement loop, outside of the actual measurements. That will exclude the setup time.
  • This change could increase the time to execute a benchmark suite in ways that users don't expect. It might be a good idea to warn about that, by measuring the difference between the time it took to perform the warmup (which includes the setup time) and the time measured by the Bencher (which excludes the setup time). If the two are significantly different, the benchmark is probably doing a bunch of work in the setup. On the other hand, it's not clear that such a warning would be actionable, so maybe it's not worth it.

@bheisler bheisler added this to the Version 0.3.1 milestone Nov 30, 2019