Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparing to wrk #617

Open
waghanza opened this issue Nov 30, 2024 · 5 comments
Open

Comparing to wrk #617

waghanza opened this issue Nov 30, 2024 · 5 comments

Comments

@waghanza
Copy link

Hi @hatoo,

I'm considering using oha for https://github.com/the-benchmarker/web-frameworks, but I have some questions.

Actually, we are using wrk. Using https://github.com/the-benchmarker/web-frameworks/blob/master/rust/actix/src/main.rs, I have some results with wrk

Running 10s test @ http://172.17.0.2:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    36.11us   75.22us   3.78ms   97.44%
    Req/Sec   167.06k    11.08k  180.07k    92.08%
  3358383 requests in 10.10s, 240.21MB read
Requests/sec: 332518.67
Transfer/sec:     23.78MB

and with oha http://172.17.0.2:3000 -j --no-tui -z 10s -c 10 (which is the same options), I have

{
  "summary": {
    "successRate": 1.0,
    "total": 10.001275574,
    "slowest": 0.010046543,
    "fastest": 8.583e-6,
    "average": 0.00007257664067671008,
    "requestsPerSec": 136928.03381601934,
    "totalData": 0,
    "sizePerRequest": 0,
    "sizePerSec": 0.0
  },
  "responseTimeHistogram": {
    "0.000008583": 1,
    "0.0010123789999999999": 1369180,
    "0.002016175": 47,
    "0.0030199709999999998": 25,
    "0.004023767": 5,
    "0.005027563": 134,
    "0.006031359": 55,
    "0.007035155": 0,
    "0.008038950999999999": 0,
    "0.009042746999999999": 1,
    "0.010046542999999998": 4
  },
  "latencyPercentiles": {
    "p10": 0.000031499,
    "p25": 0.000050375,
    "p50": 0.000072624,
    "p75": 0.00008625,
    "p90": 0.00009825,
    "p95": 0.000114624,
    "p99": 0.000212581,
    "p99.9": 0.000264623,
    "p99.99": 0.004690378
  },
  "rps": {
    "mean": 136906.43878477108,
    "stddev": 26479.44862149209,
    "max": 200460.94012580605,
    "min": 54256.50815324148,
    "percentiles": {
      "p10": 97178.95504571438,
      "p25": 131526.83147360053,
      "p50": 143429.84775131833,
      "p75": 152822.892869349,
      "p90": 161744.178760928,
      "p95": 167339.7097131157,
      "p99": 175201.4416347006,
      "p99.9": 186797.24664659952,
      "p99.99": 200460.94012580605
    }
  },
  "details": {
    "DNSDialup": {
      "average": 0.0001614946,
      "fastest": 0.000038208,
      "slowest": 0.000365706
    },
    "DNSLookup": {
      "average": 0.000012528900000000001,
      "fastest": 1e-6,
      "slowest": 0.000063083
    }
  },
  "statusCodeDistribution": {
    "200": 1369452
  },
  "errorDistribution": {
    "aborted due to deadline": 3
  }
}

and with more realistic test option I've found on README
oha http://172.17.0.2:3000 -j --no-tui -z 10s -c 10 --latency-correction --disable-keepalive
I have

{
  "summary": {
    "successRate": 1.0,
    "total": 10.00244618,
    "slowest": 0.012744112,
    "fastest": 0.000034208,
    "average": 0.000673916113492022,
    "requestsPerSec": 14813.976234761405,
    "totalData": 0,
    "sizePerRequest": 0,
    "sizePerSec": 0.0
  },
  "responseTimeHistogram": {
    "0.000034208": 1,
    "0.0013051984": 131514,
    "0.0025761888": 3664,
    "0.0038471792": 4492,
    "0.0051181696": 5130,
    "0.00638916": 2255,
    "0.0076601504": 718,
    "0.008931140800000001": 304,
    "0.0102021312": 81,
    "0.0114731216": 6,
    "0.012744112": 4
  },
  "latencyPercentiles": {
    "p10": 0.000113207,
    "p25": 0.000163832,
    "p50": 0.000253914,
    "p75": 0.000390497,
    "p90": 0.001891944,
    "p95": 0.004174885,
    "p99": 0.006056246,
    "p99.9": 0.008555936,
    "p99.99": 0.009799259
  },
  "rps": {
    "mean": 14216.714782684752,
    "stddev": 17382.270376030592,
    "max": 64122.66094837892,
    "min": 849.88484060425,
    "percentiles": {
      "p10": 2261.3753088447224,
      "p25": 2474.4886115784316,
      "p50": 2717.37172026533,
      "p75": 31742.430106323052,
      "p90": 42629.453030938785,
      "p95": 46716.50961449759,
      "p99": 54725.266655616586,
      "p99.9": 62137.75489987862,
      "p99.99": 64122.66094837892
    }
  },
  "details": {
    "DNSDialup": {
      "average": 0.00042168944500536507,
      "fastest": 0.000012875,
      "slowest": 0.010655295
    },
    "DNSLookup": {
      "average": 1.1297418218385604e-6,
      "fastest": 4.58e-7,
      "slowest": 0.003425224
    }
  },
  "statusCodeDistribution": {
    "200": 148169
  },
  "errorDistribution": {
    "aborted due to deadline": 7
  }
}

How can you explain to variations beetween wrk and oha ?

Regards,

@hatoo
Copy link
Owner

hatoo commented Nov 30, 2024

Hi, I think it's simply because wrk is more optimized than oha.

Put simply, Ideally, we can distribute all works to each thread statically for this kind a application.
But our runtime tokio distributes works dynamically using work-stealing, which involves some overheads for locking things.
The overhead is apparent in extreme conditions that benchmark the fast server on localhost.

You can check this hypothesis by using strace.
strace (-f) on wrk is very clean while oha's strace has many futex related things.

But we can implement real-time tui easily by using tokio framework. it is a good point.

See. https://emschwartz.me/async-rust-can-be-a-pleasure-to-work-with-without-send-sync-static/

@TapGhoul
Copy link

TapGhoul commented Dec 24, 2024

This could potentially be improved by setting up tokio in a runtime-per-core fashion, couldn't it? Doing so would give it performance closer to a runtime such as https://github.com/compio-rs/compio - bonus points if you can pin the thread to the core by setting affinity.

The way compio does this is it has a dispatcher with a given set of threads, and each dispatched task gets allocated to the instance of the runtime on whatever thread receives it - see https://compio.rs/docs/compio/dispatcher for an example of this. Compio is a successor to projects such as monio/glommio (in my view), and supports windows as a bonus, which is why I am looking at it - but it's obviously not the only option, and tokio is very capable of such performance improvements.

Tokio can be set up in a similar fashion by having a single-threaded runtime on each thread of a threadpool (for example with a rayon threadpool, setting up the instance via the broadcast() method) and then spawning each task on that threadpool directly.

The main performance loss when it comes to futexes etc generally comes down to the issue of work-stealing, as described in the article you linked - so a setup like this may be a relatively easy route to improving the situation here if you don't want to manage the dispatch yourself to properly evenly distribute load (though it's not hard to build a basic round-robin dispatcher).

If you wanted to avoid a dependency here, compio does this by creating a Vec<> of thread JoinHandles, and makes all threads listen on an async channel - they use flume for task dispatch throughput, but tokio's own unbounded channels to dispatch tasks would probably serve just fine here.

@hatoo
Copy link
Owner

hatoo commented Jan 11, 2025

@waghanza
oha v1.6.0 includes #646 becoming faster there is still some performance gap between wrk though.

https://github.com/hatoo/oha?tab=readme-ov-file#performance

@waghanza
Copy link
Author

@hatoo the performance gap is less important for me than consistency.

I mean if wrk give that the results that framework A is better than framework B with 50% performance more, I expect that oha give the same result (or nearly)

@waghanza
Copy link
Author

Based on the command wrk http://172.17.0.2:3000 and ~/.cargo/bin/oha http://172.17.0.2:3000 -z 10s -c 10 -j --no-tui, I have some more details @hatoo

Running 10s test @ http://172.17.0.2:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   144.54us  204.28us  12.09ms   99.30%
    Req/Sec    36.51k     1.69k   45.44k    72.50%
  726508 requests in 10.00s, 51.96MB read
Requests/sec:  72648.96
Transfer/sec:      5.20MB
{
  "summary": {
    "successRate": 1.0,
    "total": 10.002132448,
    "slowest": 0.012102818,
    "fastest": 0.000011209,
    "average": 0.0001493152938589139,
    "requestsPerSec": 66487.72183905398,
    "totalData": 0,
    "sizePerRequest": 0,
    "sizePerSec": 0.0
  },
  "responseTimeHistogram": {
    "0.000011209": 2,
    "0.0012203699": 664439,
    "0.0024295308": 248,
    "0.0036386917": 55,
    "0.0048478526": 28,
    "0.0060570135": 179,
    "0.0072661744": 56,
    "0.0084753353": 1,
    "0.0096844962": 1,
    "0.0108936571": 0,
    "0.012102818": 4
  },
  "latencyPercentiles": {
    "p10": 0.000056291,
    "p25": 0.000088166,
    "p50": 0.000170708,
    "p75": 0.000196333,
    "p90": 0.000204916,
    "p95": 0.000211584,
    "p99": 0.000259208,
    "p99.9": 0.001067749,
    "p99.99": 0.005996617
  },
  "rps": {
    "mean": 66488.9445348776,
    "stddev": 11061.71990612855,
    "max": 200117.82937793774,
    "min": 53453.523012592625,
    "percentiles": {
      "p10": 58920.9463964461,
      "p25": 61416.20159398043,
      "p50": 64900.08437010652,
      "p75": 69390.87428897024,
      "p90": 73052.14462083163,
      "p95": 76854.12325878603,
      "p99": 94786.61600970964,
      "p99.9": 200109.47989645135,
      "p99.99": 200117.82937793774
    }
  },
  "details": {
    "DNSDialup": {
      "average": 0.00012242459999999998,
      "fastest": 0.0000665,
      "slowest": 0.000234041
    },
    "DNSLookup": {
      "average": 0.000012754000000000002,
      "fastest": 2.708e-6,
      "slowest": 0.000022458
    }
  },
  "statusCodeDistribution": {
    "200": 665013
  },
  "errorDistribution": {
    "aborted due to deadline": 6
  }
}
Running 10s test @ http://172.17.0.2:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    36.54us   84.46us   4.06ms   97.59%
    Req/Sec   165.53k    13.39k  240.67k    94.53%
  3311800 requests in 10.10s, 236.88MB read
Requests/sec: 327929.66
Transfer/sec:     23.46MB
{
  "summary": {
    "successRate": 1.0,
    "total": 10.001754459,
    "slowest": 0.00220447,
    "fastest": 9.625e-6,
    "average": 0.00004065595079742935,
    "requestsPerSec": 242093.9256133362,
    "totalData": 0,
    "sizePerRequest": 0,
    "sizePerSec": 0.0
  },
  "responseTimeHistogram": {
    "0.000009625": 1,
    "0.0002291095": 2410075,
    "0.00044859399999999997": 8094,
    "0.0006680785": 2169,
    "0.000887563": 748,
    "0.0011070475": 197,
    "0.001326532": 50,
    "0.0015460165": 13,
    "0.001765501": 4,
    "0.0019849855": 5,
    "0.00220447": 1
  },
  "latencyPercentiles": {
    "p10": 0.000015875,
    "p25": 0.000021542,
    "p50": 0.000036084,
    "p75": 0.000048417,
    "p90": 0.000066917,
    "p95": 0.000081709,
    "p99": 0.000138292,
    "p99.9": 0.00049992,
    "p99.99": 0.000906589
  },
  "rps": {
    "mean": 242097.31066207887,
    "stddev": 5207.43781862387,
    "max": 254550.60466019373,
    "min": 198427.09255277851,
    "percentiles": {
      "p10": 236192.6150775945,
      "p25": 239009.2750405487,
      "p50": 242308.7958092891,
      "p75": 245443.68897663642,
      "p90": 248328.68196278517,
      "p95": 249928.86678410994,
      "p99": 252729.1902214669,
      "p99.9": 254414.52706948287,
      "p99.99": 254550.60466019373
    }
  },
  "details": {
    "DNSDialup": {
      "average": 0.0001030256,
      "fastest": 0.000040042,
      "slowest": 0.00022571
    },
    "DNSLookup": {
      "average": 0.0000109418,
      "fastest": 1.417e-6,
      "slowest": 0.000022125
    }
  },
  "statusCodeDistribution": {
    "200": 2421357
  },
  "errorDistribution": {
    "aborted due to deadline": 7
  }
}
Running 10s test @ http://172.17.0.2:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   574.21us    2.00ms  80.65ms   98.78%
    Req/Sec    13.55k     2.74k   17.25k    79.70%
  272472 requests in 10.10s, 87.05MB read
Requests/sec:  26977.28
Transfer/sec:      8.62MB
{
  "summary": {
    "successRate": 1.0,
    "total": 10.002291605,
    "slowest": 0.00905916,
    "fastest": 0.000103333,
    "average": 0.00040535336760578187,
    "requestsPerSec": 24578.767517346343,
    "totalData": 0,
    "sizePerRequest": 0,
    "sizePerSec": 0.0
  },
  "responseTimeHistogram": {
    "0.000103333": 1,
    "0.0009989157": 224180,
    "0.0018944983999999998": 20234,
    "0.0027900811": 1210,
    "0.0036856638": 158,
    "0.0045812464999999995": 40,
    "0.005476829199999999": 2,
    "0.0063724119": 6,
    "0.007267994599999999": 2,
    "0.0081635773": 0,
    "0.00905916": 1
  },
  "latencyPercentiles": {
    "p10": 0.000141792,
    "p25": 0.000161833,
    "p50": 0.000264083,
    "p75": 0.00051525,
    "p90": 0.000928416,
    "p95": 0.001199416,
    "p99": 0.001726999,
    "p99.9": 0.002618248,
    "p99.99": 0.004052497
  },
  "rps": {
    "mean": 24579.56483432097,
    "stddev": 1204.8223821470078,
    "max": 28177.657311583807,
    "min": 10021.887802961668,
    "percentiles": {
      "p10": 23145.246642660448,
      "p25": 23842.17566190794,
      "p50": 24632.55191735948,
      "p75": 25381.23772759455,
      "p90": 26056.258066792245,
      "p95": 26393.759908404554,
      "p99": 26886.61792831882,
      "p99.9": 27770.695356076278,
      "p99.99": 28177.657311583807
    }
  },
  "details": {
    "DNSDialup": {
      "average": 0.0002408749,
      "fastest": 0.000033,
      "slowest": 0.000896541
    },
    "DNSLookup": {
      "average": 0.0000261877,
      "fastest": 1.375e-6,
      "slowest": 0.00010025
    }
  },
  "statusCodeDistribution": {
    "200": 245834
  },
  "errorDistribution": {
    "aborted due to deadline": 10
  }
}
Running 10s test @ http://172.17.0.2:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   253.98us  534.72us  19.39ms   93.51%
    Req/Sec    35.84k     4.64k   41.90k    85.15%
  720292 requests in 10.10s, 127.08MB read
Requests/sec:  71318.68
Transfer/sec:     12.58MB
{
  "summary": {
    "successRate": 1.0,
    "total": 10.002648437,
    "slowest": 0.015493073,
    "fastest": 0.000030833,
    "average": 0.00012683846277234634,
    "requestsPerSec": 78369.544319915,
    "totalData": 0,
    "sizePerRequest": 0,
    "sizePerSec": 0.0
  },
  "responseTimeHistogram": {
    "0.000030833": 1,
    "0.0015770570000000002": 781369,
    "0.0031232810000000003": 2134,
    "0.004669505": 344,
    "0.0062157290000000006": 33,
    "0.007761953000000001": 7,
    "0.009308176999999999": 1,
    "0.010854401": 3,
    "0.012400625": 0,
    "0.013946849": 0,
    "0.015493073000000001": 1
  },
  "latencyPercentiles": {
    "p10": 0.000049709,
    "p25": 0.000057875,
    "p50": 0.000095209,
    "p75": 0.00014475,
    "p90": 0.000190291,
    "p95": 0.000282958,
    "p99": 0.000723792,
    "p99.9": 0.002475373,
    "p99.99": 0.004206205
  },
  "rps": {
    "mean": 78375.9462269467,
    "stddev": 10451.58276451326,
    "max": 95623.57434512717,
    "min": 25652.711190955175,
    "percentiles": {
      "p10": 65808.98960829696,
      "p25": 73134.38106443692,
      "p50": 80032.06884998479,
      "p75": 85680.7369584364,
      "p90": 89787.22829228407,
      "p95": 91934.153538042,
      "p99": 94954.27586408584,
      "p99.9": 95513.08749559669,
      "p99.99": 95623.57434512717
    }
  },
  "details": {
    "DNSDialup": {
      "average": 0.0005560453999999999,
      "fastest": 0.000057792,
      "slowest": 0.001605665
    },
    "DNSLookup": {
      "average": 0.000010800099999999998,
      "fastest": 1.375e-6,
      "slowest": 0.000022
    }
  },
  "statusCodeDistribution": {
    "200": 783893
  },
  "errorDistribution": {
    "aborted due to deadline": 10
  }
}
Running 10s test @ http://172.17.0.2:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   107.99us  293.84us  10.89ms   96.77%
    Req/Sec    66.05k    18.69k   75.80k    88.12%
  1327895 requests in 10.10s, 92.69MB read
Requests/sec: 131475.81
Transfer/sec:      9.18MB
{
  "summary": {
    "successRate": 1.0,
    "total": 10.001422443,
    "slowest": 0.003776709,
    "fastest": 0.000018708,
    "average": 0.00008148780117765962,
    "requestsPerSec": 121564.00821274659,
    "totalData": 0,
    "sizePerRequest": 0,
    "sizePerSec": 0.0
  },
  "responseTimeHistogram": {
    "0.000018708": 1,
    "0.0003945081": 1212951,
    "0.0007703082": 2618,
    "0.0011461083000000002": 151,
    "0.0015219084000000001": 45,
    "0.0018977085": 12,
    "0.0022735086000000002": 8,
    "0.0026493087": 6,
    "0.0030251088": 2,
    "0.0034009089": 9,
    "0.003776709": 1
  },
  "latencyPercentiles": {
    "p10": 0.000038542,
    "p25": 0.000051292,
    "p50": 0.000072,
    "p75": 0.000097875,
    "p90": 0.000130875,
    "p95": 0.000159583,
    "p99": 0.000249625,
    "p99.9": 0.000585916,
    "p99.99": 0.000968209
  },
  "rps": {
    "mean": 121570.43556402421,
    "stddev": 11476.56738260729,
    "max": 133215.9778302904,
    "min": 31226.90719846694,
    "percentiles": {
      "p10": 117018.02077520668,
      "p25": 120630.13340732458,
      "p50": 123663.87239009385,
      "p75": 126253.632543104,
      "p90": 128092.65134764195,
      "p95": 129180.71210892896,
      "p99": 131663.60668838862,
      "p99.9": 132223.11260008343,
      "p99.99": 133215.9778302904
    }
  },
  "details": {
    "DNSDialup": {
      "average": 0.00004298324592994573,
      "fastest": 0.000018417,
      "slowest": 0.000319417
    },
    "DNSLookup": {
      "average": 2.703637395165278e-6,
      "fastest": 9.16e-7,
      "slowest": 0.000096959
    }
  },
  "statusCodeDistribution": {
    "200": 1215804
  },
  "errorDistribution": {
    "aborted due to deadline": 9
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants