
Ztunnel does not scale up with number of worker threads in expected way due to h2 locking #1174

Open
howardjohn opened this issue Jun 24, 2024 · 14 comments

Comments

@howardjohn
Member

howardjohn commented Jun 24, 2024

Some testing:

Running locally (i.e. ONE ztunnel connecting to itself)
2 workers
1:  11.51 Gbits/sec
2:  11.92 Gbits/sec
3:  11.33 Gbits/sec
4:  11.30 Gbits/sec
6:  12.15 Gbits/sec
8:  12.77 Gbits/sec
16: 14.95 Gbits/sec

4 workers
1:  9.52 Gbits/sec
2:  15.89 Gbits/sec
3:  16.49 Gbits/sec
4:  17.05 Gbits/sec
6:  16.28 Gbits/sec
8:  16.15 Gbits/sec
16: 15.74 Gbits/sec

8 workers
1:  7.88 Gbits/sec
2:  12.27 Gbits/sec
3:  15.73 Gbits/sec
4:  17.14 Gbits/sec
6:  17.95 Gbits/sec
8:  18.63 Gbits/sec
16: 19.03 Gbits/sec

On Kubernetes, cross node
2 workers
1:  9.27 Gbits/sec
2:  10.11 Gbits/sec
3:  10.60 Gbits/sec
4:  11.72 Gbits/sec
6:  13.55 Gbits/sec
8:  14.34 Gbits/sec
16: 16.09 Gbits/sec

4 workers
1:  8.09 Gbits/sec
2:  9.16 Gbits/sec
3:  9.72 Gbits/sec
4:  11.31 Gbits/sec
6:  11.64 Gbits/sec
8:  13.67 Gbits/sec
16: 15.08 Gbits/sec

8 workers
1:  8.09 Gbits/sec
2:  8.61 Gbits/sec
3:  9.27 Gbits/sec
4:  10.63 Gbits/sec
6:  11.70 Gbits/sec
8:  13.17 Gbits/sec
16: 15.57 Gbits/sec


On kubernetes, raw TCP (not HBONE/TLS)
2 workers
1:  22.18 Gbits/sec
2:  35.72 Gbits/sec
3:  35.36 Gbits/sec
4:  35.28 Gbits/sec
6:  37.48 Gbits/sec
8:  37.37 Gbits/sec
16: 38.62 Gbits/sec

8 workers
1:  23.96 Gbits/sec
2:  34.57 Gbits/sec
3:  43.82 Gbits/sec
4:  51.85 Gbits/sec
6:  50.36 Gbits/sec
8:  47.48 Gbits/sec
16: 43.99 Gbits/sec

Note: I have a 16c machine.

The weird thing is I do not see the same behavior in our benchmarks (with #1171):

                        thrpt:  [12.107 Gelem/s 13.392 Gelem/s 14.496 Gelem/s]
                 change:
                        time:   [-3.0115% +5.8051% +18.549%] (p = 0.40 > 0.05)
                        thrpt:  [-15.647% -5.4866% +3.1050%]
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe
throughput/hbone2       time:   [376.68 ms 384.08 ms 392.21 ms]
                        thrpt:  [21.901 Gelem/s 22.365 Gelem/s 22.804 Gelem/s]
                 change:
                        time:   [-12.720% -9.1954% -5.5575%] (p = 0.00 < 0.05)
                        thrpt:  [+5.8845% +10.127% +14.574%]
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
throughput/hbone8       time:   [501.59 ms 507.84 ms 514.72 ms]
                        thrpt:  [16.689 Gelem/s 16.915 Gelem/s 17.125 Gelem/s]
                 change:
                        time:   [-2.8183% -0.8154% +1.2152%] (p = 0.46 > 0.05)
                        thrpt:  [-1.2006% +0.8221% +2.9000%]
                        No change in performance detected.

Things I have tried without success:

  • H2 Window sizes - no impact (see the sketch below for the kind of tuning this refers to)
  • Tracing to try to see where we are waiting - inconclusive
  • Disabling pooling, so we have 1:1 connections - no impact
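
For reference, a minimal sketch of the kind of client-side window tuning the first bullet refers to, using the `h2` crate's client Builder (this is not ztunnel's actual code, and the sizes here are purely illustrative):

    use h2::client::{self, SendRequest};
    use tokio::net::TcpStream;

    // Hand-shake an h2 client connection with enlarged flow-control windows.
    async fn handshake_with_tuned_windows(
        io: TcpStream,
    ) -> Result<SendRequest<bytes::Bytes>, h2::Error> {
        let mut builder = client::Builder::new();
        builder
            // Per-stream flow-control window (the protocol default is 65,535 bytes).
            .initial_window_size(4 * 1024 * 1024)
            // Connection-level window shared by every stream on the connection.
            .initial_connection_window_size(8 * 1024 * 1024)
            // Larger DATA frames reduce per-frame overhead on bulk transfers.
            .max_frame_size(1024 * 1024);

        let (send_request, connection) = builder.handshake(io).await?;
        // The connection future must be polled for any stream to make progress.
        tokio::spawn(async move {
            let _ = connection.await;
        });
        Ok(send_request)
    }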

I would expect HBONE to behave like TCP: with 2 workers, we should see roughly 2x the throughput with 2 connections as with 1. 2x is the theoretical best, of course, so something like 1.5x-1.8x may be more realistic. We don't see that; we see virtually zero increase. We do see improvements at very high connection counts, though.

We need to find out what is holding up the processing.

@keithmattix
Contributor

The weird thing is I do not see the same behavior.

You don't see the same behavior where?

@howardjohn
Member Author

The weird thing is I do not see the same behavior.

You don't see the same behavior where?

🤦 I updated the comment

@keithmattix
Contributor

I would totally suspect connection pooling if you didn't rule it out. We only use HBONE for TLS, right? Maybe some weird default in the hyper client/server?

@howardjohn
Member Author

Ah... I think I just disabled pooling improperly. The benchmark doesn't pool because it uses different client network namespaces. I can now reproduce it fully and see good scaling with the number of connections.

At 2 workers (default):

1:  11.05 Gbits/sec
2:  17.17 Gbits/sec
3:  17.20 Gbits/sec
4:  16.24 Gbits/sec
6:  14.84 Gbits/sec
8:  15.95 Gbits/sec
16: 14.71 Gbits/sec

So that looks pretty good and aligns with the TCP numbers.

@howardjohn
Member Author

It's still pretty interesting that 1->2 yields no improvement, but we do still scale up to some extent. 16c gives ~15 Gbits/sec, which is consistent with the non-pooling numbers...

@keithmattix
Contributor

Should I be looking at the most recent numbers you posted? I see improvement from 1->2, but it looks about the same as the first set you posted in the description.

@howardjohn
Member Author

Conns   Pool (Gbits/sec)   No Pool (Gbits/sec)
1       9.27               11.05
2       10.11              17.17
3       10.60              17.20
4       11.72              16.24
6       13.55              14.84
8       14.34              15.95
16      16.09              14.71

Actually, it's kind of interesting that we see the inverse behavior.

(screenshot: 2024-06-24_15-11-41)

@keithmattix
Contributor

That feels somewhat expected: with connection pooling you get a smaller but monotonic throughput improvement with increased concurrency. Without pooling, you're more susceptible to longer-running streams.

I wonder if the delay is in client or server processing... I assume the concurrency increase affects both.

@bleggett
Contributor

bleggett commented Jun 24, 2024

When you say "workers" you mean you're constraining the Tokio runtime to X threads? How are you doing that, exactly?

Also - if ztunnel is connecting to itself, I wouldn't expect pooling to help at all, really?

EDIT: Oh, I see, you did cross-node too.

We do see improvements at very high connection counts, though.

Yeah, I wouldn't expect pooling by itself to offer much of a boost at either end of the graph; it's more that it (among other things) mitigates starving the local node as the conn count increases (or would, if we didn't pool).

@howardjohn
Member Author

ZTUNNEL_WORKER_THREADS, we default to 2.

Ztunnel is only connecting to itself in some of the tests; the meaningful ones (the ones running in k8s) are cross-node.
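
For anyone wondering about the mechanics: a minimal sketch of what constraining the Tokio runtime to N worker threads via that env var could look like (this mirrors ZTUNNEL_WORKER_THREADS and the default of 2 mentioned above, but it is not ztunnel's actual startup code):

    use tokio::runtime::{Builder, Runtime};

    fn build_runtime() -> std::io::Result<Runtime> {
        // Read the worker-thread count from the env var, falling back to the
        // default of 2 discussed in this thread.
        let workers: usize = std::env::var("ZTUNNEL_WORKER_THREADS")
            .ok()
            .and_then(|v| v.parse().ok())
            .unwrap_or(2);

        Builder::new_multi_thread()
            .worker_threads(workers)
            .enable_all()
            .build()
    }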

@bleggett
Contributor

bleggett commented Jun 24, 2024

Also note that the one place pooling does lock is when it hits the conn limit to Dest A and needs to open another conn to Dest A.

We sort of have to lock there, as the alternative is we let clients use us to eagerly race connections to Dest A and then prune them after the fact, which is worse.

It's an intentional ratelimit for those scenarios, where a client is spamming new conns to us.

We also don't consult worker thread count when deciding to schedule new pool conns at all (we could, but do not).
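
Rough sketch of that serialization point, with hypothetical types (this is not ztunnel's pool): an async mutex held across the dial, so concurrent clients to the same destination wait instead of racing duplicate connections:

    use std::collections::HashMap;
    use std::net::SocketAddr;
    use tokio::sync::Mutex;

    // Stand-in for an established HBONE/h2 connection.
    #[derive(Clone)]
    struct PooledConn;

    #[derive(Default)]
    struct Pool {
        // One pool-wide async mutex guards both lookup and dial; a real pool
        // would more likely lock per destination.
        conns: Mutex<HashMap<SocketAddr, Vec<PooledConn>>>,
    }

    impl Pool {
        async fn get_or_connect(&self, dest: SocketAddr) -> PooledConn {
            // Holding the lock across the dial is the intentional rate limit
            // described above: a burst of new clients waits here rather than
            // each racing its own connection that would be pruned later.
            let mut conns = self.conns.lock().await;
            if let Some(existing) = conns.get(&dest).and_then(|v| v.first()) {
                return existing.clone();
            }
            let new_conn = PooledConn; // a real pool performs the HBONE handshake here
            conns.entry(dest).or_default().push(new_conn.clone());
            new_conn
        }
    }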

@howardjohn
Member Author

These tests are "open connection once" then "send a ton of information" on them. Connection establishment is basically out of the measurement path entirely.

@bleggett
Contributor

bleggett commented Jun 24, 2024

These tests are "open connection once" then "send a ton of information" on them. Connection establishment is basically out of the measurement path entirely.

Oh ok, if the clients are opening a single connection to ZT, then H2 multiplexing doesn't matter at all.

If connection establishment is controlled for fully then it's got to be crypto overhead on the wire or h2 server overhead somehow, right? Can't think of anything else.

It's possible that the h2 server code just isn't very good at handling multiple connections with the same src/dest tuple, since the http/2 spec doesn't exactly encourage that.

howardjohn changed the title from "Ztunnel does not scale up with number of worker threads in expected way" to "Ztunnel does not scale up with number of worker threads in expected way due to h2 locking" on Aug 23, 2024
@howardjohn
Member Author

Root-caused to hyperium/h2#531 (comment) a while back; forgot to update this issue.
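
For posterity, a rough illustration (hypothetical types, not a change that landed) of the workaround that root cause suggests: since every stream on a single h2 connection contends on that connection's shared internal state, a pool can spread streams round-robin over several connections to the same destination instead of multiplexing everything onto one:

    use std::sync::atomic::{AtomicUsize, Ordering};

    // Holds several h2 connections to the same destination instead of one.
    struct MultiConnPool<C> {
        conns: Vec<C>, // assumed non-empty
        next: AtomicUsize,
    }

    impl<C> MultiConnPool<C> {
        // Pick the connection for the next stream in round-robin order, so
        // streams (and the worker threads driving them) are spread across
        // independent connection-level locks rather than serializing on one.
        fn pick(&self) -> &C {
            let i = self.next.fetch_add(1, Ordering::Relaxed) % self.conns.len();
            &self.conns[i]
        }
    }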
