
feat: Stop the PMTUD search at the interface MTU #2135

Open · larseggert wants to merge 11 commits into main
Conversation

@larseggert (Collaborator) commented Sep 26, 2024

Should we also optimistically start the search at the interface MTU, and only start from 1280 when that fails?

WIP
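
In rough terms, the optimistic variant could look like this (a minimal, hypothetical sketch with made-up names, not neqo's actual `Pmtud` API): begin the search at the interface MTU and fall back to the conventional upward search from 1280 if that probe fails.

```rust
const MIN_PLPMTU: usize = 1280; // QUIC's minimum UDP payload size (RFC 9000)

struct PmtudSearch {
    current: usize,   // size of the next probe
    optimistic: bool, // still trying the interface MTU first
}

impl PmtudSearch {
    /// Optimistically begin probing at the local interface MTU.
    fn start(interface_mtu: usize) -> Self {
        Self { current: interface_mtu, optimistic: true }
    }

    /// If the optimistic probe is lost, restart the search from 1280.
    fn on_probe_lost(&mut self) {
        if self.optimistic {
            self.optimistic = false;
            self.current = MIN_PLPMTU;
        }
    }
}

fn main() {
    let mut search = PmtudSearch::start(1500);
    search.on_probe_lost();
    assert_eq!(search.current, MIN_PLPMTU);
}
```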


codecov bot commented Sep 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.40%. Comparing base (5677bd1) to head (5ff1a7d).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2135      +/-   ##
==========================================
+ Coverage   95.39%   95.40%   +0.01%     
==========================================
  Files         112      112              
  Lines       36373    36372       -1     
==========================================
+ Hits        34697    34700       +3     
+ Misses       1676     1672       -4     


@larseggert marked this pull request as ready for review September 26, 2024 13:08
@mxinden (Collaborator) commented Sep 26, 2024

> Should we also optimistically start the search at the interface MTU

Are there other projects using this optimistic approach?

If I understand RFC 8899 correctly, the local interface MTU is the end value, not the start value.

> The MAX_PLPMTU is the largest size of PLPMTU. This has to be less than or equal to the maximum size of the PL packet that can be sent on the outgoing interface (constrained by the local interface MTU).

https://www.rfc-editor.org/rfc/rfc8899.html#section-5.1.2
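
In other words (a hypothetical sketch with made-up table values, not neqo's actual search table), the interface MTU acts as MAX_PLPMTU: it caps the candidate sizes rather than being the first probe size.

```rust
// Candidate probe sizes; values here are illustrative only.
const SEARCH_TABLE: &[usize] = &[1280, 1380, 1420, 1472, 1500, 2047, 4095, 8191, 9000];

/// The local interface MTU bounds the search from above (MAX_PLPMTU).
fn search_candidates(interface_mtu: usize) -> impl Iterator<Item = usize> {
    SEARCH_TABLE.iter().copied().filter(move |&mtu| mtu <= interface_mtu)
}

fn main() {
    // With a 1500-byte Ethernet interface, the search tops out at 1500.
    let candidates: Vec<_> = search_candidates(1500).collect();
    assert_eq!(candidates, [1280, 1380, 1420, 1472, 1500]);
}
```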


github-actions bot commented Sep 26, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server · All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server

@larseggert (Collaborator, Author)

All true, but in practice, the local interface is most often the limiting hop.


github-actions bot commented Sep 26, 2024

Benchmark results

Performance differences relative to 55e3a93.

coalesce_acked_from_zero 1+1 entries: 💚 Performance has improved.
       time:   [109.82 ns 110.16 ns 110.51 ns]
       change: [-2.9114% -2.4202% -1.8027%] (p = 0.00 < 0.05)

Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe

coalesce_acked_from_zero 3+1 entries: 💚 Performance has improved.
       time:   [123.78 ns 124.18 ns 124.61 ns]
       change: [-29.538% -29.258% -28.981%] (p = 0.00 < 0.05)

Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low mild
13 (13.00%) high severe

coalesce_acked_from_zero 10+1 entries: 💚 Performance has improved.
       time:   [123.25 ns 123.50 ns 123.85 ns]
       change: [-36.277% -31.789% -29.118%] (p = 0.00 < 0.05)

Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) low mild
4 (4.00%) high severe

coalesce_acked_from_zero 1000+1 entries: 💚 Performance has improved.
       time:   [101.06 ns 101.31 ns 101.60 ns]
       change: [-29.311% -28.682% -27.990%] (p = 0.00 < 0.05)

Found 14 outliers among 100 measurements (14.00%)
6 (6.00%) high mild
8 (8.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.45 ms 111.51 ms 111.57 ms]
       change: [+0.0464% +0.1135% +0.1826%] (p = 0.00 < 0.05)

Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) low mild
1 (1.00%) high mild

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [25.530 ms 26.407 ms 27.279 ms]
       change: [-12.035% -7.6053% -2.5828%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [34.298 ms 35.702 ms 37.119 ms]
       change: [-12.235% -6.7940% -0.8800%] (p = 0.02 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

transfer/pacing-false/same-seed: No change in performance detected.
       time:   [25.555 ms 26.339 ms 27.135 ms]
       change: [-7.4868% -3.6913% +0.4903%] (p = 0.08 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-true/same-seed: No change in performance detected.
       time:   [40.692 ms 42.746 ms 44.849 ms]
       change: [-8.5706% -2.2515% +4.3434%] (p = 0.49 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.
       time:   [111.94 ms 112.59 ms 113.21 ms]
       thrpt:  [883.31 MiB/s 888.14 MiB/s 893.31 MiB/s]
change:
       time:   [-3.4473% -2.8717% -2.3106%] (p = 0.00 < 0.05)
       thrpt:  [+2.3653% +2.9566% +3.5704%]

Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) low mild

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: Change within noise threshold.
       time:   [311.43 ms 314.67 ms 317.83 ms]
       thrpt:  [31.463 Kelem/s 31.779 Kelem/s 32.110 Kelem/s]
change:
       time:   [-3.3142% -1.7738% -0.2049%] (p = 0.03 < 0.05)
       thrpt:  [+0.2053% +1.8059% +3.4278%]

1-conn/1-1b-resp (aka. HPS)/client: Change within noise threshold.
       time:   [34.024 ms 34.199 ms 34.393 ms]
       thrpt:  [29.076  elem/s 29.241  elem/s 29.391  elem/s]
change:
       time:   [+0.6039% +1.3595% +2.2038%] (p = 0.00 < 0.05)
       thrpt:  [-2.1563% -1.3413% -0.6003%]

Found 11 outliers among 100 measurements (11.00%)
8 (8.00%) low mild
3 (3.00%) high severe

Client/server transfer results

Transfer of 33554432 bytes over loopback.

| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|:---|:---|:---|---:|---:|---:|---:|
| msquic | msquic | | | 193.5 ± 124.5 | 99.8 | 556.2 | 1.00 |
| neqo | msquic | reno | on | 225.6 ± 16.8 | 204.6 | 264.2 | 1.00 |
| neqo | msquic | reno | | 274.2 ± 90.5 | 208.9 | 459.4 | 1.00 |
| neqo | msquic | cubic | on | 283.2 ± 76.3 | 217.5 | 406.8 | 1.00 |
| neqo | msquic | cubic | | 256.6 ± 68.6 | 218.2 | 419.4 | 1.00 |
| msquic | neqo | reno | on | 129.4 ± 86.1 | 82.6 | 354.0 | 1.00 |
| msquic | neqo | reno | | 118.0 ± 60.8 | 85.1 | 335.0 | 1.00 |
| msquic | neqo | cubic | on | 125.9 ± 77.9 | 84.4 | 403.0 | 1.00 |
| msquic | neqo | cubic | | 96.5 ± 26.7 | 81.4 | 183.9 | 1.00 |
| neqo | neqo | reno | on | 180.9 ± 85.4 | 122.6 | 400.6 | 1.00 |
| neqo | neqo | reno | | 229.3 ± 119.0 | 132.7 | 468.7 | 1.00 |
| neqo | neqo | cubic | on | 182.6 ± 65.8 | 128.5 | 394.9 | 1.00 |
| neqo | neqo | cubic | | 216.2 ± 104.4 | 129.7 | 511.9 | 1.00 |


@larseggert (Collaborator, Author)

This PR exposed a bug in mtu 🫤 mozilla/mtu#26

@mxinden (Collaborator) commented Sep 26, 2024

> > Should we also optimistically start the search at the interface MTU
>
> Are there other projects using this optimistic approach?
>
> If I understand RFC 8899 correctly, the local interface MTU is the end value, not the start value.
>
> > The MAX_PLPMTU is the largest size of PLPMTU. This has to be less than or equal to the maximum size of the PL packet that can be sent on the outgoing interface (constrained by the local interface MTU).
>
> https://www.rfc-editor.org/rfc/rfc8899.html#section-5.1.2

> All true, but in practice, the local interface is most often the limiting hop.

Let me make sure I understand the implications here correctly. Sorry for any potential mistakes.

We only start probing once the connection is confirmed.

fn set_confirmed(&mut self) -> Res<()> {
    self.set_state(State::Confirmed);
    if self.conn_params.pmtud_enabled() {
        self.paths
            .primary()
            .ok_or(Error::InternalError)?
            .borrow_mut()
            .pmtud_mut()
            .start();
    }
    // ...
}

Say that a client's path MTU is smaller than their local interface MTU. Given that probing only starts once the connection is confirmed, i.e., after receiving HANDSHAKE_DONE from the server, the initial HTTP requests would not be delayed; only one subsequent flight of requests would be delayed by up to one PTO. Is that correct?

Thus this optimization, and really all of PMTUD probing, assumes that delaying one subsequent flight of HTTP requests by up to one PTO is a worthwhile trade-off for the potential increase in overall connection throughput.

Is that correct?

@larseggert (Collaborator, Author)

> Should we also optimistically start the search at the interface MTU

> Let me make sure I understand the implications here correctly. Sorry for any potential mistakes.
>
> We only start probing once the connection is confirmed.

This would need to change. What I think we should do is roughly this:

  • Start sending at the local interface MTU when the connection starts (i.e., for the Initial).
  • If we detect a loss n times, we revert to probing up from 1280.

n should probably be something like 2, so we don't cause undue delay.
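
A rough sketch of that fallback logic, with hypothetical names and n = 2 (this sketches the proposal, not an actual implementation):

```rust
const MIN_PLPMTU: usize = 1280;
const MAX_OPTIMISTIC_LOSSES: usize = 2; // the "n" above

struct OptimisticPmtud {
    plpmtu: usize,   // size used for outgoing packets, including the Initial
    losses: usize,   // losses observed while at the optimistic size
    fell_back: bool, // whether we already reverted to the upward search
}

impl OptimisticPmtud {
    fn new(interface_mtu: usize) -> Self {
        // Start sending at the local interface MTU from the first Initial.
        Self { plpmtu: interface_mtu, losses: 0, fell_back: false }
    }

    fn on_packet_lost(&mut self) {
        if self.fell_back {
            return;
        }
        self.losses += 1;
        if self.losses >= MAX_OPTIMISTIC_LOSSES {
            // Give up on the optimistic size and probe upward from 1280.
            self.plpmtu = MIN_PLPMTU;
            self.fell_back = true;
        }
    }
}

fn main() {
    let mut pmtud = OptimisticPmtud::new(1500);
    pmtud.on_packet_lost();
    pmtud.on_packet_lost(); // the second loss triggers the fallback
    assert_eq!(pmtud.plpmtu, MIN_PLPMTU);
}
```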

@mxinden (Collaborator) commented Sep 27, 2024

In the case where a client's path MTU is smaller than their local interface MTU, this would add a delay of 2×PTO to every connection establishment, right? If so, isn't that a high cost for the potential increase in throughput? Or is such a scenario just very rare?

@larseggert (Collaborator, Author) commented Sep 27, 2024

Yes. I think this is a rare case, but maybe we add some telemetry first to confirm?

We could also cache a probed MTU towards a destination IP, like the OS does for a TCP MSS it has determined.
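
Such a cache could be as simple as this (a hypothetical sketch; neqo has no such cache today, and `MtuCache` is a made-up name):

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

struct MtuCache {
    entries: HashMap<IpAddr, (usize, Instant)>,
    ttl: Duration, // how long a cached PMTU stays trustworthy
}

impl MtuCache {
    fn new(ttl: Duration) -> Self {
        Self { entries: HashMap::new(), ttl }
    }

    /// Remember the PMTU a completed search discovered towards `dest`.
    fn record(&mut self, dest: IpAddr, pmtu: usize) {
        self.entries.insert(dest, (pmtu, Instant::now()));
    }

    /// A fresh cached value could seed the next search towards `dest`.
    fn lookup(&self, dest: IpAddr) -> Option<usize> {
        self.entries
            .get(&dest)
            .filter(|(_, when)| when.elapsed() < self.ttl)
            .map(|&(pmtu, _)| pmtu)
    }
}

fn main() {
    let mut cache = MtuCache::new(Duration::from_secs(600));
    cache.record("192.0.2.1".parse().unwrap(), 1500);
    assert_eq!(cache.lookup("192.0.2.1".parse().unwrap()), Some(1500));
}
```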

@mxinden (Collaborator) commented Sep 27, 2024

> Yes. I think this is a rare case, but maybe we add some telemetry first to confirm?

How about we:

  1. Roll out PMTUD on Firefox Nightly without the optimization (i.e., without starting at the local interface MTU).
  2. Enable the optimization, monitoring whether connection establishment times stay stable.

@larseggert (Collaborator, Author)

I was thinking we just log the local interface MTU together with the discovered PMTU, and check for differences.
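
For illustration, something like this (hypothetical; a real version would report through Firefox telemetry rather than printing):

```rust
/// Record the local interface MTU next to the PMTU the search discovered,
/// so we can see how often the two actually differ.
fn report_pmtud_result(interface_mtu: usize, discovered_pmtu: usize) {
    if discovered_pmtu < interface_mtu {
        // The interesting population: paths where an optimistic start
        // would have overshot and cost us retransmissions.
        println!("PMTU {discovered_pmtu} < interface MTU {interface_mtu}");
    } else {
        println!("PMTU matches interface MTU {interface_mtu}");
    }
}

fn main() {
    report_pmtud_result(1500, 1500); // typical Ethernet path
    report_pmtud_result(1500, 1280); // path with a tunnel in the middle
}
```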

@mxinden (Collaborator) left a comment

Generally looking good to me. Minor comments.

Review threads:

  • neqo-transport/Cargo.toml (outdated)
  • neqo-transport/src/cc/classic_cc.rs
  • neqo-transport/src/path.rs (outdated; three threads)
@larseggert marked this pull request as draft October 8, 2024 05:51
@larseggert marked this pull request as ready for review October 8, 2024 12:47
@larseggert marked this pull request as draft October 8, 2024 12:48
@larseggert marked this pull request as ready for review October 8, 2024 13:10