
feat: Stop the PMTUD search at the interface MTU #2135

Open · larseggert wants to merge 11 commits into main
Conversation

@larseggert (Collaborator) commented Sep 26, 2024

Should we also optimistically start the search at the interface MTU, and only start from 1280 when that fails?

WIP
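
In rough terms, the optimistic variant could look like this (a minimal, hypothetical sketch with made-up names, not neqo's actual `Pmtud` API): begin the search at the interface MTU and fall back to the conventional upward search from 1280 if that probe fails.

```rust
const MIN_PLPMTU: usize = 1280; // QUIC's minimum UDP payload size (RFC 9000)

struct PmtudSearch {
    current: usize,   // size of the next probe
    optimistic: bool, // still trying the interface MTU first
}

impl PmtudSearch {
    /// Optimistically begin probing at the local interface MTU.
    fn start(interface_mtu: usize) -> Self {
        Self { current: interface_mtu, optimistic: true }
    }

    /// If the optimistic probe is lost, restart the search from 1280.
    fn on_probe_lost(&mut self) {
        if self.optimistic {
            self.optimistic = false;
            self.current = MIN_PLPMTU;
        }
    }
}

fn main() {
    let mut search = PmtudSearch::start(1500);
    search.on_probe_lost();
    assert_eq!(search.current, MIN_PLPMTU);
}
```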


codecov bot commented Sep 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.40%. Comparing base (5677bd1) to head (5ff1a7d).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2135      +/-   ##
==========================================
+ Coverage   95.39%   95.40%   +0.01%     
==========================================
  Files         112      112              
  Lines       36373    36372       -1     
==========================================
+ Hits        34697    34700       +3     
+ Misses       1676     1672       -4     


@larseggert marked this pull request as ready for review September 26, 2024 13:08
@mxinden (Collaborator) commented Sep 26, 2024

> Should we also optimistically start the search at the interface MTU

Are there other projects using this optimistic approach?

If I understand RFC 8899 correctly, the local interface MTU is the end value, not the start value.

> The MAX_PLPMTU is the largest size of PLPMTU. This has to be less than or equal to the maximum size of the PL packet that can be sent on the outgoing interface (constrained by the local interface MTU).

https://www.rfc-editor.org/rfc/rfc8899.html#section-5.1.2
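
In other words (a hypothetical sketch with made-up table values, not neqo's actual search table), the interface MTU acts as MAX_PLPMTU: it caps the candidate sizes rather than being the first probe size.

```rust
// Candidate probe sizes; values here are illustrative only.
const SEARCH_TABLE: &[usize] = &[1280, 1380, 1420, 1472, 1500, 2047, 4095, 8191, 9000];

/// The local interface MTU bounds the search from above (MAX_PLPMTU).
fn search_candidates(interface_mtu: usize) -> impl Iterator<Item = usize> {
    SEARCH_TABLE.iter().copied().filter(move |&mtu| mtu <= interface_mtu)
}

fn main() {
    // With a 1500-byte Ethernet interface, the search tops out at 1500.
    let candidates: Vec<_> = search_candidates(1500).collect();
    assert_eq!(candidates, [1280, 1380, 1420, 1472, 1500]);
}
```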


github-actions bot commented Sep 26, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server · All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server

@larseggert (Collaborator, Author)

All true, but in practice, the local interface is most often the limiting hop.


github-actions bot commented Sep 26, 2024

Benchmark results

Performance differences relative to 55e3a93.

coalesce_acked_from_zero 1+1 entries: 💚 Performance has improved.
       time:   [109.82 ns 110.16 ns 110.51 ns]
       change: [-2.9114% -2.4202% -1.8027%] (p = 0.00 < 0.05)

Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe

coalesce_acked_from_zero 3+1 entries: 💚 Performance has improved.
       time:   [123.78 ns 124.18 ns 124.61 ns]
       change: [-29.538% -29.258% -28.981%] (p = 0.00 < 0.05)

Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low mild
13 (13.00%) high severe

coalesce_acked_from_zero 10+1 entries: 💚 Performance has improved.
       time:   [123.25 ns 123.50 ns 123.85 ns]
       change: [-36.277% -31.789% -29.118%] (p = 0.00 < 0.05)

Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) low mild
4 (4.00%) high severe

coalesce_acked_from_zero 1000+1 entries: 💚 Performance has improved.
       time:   [101.06 ns 101.31 ns 101.60 ns]
       change: [-29.311% -28.682% -27.990%] (p = 0.00 < 0.05)

Found 14 outliers among 100 measurements (14.00%)
6 (6.00%) high mild
8 (8.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.45 ms 111.51 ms 111.57 ms]
       change: [+0.0464% +0.1135% +0.1826%] (p = 0.00 < 0.05)

Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) low mild
1 (1.00%) high mild

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [25.530 ms 26.407 ms 27.279 ms]
       change: [-12.035% -7.6053% -2.5828%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [34.298 ms 35.702 ms 37.119 ms]
       change: [-12.235% -6.7940% -0.8800%] (p = 0.02 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

transfer/pacing-false/same-seed: No change in performance detected.
       time:   [25.555 ms 26.339 ms 27.135 ms]
       change: [-7.4868% -3.6913% +0.4903%] (p = 0.08 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-true/same-seed: No change in performance detected.
       time:   [40.692 ms 42.746 ms 44.849 ms]
       change: [-8.5706% -2.2515% +4.3434%] (p = 0.49 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.
       time:   [111.94 ms 112.59 ms 113.21 ms]
       thrpt:  [883.31 MiB/s 888.14 MiB/s 893.31 MiB/s]
change:
       time:   [-3.4473% -2.8717% -2.3106%] (p = 0.00 < 0.05)
       thrpt:  [+2.3653% +2.9566% +3.5704%]

Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) low mild

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: Change within noise threshold.
       time:   [311.43 ms 314.67 ms 317.83 ms]
       thrpt:  [31.463 Kelem/s 31.779 Kelem/s 32.110 Kelem/s]
change:
       time:   [-3.3142% -1.7738% -0.2049%] (p = 0.03 < 0.05)
       thrpt:  [+0.2053% +1.8059% +3.4278%]

1-conn/1-1b-resp (aka. HPS)/client: Change within noise threshold.
       time:   [34.024 ms 34.199 ms 34.393 ms]
       thrpt:  [29.076  elem/s 29.241  elem/s 29.391  elem/s]
change:
       time:   [+0.6039% +1.3595% +2.2038%] (p = 0.00 < 0.05)
       thrpt:  [-2.1563% -1.3413% -0.6003%]

Found 11 outliers among 100 measurements (11.00%)
8 (8.00%) low mild
3 (3.00%) high severe

Client/server transfer results

Transfer of 33554432 bytes over loopback.

| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|:---|:---|:---|---:|---:|---:|---:|
| msquic | msquic | | | 193.5 ± 124.5 | 99.8 | 556.2 | 1.00 |
| neqo | msquic | reno | on | 225.6 ± 16.8 | 204.6 | 264.2 | 1.00 |
| neqo | msquic | reno | | 274.2 ± 90.5 | 208.9 | 459.4 | 1.00 |
| neqo | msquic | cubic | on | 283.2 ± 76.3 | 217.5 | 406.8 | 1.00 |
| neqo | msquic | cubic | | 256.6 ± 68.6 | 218.2 | 419.4 | 1.00 |
| msquic | neqo | reno | on | 129.4 ± 86.1 | 82.6 | 354.0 | 1.00 |
| msquic | neqo | reno | | 118.0 ± 60.8 | 85.1 | 335.0 | 1.00 |
| msquic | neqo | cubic | on | 125.9 ± 77.9 | 84.4 | 403.0 | 1.00 |
| msquic | neqo | cubic | | 96.5 ± 26.7 | 81.4 | 183.9 | 1.00 |
| neqo | neqo | reno | on | 180.9 ± 85.4 | 122.6 | 400.6 | 1.00 |
| neqo | neqo | reno | | 229.3 ± 119.0 | 132.7 | 468.7 | 1.00 |
| neqo | neqo | cubic | on | 182.6 ± 65.8 | 128.5 | 394.9 | 1.00 |
| neqo | neqo | cubic | | 216.2 ± 104.4 | 129.7 | 511.9 | 1.00 |


@larseggert (Collaborator, Author)

This PR exposed a bug in mtu 🫤 mozilla/mtu#26

@mxinden (Collaborator) commented Sep 26, 2024

> > Should we also optimistically start the search at the interface MTU
>
> Are there other projects using this optimistic approach?
>
> If I understand RFC 8899 correctly, the local interface MTU is the end value, not the start value.
>
> > The MAX_PLPMTU is the largest size of PLPMTU. This has to be less than or equal to the maximum size of the PL packet that can be sent on the outgoing interface (constrained by the local interface MTU).
>
> https://www.rfc-editor.org/rfc/rfc8899.html#section-5.1.2

> All true, but in practice, the local interface is most often the limiting hop.

Let me make sure I understand the implications here correctly. Sorry for any potential mistakes.

We only start probing once the connection is confirmed.

fn set_confirmed(&mut self) -> Res<()> {
    self.set_state(State::Confirmed);
    if self.conn_params.pmtud_enabled() {
        self.paths
            .primary()
            .ok_or(Error::InternalError)?
            .borrow_mut()
            .pmtud_mut()
            .start();
    }
    // ...
}

Say that a client's path MTU is smaller than their local interface MTU. Given that probing only starts once the connection is confirmed, i.e., after receiving HANDSHAKE_DONE from the server, the initial HTTP requests would not be delayed; only one subsequent flight of requests would be delayed by up to one PTO. Is that correct?

Thus this optimization, and really all of PMTUD probing, assumes that delaying one subsequent flight of HTTP requests by up to one PTO is a worthwhile trade-off for the potential increase in overall connection throughput.

Is that correct?

@larseggert (Collaborator, Author)

> Should we also optimistically start the search at the interface MTU

> Let me make sure I understand the implications here correctly. Sorry for any potential mistakes.
>
> We only start probing once the connection is confirmed.

This would need to change. What I think we should do is roughly this:

  • Start sending at the local interface MTU when the connection starts (i.e., for the Initial).
  • If we detect a loss n times, we revert to probing up from 1280.

n should probably be something like 2, so we don't cause undue delay.
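
A rough sketch of that fallback logic, with hypothetical names and n = 2 (this sketches the proposal, not an actual implementation):

```rust
const MIN_PLPMTU: usize = 1280;
const MAX_OPTIMISTIC_LOSSES: usize = 2; // the "n" above

struct OptimisticPmtud {
    plpmtu: usize,   // size used for outgoing packets, including the Initial
    losses: usize,   // losses observed while at the optimistic size
    fell_back: bool, // whether we already reverted to the upward search
}

impl OptimisticPmtud {
    fn new(interface_mtu: usize) -> Self {
        // Start sending at the local interface MTU from the first Initial.
        Self { plpmtu: interface_mtu, losses: 0, fell_back: false }
    }

    fn on_packet_lost(&mut self) {
        if self.fell_back {
            return;
        }
        self.losses += 1;
        if self.losses >= MAX_OPTIMISTIC_LOSSES {
            // Give up on the optimistic size and probe upward from 1280.
            self.plpmtu = MIN_PLPMTU;
            self.fell_back = true;
        }
    }
}

fn main() {
    let mut pmtud = OptimisticPmtud::new(1500);
    pmtud.on_packet_lost();
    pmtud.on_packet_lost(); // the second loss triggers the fallback
    assert_eq!(pmtud.plpmtu, MIN_PLPMTU);
}
```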

@mxinden (Collaborator) commented Sep 27, 2024

In the case where a client's path MTU is smaller than their local interface MTU, this would add a delay of 2×PTO to every connection establishment, right? If so, isn't that a high cost for the potential increase in throughput? Or is such a scenario just very rare?

@larseggert (Collaborator, Author) commented Sep 27, 2024

Yes. I think this is a rare case, but maybe we add some telemetry first to confirm?

We could also cache a probed MTU towards a destination IP, like the OS does for a TCP MSS it has determined.
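
Such a cache could be as simple as this (a hypothetical sketch; neqo has no such cache today, and `MtuCache` is a made-up name):

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

struct MtuCache {
    entries: HashMap<IpAddr, (usize, Instant)>,
    ttl: Duration, // how long a cached PMTU stays trustworthy
}

impl MtuCache {
    fn new(ttl: Duration) -> Self {
        Self { entries: HashMap::new(), ttl }
    }

    /// Remember the PMTU a completed search discovered towards `dest`.
    fn record(&mut self, dest: IpAddr, pmtu: usize) {
        self.entries.insert(dest, (pmtu, Instant::now()));
    }

    /// A fresh cached value could seed the next search towards `dest`.
    fn lookup(&self, dest: IpAddr) -> Option<usize> {
        self.entries
            .get(&dest)
            .filter(|(_, when)| when.elapsed() < self.ttl)
            .map(|&(pmtu, _)| pmtu)
    }
}

fn main() {
    let mut cache = MtuCache::new(Duration::from_secs(600));
    cache.record("192.0.2.1".parse().unwrap(), 1500);
    assert_eq!(cache.lookup("192.0.2.1".parse().unwrap()), Some(1500));
}
```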

@mxinden (Collaborator) commented Sep 27, 2024

> Yes. I think this is a rare case, but maybe we add some telemetry first to confirm?

How about we:

  1. Roll out PMTUD on Firefox Nightly without the optimization (i.e., without starting at the local interface MTU).
  2. Enable the optimization, monitoring whether connection establishment times stay stable.

@larseggert (Collaborator, Author)

I was thinking we just log the local interface MTU together with the discovered PMTU, and check for differences.
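
For illustration, something like this (hypothetical; a real version would report through Firefox telemetry rather than printing):

```rust
/// Record the local interface MTU next to the PMTU the search discovered,
/// so we can see how often the two actually differ.
fn report_pmtud_result(interface_mtu: usize, discovered_pmtu: usize) {
    if discovered_pmtu < interface_mtu {
        // The interesting population: paths where an optimistic start
        // would have overshot and cost us retransmissions.
        println!("PMTU {discovered_pmtu} < interface MTU {interface_mtu}");
    } else {
        println!("PMTU matches interface MTU {interface_mtu}");
    }
}

fn main() {
    report_pmtud_result(1500, 1500); // typical Ethernet path
    report_pmtud_result(1500, 1280); // path with a tunnel in the middle
}
```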

@mxinden (Collaborator) left a comment

Generally looking good to me. Minor comments.

Review threads:

  • neqo-transport/Cargo.toml (outdated)
  • neqo-transport/src/cc/classic_cc.rs
  • neqo-transport/src/path.rs (outdated; three threads)
@larseggert marked this pull request as draft October 8, 2024 05:51
@larseggert marked this pull request as ready for review October 8, 2024 12:47
@larseggert marked this pull request as draft October 8, 2024 12:48
@larseggert marked this pull request as ready for review October 8, 2024 13:10