fix: Make the server obey the amplification limit#3552
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3552 +/- ##
==========================================
- Coverage 94.53% 94.43% -0.11%
==========================================
Files 127 131 +4
Lines 39632 39954 +322
Branches 39632 39954 +322
==========================================
+ Hits 37466 37729 +263
- Misses 1329 1378 +49
- Partials 837 847 +10
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Clean, well-motivated fix. Removing the DCID-based path validation from postprocess_packet is the right call — RFC 9000 §8.1 designates Handshake packets as the address validation mechanism, and Initial keys are publicly derivable, so accepting an Initial DCID match was weaker than intended.
The production change is minimal and correct. The new test directly reproduces the split-Initial scenario and checks the 3× invariant. Two minor suggestions inline.
There was a problem hiding this comment.
Pull request overview
This PR fixes QUIC server address validation so that it no longer lifts the anti-amplification limit based on a retransmitted client Initial whose DCID matches the server’s Initial SCID; instead, only incoming Handshake packets validate the server’s path during connection setup (per RFC 9000 §8.1).
Changes:
- Restrict setup-time path validation on the server to Handshake packets only (removes DCID/SCID matching shortcut).
- Add a regression test that reproduces the “MLKEM-split ClientHello + retransmitted Initial uses server SCID as DCID” scenario and asserts the server never exceeds 3× amplification.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
neqo-transport/src/connection/mod.rs |
Removes DCID-based setup path validation so amplification limits aren’t dropped on retransmitted Initials. |
neqo-transport/src/connection/tests/handshake.rs |
Adds a targeted regression test ensuring amplification limits remain enforced across the retransmit scenario. |
| loop { | ||
| match server.process_output(*now) { | ||
| Output::Datagram(d) => n += d.len(), | ||
| Output::Callback(t) if t < AT_LEAST_PTO => *now += t, |
There was a problem hiding this comment.
The comment says this drains only pacing delays and stops before PTOs fire, but the t < AT_LEAST_PTO condition will also advance time across any timer shorter than 1s (including PTO in this test suite, e.g. 300ms). That can change the scenario by potentially triggering PTO probes during drain. Either update the comment to match the behavior, or tighten the condition to only advance across pacing-scale delays (e.g., using rtt::GRANULARITY).
| Output::Callback(t) if t < AT_LEAST_PTO => *now += t, | |
| Output::Callback(t) if t < Duration::from_millis(1) => *now += t, |
db3b814 to
353d24f
Compare
There was a problem hiding this comment.
Correct fix. Removing the DCID-based path validation from postprocess_packet is the right approach — RFC 9000 §8.1 designates Handshake packets as the address validation signal during connection setup. Initial packets don't prove the client received the server's TLS messages (Initial keys are publicly derivable from the DCID), so the SCID-matching shortcut was unsound as an amplification-limit bypass.
The production change is minimal (2 lines removed, no new code paths) and no other call sites use local_initial_source_cid for path validation, so the fix is complete.
The regression test directly reproduces the MLKEM split-Initial retransmit scenario and verifies the 3× invariant. The c_init2 assertion from the previous review round has been addressed.
Two prior suggestions remain open and are still valid:
- The
drainclosure'sAT_LEAST_PTOthreshold (1s) is significantly wider than the "sub-millisecond pacing delays" the comment describes — this won't break the test (the server is amplification-limited regardless), but tightening the guard or updating the comment would improve clarity. - Asserting
server.state() != Confirmedat the end would guard against future changes that accidentally validate the path through a different code path.
The server was validating the client's address — and dropping the amplification limit — upon receiving a retransmitted client Initial whose DCID matched the server's chosen source CID. RFC 9000 Section 8.1 designates Handshake packets as the validation mechanism; the CID-based path (a MAY) is weaker here because Initial keys are publicly derivable, so the SCID is visible to any on-path observer. In practice, with MLKEM the ClientHello spans two Initial packets. If only the first reaches the server (as in the QNS amplificationlimit test with drops_to_server=2,3,4,5,6,7), the server sends PTO probes that exhaust most of the amplification budget. When the client retransmits the second fragment using the server's SCID as DCID, the old code validated the path and let the server send the full ~12 KB Handshake flight unrestricted — over 4× amplification.
353d24f to
ec50d2f
Compare
There was a problem hiding this comment.
Correct fix. The production change is minimal and well-targeted: removing the DCID-based path validation from postprocess_packet is the right approach per RFC 9000 §8.1. Initial keys are publicly derivable from the DCID, so a matching SCID on a retransmitted Initial does not prove the client received the server's TLS messages — only a Handshake packet does. No other call sites use local_initial_source_cid for path validation, so the fix is complete.
The regression test directly reproduces the MLKEM split-Initial retransmit scenario and checks the 3× invariant. The c_init2 assertion from the earlier review round has been addressed. Two minor suggestions inline (both echoing prior reviewers): tighten the drain loop guard to match its documented "pacing only" intent, and add a State::Confirmed assertion as a second safety net.
| // Advance through pacing delays (sub-millisecond) but stop before PTOs fire. | ||
| let drain = |server: &mut Connection, now: &mut _| { | ||
| let mut n = 0; | ||
| loop { | ||
| match server.process_output(*now) { | ||
| Output::Datagram(d) => n += d.len(), | ||
| Output::Callback(t) if t < AT_LEAST_PTO => *now += t, | ||
| _ => break, | ||
| } | ||
| } | ||
| n | ||
| }; |
There was a problem hiding this comment.
Warning
AT_LEAST_PTO is 1 s, which exceeds the initial PTO (~300 ms with DEFAULT_RTT = 100 ms). The drain loop will happily advance through a PTO timer and trigger probe retransmissions, inflating sent beyond what pacing alone would produce. This doesn't break the test (the server is amplification-limited regardless), but the comment is misleading and the loop does more than advertised.
Either tighten the guard to truly only cover pacing delays, or update the comment. Using a sub-millisecond ceiling is cleanest since pacing intervals at this scale are microsecond-order:
| // Advance through pacing delays (sub-millisecond) but stop before PTOs fire. | |
| let drain = |server: &mut Connection, now: &mut _| { | |
| let mut n = 0; | |
| loop { | |
| match server.process_output(*now) { | |
| Output::Datagram(d) => n += d.len(), | |
| Output::Callback(t) if t < AT_LEAST_PTO => *now += t, | |
| _ => break, | |
| } | |
| } | |
| n | |
| }; | |
| // Advance through pacing delays (sub-millisecond) but stop before any | |
| // larger timer (PTO, idle, etc.) fires. | |
| let drain = |server: &mut Connection, now: &mut _| { | |
| let mut n = 0; | |
| loop { | |
| match server.process_output(*now) { | |
| Output::Datagram(d) => n += d.len(), | |
| Output::Callback(t) if t < Duration::from_millis(1) => *now += t, | |
| _ => break, | |
| } | |
| } | |
| n | |
| }; |
| assert!( | ||
| sent <= received * 3, | ||
| "server sent {sent} bytes but received only {received} bytes (limit is 3x)", | ||
| ); | ||
| } |
There was a problem hiding this comment.
Note
+1 to the earlier suggestion to assert the server hasn't reached Confirmed. Without it, a future change that accidentally validates the path through another code path would let the server send unrestricted — the 3× assertion could still pass trivially, and the test would silently stop covering the amplification limit.
| assert!( | |
| sent <= received * 3, | |
| "server sent {sent} bytes but received only {received} bytes (limit is 3x)", | |
| ); | |
| } | |
| assert!( | |
| sent <= received * 3, | |
| "server sent {sent} bytes but received only {received} bytes (limit is 3x)", | |
| ); | |
| assert_ne!( | |
| *server.state(), | |
| State::Confirmed, | |
| "path should not be validated yet — only a Handshake packet should do that" | |
| ); | |
| } |
Failed Interop TestsQUIC Interop Runner, client vs. server, differences relative to
All resultsSucceeded Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
Unsupported Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
|
Client/server transfer resultsPerformance differences relative to a4cb75b. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Table above only shows statistically significant changes. See all results below. All resultsTransfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Download data for |
Benchmark resultsSignificant performance differences relative to 7c0b85f. transfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: 💔 Performance has regressed by +1.6375%. time: [207.96 ms 208.33 ms 208.71 ms]
thrpt: [479.13 MiB/s 480.01 MiB/s 480.86 MiB/s]
change:
time: [+1.3886% +1.6375% +1.8998] (p = 0.00 < 0.05)
thrpt: [-1.8644% -1.6112% -1.3695]
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
4 (4.00%) high mildAll resultstransfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: Change within noise threshold. time: [204.06 ms 204.43 ms 204.79 ms]
thrpt: [488.31 MiB/s 489.18 MiB/s 490.06 MiB/s]
change:
time: [+0.4868% +0.7369% +0.9890] (p = 0.00 < 0.05)
thrpt: [-0.9793% -0.7315% -0.4844]
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) low mild
1 (1.00%) high mildtransfer/1-conn/10_000-parallel-1b-resp (aka. RPS)/mtu-1504: No change in performance detected. time: [285.46 ms 287.18 ms 288.88 ms]
thrpt: [34.616 Kelem/s 34.821 Kelem/s 35.031 Kelem/s]
change:
time: [-1.2251% -0.3415% +0.5381] (p = 0.46 > 0.05)
thrpt: [-0.5352% +0.3427% +1.2403]
No change in performance detected.transfer/1-conn/1-1b-resp (aka. HPS)/mtu-1504: No change in performance detected. time: [38.490 ms 38.640 ms 38.809 ms]
thrpt: [25.767 B/s 25.880 B/s 25.981 B/s]
change:
time: [-0.7318% -0.1507% +0.4306] (p = 0.62 > 0.05)
thrpt: [-0.4288% +0.1509% +0.7372]
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severetransfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: 💔 Performance has regressed by +1.6375%. time: [207.96 ms 208.33 ms 208.71 ms]
thrpt: [479.13 MiB/s 480.01 MiB/s 480.86 MiB/s]
change:
time: [+1.3886% +1.6375% +1.8998] (p = 0.00 < 0.05)
thrpt: [-1.8644% -1.6112% -1.3695]
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
4 (4.00%) high mildstreams/walltime/1-streams/each-1000-bytes: Change within noise threshold. time: [587.42 µs 588.82 µs 590.60 µs]
change: [-0.9947% -0.5317% -0.1066] (p = 0.02 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severestreams/walltime/1000-streams/each-1-bytes: No change in performance detected. time: [12.335 ms 12.370 ms 12.421 ms]
change: [-0.1944% +0.1459% +0.6107] (p = 0.52 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severestreams/walltime/1000-streams/each-1000-bytes: Change within noise threshold. time: [45.187 ms 45.231 ms 45.275 ms]
change: [+0.5951% +0.9286% +1.1555] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mildtransfer/walltime/pacing-false/varying-seeds: Change within noise threshold. time: [81.384 ms 81.477 ms 81.566 ms]
change: [-0.3721% -0.2324% -0.1001] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low severe
3 (3.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severetransfer/walltime/pacing-true/varying-seeds: Change within noise threshold. time: [81.153 ms 81.215 ms 81.282 ms]
change: [-0.3313% -0.2189% -0.1054] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-false/same-seed: Change within noise threshold. time: [79.569 ms 79.711 ms 79.865 ms]
change: [-0.7318% -0.5494% -0.3571] (p = 0.00 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
13 (13.00%) high mildtransfer/walltime/pacing-true/same-seed: Change within noise threshold. time: [81.685 ms 81.798 ms 81.920 ms]
change: [+0.1853% +0.3393% +0.5004] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severeDownload data for |
martinthomson
left a comment
There was a problem hiding this comment.
Removing a condition that provides path validation in order to make tests pass is backwards. The fix is to ensure that the test correctly measures the thing we care about.
|
@marten-seemann has been sitting on quic-interop/quic-interop-runner#442 for a while and I got impatient with the spurious failures. |
There was a problem hiding this comment.
The security analysis in the PR description is correct: Initial keys are publicly derivable, so a DCID match on a retransmitted Initial is a weaker address-validation signal than a Handshake packet. However, as @martinthomson noted, removing the DCID-based path validation to satisfy a test inverts the causality — the question is what behavior is correct, not what makes a test pass.
The core issue is that set_valid() is binary: the DCID match was granting usize::MAX amplification budget — identical to Handshake validation. A middle path exists: keep the DCID recognition but apply a bounded budget increase instead of full validation. This would preserve the latency benefit in MLKEM split-Initial scenarios while still bounding amplification.
Two open suggestions from prior reviewers remain valid on the test:
- The
drainclosure usesAT_LEAST_PTO(1s) as its threshold, which exceeds the actual PTO (~300ms) and will advance through PTO timers — misleading relative to its "sub-millisecond pacing" comment. Should be tightened to ~1ms or the comment updated. - An
assert_ne!(*server.state(), State::Confirmed)at the end would guard against the 3× check passing trivially if a future change accidentally validates the path.
| && (packet.packet_type() == packet::Type::Handshake | ||
| || (packet.dcid().len() >= 8 && packet.dcid() == self.local_initial_source_cid)) | ||
| { | ||
| } else if self.role != Role::Client && packet.packet_type() == packet::Type::Handshake { |
There was a problem hiding this comment.
Caution
Agreeing with @martinthomson's review — the directionality here needs reconsideration.
The removed DCID condition was the only pre-Handshake path validation the server had. set_valid() sets ProbeState::Valid, which makes amplification_limit() return usize::MAX (path.rs:1089). So this was an all-or-nothing switch: the DCID match — a weaker signal than a Handshake — was granting the same unlimited sending privilege as Handshake receipt.
The security argument in the PR description is sound (Initial keys are publicly derivable, so an on-path observer who sees the server's Initial learns the SCID and could forge a matching DCID). But if the DCID-based check is considered valuable for performance (it avoids an extra RTT in the MLKEM split-Initial case where the server has a large Handshake flight), the fix might be to keep the check but not call set_valid() — instead, extend the amplification budget (e.g., increase the multiplier) without fully validating the path. That would preserve the latency benefit while still bounding amplification.
As it stands, removing the condition entirely means the server will be 3×-limited for an additional RTT whenever the ClientHello is split across two Initials and the second is lost, which is the common case with MLKEM + lossy networks.
The server was validating the client's address — and dropping the amplification limit — upon receiving a retransmitted client Initial whose DCID matched the server's chosen source CID. RFC 9000 Section 8.1 designates Handshake packets as the validation mechanism; the CID-based path (a MAY) is weaker here because Initial keys are publicly derivable, so the SCID is visible to any on-path observer.
In practice, with MLKEM the ClientHello spans two Initial packets. If only the first reaches the server (as in the QNS amplificationlimit test with drops_to_server=2,3,4,5,6,7), the server sends PTO probes that exhaust most of the amplification budget. When the client retransmits the second fragment using the server's SCID as DCID, the old code validated the path and let the server send the full ~12 KB Handshake flight unrestricted — over 4× amplification.
Fix: remove the DCID condition from postprocess_packet; only Handshake packets validate the path during setup.
Test: anti_amplification_initial_retransmit_no_validate simulates the exact scenario — first fragment only, server ACKs it, client PTO retransmits the second fragment with the server's SCID as DCID — and asserts the server never exceeds 3× amplification across the full paced burst.