UDP send optimizations #20
Open
endel wants to merge 5 commits into
Conversation
Three coordinated send-path optimizations following Cloudflare's "Accelerating UDP packet transmission for QUIC" post, each gated by an env var so we can bisect regressions without rebuilding:

- `SendBatch.flush` on Linux now uses `sendmmsg` to batch packets grouped by ECN mark: ~60× fewer syscalls for bulk transfer. The partial-send policy drops silently with a rate-limited warning. macOS/Windows stay on the per-packet `sendmsg` loop. Kill switch: `QUIC_ZIG_NO_SENDMMSG=1`.
- Layered on top, UDP Generic Segmentation Offload (`UDP_SEGMENT`) coalesces contiguous same-peer, same-size packets into one GSO super-buffer per mmsghdr entry. Default **off** because the zig-client → neqo-server transfer regresses over the ns-3 veth (26/27 combos green; see SPEC/interop-results.md). Opt-in: `QUIC_ZIG_ENABLE_GSO=1`.
- The pacer is already gated inside `conn.send` and folded into `nextTimeoutNs`; this adds a symmetric kill switch for bisection parity: `QUIC_ZIG_NO_PACING=1`.

Validated via quic-interop-runner against quic-go and neqo in both directions (handshake, transfer, chacha20, multiplexing, longrtt, http3, keyupdate): 14/14 pass with defaults, matching the 2026-03-24 baseline.
Prerequisite for Plan 4b (SO_TXTIME kernel pacing): the SCM_TXTIME cmsg requires a monotonic timestamp, so `Pacer.last_sent_time` now lives on CLOCK_MONOTONIC instead of CLOCK_REALTIME. Other subsystems (loss detection, PTO, idle timeout) stay on the existing clock since they only consume durations and are insensitive to the clock choice.

- New `clock.monoNanos()` wraps `clock_gettime(CLOCK_MONOTONIC)` with a Windows fallback to `nanoTimestamp()`.
- `conn.send()` reads both clocks; the three pacer call sites (`timeUntilSend`, `onPacketSent`, and the `nextTimeoutNs` pacer branch) now consume the monotonic value.
- `nextTimeoutNs` computes the pacer delay on the monotonic clock but returns a REALTIME-based deadline so it remains comparable to the loss and idle deadlines the event loop also collects.
- The Pacer doc comment spells out the clock contract.

Validated via quic-interop-runner: 27/28 combos pass (quic-go + neqo, both directions); the single remaining failure is zig-client → neqo-server chacha20, which is pre-existing in the 2026-03-24 baseline.
Plan 4b: when `QUIC_ZIG_ENABLE_TXTIME=1`, `conn.send()` keeps producing packets while the user-space pacer would have blocked, stamping each one with a CLOCK_MONOTONIC target transmission time. SendBatch attaches the timestamp as an SCM_TXTIME cmsg so the kernel's fq qdisc releases the packet at the right time, moving pacing from user-space sleeps to a single syscall.

Wiring:

- `ecn_socket.zig`: SO_TXTIME / SCM_TXTIME constants, `probeTxtimeSupport` at `SendBatch.init`, new `addTxtime(...)` API, `txtimes[]` per-packet array, combined cmsg layout (UDP_SEGMENT + SCM_TXTIME stacked, 48 B per entry), GSO grouping respects timestamp identity within a super-buffer.
- `connection.zig`: `last_target_txtime` field set when the pacer would block and kernel pacing is enabled; `isKernelPacingEnabled()` env cache mirrors the pattern used by `isPacingDisabled`.
- `event_loop.zig`: send-loop sites pass `conn.last_target_txtime` to `batch.addTxtime` (default 0 = "send now").

On non-fq egress paths (including ns-3 in the interop runner) the kernel accepts the cmsg but ignores the timestamp, so observable behavior matches a run without TXTIME. Validated with both modes on quic-go + neqo, both directions: no regressions in either default-off or opt-in mode. The real-hardware throughput benefit only kicks in with `tc qdisc add dev <iface> root fq`.
Plans 2 and 4b were prototyped end-to-end (commits 4a0ee92 + 3ae98a0) and proved correct in the interop matrix, but they target a workload we don't have: bulk CDN-style throughput. Our direction is real-time WebTransport (low-latency datagrams, browser interop), where:

- GSO almost never groups (small, variable-size datagrams), and the one failing combo (zig-client → neqo-server transfer over the ns-3 veth) was a permanent maintenance tax behind an opt-in nobody would enable.
- SO_TXTIME moves the pacer wait from user space to the kernel, which is useful when the kernel actually paces (fq qdisc on production hosts), but the user-space pacer already produces correct behavior, and "release at time T" is the opposite of what real-time datagram workloads want.

Keeping:

- `sendmmsg` batching (default-on, Linux): one syscall per ECN-mark run. Latency-critical paths still use `sendDirect` (single-packet, bypasses the batch entirely), so this only helps bulk WT streams without penalising real-time traffic.
- The `QUIC_ZIG_NO_PACING=1` kill switch + send-loop doc comments from Plan 3.
- `clock.monoNanos()` and the Pacer migration to CLOCK_MONOTONIC from Plan 4a: a standalone NTP-skew-resilience win, kept on its own merit.

Net deletion of 353 lines; ecn_socket.zig: 863 → 502 lines. Validated 28/28 across quic-go + neqo, both directions.
Pacer state lives on CLOCK_MONOTONIC (NTP-skew resilience); everything else uses `std.time.nanoTimestamp()` (REALTIME). The boundary is crossed in exactly one place, `Connection.nextTimeoutNs`, and that is the only function readers need to understand to avoid future cross-clock comparison bugs. The doc comment spells out who uses which clock and why, the three rules for adding new clock-touching code, and why we chose not to migrate everything to MONOTONIC.
Improvements based on https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/