UDP send optimizations #20

Open

endel wants to merge 5 commits into main from udp-send-optimizations

Conversation


@endel endel commented Apr 15, 2026

endel added 5 commits April 15, 2026 15:55
Three coordinated send-path optimizations following Cloudflare's "Accelerating
UDP packet transmission for QUIC" post, each gated by an env var so we can
bisect regressions without rebuilding:

- `SendBatch.flush` on Linux now uses `sendmmsg` to batch packets grouped by
  ECN mark: ~60× fewer syscalls for bulk transfer (see the C sketch after
  this list). On a partial send, the remaining packets are dropped silently
  with a rate-limited warning. macOS/Windows stay on the per-packet
  `sendmsg` loop. Kill switch: `QUIC_ZIG_NO_SENDMMSG=1`.
- Layered on top, UDP Generic Segmentation Offload (`UDP_SEGMENT`) coalesces
  same-peer same-size contiguous packets into one GSO super-buffer per
  mmsghdr entry. Default **off** because zig-client → neqo-server transfer
  regresses over ns-3 veth (26/27 combos green; see SPEC/interop-results.md).
  Opt-in: `QUIC_ZIG_ENABLE_GSO=1`.
- The pacer is already gated inside `conn.send` and folded into
  `nextTimeoutNs`; this commit just adds a symmetric kill switch for
  bisection parity: `QUIC_ZIG_NO_PACING=1`.
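
A minimal C sketch of the run-batching idea, for readers unfamiliar with
`sendmmsg(2)` (the helper and struct names are illustrative, not from the
Zig source, and it's IPv4-only for brevity): packets are grouped into runs
sharing an ECN mark, the mark is applied once per run via `IP_TOS`, and the
whole run goes out in one syscall.

```c
#define _GNU_SOURCE             /* sendmmsg, struct mmsghdr */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* One queued packet; `ecn` is the 2-bit ECN codepoint (low bits of TOS). */
struct queued_pkt {
    struct iovec payload;
    struct sockaddr_in peer;
    int ecn;
};

/* Flush pkts[0..n) with one sendmmsg(2) per run of equal ECN marks,
 * instead of one sendmsg(2) per packet. */
static int flush_batch(int fd, struct queued_pkt *pkts, size_t n)
{
    struct mmsghdr msgs[64];
    size_t i = 0;

    while (i < n) {
        /* Extend the run while the ECN mark stays the same. */
        size_t run = 1;
        while (i + run < n && run < 64 && pkts[i + run].ecn == pkts[i].ecn)
            run++;

        /* Apply the mark once for the whole run. */
        if (setsockopt(fd, IPPROTO_IP, IP_TOS, &pkts[i].ecn,
                       sizeof pkts[i].ecn) < 0)
            return -1;

        for (size_t j = 0; j < run; j++) {
            memset(&msgs[j], 0, sizeof msgs[j]);
            msgs[j].msg_hdr.msg_name    = &pkts[i + j].peer;
            msgs[j].msg_hdr.msg_namelen = sizeof pkts[i + j].peer;
            msgs[j].msg_hdr.msg_iov     = &pkts[i + j].payload;
            msgs[j].msg_hdr.msg_iovlen  = 1;
        }

        int sent = sendmmsg(fd, msgs, (unsigned int)run, 0);
        if (sent < 0)
            return -1;
        if ((size_t)sent < run) /* partial send: drop the rest, warn */
            fprintf(stderr, "sendmmsg: sent %d of %zu\n", sent, run);
        i += run;
    }
    return 0;
}
```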

Validated via quic-interop-runner against quic-go and neqo in both directions
(handshake, transfer, chacha20, multiplexing, longrtt, http3, keyupdate):
14/14 pass with defaults, matching the 2026-03-24 baseline.

Prerequisite for Plan 4b (SO_TXTIME kernel pacing): the SCM_TXTIME cmsg
requires a monotonic timestamp, so `Pacer.last_sent_time` now lives on
CLOCK_MONOTONIC instead of CLOCK_REALTIME. Other subsystems (loss detection,
PTO, idle timeout) stay on the existing clock since they only consume
durations and are insensitive to the clock choice.

- New `clock.monoNanos()` wraps `clock_gettime(CLOCK_MONOTONIC)` with a
  Windows fallback to `nanoTimestamp()` (see the C sketch after this list).
- `conn.send()` reads both clocks; the three pacer call sites
  (`timeUntilSend`, `onPacketSent`, and the `nextTimeoutNs` pacer branch)
  now consume the monotonic value.
- `nextTimeoutNs` computes the pacer delay on the monotonic clock but
  returns a REALTIME-based deadline so it remains comparable to the loss
  and idle deadlines the event loop also collects.
- Pacer doc comment spells out the clock contract.
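
For reference, a minimal C rendering of the two clock reads described above
(the counterpart of the Zig-side `clock.monoNanos()` /
`std.time.nanoTimestamp()` pair; this sketch skips the Windows fallback):

```c
#include <stdint.h>
#include <time.h>

/* Monotonic nanoseconds: immune to NTP steps; what the pacer now consumes. */
static int64_t mono_nanos(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Wall-clock (REALTIME) nanoseconds: what loss detection, PTO, and idle
 * timeout keep using, since they only consume durations between readings. */
static int64_t real_nanos(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}
```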

Validated via quic-interop-runner: 27/28 combos pass (quic-go + neqo both
directions); the single remaining failure is zig-client → neqo-server
chacha20, which is pre-existing in the 2026-03-24 baseline.

Plan 4b: when `QUIC_ZIG_ENABLE_TXTIME=1`, `conn.send()` keeps producing
packets where the user-space pacer would have blocked, stamping each one
with a CLOCK_MONOTONIC target transmission time. SendBatch attaches the
timestamp as an SCM_TXTIME cmsg so the kernel's fq qdisc releases the packet
at the right time, moving pacing from user-space sleeps to a single syscall.

Wiring:
- `ecn_socket.zig`: SO_TXTIME / SCM_TXTIME constants, `probeTxtimeSupport`
  at `SendBatch.init`, new `addTxtime(...)` API, `txtimes[]` per-packet
  array, combined cmsg layout (UDP_SEGMENT + SCM_TXTIME stacked, 48 B per
  entry; sketched in C after this list), GSO grouping respects timestamp
  identity within a super-buffer.
- `connection.zig`: `last_target_txtime` field set when the pacer would
  block and kernel pacing is enabled; `isKernelPacingEnabled()` env cache
  mirrors the pattern used by `isPacingDisabled`.
- `event_loop.zig`: send-loop sites pass `conn.last_target_txtime` to
  `batch.addTxtime` (default 0 = "send now").
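
A minimal C sketch of the kernel API this wiring targets, assuming 64-bit
Linux (helper names are illustrative): opt the socket into SO_TXTIME once,
then stack UDP_SEGMENT and SCM_TXTIME in one control buffer, which is where
the 48 B per entry comes from (CMSG_SPACE(2) + CMSG_SPACE(8) = 24 + 24).

```c
#include <linux/net_tstamp.h>   /* struct sock_txtime */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

#ifndef SO_TXTIME
#define SO_TXTIME 61            /* asm-generic value */
#endif
#ifndef SCM_TXTIME
#define SCM_TXTIME SO_TXTIME
#endif
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103         /* linux/udp.h */
#endif

/* Opt in once per socket: timestamps are CLOCK_MONOTONIC, matching what the
 * pacer stamps. Roughly what a probe like probeTxtimeSupport would attempt. */
static int enable_txtime(int fd)
{
    struct sock_txtime cfg = { .clockid = CLOCK_MONOTONIC, .flags = 0 };
    return setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof cfg);
}

/* Stack UDP_SEGMENT (u16 GSO size) + SCM_TXTIME (u64 ns) into one control
 * buffer; cbuf_len must be at least 48. txtime_ns == 0 means "send now". */
static void attach_cmsgs(struct msghdr *msg, void *cbuf, size_t cbuf_len,
                         uint16_t gso_size, uint64_t txtime_ns)
{
    msg->msg_control = cbuf;
    msg->msg_controllen = cbuf_len;

    struct cmsghdr *cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = SOL_UDP;
    cm->cmsg_type  = UDP_SEGMENT;
    cm->cmsg_len   = CMSG_LEN(sizeof gso_size);
    memcpy(CMSG_DATA(cm), &gso_size, sizeof gso_size);

    cm = CMSG_NXTHDR(msg, cm);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type  = SCM_TXTIME;
    cm->cmsg_len   = CMSG_LEN(sizeof txtime_ns);
    memcpy(CMSG_DATA(cm), &txtime_ns, sizeof txtime_ns);

    msg->msg_controllen = CMSG_SPACE(sizeof gso_size)
                        + CMSG_SPACE(sizeof txtime_ns);
}
```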

On non-fq egress paths (including ns-3 in the interop runner) the kernel
accepts the cmsg but ignores the timestamp — same observable behavior as
without TXTIME. Validated in both modes against quic-go and neqo, both
directions: no regressions in either default-off or opt-in mode. The
real-hardware throughput benefit only kicks in with
`tc qdisc add dev <iface> root fq`.

Plans 2 and 4b were prototyped end-to-end (commits 4a0ee92 + 3ae98a0) and
proved correct in the interop matrix, but they target a workload we don't
have: bulk CDN-style throughput. Our direction is real-time WebTransport
(low-latency datagrams, browser interop), where:

- GSO almost never groups (small variable-size datagrams), and the one
  failing combo (zig-client → neqo-server transfer over ns-3 veth) was a
  permanent maintenance tax behind an opt-in nobody would enable.
- SO_TXTIME moves the pacer wait from user space to the kernel — useful
  when the kernel actually paces (fq qdisc on production hosts), but the
  user-space pacer already produces correct behavior, and "release at time
  T" is the opposite of what real-time datagram workloads want.

Keeping:
- `sendmmsg` batching (default-on, Linux): one syscall per ECN-mark run.
  Latency-critical paths still use `sendDirect` (single-packet, bypasses
  the batch entirely), so this only helps bulk WT streams without
  penalizing real-time traffic.
- `QUIC_ZIG_NO_PACING=1` kill switch + send-loop doc comments from Plan 3
  (the cached-getenv pattern behind these switches is sketched below).
- `clock.monoNanos()` and the Pacer migration to CLOCK_MONOTONIC from
  Plan 4a, a standalone NTP-skew-resilience win, kept on its own merit.
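
All the env-var switches share one pattern (the `isKernelPacingEnabled()` /
`isPacingDisabled` cache noted in Plan 4b's wiring). A minimal C equivalent,
with an illustrative name, assuming the switch only honors the literal
value `1`:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Probe QUIC_ZIG_NO_PACING once and cache the verdict, so the kill switch
 * costs one getenv() per process instead of one per send-loop iteration. */
static bool is_pacing_disabled(void)
{
    static int cached = -1;     /* -1 = not yet probed */
    if (cached < 0) {
        const char *v = getenv("QUIC_ZIG_NO_PACING");
        cached = (v != NULL && strcmp(v, "1") == 0);
    }
    return cached == 1;
}
```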

353-line net deletion. `ecn_socket.zig`: 863 → 502 lines. Validated 28/28
across quic-go + neqo, both directions.

Pacer state lives on CLOCK_MONOTONIC (NTP-skew resilience); everything else
uses `std.time.nanoTimestamp()` (REALTIME). The boundary is crossed in
exactly one place, `Connection.nextTimeoutNs`, and that's the only function
readers need to understand to avoid future cross-clock comparison bugs (a C
sketch follows below).

Spells out who uses which clock and why, the three rules for adding new
clock-touching code, and why we chose not to migrate everything to MONOTONIC.
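
A sketch of that single crossing in C, reusing the illustrative clock
helpers from the Plan 4a notes above (the pacer stub and its constants are
invented for the example): the pacer delay is a MONOTONIC duration, and the
returned deadline is a REALTIME instant.

```c
#include <stdint.h>

/* Stub pacer state: the monotonic instant of the last send, plus the gap
 * the pacer enforces between sends (both illustrative). */
static int64_t last_sent_mono_ns;
static int64_t inter_send_gap_ns = 1250000; /* e.g. 1.25 ms */

/* Duration until the pacer allows the next send, computed purely on the
 * monotonic clock. Durations are clock-agnostic; instants are not. */
static int64_t pacer_time_until_send(int64_t now_mono_ns)
{
    int64_t ready_at = last_sent_mono_ns + inter_send_gap_ns;
    return ready_at > now_mono_ns ? ready_at - now_mono_ns : 0;
}

/* The one place the clocks meet: the delay is computed on CLOCK_MONOTONIC,
 * then added to a REALTIME "now" so the resulting deadline stays comparable
 * to the loss and idle deadlines the event loop also collects. Never
 * compare the two clocks' instants directly. */
static int64_t next_timeout_ns(int64_t now_real_ns, int64_t now_mono_ns)
{
    return now_real_ns + pacer_time_until_send(now_mono_ns);
}
```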